Clojure

This page discusses some of the issues specific to Clojure.

Infix pairing characters in Clojure

Clojure already uses (), [], {}, #{}, #(), and <>. It would be possible to use {}, but this would be backwards-incompatible with existing Clojure code.

Thus the more-likely options for a reader infix notation are #[] or a non-ASCII unicode pair.

However, non-ASCII characters might be relatively easy for Clojure to support. Clojure normally loads source files in a way that forces them to be interpreted as UTF-8, which makes supporting Unicode much easier. In particular, the "loadFile" routine in source file src/jvm/clojure/lang/Compiler.java loads from "new InputStreamReader(f, RT.UTF8)". Source can also be loaded indirectly, in which case additional magic is necessary to force the encoding, as discussed here:
https://stackoverflow.com/questions/1431008/enabling-utf-8-encoding-for-clojure-source-files
On non-Windows systems this should be enough. Many Windows tools unfortunately assume that the system is using a non-standard local convention (like Windows-1252). However, users can insert the UTF-8 BOM (hex 0xEF 0xBB 0xBF) at the beginning of most such files, forcing them to be interpreted correctly; while Unicode says this should NOT be necessary, it can be used in many cases to deal with Windows' brokenness. Wheeler has confirmed that vim quietly retains the UTF-8 BOM, and presumably most other tools do too.
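
The encoding issue above can be illustrated with a small Python sketch. It is not Clojure's loader, only a demonstration of what happens to the same bytes under different assumptions; the sample string and variable names are made up for illustration.

```python
import codecs

source = "«x + 1»"
utf8_bytes = source.encode("utf-8")

# A tool that wrongly assumes Windows-1252 garbles the brackets.
misread = utf8_bytes.decode("cp1252")

# A file saved with the UTF-8 BOM starts with the bytes EF BB BF.
with_bom = codecs.BOM_UTF8 + utf8_bytes

# Python's "utf-8-sig" codec strips a leading BOM if one is present.
decoded = with_bom.decode("utf-8-sig")

# Plain "utf-8" keeps the BOM as a leading U+FEFF character.
plain = with_bom.decode("utf-8")

print(repr(misread))
print(repr(decoded))
print(repr(plain[0]))
```

A reader that decodes as plain UTF-8 would therefore need to skip a leading U+FEFF itself; whether Clojure's reader does so is not something this sketch tests.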

Second, if so, which Unicode pair would serve best? These pages purport to list pairing characters in Unicode:

There are a lot of pairing characters, but there are issues with many:

  • Many Chinese punctuation chars are full-width, which look odd when combined with the so-called half-width characters in Western fonts.
  • Support for some of the mathematical characters in some fonts seems dicey. That said, it may be easier to get people to fix their fonts.
  • Some characters are hard to distinguish from others. For example, the "left/right angle bracket with dot" pair has such a tiny dot in some fonts that it would be missed.

These look like the best options (if your display can handle them!):

  • «x + 1» : Left/right-pointing double-angle quotation mark, U+00AB/U+00BB. These are very well-supported (e.g., they are used for French quotations and are in Latin-1), and in many cases are the easiest to enter. There is a risk of them being too similar to the comparison operators < and >, but this doesn't seem too bad. Nested example: fact«n * «n - 1»»
  • ⦃x + 1⦄ : Left/Right white curly bracket, U+2983/U+2984. These are nice-looking, they are similar to {}, and yet easily distinguished from them and other characters. However, most fonts do not support them. Nested example: fact⦃n * ⦃n - 1⦄⦄

Alternatives:

  • ⟪x + 1⟫ : Mathematical left/right double angle bracket, U+27EA/U+27EB. Look good, but may not be universally supported in fonts.
  • ⟦x + 1⟧ : Mathematical left/right white square bracket, U+27E6/U+27E7. Look good, but may not be universally supported in fonts.
  • ⦑x + 1⦒ : Left/Right angle bracket with dot, U+2991/U+2992. Not universally supported in fonts. The dot is hard to see, so this is probably a bad choice.
  • 【x + 1】 : Left/right black lenticular bracket, U+3010/U+3011. Chinese, so they are "full width" (and thus are spaced oddly next to Western letters).
  • 《x + 1》 : Left/right double angle bracket, U+300A/U+300B. Chinese, so they are "full width" (and thus are spaced oddly next to Western letters).
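
As a cross-check of the candidates above, the following Python sketch prints each pair's code points and official Unicode name, and flags the full-width ones. It relies only on the standard unicodedata module; the names CANDIDATE_PAIRS and describe are hypothetical helpers introduced here, and the results depend on the Unicode database shipped with your Python.

```python
import unicodedata

# Candidate bracket pairs copied from the lists above.
CANDIDATE_PAIRS = ["«»", "⦃⦄", "⟪⟫", "⟦⟧", "⦑⦒", "【】", "《》"]


def describe(pair):
    """Return code points, Unicode name, and a full-width flag for a pair."""
    left, right = pair
    return {
        "pair": pair,
        "code_points": f"U+{ord(left):04X}/U+{ord(right):04X}",
        "name": unicodedata.name(left),
        # East Asian Width "W" (wide) or "F" (fullwidth) means the
        # character takes two columns in typical monospace fonts.
        "full_width": unicodedata.east_asian_width(left) in ("W", "F"),
    }


for pair in CANDIDATE_PAIRS:
    info = describe(pair)
    width = "full-width" if info["full_width"] else "normal width"
    print(f'{info["pair"]}  {info["code_points"]}  {info["name"]}  ({width})')
```

This says nothing about font coverage, which still has to be checked on each target system.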

Unicode Input

Using Unicode characters requires some input method. A description of Unicode input is at: https://en.wikipedia.org/wiki/Unicode_input

The following are for Windows, which in some ways is the hardest.

On Windows many applications have application-specific mechanisms. Windows programs that use the RichEdit control (like WordPad) let you enter the hex digits followed by alt-x. So "a" "b" "alt-x" will create U+00AB. Emacs and vim have other mechanisms.

You can also hold down ALT and, while holding it, type 0 followed by the DECIMAL value on the numeric keypad, then release ALT. This depends on the input language. So "Hold-ALT 0 1 7 1 Release-ALT" inserts the left character («), and "Hold-ALT 0 1 8 7 Release-ALT" inserts the right character (»). This works on laptops without numeric keypads too; just use the Fn key.

The most general solution in Windows is to first set the registry value EnableHexNumpad, of string type (REG_SZ), under the key HKEY_CURRENT_USER\Control Panel\Input Method to 1. Then reboot. Then you can type "ALT + hex-of-Unicode release-ALT" (only the + on the numeric keypad works). This also works on laptops without a numeric keypad; just use the Fn key. That really should be the default; it's much faster and simpler.
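
To work out which digits to type for a given character, a small Python sketch like the following may help. It assumes the decimal ALT method uses the Windows-1252 code page (which is why that method depends on the input language); alt_codes is a hypothetical helper, not part of any Windows API.

```python
def alt_codes(ch, ansi_code_page="cp1252"):
    """Return (decimal_alt_code, hex_code_point) for one character.

    decimal_alt_code is None when the character has no single-byte
    encoding in the assumed ANSI code page, so only the hex method
    (with EnableHexNumpad set) can enter it.
    """
    try:
        encoded = ch.encode(ansi_code_page)
        decimal = f"0{encoded[0]}" if len(encoded) == 1 else None
    except UnicodeEncodeError:
        decimal = None
    return decimal, format(ord(ch), "X")


for ch in "«»⦃⦄":
    decimal, hex_digits = alt_codes(ch)
    print(ch, "decimal:", decimal, "hex:", hex_digits)
```

For « this gives the decimal code 0171 and the hex digits AB, matching the key sequences described above.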

Other issues

There is a macro implementation of infix for Clojure here: https://github.com/tristan/clojure-infix. Like all macro solutions, it doesn't work well when defining macros themselves. The disclaimer is damning: "I have updated the code so it runs on clojure 1.6 using leiningen to start the repl, but as I've not worked on this or even clojure since v1.1, I cannot guarantee the code makes any sense in the clojure 1.6 world, nor provide any support for it."

Unfortunately, the Clojure BDFL Rich Hickey appears to be hostile to the idea of supporting infix, even though people want it: https://groups.google.com/forum/#!topic/clojure/2rMAejsJYZ4

One approach might be to propose adding just basic curly-infix, and if that's not accepted, don't even bother with higher levels. It's not clear what to do with mixed operations due to Clojure's scoping rules; it might be best to error out for now, and get something working.
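
A minimal sketch of that "error out on mixed operations" policy, written in Python rather than as a Clojure reader extension, might look like the following. It assumes an infix expression has already been read into a nested list, that every nested list is itself an infix expression, and that "basic" means a single repeated operator; curly_infix is a hypothetical name, not an existing implementation.

```python
def curly_infix(expr):
    """Convert a nested infix list to prefix form.

    ['n', '*', ['n', '-', 1]] becomes ['*', 'n', ['-', 'n', 1]].
    Mixed operators such as ['a', '+', 'b', '*', 'c'] raise ValueError
    instead of guessing a precedence.
    """
    if not isinstance(expr, list):
        return expr  # atom: a symbol name or a number
    items = [curly_infix(e) for e in expr]
    # A simple infix list is: operand (operator operand)+
    if len(items) < 3 or len(items) % 2 == 0:
        raise ValueError(f"not a simple infix list: {expr!r}")
    operators = items[1::2]
    if any(op != operators[0] for op in operators):
        raise ValueError(f"mixed operators: {expr!r}")
    return [operators[0]] + items[0::2]


print(curly_infix(["x", "+", 1]))
print(curly_infix(["n", "*", ["n", "-", 1]]))
```

Raising an error keeps the first version small, and leaves room to define a meaning for mixed operations later without breaking code that already works.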


Related

Wiki: Join
