Readable Lisp S-expressions Wiki

Readable Lisp/S-expressions with infix, functions, and indentation

Brought to you by: dwheeler

Precedence

There's a reasonable case to be made for precedence, especially for +, -, *, and /. Even when these operators don't mean add, subtract, multiply, and divide, people using them as infix operators would typically be willing to agree on their precedence. And these operators are certainly widely used.

But the arguments against adding precedence are quite compelling. Any precedence system harms iconicity, and because it's hard to agree on the precedence rules, any precedence system would be difficult to get accepted. Besides, it turns out that precedence is far less important than people thought - requiring people to explicitly group operators turns out to be relatively painless.

Thus, curly-infix has been intentionally defined to not include precedence.

That said, it'd be possible to add precedence support, either to this or a future version of curly-infix, as long as the list of operators is a fixed definition that is independent of a particular Lisp (e.g., focused on "pure" math functions). The key is that, in a Lisp, any precedence rules must be absolutely fixed. Remember, the reader is used for program data, not just for programs, which often mixed together. In many cases they're split into multiple files that are loaded, and reloaded, at relatively arbitrary times relative to each other. If you have to manipulate a table, or declare precedence for operators before reading them, the effort of managing the table becomes too complex (it'd be better to use nfx, which more flexible than a table while requiring fewer built-in mechanisms). Remember, a Lisp program may manage several different languages and meta-levels at once, all mixed together; precedence variation in this circumstance is too confusing. In particular, if you have to declare operator precedence, it becomes unworkable; there's often no way to be sure that the declaration was read and applied before the operator is seen. And it becomes a true nightmare if the precedence level definitions disagree (a scoped operator like nfx can deal with this; a Lisp reader is a poor substitute).

If support for precedence is important for acceptance of curly-infix, we could add precedence. So, if precedence is important for its acceptance, or in some future version precedence is considered important, here's a modified curly-infix rule to provide support for precedence (this was the best that David A. Wheeler could come up with). Perhaps most important, this is proof that accepting curly-infix now does not prevent support for precedence in the future.

In this version, you still have to specially call unary functions, e.g.: (- x), -(x), or -{x}... you can't just say {- - x}, since that looks like you're subtracting "x" from "-". In compensation, this rule is extremely flexible - you can use a large number of different symbols for operators. I think that "you must specially call all unary functions" is an easy rule to remember and apply.

The "precedence" form of the curly-infix would add a new rule: "If a curly-infix list isn't simple, if the list meets the "math" ruleset requirements, use its mapping (see below). Otherwise, return the list with the "nfx" symbol added to the front (as is done now)."

The new "math" ruleset is as follows:

All even-numbered parameters must be operator symbols, there must be an odd number of parameters, and there must be at least five parameters (it must be strictly infix).
The operators supported, in precedence order (high to low), are:
- Subscript/down-arrow: sub {Unicode: ↓, ⇓}
- Exponentiation/Superscript/up-arrow: **, ^, sup {Unicode: ↑, ⇑} (right-associative)
- Multiplication/division: *, /, div..., mod..., quo... {Unicode: ÷, ×}
- Addition/subtraction: +, -
- Bitwise and: bit...and, log...and, &
- Bitwise or/xor: bit...or, log...or, |
- Comparison:
  - Ranged: \< , \<= , >=, > {Unicode: ≥, ≤, ∋, ∉, ⊄, ⊂, ⊆, ⊃, ⊇)
  - Unranged: =, ==, !=, \<>, =/=, eq..., in, is {Unicode: ≠, ≈, ≅}
- Logical conjunction: and {Unicode: ∩, ∧}
- Logical/exclusive disjunction: or, xor, eor {Unicode: ∪, ∨, ⊕}
- Implication/Right Arrows/Doubled arrows: ->, =>, <->, <=>, -->, ==>, <-->, <==> {Unicode: ↔, ⇔, →, ⇒}
- Definition/Assignment/Left arrow: <-, <--, <==, :=, ::= {Unicode: ≡ ←, ⇐} (right-associative)
Other operators are re-compared by removing from its symbol all "-" surrounded by alphanumerics, all alphanumerics, and then all other characters not in an operator above (so "char<=?" is accepted as "<="). If there's no match, it does not meet the "math" ruleset.
When the same operator is repeated, the operator simply gains another operand, so {a + b + c * d} is the "+" operator applied to three operands: a, b, and (* c d).
If different ranged comparison operators are used in sequence, they are joined by "and" as follows: {left compare1 middle compare2 right...} maps to (and (compare1 left middle) (compare2 middle right) ...).
Other different operators of the same precedence are interpreted in left-to-right order unless noted otherwise. Thus, {a + b - c} maps to {{a + b} - c}.
The "Unicode" list applies only to systems that support Unicode characters beyond ASCII.

Examples:

{a + b * c} maps to (+ a (* b c)).
{a <= b < c} maps to (and (<= a b) (< b c))
{q -> r and s} maps to (-> q (and r s))
{x bitand flags = overheated} maps to (= (bitand x flags) overheated)

Rationale:

A key radical difference of this precedence approach, and past approaches, is that this does not try to have one notation support both actual infix operators "a OPERATOR b" and unary/prefix operators. A {...} doesn't switch to a different incompatible language, it just means that the immediate contents are truly infix operators. You can have unary operators and prefix symbols, but instead of using a special notation, you just use normal Lisp notation such as {(- a) * b} or neoteric-expressions such as {-(a) * b}.

This means precedence curly-infix works well with simple curly-infix notation. The result is that the symbol set you can use is maximally flexible - you can use any symbol as an infix operator, whenever you wish. Since Lisps process many languages, with many operators, this is valuable. The trade-off is a more rigid grammar, since only even parameters can be symbols. But this is not really rigid to the human reader, since Lisp s-expressions and neoteric-expressions do a fine job with function calls and unary operations.

Duplicating the capability of Lisp s-expressions or neoteric-expressions, which already support prefix notation well, would be absurd.

Many other Lisp-based systems look at all parameter locations for operators, but such systems are easily confused. E.G., should {- x - x !} be interpreted as (- x x !) or (! (- x x))? Traditional infix systems rely on having a fixed set of operators that are already declared, but it would be unwise for a Lisp to presume that. Thus, simple curly-infix allows any operator to be used as infix. That's a great advantage. But if we then add precedence, if we then did naive left-to-right, an expression that wasn't simple could easily be confused with the wrong expression. Suffix notation isn't handled, but suffix operators are rare, and people generally accept using prefix notation for them. E.G., people accept using factorial(..) for factorials.

Here are many other comments:

I originally started with a short list, but since everything is fixed, it's hard to stop, and if you're going to do it at all, you may as well do it "right".
You can use other infix operators, they just don't have a defined precedence. So for the rest, you have to explicitly group them using {...}, making them simple curly-infix expressions.
These operators were chosen because they are widely used in math (not just Lisps), are common among all Lisps where they exist at all, and their precedence can be relatively easily agreed on regardless of the underlying semantic. In particular, being able to mix addition, subtraction, multiplication, and division is useful.
The distinct precedences of "and" and "or", and their relative precedence, are per C and its many progeny.
Exclusive-or is pretty common among Lisps; we may as well allow it, and make its precedence the same as the related "or".
Being able to string together ranged comparisons like {low < x <= high} is useful, and has a demonstrably positive impact on languages like Python that support them.
I earlier specified left-to-right for everything, to simplify the rules for both humans (so they have fewer rules to learn) and the implementation. But once you go whole-hog, you may as well use the full math conventions, and right-to-left is conventional for exponentiation and assignment/definition.
The unsupported-operator rule provides flexibility without a configurable table. It originally just removed leading characters repeatedly. But this caused the wrong thing to happen in many cases, e.g., "band". This version is relatively simple and it does handle "char>=?", "int", "integer", "long-integer>", and so on nicely. It's not perfect, e.g., it does not handle symbols with "-" just before the operator such as "long-integer->". It does do weird things with symbols like <head> (which becomes \<>), but if you use that as an infix operator, you shouldn't be surprised that it's odd! You could define aliases for operators that don't work well.
Lisps are often used for proofs, so adding implication operators like => and -> is defensible. They also prevent misunderstandings with operators like >=.
The "..." means 0 or more characters match, and allow for many other operators to be used. For example, the "eq..." rule supports the varying equality symbols of Scheme (eq?, eqv?, equal?), Common Lisp, and many others.
The special form for ranged comparisons allows {a <= x < b}. It does complicate the mapping, and it could be argued that it's not worth it. It's limited to ranged comparisons, because assignments and equalities sometimes overlap their symbols, and misinterpreting them would be a real problem. The special mapping rule isn't needed for unranged equalities, since mapping to (= a b c) is expected to do the right thing (and if not, it's probably better to make it explicit). That does mean that {a <= b = c} won't work, but that would be quite rare in practice, and easily handled with {{a <= b} and {b = c}}.
Perhaps ranged and unranged should have different precedences, since their mapping is different.
Most programming languages treat division with the same precedence as multiplication, so I've done that. This is not universal; as pointed out in http://en.wikipedia.org/wiki/Order_of_operations , some treat multiplication with a higher precedence than division (Physical Review journals, Course of Theoretical Physics by Landau and Lifshitz, the Feynman Lectures on Physics, Worlfram Alpha). But treating them with the same precedence is easier to remember and more common.
The "definition/assignment" operators include the character ":", which can be a problem in some Lisps and might not usable. In general, assignment is a problem; different languages use the same symbols for assignment, equality, and implication. In practice, most Lisps emphasize using prefix notation for assignment (like "define", "set!", and "setq"), so we'll presume that ambiguous symbols have other meanings.
The ⊃ is widely used for implication, and for sets. Here mathematicians can't agree. This version chooses the set meaning.
There's no precedence listed here for for-all (forall, ∀) and there-exists (exists, ∃). Actually, they aren't even infix operators (they're prefix form), so they don't really belong in a strictly-infix processor. Their syntax varies among math texts, also, so mathematicians would be unsurprised to find special syntactic rules for them. I believe it'd be better to view them syntactically as prefix operators, just like functions. This means this would be a legitimate sweet-expression: "∀ x {x in integer and x > 2} {f(x) > g(x)}".
There is no "?", so Scheme operators with "?" in them are re-checked correctly without the "?".
The bitwise "and" and "or" are critically important in some operations. You can just surround them with {...} and they work just fine as simple curly-infix expressions, but some tasks involve many bit operations and comparisons, making precedent useful. Sadly, there's no cross-Lisp standard symbol for "bitwise and" and "bitwise or", but bit... is used in Scheme and some others. Common Lisp, ACL2, and emacs lisp use log..., though that is arguably wrong; these are bitwise operations, not "logical" operations on simple true and false. Still, due to common use, the log... versions are included. It'd be nice to have a punctuation-only symbol for them, but & and | are often notational problems in a Lisp. You want their precedence to be tighter than comparison, so that {x bitwise-and flags = z} would do the right thing; that would be different than C, but C programs that use bitwise operators with comparisons usually have to override the precedence anyway, as their precedence is admittedly a mistake. See "The Development of the C Language" by Dennis M. Ritchie.
The "sub" is debatable. But many math notations have some way of indicating subscripts, and math systems for describing math all include a way to do them. The "sub" seems to be especially common. "sup" was added to be consistent. Someone somewhere might use "sub" for "subtract", but normally people use "-" for that.
There are many other infix operators that are used in the world. But that doesn't mean they can't be used; it just means they have "no precedence" and have to be explicitly grouped. Requiring that the rest be specially grouped causes no real hardship.

If you want another operator beyond those supported in math, then use another {...} or define nfx. I think it's important to not add operators specific to one particular Lisp implementation, or there will be no end to tweaking it. Symbol recomparison handles many cases; symbol aliases can make them easier to take advantage of. Simple infix lists cover a lot of ground, anyway.

A math reader doesn't lose generality, as long as we stick primarily to math. It does harm homoiconicity, since it inserts new lists.

The good news is that, if math (or something like it) were added later, all existing code written using simple curly-infix expressions will continue to work as-is. We expect nearly all curly-infix expressions, when there is no precedence, to be simple, and when it's not, most "nfx" processors would generate the same order in actual use. As a result, this new "precedence" form of curly-infix is something that could be added later, as it is unlikely to impact code that used the simpler rule. Right now, the real problem is getting any infix implemented at all, so pushing to get the simpler semantic accepted seems more likely to succeed.

Of course, all of this could be partly implemented in some "nfx", though that would mean that the lists as-read would not directly receive this structure.

So if there is pushback on having no precedence support at all, this is an alternative. However, even a simple precedence mechanism is far more complicated than the simple curly-infix approach we've proposed, and then you have the argument about simple versus complete (and what that means).

Thus, I propose to continue with the much simpler rule we have, and then switch to a precedence system if we get too much pushback from the simpler rule. Or alternatively, make it clear that perhaps someday there could be a more sophisticated curly-infix system, but that it's more important to get basic functionality today.

Comments welcome.

Comments by almkglor

My first reaction was "\<expletive deleted>".

My second reaction was "\<expletive deleted>".

My third reaction was "This \<expletive deleted> is too complex."

My final reaction is "NO".

My final reaction is "NO"

Comments by dwheeler

Oh, I'm not recommending this precedence system. Indeed, this is a great example of why precedence systems haven't been widely accepted in Lisps; they get ugly as soon as you look into the details:

If you have an operator registration systems, where you register the precedence of operations, then you end up with bugs if the definition isn't read first. Even worse, you have trouble combining code when the operators have different precedence.
But if you don't have a registration system, that means you have to have pre-set fixed rules for operators. Which at least avoids those problems, and is consistent with what most other languages do (most languages have fixed precedence systems). At that point, it's hard to argue that it should be a short list, and you end up with complex systems like this.

I think the important point from this page is that this is proof that accepting curly-infix now does not prevent support for precedence in the future. It also explains why we did NOT propose it :-).