Readable Lisp S-expressions Wiki

Readable Lisp/S-expressions with infix, functions, and indentation

Brought to you by: dwheeler

Solution

As discussed in the [Problem] page, many developers find standard Lisp notation hard to read. Early Lisp was even harder to read, so abbreviations were added; one example is being able to abbreviate (quote x) as 'x. Our approach is to add additional abbreviations for common cases, so that expressions in any Lisp-based language will be even easier to read. Well-formatted S-expressions should work as-is, but if you use our abbreviations, the result should be easier for humans to understand.

Unlike most past efforts to make Lisp more readable, our approach is generic (the notation does not depend on an underlying semantic) and homoiconic (the underlying data structure is clear from the syntax). We believe these are necessary conditions for a readable Lisp notation. Previous efforts, like McCarthy's M-expressions, failed because they lacked these properties.

Readable abbreviation specification

We have three notation tiers, each of which builds on the previous one. Curly-infix starts with traditional s-expressions and adds support for traditional infix notation; neoteric-expressions add support for traditional function notation; and sweet-expressions reduce the number of explicit parentheses needed (by deducing them from indentation).

You do not need to use all three tiers. In particular, if you don't want to use an indentation-sensitive language, you don't need to.

Here's the complete specification of these abbreviations (where "⇒" means "maps to"):

Curly-infix-expressions (c-expressions): A c-expression is an s-expression or an expression surrounded by curly braces {...} called an infix list.
- A simple infix list {a op b op c op ...} represents one operation in infix order, that is, (op a b c ...). It has (1) an odd number of parameters, (2) at least 3 parameters, and (3) all even parameters are the same symbol (aka eq? or eq). It maps to "(even-parameter odd-parameters)". E.g., {n <= 2} ⇒ (<= n 2), and {7 + 8 + 9} ⇒ (+ 7 8 9).
- The empty infix list {} maps to (), the escaping infix list {e} maps to e, and the unary-op infix list {e1 e2} maps to (e1 e2). Thus, {$} ⇒ $, and {- x} ⇒ (- x).
- Infix lists with "." as the first contained symbol have an unspecified mapping.
- Other infix lists are mixed and map to "($nfx$ parameters)". E.g., {2 + 3 * 4} ⇒ ($nfx$ 2 + 3 * 4)
- By intent, there is no precedence; just use another list. E.g., {2 + {3 * 4}} ⇒ (+ 2 (* 3 4))
- There are two variations of curly-infix notation, "basic" and "full". In "basic" c-expressions, each element in a regular or infix list is another basic c-expression. In "full" c-expressions, each element is a neoteric-expression, as defined below; that means inside {} you can use f(x) to represent (f x).
- Full c-expressions are defined for Scheme in SRFI 105.
Neoteric-expressions (n-expressions): This (normally) includes curly-infix-expressions, and adds special meanings to some prefixed expressions.
- An e(...) maps to (e ...). E.g., f(1 2) ⇒ (f 1 2), exit() ⇒ (exit), and read(. port) ⇒ (read . port).
- An e{} maps to (e), otherwise, e{...} maps to (e {...}). E.g., f{n - 1} ⇒ (f {n - 1}) ⇒ (f (- n 1)), and g{- x} ⇒ (g (- x)).
- An e[...] maps to ($bracket-apply$ e ...)
- In the above, "e" is any datum expression. There must be no whitespace between e and the open paired character.
- An unprefixed "( . e)" must evaluate as "e".
- These recurse left-to-right. E.g., f{n - 1}(x) ⇒ f({n - 1})(x) ⇒ (f (- n 1))(x) ⇒ ((f (- n 1)) x)
- A system that implements neoteric expressions without curly-infix-expressions is a "simple neoteric-expression reader".
Sweet-expressions (t-expressions): Includes neoteric-expressions, and deduce parentheses from indentation. Here are the basics:
- A line with content consists of one or more n-expressions, separated by one or more spaces or tabs.
- If a line is indented more than the previous line, that line is a child line, and the previous line is a parent to that child.
- Later lines with the same indentation as the child are also children of that parent, until there is an intervening line with the parent’s indentation or less.
- A line with only one n-expression, and no child lines, represents itself. Otherwise, the line represents a list; each n-expression on the line is an element of the list, and each of its child lines represents an element of the list (in order).
- An empty line ends the expression; empty lines before expressions are ignored.
- Indentation processing does not occur inside ( ), [ ], and { }, whether they are prefixed or not; they're just neoteric-expressions. This makes sweet-expressions highly backwards-compatible.
Sweet-expression rule clarifications:
- Lines with only a ;-comment are completely ignored - even their indentation (if any) is irrelevant.
- You can indent using one-or-more space, tab, and/or exclamation point (!) characters.
- A line with only spaces and tabs is an empty line. A line with at least one "!" is ignored if it only has indent characters.
- An expression that starts indented enables "indented-compatibility" mode, where indentation is completely ignored (that line switches to neoteric-expressions).
Sweet-expressions also include these advanced capabilities:
- A \\ (aka SPLIT) starts a new line at the current indentation. If it's immediately after indentation (aka GROUP in that case), it represents no symbol at all (at that indentation) - this is useful for lists of lists.
- A $ (aka SUBLIST) in the middle of list restarts list processing; the right-hand-side (including its sub-blocks) is the last parameter of the left-hand side (of just that line). If there's no left-hand-side, the right-hand-side is put in a list. This useful abbreviation from Haskell simplifies many common cases.
- A leading traditional abbreviation (quote, comma, backquote, or comma-at) or datum comment "#;", at the beginning of a sweet-expression line, and followed by space or tab or end-of-line, is that operator applied to the following sweet-expression. Otherwise, it applies to the next neoteric-expression.
- The markers “<*” and “*>” surround a collecting list, and MUST accept a list of 0 or more un-indented sweet-expressions. The “<*” and “*>” represent opening and closing parentheses, but restart indentation processing at the beginning of the line. A collecting list is only terminated by its matching “*>” and not by a blank line. These are valuable for defining libraries and for short let-style constructs.
- The marker “$$$” is reserved for future use.
Sweet-expression examples are shown below. For Scheme, sweet-expressions are defined in SRFI 110.

See [Rationale] for why these rules are the way they are, and [Retort] if you were told that Lisp's s-expression notation can't be improved on.

Beginning an expression with indentation causes that line's indentation to be ignored, improving backwards compatibility. We recommend that editors highlight these lines as warnings, to reduce the risk of their accidental use. It might be also useful for an editor to highlight blank lines (as they separate expressions) and lines beginning at the left column.

Our goal is to devise general abbreviations that others can build on if they choose. Individual implementations may have additional abbreviations that are useful for their semantics, or extensions to the rules listed above. For example, the Scheme and Common Lisp sample implementations of sweet-expressions also implement line continuation (use "\\" at the end of a line, after at least one expression, and the next line will be a continuation if it has at least the same indentation level).

This is version 1.0 of our notation specification.

Quick Examples

Here are some examples of the result, as well as what they map to (which is what you would have had to type before):

Sweet-expression	(Awkward) S-expression
define fibfast(n) if {n < 2} n fibup(n 2 1 0)	(define (fibfast n) (if (< n 2) n (fibup n 2 1 0)))
define fibup(max count n1 n2) if {max = count} {n1 + n2} fibup max {count + 1} {n1 + n2} n1	(define (fibup max count n1 n2) (if (= max count) (+ n1 n2) (fibup max (+ count 1) (+ n1 n2) n1)))
define factorial(n) if {n <= 1} 1 {n * factorial{n - 1}}	(define (factorial n) (if (<= n 1) 1 (* n (factorial (- n 1)))))

Sweet-expression

(Awkward) S-expression

define fibfast(n)
  if {n < 2}
    n
    fibup(n 2 1 0)

(define (fibfast n)
  (if (< n 2)
    n
    (fibup n 2 1 0)))

define fibup(max count n1 n2)
  if {max = count}
    {n1 + n2}
    fibup max {count + 1} {n1 + n2} n1

(define (fibup max count n1 n2)
 (if (= max count)
  (+ n1 n2)
  (fibup max (+ count 1) (+ n1 n2) n1)))

define factorial(n)
  if {n <= 1}
    1
    {n * factorial{n - 1}}

(define (factorial n)
  (if (<= n 1)
    1
    (* n (factorial (- n 1)))))

Notice that infix operations and function calls are much easier to read, and notice how much easier it is to read when there are fewer parentheses. Now you don't need to use indentation tools to keep the code indented correctly; the indentation is part of the code itself.

More Examples

See the [Examples] page for many more examples, including with Lisp macros (they work just fine).

The code distribution comes with "sweeten.sscm"; this is a program that translates traditional S-expressions into sweet-expressions, and is itself written using sweet-expressions.

"Letterfall" by Alan Manuel Gloria is a game written using sweet-expressions; you can see it here: https://github.com/AmkG/letterfall

Advantages

Here are some advantages of these notations:

Generic: The notation does not depend on an underlying semantic. If you create a new meta-language, you can keep using these notations.
Homoiconic: The underlying data structure is clear from the syntax. This is vitally important for debugging.
Backwards-compatible: Normally-formatted Lisp expressions keep working unchanged.
Easy to read: These notations add syntax that is easier to read by most developers.
Rigorous yet easily implemented specification: The entire stack of notations is LL(1), as verified using ANTLR. It's been implemented three times (Java/ANTLR, Scheme, and Common Lisp), showing that the notation is quite implementable and not tied to just one system.
Usable: Real programs have been implemented using these notations.

Credits

David A. Wheeler was the initiator of the "readable" project. He created the initial three tiers, and developed first drafts of the specifications and code.
Alan Manuel K. Gloria made a number of specification suggestions (e.g., adding the SUBLIST "$") and wrote significant pieces of the implementation.
Egil Möller developed SRFI-49, an indentation semantic for Scheme and sample implementation. The sweet-expression indentation semantics and implementation are based on this.

Also, thanks to the many other mailing list participants who provided feedback.

Trying them out

Read the [Install-howto] to learn how to install the software. [Prepackaged] lists some cases where the software is prepackaged for your use.

Then look one of our two tutorials:

[Scheme-tutorial] - Scheme-focused tutorial
[Common-lisp-tutorial] - Common-Lisp-focused tutorial

We think you'll like these notations once you try them.

Wiki: Common-lisp-tutorial
Wiki: Examples
Wiki: Goals
Wiki: Home
Wiki: Hubris
Wiki: Install-howto
Wiki: Join
Wiki: Modifications-0.3
Wiki: Names
Wiki: Prepackaged
Wiki: Problem
Wiki: Rationale
Wiki: Retort
Wiki: Scheme-tutorial