Sweet-expressions (t-expressions)


David A. Wheeler

Alan Manuel K. Gloria


This is a draft Scheme Request for Implementation (SRFI) for SRFI ???. To see an explanation of each status that a SRFI can hold, see here.

To provide input on this SRFI, please mail to <srfi minus ??? at srfi dot schemers dot org>. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list. This SRFI contains all the required sections, including an abstract, rationale, specification, design rationale, and reference implementation.

Related SRFIs

SRFI-49 (Indentation-sensitive syntax), SRFI-105 (Curly-infix-expressions)


Abstract

Many software developers find Lisp s-expression notation inconvenient and unpleasant to read, in part because of the many parentheses required in typical use. Even those who like Lisp s-expression syntax typically use indentation to show structure, and not just parentheses, because structure is not obvious in raw un-indented s-expressions. This SRFI defines an indentation-sensitive syntax, inspired by Python; indentation-sensitive syntax eliminates the need for many parentheses and the need to synchronize indentation with parentheses. The previous SRFI-49 defines an indentation-sensitive syntax, but in practice it is somewhat awkward, and by itself it lacks support for infix notation (e.g., {a + b}) and prefix formats (e.g., f(x)).

This SRFI defines sweet-expressions (aka t-expressions), an indentation-sensitive syntax developed from experience in implementing real Scheme programs. It is derived from SRFI-49 and includes (by reference) SRFI-105. Unlike most past efforts to make Lisp more “readable”, the sweet-expression notation is generic (the notation does not depend on an underlying semantic), homoiconic (the underlying data structure is clear from the syntax), and backwards-compatible with well-formatted s-expressions.

For example, the following sweet-expression:

define factorial(n)
  if {n <= 1}
    {n * factorial{n - 1}}

maps to the following s-expression:

(define (factorial n)
  (if (<= n 1)
    (* n (factorial (- n 1)))))

A sweet-expression reader would accept either form. Note that the sweet-expression notation was developed by the “Readable Lisp S-expressions Project”.


Rationale

The large number of parentheses required by Lisp syntax is the butt of many jokes in the software development community and has been widely criticized. The Jargon File says that Lisp is “mythically from ‘Lots of Irritating Superfluous Parentheses’”. Linus Torvalds commented about some parentheses-rich C code, “don’t ask me about the extraneous parenthesis. I bet some LISP programmer felt alone and decided to make it a bit more homey.” Larry Wall, the creator of Perl, says that, “Lisp has all the visual appeal of oatmeal with fingernail clippings mixed in. (Other than that, it’s quite a nice language.)”. Shriram Krishnamurthi says, “Racket [(a Scheme implementation)] has an excellent language design, a great implementation, a superb programming environment, and terrific tools. Mainstream adoption will, however, always be curtailed by the syntax. Racket could benefit from [reducing] the layers of parenthetical adipose that [needlessly] engird it.”

Since Lisp programs and data are often written using indentation anyway, it seems reasonable to create an indentation-sensitive syntax that would eliminate the need for many of these “superfluous” parentheses, as well as eliminating the need to keep the indentation and parentheses synchronized.

This is not a new observation. Lisp advocate Paul Graham says, regarding Lisp syntax, “A more serious problem [in Lisp] is the diffuseness of prefix notation... We can get rid of (or make optional) a lot of parentheses by making indentation significant. That’s how programmers read code anyway: when indentation says one thing and delimiters say another, we go by the indentation. Treating indentation as significant would eliminate this common source of bugs as well as making programs shorter. Sometimes infix syntax is easier to read. This is especially true for math expressions. I’ve used Lisp my whole programming life and I still don’t find prefix math expressions natural... I don’t think we should be religiously opposed to introducing syntax into Lisp, as long as it translates in a well-understood way into underlying s-expressions. There is already a good deal of syntax in Lisp. It’s not necessarily bad to introduce more, as long as no one is forced to use it.”

The indentation-sensitive syntax defined in this SRFI is generic (the notation does not depend on an underlying semantic) and homoiconic (the underlying data structure is clear from the syntax). We believe previous efforts to improve the readability of Lisp s-expressions, such as McCarthy’s M-expressions, failed because they were not generic or homoiconic. It is often difficult to access new capabilities (such as those defined by macros) in notations that are not generic or homoiconic. In contrast, because sweet-expressions are generic and homoiconic, they can be easily used with other constructs such as quasiquoting and macros. In short, if a capability can be accessed using s-expressions, then it can be accessed without any additional effort using sweet-expressions. The indentation processing is simply an abbreviation, in much the same way that 'x is an abbreviation for (quote x).

SRFI-49 defines an indentation-sensitive syntax. Unfortunately, it has not been widely deployed among Scheme implementations. We believe this is, at least in part, because of various problems and limitations in it, described in more detail below. Nevertheless, SRFI-49 represented an important first step in devising an indentation-sensitive syntax; sweet-expressions are derived from SRFI-49 and retain most of its syntax.

The notation defined in this sweet-expressions SRFI is intentionally designed so that, unlike Python, it works exactly the same way in both interactive mode (e.g., a REPL) and when processing a file. A difference in modes could cause failure when cutting-and-pasting between a file and an interactive session. Since many users often switch between the REPL and files, such a difference was considered unacceptable.

This notation is simple and straightforward. It basically defines a few additional abbreviations for s-expressions based on real-world experience.

See the design rationale for a detailed discussion on how and why it is designed this way.


Specification

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

“Sweet-expressions” (aka “t-expressions”) deduce parentheses from indentation. A sweet-expression reader MUST interpret its input as follows when indentation processing is active:

  1. An indented line is a parameter of its parent.
  2. Later terms on a line are parameters of the first term.
  3. A line with exactly one term, and no child lines, is simply that term; multiple terms are wrapped into a list.
  4. An empty line ends the expression; empty lines before expressions are ignored.
  5. Terms are neoteric-expressions as defined in SRFI-105. Thus {a + b} maps to (+ a b), f(...) maps to (f ...), and f{...} with non-empty content maps to (f {...}).
  6. When reading begins, indentation processing is active, but indentation processing is disabled inside ( ), [ ], and { }, whether they are prefixed or not (inside they’re a sequence of whitespace-separated neoteric-expressions).
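As a rough illustration of rules 1 through 3, the following Python sketch maps the lines of a single, correctly indented expression to nested lists. It is not the reference implementation: it assumes terms are plain whitespace-separated atoms (no neoteric-expressions, strings, or comments), that only spaces are used for indentation, and that the caller has already split the input at empty lines (rule 4). The names read_block and _read_expr are hypothetical.

```python
def read_block(lines):
    """Map the lines of ONE sweet-expression to nested Python lists.

    Assumptions (not from the SRFI): terms are whitespace-separated atoms,
    indentation uses only spaces, and the input is correctly indented.
    """
    # Pair each non-blank line with its indentation width and its terms.
    rows = [(len(l) - len(l.lstrip(" ")), l.split()) for l in lines if l.strip()]
    expr, _ = _read_expr(rows, 0)
    return expr


def _read_expr(rows, i):
    """Read the expression starting at rows[i]; return (value, next index)."""
    indent, terms = rows[i]
    i += 1
    children = []
    # Rule 1: each following line that is indented further is a parameter.
    while i < len(rows) and rows[i][0] > indent:
        child, i = _read_expr(rows, i)
        children.append(child)
    # Rule 3: exactly one term and no child lines is simply that term.
    if len(terms) == 1 and not children:
        return terms[0], i
    # Rule 2: later terms (and child lines) are parameters of the first term.
    return terms + children, i
```

For example, read_block(["define f", "  if x", "    y", "    g z"]) yields the nested list corresponding to (define f (if x y (g z))).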

Sweet-expression rule clarifications:

  1. You can indent using one or more of the indent characters, which are space, tab, and exclamation point (!).
  2. An unescaped “;” not in a string (still) introduces comments that end at the end of the line or file.
  3. Lines with only a ;-comment (preceded by 0 or more indent characters) are completely ignored - even their indentation (if any) is irrelevant.
  4. A line with only indentation is an empty line.
  5. An expression that starts indented enables “indented-compatibility” mode, where indentation is completely ignored. Instead, a sequence of white-space separated neoteric-expressions is read until the first end of line or end of file.
  6. Scheme’s #; datum comment comments out the next neoteric expression, not the next sweet expression.
  7. Block comments (#|...|#) are removed, but if one begins immediately after the indent (if any), the indentation at the beginning of the block comment is used.

The sweet-expression advanced features are defined as follows:

  1. The marker \\ is specially interpreted. If any terms precede it on the line, it is called SPLIT, and it MUST be interpreted as if it started a new line at the current indentation. If no terms precede \\ on the line, it is called GROUP, and it represents no symbol at all at that indentation (GROUP is useful for lists of lists).
  2. The marker $ (aka SUBLIST) MUST restart list processing. If $ is preceded by any terms on the line, the right-hand-side (including its sub-blocks) is the last parameter of the left-hand side (of just that line). If there’s no left-hand-side, the right-hand-side is put in a list.
  3. A leading traditional abbreviation (quote, comma, backquote, or comma-at), followed by space or tab, MUST be interpreted as that operator applied to the entire sweet-expression that follows.
  4. The markers “<*”, “*>”, and “$$$” are reserved for future use.

??? TODO: Should we also spec the markers <* and *> ?

The markers for the advanced sweet-expression features MUST only be accepted as such when indentation processing is active, and a character sequence MUST NOT be considered one of those markers if it does not begin with exactly the marker’s first character. For example, {$} MUST NOT be interpreted as the SUBLIST marker; instead, it MUST be interpreted as the symbol $.
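The SUBLIST rule can be sketched for the simplest case, a single line with no child lines. The Python below is an illustration under stated assumptions, not the reference implementation: terms are already tokenized into a list of strings, the literal string "$" stands for the marker, and the function name sublist is hypothetical.

```python
def sublist(terms):
    """Apply the SUBLIST marker `$` to the terms of one line (no child lines)."""
    if "$" not in terms:
        # Basic rule 3: a single term stands for itself, several form a list.
        return terms[0] if len(terms) == 1 else list(terms)
    k = terms.index("$")
    left, right = terms[:k], terms[k + 1:]
    rhs = sublist(right)      # `$` restarts list processing on the right
    if left:
        return left + [rhs]   # right-hand side is the last parameter
    return [rhs]              # no left-hand side: put it in a list
```

Under these assumptions, sublist(["set!", "back-pen", "$", "black", "style"]) produces the list form of (set! back-pen (black style)), and a line consisting of "$ style $ get-style win" produces the list form of ((style (get-style win))).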

A sweet-expression reader is a datum reader that can correctly read and map sweet-expressions as defined above (including the advanced sweet-expression features). An implementation of this SRFI MUST accept the marker #!sweet followed by a whitespace character in its standard datum readers (e.g., read and, if applicable, the default implementation REPL). This marker (including the trailing whitespace character) MUST be consumed and considered whitespace. After reading this marker, the reader MUST accept sweet-expressions in subsequent datums read from the same port, overriding any conflicting marker (such as #!curly-infix followed by whitespace) until some other conflicting marker is given.

Implementations of this SRFI MAY implement sweet-expressions in their datum readers by default, even when the marker is not (yet) received. Portable applications SHOULD include the #!sweet marker before using sweet-expressions, typically near the top of a file. Portable applications SHOULD NOT use this marker as the very first characters of a file (e.g., it could be preceded by a newline), because they might be misinterpreted on some platforms as an executable script header.

Implementations MAY provide the procedures sweet-read as a sweet-expression reader and/or neoteric-read as a neoteric-expression reader. If provided, these procedures SHOULD support an optional port parameter.

Implementations SHOULD enable a sweet-expression reader when reading a file whose name ends in “.sscm” (Sweet Scheme). Application authors SHOULD use the filename extension “.sscm” when writing portable Scheme programs using sweet-expressions.

A program editor MAY usefully highlight blank lines (as they separate expressions) and lines beginning at the left column (as these start new expressions). We RECOMMEND that program editors highlight expressions whose first line is indented, to reduce the risk of their accidental use.

A tool that reads sweet-expressions and writes out s-expressions SHOULD specially treat lines that begin with a semicolon when it is not currently reading an expression (e.g., no expression has been read, or the last expression read has been completed with a blank line). Such a tool SHOULD (when outside an expression) copy exactly any line beginning with a semicolon followed by a whitespace character, a semicolon, or end-of-file. Such a tool SHOULD (when outside an expression) also copy lines beginning with “;#” or “;!” without the leading semicolon, and copy lines beginning with “;_” without either of those first two characters. Application authors SHOULD follow a semicolon in the first column with a whitespace character or semicolon if they mean for it to be a comment.
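The line-copying behavior recommended for such a tool might look like the following Python sketch. It is only an illustration: the function name translate_comment_line is hypothetical, the line is assumed to be passed without its end-of-line characters, and the handling of a semicolon followed by any other character is not stated in the text, so the sketch simply copies such lines unchanged.

```python
def translate_comment_line(line):
    """Translate one line that begins with ';' while outside any expression."""
    if not line.startswith(";"):
        raise ValueError("expected a line beginning with ';'")
    rest = line[1:]
    if rest == "" or rest[0].isspace() or rest[0] == ";":
        return line        # ordinary comment: copy exactly
    if rest[0] in "#!":
        return rest        # ";#..." or ";!...": drop the leading semicolon
    if rest[0] == "_":
        return rest[1:]    # ";_...": drop both leading characters
    return line            # assumption: anything else is copied unchanged
```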

Note that, by definition, this SRFI modifies lexical syntax.

Backus-Naur Form (BNF)

The following BNF rules define sweet-expressions. They are intended to capture the specification above; in case of (unintentional) conflict, the specification text above governs. Semicolons introduce a comment which ends at the end of the line. Each rule is a production optionally followed by an expression.

Productions are defined in the form production (non-terminal) name, “::=”, and a sequence of terms that define the production. A term may be followed by * (0 or more times), + (1 or more times), or ? (0 or 1 times). The “|” symbol separates alternative branches; parentheses group. The construct (^ ...) matches any character except the character(s) listed (this construct does not match EOF). Productions may continue on a following line by indenting them beyond the end of their matching “::=”. A preventative production is defined using “!::=” instead of “::=”; a preventative production defines a term sequence that must not match a production of that name, even if it would have otherwise (see below). The notation avoids using < and >, since these characters are awkward to use properly in HTML. If more than one production matches, precedence is as follows:

  1. Longer productions and production branches (as measured by matching terms) take precedence over shorter ones. For example, given “demo-eol ::= CR LF | CR”, the “CR LF” will match if possible. The BNF below lists longer productions first to simplify making a recursive descent implementation have the same code order as the BNF.
  2. Terminals take precedence over non-terminals.
  3. If a non-terminal can be selected both directly and indirectly (because an alternative non-terminal begins with the same terminal), the direct reference to the non-terminal takes precedence. For example, given “demo-i ::= abb | demo-n” and “demo-n ::= abb | other”, when starting with demo-i an “abb” will match inside “demo-i” and not in “demo-n”.
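The first precedence rule (longer branches win) can be illustrated with a small Python sketch. It assumes each alternative is a fixed string of single-character terminals, so that length in characters equals length in matching terms; the function name longest_match is hypothetical.

```python
def longest_match(alternatives, text, pos=0):
    """Return the longest alternative that matches text at pos, or None."""
    # Try longer branches first, as the precedence rule requires.
    for alt in sorted(alternatives, key=len, reverse=True):
        if text.startswith(alt, pos):
            return alt
    return None
```

With the alternatives of demo-eol written as "\r\n" and "\r", the two-character branch is chosen whenever a CR is followed by an LF, regardless of the order in which the alternatives are listed.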

Terms may be non-terminals (these names are in lowercase) or terminals (uppercase). Unless otherwise noted, terminals are the name of one character in uppercase; e.g., SHARP means "#", BAR means "|", and PERIOD means “.”. The name EOF stands for end-of-file.

There are several special indentation-control terminals which act as guards, that is, they return true/false values. Note that these indentation-control terminals are not tokens in the traditional sense, because they are not consumed. Use of these terminals in a production implies that an implementation of the production must hold the starting indent when the production begins (so that they can be computed). These special indentation-control terminals are:

  1. INDENT: True if the current line’s indent string (current) is longer than the indent string in effect when this production started (started) and the two strings are equal up to length(started).
  2. SAME: True if current is equal to started.
  3. DEDENT: True if current is shorter than started and the two strings are equal up to length(current). Technically we don’t need to test for this, but testing for it can detect some BNF or implementation defects.
  4. BADDENT: True if current is not equal to started up to min(length(current), length(started)). In other words, the current indent is none of INDENT, SAME, or DEDENT. This terminal is used to detect indent errors.
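The four guards partition every pair of indent strings, which a short Python sketch can make concrete. This is an illustration only; the function name compare_indent and the choice to return the guard name as a string are assumptions, not part of the BNF.

```python
def compare_indent(current, started):
    """Classify the current indent string against the starting indent string."""
    if current == started:
        return "SAME"
    if len(current) > len(started) and current.startswith(started):
        return "INDENT"
    if len(current) < len(started) and started.startswith(current):
        return "DEDENT"
    # The strings disagree somewhere within their common length.
    return "BADDENT"
```

Note that the comparison is on the indent strings themselves, not on their widths: a tab following a line indented with a space is BADDENT, even though both strings have length one.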

The BNF productions below are intentionally written so that they can be easily implemented using a recursive descent parser that corresponds to the given rules. In particular, the rules are given so that it would be easy to implement a parser that does not consume characters unless necessary and does not require multi-character unread-char (this makes it easy to reuse an underlying read procedure). However, no particular implementation approach is required. Unlike the SRFI-49 BNF, this BNF makes comment and whitespace processing explicit, to make comment and whitespace processing requirements clear.

Each production may be followed by an expression that computes the value produced by the production. The expression begins on the first line indented two spaces, and ends on a blank line or a line with a character in the first column. In an expression, the symbols $1...$n are to be replaced by the 1st ... nth expression value returned by the corresponding production term. The symbol $last is the last value returned by a production term that matched (this is only used when there is more than one term). The values of some productions are never used; these do not have corresponding expressions.

First, here are some utility procedures for use in rules:

(define (map-abbreviation c)
  (case c
    ((#\')  'quote)                ; each case clause takes a list of datums
    ((#\`)  'quasiquote)
    ((#\,)  'unquote)              ; , not followed by @
    ((#\@)  'unquote-splicing)))   ; This represents ,@

Here are the actual BNF rules:

; ??? This is a VERY EARLY DRAFT of the BNF.  It's probably wrong,
; but the hope is that this is a start for a good BNF.
; This is different from previous approaches: INDENT, SAME, etc., are
; *guards* instead of *tokens* that are consumed.

; - Complete definitions for GROUP, SPLIT, #|...|#, etc.
; - Handle FORMFEED | VTAB | (IBM's) NEL
; - Handle EOF in weird places
; - Generate errors, e.g., illegal indents, initial "!"
; - Handle #! as discussed in SRFI-105 (curly-infix), including SRFI-22
; - Handle #!sweet, #!curly-infix, #!fold-case, #!no-fold-case, etc.
; - Define n-expr, etc. This will be done later.
;   Alan Manuel K. Gloria has done a lot of this, we'll bring that in later.
;   Check that work for improper lists, etc.

ichar   ::= SPACE | TAB | BANG ; indent char
hspace  ::= SPACE | TAB                     ; horizontal space
eolchar ::= CR | LF
eol     ::= CR | LF | CR LF | LF CR

not-eolchar ::= (^ eolchar) ; does not include EOF

whitespace  ::= hspace | eolchar
terminate-n-expr ::= whitespace | SEMICOLON | EOF

; "ichars" matches indentation characters.  This is fundamentally
; ambiguous with INDENT, SAME, DEDENT, and BADDENT, but the latter are
; special terminals so they have higher priority.
; If you read this in, these characters will become the "latest" line indent.
ichars ::= ichar*

; The "$last" here matches the last MATCH, so ,@ will return the @ char
; that will later be used to map ,@ correctly.
; A ,@ will be used where possible because longest rule matches.
abbrev ::= ' | ` | , | , @

; Block comments and commented datums (aka "special comments") are mostly
; ignored like other comments, but if they begin a line (after any indent)
; they define the indent.  Handle them carefully here.

; #| ... |# block comments.  These nest.  Stop on EOF instead of hanging.
block-comment ::= SHARP BAR
                  ( (^ BAR | SHARP) | block-comment |
                    SHARP+ ((^ BAR | SHARP) | EOF) |
                    BAR+   ((^ BAR | SHARP) | EOF) )*
                  (BAR+ SHARP | EOF)

commented-datum ::= SHARP SEMICOLON n-expr

special-comment  ::= commented-datum | block-comment
special-comments ::= special-comment hspace* special-comments?

; Line comment, not including the ending eol or EOF
lcomment ::= SEMICOLON not-eolchar*

; Note: This will attempt to read in an indent, resetting "latest" indent!
; This doesn't process special-comment, since we may need to process that
; specially to preserve its indent; head and i-expr handle special-comment.
comment-lines ::= ichars lcomment eol comment-lines?

; eol-comment-lines happen after some other constructs, so only hspace starts
; It may, but need not, include a line comment (;) or special comment.
; Note that this consumes special comments that don't start a line after indent
; Note: This may attempt to read in an indent, resetting "latest" indent!
; FIRST CHARACTER: hspace ; eol-chars EOF #
; The # is due to #|...|# or #;datum.
eol-comment-lines ::= hspace* special-comments? lcomment? eol? EOF
eol-comment-lines ::= hspace* special-comments? lcomment? eol comment-lines?

; The body handles the sequence of 1+ child lines.
; Note that i-expr will consume any line comments (line comments after
; content, as well as lines that just contain indents and comments).
; Note also that i-expr may set the latest indent to a different value
; than the indent used on entry to body; the latest indent is compared by
; the special terminals DEDENT, SAME, and BADDENT.

body ::= i-expr SAME body ; Another body at the same indent level
  (cons $1 $last)

body ::= i-expr BADDENT ; bad indent in line after a first body.
  (read-error $2 "Incorrect indentation")

body ::= i-expr DEDENT ; Done with body.
  (list $1)
  ; We're done!  We don't actually have to check DEDENT:
  ; - We couldn't match SAME, because that's matched above.
  ; - We couldn't match BADDENT, because that's matched above.
  ; - We couldn't match INDENT, because i-expr consumes those.
  ; But putting DEDENT here simplifies reasoning.  We recommend actually
  ; checking DEDENT in an implementation, because doing so could detect
  ; some implementation defects in the t-expression reader.

; The "head" is the production for 1+ n-expressions on one line.
; It never reads beyond the current line, so it doesn't need to keep track
; of indentation, and callers can depend on this.
head ::= n-expr hspace+ head
  (cons $1 $3)
head ::= n-expr hspace+ PERIOD hspace+ n-expr ; improper list
  (cons $1 $last)
  ; TODO???: Detect PERIOD hspace+ n-expr hspace+ n-expr
  ; (an improper list with junk after it)
head ::= n-expr hspace* ; Singleton.
  ; We handle hspace* here, because we have to read any hspace
  ; to see if there are more n-expressions on the line anyway.
  (list $1)

; A "head" is forbidden from including certain special markers.
; Preventative production definitions are not a normal feature of BNF.
; However, by specifying it this way, the BNF is much, much simpler;
; we can declare these here and then use markers in the i-expr production.
; You can implement this, if "head" is a procedure, by having the procedure
; return two values, an item representing the "special marker"
; ('none if none) and a list of any head items that were read until the
; special marker (this is the empty list if the special marker was first).
; Note that these only match to the LITERAL characters; {$} doesn't match!
head !::= DOLLAR  terminate-n-expr
head !::= BACKSLASH BACKSLASH terminate-n-expr
head !::= SHARP BAR ; #| ... |# at beginning of line (after indent) is special
head !::= SHARP SEMICOLON ; same here.
head !::= SHARP BANG

; "i-expr" (indented sweet-expressions)
; is the main production for sweet-expressions in the usual case.
; This can be implemented with one-character-lookahead by also
; passing in the "current" indent ("" to start), and having it return
; the "new current indent".  The same applies to body.
; If the line after a "head" has the same or smaller indentation,
; that will end this i-expr (because it won't match INDENT),
; returning to a higher-level production.

; Note that eol-comment-lines may read in an indent, which INDENT compares.
i-expr ::= head eol-comment-lines INDENT body ; child lines
  (append $1 $last)

; Error detection for incorrect indentation right after a head.
i-expr ::= head eol-comment-lines BADDENT ; bad indent after head.
  (read-error $3 "Incorrect indentation")

i-expr ::= head eol-comment-lines ; head with no child lines (SAME or DEDENT)
 (if (null? (cdr $1)) ; Check if singleton but handle improper lists
   (car $1) ; Single item, don't return a list
   $1)      ; Otherwise return the list as read

; The following overrides head processing of abbreviations
; (because it has higher precedence).  Therefore:
; ' a b c
; maps to '(a b c), and not to ('a b c).
i-expr ::= abbrev hspace+ i-expr
  (list (map-abbreviation $1) $last)
i-expr ::= abbrev hspace* lcomment? eol INDENT i-expr
  (list (map-abbreviation $1) $last)
i-expr ::= abbrev hspace* lcomment? EOF ; Weird case, do something plausible
  (list (map-abbreviation $1))
; NB: because of the hspace requirement, implementing these abbreviations
; means we must take over some of n-expr's work; if after finding QUOTE et al.
; we DON'T match the below (e.g., by finding an hspace),
; recurse into i-expr and add to the current head by
; applying the abbreviation on the item (if not a pair) or to the
; first list item (if a pair).
; Be sure we can repair: 'a  and  '(a b)  and  ''a  properly.

; An initial "\\", nothing on the line, and stuff indented: A list of lists
i-expr ::= BACKSLASH BACKSLASH eol-comment-lines INDENT body
i-expr ::= special-comments eol-comment-lines INDENT body

; An initial "\\" with nothing after it; a plausible separator.
i-expr ::= BACKSLASH BACKSLASH eol-comment-lines SAME i-expr
; An initial special comment with nothing after it; ignore it.
i-expr ::= special-comments eol-comment-lines SAME i-expr

i-expr ::= BACKSLASH BACKSLASH eol-comment-lines (BADDENT | DEDENT)
  (read-error $3 "Incorrect indentation for \\")
i-expr ::= special-comments eol-comment-lines (BADDENT | DEDENT) i-expr
  (read-error $3 "Confusing special comment indentation")
  ; ??? Too restrictive?  Is there a better approach?

; Initial "\\" or special comment in front of another head - ignore it.
i-expr ::= BACKSLASH BACKSLASH hspace+ i-expr
i-expr ::= special-comments i-expr

; Form "a b \\ ;stuff \n ..."
; TODO: This is ambiguous with the later rule.
i-expr ::= head BACKSLASH BACKSLASH hspace* eol-comment-lines
  ; ??? TODO: Is this correct?!?
  (if (null? (cdr $1))
    (car $1)
    $1)

; Something of form "a b \\ c d ..."
i-expr ::= head BACKSLASH BACKSLASH hspace+
  (if (null? (cdr $1))
    (car $1)
    $1)

; Productions for the top production, a sweet-expression (t-expr):

; The following is the "usual case", drilling down to indentation processing.
t-expr ::= i-expr

; On EOF we just return that.
t-expr ::= EOF

; Specially handle blank lines and comments, which may be indented, before
; any content of a t-expression.  These need to be skipped.
; A t-expression will END with a blank line,
; but we must CONSUME and IGNORE blank lines *before* t-expression begins:
t-expr ::= hspace* lcomment? eol t-expr ; Recurse and try again here.
t-expr ::= hspace* lcomment? EOF

; Implement "initial hspace at top-level DISABLES indentation processing".
; If multiple n-expressions are on a line, separated by hspace, then
; this production will fire again on the next invocation.
; Alan Manuel K. Gloria figured out this simple (and clever) BNF construction.
t-expr ::= hspace+ n-expr

; Detect error condition - Although "!" is an indent character, we aren't
; supposed to indent the first line of a t-expression unless we're disabling
; indentation processing on that line, and only SPACE or TAB can do that.
; So a "!" is an error.  Let's report this easily-detected problem.
t-expr ::= hspace* BANG
  (read-error $2 "Cannot initially indent with !")

; The top production is t-expr (sweet-expression).
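
The block-comment production is described as nesting and as stopping on EOF instead of hanging. A minimal Python sketch of that behavior follows. It tracks nesting depth with a counter rather than following the production term by term, it operates on a string instead of a port, and the function name skip_block_comment is hypothetical.

```python
def skip_block_comment(text, pos):
    """Return the index just past the #|...|# comment that starts at text[pos]."""
    if not text.startswith("#|", pos):
        raise ValueError("no block comment at this position")
    depth, i = 0, pos
    while i < len(text):
        if text.startswith("#|", i):
            depth += 1            # opening marker, possibly nested
            i += 2
        elif text.startswith("|#", i):
            depth -= 1            # closing marker
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return len(text)              # EOF inside the comment: stop, do not hang
```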


Examples

Here are some examples and their mappings. Note that a sweet-expression reader would accept either form in all cases, since a sweet-expression reader is for the most part a traditional s-expression reader with support for some additional abbreviations.

Sweet-expression (t-expression):

define fibfast(n)  ; Typical function notation
  if {n < 2}       ; Indentation, infix {...}
    n              ; Single expr = no new list
    fibup(n 2 1 0) ; Simple function calls

s-expression:

(define (fibfast n)
  (if (< n 2)
    n
    (fibup n 2 1 0)))

Sweet-expression (t-expression):

define fibup(max count n-1 n-2)
  if {max = count}
    {n-1 + n-2}
    fibup max {count + 1} {n-1 + n-2} n-1

s-expression:

(define (fibup max count n-1 n-2)
  (if (= max count)
    (+ n-1 n-2)
    (fibup max (+ count 1) (+ n-1 n-2) n-1)))

Sweet-expression (t-expression):

define factorial(n)
  if {n <= 1}
    {n * factorial{n - 1}}

s-expression:

(define (factorial n)
  (if (<= n 1)
    (* n (factorial (- n 1)))))

Sweet-expression (t-expression):

g -(cos(0)) factorial(7)

s-expression:

(g (- (cos 0)) (factorial 7))

Sweet-expression (t-expression):

aaa bbb
      ; Comment indent ignored
  cc dd

s-expression:

(aaa bbb
  (cc dd))

Sweet-expression (t-expression):

f ; Demo improper lists
  a . b

s-expression:

(f
  (a . b))

Sweet-expression (t-expression):

' a b ; Demo abbreviations
  'c d e \\ f g h

s-expression:

'(a b
    ('c d e) (f g h))

Sweet-expression (t-expression):

ff ; Comments
  #| qq |# t1 t2
  t3 t4
    t5 #| xyz |# t6
    t7 #;t8(q) t9

s-expression:

(ff
  (t1 t2)
  (t3 t4
    (t5 t6)
    (t7 t9)))

Sweet-expression (t-expression):

; This BEGINS with an indent
  f(a) g(x)

s-expression:

(f a)
(g x)

Sweet-expression (t-expression):

define init(win area)
  let*
    $ style $ get-style win
    set! back-pen $ black style
    set! fore-pen $ white style
    let
      \\
        config $ make-c area
        expose $ make-e area
      set! now expose
      dostuff config expose

s-expression:

(define (init win area)
  (let*
    ((style (get-style win)))
    (set! back-pen (black style))
    (set! fore-pen (white style))
    (let
      (
        (config (make-c area))
        (expose (make-e area)))
      (set! now expose)
      (dostuff config expose))))

Sweet-expression (t-expression):

define represent-as-infix?(x)
  and
    pair? x
    is-infix-operator? car(x)
    list? x
    {length(x) <= 6}

s-expression:

(define (represent-as-infix? x)
  (and
    (pair? x)
    (is-infix-operator? (car x))
    (list? x)
    (<= (length x) 6)))

Design Rationale

This SRFI design rationale is unusually long, especially when you compare it to the simplicity of its specification. We have separated the design rationale from the overall rationale, as was previously done by SRFI-26, because it is easier to understand the design rationale after reading the specification.

General and homoiconic formats

There have been a huge number of past efforts to create readable formats for Lisp-based languages, going all the way back to the original M-expression syntax that Lisp’s creator expected to be used when programming. Generally, they’ve been unsuccessful, or they end up creating a completely different language that lacks the advantages of Lisp-based languages.

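After examining a huge number of them, David A. Wheeler noticed a pattern: past “readable” Lisp notations typically failed to be general (the notation does not depend on an underlying semantic) or homoiconic (the underlying data structure is clear from the syntax).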

See http://www.dwheeler.com/readable/readable-s-expressions.html for a longer discussion on past efforts. In any case, now that this pattern has been identified, new notations can be devised that are general and homoiconic - avoiding the problems of past efforts.

Sweet-expressions were specifically designed to be general and homoiconic, and thus have the possibility of succeeding where past efforts have failed.

Is it impossible to improve on s-expression notation?

Some Lisp developers act as if Lisp notation descended from the gods, and thus is impossible to improve. The authors do not agree, and instead believe that Lisp notation can be improved beyond the notation created in the 1950s. The following is a summary of a retort to those who believe Lisp notation cannot be improved, based on the claims in the Common Lisp FAQ and “The Evolution of Lisp” by Guy Steele and Richard Gabriel. Below are quotes from those who argue against improvement of s-expression notation, and our replies.

The Common Lisp FAQ says that people “wonder why Lisp can’t use a more ‘normal’ syntax. It’s not because Lispers have never thought of the idea - indeed, Lisp was originally intended to have a syntax much like FORTRAN...”.

This is an argument for our position, not for theirs. In other words, even Lisp’s creator (John McCarthy) understood that directly using s-expressions for Lisp programs was undesirable. No one argues that John McCarthy did not understand Lisp. This is strong evidence that traditional s-expression notation has problems; even Lisp’s creator thought its notation was poor.

“The Evolution of Lisp” by Guy Steele and Richard Gabriel (HOPL2 edition) says that, “The idea of introducing Algol-like syntax into Lisp keeps popping up and has seldom failed to create enormous controversy between those who find the universal use of S-expressions a technical advantage (and don’t mind the admitted relative clumsiness of S-expressions for numerical expressions) and those who are certain that algebraic syntax is more concise, more convenient, or even more natural...”.

Note that even these authors, who are advocates for s-expression notation, admit that for numerical expressions they are clumsy. We should also note that sweet-expressions do not try to create an “Algol-like” syntax; sweet-expressions are entirely general and not tied to a particular semantic at all.

That paper continues, “We conjecture that Algol-style syntax has not really caught on in the Lisp community as a whole for two reasons. First, there are not enough special symbols to go around. When your domain of discourse is limited to numbers or characters, there are only so many operations of interest, and it is not difficult to assign one special character to each and be done with it. But Lisp has a much richer domain of discourse, and a Lisp programmer often approaches an application as yet another exercise in language design; the style typically involves designing new data structures and new functions to operate on them - perhaps dozens or hundreds - and it’s just too hard to invent that many distinct symbols (though the APL community certainly has tried). Ultimately one must always fall back on a general function-call notation; it’s just that Lisp programmers don’t wait until they fail.”

But this is a failing argument. Practically all languages allow compound symbols made from multiple characters, such as >=; there is no shortage of symbols. Also, nearly all programming languages have a function-call notation, but only Lisp-based languages choose s-expressions to notate it, so saying “we need function call notation” does not excuse s-expressions. You do not need legions of special syntactic constructs; sweet-expressions allow developers to express anything that can be expressed with s-expressions, without being tied to a particular semantic or requiring a massive set of special symbols.

Then it said, “Second, and perhaps more important, Algol-style syntax makes programs look less like the data structures used to represent them. In a culture where the ability to manipulate representations of programs is a central paradigm, a notation that distances the appearance of a program from the appearance of its representation as data is not likely to be warmly received (and this was, and is, one of the principal objections to the inclusion of loop in Common Lisp).”

Here Steele and Gabriel are extremely insightful. Today we would say that s-expressions are “homoiconic”, and that is a rare trait among programming notations. This property, homoiconicity, is an important reason that Lisps are still used decades after their development. Steele and Gabriel are absolutely right; there have been many efforts to create readable Lisp formats, and they all failed because they did not create formats that accurately represented the programs as data structures. A key and distinguishing advantage of a Lisp-like language is that you can treat code as data, and data as code. Any notation that makes this difficult means that you lose many of Lisp’s unique advantages. Homoiconicity is critical if you’re going to treat a program as data. To do so, you must be able to easily “see” the program’s format. If you can, you can do amazing manipulations.

But what Gabriel and Steele failed to appreciate in their paper is that it’s possible to have both. Now that we understand why past efforts failed, we can devise notations that keep these key properties (generality and homoiconicity) - and succeed!

Many people have noted that there are tools to help deal with s-expressions, but this misses the point. If the notation is so bad that you need tools to deal with it, it would be better to fix the notation. The resulting notation could be easier to read, and you could focus your tools on solving problems that were not self-inflicted. In particular, “stopping to see the parentheses” is a sign of a serious problem - the placement of parentheses fundamentally affects interpretation, and serious bugs can hide there.

Others who have used Lisp for years, such as Paul Graham, see s-expressions as long-winded, and advocate for the use of “abbreviations” that can map down to an underlying s-expression notation. Sweet-expressions take this approach.

Why should indentation be syntactically relevant?

Making indentation syntactically meaningful eliminates many parentheses, eliminating the need for humans to keep track of them. Real Lisp programs are already indented anyway; currently tools (like editors and pretty-printers) are used to try to keep the indentation (used by humans) and parentheses (used by the computers) in sync. By making the indentation (which humans depend on) actually used by the computer as well, they are automatically kept in sync.

The article “On Lisp’s Readability and Parenthesis Stacking” shows one of the many examples of endless closing parentheses and brackets to close an expression, and the confusion that happens when indentation does not match the parentheses. bhurt’s response to that article is telling: “I’m always somewhat amazed by the claim that the parens ‘just disappear’, as if this is a good thing. Bugs live in the difference between the code in your head and the code on the screen - and having the parens in the wrong place causes bugs. And autoindenting isn’t the answer - I don’t want the indenting to follow the parens, I want the parens to follow the indenting. The indenting I can see, and can see is correct.”

An IDE can help keep the indentation consistent with the parentheses, but needing IDEs to use a language is considered by some a language smell. If you need special tools to work around problems with the notation, then the notation itself is a problem.

A solution, of course, is to make the indentation actually matter: Now you don’t need an endless march of parentheses, and indentation can’t be confusing because it is actually used.

“In praise of mandatory indentation...” notes that it can be helpful to have mandatory indentation:

It hurts me to say that something so shallow as requiring a few extra spaces can have a bigger effect than, say, Hindley-Milner type inference. - Chris Okasaki

Other languages, including Python, Haskell, Occam, and Icon, use indentation to indicate structure, so this is a proven idea. Other recently-developed languages like Cobra (a variant of Python with strong compile-time typechecking) have decided to use indentation too, so clearly indentation-sensitive languages are considered useful by many.

One problem with indentation as syntactically relevant is that some transports drop leading space and tab characters. As discussed in the indentation characters section, we have solved this as well.

There’s a lot of past work on indentation to represent s-expressions. Examples include:

What is the relationship between sweet-expressions and SRFI-49 (I-expressions)?

The sweet-expression indentation system is based on Scheme SRFI-49 (“surfi-49”), aka I-expressions. The basic rules of SRFI-49 (I-expression) indentation are kept in sweet-expressions; these are:

These basic rules seem fairly intuitive and do not take long to learn. We’re grateful to the SRFI-49 author for his work, and at first, we just used SRFI-49 directly.
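As a rough illustration of how indentation alone can determine list structure, here is a minimal Python sketch. It assumes the basic rules as they are usually summarized (an indented line is a parameter of its parent, and a line with a single term and no child lines stands for that term alone). The function names are hypothetical, and the sketch handles only space indentation and whitespace-separated symbols, with none of the error checking a real reader needs.

```python
def indent_of(line):
    """Number of leading spaces (this sketch ignores tabs and '!')."""
    return len(line) - len(line.lstrip(" "))


def read_iexpr(lines, i=0):
    """Read one datum starting at lines[i]; return (datum, next_index).

    Assumption: every line is non-blank and contains only symbols
    separated by whitespace (no parentheses, strings, or comments).
    """
    indent = indent_of(lines[i])
    terms = lines[i].split()
    i += 1
    children = []
    # Every following line that is indented more deeply is a child.
    while i < len(lines) and indent_of(lines[i]) > indent:
        child, i = read_iexpr(lines, i)
        children.append(child)
    # A lone term with no children is just that term, not a list.
    if len(terms) == 1 and not children:
        return terms[0], i
    return terms + children, i


example = [
    "define f x",
    "  if x",
    "    a",
    "    g x y",
]
datum, _ = read_iexpr(example)
print(datum)  # ['define', 'f', 'x', ['if', 'x', 'a', ['g', 'x', 'y']]]
```

The nested Python list corresponds to the s-expression (define f x (if x a (g x y))).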

However, SRFI-49 turned out to have problems in practice when we tried to use it seriously. For example, in SRFI-49, leading blank lines could produce the empty list () instead of being ignored, limiting the use of blank lines and leading to easy-to-create errors. As specified, a SRFI-49 expression would never complete until after the next expression’s first line was entered, making interactive use extremely unpleasant. Lines with just spaces and tabs would be considered different from blank lines, creating another opportunity for difficult-to-find errors. The symbol group is given a special meaning, which is inconsistent with the rest of Lisp (where only punctuation has special syntactic meanings). The mechanism for escaping the group symbol was confusing. There were also a number of defects in both its specification and implementation.

Thus, based on experience and experimentation we made several changes to it. First, we fixed the problems listed above. We also addressed supporting other capabilities, namely, infix notation and allowing formats like f(x). We also found that certain constructs were somewhat ugly if indentation is required, so we added SUBLIST and SPLIT capabilities.

The very existence of SRFI-49 shows that others believe there is value in using syntactically-significant indentation. We are building on the experience of others to create what we hope is a useful and refined notation.

Why are sweet-expressions separate from curly-infix and neoteric-expressions as defined in SRFI-105?

Some Scheme users and implementers may not want indentation-sensitive syntax, or may not want to accept any change that could change the interpretation of a legal (though poorly-formatted) s-expression. For those users and implementers, SRFI-105 adds infix support and neoteric-expressions such as f(x), but only within curly braces {...}, which are not defined by the Scheme specification anyway. SRFI-105 makes it easier to describe the “leaves” of an s-expression tree.

In contrast, sweet-expressions extend SRFI-105 by making it easier to describe the larger structure of an s-expression. They do this by treating indentation (which is usually present anyway) as syntactically relevant. Sweet-expressions also allow neoteric-expressions outside any curly braces. By making sweet-expressions a separate tier, people can adopt curly-infix if they don’t want indentation to have a syntactic meaning or want to ensure that f(x) is interpreted as two separate datums (f and (x)).

Blank lines

In sweet-expressions, a blank line always terminates a datum, once an expression has started; if (another) expression has not started, blank lines are ignored. That means that in a REPL, once you’ve entered a complete expression, “Enter Enter” will always end it. The “blank lines at the beginning are ignored” rule eliminates a usability problem with the original SRFI-49 (I-expression) spec, in which two sequential blank lines before an expression surprisingly returned (). This was a serious usability problem. The sample implementation did end expressions on a blank line - the problem was that the spec didn’t clearly capture this.

Allowing a blank line to end an expression represents a trade-off between REPL use and use in a file. In a file, a top-level expression could be determined simply by noting that the next expression began on the left column. But this would be hideous to use in a REPL, because it would mean that the results of an expression would only be evaluated after the first (and possibly only) line of the next expression was entered. (Early Pascal I/O implementations had similar problems.) One solution is to have a special text marker that means “done” (e.g., “.” on a line by itself), but this makes interactive use much less pleasant, since users then have to repeatedly type the special “end-of-expression” marker. Sweet-expressions do allow quick execution of one-line commands by typing an indent character first, but users will often not know exactly how long an expression will be until it is done, so this does not help enough. In contrast, pressing Enter twice is quite easy (since the user’s finger is already on Enter to press it the first time). Thus, the blank line rule is intentionally chosen to help interactive users, at a mild cost to non-interactive users (who then cannot use blank lines without ending the expression).

It would be possible to have blank lines end an expression only in interactive use. In particular, Python does this, since it has different rules for interactive use and files. However, this means that you couldn’t cut-and-paste files into the REPL interpreter and use them directly. David A. Wheeler believes it’s important to have exactly the same syntax in both cases in a Lisp-based system, because in Lisp-based systems, switching between the REPL and files is extremely common. This would also cause confusion, since information would be interpreted differently depending on some mode switch. By making “Enter Enter” always end an expression, this inconsistency is avoided.

Of course, people sometimes want to have something like a blank line in the middle of an s-expression. The solution is that comment-only lines are completely ignored and not even considered blank lines. That means you can use comment-only lines for the purpose of separating sections in a single datum. The indentation of comment-only lines is ignored; that way, you don’t have to worry about keeping them indented the same way. We’ve found that in practice this works very well.

Since a line with only indentation may look exactly identical to a blank line, we decided to clearly state that “a line with only indentation is an empty line”. This eliminates some nasty usability problems that could arise if a “blank” line actually had some whitespace in it; a silent error like this could be hard to debug.
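The blank-line rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names classify and split_on_blank_lines are hypothetical, the indent characters are taken to be space, tab, and "!", and the sketch models only the blank-line and comment-only-line rules. A real reader would also end a datum when a later line returns to the starting indentation, and would handle block comments and strings.

```python
INDENT_CHARS = " \t!"  # assumed set of indentation characters


def classify(line):
    """Return 'blank', 'comment', or 'content' for one line."""
    body = line.lstrip(INDENT_CHARS)
    if body == "":
        return "blank"      # a line with only indentation counts as blank
    if body.startswith(";"):
        return "comment"    # comment-only line; its indentation is ignored
    return "content"


def split_on_blank_lines(lines):
    """Group lines into runs separated by blank lines.

    Leading blank lines are ignored, a blank line ends a run only once
    the run has started, and comment-only lines are skipped entirely.
    """
    chunks, current = [], []
    for line in lines:
        kind = classify(line)
        if kind == "comment":
            continue
        if kind == "blank":
            if current:
                chunks.append(current)
                current = []
            continue
        current.append(line)
    if current:
        chunks.append(current)
    return chunks


source = ["", "  ", "define f x", "  ; a separator", "  g x", "\t", "h"]
print(split_on_blank_lines(source))  # [['define f x', '  g x'], ['h']]
```

Note how the comment-only line keeps the first run together, while the tab-only line ends it exactly as an empty line would.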

Indentation characters (! as indent)

Some like to use spaces to indent; others like tabs. Python allows either, and SRFI-49 allows either as well - you just have to be consistent. Sweet-expressions continue this tradition, and are defined so that people can use what they like. The only rule is that they must be consistent; if a line is indented with eight spaces, the next line cannot be indented with a tab.

One objection that people raise about horizontal whitespace characters is that they can get lost in many transports (HTML readers, etc.). In addition, sometimes there are indented groups that you’d like to highlight; traditional whitespace indentation provides no opportunity to highlight indented groups specially. When discussing syntax, users on the readable-discuss mailing list started to use characters (initially period+space) to show where indentation occurred so that they wouldn’t get lost or to highlight them. Eventually, the idea was hit upon that perhaps sweet-expressions needed to support a non-whitespace character for indentation. This is highly unorthodox, but at a stroke it eliminates the complaints some have about syntactically-important indentation (because it is lost by some transports), and it also provides an easy way to highlight particular indented groups.

At first, we tried to use period, or period+space, as the indent, as this was vaguely similar to its use in some tables of contents. But period has too many other traditional meanings in Lisp-like languages, including beginning a number (.9), beginning a symbol (.xyz), and as a special operator to set the cdr of a list. Implementation of period as an indent character is much easier if there is a way to perform two-character lookahead (e.g., with an unread-char function), but this is not standard in Scheme R5RS. Eventually the “!” was selected instead; it practically never begins a line, and if you need it, {!...} will work. The exclamation point is much easier to implement as an indent character, and it is also a great character for highlighting indented groups.
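One plausible way to enforce the consistency rule is to compare the literal indentation strings of adjacent lines and require that one be a prefix of the other. The Python sketch below takes that approach; the prefix comparison and the function names are assumptions made for illustration, not a statement of how any particular implementation works.

```python
INDENT_CHARS = " \t!"  # space, tab, and "!" may all be used to indent


def indentation(line):
    """Return the leading run of indentation characters of a line."""
    n = len(line) - len(line.lstrip(INDENT_CHARS))
    return line[:n]


def compare_indent(previous, current):
    """Compare two indentation strings.

    Returns 'same', 'deeper', or 'shallower'. Raises ValueError when
    neither string is a prefix of the other, e.g. eight spaces followed
    by a tab, because the two cannot be ordered consistently.
    """
    if current == previous:
        return "same"
    if current.startswith(previous):
        return "deeper"
    if previous.startswith(current):
        return "shallower"
    raise ValueError("inconsistent indentation")


print(compare_indent(indentation("! ! foo"), indentation("! ! ! bar")))  # deeper
print(compare_indent(indentation("    a"), indentation("  b")))          # shallower
```

Because "!" is handled exactly like a space or tab here, text such as "! ! foo" keeps its structure even after a transport strips ordinary leading whitespace.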

Disabling indentation processing with paired characters

Indentation processing is disabled inside (...), [ ... ], and { ... }. This was also true of SRFI-49, and of Python, and has wonderful side-effects:

This means that infix processing by curly-infix disables indentation processing; in practice this doesn’t seem to be a problem.

Disabling indentation processing with an initial indent

Initial indentation also disables indentation processing, which also improves backward compatibility and makes it easy to disable indentation processing where convenient.

This improves backward compatibility because a program that uses odd formatting with a different meaning for sweet-expressions is more likely to have initial indents. Even if this is not true, it’s trivially easy to add an initial indent on oddly-formatted old files. This provides a trivial escape, making it easy to support old files. Then even if you have ancient code with odd formatting, it would be likely to still “just work” if there is any initial indentation. We’d like this reader to be a drop-in replacement for read(), so minimizing incompatibilities is important.

There is a risk that this indentation will be accidental (e.g., a user might enter a blank line in the middle of a routine and then start the next line indented). However, this is less likely to happen interactively (users can typically see something happened immediately), and editors can easily detect and show where surprising indentation is occurring (e.g., through highlighting), so this risk appears to be minimal.

Disabling on initial indent also deals with a subtle potential problem in implementation. In a reader implementation, if we tried to just accept some indentation of the first line and use it as the starting point, we create problems. Typically readers return a whole value once that value has been determined, and in many cases it’s tricky to store state (such as that new indentation value) for an arbitrary port. By disabling indentation processing, we eliminate the need to store such state, as well as giving users a useful tool.

Since this latter point isn’t obvious, here’s a little more detailed explanation. Obviously, to make indentation syntactically meaningful, you need to know where an expression indents, and where it ends. If you read in a line, and it has the same indentation level, that should end the previous expression. If its indentation is less, it should close out all the lines with deeper or equal indentation. But we’re trying to minimize the changes to the underlying language, and in particular, we don’t want to change the “read” interface and we’re not assuming arbitrary amounts of unread-char. Scheme R5RS, for example, doesn’t have a standard unread-char at all. So let’s say you are trying to read the following:

! ! foo
! ! ! bar
! ! eggs
! ! cheese

You might expect this to return three datums: (foo bar), eggs, and cheese. It won’t, in a typical implementation; here’s why:

Some solutions:

So for all the reasons above, initial indent disables indentation processing for that line.

Grouping and splicing (\\)

SRFI-49 had a mechanism for defining lists of lists, using the symbol “group”. This was a valuable contribution, since there needs to be some way to show lists of lists.

But after use, it was determined that having an alphabetic symbol being used to indicate a special abbreviation was a mistake. All other syntactically-special abbreviations in Lisp are written using punctuation; having one that was not was confusing. This symbol is still called the GROUP symbol, and happens at the start of a line (after indentation)... it is just now respelled as \\.

For example, this GROUP symbol makes it easy to handle multiple variables in a let expression:

let
  \\
    variable1 my(value1)
    variable2 my(value2)
  do-stuff1 variable1
  do-stuff2 variable1 variable2

A different problem is that sometimes you’d like to have a set of parameters, where they are at the “same level” but writing them as indented parameters takes up too much vertical space. An obvious example is keywords in various Lisps; having to write this is painful:


David A. Wheeler created an early splicing proposal. After much discussion, to solve the latter problem, the SPLIT symbol was created, so that you could do:

  keyword1: \\ parameter1
  keyword2: \\ parameter2

At first the symbol \ was used for SPLIT, but this would cause serious problems on Lisps that supported slashification. After long discussion, the symbol \\ was decided on for both; although the number of characters in the underlying symbol could vary (depending on whether or not slashification was used), this was irrelevant and seemed to work everywhere. By using the same symbol for both GROUP and SPLIT, we reduced the number of different symbols that users needed to escape.

We dropped the SRFI-49 method for escaping the symbol by repeating it (group group); the {} escape mechanism is more regular, and makes it far more obvious that some special escape is going on.
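To make the SPLIT behavior concrete, the following Python sketch rewrites one physical line into the equivalent lines at the same indentation. It is a simplification under stated assumptions: expand_split is a hypothetical name, terms are split on whitespace only (so strings containing spaces are not handled), and a lone \\ (the GROUP case) is returned untouched because it needs the indentation parser to interpret. A leading \\ followed by other terms contributes nothing, matching the "empty symbol" reading described in this rationale.

```python
SPLIT = "\\\\"          # Python spelling of the two-character marker \\
INDENT_CHARS = " \t!"   # assumed set of indentation characters


def expand_split(line):
    """Rewrite a line containing SPLIT markers as several lines.

    Each piece keeps the indentation of the original line, as if the
    writer had pressed Enter and re-indented at every marker.
    """
    n = len(line) - len(line.lstrip(INDENT_CHARS))
    indent, body = line[:n], line[n:]
    terms = body.split()
    if terms == [SPLIT]:
        return [line]   # GROUP: left for the indentation parser
    pieces, current = [], []
    for term in terms:
        if term == SPLIT:
            if current:             # a leading marker adds no piece
                pieces.append(current)
                current = []
        else:
            current.append(term)
    if current:
        pieces.append(current)
    return [indent + " ".join(piece) for piece in pieces]


print(expand_split(r"  keyword1: \\ parameter1"))  # ['  keyword1:', '  parameter1']
print(expand_split(r"! \\ action1()"))             # ['! action1()']
```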

Why does initial \\ mean nothing if there are datums afterwards on the same line?

Since “let” occurs in many programs, it would have been possible to define \\ to allow this:

! \\ var1 $ bar x
! !  var2 $ quux x
! nitz var1 var2

We discussed this, but after long discussion we decided on a defined semantic that means that “\\” is an empty symbol, making that expression exactly the same as:

! var1 $ bar x
! !  var2 $ quux x
! nitz var1 var2

We did this intentionally. It turns out that there are situations where you want a \\ as an empty symbol, even when text follows it on the line. An example is Arc’s if-then-else, where there are logically pairs of items, but from a list semantic are at the same level. For example:

! condition1()
! \\ action1()
! condition2()
! \\ action2()
! \\ otherwise-action()

It’s easy to handle let* with an extra line, but there’s no easy way to insert a short pseudo-comment character in the front unless we do it this way.

The multi-line nature of let* turns out to be not a real problem, for 2 reasons:

  1. It turns out that in many “let*s” the variable settings can be put on one line. As of 2012-08-02 the “sweeten.sscm” has 305 non-blank, non-comment lines (as determined by grep -v '^$' sweeten.sscm | grep -v '^ *;' | wc -l). Of those, 13 lines use let or let*, and only one of those “lets” uses \\. It’s not worth optimizing a case that only happens approximately once in 300 lines.
  2. Using the abbreviations as intended is REALLY clear, even though it uses an extra vertical line.

So the savings for let aren’t significant, the semantics as designed are clear, and we are intentionally using that notation for another purpose where it’s not as easy to use an alternative.

Traditional abbreviations

As with SRFI-49, a leading traditional abbreviation (quote, comma, backquote, or comma-at), followed by space or tab, is that operator applied to the sweet-expression starting at the same line. This makes it easy to apply abbreviations to complex indented structures. For example, a complex indented structure can be quoted simply by prefixing a single quote and space.

Sublist ($)

Alan Manuel Gloria noted that certain constructs were common and annoying to express, e.g., first(second(third(fourth))), and based on Haskell experience, suggested being able to write them as first $ second $ third(fourth). Again, the idea is that this is an abbreviation for a common-enough practice.

This is another example (like GROUP/SPLIT) of a construct that, when you need it, is incredibly useful. It’s not all that unusual to have a few processing or cleanup functions that take a single argument, and for all the “real work” to be nested in something else. This would require several levels of indentation without sublist, but they are easily handled with sublist.

An example is scsh, which has functions like “run” that are applied to another list. With sublist, this is easily expressed. For example, here’s a sweet-expression using scsh:

  run $ grep -i "xx.*zz" <(oldfile) >(newfile)

After discussion, sublist was accepted in July 2012.
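The effect of the sublist marker on a single line can be sketched as a small right-to-left fold. In this hypothetical Python illustration, a line is given as a list of already-read terms (strings for symbols, nested lists for terms such as third(fourth)), and everything to the right of a $ becomes the last parameter of what is on its left. The function name is an assumption, and child lines are ignored for simplicity.

```python
SUBLIST = "$"


def apply_sublist(terms):
    """Fold a flat list of terms around the $ marker.

    A right-hand side consisting of a single term stands for that term
    alone; otherwise it is wrapped in a list.
    """
    if SUBLIST in terms:
        at = terms.index(SUBLIST)
        left, right = terms[:at], terms[at + 1:]
        return left + [apply_sublist(right)]
    if len(terms) == 1:
        return terms[0]
    return list(terms)


# first $ second $ third(fourth), with third(fourth) already read
# as the list (third fourth):
print(apply_sublist(["first", "$", "second", "$", ["third", "fourth"]]))
# ['first', ['second', ['third', 'fourth']]]

# run $ grep -i pattern
print(apply_sublist(["run", "$", "grep", "-i", "pattern"]))
# ['run', ['grep', '-i', 'pattern']]
```

The first result corresponds to (first (second (third fourth))), which is the nesting that would otherwise need either explicit parentheses or three levels of indentation.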

Reserved markers

The markers <* and *> are reserved for future use. These may, at some point, have the following syntax as described in an email posted 2012-09-02:

<* ... *> are like ( ... ) in that they surround a list, but they do NOT turn off indentation processing. A “<*” resets the “indent” level to a 0-length string, skips all horizontal whitespace, and then restarts reading an expression with indentation processing still live. A blank line inside <* ... *> ends an expression WITHIN the <* ... *>, but stays within <* ... *>; only eof or active *> can close the <*. Once all that mapping is done, the range of <* ... *> is replaced with the mapped result.

These markers may be very useful in the future, e.g., for let expressions. However, we have not included them at this time, because these markers can complicate processing. Other markers can be read and handled with one-character lookahead, and cause a simple return in some cases. However, processing the closing *> requires potentially a return up multiple levels, which requires a different implementation approach than is otherwise required. It is also unclear that this is really necessary, and we are hesitant to add yet more syntax beyond the $ and \\ markers.

The marker $$$ is also reserved for potential future use. No particular semantics for this symbol are agreed on. It could be used in the future to implement ENLIST, an older proposed semantic where if it occurs at the beginning of a line, it immediately inserts a list level, regardless of whether it is by itself or with other elements on the line. Another speculative semantic (that has not yet been significantly discussed) might be to implement the following kind of mapping:

Original                        Maps to
let $$$ var1 initial(value1)    (let ((var1 (initial value1)))

  $$$ var1 initial(value1)        ((var1 (initial value1))
      var2 initial(value2)         (var2 (initial value2)))

These mappings are not required in sweet-expression implementations, but it seems prudent to have a few symbols available for future expansion. Thus, the symbols <*, *>, and $$$ must be escaped (e.g., using {...}) if they are used in an indentation-processing context.

Special semicolon values

As described in the specification, a tool that reads sweet-expressions and writes out s-expressions SHOULD specially treat certain lines that begin with semicolons.

The semicolon + whitespace rule is given so some comments - particularly the ones about major new components - are likely to be included in a translation from sweet-expressions to s-expressions (namely, any comments that precede an expression). This can greatly simplify examining the generated s-expression. The rules about “;#”, “;!”, and “;_” make it easier to write shell scripts and similar constructs with embedded sweet-expressions; these lines can invoke some Scheme interpreter, possibly via a shell.

This text is limited to only apply to lines outside of any sweet-expression. This is intentional, because this makes implementation easy on top of an existing sweet-expression reader. The top-level tool can simply see if a line begins with semicolon, and if it does, handle it specially; if a line starts with any other character, it can call the sweet-expression reader to handle it. There is no requirement to copy block comments, or comments inside a sweet-expression datum, because this would be much more complicated to do; handling block comments is non-trivial functionality that a sweet-expression reader must perform, and there is no standard way to return comments inside a datum. Semicolon comments immediately after a datum need not be copied or processed specially, because a sweet-expression reader has to consume them to see if it’s reached the end of the datum. A Scheme implementation with unlimited unread could do more with relative ease, but since many Scheme implementations do not have unlimited unread, these limitations make implementation of such tools much simpler.

These rules are based on the unsweeten tool.

Comparison to Q2

An interesting experimental notation, “Q2”, was developed by Per Bothner; see http://per.bothner.com/blog/2010/Q2-extensible-syntax/.

Q2 has somewhat similar goals to the “readable” project, though with a different approach. The big difference is that David A. Wheeler decided it was important to have a generic notation for any s-expression. Here is a brief additional comparison:

Comparison to P4P

P4P: A Syntax Proposal by Shriram Krishnamurthi describes an alternative, more readable format for the Racket implementation of Scheme. There are some similarities, but many differences.

P4P supports functional name-prefixing such as f(x), just as sweet-expressions do. However, function parameters are separated by commas (an extra character not typical in Lisp code, and in our experiments something of a pain since parameters are very common). P4P does not support infix notation at all, even though practically all non-Lisp languages support it.

P4P has a very different view of indentation, compared to sweet-expressions. In P4P, indentation does not control semantics. Instead, “the semantics controls indentation: that is, each construct has indentation rules, and the parser enforces them. However, changing the indentation of a term either leaves the program’s meaning unchanged or results in a syntax error; it cannot change the meaning of the program.”

This means that P4P has a large number of special-case syntactic constructs. For example, defvar: and deffun: specially use “=”, if: has intermediate keywords, and so on. While this looks nice when you stay within its set, it encounters the same problem that McCarthy had with M-expressions: There are always new constructs, including ones in meta-languages (not the underlying Scheme implementation) and macros. The P4P author notes that, “it would be easy to add new constructs such as provide:, test:, defconst: (to distinguish from defvar:), and so on”, but this misses the point; the task of defining constructs inhibits the use of those constructs, and may be impractical if there are syntactic differences at different language levels. For example, imagine processing lists where “deffun” has a different definition than the underlying language; this is trivial with s-expressions and sweet-expressions, but not practical using P4P.

The P4P author notes that, “the parser can be run in a mode where indentation-checking is simply turned off... This can be beneficial when dealing with program-generated code.” However, now the developer must deal with enabling various modes, and this mode is needed not just for program-generated code, but for code that has mixtures of various languages. Rather than having multiple modes, a single mode that works everywhere seems more useful to the developers of the sweet-expression notation.

In short, P4P fails to be generic; it is tied to specific semantics. Previous readability efforts, such as M-expressions, failed, and we believe that one reason was that those notations failed to be generic. We applaud the admirable goals of P4P, but do not think it represents the best way forward.

However, while we believe different design choices need to be made, we applaud the effort. In addition, we believe that P4P is additional evidence that people are interested in improving the readability of Lisp, and that indentation can help do so.

Writing out results

An obvious question is, “how do you write them out?” After all, with these notations there is more than one way to present expressions.

But no Lisp guarantees that what it writes out is the same sequence of characters that was read in. For example, on some Lisps (quote x) when read might be written back as 'x, while on others, reading 'y might be printed as (quote y). Similarly, if you enter (a . (b . ())), many Lisps will write that back as “(a b)”. Nothing has fundamentally changed; as always, you should implement your Lisp expression writer so that it presents a format convenient to both human and machine readers.
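The point that a writer chooses one presentation among several can be illustrated with a tiny writer over explicit pairs. In this hedged Python sketch, a pair is modeled as a 2-tuple (car, cdr), the empty list as None, and symbols as strings; this representation and the name write_datum are assumptions for illustration only.

```python
def write_datum(datum):
    """Render a datum built from 2-tuples (pairs), None (the empty
    list), and strings (symbols) in conventional list notation."""
    if datum is None:
        return "()"
    if not isinstance(datum, tuple):
        return str(datum)
    car, cdr = datum
    # This writer chooses to abbreviate (quote x) as 'x.
    if car == "quote" and isinstance(cdr, tuple) and cdr[1] is None:
        return "'" + write_datum(cdr[0])
    parts = []
    while isinstance(datum, tuple):
        parts.append(write_datum(datum[0]))
        datum = datum[1]
    if datum is not None:
        # Improper list: show the final cdr after a dot.
        parts += [".", write_datum(datum)]
    return "(" + " ".join(parts) + ")"


print(write_datum(("a", ("b", None))))      # (a b)
print(write_datum(("quote", ("x", None))))  # 'x
print(write_datum(("a", "b")))              # (a . b)
```

The input (a . (b . ())) and the output (a b) denote the same structure; a sweet-expression writer has the same freedom to pick whichever presentation is most convenient.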

Backwards compatibility

Backwards compatibility with traditional Lisp notation is helpful. A reader that can also read traditional s-expressions, formatted conventionally, is much easier to switch to.

The sweet-expression notation is fully backwards-compatible with well-formatted Lisp s-expressions. Thus, a user can enable sweet-expressions and continue to read and process traditionally-formatted s-expressions as well. If an s-expression is so badly formatted that it would be interpreted differently, that s-expression could first be sent through a traditional s-expression pretty-printer and have the problem resolved.

Two changes can cause a difference in interpretation: neoteric-expressions are active even outside {...} (unlike SRFI-105), and indentation is processed.

Neoteric-expressions are compatible with what we would call “normal” formatting. The key issue is that neoteric-expressions change the meaning of an opening parenthesis, bracket, or brace that follows a character other than whitespace or another opening character. For example, a(b) becomes the single expression “(a b)” in sweet-expressions, not the two expressions “a” followed by “(b)”. Millions of lines of Lisp code would never see the difference. But if you wrote “a(b)” expecting it to be read as “a (b)”, you will need to insert a space before the opening parenthesis. We believe such s-expressions are poorly (and misleadingly) formatted in the first place; you should write “a (b)” if you intend these to be two separate datums.
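The rule for a prefix immediately followed by a parenthesized list can be sketched in ordinary Scheme. The helper neoteric-combine below is hypothetical (it is not part of the reference implementation); it only shows the datum that results once the prefix and the list have each been read.

```scheme
; Hypothetical helper: "prefix" is the datum read just before an opening
; parenthesis with no intervening whitespace, and "args" is the list read
; from inside the parentheses.
(define (neoteric-combine prefix args)
  (cons prefix args))

(neoteric-combine 'a '(b))      ; a(b)   is read as (a b)
(neoteric-combine 'f '(x y))    ; f(x y) is read as (f x y)
(neoteric-combine 'f '())       ; f()    is read as (f)
; With whitespace in between, nothing is combined:
; "a (b)" remains the two datums a and (b).
```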

Sweet-expressions add indentation processing, but since indentation is disabled inside (...), and initial indentation also disables indentation processing, ordinary Lisp expressions immediately disable indentation processing and typically don’t cause issues. In rare circumstances they can be interpreted differently:

Past experiences

At least two programs have been written using sweet-expressions:

The SRFI authors believe that the existence of these programs - written by two different people for different application areas - shows that sweet-expressions are mature enough to be standardized.

The Readable Lisp S-expressions Project developed these notations and implementations of them. In particular, the project distributes the programs unsweeten (which takes sweet-expressions and transforms them into s-expressions) and sweeten (which takes s-expressions and transforms them into sweet-expressions), as well as other related tools.

Reference implementation

??? TODO

The implementation below is portable, with the exception that Scheme provides no standard mechanism to override the built-in reader. An implementation that complies with this SRFI must at least activate this behavior when it reads the #!sweet marker followed by whitespace.

This reference implementation is SRFI type 2: “A mostly-portable solution that uses some kind of hooks provided in some Scheme interpreter/compiler. In this case, a detailed specification of the hooks must be included so that the SRFI is self-contained.”

; kernel.scm
; Implementation of the sweet-expressions project by readable mailinglist.
; Copyright (C) 2005-2012 by David A. Wheeler, Alan Manuel K. Gloria,
;                         and Egil Möller.
; This software is released as open source software under the "MIT" license:
; Permission is hereby granted, free of charge, to any person obtaining a
; copy of this software and associated documentation files (the "Software"),
; to deal in the Software without restriction, including without limitation
; the rights to use, copy, modify, merge, publish, distribute, sublicense,
; and/or sell copies of the Software, and to permit persons to whom the
; Software is furnished to do so, subject to the following conditions:
; The above copyright notice and this permission notice shall be included
; in all copies or substantial portions of the Software.

; This file includes code from SRFI-49 (by Egil Möller),
; but significantly modified.

; -----------------------------------------------------------------------------
; Compatibility Layer
; -----------------------------------------------------------------------------
; The compatibility layer is composed of:
;   (readable-kernel-module-contents (exports ...) body ...)
;   - a macro that should package the given body as a module, or whatever your
;     scheme calls it (chicken eggs?), preferably with one of the following
;     names, in order of preference, depending on your Scheme's package naming
;     conventions/support
;       (readable kernel)
;       readable/kernel
;       readable-kernel
;       sweetimpl
;   - The first element after the module-contents name is a list of exported
;     procedures.  This module shall never export a macro or syntax, not even
;     in the future.
;   - If your Scheme requires module contents to be defined inside a top-level
;     module declaration (unlike Guile where module contents are declared as
;     top-level entities after the module declaration) then the other
;     procedures below should be defined inside the module context in order
;     to reduce user namespace pollution.
;   (my-peek-char port)
;   (my-read-char port)
;   - Performs I/O on a "port" object.
;   - The algorithm assumes that port objects have the following abilities:
;     * The port automatically keeps track of source location
;       information.  On R5RS there is no source location
;       information that can be attached to objects, so as a
;       fallback you can just ignore source location, which
;       will make debugging using sweet-expressions more
;       difficult.
;   - "port" or fake port objects are created by the make-read procedure
;     below.
;   (make-read procedure)
;   - The given procedure accepts exactly 1 argument, a "fake port" that can
;     be passed to my-peek-char et al.
;   - make-read creates a new procedure that supports your Scheme's reader
;     interface.  Usually, this means making a new procedure that accepts
;     either 0 or 1 parameters, defaulting to (current-input-port).
;   - If your Scheme doesn't support unlimited lookahead, you should make
;     the fake port that supports 2-char lookahead at this point.
;   - If your Scheme doesn't keep track of source location information
;     automatically with the ports, you may again need to wrap it here.
;   - If your Scheme needs a particularly magical incantation to attach
;     source information to objects, then you might need to use a weak-key
;     table in the attach-sourceinfo procedure below and then use that
;     weak-key table to perform the magical incantation.
;   (invoke-read read port)
;   - Accepts a read procedure, which is a (most likely built-in) procedure
;     that requires a *real* port, not a fake one.
;   - Should unwrap the fake port to a real port, then invoke the given
;     read procedure on the actual real port.
;   (get-sourceinfo port)
;   - Given a fake port, constructs some object (which the algorithm treats
;     as opaque) to represent the source information at the point that the
;     port is currently in.
;   (attach-sourceinfo pos obj)
;   - Attaches the source information pos, as constructed by get-sourceinfo,
;     to the given obj.
;   - obj can be any valid Scheme object.  If your Scheme can only track
;     source location for a subset of Scheme object types, then this procedure
;     should handle it gracefully.
;   - Returns an object with the source information attached - this can be
;     the same object, or a different object that should look-and-feel the
;     same as the passed-in object.
;   - If source information cannot be attached anyway (your Scheme doesn't
;     support attaching source information to objects), just return the
;     given object.
;   (replace-read-with f)
;   - Replaces your Scheme's current reader.
;   - Replace 'read and 'get-datum at the minimum.  If your Scheme
;     needs any kind of involved magic to handle load and loading
;     modules correctly, do it here.
;   next-line
;   line-separator
;   paragraph-separator
;   - The Unicode characters with those names.
;   - If your Scheme does *not* support Unicode, define these to be #f.
;   - If your Scheme *does* support Unicode, to prevent other Schemes
;     from misreading this file, use the following defines:
;       (define next-line (integer->char #x0085))
;       (define line-separator (integer->char #x2028))
;       (define paragraph-separator (integer->char #x2029))
;   (parse-hash no-indent-read char fake-port)
;   - a procedure that is invoked when an unrecognized, non-R5RS hash
;     character combination is encountered in the input port.
;   - this procedure is passed a "fake port", as wrapped by the
;     make-read procedure above.  You should probably use my-read-char
;     and my-peek-char in it, or at least unwrap the port (since
;     make-read does the wrapping, and you wrote make-read, we assume
;     you know how to unwrap the port).
;   - if your procedure needs to parse a datum, invoke
;     (no-indent-read fake-port).  Do NOT use any other read procedure.  The
;     no-indent-read procedure accepts exactly one parameter - the fake port
;     this procedure was passed in.
;     - no-indent-read is either a version of curly-infix-read, or a version
;       of neoteric-read; this special version accepts only a fake port.
;       It is never a version of sweet-read.  You don't normally want to
;       call sweet-read, because sweet-read presumes that it's starting
;       at the beginning of the line, with indentation processing still
;       active.  There's no reason either must be true when processing "#".
;   - At the start of this procedure, both the # and the character
;     after it have been read in.
;   - The procedure returns one of the following:
;       #f  - the hash-character combination is invalid/not supported.
;       ()  - the hash-character combination introduced a comment;
;             at the return of this procedure with this value, the
;             comment has been removed from the input port.
;       (a) - the datum read in is the value a
;   hash-pipe-comment-nests?
;   - a Boolean value that specifies whether #|...|# comments
;     should nest.
;   my-string-foldcase
;   - a procedure to perform case-folding to lowercase, as mandated
;     by Unicode.  If your implementation doesn't have Unicode, define
;     this to be string-downcase.  Some implementations may also
;     interpret "string-downcase" as foldcase anyway.

; On Guile 2.0, the define-module part needs to occur separately from
; the rest of the compatibility checks, unfortunately.  Sigh.
(cond-expand
  (guile
    ; define the module
    ; this ensures that the user's module does not get contaminated with
    ; our compatibility procedures/macros
    (define-module (readable kernel))))
; -----------------------------------------------------------------------------
; Guile Compatibility
; -----------------------------------------------------------------------------

    ; properly get bindings
    (use-modules (guile))

    ; On Guile 1.x defmacro is the only thing supported out-of-the-box.
    ; This form still exists in Guile 2.x, fortunately.
    (defmacro readable-kernel-module-contents (exports . body)
      `(begin (export ,@exports)
              ,@body))

    ; Guile was the original development environment, so the algorithm
    ; practically acts as if it is in Guile.
    ; Needs to be lambdas because otherwise Guile 2.0 acts strangely,
    ; getting confused on the distinction between compile-time,
    ; load-time and run-time (apparently, peek-char is not bound
    ; during load-time).
    (define (my-peek-char p)     (peek-char p))
    (define (my-read-char p)     (read-char p))

    (define (make-read f)
      (lambda args
        (let ((port (if (null? args) (current-input-port) (car args))))
          (f port))))

    (define (invoke-read read port)
      (read port))

    ; create a list with the source information
    (define (get-sourceinfo port)
      (list (port-filename port)
            (port-line port)
            (port-column port)))
    ; destruct the list and attach, but only to cons cells, since
    ; only that is reliably supported across Guile versions.
    (define (attach-sourceinfo pos obj)
      (cond
        ((pair? obj)
          (set-source-property! obj 'filename (list-ref pos 0))
          (set-source-property! obj 'line     (list-ref pos 1))
          (set-source-property! obj 'column   (list-ref pos 2))
          obj)
        (#t
          obj)))

    ; To properly hack into 'load and in particular 'use-modules,
    ; we need to hack into 'primitive-load.  On 1.8 and 2.0 there
    ; is supposed to be a current-reader fluid that primitive-load
    ; hooks into, but it seems (unverified) that each use-modules
    ; creates a new fluid environment, so that this only sticks
    ; on a per-module basis.  But if the project is primarily in
    ; sweet-expressions, we would prefer to have that hook in
    ; *all* 'use-modules calls.  So our primitive-load uses the
    ; 'read global variable if current-reader isn't set.

    (define %sugar-current-load-port #f)
    ; replace primitive-load
    (define primitive-load-replaced #f)
    (define (setup-primitive-load)
      (cond
        (primitive-load-replaced
           (values))
        (#t
          (module-set! (resolve-module '(guile)) 'primitive-load
            (lambda (filename)
              (let ((hook (cond
                            ((not %load-hook)
                              #f)
                            ((not (procedure? %load-hook))
                              (error "value of %load-hook is neither procedure nor #f"))
                            (#t
                              %load-hook))))
                (cond
                  (hook
                    (hook filename)))
                (let* ((port      (open-input-file filename))
                       (save-port port))
                  (define (load-loop)
                    (let* ((the-read
                             (or
                                 ; current-reader doesn't exist on 1.6
                                 (if (string=? "1.6" (effective-version))
                                     #f
                                     (fluid-ref current-reader))
                                 read))
                           (form (the-read port)))
                      (cond
                        ((not (eof-object? form))
                          ; in Guile only
                          (primitive-eval form)
                          (load-loop)))))
                  (define (swap-ports)
                    (let ((tmp %sugar-current-load-port))
                      (set! %sugar-current-load-port save-port)
                      (set! save-port tmp)))
                  (dynamic-wind swap-ports load-loop swap-ports)
                  (close-input-port port)))))
          (set! primitive-load-replaced #t))))

    (define (replace-read-with f)
      (set! read f))

    ; define Unicode chars based on version.  On 1.x assume
    ; no Unicode (actually 1.9 has Unicode, but that's not a
    ; stable branch.)
    (define has-unicode
      (let* ((v (effective-version))
             (c (string-ref v 0)))
        (if (or (char=? c #\0) (char=? c #\1))
            #f
            #t)))
    (define next-line
      (if has-unicode
          (integer->char #x0085)
          #f))
    (define line-separator
      (if has-unicode
          (integer->char #x2028)
          #f))
    (define paragraph-separator
      (if has-unicode
          (integer->char #x2029)
          #f))

    ; Guile has #! !# comments; these comments do *not* nest.
    ; On Guile 1.6 and 1.8 the only comments are ; and #! !#
    ; On Guile 2.0, #; (SRFI-62) and #| #| |# |# (SRFI-30) comments exist.
    ; On Guile 2.0, #' #` #, #,@ have the R6RS meaning; on
    ; Guile 1.8 and 1.6 there is a #' syntax but I have yet
    ; to figure out what exactly it does.
    ; On Guile, #:x is a keyword.  Keywords have symbol
    ; syntax.
    (define (parse-hash no-indent-read char fake-port)
      (let* ((ver (effective-version))
             (c   (string-ref ver 0))
             (>=2 (and (not (char=? c #\0)) (not (char=? c #\1)))))
        (cond
          ((char=? char #\!)
            (if (consume-curly-infix fake-port)
              '()  ; We saw #!curly-infix, we're done.
              ; Otherwise, process non-nestable comment #! ... !#
              (begin
                (non-nest-comment fake-port)
                '())))
          ((char=? char #\:)
            ; On Guile 1.6, #: reads characters until it finds non-symbol
            ; characters.
            ; On Guile 1.8 and 2.0, #: reads in a datum, and if the
            ; datum is not a symbol, throws an error.
            ; Follow the 1.8/2.0 behavior as it is simpler to implement,
            ; and even on 1.6 it is unlikely to cause problems.
            ; NOTE: This behavior means that #:foo(bar) will cause
            ; problems on neoteric and higher tiers.
            (let ((s (no-indent-read fake-port)))
              (if (symbol? s)
                  `( ,(symbol->keyword s) )
                  #f)))
          ; On Guile 2.0 #' #` #, #,@ have the R6RS meaning.
          ; guard against it here because of differences in
          ; Guile 1.6 and 1.8.
          ((and >=2 (char=? char #\'))
            `( (syntax ,(no-indent-read fake-port)) ))
          ((and >=2 (char=? char #\`))
            `( (quasisyntax ,(no-indent-read fake-port)) ))
          ((and >=2 (char=? char #\,))
            (let ((c2 (my-peek-char fake-port)))
              (cond
                ((char=? c2 #\@)
                  (my-read-char fake-port)
                  `( (unsyntax-splicing ,(no-indent-read fake-port)) ))
                (#t
                  `( (unsyntax ,(no-indent-read fake-port)) )))))
          ; #{ }# syntax
          ((char=? char #\{ )  ; Special symbol, through till ...}#
            `( ,(list->symbol (special-symbol fake-port))))
          (#t
            #f))))

    ; detect the !#
    (define (non-nest-comment fake-port)
      (let ((c (my-read-char fake-port)))
        (cond
          ((eof-object? c)
            (values))
          ((char=? c #\!)
            (let ((c2 (my-peek-char fake-port)))
              (if (char=? c2 #\#)
                  (begin
                    (my-read-char fake-port)
                    (values))
                  (non-nest-comment fake-port))))
          (#t
            (non-nest-comment fake-port)))))

  ; Return list of characters inside #{...}#, a guile extension.
  ; presume we've already read the sharp and initial open brace.
  ; On eof we just end.  We could error out instead.
  ; TODO: actually conform to Guile's syntax.  Note that 1.x
  ; and 2.0 have different syntax when spaces, backslashes, and
  ; control characters get involved.
  (define (special-symbol port)
    (cond
      ((eof-object? (my-peek-char port)) '())
      ((eqv? (my-peek-char port) #\})
        (my-read-char port) ; consume closing brace
        (cond
          ((eof-object? (my-peek-char port)) '(#\}))
          ((eqv? (my-peek-char port) #\#)
            (my-read-char port) ; Consume closing sharp.
            '())
          (#t (append '(#\}) (special-symbol port)))))
      (#t (append (list (my-read-char port)) (special-symbol port)))))

    (define hash-pipe-comment-nests? #t)

    (define (my-string-foldcase s)
      (string-downcase s))
; -----------------------------------------------------------------------------
; R5RS Compatibility
; -----------------------------------------------------------------------------
    ; assume R5RS with define-syntax

    ; On R6RS, and other Schemes, module contents must
    ; be entirely inside a top-level module structure.
    ; Use module-contents to support that.  On Schemes
    ; where module declarations are separate top-level
    ; expressions, we expect module-contents to transform
    ; to a simple (begin ...), and possibly include
    ; whatever declares exported stuff on that Scheme.
    (define-syntax readable-kernel-module-contents
      (syntax-rules ()
        ((readable-kernel-module-contents exports body ...)
          (begin body ...))))

    ; We use my-* procedures so that the
    ; "port" automatically keeps track of source position.
    ; On Schemes where that is not true (e.g. Racket, where
    ; source information is passed into a reader and the
    ; reader is supposed to update it by itself) we can wrap
    ; the port with the source information, and update that
    ; source information in the my-* procedures.

    (define (my-peek-char port) (peek-char port))
    (define (my-read-char port) (read-char port))

    ; this wrapper procedure wraps a reader procedure
    ; that accepts a "fake" port above, and converts
    ; it to an R5RS-compatible procedure.  On Schemes
    ; which support source-information annotation,
    ; but use a different way of annotating
    ; source-information from Guile, this procedure
    ; should also probably perform that attachment
    ; on exit from the given inner procedure.
    (define (make-read f)
      (lambda args
        (let ((real-port (if (null? args) (current-input-port) (car args))))
          (f real-port))))

    ; invoke the given "actual" reader, most likely
    ; the builtin one, but make sure to unwrap any
    ; fake ports.
    (define (invoke-read read port)
      (read port))
    ; R5RS doesn't have any method of extracting
    ; or attaching source location information.
    (define (get-sourceinfo _) #f)
    (define (attach-sourceinfo _ x) x)

    ; Not strictly R5RS but we expect at least some Schemes
    ; to allow this somehow.
    (define (replace-read-with f)
      (set! read f))

    ; Assume that a random R5RS Scheme doesn't support Unicode
    ; out-of-the-box
    (define next-line #f)
    (define line-separator #f)
    (define paragraph-separator #f)

    ; R5RS has no hash extensions, but handle #!curly-infix.
    (define (parse-hash no-indent-read char fake-port)
      (cond
        ((eqv? char #\!)
          (if (consume-curly-infix fake-port)
             '()  ; Found #!curly-infix, quietly accept it.
             #f))
        (#t #f))) ; No other hash extensions.

    ; Hash-pipe comment is not in R5RS, but support
    ; it as an extension, and make them nest.
    (define hash-pipe-comment-nests? #t)

    ; If your Scheme supports "string-foldcase", use that instead of
    ; string-downcase:
    (define (my-string-foldcase s)
      (string-downcase s))

; -----------------------------------------------------------------------------
; Module declaration and useful utilities
; -----------------------------------------------------------------------------
  ; exported procedures
  (; tier read procedures
   curly-infix-read neoteric-read sweet-read
   ; comparison procedures
   compare-read-file ; compare-read-string
   ; replacing the reader
   replace-read restore-traditional-read
   enable-curly-infix enable-neoteric enable-sweet)

  ; Should we fold case of symbols by default?
  ; #f means case-sensitive (R6RS); #t means case-insensitive (R5RS).
  ; Here we'll set it to be case-sensitive, which is consistent with R6RS
  ; and guile, but NOT with R5RS.  Most people won't notice, I
  ; _like_ case-sensitivity, and the latest spec is case-sensitive,
  ; so let's start with #f (case-sensitive).
  ; This doesn't affect character names; as an extension,
  ; We always accept arbitrary case for them, e.g., #\newline or #\NEWLINE.
  (define foldcase-default #f)

  ; special tag to denote comment return from hash-processing

  ; Define the whitespace characters, in relatively portable ways
  ; Presumes ASCII, Latin-1, Unicode or similar.
  (define tab (integer->char #x0009))             ; #\ht aka \t.
  (define linefeed (integer->char #x000A))        ; #\newline aka \n. FORCE it.
  (define carriage-return (integer->char #x000D)) ; \r.
  (define line-tab (integer->char #x000B))        ; vertical tab.
  (define form-feed (integer->char #x000C))
  (define space '#\space)

  (define line-ending-chars-ascii (list linefeed carriage-return))
  (define line-ending-chars
    (append
      line-ending-chars-ascii
      (if next-line
          (list next-line)
          '())
      (if line-separator
          (list line-separator)
          '())))

  ; This definition of whitespace chars is per R6RS section 4.2.1.
  ; R6RS doesn't explicitly list the #\space character, be sure to include!
  (define whitespace-chars-ascii
     (list tab linefeed line-tab form-feed carriage-return #\space))
  (define whitespace-chars
    (append
      whitespace-chars-ascii
      (if next-line
          (list next-line)
          '())
      (if line-separator
          (list line-separator)
          '())
      (if paragraph-separator
          (list paragraph-separator)
          '())))
  ; If supported, add characters whose category is Zs, Zl, or Zp

  ; Returns a true value (not necessarily #t)
  (define (char-line-ending? char) (memq char line-ending-chars))

  ; Return #t if char is space or tab.
  (define (char-horiz-whitespace? char)
    (or (eqv? char #\space)
        (eqv? char tab)))

  ; Create own version, in case underlying implementation omits some.
  (define (my-char-whitespace? c)
    (or (char-whitespace? c) (memq c whitespace-chars)))

  ; Consume an end-of-line sequence. This is 2 unequal end-of-line
  ; characters, or a single end-of-line character, whichever is longer.
  (define (consume-end-of-line port)
    (let ((c (my-peek-char port)))
      (if (char-line-ending? c)
        (begin
          (my-read-char port)
          (let ((next (my-peek-char port)))
            (if (and (not (eq? c next))
                     (char-line-ending? next))
              (my-read-char port)))))))

  (define (consume-to-eol port)
    ; Consume every non-eol character in the current line.
    ; End on EOF or end-of-line char.
    ; Do NOT consume the end-of-line character(s).
    (let ((c (my-peek-char port)))
      (cond
        ((not (or (eof-object? c)
                  (char-line-ending? c)))
          (my-read-char port)
          (consume-to-eol port)))))

  ; Consume exactly lyst from port.
  (define (consume-exactly port lyst)
    (cond
      ((null? lyst) #t)
      ((eq? (my-peek-char port) (car lyst))
        (my-read-char port)
        (consume-exactly port (cdr lyst)))
      (#t #f)))

  ; Consume exactly "curly-infix" WHITESPACE, for use in #!curly-infix
  (define (consume-curly-infix port)
    (if (and (consume-exactly port (string->list "curly-infix"))
             (my-char-whitespace? (my-peek-char port)))
        (my-read-char port)
        #f))

  (define (ismember? item lyst)
    ; Returns true if item is member of lyst, else false.
    (pair? (member item lyst)))

  ; Quick utility for debugging.  Display marker, show data, return data.
  (define (debug-show marker data)
    (display "DEBUG: ")
    (display marker)
    (display " = ")
    (write data)
    (display "\n")
    data)

  (define (my-read-delimited-list my-read stop-char port)
    ; Read the "inside" of a list until its matching stop-char, returning list.
    ; stop-char needs to be closing paren, closing bracket, or closing brace.
    ; This is like read-delimited-list of Common Lisp, but it
    ; calls the specified reader instead.
    ; This implements a useful extension: (. b) returns b. This is
    ; important as an escape for indented expressions, e.g., (. \\)
    (consume-whitespace port)
    (let*
      ((pos (get-sourceinfo port))
       (c   (my-peek-char port)))
      (cond
        ((eof-object? c) (read-error "EOF in middle of list") c)
        ((char=? c stop-char)
          (my-read-char port)
          (attach-sourceinfo pos '()))
        ((ismember? c '(#\) #\] #\}))  (read-error "Bad closing character") c)
        (#t
          (let ((datum (my-read port)))
            (cond
               ((eq? datum '.)
                 (let ((datum2 (my-read port)))
                   (consume-whitespace port)
                   (cond
                     ((eof-object? datum2)
                      (read-error "Early eof in (... .)")
                      '())
                     ((not (eqv? (my-peek-char port) stop-char))
                      (read-error "Bad closing character after . datum")
                      datum2)
                     (#t
                       (my-read-char port)
                       datum2))))
               (#t
                 (attach-sourceinfo pos
                   (cons datum
                     (my-read-delimited-list my-read stop-char port))))))))))

; -----------------------------------------------------------------------------
; Read Preservation and Replacement
; -----------------------------------------------------------------------------

  (define default-scheme-read read)
  (define replace-read replace-read-with)
  (define (restore-traditional-read) (replace-read-with default-scheme-read))

  (define (enable-curly-infix)
    (if (not (or (eq? read curly-infix-read)
                 (eq? read neoteric-read)
                 (eq? read sweet-read)))
        (replace-read curly-infix-read)))

  (define (enable-neoteric)
    (if (not (or (eq? read neoteric-read)
                 (eq? read sweet-read)))
        (replace-read neoteric-read)))

  (define (enable-sweet)
    (replace-read sweet-read))

; -----------------------------------------------------------------------------
; Scheme Reader re-implementation
; -----------------------------------------------------------------------------

; We have to re-implement our own Scheme reader.
; This takes more code than it would otherwise because many
; Scheme readers will not consider [, ], {, and } as delimiters
; (they are not required delimiters in R5RS and R6RS).
; Thus, we cannot call down to the underlying reader to implement reading
; many types of values such as symbols.
; If your Scheme's "read" also considers [, ], {, and } as
; delimiters (and thus are not consumed when reading symbols, numbers, etc.),
; then underlying-read could be much simpler.
; We WILL call default-scheme-read on string reading (the ending delimiter
; is ", so that is no problem) - this lets us use the implementation's
; string extensions if any.

  ; Identifying the list of delimiter characters is harder than you'd think.
  ; This list is based on R6RS section 4.2.1, while adding [] and {},
  ; but removing "#" from the delimiter set.
  ; NOTE: R6RS has "#" as a delimiter.  However, R5RS does not, and
  ; R7RS probably will not - http://trac.sacrideo.us/wg/wiki/WG1Ballot3Results
  ; shows a strong vote AGAINST "#" being a delimiter.
  ; Having the "#" as a delimiter means that you cannot have "#" embedded
  ; in a symbol name, which hurts backwards compatibility, and it also
  ; breaks implementations like Chicken (has many such identifiers) and
  ; Gambit (which uses this as a namespace separator).
  ; Thus, this list does NOT have "#" as a delimiter, contravening R6RS
  ; (but consistent with R5RS, probably R7RS, and several implementations).
  ; Also - R7RS draft 6 has "|" as delimiter, but we currently don't.
  (define neoteric-delimiters
     (append (list #\( #\) #\[ #\] #\{ #\}  ; Add [] {}
                   #\" #\;)                 ; Could add #\# or #\|
             whitespace-chars))

  (define (consume-whitespace port)
    (let ((char (my-peek-char port)))
      (cond
        ((eof-object? char))
        ((eqv? char #\;)
          (consume-to-eol port)
          (consume-whitespace port))
        ((my-char-whitespace? char)
          (my-read-char port)
          (consume-whitespace port)))))

  (define (read-until-delim port delims)
    ; Read characters until eof or a character in "delims" is seen.
    ; Do not consume the eof or delimiter.
    ; Returns the list of chars that were read.
    (let ((c (my-peek-char port)))
      (cond
         ((eof-object? c) '())
         ((ismember? c delims) '())
         (#t (cons (my-read-char port) (read-until-delim port delims))))))

  (define (read-error message)
    (display "Error: ")
    (display message)
    (newline)
    '())

  (define (read-number port starting-lyst)
    (string->number (list->string
      (append starting-lyst
        (read-until-delim port neoteric-delimiters)))))

  (define (process-char port)
    ; We've read #\ - returns what it represents.
    (cond
      ((eof-object? (my-peek-char port)) (my-peek-char port))
      (#t
        ; Not EOF. Read in the next character, and start acting on it.
        ; let* is required: c must be read before the rest of the name.
        (let* ((c (my-read-char port))
               (rest (read-until-delim port neoteric-delimiters)))
          (cond
            ((null? rest) c) ; only one char after #\ - so that's it!
            (#t
              (let ((rest-string (list->string (cons c rest))))
                (cond
                  ; Implement R6RS character names, see R6RS section 4.2.6.
                  ; As an extension, we will ALWAYS accept character names
                  ; of any case, no matter what the case-folding value is.
                  ((string-ci=? rest-string "space") #\space)
                  ((string-ci=? rest-string "newline") #\newline)
                  ((string-ci=? rest-string "tab") tab)
                  ((string-ci=? rest-string "nul") (integer->char #x0000))
                  ((string-ci=? rest-string "alarm") (integer->char #x0007))
                  ((string-ci=? rest-string "backspace") (integer->char #x0008))
                  ((string-ci=? rest-string "linefeed") (integer->char #x000A))
                  ((string-ci=? rest-string "vtab") (integer->char #x000B))
                  ((string-ci=? rest-string "page") (integer->char #x000C))
                  ((string-ci=? rest-string "return") (integer->char #x000D))
                  ((string-ci=? rest-string "esc") (integer->char #x001B))
                  ((string-ci=? rest-string "delete") (integer->char #x007F))
                  ; Additional character names as extensions:
                  ((string-ci=? rest-string "ht") tab)
                  ((string-ci=? rest-string "cr") (integer->char #x000d))
                  ((string-ci=? rest-string "bs") (integer->char #x0008))
                  (#t (read-error "Invalid character name"))))))))))

  ; If fold-case is active on this port, return string "s" in folded case.
  ; Otherwise, just return "s".  This is needed to support our
  ; foldcase-default configuration value when processing symbols.
  ; TODO: If R7RS adds #!fold-case and #!no-fold-case, add support here.
  (define (fold-case-maybe port s)
    (if foldcase-default
      (my-string-foldcase s)
      s))

  (define (process-sharp no-indent-read port)
    ; We've read a # character.  Returns a list whose car is what it
    ; represents; empty list means "comment".
    ; Note: Since we have to re-implement process-sharp anyway,
    ; the vector representation #(...) uses my-read-delimited-list, which in
    ; turn calls no-indent-read.
    ; TODO: Create a readtable for this case.
    (let ((c (my-peek-char port)))
      (cond
        ((eof-object? c) (list c)) ; If eof, return eof.
        (#t
          ; Not EOF. Read in the next character, and start acting on it.
          (my-read-char port)
          (cond
            ((char-ci=? c #\t)  '(#t))
            ((char-ci=? c #\f)  '(#f))
            ((ismember? c '(#\i #\e #\b #\o #\d #\x
                            #\I #\E #\B #\O #\D #\X))
              (list (read-number port (list #\# (char-downcase c)))))
            ((char=? c #\( )  ; Vector.
              (list (list->vector (my-read-delimited-list no-indent-read #\) port))))
            ((char=? c #\\) (list (process-char port)))
            ; Handle #; (item comment).
            ((char=? c #\;)
              (no-indent-read port)  ; Read the datum to be consumed.
              '()) ; Return comment
            ; handle nested comments
            ((char=? c #\|)
              (nest-comment port)
              '()) ; Return comment
            (#t
              (let ((rv (parse-hash no-indent-read c port)))
                (cond
                  ((not rv)
                    (read-error "Invalid #-prefixed string"))
                  (#t
                    rv)))))))))

  ; detect #| or |#
  (define (nest-comment fake-port)
    (let ((c (my-read-char fake-port)))
      (cond
        ((eof-object? c)
          #t)
        ((char=? c #\|)
          (let ((c2 (my-peek-char fake-port)))
            (if (char=? c2 #\#)
                (begin
                  (my-read-char fake-port)
                  #t)
                (nest-comment fake-port))))
        ((and hash-pipe-comment-nests? (char=? c #\#))
          (let ((c2 (my-peek-char fake-port)))
            (if (char=? c2 #\|)
                (begin
                  (my-read-char fake-port)
                  (nest-comment fake-port))
                #f)
            (nest-comment fake-port)))
        (#t
          (nest-comment fake-port)))))

  (define digits '(#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9))

  (define (process-period port)
    ; We've peeked a period character.  Returns what it represents.
    (my-read-char port) ; Remove .
    (let ((c (my-peek-char port)))
      (cond
        ((eof-object? c) '.) ; period eof; return period.
        ((ismember? c digits)
          (read-number port (list #\.)))  ; period digit - it's a number.
        (#t
          ; At this point, Scheme only requires support for "." or "...".
          ; As an extension we can support them all.
          (string->symbol
            (fold-case-maybe port
              (list->string (cons #\.
                (read-until-delim port neoteric-delimiters)))))))))

  ; This implements a simple Scheme "read" implementation from "port",
  ; but if it must recurse to read, it will invoke "no-indent-read"
  ; (a reader that is NOT indentation-sensitive).
  ; This additional parameter lets us easily implement additional semantics,
  ; and then call down to this underlying-read procedure when basic reader
  ; functionality (implemented here) is needed.
  ; This lets us implement both a curly-infix-ONLY-read
  ; as well as a neoteric-read, without duplicating code.
  (define (underlying-read no-indent-read port)
    (consume-whitespace port)
    (let* ((pos (get-sourceinfo port))
           (c   (my-peek-char port)))
      (cond
        ((eof-object? c) c)
        ((char=? c #\")
          ; old readers tend to read strings okay, call it.
          ; (guile 1.8 and gauche/gosh 1.8.11 are fine)
          (invoke-read default-scheme-read port))
        (#t
          ; attach the source information to the item read-in
          (attach-sourceinfo pos
            (cond
              ((ismember? c digits) ; Initial digit.
                (read-number port '()))
              ((char=? c #\#)
                (my-read-char port)
                (let ((rv (process-sharp no-indent-read port)))
                  ; process-sharp convention: null? means comment,
                  ; pair? means object (the object is in its car)
                  (cond
                    ((null? rv)
                      ; recurse
                      (no-indent-read port))
                    ((pair? rv)
                      (car rv))
                    (#t ; convention violated
                      (read-error "readable/kernel: ***ERROR IN COMPATIBILITY LAYER parse-hash must return #f '() or `(,a)")))))
              ((char=? c #\.) (process-period port))
              ((or (char=? c #\+) (char=? c #\-))  ; Initial + or -
                (my-read-char port)
                (if (ismember? (my-peek-char port) digits)
                  (read-number port (list c))
                  (string->symbol (fold-case-maybe port
                    (list->string (cons c
                      (read-until-delim port neoteric-delimiters)))))))
              ((char=? c #\')
                (my-read-char port)
                (list (attach-sourceinfo pos 'quote)
                  (no-indent-read port)))
              ((char=? c #\`)
                (my-read-char port)
                (list (attach-sourceinfo pos 'quasiquote)
                  (no-indent-read port)))
              ((char=? c #\,)
                (my-read-char port)
                (cond
                  ((char=? #\@ (my-peek-char port))
                    (my-read-char port)
                    (list (attach-sourceinfo pos 'unquote-splicing)
                      (no-indent-read port)))
                  (#t
                    (list (attach-sourceinfo pos 'unquote)
                      (no-indent-read port)))))
              ((char=? c #\( )
                  (my-read-char port)
                  (my-read-delimited-list no-indent-read #\) port))
              ((char=? c #\) )
                (read-char port)
                (read-error "Closing parenthesis without opening")
                (underlying-read no-indent-read port))
              ((char=? c #\[ )
                  (my-read-char port)
                  (my-read-delimited-list no-indent-read #\] port))
              ((char=? c #\] )
                (read-char port)
                (read-error "Closing bracket without opening")
                (underlying-read no-indent-read port))
              ((char=? c #\} )
                (read-char port)
                (read-error "Closing brace without opening")
                (underlying-read no-indent-read port))
              ((char=? c #\| )
                ; Scheme extension, |...| symbol (like Common Lisp)
                ; This is present in R7RS draft 6.
                (my-read-char port) ; Consume the initial vertical bar.
                (let ((newsymbol
                  ; Do NOT call fold-case-maybe; always use literal values.
                  (string->symbol (list->string
                    (read-until-delim port '(#\|))))))
                  (my-read-char port) ; Consume the closing vertical bar.
                  newsymbol))
              (#t ; Nothing else.  Must be a symbol start.
                (string->symbol (fold-case-maybe port
                  (list->string
                    (read-until-delim port neoteric-delimiters)))))))))))

; -----------------------------------------------------------------------------
; Curly Infix
; -----------------------------------------------------------------------------

  ; Return true if lyst has an even # of parameters, and the (alternating)
  ; first parameters are "op".  Used to determine if a longer lyst is infix.
  ; If passed empty list, returns true (so recursion works correctly).
  (define (even-and-op-prefix? op lyst)
    (cond
      ((null? lyst) #t)
      ((not (pair? lyst)) #f)
      ((not (equal? op (car lyst))) #f) ; fail - operators not the same
      ((not (pair? (cdr lyst)))  #f) ; Wrong # of parameters or improper
      (#t   (even-and-op-prefix? op (cddr lyst))))) ; recurse.

  ; Return true if the lyst is in simple infix format
  ; (and thus should be reordered at read time).
  (define (simple-infix-list? lyst)
    (and
      (pair? lyst)           ; Must have list;  '() doesn't count.
      (pair? (cdr lyst))     ; Must have a second argument.
      (pair? (cddr lyst))    ; Must have a third argument (we check it
                             ; this way for performance)
      (even-and-op-prefix? (cadr lyst) (cdr lyst)))) ; true if rest is simple

  ; Return alternating parameters in a list (1st, 3rd, 5th, etc.)
  (define (alternating-parameters lyst)
    (if (or (null? lyst) (null? (cdr lyst)))
      lyst
      (cons (car lyst) (alternating-parameters (cddr lyst)))))

  ; Not a simple infix list - transform it.  Written as a separate procedure
  ; so that future experiments or SRFIs can easily replace just this piece.
  (define (transform-mixed-infix lyst)
     (cons '$nfx$ lyst))

  ; Given curly-infix lyst, map it to its final internal format.
  (define (process-curly lyst)
    (cond
     ((not (pair? lyst)) lyst) ; E.G., map {} to ().
     ((null? (cdr lyst)) ; Map {a} to a.
       (car lyst))
     ((and (pair? (cdr lyst)) (null? (cddr lyst))) ; Map {a b} to (a b).
       lyst)
     ((simple-infix-list? lyst) ; Map {a OP b [OP c...]} to (OP a b [c...])
       (cons (cadr lyst) (alternating-parameters lyst)))
     (#t  (transform-mixed-infix lyst))))

  (define (curly-infix-read-real no-indent-read port)
    (let* ((pos (get-sourceinfo port))
            (c   (my-peek-char port)))
      (cond
        ((eof-object? c) c)
        ((eqv? c #\;)
          (consume-to-eol port)
          (curly-infix-read-real no-indent-read port))
        ((my-char-whitespace? c)
          (my-read-char port)
          (curly-infix-read-real no-indent-read port))
        ((eqv? c #\{)
          (my-read-char port)
          ; read in as infix
          (attach-sourceinfo pos
            (process-curly
              (my-read-delimited-list neoteric-read-real #\} port))))
        (#t
          (underlying-read no-indent-read port)))))

  ; Read using curly-infix-read-real
  (define (curly-infix-read-nocomment port)
    (curly-infix-read-real curly-infix-read-nocomment port))

; -----------------------------------------------------------------------------
; Neoteric Expressions
; -----------------------------------------------------------------------------

  ; Implement neoteric-expression's prefixed (), [], and {}.
  ; At this point, we have just finished reading some expression, which
  ; MIGHT be a prefix of some longer expression.  Examine the next
  ; character to be consumed; if it's an opening paren, bracket, or brace,
  ; then the expression "prefix" is actually a prefix.
  ; Otherwise, just return the prefix and do not consume that next char.
  ; This recurses, to handle formats like f(x)(y).
  (define (neoteric-process-tail port prefix)
      (let* ((pos (get-sourceinfo port))
             (c   (my-peek-char port)))
        (cond
          ((eof-object? c) prefix)
          ((char=? c #\( ) ; Implement f(x)
            (my-read-char port)
            (neoteric-process-tail port
              (attach-sourceinfo pos
                (cons prefix (my-read-delimited-list neoteric-read-nocomment #\) port)))))
          ((char=? c #\[ )  ; Implement f[x]
            (my-read-char port)
            (neoteric-process-tail port
                (attach-sourceinfo pos
                  (cons (attach-sourceinfo pos '$bracket-apply$)
                    (cons prefix
                      (my-read-delimited-list neoteric-read-nocomment #\] port))))))
          ((char=? c #\{ )  ; Implement f{x}
            (read-char port)
            (neoteric-process-tail port
              (attach-sourceinfo pos
                (let
                  ((tail (process-curly
                      (my-read-delimited-list neoteric-read-nocomment #\} port))))
                  (if (eqv? tail '())
                    (list prefix) ; Map f{} to (f), not (f ()).
                    (list prefix tail))))))
          (#t prefix))))

  ; This is the "real" implementation of neoteric-read.
  ; It directly implements unprefixed (), [], and {} so we retain control;
  ; it calls neoteric-process-tail so f(), f[], and f{} are implemented.
  ;  (if (eof-object? (my-peek-char port))
  (define (neoteric-read-real port)
    (let*
      ((pos (get-sourceinfo port))
       (c   (my-peek-char port))
       (result
         (cond
           ((eof-object? c) c)
           ((char=? c #\( )
             (my-read-char port)
             (attach-sourceinfo pos
               (my-read-delimited-list neoteric-read-nocomment #\) port)))
           ((char=? c #\[ )
             (my-read-char port)
             (attach-sourceinfo pos
               (my-read-delimited-list neoteric-read-nocomment #\] port)))
           ((char=? c #\{ )
             (my-read-char port)
             (attach-sourceinfo pos
               (process-curly
                 (my-read-delimited-list neoteric-read-nocomment #\} port))))
           ((my-char-whitespace? c)
             (my-read-char port)
             (neoteric-read-real port))
           ((eqv? c #\;)
             (consume-to-eol port)
             (neoteric-read-real port))
           (#t (underlying-read neoteric-read-nocomment port)))))
      (if (eof-object? result)
        result
        (neoteric-process-tail port result))))

  (define (neoteric-read-nocomment port)
    (neoteric-read-real port))

; -----------------------------------------------------------------------------
; Sweet Expressions
; -----------------------------------------------------------------------------

  ; NOTE split et al. should not begin in #, as # causes
  ; the top-level parser to guard against multiline comments.
  (define split (string->symbol "\\\\"))
  (define split-char #\\ ) ; First character of split symbol.
  (define non-whitespace-indent #\!) ; Non-whitespace-indent char.
  (define sublist (string->symbol "$"))
  (define sublist-char #\$) ; First character of sublist symbol.

  ; This is a special unique object that is used to
  ; represent the existence of the split symbol
  ; so that readblock-clean handles it properly:
  (define split-tag (cons 'split-tag! '()))

  ; This is a special unique object that is used to represent the
  ; existence of a comment such as #|...|#, #!...!#, and #;datum.
  ; The process-sharp for sweet-expressions is separately implemented,
  ; and returns comment-tag for these commenting expressions, so that the
  ; sweet-expression reader can properly handle newlines after them
  ; (e.g., after a newline the "!" indents become active).
  ; The problem is that "#" can introduce many constructs, not just comments,
  ; and we'd need two-character lookahead (which isn't portable) to know
  ; when that occurs.  So instead, we process #, and return comment-tag
  ; when it's a comment.
  ; We don't need to use this for ;-comments; we can handle them directly,
  ; since no lookahead is needed to disambiguate them.
  (define comment-tag (cons 'comment-tag! '())) ; all cons cells are unique

  (define (process-sharp-comment-tag no-indent-read port)
    ; this changes the convention of process-sharp
    ; to be either the object itself, or a special
    ; object called the comment-tag
    (let ((rv (process-sharp no-indent-read port)))
      (cond
        ((null? rv)
          comment-tag)
        ((pair? rv)
          (neoteric-process-tail port (car rv)))
        (#t
          (read-error "the impossible happened: process-sharp returned incorrect value")))))

  ; Call neoteric-read, but handle # specially, so that #|...|# at the
  ; top level will return comment-tag instead.
  (define (neoteric-read-real-comment-tag port)
    (let ((c (my-peek-char port)))
      (cond
        ((eof-object? c) c)
        ((eqv? c #\#)
          (my-read-char port)
          (process-sharp-comment-tag neoteric-read-real port))
        (#t (neoteric-read-real port)))))

  (define (readquote level port qt)
    (let ((char (my-peek-char port)))
      (if (char-whitespace? char)
          (list qt)
          (list qt (neoteric-read-nocomment port)))))

  ; NOTE: this procedure can return comment-tag.  Program defensively
  ; against this when calling it.
  (define (readitem level port)
    (let ((pos  (get-sourceinfo port))
          (char (my-peek-char port)))
      (cond
       ((eqv? char #\`)
        (my-read-char port)
        (attach-sourceinfo pos (readquote level port 'quasiquote)))
       ((eqv? char #\')
        (my-read-char port)
        (attach-sourceinfo pos (readquote level port 'quote)))
       ((eqv? char #\,)
        (my-read-char port)
        (cond
          ((eqv? (my-peek-char port) #\@)
            (my-read-char port)
            (attach-sourceinfo pos (readquote level port 'unquote-splicing)))
          (#t
            (attach-sourceinfo pos (readquote level port 'unquote)))))
       (#t
          (neoteric-read-real-comment-tag port)))))

  (define (indentation>? indentation1 indentation2)
    (let ((len1 (string-length indentation1))
            (len2 (string-length indentation2)))
      (and (> len1 len2)
             (string=? indentation2 (substring indentation1 0 len2)))))

  (define (accumulate-hspace port)
    (if (or (char-horiz-whitespace?     (my-peek-char port))
            (eqv? non-whitespace-indent (my-peek-char port)))
        (cons (read-char port) (accumulate-hspace port))
        '()))

  (define (indentationlevel port)
    (let* ((indent (accumulate-hspace port)) (c (my-peek-char port)))
      (cond
        ((eqv? c #\;)
          (consume-to-eol port) ; COMPLETELY ignore comment-only lines.
          (consume-end-of-line port)
          (indentationlevel port))
        ; If ONLY whitespace on line, treat as "", because there's no way
        ; to (visually) tell the difference (preventing hard-to-find errors):
        ((eof-object? c) "")
        ((char-line-ending? c) "")
        (#t (list->string indent)))))

  ;; Reads all subblocks of a block
  ;; this essentially implements the "body" production
  ;; - return value:
  ;;   cons
  ;;     next-level ;
  ;;     (xs ...) ; the body
  (define (readblocks level port)
    (let* ((pos        (get-sourceinfo port))
           (read       (readblock-clean level port))
           (next-level (car read))
           (block      (cdr read)))
      (cond
        ; check EOF
        ((eqv? next-level -1)
          (cons "" '()))
        ((string=? next-level level)
          (let* ((reads (readblocks level port))
                 (next-next-level (car reads))
                 (next-blocks (cdr reads)))
            (if (eq? block '.)
                (if (pair? next-blocks)
                    (cons next-next-level (car next-blocks))
                    (cons next-next-level next-blocks))
                (cons next-next-level
                      (attach-sourceinfo pos (cons block next-blocks))))))
        (#t
          (cons next-level (attach-sourceinfo pos (list block)))))))

  ;; Read one block of input
  ;; this essentially implements the "head" production
  ;; - return value:
  ;;   cons
  ;;     next-level ; the indentation of the line that ends this block
  ;;     expr ;       the read-in expression
  (define (readblock level port)
    (readblock-internal level port #t))
  (define (readblock-internal level port first-item?)
    (let* ((pos  (get-sourceinfo port))
           (char (my-peek-char port)))
      (cond
       ((eof-object? char)
          (cons -1 char))
       ((eqv? char #\;)
          (consume-to-eol port)
          (readblock level port))
       ((char-line-ending? char)
          (consume-end-of-line port)
          (let ((next-level (indentationlevel port)))
            (if (indentation>? next-level level)
                (readblocks next-level port)
                (cons next-level (attach-sourceinfo pos '())))))
       ((char-horiz-whitespace? char)
          (my-read-char port)
          (readblock-internal level port first-item?))
       (#t
          (let ((first (readitem level port)))
            (cond
              ((and first-item?
                    (or (equal? first '(quote))
                        (equal? first '(quasiquote))
                        (equal? first '(unquote))
                        (equal? first '(unquote-splicing))))
                (consume-horizontal-whitespace port)
                (let* ((sub-read (readblock-clean level port))
                       (outlevel (car sub-read))
                       (sub-expr (cdr sub-read)))
                  (cons outlevel (attach-sourceinfo pos `(,@first ,sub-expr)))))
              ; remove multiline comment immediately if not at
              ; start of line
              ((and (not first-item?) (eq? first comment-tag))
                (readblock-internal level port first-item?))
              ((or
                 ; treat multiline comment at start-of-line as SPLIT
                 (and first-item? (eq? first comment-tag))
                 (and (eq? char split-char) (eq? first split)))
                ; consume horizontal, non indent whitespace
                (consume-horizontal-whitespace port)
                (if first-item?
                    ;; NB: need a couple of hacks to fix
                    ;; behavior when SPLIT-by-itself
                    (if (char-line-ending? (my-peek-char port))
                        ; check SPLIT-by-itself
                        ; SPLIT-by-itself: some hacks needed
                        (let* ((sub-read (readblock level port))
                               (outlevel (car sub-read))
                               (sub-expr (cdr sub-read)))
                          ; check SPLIT followed by same indent line
                          (if (and (null? sub-expr) (string=? outlevel level))
                              ; blank SPLIT:
                              ; \
                              ; \
                              ; x
                              ; ===> x, not () () x
                              (readblock level port)
                              ; non-blank SPLIT: insert our
                              ; split-tag.  Without SPLIT-tag
                              ; readblock-clean will mishandle:
                              ; \
                              ;   x y
                              ; ==> ((x y)), which is a single
                              ; item list.  Single-item lists
                              ; are extracted, resulting in
                              ; (x y)
                              (cons outlevel (cons split-tag (attach-sourceinfo pos sub-expr)))))
                        ; not SPLIT-by-itself: just ignore it
                        (readblock-internal level port first-item?))
                    ; SPLIT-inline: end this block
                    (cons level (attach-sourceinfo pos '()))))
              ; sublist
              ((and (eq? char sublist-char) (eq? first sublist))
                (cond
                  (first-item?
                    ; Create list of rest of items.
                    ; Was: (read-error "SUBLIST found at start of line")
                    (let* ((read (readblock-clean level port))
                           (next-level (car read))
                           (block (cdr read)))
                      (cons next-level (cons split-tag (attach-sourceinfo pos (list block))))))
                  (#t
                    (consume-horizontal-whitespace port)
                    (let* ((read (readblock-clean level port))
                           (next-level (car read))
                           (block (cdr read)))
                      (cons next-level (attach-sourceinfo pos (list block)))))))
              (#t
                (let* ((rest (readblock-internal level port #f))
                       (level (car rest))
                       (block (cdr rest)))
                  ;; this check converts:
                  ;;  . foo
                  ;; ->
                  ;;  (. foo)
                  ;; ->
                  ;;  foo
                  ;; HOWEVER, it might not be compatible
                  ;; 100% with the "." as indentation
                  ;; whitespace thing.
                  (cond
                    ((eqv? level -1)
                      ; EOF encountered - end at first
                      (cons "" (list first)))
                    ((eq? first '.)
                      (if (pair? block)
                          (cons level (car block))
                          rest))
                    (#t
                      (cons level (attach-sourceinfo pos
                                       (cons first block)))))))))))))

  ;; Consumes as much horizontal, non-indent whitespace as
  ;; possible.  Treat comments as horizontal whitespace too.
  ;; Note that this does NOT consume any end-of-line characters.
  (define (consume-horizontal-whitespace port)
    (let ((char (my-peek-char port)))
      (cond
        ((char-horiz-whitespace? char)
           (my-read-char port)
           (consume-horizontal-whitespace port))
        ((eqv? char #\;)
           (consume-to-eol port)))))

  ;; Reads a block and handles (quote), (unquote),
  ;; (unquote-splicing) and (quasiquote).
  (define (readblock-clean level port)
    (let* ((read (readblock level port))
           (next-level (car read))
           (block (cdr read)))
      (cond
        ; remove split-tag
        ((and (pair? block) (eq? (car block) split-tag))
          (cons next-level (cdr block)))
        ; non-list and multi-item blocks.
        ((or (not (list? block)) (> (length block) 1))
          (cons next-level block))
        ; unwrap single-item blocks
        ((= (length block) 1)
          ; TODO: study if this is indeed necessary
          (if (eq? (car block) split-tag)
              ; "magically" remove split-tag
              (cons next-level '())
              (cons next-level (car block))))
        (#t
          (cons next-level '.)))))

  ; TODO: merge the latter part of readblock-clean and
  ; readblock-clean-rotated, so that changes need to
  ; be done in only one place.

  ;; like readblock-clean, but with an initial object
  ;; already given
  (define (readblock-clean-rotated level port pos obj)
    (let* ((read (readblock-internal level port #f))
           (next-level (car read))
           (sub-block (cdr read))
           (block (cons obj sub-block)))
      ; unlike readblock-clean, we know that block
      ; is indeed a list, and its first item is
      ; *not* split-tag.  The question is the length
      ; of that list.
      (cond
        ((null? sub-block)
          (cons next-level (attach-sourceinfo pos obj)))
        (#t (cons next-level (attach-sourceinfo pos block))))))

  ; Read single complete I-expression.
  ; TODO: merge handling of ;-comments and #|...|# comments
  (define (sugar-start-expr port)
    (let* ((c (my-peek-char port)))
      (if (eof-object? c)
        c
        (let* ((indentation (list->string (accumulate-hspace port)))
               (pos (get-sourceinfo port))
               (c   (my-peek-char port)))
          (cond
            ((eof-object? c) c) ; EOF - return it, we're done.
            ((eqv? c #\; )    ; comment - consume and see what's after it.
              (let ((d (consume-to-eol port)))
                (cond
                  ((eof-object? d) d) ; If EOF after comment, return it.
                  (#t
                    (my-read-char port) ; Newline after comment.  Consume NL
                    (sugar-start-expr port))))) ; and try again
            ; hashes are potential comments too
            ((eqv? c #\#)
              (let ((obj (neoteric-read-real-comment-tag port)))
                (if (eq? obj comment-tag)
                    ; heh, comment.  Consume spaces and start again.
                    ; (Consuming horizontal spaces makes comments behave
                    ; as SPLIT when an item is after a comment on the
                    ; same line)
                    (begin
                      (accumulate-hspace port)
                      (sugar-start-expr port))
                    ; aaaaargh not a comment.  Use rotated version
                    ; of readblock-clean.
                    (let* ((sub-read (readblock-clean-rotated "" port pos obj))
                           (block (cdr sub-read)))
                      (cond
                        ((eq? block '.)
                          (attach-sourceinfo pos '()))
                        (#t
                          (attach-sourceinfo pos block)))))))
            ((char-line-ending? c)
              (consume-end-of-line port)
              (sugar-start-expr port)) ; Consume and try again
            ((> (string-length indentation) 0) ; initial indentation disables
              ; indentation processing; ignore indented comments
              (let ((rv (neoteric-read-real-comment-tag port)))
                (if (eq? rv comment-tag)
                    ; indented comment.  restart.
                    (sugar-start-expr port)
                    rv)))
            (#t
              (let* ((read (readblock-clean "" port))
                     (level (car read))
                     (block (cdr read)))
                (cond
                 ((eq? block '.)
                    (attach-sourceinfo pos '()))
                 (#t
                    (attach-sourceinfo pos block))))))))))

; -----------------------------------------------------------------------------
; Comparison procedures
; -----------------------------------------------------------------------------

  (define compare-read-file '()) ; TODO

; -----------------------------------------------------------------------------
; Exported Interface
; -----------------------------------------------------------------------------

  (define curly-infix-read (make-read curly-infix-read-nocomment))
  (define neoteric-read (make-read neoteric-read-nocomment))
  (define sweet-read (make-read sugar-start-expr))


; vim: set expandtab shiftwidth=2 :
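
The curly-infix mapping performed by process-curly in the listing above can be tried on its own, without the rest of the reader. The following is a minimal sketch, not part of the reference implementation: it repeats the four list-processing procedures at top level (using else where the listing uses #t, and inlining transform-mixed-infix) so that it can be loaded into any R5RS-or-later Scheme. Its inputs are the lists that the reader would already have collected between { and }.

```scheme
; True if lyst is (op x op y ...) with the same op in every odd position.
(define (even-and-op-prefix? op lyst)
  (cond
    ((null? lyst) #t)
    ((not (pair? lyst)) #f)
    ((not (equal? op (car lyst))) #f)
    ((not (pair? (cdr lyst))) #f)
    (else (even-and-op-prefix? op (cddr lyst)))))

; True if lyst has the shape (a op b [op c ...]).
(define (simple-infix-list? lyst)
  (and (pair? lyst)
       (pair? (cdr lyst))
       (pair? (cddr lyst))
       (even-and-op-prefix? (cadr lyst) (cdr lyst))))

; Return the 1st, 3rd, 5th, ... elements of lyst.
(define (alternating-parameters lyst)
  (if (or (null? lyst) (null? (cdr lyst)))
      lyst
      (cons (car lyst) (alternating-parameters (cddr lyst)))))

; Map the contents of a {...} list to its final form.
(define (process-curly lyst)
  (cond
    ((not (pair? lyst)) lyst)                             ; {}    => ()
    ((null? (cdr lyst)) (car lyst))                       ; {a}   => a
    ((and (pair? (cdr lyst)) (null? (cddr lyst))) lyst)   ; {a b} => (a b)
    ((simple-infix-list? lyst)
      (cons (cadr lyst) (alternating-parameters lyst)))
    (else (cons '$nfx$ lyst))))

(process-curly '(a + b))       ; {a + b}     => (+ a b)
(process-curly '(a + b + c))   ; {a + b + c} => (+ a b c)
(process-curly '(- x))         ; {- x}       => (- x)
(process-curly '(a + b * c))   ; mixed ops   => ($nfx$ a + b * c)
```

The last case shows why the reader never guesses precedence: a list with mixed operators is handed unchanged to whatever $nfx$ is bound to.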

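The indentation comparison used by readblock-internal can be illustrated the same way. This sketch copies indentation>? from the listing to top level; the sample strings are illustrative inputs chosen here, not test cases from the project. A line counts as more deeply indented only when its indentation string is strictly longer than the enclosing one and begins with it, so a longer but different prefix is not accepted as a deeper level.

```scheme
; True if indentation1 is strictly longer than indentation2
; and starts with indentation2.
(define (indentation>? indentation1 indentation2)
  (let ((len1 (string-length indentation1))
        (len2 (string-length indentation2)))
    (and (> len1 len2)
         (string=? indentation2 (substring indentation1 0 len2)))))

(indentation>? "    " "  ")   ; => #t  four spaces extend two spaces
(indentation>? "  " "  ")     ; => #f  same level, not deeper
(indentation>? "  !" "  ")    ; => #t  "!" is accepted as an indent character
(indentation>? "!!!" "  ")    ; => #f  longer, but does not start with "  "
```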

The readable project website has more information: http://readable.sourceforge.net


??? TODO

We thank all the participants on the “readable-discuss” and “SRFI-105” mailing lists, including John Cowan, Shiro Kawai, Per Bothner, Mark H. Weaver, and many others whose names should be here but aren’t.


Copyright (C) 2012 David A. Wheeler and Alan Manuel K. Gloria. All Rights Reserved.

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use, copy,
modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.