On Fri, Jun 04, 2004 at 12:56:03AM +0000, Mikhael Goikhman wrote:
> > That would certainly be cool. Perhaps Ruby with its continuations would do
> > the job nicely, and it links very easily into C. And it's a hell of a lot
> > easier to write than LISP :-)
> Oh, please, no mandatory Ruby or Lisp, they are slow! :-)
> Not that I'm against these or Perl, I just worry about speed and memory.
Well, I wouldn't make it compulsory to use them. If you are processing a
file which can be handled adequately by the built-in deterministic state
machine, I'd use that. If you're doing something more difficult then I'd
want to be able to link either to a more advanced C parser, or to something
in a scripting language.
It's a common misconception that these languages are necessarily 'slow'.
Typically you have a high startup overhead when loading the interpreter,
which means a standard shoot-once CGI, for example, is certainly slow to
run. But once loaded, they can zip along nicely. With Ruby you can call
the interpreter from C (getting it to run a particular function in an
object, for example) and it will return happily, maintaining its internal
state for the next call, so you're not tearing down and building an
interpreter each time.
Emacs users do lots of clever things in LISP, and don't complain about lack
of speed. And that was long before you could get a 2GHz processor for $50.
Actually, I can see two separate jobs which an external callout could do:
(1) Syntax marking/tokenising [not just colouring]. As the content of the
edit buffer changes, bits of reparsing are done as necessary for the
purposes of display, but also in case the editor functions need to know what
sort of token we are currently in or next to. Each character would know
which token it belongs to, that token's type, and possibly some ancillary
data (e.g. nesting depth).
There is clearly optimisation needed here, so that each keystroke doesn't
cause the entire edit buffer to be reparsed - or even the whole visible
(2) A hook to intercept keystrokes and/or edit events. These keystrokes
could take actions based on the current context, i.e. the results of the
syntax marking phase are made available to it.
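
As a rough sketch of what the marking data in (1) might look like, and how
a hook in (2) could ask "what token am I in?": the Token struct, its field
names and token_at are all made up for illustration and are not anything in
joe. It assumes tokens are kept sorted by start offset and never overlap.

```ruby
# Hypothetical token record. `stop` is one past the token's last character;
# `data` carries ancillary information such as nesting depth.
Token = Struct.new(:type, :start, :stop, :data) do
  def cover?(offset)
    offset >= start && offset < stop
  end
end

# Return the token containing `offset`, or nil if the offset falls in a gap
# or past the last token. Because the tokens are sorted and non-overlapping,
# a binary search for the first token ending after `offset` is enough.
def token_at(tokens, offset)
  i = tokens.bsearch_index { |t| t.stop > offset }
  return nil if i.nil?
  tokens[i].cover?(offset) ? tokens[i] : nil
end

# Marking for the buffer "<a>hi</a>".
tokens = [
  Token.new(:start_tag, 0, 3, { depth: 0 }),
  Token.new(:text,      3, 5, { depth: 1 }),
  Token.new(:end_tag,   5, 9, { depth: 0 }),
]
under_cursor = token_at(tokens, 4)
puts under_cursor.type unless under_cursor.nil?
```

Storing one record per token rather than per character keeps memory down,
and the lookup stays cheap even for large buffers.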
So for example, let's say we want an intelligent XML editor. We use the
syntax marking to identify each start tag, end tag, and empty tag, and to
note the nesting depth (so given a start tag we can locate the
corresponding end tag, or vice versa).
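
To illustrate why recording the nesting depth is enough to pair tags up,
here is a small sketch. The Tag struct and the two helper names are
hypothetical, and it assumes a well-formed document in which a start tag and
its end tag are recorded at the same depth.

```ruby
# Hypothetical record: type is :start_tag, :end_tag, :empty_tag or :text.
Tag = Struct.new(:type, :name, :depth)

# Index of the end tag matching the start tag at index i: the first later
# end tag at the same depth. Returns nil if there is none.
def matching_end(tokens, i)
  depth = tokens[i].depth
  ((i + 1)...tokens.size).find do |j|
    tokens[j].type == :end_tag && tokens[j].depth == depth
  end
end

# Index of the start tag matching the end tag at index i: the nearest
# earlier start tag at the same depth. Returns nil if there is none.
def matching_start(tokens, i)
  depth = tokens[i].depth
  (i - 1).downto(0).find do |j|
    tokens[j].type == :start_tag && tokens[j].depth == depth
  end
end

# Marking for <a><b></b><c/></a>
doc = [
  Tag.new(:start_tag, "a", 0),
  Tag.new(:start_tag, "b", 1),
  Tag.new(:end_tag,   "b", 1),
  Tag.new(:empty_tag, "c", 1),
  Tag.new(:end_tag,   "a", 0),
]
puts matching_end(doc, 0).inspect
```

No stack is needed at lookup time, because the parser already did the
counting when it marked the buffer.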
The keystroke hook would then be used to intercept keypresses '<' and '>'
and implement rules such as:
- if you type '<', and you are not currently within a tag, then a new tag is
created using a pop-up line:
Tag: <tag attr="val">
which inserts <tag attr="val"></tag> into the buffer and leaves the cursor
between the two. Tab-completion will show you which tags and attributes are
valid according to the DTD for this document.
- if you type '<' and are within an end tag, then skip to the matching start tag
- if you type '<' and are within any other tag, then skip to the previous tag
- if you type '>' and are within a start tag, then skip to the matching end tag
- if you type '>' anywhere else then skip to the next tag
Now, actually this would have very little impact on performance. The
keystroke hook just has to say something like

    case key
    when '<' then puts "do some stuff"
    when '>' then puts "do some other stuff"
    end

and the overhead of calling this function and having it test a case
statement is actually very low.
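
Fleshing that out a little, the XML rules listed earlier could be sketched
as below. This is only an illustration: keystroke_hook, its arguments and
the action names it returns are invented here, and it assumes the editor
passes in the type of the token under the cursor as reported by the syntax
marking.

```ruby
# Hypothetical hook. `key` is the character typed; `context` is the type of
# the token under the cursor (:start_tag, :end_tag, :empty_tag, :text, ...).
# Returns a symbol naming the action the editor should carry out.
def keystroke_hook(key, context)
  case key
  when '<'
    case context
    when :end_tag               then :goto_matching_start
    when :start_tag, :empty_tag then :goto_previous_tag
    else                             :prompt_for_new_tag
    end
  when '>'
    context == :start_tag ? :goto_matching_end : :goto_next_tag
  else
    # Any other key is left to the editor's normal handling.
    :insert_char
  end
end

puts keystroke_hook('<', :text)
```

Each call is a couple of comparisons and a return, which is why the cost
per keystroke is negligible next to the re-parsing.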
The re-parsing is the clever part. I think that whenever you type and you
are within a token (or adjacent to it), you need to reparse from the start
of that token. If the parser state at the end of the token is the same as it
was before, then you can stop there. If not, you continue parsing until you
get to a point where the parser state matches what was there originally, or
until you hit the end of the screen.
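
A toy version of that stop-when-the-state-matches idea, under heavy
simplifying assumptions: the "parser" here has only two states (:tag inside
<...>, :text elsewhere), it stores the state after every character, and it
handles insertions only. All the names are invented. Because this toy state
depends only on the preceding characters, restarting at the edit point is
equivalent to restarting at the start of the token; a real lexer would back
up to the token start, and would also give up at the end of the screen.

```ruby
# One step of the toy state machine.
def step(state, ch)
  case ch
  when '<' then :tag
  when '>' then :text
  else state
  end
end

# Full parse: the state after each character of `text`.
def parse_all(text)
  state = :text
  text.each_char.map { |ch| state = step(state, ch) }
end

# Reparse after `inserted` characters were inserted at offset `pos`.
# `old` is the state array for the text before the edit. Returns the new
# state array and the number of characters that had to be re-examined.
def reparse(text, old, pos, inserted)
  # Splice in placeholders so the old states line up with the new text.
  states = old[0...pos] + Array.new(inserted) + old[pos..-1]
  state = pos.zero? ? :text : states[pos - 1]
  i = pos
  while i < text.length
    state = step(state, text[i])
    # Beyond the inserted text, agreeing with the stored state means the
    # rest of the buffer would be parsed exactly as before, so we can stop.
    converged = i >= pos + inserted && states[i] == state
    states[i] = state
    i += 1
    break if converged
  end
  [states, i - pos]
end

old_text = "<a>hello</a>"
new_text = "<a>hexllo</a>"   # "x" inserted at offset 5
states, examined = reparse(new_text, parse_all(old_text), 5, 1)
puts "re-examined #{examined} of #{new_text.length} characters"
```

Typing an ordinary character converges almost immediately; typing a '<' in
the middle of text forces the reparse to run on until the next tag, which is
the case where a screen-end cutoff would matter.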
If this optimisation is not done carefully then it certainly *could* be
slow. But even then, it would still be possible to write an XML parser in C,
and have the keystroke hook done in Ruby (say).
But maybe this is all just wishful thinking. One of the things I like so
much about joe is that it is very compact (372K for the entire source
tarball!) and does its job, editing text, with high speed and robustness.
All this extra icing perhaps belongs in a completely separate optional
package which can be added at build-time or linked dynamically at run-time.
Or perhaps you turn the whole thing inside-out and make the core editor
functions a library (libjoe); you can then call them *from* a scripting