[sicsh-develop] Re: Additional evaluation layer
From: Gary V. V. <ga...@or...> - 2000-04-08 10:28:01
On Thu, Apr 06, 2000 at 12:54:02PM -0400, Ezra Peisach wrote:
>
> >I have been thinking (dangerous, I know) and it occurs to me that the
> >syntax mechanism could be extended to recognise tokens and mark them
> >in the Tokens* generated by tokenize() -- either in addition to, or
> >probably as a replacement for the string that is generated currently.
>
> >A third type of module would then be possible to implement higher
> >level semantic extensions. For example a loadable module could
> >install a syntax handler that recognises "if", "then", "else" and
> >"fi", and registers (enumerated) tokens `SYNTAX_IF', `SYNTAX_THEN',
> >`SYNTAX_ELSE' and `SYNTAX_FI'. The tokenize() function would be
> >changed to recognise these tokens while it is building the Tokens* and
> >insert the enumerated values rather than the raw strings.
>
> I like the idea. One of the concerns I had in writing the equals
> handler is understanding the context that the equals sign is to be
> used.

You have the right solution though: `=' in leading expressions is
handled syntactically. Other uses for equals would be determined by
the builtin that handles the rest of the Tokens*.

> Clearly "if a=1 then" is very different from "a=1" vs "typo a=1". The
> first a conditional, the second an assignment, the third a syntax error.

These all work with the current system:

  - if a=1 then ...
    `-> if a=1 then ...
        the `if' builtin would then extract the tokens between `if'
        and `then' and pass them to an expression evaluator.
  - a=1
    `-> syntax handler sets `a' and consumes the string
  - typo a=1
    `-> eval() will complain that `typo' doesn't exist, or else the
        `typo' builtin will diagnose the error.

> It would be nice if the recognition could happen during the
> syntax_handler phase - before generation of Tokens *. Perhaps a syntax
> handler can indicate that it is okay to create tokens for everything
> to where I am. This would allow differentiation of the above test cases.

I can't think of any situation where the additional complexity of
doing it this way would win anything. If the syntax handler (for `='
say) can't/doesn't want to handle an instance, it should just copy it
to the output buffer and punt to whichever builtin handler gets the
argv from Tokens*.

The clear advantage of having a new pass over the Tokens* is to
simplify reserved word recognition in the builtins. I now think that
the enumerated token should be in addition to the string token, which
gives the additional advantage of being able to tag classes of syntax
(e.g. you could register a SYNTAX_NUMBER tokeniser which would tag
sequences of digits). By registering these in the module that has
builtins that use them it would be much easier to write an `if'
builtin -- it would initially scan for balanced SYNTAX_THEN and
SYNTAX_FI rather than comparing every element with "then" and "fi".

> >Finally, the new type of module entry point (in addition to syntax and
> >builtin entry points) would determine allowable enumerated token
> >orderings. In this case, it would check that the above enumerated
> >tokens were well formed and correctly nested -- generating a SIC_ERROR
> >for semantic errors, SIC_INCOMPLETE if more lookahead is required, or
> >SIC_OKAY if the evaluator should search for a suitable builtin handler.
>
> I like the idea, but I am not sure how you plan to describe orderings.
> You need to be able to describe optional orderings, vs required ones.
>
> IF ... THEN ... ENDIF
> IF ....THEN ... ELSE ... ENDIF
>
> are straight forward with ELSE being optional - but would you ever
> need syntax that is more complex. (we can say no and be done with it)
>
> A complex example might be:
> a ... b ... c
> a ... c ... b ... d
>
> Where d or c comes after b depending on if b comes after c or a. (a
> bit contrived - but makes a description more difficult)

The `a' builtin would check for whatever combinations it wants to
allow. No problem.
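
To make the "enumerated token in addition to the string token" idea
concrete, here is a rough sketch in C. Only the SYNTAX_* names come
from this thread; the TaggedToken struct, tag_tokens() and
find_balanced_fi() are hypothetical names made up for illustration and
are not the real Tokens* layout or any existing sic function.

```c
#include <stddef.h>
#include <string.h>

/* SYNTAX_* names are from the thread; the enum itself is an assumption. */
typedef enum {
    SYNTAX_NONE = 0,
    SYNTAX_IF,
    SYNTAX_THEN,
    SYNTAX_ELSE,
    SYNTAX_FI
} SyntaxTag;

/* Hypothetical element type: the string is kept, the tag rides alongside. */
typedef struct {
    const char *str;
    SyntaxTag   tag;
} TaggedToken;

/* Tag every token once.  This is the only place strcmp() is called. */
static void tag_tokens(TaggedToken *toks, size_t n)
{
    static const struct { const char *word; SyntaxTag tag; } reserved[] = {
        { "if", SYNTAX_IF },     { "then", SYNTAX_THEN },
        { "else", SYNTAX_ELSE }, { "fi", SYNTAX_FI }
    };
    size_t i, k;

    for (i = 0; i < n; i++) {
        toks[i].tag = SYNTAX_NONE;
        for (k = 0; k < sizeof reserved / sizeof reserved[0]; k++) {
            if (strcmp(toks[i].str, reserved[k].word) == 0) {
                toks[i].tag = reserved[k].tag;
                break;
            }
        }
    }
}

/* Return the index of the SYNTAX_FI that balances the SYNTAX_IF at
   `start', or -1 if there is none.  Expects toks[start] to be an `if'. */
static long find_balanced_fi(const TaggedToken *toks, size_t n, size_t start)
{
    int depth = 0;
    size_t i;

    for (i = start; i < n; i++) {
        if (toks[i].tag == SYNTAX_IF)
            depth++;
        else if (toks[i].tag == SYNTAX_FI && --depth == 0)
            return (long) i;
    }
    return -1;
}
```

With something like this, an `if' builtin would compare small integers
while scanning, and a nested `if' ... `fi' pair inside the body is
skipped just by counting depth.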

> >The builtin initialiser would also need extending to allow builtins to
> >be registered against tokens rather than just single characters.
> Logically...
>
>
> I think it is a good idea that deserves merit. The big win can be for
> the syntax handlers knowing the context. It also allows for real
> structure.

Yup.

> Long term, we eventually need a way to disable builtin syntax handlers
> that have side effects.
>
> Consider:
> if a=1 then a=2 else b=3

But the `=' syntax handler only sets variables which occur at the left
of a command. Doesn't it? It will be the responsibility of the `if'
(or presumably SYNTAX_IF) handler to evaluate only the suitable
portions. The expression between `if' and `then' (i.e. the argv
elements between these two) will be passed to (a new) expression
evaluator which returns true or false, and will then recursively
eval() *either* the commands (i.e. argv elements) between `then' and
the balanced `else', or balanced `else' and `fi'. Balanced is the key
to doing this properly, hence the advantage of being able to scan
through the argv looking for unique tokens rather than zillions of
calls to strcmp (or a hideous big switch).

> currently in parsing the line a and b will be assigned. We need a way
> to say - recognize, but don't execute. The same with $(eval).

We already have it. Just don't rescan the string with tokenize, and
the syntax handlers aren't run again.

> Ezra

The bit I haven't thought right through yet, is how to build a Tokens*
with one pass over the input buffer. The current method will redo all
of the syntax handling for the entire command each time it is rejected
as incomplete, just adding a new line to the input buffer until it
finally gets through. By adding the new pass, we can recognise a big
class of incomplete commands before passing things into the builtin
handler (once the builtin handler has it and rejects it, we have no
choice but to add another line and try again).
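
As a rough illustration of what such a pass might look like, here is a
sketch that walks the enumerated tags and reports one of the three
statuses mentioned earlier in the thread. The SIC_* and SYNTAX_* names
are from this discussion; their values, the check_if_nesting() name,
the MAX_DEPTH limit and the bare array-of-tags interface are all
assumptions for the sake of the example.

```c
#include <stddef.h>

/* Names from the thread; the numeric values are assumptions. */
typedef enum { SIC_OKAY, SIC_ERROR, SIC_INCOMPLETE } SicStatus;
typedef enum {
    SYNTAX_NONE, SYNTAX_IF, SYNTAX_THEN, SYNTAX_ELSE, SYNTAX_FI
} SyntaxTag;

#define MAX_DEPTH 64   /* arbitrary nesting limit for this sketch */

/* Hypothetical token scanner: checks that if/then/else/fi are well
   formed and correctly nested, without calling any builtin. */
static SicStatus check_if_nesting(const SyntaxTag *tags, size_t n)
{
    /* One entry per currently open `if'. */
    enum { WANT_THEN, IN_THEN, IN_ELSE } state[MAX_DEPTH];
    size_t depth = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        switch (tags[i]) {
        case SYNTAX_IF:
            if (depth == MAX_DEPTH)
                return SIC_ERROR;           /* nested too deeply */
            state[depth++] = WANT_THEN;
            break;
        case SYNTAX_THEN:
            if (depth == 0 || state[depth - 1] != WANT_THEN)
                return SIC_ERROR;           /* `then' without `if' */
            state[depth - 1] = IN_THEN;
            break;
        case SYNTAX_ELSE:
            if (depth == 0 || state[depth - 1] != IN_THEN)
                return SIC_ERROR;           /* stray or repeated `else' */
            state[depth - 1] = IN_ELSE;
            break;
        case SYNTAX_FI:
            if (depth == 0 || state[depth - 1] == WANT_THEN)
                return SIC_ERROR;           /* `fi' before `then' */
            depth--;
            break;
        default:
            break;                          /* ordinary token */
        }
    }
    /* Anything still open means we need another line of input. */
    return depth == 0 ? SIC_OKAY : SIC_INCOMPLETE;
}
```

The point of returning SIC_INCOMPLETE here is that the reader could
ask for another line before any builtin has seen the argv at all.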

Instead of repeatedly adding another line and letting the if builtin
decide whether the command is complete yet, we could register a token
scanner that knows that a command must have an equal number of
SYNTAX_IF, SYNTAX_THEN and SYNTAX_FI tokens. By the time the if
builtin sees the argv, it is already balanced! I think we must solve
the former before the latter is worthwhile though.

Cheers,
	Gary.
-- 
mailto:ga...@or...
ga...@gn...
home page:      http://www.oranda.demon.co.uk
gpg public key: http://www.oranda.demon.co.uk/key.asc