[CEDET-devel] Re: [cedet-semantic] Newbie adventures
From: David P. <dav...@wa...> - 2003-12-03 10:41:48
Eric,

(I moved this thread to cedet-devel which seems more appropriate ;-)

[...]

>> 2. The LALR parser is entered; it calls wisent-lex each time it
>>    needs a lexical token.
>
> I would not be opposed to making this type of functionality available
> from the core lex support code. The fact that the lexical step
> analyzes the entire stream at once is a mechanism layered on the core
> analyzer, which creates one token at a time.

I think the current design is good for speed. Entering semantic-lex to
obtain each lexical element would noticeably slow down lexical
analysis. This is less critical for wisent-lex, whose code is very
simple and fast:

(define-wisent-lexer wisent-lex
  "Return the next available lexical token in Wisent's form.
The variable `wisent-lex-istream' contains the list of lexical tokens
produced by `semantic-lex'.  Pop the next token available and convert
it to a form suitable for the Wisent's parser."
  (let* ((tk (car wisent-lex-istream)))
    ;; Eat input stream
    (setq wisent-lex-istream (cdr wisent-lex-istream))
    (cons (semantic-lex-token-class tk)
          (cons (semantic-lex-token-text tk)
                (semantic-lex-token-bounds tk)))))

>> 3. Each time wisent-lex is called, it pops a semantic lexical token
>>    from the stream obtained in step 1 above. It translates it into a
>>    form understandable by wisent, and returns that form. More
>>    precisely:
>>
>>    semantic-lex form            wisent-lex form
>>    -------------------------    -------------------------------------
>>    (TOKEN-CLASS START . END) -> (TOKEN-CLASS TOKEN-VALUE START . END)
>
> [ ... ]
>
> Is TOKEN-VALUE different (as in a string or a number) for different
> values of TOKEN-CLASS? Should each analyzer be responsible for also
> providing a value?

TOKEN-VALUE is different for each token. The analyzer is just
responsible for providing the TOKEN-CLASS and the bounds of the
TOKEN-VALUE.

> It could be useful to also have the default output of semantic-lex
> match what you are using in wisent.
> All features of the token (class, start, end and value) already have
> accessor functions, so it should have no effect on token stream
> consumers.
>
> The reason I did not put a TOKEN-VALUE into the original lexical
> token (semantic v 0.1) was because some lexical entities, such as
> comments, strings, and lists would have very large values of very
> small worth (meaning they were seldom queried.)  Avoiding that made
> it faster (on a 486 50MHz.)  You may find that solving that type of
> problem (if it exists in wisent) could make wisent's lexical step a
> bit speedier.

The main difference between semantic-lex and a wisent lexer is that
the former is buffer oriented, whereas the latter is completely
independent of the lexical source. For example, wisent can be used to
parse a string (this is what wisent-expr does). This is why token
bounds are optional in a wisent token, whereas token values are
mandatory. Also, token values are pushed onto and retrieved from the
parser stack in order to be passed to semantic actions as $n values.
IMO, having token values as (start . end) elements would make wisent
dependent on a buffer as input stream, and would probably increase the
complexity of semantic actions (and slow down parsing?), which would
have to extract token values from buffer bounds.

For example, the LL parser itself handles the extraction of token
values (in `semantic-bovinate-stream'), before calling semantic
actions. IMO it is a better design to only have the lexical analyzer
depend on the nature of the input source.

Finally, the fact that the wisent lexer is only called when the parser
needs a new token guarantees that token values will be obtained only
once, when necessary.

In conclusion, I am not convinced that the current design of
semantic-lex should be changed. That would neither simplify nor likely
speed up LALR parsing. On the contrary, I would change the design of
the LL parser to be closer to wisent's.
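To make the "on demand" idea concrete, here is a minimal sketch (with
hypothetical names and a hypothetical end-of-input marker, not actual
CEDET or Wisent code) of a pull lexer that is independent of the input
source and hands out each token exactly once:

```elisp
;; -*- lexical-binding: t -*-
;; Sketch only: an "on demand" lexer built as a closure over a token
;; list.  The token list could come from `semantic-lex' on a buffer,
;; or from any other source, e.g. a string already cut into tokens.
(defun my-make-pull-lexer (tokens)
  "Return a zero-argument lexer closing over TOKENS.
Each call pops one token in Wisent's (CLASS VALUE START . END) form."
  (lambda ()
    (if tokens
        (pop tokens)
      '(end-of-input))))        ; hypothetical end-of-input marker

;; The parser calls the closure each time it needs a token:
(let ((lex (my-make-pull-lexer '((NUMBER "1" 1 . 2)
                                 (PUNCTUATION "+" 2 . 3)
                                 (NUMBER "2" 3 . 4)))))
  (funcall lex))  ; => (NUMBER "1" 1 . 2)
```

Since the closure owns the remaining token list, no caller ever sees a
token twice, which matches the "obtained only once" property of
wisent-lex.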
The goal would be to have a common layer for buffer oriented lexical
analysis (semantic-lex), then an "on demand" lexer layer (a la
wisent-lex) that would handle token conversion from a buffer
representation into any form suitable for the target parser. IMO,
that wouldn't have a big impact on performance, and it would make it
possible to use the bovinator to parse other kinds of input sources.

David