From: Petra M. <pe...@cs...> - 2004-02-17 02:40:19
Hi Tim,

Timothy Miller wrote:
> Yes, Mark mentioned your unicode scanner when he rang the other day,
> so I checked it out of CVS and connected to the parser. It seemed to
> work ok, but the toolkit libraries are all in latex, so I couldn't use
> operators etc yet.

Ohhh, you are already done? You are very quick! :-)
Did you check it in somewhere in CVS so that we can continue
improving it together?

I also added a latex scanner; it is in
parser/src/net/sourceforge/czt/scanner/LatexLexer.java.
It first transforms latex into unicode and then scans the unicode.

>> - ZSECTION:
>
> ZSECTION is the latex tag "\begin{zsection}" and SECTION is just the
> word "section". Because unicode doesn't have a ZSECTION character, I
> think the best solution would be to remove the ZSECTION token from the
> parser and have the latex scanner ignore it. This won't cause any
> problems with an lalr parser.

The standard says that \begin{zsection} should be converted to ZED.
Is your parser happy with getting a ZED token there?

>> - GENSCH:
>
> Yes, I noticed this too. The token should be GENSCH, but the problem is
> that a schema definition in latex is:
> \begin{schema}{Name}
>
> and a generic schema definition is
> \begin{schema}{Name}[Params]
>
> So we need a two-token lookahead to tell whether it has parameters. I
> was avoiding this in the parser by combining them:
> SCH boxName:bn optFormalParameters:ofp schemaText:st END
>
> So the parameters are optional. The easiest solution for now is to add
> a GENSCH rule as well, but a longer term solution would be to
> implement the lookahead.

I see. My latex to unicode converter has the same problem (and I
haven't solved it yet). Let's follow your suggestions. Could you add
the GENSCH rule?

Next I am going to change the unicode scanner to return the different
stroke characters ... let's see how we get on with this :-)

Regards,
Petra
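
For reference, the lookahead discussed above could be sketched roughly as follows. This is only an illustration, not code from the CZT repository: the class name SchemaLookahead, the enum BoxToken and the method classify are hypothetical, and the sketch works on the raw latex string with a regular expression rather than on parser tokens. It assumes the schema name contains no nested braces and that no comments appear between the name and the parameter list.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CZT: decides whether a latex schema box
// should be reported as SCH or GENSCH by peeking past the name argument.
final class SchemaLookahead {

    // Token names mirror SCH and GENSCH from the discussion; OTHER is an
    // assumption for input that does not start a schema box.
    enum BoxToken { SCH, GENSCH, OTHER }

    // Matches \begin{schema}{Name} plus any trailing whitespace.
    // Assumption: the name itself contains no '}' characters.
    private static final Pattern SCHEMA_HEAD =
        Pattern.compile("\\\\begin\\{schema\\}\\s*\\{([^}]*)\\}\\s*");

    static BoxToken classify(String latex) {
        Matcher m = SCHEMA_HEAD.matcher(latex);
        if (!m.lookingAt()) {
            return BoxToken.OTHER;
        }
        // Look ahead: the first character after the name decides the token.
        int next = m.end();
        boolean hasParams = next < latex.length() && latex.charAt(next) == '[';
        return hasParams ? BoxToken.GENSCH : BoxToken.SCH;
    }

    public static void main(String[] args) {
        System.out.println(classify("\\begin{schema}{Name}"));        // SCH
        System.out.println(classify("\\begin{schema}{Name}[X, Y]"));  // GENSCH
    }
}
```

Doing the check in the scanner would keep the grammar free of the optional-parameter rule, at the cost of the scanner reading past the name before it can emit the box token.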