C#: Building an XPath-based parser using SMC

Translates state machine into a target programming language.

Brought to you by: cwrapp

Forum: Help

Creator: Bob Watson

Created: 2009-10-17

Updated: 2013-05-28

Bob Watson - 2009-10-17

Hi,

I'm a complete newbie with SMC and was wondering if the following is possible.

Is it possible to build an SMC state machine that will parse a custom XPath? I'm reading an incoming XML packet which has been deserialised into C# classes generated from the XSD schema. I plan to use an XPath-like query language, so that I can read the XML and append/prepend onto a new composite XML message. Both the incoming XML packet and the composite XML share the same schema, the IDMEF schema.

So say the user says "append /Alert/Source/Node/Address to the new XML": I can parse down that XPath and just read each specific element, attribute etc. out of the incoming XML.

So what I did was create a table-driven FSM in Excel to prototype it, which has all the states, transitions etc. for reading Source, Node etc. It comes to 50+ states, which is too big for the State pattern, and certainly too big for a switch statement and 50+ methods.

So can this sort of thing be done in SMC? I know the state and I know the transition, i.e. going from Source->ReadSource->Node etc. How would that work in SMC?

Bob.

Charles Rapp - 2009-10-17

Certainly SMC can be used to parse. SMC itself uses SMC to lex (break the input into tokens) and parse .sm files. Further, I have written my own XML parser using SMC. If you want to see how SMC works as a lexer and parser, download the SMC source distribution and go to the net/sf/smc/parser directory. Look at SmcLexer.sm and SmcParser.sm. The XML parser code is not currently available.

The parser code architecture is: the parser retrieves the next token from the lexer and then passes it to the parser FSM using the appropriate transition for that token. The lexer reads in the target text one character at a time and passes each character to its own FSM using the appropriate transition for that character. When the lexer FSM detects a known token (keyword, syntactical feature, string, number, etc.), that token is returned to the parser. This continues until the entire input is parsed or an unrecoverable error is encountered.
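
A minimal C# sketch of that lexer/parser split, applied to a path such as /Alert/Source/Node/Address. It is hand-written with plain switch-based transitions rather than generated by SMC, so it only illustrates the division of labour, not SMC's generated code or API. All type and member names (TokenKind, Token, PathLexer, PathParser) are hypothetical, and the grammar (slash-separated names with an optional trailing @attribute) is an assumption about the custom query language.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public enum TokenKind { Slash, At, Name, End }

public readonly struct Token
{
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
    public TokenKind Kind { get; }
    public string Text { get; }
}

// Lexer: a two-state FSM (Idle, InName) fed one character at a time.
public sealed class PathLexer
{
    private enum State { Idle, InName }

    private readonly string _input;
    private int _pos;

    public PathLexer(string input) { _input = input; }

    // Returns the next token; returns End once the input is exhausted.
    public Token Next()
    {
        var state = State.Idle;
        var name = new StringBuilder();
        while (_pos < _input.Length)
        {
            char c = _input[_pos];
            if (state == State.Idle)
            {
                _pos++;
                if (c == '/') return new Token(TokenKind.Slash, "/");
                if (c == '@') return new Token(TokenKind.At, "@");
                if (!char.IsLetter(c))
                    throw new FormatException($"Unexpected '{c}' at position {_pos - 1}");
                name.Append(c);
                state = State.InName;
            }
            else
            {
                if (!char.IsLetterOrDigit(c)) break; // leave c for the next call
                name.Append(c);
                _pos++;
            }
        }
        return state == State.InName
            ? new Token(TokenKind.Name, name.ToString())
            : new Token(TokenKind.End, "");
    }
}

// Parser: pulls tokens from the lexer and takes one transition per token.
public static class PathParser
{
    private enum State { ExpectSlash, ExpectStep, ExpectAttrName, ExpectEnd, Done }

    public static List<string> Parse(string path)
    {
        var lexer = new PathLexer(path);
        var steps = new List<string>();
        var state = State.ExpectSlash;
        while (state != State.Done)
        {
            Token t = lexer.Next();
            switch (state)
            {
                case State.ExpectSlash when t.Kind == TokenKind.Slash:
                    state = State.ExpectStep; break;
                case State.ExpectSlash when t.Kind == TokenKind.End && steps.Count > 0:
                    state = State.Done; break;
                case State.ExpectStep when t.Kind == TokenKind.Name:
                    steps.Add(t.Text); state = State.ExpectSlash; break;
                case State.ExpectStep when t.Kind == TokenKind.At:
                    state = State.ExpectAttrName; break;
                case State.ExpectAttrName when t.Kind == TokenKind.Name:
                    steps.Add("@" + t.Text); state = State.ExpectEnd; break;
                case State.ExpectEnd when t.Kind == TokenKind.End:
                    state = State.Done; break;
                default:
                    throw new FormatException($"Unexpected token {t.Kind} in state {state}");
            }
        }
        return steps;
    }
}
```

Calling PathParser.Parse("/Alert/Source/Node/Address") yields the four step names in order. Note that the parser has five states regardless of how many element names the schema defines; the names travel as token text rather than as states.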

I hope this helps. BTW, 50+ states? Sounds rather large to me. This might result if you merge lexing and parsing functions into a single entity. Your FSM might simplify if you separate the two functions.
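
One way the state count might shrink, sketched in C# under some assumptions: once the path has been split into step names, the element names can live in a lookup table as data instead of each getting its own state. The SchemaWalker class and its method are hypothetical, and the parent/child table below is only an illustrative subset, not the full IDMEF schema.

```csharp
using System;
using System.Collections.Generic;

public static class SchemaWalker
{
    // Assumed root element for every path.
    private const string Root = "Alert";

    // Illustrative subset: parent element -> child elements allowed beneath it.
    private static readonly Dictionary<string, HashSet<string>> Children =
        new Dictionary<string, HashSet<string>>
        {
            ["Alert"]   = new HashSet<string> { "Source", "Target" },
            ["Source"]  = new HashSet<string> { "Node" },
            ["Target"]  = new HashSet<string> { "Node" },
            ["Node"]    = new HashSet<string> { "Address" },
            ["Address"] = new HashSet<string>(),
        };

    // Returns the index of the first step that the table does not allow,
    // or -1 if every step is a permitted child of the step before it.
    public static int FirstInvalidStep(IReadOnlyList<string> steps)
    {
        if (steps.Count == 0 || steps[0] != Root) return 0;
        for (int i = 1; i < steps.Count; i++)
        {
            if (!Children.TryGetValue(steps[i - 1], out var allowed) ||
                !allowed.Contains(steps[i]))
            {
                return i;
            }
        }
        return -1;
    }
}
```

With this shape, adding another element means adding a table row rather than another state and its transitions.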
