Thank you Zach and Robert!

On Wed, Apr 25, 2012 at 8:00 AM, Robert Plummer <robertleeplummerjr@gmail.com> wrote:
> Hey Guys,
> I've been working closely with Zach Carter, the creator of Jison, to come up
> with the syntax that is needed to process wiki syntax....
> First the problem....
> Parsers can be rigid, BUT wiki syntax is not, case in point is the
> following:
>
> {CODE()}
> __bold
> {CODE}
>
> Oh no! I didn't close the bold text!!!!  Now the parser dies!!!
>
> Another example:
> {CODE()}
> {PLUGIN()}
> {CODE}
>
> Oh no! I didn't close the plugin!!!!  Now the parser dies!!!
>
> This has been an issue for some time, and I didn't know how to solve it.
> Parsers always look for rigid rules, and I've been coding in my spare time
> to find the answer.... for a year.  I almost gave up 2 days ago and I was
> considering telling everyone that the new Jison parser (despite months of
> hard work) was not the way to go.  Well, Zach Carter is a really cool guy,
> and he knows his parsers.  I consider myself a pretty well seasoned
> developer, I can debug, understand, and fix many things... But next to Zach
> Carter, I am but a young Padawan to a Jedi Master (of code).  Anyway, Zach
> told me straight up "The parser isn't magic, it can't do everything", but
> then he showed me how to properly "fake" a closure using the lexer's unput
> method (which sort of sticks text back into the input in real time in the
> lexer, before the parser takes over).
>
> The lexer code analyses the code you have and basically tracks state, or
> names the entities within the code, for example:
>
> {CODE()}
> <bold>[_][_]
>     %{
>         lexer.popState(); //js
>         return 'BOLD_END'; //js
>
>         //php $this->popState();
>         //php return 'BOLD_END';
>     %}
> [_][_]
>     %{
>         lexer.begin('bold'); //js
>         return 'BOLD_START'; //js
>
>         //php $this->begin('bold');
>         //php return 'BOLD_START';
>     %}
> {CODE}
>
> Here we look for the chars "__" with a regex, tell the lexer to enter the
> state "bold", and return the token "BOLD_START".  When we see another
> occurrence of "__" while in that state, we tell the lexer that it is a
> closure, even though it is the same characters, and return "BOLD_END".  The
> problem here is that sometimes wiki syntax isn't closed, and the lexer
> reaches the end of the file (or EOF) before the closure, which ends very
> badly for the parser and the computer that it is running on.  By the way,
> when it is successful, it sends these tokens to the parser (the grammar can
> be seen further down in the Wiki.jison file in
> lib/code/JisonPraser/Wiki.jison) and that looks like this:
>
> {CODE()}
>  | BOLD_START BOLD_END
>  | BOLD_START contents BOLD_END
>     {
>         $$ = parser.bold($2); //js
>         //php $$ = $this->bold($2);
>     }
> {CODE}
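
For anyone who wants to play with the idea without Jison installed, here is
a minimal sketch of the state tracking described above. It is a hand-rolled
toy, not Jison's real lexer: the function name tokenize and the CONTENT and
EOF token names are made up for illustration, and only the "__" handling
mirrors the rules quoted above.

```javascript
// Toy lexer (an illustration, not Jison): "__" toggles a 'bold' state.
function tokenize(input) {
  const states = ['INITIAL']; // state stack, roughly like begin()/popState()
  const tokens = [];
  let pos = 0;

  while (pos < input.length) {
    if (input.startsWith('__', pos)) {
      if (states[states.length - 1] === 'bold') {
        states.pop();            // plays the role of lexer.popState()
        tokens.push('BOLD_END');
      } else {
        states.push('bold');     // plays the role of lexer.begin('bold')
        tokens.push('BOLD_START');
      }
      pos += 2;
    } else {
      // Consume plain text up to the next "__" or the end of the input.
      let next = input.indexOf('__', pos);
      if (next === -1) next = input.length;
      tokens.push('CONTENT');    // hypothetical token name
      pos = next;
    }
  }

  tokens.push('EOF');            // hypothetical end-of-input token
  return tokens;
}

console.log(tokenize('__text__').join(' '));
console.log(tokenize('__text').join(' '));
```

With closed input the token stream matches the "BOLD_START contents
BOLD_END" pattern. With "__text" there is no BOLD_END before the end of
input, which is exactly the case the grammar cannot match.
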
>
> Here we tell the parser about two patterns: a start directly followed by an
> end, in which case it just keeps on parsing, or start + content + end, in
> which case it processes the content and returns it.  So how do we "fix" the
> human counterpart of this wiki syntax (which is where the "bug" is)?
>
> :)
>
> {CODE()}
> <bold><<EOF>>
>     %{
>         lexer.unput('__'); //js
>
>         //php $this->unput('__');
>     %}
> <bold>[_][_]
>     %{
>         lexer.popState(); //js
>         return 'BOLD_END'; //js
>
>         //php $this->popState();
>         //php return 'BOLD_END';
>     %}
> [_][_]
>     %{
>         lexer.begin('bold'); //js
>         return 'BOLD_START'; //js
>
>         //php $this->begin('bold');
>         //php return 'BOLD_START';
>     %}
> {CODE}
>
> As you can see "unput" is where the magic is.  When the lexer hits EOF
> while still in the bold state, we stash the bold closure characters back
> onto the input in real time, and they are then lexed by the rule just
> after it.  In our case, if you input "__text", it is seen as "__text__" by
> the parser, and the parser has no idea that we "fixed" our human buggy
> nature in real time.  What does this mean for the new wiki parser?  A few
> things.  First, we can finally stop tinkering and start really building
> it.  Second, not only is wiki syntax fun, but now when you understand what
> is going on in the parser (which seems to effortlessly just work) it brings
> a smile to your face.  Third, this is groundbreaking stuff (written on a
> parser that is rooted in the 1970s) and it will help others to see how they
> can advance interesting syntax.
>
> I know this is probably a little out in left field, but I had to take a
> break from the parser; I was on the road a lot this week and couldn't focus
> on the thing.  But in short, the new Jison parser's biggest bug is now
> smashed!  I will be committing this code shortly; I have to fix a couple of
> things and it'll be ready.  Another development with the parser is that I'm
> going to be adding preParser and postParser functions in trunk.  The reason
> is that I've removed ~np~ parsing completely from the parser (it was just
> too different from wiki syntax), and that will be handled outside the parse
> function.  Just as the old parser's functions did, ~np~ blocks will be
> stripped out of the wiki syntax while the parser is running and be put back
> in when it finishes.
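
A rough sketch of how the strip-and-restore step might look, since the
actual code is not committed yet. Only the names preParser and postParser
come from the message above; their signatures, the ~np~ ... ~/np~ matching,
the NUL-delimited placeholder format, and the fakeParse stand-in are all
assumptions made up for this illustration.

```javascript
// Hypothetical sketch: pull ~np~ blocks out before parsing, restore after.
function preParser(text) {
  const stash = [];
  // Replace each ~np~...~/np~ block with a placeholder the parser ignores.
  const stripped = text.replace(/~np~([\s\S]*?)~\/np~/g, (match, inner) => {
    stash.push(inner);
    return '\u0000np' + (stash.length - 1) + '\u0000';
  });
  return { stripped, stash };
}

function postParser(parsed, stash) {
  // Put the stashed, unparsed text back where the placeholders are.
  return parsed.replace(/\u0000np(\d+)\u0000/g, (match, i) => stash[Number(i)]);
}

// Stand-in for the real parser: only turns __x__ into <strong>x</strong>.
function fakeParse(text) {
  return text.replace(/__(.+?)__/g, '<strong>$1</strong>');
}

const { stripped, stash } = preParser('a __b__ ~np~__c__~/np~');
console.log(postParser(fakeParse(stripped), stash));
```

The bold outside the ~np~ block is parsed, while the text inside it comes
back untouched because the parser only ever saw the placeholder.
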
>
> To Zach Carter, you say the parser isn't magic?  I just proved it is. :)
> --
> Robert Plummer
>
> _______________________________________________
> TikiWiki-devel mailing list
> TikiWiki-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/tikiwiki-devel
>

--
Marc Laporte

http://MarcLaporte.com
http://Tiki.org/MarcLaporte
http://AvanTech.net