Hey,
I've been looking through the source, and have been analyzing some regexes I saw. The pat_main regex in the lexical analyzer is pretty inefficient, so I decided to let you know. :)
The relevant part currently reads as follows (it takes nearly 170 steps before it succeeds):
. "(?: [^\\n\\r{$b}{$e}] | \\\" [^\\\"\\n\\r]* \\\" | \\' [^\\'\\n\\r]* \\' )*"
This should become (which succeeds in 3 to 20 steps):
. "(?: [^\\n\\r\"'{$b}{$e}]+ | \" [^\"\\n\\r]* \" | ' [^'\\n\\r]* \' )*"
Also, if you used single quotes, it would be much more legible, because you wouldn't need nearly as many backslashes. ;)
. '(?: [^\n\r"\'' . $b . $e . ']+ | " [^"\n\r]* " | \' [^\'\n\r]* \' )*'
But that's a matter of personal preference.
If you'd like to let me know whether you can do something with this info, you can contact me at ri.van.velzen@st.hanze.nl
Richard van Velzen
Richard, thank you for your comment. I haven't used the new tracker system much, and I didn't even realize your post existed until today, nearly a year later.
When I was first designing the main lexing regex, I was primarily concerned about correctness, so it wouldn't entirely surprise me if it's presently over-specified.
(It's still far more efficient, though, than most BBCode-lexing algorithms you'll see in PHP, since it pushes all of the hardest lexing work into C code. In my tests, even that apparently-inefficient regex was outperforming other BBCode-parsing solutions by a mile, which is actually somewhat depressing.)
Anyway, if you have a more efficient version, I'd be happy to take a look at it, and if it's reasonably sane and maintainable, I would have no qualms about updating the code to use it.
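For anyone wanting to compare the two fragments before swapping them, here is a minimal sketch. It makes several assumptions: the tag delimiters $b and $e are taken to be an escaped "[" and "]", matchPrefix() is a hypothetical helper that is not part of the library, and each fragment is run on its own (anchored at the start, with the x modifier so the whitespace in the pattern is ignored) rather than inside the full pat_main expression.

```php
<?php
// Assumed delimiter values, pre-escaped for use inside a character class.
$b = '\\[';
$e = '\\]';

// Fragment as it currently stands (double-quoted PHP string).
$old = "(?: [^\\n\\r{$b}{$e}] | \\\" [^\\\"\\n\\r]* \\\" | \\' [^\\'\\n\\r]* \\' )*";

// Proposed fragment (single-quoted PHP string).
$new = '(?: [^\n\r"\'' . $b . $e . ']+ | " [^"\n\r]* " | \' [^\'\n\r]* \' )*';

// Hypothetical helper: returns the prefix of $subject that the fragment consumes.
// A = anchor at the start of the subject, x = ignore whitespace in the pattern.
function matchPrefix(string $fragment, string $subject): string
{
    preg_match('/' . $fragment . '/Ax', $subject, $m);
    return $m[0];
}

$samples = [
    'url=http://example.com]rest',   // no quotes
    'quote name="a]b"]rest',         // closing delimiter inside a quoted value
];

foreach ($samples as $s) {
    echo 'old: ', var_export(matchPrefix($old, $s), true), "\n";
    echo 'new: ', var_export(matchPrefix($new, $s), true), "\n";
}
```

On the first sample both fragments consume the same text. On the second they differ: run standalone, the current fragment stops at the "]" inside the quotes, because its first alternative also accepts quote characters, while the proposed fragment consumes the whole quoted value. So the change appears to be more than a pure speed-up and is worth testing against real input. It may also be worth checking how the proposed form behaves when the overall match fails, since a + nested inside a * group can backtrack heavily; a possessive quantifier (++) is one option to try.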