
#208 Wrong regex pattern for get_language_fullname()

Status: open-later
Owner: BenBE
Priority: 5
Updated: 2012-12-25
Created: 2012-04-06
Creator: murkymark
Private: No

Version: GeSHi 1.0.8.10, PHP 5.3.10
In function "get_language_fullname()":

The regex pattern that matches LANG_NAME is incorrect; the problem seems to be related to the backslashes.
(To avoid confusion, I enclose strings below in <>.)

The current pattern <'/\'LANG_NAME\'\s*=>\s*\'((?:[^\']|\\\')+)\'/'> wrongly captures a long string that starts with the correct character but runs on to the very last <'> in the language file.
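A minimal reproduction, assuming a short made-up language-file excerpt in $data (not a real GeSHi language file). Echoing the pattern shows the string PCRE actually receives once PHP has processed the single-quoted literal, and the capture shows the overrun:

```php
<?php
// Hypothetical language-file excerpt (nowdoc: PHP does no escape processing here).
$data = <<<'EOT'
$language_data = array (
    'LANG_NAME' => 'YAML',
    'COMMENT_SINGLE' => array(1 => '#'),
    );
EOT;

// The current pattern from get_language_fullname(), as a single-quoted PHP literal.
$pattern = '/\'LANG_NAME\'\s*=>\s*\'((?:[^\']|\\\')+)\'/';

// PHP has turned \\\' into \' , which PCRE reads as a plain quote.
echo $pattern, "\n"; // /'LANG_NAME'\s*=>\s*'((?:[^']|\')+)'/

if (preg_match($pattern, $data, $m)) {
    // The group accepts every character, newlines included, so the greedy
    // match only backtracks as far as the last quote in $data (after '#').
    var_dump($m[1]);
}
```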

Using <'/\'LANG_NAME\'\s*=>\s*\'(.+)\\\'/'> instead seems to work perfectly: it captures <\'> and <\\> correctly (returned as <'> and <\> by the function).
Example: <'YAML\\\'xasd\''> in the language file is stored as <YAML\'xasd'> in the array returned by the function when my suggested pattern is used.
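The same kind of check for the suggested pattern, again on hypothetical language-file lines. Because "." does not match a newline, the greedy (.+) stops at the last quote on the LANG_NAME line, and stripcslashes() then produces the value the function returns:

```php
<?php
// Hypothetical language-file lines, exactly as they would appear on disk.
$data = <<<'EOT'
    'LANG_NAME' => 'YAML\\\'xasd\'',
    'COMMENT_SINGLE' => array(1 => '#'),
EOT;

// Suggested pattern; PHP turns it into /'LANG_NAME'\s*=>\s*'(.+)\'/
$pattern = '/\'LANG_NAME\'\s*=>\s*\'(.+)\\\'/';

if (preg_match($pattern, $data, $m)) {
    // "." does not match "\n", so (.+) backtracks to the last quote on this line only.
    $raw  = $m[1];               // YAML\\\'xasd\'
    $name = stripcslashes($raw); // \\ becomes \ and \' becomes '
    echo $name, "\n";            // YAML\'xasd'
}
```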

Discussion

  • murkymark

    murkymark - 2012-04-06

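    To match a single backslash character in $data, you need four consecutive backslashes in the PHP string literal passed to preg_match; otherwise there can be side effects depending on the pattern characters that follow:
    preg_match('/\\\\/', $data, $matches); // matches a single '\'
    So "([^\']|\\\')" actually matches "not apostrophe or apostrophe", i.e. any character. It is even broader than "(.)", because [^'] also matches newlines, which is why the match can run on to the last apostrophe in the file.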
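A short sketch of the two parsing layers: PHP's single-quoted string rules run first (only \\ and \' are escapes there), then PCRE parses the resulting string. The subject strings are made up for illustration:

```php
<?php
// Each pattern literal passes through two parsers: PHP's single-quoted string
// rules first (only \\ and \' are escapes there), then PCRE.
$oneBackslash = '/\\\\/';               // PHP value /\\/  -> PCRE: one literal backslash
$buggyAlt     = '/^(?:[^\']|\\\')+$/';  // PHP value /^(?:[^']|\')+$/ -> "not ' or '"
$escapedQuote = '/\\\\\'/';             // PHP value /\\'/ -> backslash followed by '

echo $oneBackslash, "\n", $buggyAlt, "\n", $escapedQuote, "\n";

// Made-up subject containing an apostrophe and a newline.
$sample = "it's\nanything";

var_dump(preg_match($oneBackslash, 'a\\b'));    // int(1): 'a\\b' is the 3-character string a\b
var_dump(preg_match($buggyAlt, $sample));       // int(1): every character is accepted, even "\n"
var_dump(preg_match($escapedQuote, "it's"));    // int(0): no backslash before the quote
var_dump(preg_match($escapedQuote, 'it\\\'s')); // int(1): the subject is it\'s
```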

    For GeSHi, the following also returns correct results:
    preg_match("/'LANG_NAME'\s*=>\s*'(.+|(?:\\')+)'/", $data, $matches);
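A sketch checking this double-quoted variant on made-up language-file lines. In a double-quoted PHP string, \s is not an escape sequence and is left as is, while \\' becomes \'. Since the first alternative .+ already covers anything (?:\')+ could match, the pattern ends up capturing the same text as the single-quoted (.+)\' pattern:

```php
<?php
// Hypothetical language-file lines (nowdoc, so PHP leaves the backslashes alone).
$data = <<<'EOT'
    'LANG_NAME' => 'YAML\\\'xasd\'',
    'COMMENT_SINGLE' => array(1 => '#'),
EOT;

// Double-quoted literal from the comment: \s is kept, \\' becomes \'
$pattern = "/'LANG_NAME'\s*=>\s*'(.+|(?:\\')+)'/";
echo $pattern, "\n"; // /'LANG_NAME'\s*=>\s*'(.+|(?:\')+)'/

// Single-quoted (.+)\' pattern from the original report, for comparison.
$single = '/\'LANG_NAME\'\s*=>\s*\'(.+)\\\'/';

preg_match($pattern, $data, $m);
preg_match($single, $data, $m2);

var_dump($m[1] === $m2[1]);      // bool(true): both stop at the last quote on the line
echo stripcslashes($m[1]), "\n"; // YAML\'xasd'
```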

     
  • BenBE

    BenBE - 2012-08-15
    • assigned_to: nobody --> benbe
    • milestone: --> Next_Release_(Stable)
    • labels: --> General Bugs
    • status: open --> open-later
     
  • BenBE

    BenBE - 2012-08-15

    Please note that escape characters are only removed at the point the value is returned, so the pattern itself does not need to undo the escaping. With your pattern, it's more or less undefined what the language name is if there's another single quote on the same line. But your example did show me what might be missing in my pattern. I will investigate after the upcoming release.
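The limitation can be reproduced with a hypothetical line that holds a second quoted entry after LANG_NAME. For comparison, the sketch also includes an escape-aware pattern, which is an illustrative assumption and not GeSHi's code: it accepts only characters other than ' and \, or a backslash followed by any single character, so it stops at the first unescaped quote:

```php
<?php
// Hypothetical line with a second quoted entry after LANG_NAME.
$line = <<<'EOT'
'LANG_NAME' => 'YAML\\\'xasd\'', 'LANG_ID' => 'yaml',
EOT;

// Greedy pattern suggested in the report: runs to the LAST quote on the line.
$greedy = '/\'LANG_NAME\'\s*=>\s*\'(.+)\\\'/';

// Hypothetical escape-aware alternative (not GeSHi's code). PHP value:
// /'LANG_NAME'\s*=>\s*'((?:[^'\\]|\\.)*)'/
// i.e. characters that are neither ' nor \, or a backslash plus any one character.
$escaped = '/\'LANG_NAME\'\s*=>\s*\'((?:[^\'\\\\]|\\\\.)*)\'/';

preg_match($greedy, $line, $g);
preg_match($escaped, $line, $e);

echo $g[1], "\n";                // YAML\\\'xasd\'', 'LANG_ID' => 'yaml
echo stripcslashes($e[1]), "\n"; // YAML\'xasd'
```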

    Also note the stripcslashes function (http://php.net/stripcslashes), which is used to clean up the results.
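A small sketch of that cleanup step, using the text between the outer quotes of the YAML example in this report as input. Note that stripcslashes() also decodes C-style escapes such as \t, \n or \x41, whereas stripslashes() would only drop the backslash:

```php
<?php
// Raw text captured between the quotes: YAML\\\'xasd\'
$raw = 'YAML\\\\\\\'xasd\\\'';

// stripcslashes() drops the backslash in front of \ and ' ...
echo stripcslashes($raw), "\n"; // YAML\'xasd'

// ... but, unlike stripslashes(), it also decodes C-style escapes such as \t.
var_dump(stripcslashes('A\tB') === "A\tB"); // bool(true): \t became a real tab
var_dump(stripslashes('A\tB') === 'AtB');   // bool(true): the backslash is simply dropped
```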

     
