Re: [Doxygen-develop] Adding of new (all) HTML entities? A oneline solution?


>  -1 In the regex, there is [a-zA-Z0-9]{$} where $ is 1,2,3 or 4.
>       I think this means repeat $ times an alphanumeric value.
>       (I'm using the regular expression syntax of Python 1.5; I hope
>       those are the same rules Doxygen uses).
>
You are using the regexp rules of flex. Read the man page for flex for
the full documentation of the available regexp syntax. Here is a short
excerpt:

PATTERNS
     The patterns in the input are written using an extended  set
     of regular expressions.  These are:

         x          match the character 'x'
         .          any character (byte) except newline
         [xyz]      a "character class"; in this case, the pattern
                      matches either an 'x', a 'y', or a 'z'
         [abj-oZ]   a "character class" with a range in it; matches
                      an 'a', a 'b', any letter from 'j' through 'o',
                      or a 'Z'
         [^A-Z]     a "negated character class", i.e., any character
                      but those in the class.  In this case, any
                      character EXCEPT an uppercase letter.
         [^A-Z\n]   any character EXCEPT an uppercase letter or
                      a newline
         r*         zero or more r's, where r is any regular expression
         r+         one or more r's
         r?         zero or one r's (that is, "an optional r")
         r{2,5}     anywhere from two to five r's
         r{2,}      two or more r's
         r{4}       exactly 4 r's
         {name}     the expansion of the "name" definition
                    (see above)
         "[xyz]\"foo"
                    the literal string: [xyz]"foo
         \X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                      then the ANSI-C interpretation of \x.
                      Otherwise, a literal 'X' (used to escape
                      operators such as '*')
         \0         a NUL character (ASCII code 0)
         \123       the character with octal value 123
         \x2a       the character with hexadecimal value 2a
         (r)        match an r; parentheses are used to override
                      precedence (see below)

(this is only a part of the regular expressions available)

But I think you can use the {$} syntax, if I understand the above text
correctly: per the r{4} entry, [a-zA-Z0-9]{4} should match exactly four
alphanumeric characters. :)
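
For a quick sanity check outside flex, here is a small Python sketch of the same repetition syntax. Python's re module reads r{4} and r{1,4} the same way as the excerpt above for simple cases like this. The entity pattern and the function names are made up for illustration; they are not taken from doc.l.

```python
import re

# Hypothetical patterns for illustration only (not copied from doc.l).
# '&', then 1 to 4 alphanumerics, then ';'
ENTITY = re.compile(r"&[a-zA-Z0-9]{1,4};")
# exactly 4 alphanumerics, as in the r{4} entry of the man page
EXACTLY_FOUR = re.compile(r"[a-zA-Z0-9]{4}")


def is_entity(text):
    """True if the whole string is '&' + 1..4 alphanumerics + ';'."""
    return ENTITY.fullmatch(text) is not None


def is_four_alnum(text):
    """True if the whole string is exactly four alphanumerics."""
    return EXACTLY_FOUR.fullmatch(text) is not None
```

With these, is_entity("&amp;") holds while is_entity("&eacute;") does not, since "eacute" is six characters long. One flex-specific caveat: {name} with a name inside the braces expands a definition, so only the numeric forms mean repetition.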

> By longest matching, you certainly mean the more restrictive, the more
> precise. So I think this will work. I will provide changes soon.
> 
Yes. :)
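
To make "longest match" concrete, here is a rough Python sketch of how flex chooses between competing rules: the rule that matches the most text wins, and on a tie the rule listed first wins. The rule names and patterns are hypothetical, and Python's re is only standing in for flex here (for patterns this simple the two agree).

```python
import re

# Hypothetical rules in "file order": (name, pattern). Not taken from doc.l.
RULES = [
    ("AMP", re.compile(r"&amp;")),
    ("ENTITY", re.compile(r"&[a-zA-Z0-9]+;")),
    ("WORD", re.compile(r"[a-zA-Z0-9]+")),
    ("ANY", re.compile(r".|\n")),  # catch-all, guarantees progress
]


def next_token(text, pos):
    """Return (rule name, match length) of the longest match at pos."""
    best_name, best_len = None, 0
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strict '>' means an earlier rule keeps the win on a tie.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name, best_len


def tokenize(text):
    """Split text into (rule name, matched text) pairs."""
    pos, tokens = 0, []
    while pos < len(text):
        name, length = next_token(text, pos)
        tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens
```

Here "&amp;" is matched by both AMP and ENTITY with the same length, so AMP wins because it comes first, while "&eacute;" can only be matched in full by ENTITY.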

> Sorry for the newbie questions. I just wanted to help a little (for my
> needs first); I'm simply the maintainer of the French localization of
> Doxygen. :-)
> [...]
> 
Aren't we all newbies? ;)

If you add the '-d' switch to flex when it generates the C source code, it
will insert debug printouts that give verbose information about which rule
is accepted as it parses the text.

Also, the default behaviour of flex is to copy everything it can't match
to stdout (this is the default rule, which is always active). However, the
flex code in doc.l (and scanner.l) defines a catch-all rule that overrides
the default rule and silently ignores any text no other rule matches. If
you modify that "match-all" rule so it looks like this

<*>.                                    { ECHO; }

you will "restore" the default behaviour (I think).
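
The difference between the two behaviours, sketched in Python rather than flex. This is only an illustration: the single entity rule and the echo_unmatched flag are invented for the example, with the flag standing in for the choice between an ECHO catch-all and an empty one.

```python
import re

# Hypothetical stand-in for the "real" rules of a scanner.
ENTITY = re.compile(r"&[a-zA-Z0-9]+;")


def scan(text, echo_unmatched):
    """Return (tokens, echoed).

    tokens: the entities the rule matched.
    echoed: the unmatched characters that were passed through, which is
            empty when the catch-all silently ignores them.
    """
    tokens, echoed, pos = [], [], 0
    while pos < len(text):
        m = ENTITY.match(text, pos)
        if m:
            tokens.append(m.group())
            pos = m.end()
        else:
            if echo_unmatched:  # behaves like { ECHO; }
                echoed.append(text[pos])
            pos += 1  # either way, consume one character
    return tokens, "".join(echoed)
```

Calling scan("a &lt; b", True) returns the leftover characters alongside the matched entity, which is handy for spotting input that no rule handles. With False the leftovers vanish without a trace.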

/John