Thread: [Doxygen-develop] Adding of new (all) HTML entities?

[Doxygen-develop] Adding of new (all) HTML entities?

From: Xavier O. <xav...@an...> - 2001-07-19 10:37:44

Hi,

Proposal:
I would like to add new HTML entities. I am currently interested in
Greek letters, but everything defined in HTML 4.0 would be interesting.
There are 3 files containing all these entities:
 HTMLlat1.ent.htm (Doxygen already has some of them)
 HTMLspecial.ent.htm  (Doxygen already has some of them)
 HTMLsymbol.ent.htm
They are available at:
http://www.w3.org/TR/1998/REC-html40-19980424/struct/global.html#idx-entity_sets.

Is someone already working on that?

I don't know exactly what has to be changed.

I searched for SharpS, one entity that is already implemented,
and found:
Searching for 'SharpS'...
C:\doxygen-1.2.8.1\src\doc.cpp(17976):{ outDoc->writeSharpS(); }
C:\doxygen-1.2.8.1\src\htmlgen.h(177):    void writeSharpS()        { t << "&szlig;"; }
C:\doxygen-1.2.8.1\src\latexgen.h(182):    void writeSharpS()       { t << "\"s"; }
C:\doxygen-1.2.8.1\src\mangen.h(166):    void writeSharpS()        { t << "s\\*:";     /* just a wild guess,
C:\doxygen-1.2.8.1\src\outputgen.h(207):    virtual void writeSharpS() = 0;
C:\doxygen-1.2.8.1\src\outputlist.h(301):    void writeSharpS()
C:\doxygen-1.2.8.1\src\outputlist.h(302):    { forall(&OutputGenerator::writeSharpS); }
C:\doxygen-1.2.8.1\src\rtfgen.h(164):    void writeSharpS()       { t << "\337"; }
8 occurrence(s) have been found.

As far as I understand, what is to be done for each new entity
is the following:

-1 Add a new pure virtual method named writeEntity() in
    class BaseOutputDocInterface (outputgen.h)
-2 Add a new method
    void writeEntity()
    { forall(&OutputGenerator::writeEntity); }
   in class OutputList (outputlist.h)
-3 Add non pure virtual method in generators:
    void writeEntity()        {}
   in classes
    class ManGenerator (mangen.h)
    class LatexGenerator (latexgen.h)
    class HtmlGenerator (htmlgen.h)

Remarks:
I do not really understand what doc.cpp does. I suppose it also has
to be changed, but I don't know how.
Besides, I know what to do for HTML and LaTeX (maybe not for
all symbols, I have to check in my documentation) but I don't know
what to do for man.

Questions:
Is the process (with the 3 steps) correct? Is something missing?
As my free time is so precious (as for everybody), is it possible
that I only change the files and send them back for merging,
compiling and so on?

Greetings,

Xavier.
--
 Artificial Anthill Project
  http://www.aanthill.org/
  mailto:aan...@aa...

 D2SET Non Profit Association
  http://www.d2set.org/
  mailto:d2...@d2...

Re: [Doxygen-develop] Adding of new (all) HTML entities?

From: Dimitri v. H. <di...@st...> - 2001-07-23 09:23:33

On Thu, Jul 19, 2001 at 12:37:10PM +0200, Xavier Outhier wrote:
> Hi,
> 
> 
> As far as I understand, what is to be done for each new entity
> is the following:
> 
> -1 Add a new pure virtual method named writeEntity() in
>     class BaseOutputDocInterface (outputgen.h)
> -2 Add a new method
>     void writeEntity()
>     { forall(&OutputGenerator::writeEntity); }
>    in class OutputList (outputlist.h)
> -3 Add non pure virtual method in generators:
>     void writeEntity()        {}
>    in classes
>     class ManGenerator (mangen.h)
>     class LatexGenerator (latexgen.h)
>     class HtmlGenerator (htmlgen.h)
> 
> Remarks:
> I do not understand what doc.cpp does really. I suppose it has also
> to be changed but I don't know how.

doc.cpp is a generated file. Look at the flex file doc.l: You'll see lines like
these:

<DocScan,Text>"&"[cC]"cedil;"           { outDoc->writeCCedil(yytext[1]); }

These consist of a set of states (between <>), a regular expression,
and some code.

The code is executed when the input matches the regular expression. yytext
is a char* containing the actual text that was matched. So the
above matches &ccedil; and &Ccedil;, and yytext[1] is the 'c' or the 'C'.
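
To make the role of yytext[1] concrete, here is a small sketch that imitates the rule in plain C++ with std::regex instead of flex. The function name matchCCedil is made up for the illustration; only the pattern and the index come from the rule above.

```cpp
#include <regex>
#include <string>

// Returns the cedilla letter ('c' or 'C') if `input` is exactly &ccedil; or
// &Ccedil;, and '\0' otherwise. This mirrors the pattern "&"[cC]"cedil;" and
// the use of yytext[1] in the rule's action.
char matchCCedil(const std::string& input) {
    static const std::regex pattern("&[cC]cedil;");
    std::smatch m;
    if (!std::regex_match(input, m, pattern)) return '\0';
    const std::string yytext = m.str();  // the matched text, as flex would expose it
    return yytext[1];                    // index 0 is '&', index 1 is 'c' or 'C'
}
```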

> Besides, I know what to do with HTML and LaTeX (maybe not for
> all symbols, I have to check in my documentation) but I don't know
> what to do with man.

This has always been a mystery to me too.


> Questions:
> Is the process (with the 3 steps) correct? Is something missing?

Seems ok to me.

> As my free time is so precious (as for everybody), is it possible
> tha I only change the files and send them back for merging,
> compiling and so on?

Yes, that is fine. You can choose to do only those entities that you
need as well.

Regards,
  Dimitri

Re: [Doxygen-develop] Adding of new (all) HTML entities? A one line solution?

From: Xavier O. <xav...@an...> - 2001-07-23 13:37:04

Dimitri van Heesch wrote:

> On Thu, Jul 19, 2001 at 12:37:10PM +0200, Xavier Outhier wrote:
> > Hi,
> >
> >
> > As far as I understand, what is to be done for each new entity
> > is the following:
> >
> > -1 Add a new pure virtual method named writeEntity() in
> >     class BaseOutputDocInterface (outputgen.h)
> > -2 Add a new method
> >     void writeEntity()
> >     { forall(&OutputGenerator::writeEntity); }
> >    in class OutputList (outputlist.h)
> > -3 Add non pure virtual method in generators:
> >     void writeEntity()        {}
> >    in classes
> >     class ManGenerator (mangen.h)
> >     class LatexGenerator (latexgen.h)
> >     class HtmlGenerator (htmlgen.h)
> >
> > Remarks:
> > I do not understand what doc.cpp does really. I suppose it has also
> > to be changed but I don't know how.
>
> doc.cpp is a generated file. Look at the flex file doc.l: You'll see lines like
> these:
>
> <DocScan,Text>"&"[cC]"cedil;"           { outDoc->writeCCedil(yytext[1]); }
>
> these consist of a set of states (between <>), a regular expression,
> and some code.
>
> The code is executed when the input matches the regular expression. yytext
> is a char* containing the actual text that was matched. So the
> above matches &ccedil; and &Ccedil;
>
> > Besides, I know what to do with HTML and LaTeX (maybe not for
> > all symbols, I have to check in my documentation) but I don't know
> > what to do with man.
>
> This has always been a mistery to me too.

The mystery will be thicker now.

> > Questions:
> > Is the process (with the 3 steps) correct? Is something missing?
>
> Seems ok to me.
>
> > As my free time is so precious (as for everybody), is it possible
> > tha I only change the files and send them back for merging,
> > compiling and so on?
>
> Yes, that is fine. You can choose to do only those entities that you
> need as well.

[...]
Until the discussion with Petr is over, I will do it.

I wonder if the work, allowing all entities to be transparent
for Doxygen, could not be done in a straightforward way ...
at least for HTML output.

I propose that:
 -1 Add a new method in BaseOutputDocInterface
    virtual void writeEntity(const char* const text) = 0;
 -2 Add a new method
    void writeEntity(const char* const text)
    { forall(&OutputGenerator::writeEntity,text); }
   in class OutputList (outputlist.h)
 -3 Add a non-pure virtual method in the generators:
    void writeEntity(const char* const text)        {}
   in classes
    class ManGenerator (mangen.h)
    class LatexGenerator (latexgen.h)
    class HtmlGenerator (htmlgen.h)

The straightforward part is for HTML only (and later the
intermediate XML):
void HtmlGenerator::writeEntity(const char* const text)
{ t << "&" << text << ";" ;}

doc.l would be modified and these lines added:
<DocScan,Text>"&"[a-zA-Z0-9]{1}";"   { outDoc->writeEntity(yytext[1]); }
<DocScan,Text>"&"[a-zA-Z0-9]{2}";"   { outDoc->writeEntity(yytext[2]); }
<DocScan,Text>"&"[a-zA-Z0-9]{3}";"   { outDoc->writeEntity(yytext[3]); }
<DocScan,Text>"&"[a-zA-Z0-9]{4}";"   { outDoc->writeEntity(yytext[4]); }

I have several questions...:

 -Q1: Could the following single line replace the 4 lines?
<DocScan,Text>"&"[a-zA-Z0-9]{1,4}";"   { outDoc->writeEntity(yytext[4]); }
(I'm very lazy! :-)
 -Q2: There is a theoretical problem: what is the order of scanning? To be
         backward compatible, the added regex should be checked only if
         nothing else has already matched.
 -Q3: Would this be all there is to do if Doxygen later uses XML (DocBook or
         other)? Maybe we would then have to remove all the single entities
         already added?

... and some remarks:
 - R1: If the matching order is configurable, then this would solve all HTML
          entities (and SGML; sorry, I'm really not a specialist in this
          domain), wouldn't it?
 - R2: For LaTeX, RTF and man (but not XML), more work would have to be
          done. Is a big switch reasonable in this case? There would be
          255 cases if all entities defined in HTML 4.0 are used.
          But if the current discussion on XML leads to a runnable version
          in less than 2 or 3 months, it may not be useful to do that work.
          If it's for the longer term (I think it will be), then I could begin
          to do that, at least for LaTeX.
 - R3: If the "one line" solution is not possible, I will only do the changes
          for Greek letters (HTML and LaTeX), which could also be useful for
          others.

While waiting for the answers to all my questions, I will work a little for myself. :-)

Xavier.

Re: [Doxygen-develop] Adding of new (all) HTML entities? A one line solution?

From: John O. <jo...@ly...> - 2001-07-23 15:16:48

> doc.l would be modify and this line added:
> <DocScan,Text>"&"[a-zA-Z0-9]{1}";"   { outDoc->writeEntity(yytext[1]); }
> <DocScan,Text>"&"[a-zA-Z0-9]{2}";"   { outDoc->writeEntity(yytext[2]); }
> <DocScan,Text>"&"[a-zA-Z0-9]{3}";"   { outDoc->writeEntity(yytext[3]); }
> <DocScan,Text>"&"[a-zA-Z0-9]{4}";"   { outDoc->writeEntity(yytext[4]); }
> 
> I have several questions...:
> 
>  -Q1: Are the following single line could replace the 4 lines?
> <DocScan,Text>"&"[a-zA-Z0-9]{1,4}";"   { outDoc->writeEntity(yytext[4]); }
> (I'm very lazy! :-)
>
Why do you have yytext[1] etc. in the above example and only yytext[4] in
your suggested replacement?


> -Q2:  There is a theorical problem: what is the order of scanning? to be
>          backward compatible, the added regex should be check if nothing
>          else has already matched.
>
If you read the flex manual page everything will be explained. ;)

The scanner generated by flex picks, among all active rules, the one that
matches the most text. If two rules match the same amount of text, the one
listed first in the file wins. One way for a rule to become active is to
call for it explicitly, i.e. BEGIN(DocScan) will activate all rules with
DocScan in them.
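
The selection can be imitated outside flex with a short C++ sketch. It is not how flex works internally (flex compiles all rules into one automaton); it only reproduces the observable choice: the rule matching the most text wins, and on a tie the rule listed first wins. The Rule struct and pickRule are names invented for this example.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct Rule {
    std::string name;     // label for the action, for illustration only
    std::regex pattern;
};

// Picks the rule whose pattern matches the longest prefix of `input`.
// On a tie, the rule listed first wins, which is what flex does.
// Returns the rule's name, or an empty string if nothing matches.
std::string pickRule(const std::vector<Rule>& rules, const std::string& input) {
    std::string best;
    std::size_t bestLen = 0;
    for (const Rule& r : rules) {
        std::smatch m;
        // match_continuous anchors the match at the start of the input
        if (std::regex_search(input, m, r.pattern,
                              std::regex_constants::match_continuous)) {
            const std::size_t len = static_cast<std::size_t>(m.length(0));
            if (len > bestLen) {   // strictly longer only: earlier rules keep ties
                bestLen = len;
                best = r.name;
            }
        }
    }
    return best;
}
```

For Q2 this means that &ccedil; is matched with the same length by the existing rule and by a generic entity rule, so the existing rules keep working as long as the generic rule is placed after them in doc.l.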


BTW. I have now successfully implemented my tags so that I can write

@run foo @filename bar gazonk @endrun

*and*

@file @run foo @filename bar gazonk @endrun

I had to modify (among other things) the function findFileDef found in
util.cpp so that it recognizes ClearCase version numbers. The command

@run cleartool ls @filename @endrun

would result in something like this

/view/r10_epkjols_garp/vobs/bscrp/r10_new_trh/foo.c@@/main/r10_predev/CHECKEDOUT from /main/r10_predev/1                 Rule: CHECKEDOUT

being returned from cleartool (in the case when a file is checked out).
My changes make the findFileDef function aware of the @-signs in the
string, and when comparing against already found files it ignores
everything after the first @-sign (including the sign itself). I have also
added a new method to the FileDef class called setVersion() which stores a
QCString containing version number information about the file, i.e. in my
case everything after the @-signs including them as well.

Then in the writeDocumentation() method in FileDef I modified the title
etc. to use my getVersionName() method which returns the filename combined
with version information.
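
The split at the first @-sign could look roughly like this. It is a hypothetical helper written with std::string, not the actual change to findFileDef (which works with QCString); the name splitVersion is invented for the sketch.

```cpp
#include <string>
#include <utility>

// Splits "path@@/main/..." into {"path", "@@/main/..."}.
// Hypothetical helper, not the actual findFileDef() change.
// If there is no '@', the version part is empty.
std::pair<std::string, std::string> splitVersion(const std::string& name) {
    const std::string::size_type at = name.find('@');
    if (at == std::string::npos) return {name, std::string()};
    return {name.substr(0, at), name.substr(at)};
}
```

The first element is what would be compared against already found files; the second is what a method like setVersion() would store.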

I have also modified the layout of the generated HTML pages so the result
looks more like what Java2 JavaDoc outputs. The reason for doing so was
simply that when you have code containing several pages of #define
statements and (in the same file) several pages of functions, it is very
difficult to see where the different sections start and end. IMHO the
Java2 way of solving this, by using a light blue background for the
headings, solves this problem nicely.

When making these changes to the layout I found out that the pages
containing the lists of todo:s, bug:s and test:s use a different call to
generate the headings. They use startSection() instead of startTitle().

Is this a bug, and if not, what is the design reason for doing it that
way?

As a result of changing the layout, I came quickly to the conclusion that
the layout of the generated files should not be hardcoded. Instead I think
that a user should be able to configure this via external file(s). One
possibility would be to have the document layout of the different kinds of
pages defined using XML. That is (for example), instead of having
configuration options for which of the different dot-diagrams I want to 
have, I modify an XML file. Just to give you an idea of what I'm after,
I'll give you an example of what such a file could look like


<sourcecode>
  <tagDefinitions>
    <define heading>
     <before data="<table border=1 cellpadding=3 cellspacing=0 width=100%>
                    <tr bgcolor=#CCCCFF>
                     <td colspan=1><font size=+2><b>"/>
     <after data="   </b></font></td>
                    </tr>
                   </table><br>"/>
    </define>
    <define include>
     <before data="<code>"/>
     <after data="</code><br>"/>
    </define>

    etc.
  </tagDefinitions>

  <layout>
    <tag name="heading">
    <tag name="include">
    
    etc.
  </layout>
</sourcecode>


If a user wants the include section repeated in the detailed section, there
is no need to modify the source code; just modify the layout section in
the XML file. The same goes if you want your headings in some other way
than the default provided, etc.

Of course you have to have different layouts for the different targets
(HTML, LaTeX, man, RTF, ...), but you can never avoid that. The question
is whether that information should be hardcoded or not...


/John

Re: [Doxygen-develop] Adding of new (all) HTML entities? A oneline solution?

From: Xavier O. <xav...@an...> - 2001-07-24 08:29:02

John Olsson wrote:

> > doc.l would be modify and this line added:
> > <DocScan,Text>"&"[a-zA-Z0-9]{1}";"   { outDoc->writeEntity(yytext[1]); }
> > <DocScan,Text>"&"[a-zA-Z0-9]{2}";"   { outDoc->writeEntity(yytext[2]); }
> > <DocScan,Text>"&"[a-zA-Z0-9]{3}";"   { outDoc->writeEntity(yytext[3]); }
> > <DocScan,Text>"&"[a-zA-Z0-9]{4}";"   { outDoc->writeEntity(yytext[4]); }
> >
> > I have several questions...:
> >
> >  -Q1: Are the following single line could replace the 4 lines?
> > <DocScan,Text>"&"[a-zA-Z0-9]{1,4}";"   { outDoc->writeEntity(yytext[4]); }
> > (I'm very lazy! :-)
> >
> Why do you have yytext[1] etc. in the above example and only yytext[4] in
> your suggested replacement?

 -1 In the regex, there is [a-zA-Z0-9]{$} where $ is 1, 2, 3 or 4.
      I think this means: repeat an alphanumeric character $ times.
      (I'm using the regular expression syntax of Python 1.5; I hope
      those are the same rules Doxygen uses.)
 -2 I had supposed that yytext[1] means one character, yytext[2] two, ...
     But it means the second character, the third, and so on.
     It was also not correct because &alpha; has more than 4 characters!
     I had in mind numeric entities like &#1245;. Well ...

So now, I think that it would be more correct to have:
<DocScan,Text>"&"[a-zA-Z0-9]{1,}";"   { outDoc->writeEntity(yytext,strlen(yytext)); }

void HtmlGenerator::writeEntity(const char* const text, int length)
{
  if (text)
  {
    char c;
    int textlength = length;
    while (textlength)
    {
      c=*text++;
      t << c;
      textlength--;
    }
  }
}
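
For comparison, the same loop can be written with a single call to std::ostream::write. The sketch below is a free function standing in for the method, with the output stream passed in explicitly instead of the generator's own `t`; both of those choices are assumptions made to keep the example self-contained. Note that flex NUL-terminates yytext and also provides the match length in yyleng, so the rule could pass yyleng instead of calling strlen(yytext).

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>

// Writes exactly `length` characters of `text` to `out`.
// Free function standing in for HtmlGenerator::writeEntity(); the real
// method would write to the generator's own stream `t`.
void writeEntity(std::ostream& out, const char* text, std::size_t length) {
    if (text != nullptr && length > 0) {
        out.write(text, static_cast<std::streamsize>(length));
    }
}
```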

> > -Q2:  There is a theorical problem: what is the order of scanning? to be
> >          backward compatible, the added regex should be check if nothing
> >          else has already matched.
> >
> If you read the flex manual page everything will be explained. ;)
>
> The parser generated by flex picks the longest matching rule it can find
> of all active rules. One way for a rule to become active is to call for it
> explicitly, i.e. BEGIN(DocScan) will activate all rules with DocScan in
> them.

By longest match, you surely mean the most restrictive, the most
precise one. So I think this will work. I will provide the changes soon.

Sorry for the newbie questions. I just wanted to help a little (for my needs
first); I'm simply the maintainer of the French localization of Doxygen. :-)
[...]

Thanks,

Xavier.

Re: [Doxygen-develop] Adding of new (all) HTML entities? A oneline solution?

From: John O. <jo...@ly...> - 2001-07-24 09:01:25

>  -1 In the regex, there is [a-zA-Z0-9]{$} where $ is 1,2,3 or 4.
>       I think this means repeat $ times an alphanumeric value.
>       (I'm using Regular Expression syntax of Python 1.5, I hope
>       that's the sames rules Doxygen uses).
>
It is the regexp rules of flex you are using. Read the man-page for flex
for the full documentation of the regexp syntax available. Here is a short
snippet

PATTERNS
     The patterns in the input are written using an extended  set
     of regular expressions.  These are:

         x          match the character 'x'
         .          any character (byte) except newline
         [xyz]      a "character class"; in this case, the pattern
                      matches either an 'x', a 'y', or a 'z'
         [abj-oZ]   a "character class" with a range in it; matches
                      an 'a', a 'b', any letter from 'j' through 'o',
                      or a 'Z'
         [^A-Z]     a "negated character class", i.e., any character
                      but those in the class.  In this case, any
                      character EXCEPT an uppercase letter.
         [^A-Z\n]   any character EXCEPT an uppercase letter or
                      a newline
         r*         zero or more r's, where r is any regular expression
         r+         one or more r's
         r?         zero or one r's (that is, "an optional r")
         r{2,5}     anywhere from two to five r's
         r{2,}      two or more r's
         r{4}       exactly 4 r's
         {name}     the expansion of the "name" definition
                    (see above)
         "[xyz]\"foo"
                    the literal string: [xyz]"foo
         \X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                      then the ANSI-C interpretation of \x.
                      Otherwise, a literal 'X' (used to escape
                      operators such as '*')
         \0         a NUL character (ASCII code 0)
         \123       the character with octal value 123
         \x2a       the character with hexadecimal value 2a
         (r)        match an r; parentheses are used to override
                      precedence (see below)

(this is only a part of the regular expressions available)

But I think that you can use the {$} syntax if I understand the above text
correctly. :)
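
A quick way to try the repetition counts without running flex is std::regex, whose default syntax treats {n}, {n,m} and {n,} the same way for simple patterns like these. The helper isEntity below is invented for this check and is only a sketch.

```cpp
#include <regex>
#include <string>

// True if `s` is "&", then between minLen and maxLen alphanumerics, then ";".
// maxLen == 0 means "no upper bound", like r{2,} in the table above.
bool isEntity(const std::string& s, unsigned minLen, unsigned maxLen) {
    std::string re = "&[a-zA-Z0-9]{" + std::to_string(minLen) + ",";
    if (maxLen != 0) re += std::to_string(maxLen);
    re += "};";
    return std::regex_match(s, std::regex(re));
}
```

With bounds {1,4} the name "amp" fits but "alpha" (five letters) does not, and a numeric entity such as &#1245; is rejected with any bounds because '#' is not in the character class.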


> By longest matching, you certainly means the more restrictive, the more
> precise. So I think this will work.I will provide changes soon.
> 
Yes. :)


> Sorry for newbie questions. I just wanted to help a little (for my needs
> first), I'm simply maintainer of the French location of Doxygen. :-)
> [...]
> 
Aren't we all newbies? ;)

If you add the '-d' switch to flex when it generates the C source code, it
will insert debug printouts, giving verbose information about what happens
when it parses the text.

Also, the default behaviour of flex is to print everything it can't match
to stdout (this is the default rule which is always active). However, the
flex code in doc.l (and scanner.l) defines a rule which overrides the
default rule and silently ignores text it can't match. If you modify the
"match-all" rule so it looks like this

<*>.                                    { ECHO; }

you will "restore" the default behaviour (I think).


/John

[Doxygen-develop] Transparent HTML entities: files

From: Xavier O. <xav...@an...> - 2001-07-24 14:02:32

Attachments: transparent-SGML-entities.zip transparent-SGML-entities-no length parameter.zip

John Olsson wrote:

> [...]
> > Sorry for newbie questions. I just wanted to help a little (for my needs
> > first), I'm simply maintainer of the French location of Doxygen. :-)
> > [...]
> >
> Aren't we all newbies? ;)

I think I'm the newest one. ;-)

> [...]
> Also, the default behaviour of flex is to print everything it can't match
> to stdout (this is the default rule which is always active). However, the
> flex code in doc.l (and scanner.l) has defined a rule which overrides the
> default rule which silently ignores text it can't match. If you modify the
> "match-all" rule so it looks like this
>
> <*>.                                    { ECHO; }
>
> you will "restore" the default behaviour (I think).

I don't know why it is that way. I'm not sure 'restoring' would
be a good idea. I prefer to restrict this to entities of the form &entity;.

So here are my changes to the sources. The modified files are:
 outputgen.h   // new method writeEntity(const char* const entity) = 0;
               // or writeEntity(const char* const entity, int length) = 0;
               // depending on whether char* yytext has a trailing \0 or
               // not, cf. my PS.
 outputlist.h
 mangen.h      // empty implementation
 latexgen.h    // empty implementation
 rtfgen.h      // empty implementation
 htmlgen.h     // transparent entities for HTML
 doc.l         // new match for all entities

In all files writeEntity() comes right after writeCCedil().

Sorry, I have no time to compile and so on (I'm not under Linux :( ).
Tell me if there is any bug. I will try to implement the method for
LatexGenerator if I have free time. In any case, I will not implement
anything for ManGenerator and RTFGenerator.

Greetings,

Xavier.

PS: Because I don't know if yytext ends with '\0',
I include 2 zip files:
-1 transparent-SGML-entities-no length parameter.zip
     To be used if yytext does _not_ include a trailing '\0'.
     strlen(yytext) cannot be used.
-2 transparent-SGML-entities-no length parameter.zip
     To be used if yytext does include a trailing '\0'.
     strlen(yytext) can be used.
OK, I could read the flex doc, but this is faster. I hope you will
not quarrel with me about that. :)