XML output not well-formed

2008-02-17
2013-04-24
  • Frederick Schulz

    I'm using 0.40 from darwin ports on Mac OS 10.5.2.

    The XML output option generates XML files that are not well-formed:

    - <span> and </span> are reversed; the output looks like </span> spanned text <span>
    - <A> is closed by </a> (XML tag names are case-sensitive, so these do not match)

    There's a simple workaround (at least for me): run the XML output through a filter (e.g. sed) to correct these.
    But is there any chance this will be fixed soon?
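
    The filter workaround could also be sketched in C++ instead of sed. This is only a minimal sketch under assumptions: the helper names stripSpanTags and lowercaseAnchorTags are hypothetical and not part of pdftohtml, and the approach simply drops the <span> tags (rather than trying to re-pair them) and lowercases the <A> tag name so it matches </a>.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of pdftohtml): removes every <span ...> and
// </span> tag from one line of XML output and leaves the text itself untouched.
std::string stripSpanTags(const std::string& line) {
    static const std::regex spanTag(R"(</?span\b[^>]*>)", std::regex::icase);
    return std::regex_replace(line, spanTag, "");
}

// Hypothetical helper: lowercases the anchor tag name so that an opening
// <A ...> matches the closing </a> that follows it.
std::string lowercaseAnchorTags(const std::string& line) {
    static const std::regex anchorTag(R"(<(/?)A\b)");
    return std::regex_replace(line, anchorTag, "<$1a");
}
```

    Applying both functions to each line read from the XML file should give output that an XML parser accepts, at the cost of losing the per-span font information.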

    Greetings,
    Frederick

     
    • spirov

      spirov - 2008-12-18

      I confirm what Frederick is saying:

      I get reversed <span> tags; this was not the case in the previous version.

      (also on Mac)

       
  • stfwi

    stfwi - 2010-01-13

    Hi,
    I found a similar bug in pdftohtml version 0.40 (http://pdftohtml.sourceforge.net/), based on Xpdf version 3.01, on Mac OS 10.6.2.

    The following output is a capitalised page number declaration, where the first <span> and the last </span> are missing:

    <text top="1166" left="106" width="83" height="22" font="21">S</span><span class="ft1">EITE </span><span class="ft0">1/41</text>
    

    However, it could be the same bug. Thanks for coding the great tool!

    Cheers

       Stefan

     
  • deaddecoy

    deaddecoy - 2011-05-09

    The problem code is in "src/HtmlOutputDev.cc" lines 538 - 546:

          GString *fntFix;
          GString *iStr=GString::fromInt(str2->fontpos);
          fntFix = new GString("</span><span class=\"ft");
          fntFix->append(iStr);
          fntFix->append("\">");
          if (((hlink1 == NULL) && (hlink2 == NULL)) && (hfont1->isEqualIgnoreBold(*hfont2) == gFalse))
          {
            str1->htext->append(fntFix);
          }
    

    It looks like the developer is trying to add support for subscripts and superscripts, but this feature either isn't fully implemented or is meant for HTML output only (I'm dumping out to XML). I simply commented out these lines and recompiled.
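
    Judging only from the excerpt above, the code appends a separator of the form </span><span class="ftN"> between two strings, which would leave the first <span> and the last </span> missing unless they are written somewhere else. A balanced alternative might look like the sketch below. It is not the pdftohtml implementation: the Run type and joinRunsBalanced are hypothetical, std::string stands in for GString, the font indices in the usage example are made up, and the text is assumed to be XML-escaped already.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one piece of a merged text line:
// first = font index, second = already XML-escaped text.
using Run = std::pair<int, std::string>;

// Wraps every run in its own <span class="ftN">...</span> pair, so each
// opening tag is closed before the next one is opened.
std::string joinRunsBalanced(const std::vector<Run>& runs) {
    std::string out;
    for (const Run& run : runs) {
        out += "<span class=\"ft" + std::to_string(run.first) + "\">";
        out += run.second;
        out += "</span>";
    }
    return out;
}
```

    For example, the runs (2, "S"), (1, "EITE "), (0, "1/41") would come out as three complete span elements instead of the half-open sequence shown earlier in this thread.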

     
