
analyze HTML (iText) - ExceptionConverter

2006-03-01
2013-05-01
  • Nobody/Anonymous

    Hello,
    I want to use example 7 from chapter 7 (parsing HTML).
    The code of my Main function:

    using System;
    using System.IO;
    using com.lowagie.text;
    using com.lowagie.text.pdf;
    using com.lowagie.text.html;

    public class Chap0707
    {
        public static void Main(string[] args)
        {
            // step 1: creation of a document-object
            Document document = new Document(PageSize.A4, 80, 50, 30, 65);

            // step 2:
            // we create a writer that listens to the document
            // and directs a PDF-stream to a file
            PdfWriter.getInstance(document, new FileStream("Chap0707.pdf", FileMode.Create));

            // step 3: we parse the HTML file into the document
            try
            {
                HtmlParser.parse(document, "Chap0702.html");
            }
            catch (Exception e)
            {
                // print the full exception, including the wrapped SAX error
                Console.WriteLine(e.ToString());
            }
        }
    }

    The Chap0702.html file contains:
    <html>
    <head>
    <meta name="Microsoft Theme" content="concrete 1000, default">
    </head>
    <body>hello</body>
    </html>

    When I run this code I get this exception:
    ExceptionConverter: org.xml.sax.SAXParseException: required string (expected "meta")

    What should I do to avoid this exception?

    --
    Tom

     
    • Kazuya Ujihara

      Kazuya Ujihara - 2006-03-01

      Chap0702.html on my web site wasn't correct. I uploaded a fixed file at <http://www.ujihara.jp/iTextdotNET/examples/Chap0702.html>. Please write your HTML file following the fixed one.

       
      • Nobody/Anonymous

        I downloaded the corrected HTML file, and when I run the program I get:
        ExceptionConverter: org.xml.sax.SAXParseException: whitespace required (found ">")

        I also created a shortened HTML file that contains:

        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
        <HTML><HEAD>
        <META http-equiv=Content-Type content="text/html; charset=windows-1250">
        <META content="MSHTML 6.00.2800.1528" name=GENERATOR></HEAD>
        <BODY>
        </BODY></HTML>

        and I get the same exception.
        What should a simple HTML file look like for the parser to work?

        --
        Tom

         
        • Kazuya Ujihara

          Kazuya Ujihara - 2006-03-03

          HtmlParser supports only XML text, i.e. XHTML, as described in the source code. You have to rewrite your HTML like this:

          <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
          <html><head>
          <meta http-equiv="Content-Type" content="text/html; charset=windows-1250" />
          <meta content="MSHTML 6.00.2800.1528" name="GENERATOR" />
          </head>
          <body>
          <p>text</p>
          </body>
          </html>
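Because HtmlParser only accepts well-formed XML, an unclosed tag such as the original `<meta ...>` (no "/>" and no `</meta>`) is enough to make the SAX parser fail. One way to catch this before handing the file to iText is to load it with .NET's own XML reader first. This is a minimal sketch, assuming .NET 4 or later (`XmlReaderSettings.DtdProcessing`); the helper `IsWellFormedXml` is hypothetical and not part of iText. DTD processing is ignored so the check does not try to fetch the external XHTML DTD.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XhtmlCheck
{
    // Hypothetical helper (not part of iText): returns true if the text is
    // well-formed XML, which is what HtmlParser requires.
    public static bool IsWellFormedXml(string text, out string error)
    {
        var settings = new XmlReaderSettings
        {
            // Do not process the DOCTYPE (no external DTD download).
            DtdProcessing = DtdProcessing.Ignore
        };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(text), settings))
            {
                // Reading every node forces a check of the whole document.
                while (reader.Read()) { }
            }
            error = null;
            return true;
        }
        catch (XmlException e)
        {
            error = e.Message;
            return false;
        }
    }

    public static void Main()
    {
        // The original HTML: <meta> is never closed, so it is not XML.
        string html = "<html><head><meta name=\"Microsoft Theme\" content=\"concrete 1000, default\"></head><body>hello</body></html>";
        // The XHTML form: the empty <meta> element ends with "/>".
        string xhtml = "<html><head><meta name=\"Microsoft Theme\" content=\"concrete 1000, default\" /></head><body><p>hello</p></body></html>";

        Console.WriteLine(IsWellFormedXml(html, out _));  // False
        Console.WriteLine(IsWellFormedXml(xhtml, out _)); // True
    }
}
```

The `error` message from the failing case usually points at the offending line and position, which is easier to act on than the wrapped ExceptionConverter output.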

           
        • Kazuya Ujihara

          Kazuya Ujihara - 2006-03-03

          HtmlParser supports only XML text, i.e. XHTML, as described in the source code. You have to rewrite your HTML like this:

          <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
          <html>
          <head>
          <meta http-equiv="Content-Type" content="text/html; charset=windows-1250" /> 
          <meta content="MSHTML 6.00.2800.1528" name="GENERATOR" />
          </head> 
          <body>
          <p>text</p>
          </body>
          </html>

           
        • Kazuya Ujihara

          Kazuya Ujihara - 2006-03-03

          HtmlParser supports only XML text, i.e. XHTML, as described in the source code. You have to rewrite your HTML like this:

          <html>
          <head> 
          <meta http-equiv="Content-Type" content="text/html; charset=windows-1250" /> 
          <meta content="MSHTML 6.00.2800.1528" name="GENERATOR" /> 
          </head> 
          <body> 
          <p>text</p> 
          </body> 
          </html> 

           
          • Nobody/Anonymous

            In the first two examples, after I removed the semicolon in <!DOCTYPE on the first line, I got this exception:
            ExceptionConverter: java.io.FileNotFoundException: C:\VS-Projekty\iTextDotNet\bin\Debug\xhtml-lat1.ent
            What is wrong with those two examples?
            The last example worked.

            --
            Tom

             
            • Kazuya Ujihara

              Kazuya Ujihara - 2006-03-03

              > What is wrong with those two examples?
              The SourceForge site accidentally added "/>" to the first line. Removing that first "/>" from the examples gives correct XHTML.
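In the thread, the example without a DOCTYPE parsed, while the two with the XHTML DOCTYPE failed looking for xhtml-lat1.ent. If you cannot edit the files by hand, one possible workaround is to strip the DOCTYPE before parsing. This is a minimal sketch under stated assumptions: the helper `StripDoctype` and its regular expression are hypothetical, and it only handles a simple DOCTYPE with no internal subset (`[...]`).

```csharp
using System;
using System.Text.RegularExpressions;

public static class DoctypeStripper
{
    // Hypothetical helper: removes a simple <!DOCTYPE ...> declaration
    // (one without an internal subset in [ ]) plus the whitespace after it,
    // so the parser has no external DTD or entity files to look up.
    public static string StripDoctype(string xhtml)
    {
        return Regex.Replace(xhtml, @"<!DOCTYPE[^>\[]*>\s*", "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string input =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" " +
            "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n" +
            "<html><head></head><body><p>text</p></body></html>";

        // Prints the document starting directly at <html>.
        Console.WriteLine(StripDoctype(input));
    }
}
```

The stripped text could then be written to a file and passed to `HtmlParser.parse(document, path)` as in Tom's code. Without the DTD, named entities such as `&nbsp;` are no longer defined, so this suits only files that use numeric character references or no entities at all.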
               

               
