ConcatPDF / Discussion / Open Discussion: analyze HTML (iText)

Nobody/Anonymous - 2006-03-01

Hello,
I wanted to use example 7 from chapter 7 (parsing HTML).
Here is the code of the Main function:

using System;
using System.IO;
using com.lowagie.text;
using com.lowagie.text.pdf;

using com.lowagie.text.html;

// step 1: creation of a document-object
Document document = new Document(PageSize.A4, 80, 50, 30, 65);

// step 2:
// we create a writer that listens to the document
// and directs a PDF stream to a file
PdfWriter.getInstance(document, new FileStream("Chap0707.pdf", FileMode.Create));

// step 3: we parse the document
try
{
    HtmlParser.parse(document, "Chap0702.html");
}
catch (Exception e)
{
    Console.Write(e.ToString());
}

Chap0702.html file contains:
<html>
<head>
<meta name="Microsoft Theme" content="concrete 1000, default">
</head>
<body>hello</body>
</html>

When I run this code I get an exception:
ExceptionConverter: org.xml.sax.SAXParseException: required string (expected "meta")

What should I do to avoid this exception?

--
Tom

- Kazuya Ujihara - 2006-03-01
 
 The Chap0702.html file on my web site wasn't correct. I uploaded a fixed file to <http://www.ujihara.jp/iTextdotNET/examples/Chap0702.html>. Please model your HTML file on the fixed one.
 
 - Nobody/Anonymous - 2006-03-02
 
 I downloaded the corrected HTML file, and when I run the program I get:
 ExceptionConverter: org.xml.sax.SAXParseException: whitespace required (found ">")
 
 I also created a shortened HTML file that contains:
 
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
 <HTML><HEAD>
 <META http-equiv=Content-Type content="text/html; charset=windows-1250">
 <META content="MSHTML 6.00.2800.1528" name=GENERATOR></HEAD>
 <BODY>
 </BODY></HTML>
 
 and I also get the above mentioned exception.
 What should a simple HTML file look like to make the parser work?
 
 --
 Tom
 
 - Kazuya Ujihara - 2006-03-03
 
 HtmlParser supports only XML text, i.e. XHTML, as described in the source code. You have to rewrite your HTML as shown below.
 
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html><head>
 <meta http-equiv="Content-Type" content="text/html; charset=windows-1250" />
 <meta content="MSHTML 6.00.2800.1528" name="GENERATOR" />
 </head>
 <body>
 text
 </body>
 </html>
 
 - Kazuya Ujihara - 2006-03-03
 
 HtmlParser supports only XML text, i.e. XHTML, as described in the source code. You have to rewrite your HTML as shown below.
 
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=windows-1250" />
 <meta content="MSHTML 6.00.2800.1528" name="GENERATOR" />
 </head>
 <body>
 text
 </body>
 </html>
 
 - Kazuya Ujihara - 2006-03-03
 
 HtmlParser supports only XML text, i.e. XHTML, as described in the source code. You have to rewrite your HTML as shown below.
 
 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=windows-1250" />
 <meta content="MSHTML 6.00.2800.1528" name="GENERATOR" />
 </head>
 <body>
 text
 </body>
 </html>
 
 - Nobody/Anonymous - 2006-03-03
 
 In the first two examples, after I removed the semicolon in the <!DOCTYPE on the first line, I got this exception:
 ExceptionConverter: java.io.FileNotFoundException: C:\VS-Projekty\iTextDotNet\bin\Debug\xhtml-lat1.ent
 What is wrong with those two examples?
 The last example worked.
 
 --
 Tom
 
 Kazuya Ujihara - 2006-03-03
 
 > What is wrong with those two examples?
 The SourceForge site accidentally added "/>" to the first line. Removing the first "/>" from the examples yields correct XHTML.
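
Since HtmlParser accepts only well-formed XML, it can help to check a file with any strict XML parser before handing it to iText. The sketch below is only an illustration of that idea: it uses Python's standard-library `xml.etree.ElementTree` as a stand-in, not HtmlParser itself, so the error messages will differ from the SAXParseException text quoted in this thread. The helper name `first_xml_error` and the variable names are hypothetical, and the DOCTYPE is left out on purpose, as in the last example above that worked.

```python
import xml.etree.ElementTree as ET


def first_xml_error(markup):
    """Return None if markup is well-formed XML, else the parser's error text.

    Stand-in check only: this is Python's XML parser, not iText's HtmlParser.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


# HTML style from the first post: <meta> is never closed.
html_style = """<html>
<head>
<meta name="Microsoft Theme" content="concrete 1000, default">
</head>
<body>hello</body>
</html>"""

# HTML style from the second post: attribute values are not quoted.
unquoted_attribute = (
    '<html><head><meta content="MSHTML 6.00.2800.1528" name=GENERATOR />'
    "</head><body></body></html>"
)

# XHTML style from the last example: quoted attributes, self-closed <meta />.
xhtml_style = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1250" />
<meta content="MSHTML 6.00.2800.1528" name="GENERATOR" />
</head>
<body>
text
</body>
</html>"""

for label, markup in [
    ("unclosed meta", html_style),
    ("unquoted attribute", unquoted_attribute),
    ("xhtml", xhtml_style),
]:
    error = first_xml_error(markup)
    print(label, "->", "well-formed" if error is None else error)
```

The first two inputs are reported as errors and the third as well-formed, which mirrors the two rules from this thread: close every element (`<meta ... />`) and quote every attribute value.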
 