Hello,
I wanted to use example 7 from chapter 7 (parsing the HTML).
The code of my Main function:
using System;
using System.IO;
using com.lowagie.text;
using com.lowagie.text.pdf;
using com.lowagie.text.html;
// step 1: creation of a document-object
Document document = new Document(PageSize.A4, 80, 50, 30, 65);
// step 2:
// we create a writer that listens to the document
// and directs a PDF stream to a file
PdfWriter.getInstance(document, new FileStream("Chap0707.pdf", FileMode.Create));
// step 3: we parse the document
try
{
    HtmlParser.parse(document, "Chap0702.html");
}
catch (Exception e)
{
    Console.Write(e.ToString());
}
The Chap0702.html file contains:
<html>
<head>
<meta name="Microsoft Theme" content="concrete 1000, default">
</head>
<body>hello</body>
</html>
When I run this code I get the exception:
ExceptionConverter: org.xml.sax.SAXParseException: required string (expected "meta")
What should I do to avoid this exception?
--
Tom
The Chap0702.html file on my web site wasn't correct. I uploaded a fixed file to <http://www.ujihara.jp/iTextdotNET/examples/Chap0702.html>. Please write your HTML file following the fixed file.
I downloaded the corrected HTML file, and when I run the program I get:
ExceptionConverter: org.xml.sax.SAXParseException: whitespace required (found ">")
I also created a shortened HTML file that contains:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=windows-1250">
<META content="MSHTML 6.00.2800.1528" name=GENERATOR></HEAD>
<BODY>
</BODY></HTML>
and I get the same exception with it.
What should a simple HTML file look like to make the parser work?
--
Tom
HtmlParser supports only XML text, i.e. XHTML, as described in the source code. You have to rewrite your HTML as shown below.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1250" />
<meta content="MSHTML 6.00.2800.1528" name="GENERATOR" />
</head>
<body>
<p>text</p>
</body>
</html>
HtmlParser supports only XML text, i.e. XHTML, as described in the source code. You have to rewrite your HTML as shown below.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1250" />
<meta content="MSHTML 6.00.2800.1528" name="GENERATOR" />
</head>
<body>
<p>text</p>
</body>
</html>
HtmlParser supports only XML text, i.e. XHTML, as described in the source code. You have to rewrite your HTML as shown below.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1250" />
<meta content="MSHTML 6.00.2800.1528" name="GENERATOR" />
</head>
<body>
<p>text</p>
</body>
</html>
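Since the parser accepts only well-formed XML, it may help to check a file for well-formedness before handing it to HtmlParser. The following is only a rough sketch, not part of iTextdotNET: the class name XhtmlCheck and the method FindXmlError are made up for illustration, and it assumes a .NET version that provides XmlReaderSettings.DtdProcessing (.NET Framework 4.0 or later). It uses the standard System.Xml reader and skips the DOCTYPE, so it checks well-formedness only and does not validate against the XHTML DTD.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of iTextdotNET.
public static class XhtmlCheck
{
    // Returns null when the markup is well-formed XML,
    // otherwise the error message reported by the XML reader.
    public static string FindXmlError(string markup)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // Ignore the DOCTYPE so that no DTD or entity file is loaded;
        // this checks well-formedness only, not validity.
        settings.DtdProcessing = DtdProcessing.Ignore;
        settings.XmlResolver = null;

        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(markup), settings))
            {
                // Reading to the end forces the whole text to be parsed.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException e)
        {
            return e.Message;
        }
    }

    public static void Main()
    {
        // Plain HTML: the meta element is never closed.
        string html = "<html><head><meta name=\"GENERATOR\" content=\"x\"></head>"
                    + "<body>hello</body></html>";
        // XHTML: the meta element is closed with "/>".
        string xhtml = "<html><head><meta name=\"GENERATOR\" content=\"x\" /></head>"
                     + "<body><p>hello</p></body></html>";

        Console.WriteLine(FindXmlError(html) == null
            ? "HTML sample: well-formed" : "HTML sample: not well-formed XML");
        Console.WriteLine(FindXmlError(xhtml) == null
            ? "XHTML sample: well-formed" : "XHTML sample: not well-formed XML");
    }
}
```

With these two sample strings, the first should be reported as not well-formed (the unclosed meta element collides with the closing head tag) and the second as well-formed. To check a file instead, its contents could be read with File.ReadAllText and passed to the same method.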
In the first two examples, after I removed the semicolon in <!DOCTYPE on the first line, I got the exception:
ExceptionConverter: java.io.FileNotFoundException: C:\VS-Projekty\iTextDotNet\bin\Debug\xhtml-lat1.ent
What is wrong with those two examples?
The last example worked.
--
Tom
> What is wrong with those two examples?
The SourceForge site accidentally added "/>" to the first line. Removing the first "/>" from the examples yields correct XHTML.