First point: Saxon is not a parser. When you use Saxon to transform an XML document, Saxon invokes an XML parser to do the parsing. By default it invokes the XML parser provided in the JDK.

Second point: if you're getting an error message, please tell us what it is. You might not understand it, but there's a good chance that we do.

Third point: you need to say what you mean by a "junk" character. The fact that the error goes away when you change the output method from HTML to XML suggests that the problem is probably a character which is permitted in XML but not in HTML; the most common such characters are the codepoints in the range 128 - 159 (decimal). And the most common reason for such codepoints to be present in your input is that the input is encoded in Windows-1252 but the source does not contain an XML declaration, which means that the XML parser is assuming the input is encoded in UTF-8.
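
To check whether the input really contains such characters, a small scan can help. The following is only a sketch and not part of Saxon: the class name C1Scanner and the method findC1Offsets are invented for illustration, and it assumes the document text has already been read into a Java string.

```java
import java.util.ArrayList;
import java.util.List;

public class C1Scanner {
    // Returns the char offsets of every codepoint in the range 128-159
    // (U+0080 to U+009F) found in the given text.
    public static List<Integer> findC1Offsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x80 && c <= 0x9F) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Hypothetical sample: U+0093 and U+0094 stand in for the "junk" characters.
        String sample = "say \u0093hello\u0094";
        for (int offset : findC1Offsets(sample)) {
            System.out.printf("offset %d: U+%04X%n", offset, (int) sample.charAt(offset));
        }
    }
}
```

An empty result would suggest that the problem lies elsewhere, which is one more reason to post the actual error message.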

If your input file uses Windows-1252 encoding (which is likely if you used a simple Windows editor to create the file) then you should include an XML declaration of the form

<?xml version="1.0" encoding="cp1252"?>

at the start of the file.
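
If editing the source file by hand is inconvenient, another route (not something proposed above, just a sketch) is to convert the file to UTF-8 before handing it to Saxon. The class name Cp1252ToUtf8 and the method transcode are made up for this example, and it assumes the file really is Windows-1252 and carries no XML declaration naming a different encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Cp1252ToUtf8 {
    // Decodes the bytes as Windows-1252 and re-encodes the resulting text as UTF-8.
    public static byte[] transcode(byte[] cp1252Bytes) {
        String text = new String(cp1252Bytes, Charset.forName("windows-1252"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: Windows-1252 input file, args[1]: UTF-8 output file (both hypothetical paths).
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        Files.write(out, transcode(Files.readAllBytes(in)));
    }
}
```

With this approach the converted file needs no encoding declaration, since an XML parser assumes UTF-8 when none is present.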

Michael Kay

On 27/08/2012 11:25, santhosh k wrote:

I am using the Saxon parser to generate HTML after parsing the content of my input XML file.

I have an XSLT transform stylesheet for the HTML.

My input XML has several sections. For each section and its content I create one HTML file. The input XML data has a few junk characters.

In my transforms.xsl I am using xsl:output with method html and encoding utf-8.

I am using the Saxon parser to parse my XML. I am passing command line arguments to Saxon as shown below:

"cmd /c java net.sf.saxon.Transform -s:source.xml -xsl:transforms.xsl -o:output"

The problem is that the Saxon parser stops parsing the XML when it reaches a junk character.

But when I use xsl:output with method xml and encoding utf-8, it parses the XML completely without any complaints and generates the HTML file.

As I am new to this, I didn't understand the difference between using xsl:output method xml and html.

What is the mechanism behind xsl:output method xml and html? How does xsl:output method xml ignore these characters?

Is there an option that I can pass as a command line argument to Saxon to ignore these characters?

Please provide detailed information on the xsl:output element with xml and html.

Thanking you in advance.


saxon-help mailing list archived at