First point: Saxon is not a parser. When you use Saxon to transform
an XML document, Saxon invokes an XML parser to do the parsing. By
default it invokes the XML parser provided in the Java JDK.

Second point: if you're getting an error message, please tell us
what it is. You might not understand it, but there's a good chance
that we do.

Third point: you need to say what you mean by a "junk" character.
The fact that the error goes away when you change the output method
from HTML to XML suggests that the problem is probably a character
which is permitted in XML but not in HTML; the most common such
characters are the codepoints in the range 128 - 159 (decimal). And
the most common reason for such codepoints to be present in your
input is that the input is encoded in Windows-1252 but the source
does not contain an XML declaration, which means that the XML parser
is assuming the input is encoded in UTF-8.
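
One way to check whether the data really contains such characters is to scan the text for codepoints in the range 128 - 159. The following is only a minimal sketch in Java; the class and method names are made up for illustration and are not part of Saxon or the JDK.

```java
import java.util.ArrayList;
import java.util.List;

final class C1Scanner {
    /** Returns the string indexes of all codepoints in the range 128..159 (decimal). */
    static List<Integer> findC1Controls(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp >= 128 && cp <= 159) {
                positions.add(i);
            }
            // Step by one or two chars, depending on whether cp is a surrogate pair.
            i += Character.charCount(cp);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical sample: codepoints 147 and 148 around the word "quoted".
        String sample = "caf\u0093quoted\u0094 text";
        System.out.println(findC1Controls(sample)); // prints [3, 10]
    }
}
```

Running something like this over the text nodes of the source document would show where the suspect characters are, which is usually enough to work out which bytes in the file they came from.
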
If your input file uses Windows-1252 encoding (which is likely if
you used a simple Windows editor to create the file) then you should
include an XML declaration of the form
<?xml version="1.0" encoding="cp1252"?>
at the start of the file.
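
To illustrate why the declared encoding matters, here is a small Java sketch that parses the same byte (0x93, an opening curly quote in Windows-1252) under two different encoding declarations, using the JDK's default parser. The class and method names are hypothetical, and the sketch uses the IANA name "windows-1252" on the assumption that the parser recognises it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

final class DeclaredEncodingDemo {
    /** Builds a tiny document whose only text is the byte 0x93, declared in the given encoding. */
    static byte[] sampleDocument(String declaredEncoding) {
        byte[] head = ("<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?><p>")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "</p>".getBytes(StandardCharsets.US_ASCII);
        byte[] doc = new byte[head.length + 1 + tail.length];
        System.arraycopy(head, 0, doc, 0, head.length);
        doc[head.length] = (byte) 0x93;
        System.arraycopy(tail, 0, doc, head.length + 1, tail.length);
        return doc;
    }

    /** Parses the bytes with the JDK's default parser and returns the first codepoint of the text. */
    static int firstCodePoint(byte[] xml) {
        try {
            String text = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getDocumentElement()
                    .getTextContent();
            return text.codePointAt(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        // Declared as Windows-1252, the byte becomes U+201C (left double quotation mark).
        System.out.printf("windows-1252: U+%04X%n", firstCodePoint(sampleDocument("windows-1252")));
        // Declared as ISO-8859-1, the same byte becomes codepoint 147, which is in the 128 - 159 range.
        System.out.printf("ISO-8859-1:   U+%04X%n", firstCodePoint(sampleDocument("ISO-8859-1")));
    }
}
```

With the Windows-1252 declaration the parser delivers an ordinary punctuation character; with the wrong declaration it delivers a codepoint in the 128 - 159 range.
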
On 27/08/2012 11:25, santhosh k wrote:
I am using saxon parser to generate html after parsing my
input xml file content.
I have an xslt transform stylesheet for html.
In my input xml I have several sections. For each section and its
content I am creating one html file. The input xml data has few
In my transforms.xsl I am using xsl:output method as
html and encoding as utf-8.
I am using saxon parser to parse my xml. I am passing command line
arguments to saxon as mentioned below
"cmd /c java net.sf.saxon.Transform -s:source.xml
The problem is that the saxon parser stops parsing the xml when it
reaches a junk character.
But when I use xsl:output method as xml and encoding as
utf-8, it will parse the xml completely without any complaints and
generates the html file.
As I am new to this, I didn't understand the difference between
using xsl:output method as xml and html.
What is the mechanism behind using xsl:output method as xml and
html? How does xsl:output method xml ignore these characters?
Is there an option that I can pass as a command-line
argument to saxon to ignore these characters?
Please provide detailed information on the xsl:output element with
xml and html.
Thanking you in advance.
saxon-help mailing list archived at http://saxon.markmail.org/