Hi,

Check whether one of your documents contains unescaped characters that have a special meaning in XML (usually '<', but also '&').

For example:

<p>this: 2 <= (5-3) is true</p>
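To find out which of the converted documents are affected, you could run each one through a plain SAX parser before querying it. Below is a minimal Java sketch. It assumes the XML files all sit in one directory, and the class name `WellFormednessCheck` and its command-line usage are hypothetical. For each file it should report the first fatal error, with the line and column numbers the parser gives:

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports documents that are not well-formed XML.
public class WellFormednessCheck {

    private static final SAXParserFactory FACTORY = SAXParserFactory.newInstance();

    /** Parses one document; returns null if it is well-formed, else a description of the first fatal error. */
    public static String check(InputSource source, String label) {
        try {
            SAXParser parser = FACTORY.newSAXParser();
            // DefaultHandler ignores the content; a fatal error aborts parse() with an exception.
            parser.parse(source, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return label + ": line " + e.getLineNumber() + ", column "
                    + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException e) {
            return label + ": " + e.getMessage();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: java WellFormednessCheck <directory-of-xml-files>
        if (args.length != 1) {
            System.err.println("usage: java WellFormednessCheck <directory>");
            return;
        }
        File[] files = new File(args[0]).listFiles((dir, name) -> name.endsWith(".xml"));
        if (files == null) {
            System.err.println("Not a directory: " + args[0]);
            return;
        }
        for (File f : files) {
            String problem = check(new InputSource(f.toURI().toString()), f.getName());
            if (problem != null) {
                System.out.println(problem);
            }
        }
    }
}
```

Well-formed files produce no output. A file containing the example above would be listed with the position of the bare '<'.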

You can use tools such as HTMLTidy to clean up your inputs before you process them with XML tools.

They will change the example above to:

<p>this: 2 &lt;= (5-3) is true</p>
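If you build the XML yourself from raw text instead of converting it with a tool like HTMLTidy, the text content needs exactly this kind of escaping. Below is a minimal sketch. The class `XmlText` is hypothetical, and it only escapes character data. It does not repair broken HTML markup the way HTMLTidy does:

```java
// Hypothetical helper: escapes a string for use as XML element content.
public class XmlText {

    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '&': out.append("&amp;"); break;
                // '>' is not strictly required, but escaping it avoids the forbidden "]]>" sequence.
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "this: 2 <= (5-3) is true";
        // prints <p>this: 2 &lt;= (5-3) is true</p>
        System.out.println("<p>" + escape(text) + "</p>");
    }
}
```

Escape '&' as well as '<'. A bare '&' in text also makes a document not well-formed.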

Jakub.

From: Arvind Anya Services [mailto:arvind@anya.im]
Sent: Sunday, January 20, 2013 9:45 PM
To: saxon-help@lists.sourceforge.net
Subject: [saxon] Getting Saxon error->org.xml.sax.SAXParseException- The content of elements must consist of well-formed character data or markup

Hello

I am using Saxon to extract data from thousands of HTML pages that have been converted to XML documents.

However, for a very small number of the XML documents, I get the following error when I run my XQuery command. The same command works perfectly for 99.9% of the documents.

What am I doing wrong here? How do I resolve the error below?

The error is:

Caused by: net.sf.saxon.trans.DynamicError: org.xml.sax.SAXParseException; lineNumber: 51; columnNumber: 2; The content of elements must consist of well-formed character data or markup.
        at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:278)
        at net.sf.saxon.event.Sender.send(Sender.java:144)
        at net.sf.saxon.event.Sender.send(Sender.java:46)
        at net.sf.saxon.event.Builder.build(Builder.java:209)
        at net.sf.saxon.event.Builder.build(Builder.java:161)
        at net.sf.saxon.query.StaticQueryContext.buildDocument(StaticQueryContext.java:435)
        at com.anya.crawler.runtime.processors.XQueryProcessor.castSimpleValue(XQueryProcessor.java:171)
        at com.anya.crawler.runtime.processors.XQueryProcessor.execute(XQueryProcessor.java:147)
        ... 15 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 51; columnNumber: 2; The content of elements must consist of well-formed character data or markup.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:270)
        ... 22 more

Yours sincerely,
Arvind.