Hello
 
I am using Saxon to extract data from thousands of HTML pages that have been converted to XML documents.
For a very few of the XML documents, however, I get the following error when I run my XQuery command. The same command works perfectly for 99.9% of the documents.
What am I doing wrong here? How do I resolve the error below?
 
The error is:
 
 
Caused by: net.sf.saxon.trans.DynamicError: org.xml.sax.SAXParseException; lineNumber: 51; columnNumber: 2; The content of elements must consist of well-formed character data or markup.
        at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:278)
        at net.sf.saxon.event.Sender.send(Sender.java:144)
        at net.sf.saxon.event.Sender.send(Sender.java:46)
        at net.sf.saxon.event.Builder.build(Builder.java:209)
        at net.sf.saxon.event.Builder.build(Builder.java:161)
        at net.sf.saxon.query.StaticQueryContext.buildDocument(StaticQueryContext.java:435)
        at com.anya.crawler.runtime.processors.XQueryProcessor.castSimpleValue(XQueryProcessor.java:171)
        at com.anya.crawler.runtime.processors.XQueryProcessor.execute(XQueryProcessor.java:147)
        ... 15 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 51; columnNumber: 2; The content of elements must consist of well-formed character data or markup.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:270)
        ... 22 more
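
The trace shows the exception coming from the XML parser (Xerces) while Saxon builds the document, that is, before any XQuery is evaluated. One way to narrow this down might be to pre-check each converted file for well-formedness and print the line and column the parser rejects. Below is a minimal sketch, assuming the JDK's standard JAXP SAX parser; the class name WellFormedCheck and its method names are hypothetical and are not part of Saxon or of the crawler code in the trace.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports the first well-formedness error in an XML input.
final class WellFormedCheck {

    // Returns null if the input parses cleanly, otherwise "line:column: message".
    static String firstError(InputSource source) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Namespace-aware parsing, so undeclared prefixes are reported as well.
        factory.setNamespaceAware(true);
        try {
            // DefaultHandler ignores all content and rethrows fatal errors.
            factory.newSAXParser().parse(source, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    // Convenience wrapper for checking an in-memory string.
    static String firstErrorInString(String xml) {
        try {
            return firstError(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Usage: java WellFormedCheck.java file1.xml file2.xml ...
    public static void main(String[] args) throws Exception {
        for (String path : args) {
            // Passing a system ID lets the parser detect the file's declared encoding.
            InputSource source = new InputSource(new File(path).toURI().toString());
            String error = firstError(source);
            System.out.println(path + ": " + (error == null ? "well-formed" : error));
        }
    }
}
```

Running this over the failing documents should point at the same position as the trace (line 51, column 2 here). Inspecting that spot in the converted file would show what the HTML-to-XML conversion left behind, for example a stray "<" that does not start a tag.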
 
 
 
 
Yours sincerely,
Arvind.