From: Mark A. G. <m.a...@sh...> - 2015-06-04 07:14:03
Daan,

Two things. First, can you try the newly released GATE 8.1 and see whether the problem has been fixed? Some of the libraries we use have been upgraded and there have been plenty of bug fixes. Second, can you send me an HTML document that triggers the problem, so I can try to reproduce it and see exactly what's happening?

Mark

On 04/06/15 07:50, Daan Van den Nest wrote:
> Dear users,
>
> I am currently using GATE Embedded (8.0) and am experiencing problems
> when trying to serialize a GATE Document to XML. Quite often a call to
> /gate.corpora.DocumentImpl.toXml()/ will produce the exception below,
> which says my document (whose original format is HTML) contains an
> invalid white space character. In the exception below the invalid
> character is /0x1/, but I’ve also encountered cases where it’s
> /0x1f/ or /0xb/. Does anybody know why GATE cannot handle such
> characters and whether there is a fix? I found an old forum post that was
> somewhat related to my question, but no fix was provided
> (http://article.gmane.org/gmane.comp.ai.gate.general/4170/match=wstxioexception)
>
> Thanks
>
> Daan
>
> com.thoughtworks.xstream.io.StreamException: : Invalid white space character (0x1) in text to output (in xml 1.1, could output as a character entity)
>     at com.thoughtworks.xstream.io.xml.StaxWriter.setValue(StaxWriter.java:160)
>     at com.thoughtworks.xstream.io.WriterWrapper.setValue(WriterWrapper.java:45)
>     at com.thoughtworks.xstream.converters.SingleValueConverterWrapper.marshal(SingleValueConverterWrapper.java:45)
>     at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:51)
>     at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
>     at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
>     at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:88)
>     at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.writeItem(AbstractCollectionConverter.java:64)
>     at com.thoughtworks.xstream.converters.collections.ArrayConverter.marshal(ArrayConverter.java:45)
>     at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
>     at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
>     at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
>     at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:88)
>     at com.thoughtworks.xstream.converters.reflection.SerializableConverter$1.defaultWriteObject(SerializableConverter.java:214)
>     at com.thoughtworks.xstream.converters.reflection.SerializableConverter.doMarshal(SerializableConverter.java:274)
>     at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:83)
>     at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
>     at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
>     at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:84)
>     at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshallField(AbstractReflectionConverter.java:250)
>     at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter$2.writeField(AbstractReflectionConverter.java:226)
>     at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter$2.<init>(AbstractReflectionConverter.java:189)
>     at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.doMarshal(AbstractReflectionConverter.java:135)
>     at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:83)
>     at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
>     at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
>     at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
>     at com.thoughtworks.xstream.core.TreeMarshaller.start(TreeMarshaller.java:82)
>     at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.marshal(AbstractTreeMarshallingStrategy.java:37)
>     at com.thoughtworks.xstream.XStream.marshal(XStream.java:1022)
>     at com.thoughtworks.xstream.XStream.marshal(XStream.java:1011)
>     at com.thoughtworks.xstream.XStream.toXML(XStream.java:984)
>     at gate.corpora.ObjectWrapper.toString(ObjectWrapper.java:78)
>     at gate.corpora.DocumentStaxUtils.writeFeatures(DocumentStaxUtils.java:1500)
>     at gate.corpora.DocumentStaxUtils.writeDocument(DocumentStaxUtils.java:1025)
>     at gate.corpora.DocumentStaxUtils.writeDocument(DocumentStaxUtils.java:1097)
>     at gate.corpora.DocumentStaxUtils.toXml(DocumentStaxUtils.java:931)
>     at gate.corpora.DocumentImpl.toXml(DocumentImpl.java:2082)
>
> […]
>
> Caused by: com.ctc.wstx.exc.WstxIOException: Invalid white space character (0xb) in text to output (in xml 1.1, could output as a character entity)
>     at com.ctc.wstx.sw.BaseStreamWriter.writeCharacters(BaseStreamWriter.java:462)
>     at com.thoughtworks.xstream.io.xml.StaxWriter.setValue(StaxWriter.java:158)
>     ... 45 more
> Caused by: java.io.IOException: Invalid white space character (0xb) in text to output (in xml 1.1, could output as a character entity)
>     at com.ctc.wstx.api.InvalidCharHandler$FailingHandler.convertInvalidChar(InvalidCharHandler.java:55)
>     at com.ctc.wstx.sw.XmlWriter.handleInvalidChar(XmlWriter.java:623)
>     at com.ctc.wstx.sw.BufferingXmlWriter.writeCharacters(BufferingXmlWriter.java:554)
>     at com.ctc.wstx.sw.BaseStreamWriter.writeCharacters(BaseStreamWriter.java:460)
>     ... 46 more
>
> Daan Van den Nest
> Computational Linguist, PhD
> StepStone
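
For anyone hitting the same exception: the characters named in the trace (0x1, 0xb, 0x1f) are control characters outside the XML 1.0 Char range, which is why the writer in the trace refuses to output them. The sketch below is not from this thread and is not a GATE API; it is one possible workaround, assuming you control the strings before they are stored as document content or feature values. The class name XmlCharFilter and the method stripInvalidXml10 are hypothetical.

```java
/**
 * Hypothetical helper (not part of GATE): removes code points that
 * XML 1.0 does not allow in character data.
 */
public final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of text with every disallowed code point dropped. */
    public static String stripInvalidXml10(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // handles surrogate pairs
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Calling XmlCharFilter.stripInvalidXml10 on each string before it goes into a feature map would drop the offending characters. Whether silently dropping them is acceptable depends on your data; replacing them with a space instead would keep character offsets stable.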