Hi,
I'm using babeldoc 1.2 to convert EDIFACT documents (plain text files). The conversion consists of two stages: first I convert the input EDIFACT document into XML, and then I convert the XML document into another EDIFACT document using the XSLTransform pipeline stage.
The intermediate XML document is stored in the system for backup reasons, so I absolutely need it.
The problem is that some EDIFACT input documents contain non-UTF-8 characters, so babeldoc raises an error and stops the conversion. I attach the error trace at the end of this post.
Does anybody know if there is a way to escape the non-UTF-8 characters, or just to substitute them with blank spaces, using some feature of babeldoc?
Thanks, Hagop
ERROR TRACE ..
xml2xml-2 Error: com.babeldoc.core.pipeline.PipelineException: [XslTransformPipelineStage.process]
<2003-12-16 17:04:59,159> ERROR [Thread-0] : [AsynchronousFeeder$1.run]
com.babeldoc.core.pipeline.PipelineException: [XslTransformPipelineStage.process]
at com.babeldoc.core.pipeline.stage.XslTransformPipelineStage.process(Unknown Source)
at com.babeldoc.core.pipeline.PipelineStage.processStage(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
at com.babeldoc.core.pipeline.PipelineStageFactory.process(Unknown Source)
at com.babeldoc.core.pipeline.PipelineFactory.process(Unknown Source)
at com.babeldoc.core.pipeline.PipelineFactoryFactory.process(Unknown Source)
at com.babeldoc.core.pipeline.feeder.SynchronousFeeder.process(Unknown Source)
at com.babeldoc.core.pipeline.feeder.AsynchronousFeeder.actuallyProcess(Unknown Source)
at com.babeldoc.core.pipeline.feeder.AsynchronousFeeder$1.run(Unknown Source)
at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Thread.java:534)
Caused by: javax.xml.transform.TransformerException: java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence.
at org.apache.xalan.transformer.TransformerImpl.fatalError(TransformerImpl.java:741)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:715)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1129)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1107)
at com.babeldoc.core.pipeline.stage.XslTransformPipelineStage.transformInputStream(Unknown Source)
at com.babeldoc.core.pipeline.stage.XslTransformPipelineStage.transformDocument(Unknown Source)
... 28 more
Caused by: java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence.
at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xml.dtm.ref.DTMManagerDefault.getDTM(DTMManagerDefault.java:495)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:658)
... 32 more
I found the cause of the problem. The FLATTOXML pipeline stage always creates documents declared as UTF-8. If the input file contains non-UTF-8 characters, the output of the FLATTOXML stage is inconsistent, because it declares itself to be UTF-8 but contains bytes in another encoding.
Is there a way to declare the encoding of the output document in the FLATTOXML stage, as in XlsToXml, which has an encoding pipeline parameter?
Thanks in advance for the help
Hagop
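
The mismatch described above can be reproduced outside babeldoc. The following is only a minimal sketch, not babeldoc code: the class name `EncodingMismatchDemo`, its helper methods, and the sample content are hypothetical, and it uses the JDK's built-in XML parser. It writes the same ISO-8859-1 bytes twice, once with a UTF-8 declaration and once with a matching ISO-8859-1 declaration, and checks whether the parser accepts each.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; it does not use any babeldoc API.
class EncodingMismatchDemo {

    // Builds a small XML document whose bytes are always ISO-8859-1,
    // while the XML declaration claims whatever encoding is passed in.
    // The u-umlaut becomes the single byte 0xFC, which is never valid in UTF-8.
    static byte[] latin1Xml(String declaredEncoding) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                + "<name>M\u00FCller</name>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Returns true if the parser accepts the bytes, false if it rejects them.
    static boolean parses(byte[] xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("declared UTF-8:      " + parses(latin1Xml("UTF-8")));
        System.out.println("declared ISO-8859-1: " + parses(latin1Xml("ISO-8859-1")));
    }
}
```

The bytes are identical in both cases; only the declaration differs, which is why changing the declared encoding is enough to make the downstream XSL transform readable again.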
I haven't used this pipeline stage, but I guess you should be able to set the encoding in the XML conversion document. As far as I can see, the FlatToXML stage uses the DigesterConversionUnmarshaller class (com.babeldoc.conversion.flatfile.digester.DigesterConversionUnmarshaller).
From the source code, UTF-8 is the default encoding, but it isn't hardcoded, and you can specify a different one in the header of the conversion XML document.
I guess you can get more information from Bruce, but he is pretty busy these days with other things. Try asking this question on the mailing list.
I found the solution.
In the header of the flattoxml conversion document, I have to set the encoding tag to the encoding I need.
Thanks for the help
Hagop
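
For the alternative raised in the original question, replacing the offending bytes with blank spaces, a pre-processing step in plain Java could look like the sketch below. This is an assumption about how one might clean the input before it reaches the pipeline, not a babeldoc feature; `Utf8Sanitizer` and `decodeReplacingInvalid` are hypothetical names. Note that this approach is lossy: the original characters are discarded, so declaring the correct encoding is usually preferable.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; it does not use any babeldoc API.
class Utf8Sanitizer {

    // Decodes the bytes as UTF-8 and substitutes a blank space for every
    // byte sequence that is not valid UTF-8. Valid sequences are kept as-is.
    static String decodeReplacingInvalid(byte[] raw) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(" ");
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but the checked exception must be handled.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 0xFC is u-umlaut in ISO-8859-1 and is not a valid byte in UTF-8.
        byte[] raw = {'a', 'b', (byte) 0xFC, 'c', 'd'};
        System.out.println("[" + decodeReplacingInvalid(raw) + "]");
    }
}
```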
I also need to process EDI (820) documents, and was considering FlatToXml. Would you be willing to share pointers and/or post your mapping file?