I'm planning to use vtd-xml to update persisted Java objects from a large XML file. It seems to work fine.
But there's one problem: vtd-xml does not seem to recognize the ISO-8859-15 encoding
when this header is set:
<?xml version="1.0" encoding="ISO-8859-15" standalone="yes"?>
boolean loaded = vt.parseFile("test.xml", false);
returns false.
Maybe it's the internal
decide_encoding() method of the VTDGen class
that can't match this encoding, although it works for ISO-8859-1 or UTF-8.
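To illustrate what an encoding-detection step such as decide_encoding() has to do, here is a minimal sketch that reads the encoding pseudo-attribute from the XML declaration. It is not VTD-XML's implementation. The class DeclaredEncoding and its method are hypothetical, and the sketch ignores byte-order marks and UTF-16 input. Detection is only half the job, because the parser also needs a character reader for whatever name it finds.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeclaredEncoding {
    // Hypothetical helper: finds encoding="..." inside an XML declaration at the start of the bytes.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the declared encoding in upper case, or "UTF-8" (the XML default) if none is declared. */
    static String declaredEncoding(byte[] doc) {
        // The declaration itself is ASCII, so reading the first bytes as ISO-8859-1 is safe.
        // The 200-byte cap is arbitrary; BOMs and UTF-16 documents are not handled here.
        String head = new String(doc, 0, Math.min(doc.length, 200), StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        return m.find() ? m.group(1).toUpperCase(Locale.ROOT) : "UTF-8";
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-15\" standalone=\"yes\"?><root/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(declaredEncoding(doc)); // prints ISO-8859-15
    }
}
```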
1- Does it support ISO-8859-15?
2- How can I programmatically enforce the encoding used by VTDGen?
Current ISO encodings run from 8859-1 to 8859-15.
Can you write down a list of the encodings you will need?
Adding ISO-8859-15 is not difficult...
I just want to make sure that the next versions all have the same support.
If there is a need, we can do a release (adding support for all those encodings) next week if you want...
Yes, why not.
matchISOEncoding() needs an ISO8859_15Reader.
It could be a real benefit, as this encoding can be seen as an evolution of ISO-8859-1 that includes the Euro symbol.
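ISO-8859-15 (Latin-9) differs from ISO-8859-1 in only eight byte positions, so a reader for it can be a small variation on an ISO-8859-1 reader. The sketch below shows that mapping as a hypothetical Latin9Decoder.decode(byte) method. It does not reproduce VTD-XML's actual ISO8859_15Reader, whose interface is not shown in this thread.

```java
public class Latin9Decoder {
    /** Decodes one ISO-8859-15 byte: identical to ISO-8859-1 except for eight positions. */
    static char decode(byte b) {
        int v = b & 0xFF; // treat the byte as unsigned 0..255
        switch (v) {
            case 0xA4: return '\u20AC'; // EURO SIGN (CURRENCY SIGN in ISO-8859-1)
            case 0xA6: return '\u0160'; // LATIN CAPITAL LETTER S WITH CARON
            case 0xA8: return '\u0161'; // LATIN SMALL LETTER S WITH CARON
            case 0xB4: return '\u017D'; // LATIN CAPITAL LETTER Z WITH CARON
            case 0xB8: return '\u017E'; // LATIN SMALL LETTER Z WITH CARON
            case 0xBC: return '\u0152'; // LATIN CAPITAL LIGATURE OE
            case 0xBD: return '\u0153'; // LATIN SMALL LIGATURE OE
            case 0xBE: return '\u0178'; // LATIN CAPITAL LETTER Y WITH DIAERESIS
            default:   return (char) v; // all other bytes map 1:1, as in ISO-8859-1
        }
    }

    public static void main(String[] args) {
        int differing = 0;
        for (int i = 0; i < 256; i++) {
            if (decode((byte) i) != (char) i) differing++;
        }
        System.out.println(differing + " byte values decode differently from ISO-8859-1"); // 8
    }
}
```

Cross-checking decode against Charset.forName("ISO-8859-15") for all 256 byte values is an easy way to validate such a table.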
By the way, this little problem led me to consider other solutions. I'm wondering whether this solution is appropriate for my case, particularly for parsing 1 GB of data split across numerous normal-sized files.
Those files have to be parsed in batch mode.
The question is the memory impact compared with JAXB2 or a manual solution like Commons Digester.
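VTD-XML parses each document from a byte array held in memory together with its token index. In a batch job, peak memory should therefore track the largest single file rather than the 1 GB total. This holds as long as files are processed one after another and each document is released before the next is loaded. A minimal JDK-only sketch of that loop follows. processAll is a hypothetical helper, and the handler is where the bytes would be passed to the parser (for VTD-XML, presumably VTDGen.setDoc followed by parse; check the API docs for your version).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

public class BatchLoader {
    /**
     * Hypothetical helper: feeds every *.xml file in a directory to the handler, one file
     * at a time, so only one document's bytes are referenced at once. Returns the file count.
     */
    static int processAll(Path dir, Consumer<byte[]> handler) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path file : files) {
                byte[] doc = Files.readAllBytes(file); // whole document in memory
                handler.accept(doc);                   // parse and update objects here
                count++;                               // doc becomes garbage after this iteration
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("batch");
        Files.write(dir.resolve("a.xml"), "<a/>".getBytes(StandardCharsets.US_ASCII));
        Files.write(dir.resolve("b.xml"), "<b/>".getBytes(StandardCharsets.US_ASCII));
        long[] total = {0};
        int n = processAll(dir, doc -> total[0] += doc.length);
        System.out.println(n + " files, " + total[0] + " bytes"); // prints: 2 files, 8 bytes
    }
}
```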
What platform do you use: Java, C or C++?
When do you need it?
I don't think making this feature available is a big fix. I've heard that JAXB2 is a pretty big memory hog and that its performance is slow as well (just what I've heard). With VTD-XML, my take is that there is no need to split big XML files into smaller ones.
OK, let's go.
My platform is Java. I'll stick with this solution while waiting for a new release.
For the moment I artificially change the header to ISO-8859-1, just so I don't get stuck.
I appreciate your support.
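Relabeling the header as ISO-8859-1 works for most bytes, but the eight positions where the two encodings differ are then misread. For example, byte 0xA4 is the Euro sign in ISO-8859-15 but the generic currency sign ¤ in ISO-8859-1. A lossless stopgap is to transcode the document to UTF-8, which VTD-XML already handles, and rewrite the declaration to match. This assumes the file fits in memory and that the JDK provides the "ISO-8859-15" charset, as mainstream JDKs do. The Latin9ToUtf8 class below is a hypothetical sketch of that approach.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin9ToUtf8 {
    /**
     * Hypothetical helper: decodes an ISO-8859-15 document and re-encodes it as UTF-8,
     * rewriting the encoding pseudo-attribute of the leading XML declaration to match.
     */
    static byte[] toUtf8(byte[] latin9Doc) {
        String text = new String(latin9Doc, Charset.forName("ISO-8859-15"));
        // Only a declaration at the very start is touched; the closing quote is kept by the lookahead.
        text = text.replaceFirst(
                "(?i)^(<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"'])ISO-8859-15(?=[\"'])", "$1UTF-8");
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] in = "<?xml version=\"1.0\" encoding=\"ISO-8859-15\" standalone=\"yes\"?><price>\u20AC10</price>"
                .getBytes(Charset.forName("ISO-8859-15"));
        System.out.println(new String(toUtf8(in), StandardCharsets.UTF_8));
    }
}
```

The resulting bytes could be written to a temporary file for parseFile. Alternatively, they could be handed to VTDGen's in-memory entry point (setDoc followed by parse; verify against the VTD-XML API docs for your version).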
OK, I will try to come up with a new release or patch soon.
ISO-8859 parts 1 to 10 are supported; we just need to add 11~16...
I will notify you (feel free to join the vtd-xml-users list
so this discussion will benefit other people; you will get
up-to-date info too).
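Part 12 of ISO 8859 was never published, so "11~16" in practice means parts 11, 13, 14, 15 and 16. A quick way to see which parts the local JDK can decode is the hypothetical Iso8859Support class sketched below. This can help when cross-checking new reader tables against the JDK's own charsets. The printed list depends on the JDK.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class Iso8859Support {
    /** Lists the names ISO-8859-1 .. ISO-8859-16 that the running JDK reports as supported. */
    static List<String> supportedParts() {
        List<String> names = new ArrayList<>();
        for (int n = 1; n <= 16; n++) {
            String name = "ISO-8859-" + n;
            if (Charset.isSupported(name)) names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(supportedParts()); // varies by JDK; ISO-8859-12 never exists
    }
}
```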
Thanks a lot for these answers,
I'm waiting for the new release.
For now I'm going to test it against a huge amount of data; I'll tell you if it works as expected.
Working on it...
I will get back to you.
OK, support for ISO-8859-11, 13, 14 and 15 has been added...
You can get the following files from CVS;
you will then need to recompile and generate a JAR file.
Let me know whether it works for you or not...
I got the CVS version and built a JAR,
then subclassed VTDGen to catch parse exceptions:
[java] VTDGenISO is parsing
[java] com.ximpleware.ParseException: Other Error: Should never happen
[java] at com.ximpleware.VTDGen.getPrevOffset(VTDGen.java:1313)
[java] at com.ximpleware.VTDGen.parse(VTDGen.java:1851)
[java] at test.VTDGenISO.parse(Unknown Source)
[java] at com.ximpleware.VTDGen.parseFile(VTDGen.java:2332)
[java] at test.SpringHibernateTester.main(Unknown Source)
[java] file is found ? : true
[java] format =16 (ISO=1 59-15=16 UTF8=2)
The right encoding is detected, but there's an error.
Maybe it's in
private int getPrevOffset() throws ParseException
because the cases for the new encodings may be missing.
Maybe I'm wrong; I'll keep testing.
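The symptom fits a switch over the detected format that has no case for the new encodings, so the call falls through to a default branch that throws. A hypothetical sketch of such a "previous character offset" routine follows. The format codes loosely follow the numbers printed in the log above, but they, the class name and the method are assumptions, not VTD-XML's source. For single-byte encodings such as ISO-8859-15, the previous character always starts one byte earlier. UTF-8 instead needs to skip back over continuation bytes.

```java
import java.nio.charset.StandardCharsets;

public class PrevOffset {
    // Hypothetical format codes, loosely following the log line (ISO=1, UTF8=2, 8859-15=16).
    static final int FORMAT_ISO_8859_1 = 1;
    static final int FORMAT_UTF_8 = 2;
    static final int FORMAT_ISO_8859_15 = 16;

    /** Returns the byte offset where the character ending just before {@code offset} starts (offset > 0). */
    static int prevOffset(byte[] doc, int offset, int format) {
        switch (format) {
            case FORMAT_ISO_8859_1:
            case FORMAT_ISO_8859_15:
                return offset - 1; // single-byte encodings: one byte per character
            case FORMAT_UTF_8: {
                int p = offset - 1;
                // Step back over continuation bytes (10xxxxxx) to the lead byte.
                while (p > 0 && (doc[p] & 0xC0) == 0x80) p--;
                return p;
            }
            default:
                // A format without a case ends up here, like the error in the stack trace above.
                throw new IllegalStateException("Other Error: Should never happen");
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "a\u20AC".getBytes(StandardCharsets.UTF_8); // bytes 61 E2 82 AC
        System.out.println(prevOffset(utf8, utf8.length, FORMAT_UTF_8)); // prints 1
    }
}
```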
Yep, you are right, I forgot to add the cases for 11, 13, 14, and 15... I just checked the new version into CVS.
OK, it works!
But I had to refactor imports in my project AND in the sources:
Very good solution, though.
That is one of the changes we will make for the next release of VTD-XML...
You can just check out the whole thing if you want...
The C version: http://downloads.sourceforge.net/vtd-xml/c_tutorial_by_code_examples....
The C# version: http://downloads.sourceforge.net/vtd-xml/CSharp_tutorial_by_code_exam...
The Java version: http://downloads.sourceforge.net/vtd-xml/Java_tutorial_by_code_exampl...
Also, some recent articles:
Schemaless Java-XML databinding with VTD-XML
Index XML documents with VTD-XML
Improve XPath Efficiency with VTD-XML