• giordano

    giordano - 2007-11-13


    I'm planning to use vtd-xml to update persisted Java objects from a large XML file. It seems to work fine.

    But there's a sad point: vtd-xml does not seem to recognize the ISO-8859-15 encoding

    when this header is set:
    <?xml version="1.0" encoding="ISO-8859-15" standalone="yes"?>

    boolean loaded = vt.parseFile("test.xml", false);
    returns false ....

    Maybe it's the internal decide_encoding() method of the VTDGen class

    that can't match this encoding, although it works for ISO-8859-1 or UTF-8.

    Two questions:
    1- Does it support ISO-8859-15?
    2- How can I enforce the encoding of the VTDGen programmatically?


    • jimmy zhang

      jimmy zhang - 2007-11-13

      current iso encoding is from 8859-1 to 8859-15
      Can you write down a list of the encodings that you will need?
      Adding ISO-8859-15 is not difficult.
      I'll just make sure that the next versions all have the same support.
      If there is a need, we can do a release (adding support for all those encodings) next week if you want.

    • giordano

      giordano - 2007-11-14

      yes why not.

      matchISOEncoding() needs an ISO8859_15Reader

      It could be a real benefit, as this encoding can be seen as an evolution of ISO-8859-1 that includes the Euro symbol.
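
      To make the difference concrete, here is a minimal sketch that uses only the JDK's own charsets (nothing VTD-XML specific; the class name `Latin9Demo` is made up for illustration). It decodes each byte value under both encodings and prints the ones that differ. Byte 0xA4 is the generic currency sign in ISO-8859-1 and the Euro sign in ISO-8859-15.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin9Demo {
    /** Decodes a single byte under the given charset and returns the resulting character. */
    static char decode(int b, Charset cs) {
        return new String(new byte[] { (byte) b }, cs).charAt(0);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset latin9 = Charset.forName("ISO-8859-15");
        // Walk all 256 byte values and report those the two charsets decode differently
        for (int b = 0; b < 256; b++) {
            char c1 = decode(b, latin1);
            char c9 = decode(b, latin9);
            if (c1 != c9) {
                System.out.printf("0x%02X: ISO-8859-1 -> U+%04X, ISO-8859-15 -> U+%04X%n",
                        b, (int) c1, (int) c9);
            }
        }
    }
}
```

      Only a handful of positions differ, which is why a reader for the new encoding can stay very close to the existing ISO-8859-1 one.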

      By the way, this little problem led me to consider other solutions. I'm wondering if this solution is appropriate for my case, and particularly for parsing 1 GB of data split into numerous normal-sized files.
      Those files have to be parsed in batch mode.

      The question is the impact on memory in comparison with jaxb2 or a manual solution like commons.digester.


      • jimmy zhang

        jimmy zhang - 2007-11-14

        What platform do you use: Java, C or C++?
        When do you need it?

        I don't think it is a big fix to make this feature available. I heard
        that jaxb2 is a pretty big memory hog and that its performance is slow
        as well (just what I heard). With VTD-XML, my take is that there is no
        need to split big XML files into smaller ones.

    • giordano

      giordano - 2007-11-15

      ok, let's go

      my platform is java. I'm sticking with this solution while waiting for a new release.
      for the moment I artificially change the header to iso-8859-1, just so as not to be stuck.
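
      For anyone wanting to apply the same stopgap without editing files by hand, here is a rough sketch of relabelling the declaration in memory. It is plain JDK code; the class and method names (`EncodingRelabel`, `relabel`) are made up for illustration. Note that it only changes the label and does not transcode the content, so any byte that differs between the two encodings (the Euro sign, for instance) will be read as the ISO-8859-1 character instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodingRelabel {
    // Matches the encoding pseudo-attribute of an XML declaration at the very start of the document
    private static final Pattern DECL =
        Pattern.compile("^(<\\?xml[^>]*?encoding\\s*=\\s*[\"'])([A-Za-z0-9._-]+)([\"'])");

    /**
     * Returns a copy of doc whose XML declaration names newEncoding.
     * Only the label changes; the content bytes are not transcoded.
     */
    static byte[] relabel(byte[] doc, String newEncoding) {
        // ISO-8859-1 maps every byte to exactly one char and back, so this round trip is lossless
        String text = new String(doc, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(text);
        if (!m.find()) {
            return doc.clone(); // no encoding declaration found: leave the bytes alone
        }
        String patched = m.group(1) + newEncoding + m.group(3) + text.substring(m.end());
        return patched.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] in = "<?xml version=\"1.0\" encoding=\"ISO-8859-15\" standalone=\"yes\"?><a/>"
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(relabel(in, "ISO-8859-1"), StandardCharsets.ISO_8859_1));
    }
}
```

      The patched array could then be handed to the parser in memory (for example through VTDGen's `setDoc(byte[])` followed by `parse(boolean)`) instead of going through `parseFile`. The sketch copies the whole document into a String, which is wasteful for very large files.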

      I appreciate your support

      • jimmy zhang

        jimmy zhang - 2007-11-15

        OK, I will try to come up with a new release or patch soon.
        Right now iso-8859 from 1 to 10 is supported; I just need to add 11~16.
        I will notify you (feel free to join the vtd-xml-users list
        so this discussion will benefit other people; you will get
        up-to-date info too).

    • giordano

      giordano - 2007-11-19

      thanks a lot for these answers,
      I'm waiting for the new release.

      For now I'm going to test it against a huge amount of data; I'll tell you if it works as expected.


      • jimmy zhang

        jimmy zhang - 2007-11-19

        Working on it...
        will get back

        • jimmy zhang

          jimmy zhang - 2007-11-20

          ok, support for iso-8859-11, 13, 14 and 15 has been added...
          you can get the following files from CVS



          you will then need to recompile and generate the jar file

          Let me know if it works for you or not...

    • giordano

      giordano - 2007-11-20

      I got the CVS version and built a jar,
      then subclassed VTDGen to catch parse exceptions:

           [java] VTDGenISO is parsing
           [java] com.ximpleware.ParseException: Other Error: Should never happen
           [java]     at com.ximpleware.VTDGen.getPrevOffset(
           [java]     at com.ximpleware.VTDGen.parse(
           [java]     at test.VTDGenISO.parse(Unknown Source)
           [java]     at com.ximpleware.VTDGen.parseFile(
           [java]     at test.SpringHibernateTester.main(Unknown Source)
           [java] file is found ? : true
           [java] format =16 (ISO=1 59-15=16 UTF8=2)

      The right encoding is found but there's an error.
      Maybe it's in
          private int getPrevOffset() throws ParseException
              where the cases for the new encodings may be missing.

      Maybe I'm wrong; I'll go on testing.
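
      To illustrate the kind of failure suspected here, below is a self-contained sketch of a "step back one character" routine that switches on the encoding. It is not the library's code: the class, method and constant names are made up, only the numeric values 1, 2 and 16 come from the log line above, and the real getPrevOffset() may well be written differently. The point is just that an encoding which is detected but has no case in such a switch ends up in the default branch.

```java
class PrevOffsetSketch {
    // Numeric values taken from the log output; the names are hypothetical
    static final int FORMAT_ISO_8859_1 = 1;
    static final int FORMAT_UTF8 = 2;
    static final int FORMAT_ISO_8859_15 = 16;

    /** Version with no case for the new encoding: format 16 falls through to default. */
    static int prevOffsetOld(byte[] doc, int offset, int format) {
        switch (format) {
            case FORMAT_ISO_8859_1:
                return offset - 1; // single-byte encoding: one byte per character
            case FORMAT_UTF8: {
                int i = offset - 1;
                // Skip UTF-8 continuation bytes (10xxxxxx) to reach the lead byte
                while (i > 0 && (doc[i] & 0xC0) == 0x80) {
                    i--;
                }
                return i;
            }
            default:
                throw new IllegalStateException("Other Error: Should never happen");
        }
    }

    /** Same routine with the new single-byte encoding sharing the ISO-8859-1 branch. */
    static int prevOffsetNew(byte[] doc, int offset, int format) {
        switch (format) {
            case FORMAT_ISO_8859_1:
            case FORMAT_ISO_8859_15:
                return offset - 1;
            default:
                return prevOffsetOld(doc, offset, format);
        }
    }

    public static void main(String[] args) {
        byte[] doc = { '<', 'a', '/', '>' };
        System.out.println(prevOffsetNew(doc, 3, FORMAT_ISO_8859_15));
        try {
            prevOffsetOld(doc, 3, FORMAT_ISO_8859_15);
        } catch (IllegalStateException e) {
            System.out.println("old version: " + e.getMessage());
        }
    }
}
```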

      • jimmy zhang

        jimmy zhang - 2007-11-20

        Yep, you are right, I forgot to add the cases for 11, 13, 14 and 15... I just checked the new version into CVS.

    • giordano

      giordano - 2007-11-22

      ok it works!

      but I needed to refactor imports in my project AND in the sources:

      import com.ximpleware.xpath.XPathEvalException;
      import com.ximpleware.XPathEvalException;

      very good solution though

