Re: [Vtd-xml-users] Storing parsing info


Fernando, it is interesting that you have substituted byte[] with
IByteBuffer. Since there is a level of indirection, a slight slowdown
should be expected. I would certainly be interested in your approach to
the issue, so feel free to send me the code.
Cheers,
Jimmy
  ----- Original Message -----
  From: Fernando Gonzalez
  To: Jimmy Zhang
  Sent: Wednesday, March 07, 2007 7:49 AM
  Subject: Re: [Vtd-xml-users] Storing parsing info

  Hi Jimmy,

  While writing the following, I realized it may be quite complicated to
  understand, since you don't know exactly what changes I have made. My
  tests are not thorough either, so maybe the best option is to submit a
  technical description of the changes, their pros and cons, and the code.

  I have been testing the XPath performance problem and it seems to be a
  classloader issue. As you can see in the following log, the slowest XPath
  evaluation is the first one, no matter how the parsing information is
  obtained.
  391 ms->Load XML
  2125 ms->Parse XML
  31 ms->Evaluate XPath
  0 ms->Evaluate XPath
  0 ms->Evaluate XPath
  453 ms->Store parse info
  0 ms->Clear parse info
  313 ms->Read Parse info
  0 ms->Evaluate XPath
  0 ms->Evaluate XPath
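
The pattern in this log (a slow first evaluation, then near-zero times) is what one would expect if the first call pays for class loading and warm-up. A minimal sketch for checking that in isolation follows. It uses the JDK's own javax.xml.xpath instead of VTD-XML, purely so that it runs without extra jars; the class name WarmupTiming and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class WarmupTiming {
    // Evaluates the same XPath expression `runs` times and returns the
    // elapsed time of each run in nanoseconds. The first run also pays for
    // loading the XPath and parser classes.
    static long[] timeEvaluations(String xml, String expr, int runs)
            throws XPathExpressionException {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.evaluate(expr, new InputSource(new StringReader(xml)));
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><item>a</item><item>b</item></root>";
        long[] t = timeEvaluations(xml, "count(/root/item)", 5);
        for (int i = 0; i < t.length; i++) {
            System.out.printf("%.3f ms->Evaluate XPath (run %d)%n", t[i] / 1e6, i + 1);
        }
    }
}
```

If the first run is consistently the slowest regardless of the input, the cost is likely in the JVM start-up path rather than in how the parsing information was obtained. Using System.nanoTime() also gives sub-millisecond resolution, which whole-millisecond timings cannot show.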

  I have been working on something more. I have made some changes to VTD
  and have achieved the following.
  1) The byte[] of the XML file is accessed through an interface
  (IByteBuffer).
  2) When I use the UniByteBuffer implementation, parsing is slightly
  slower:
  391 ms->Load XML
  2109 ms->Parse XML (vs 1890 ms I obtained accessing the byte[] buffer directly)
  0,172 ms->Evaluate XPath
  0,078 ms->Evaluate XPath
  0,094 ms->Evaluate XPath
  0,078 ms->Evaluate XPath
  0,078 ms->Evaluate XPath

  3) When I use an implementation that loads chunks as they are needed,
  parsing the file is much slower, but evaluating an XPath expression
  takes the same time. The advantage of this approach is that there is no
  need to load the whole XML file into memory, so I have obtained the
  following results:

  25406 ms->Parse XML
  406 ms->Store parse info
  0,156 ms->Evaluate XPath
  0,093 ms->Evaluate XPath
  0,078 ms->Evaluate XPath
  0,093 ms->Evaluate XPath
  0,094 ms->Evaluate XPath
  0,078 ms->Evaluate XPath

  500 ms->Read Parse info
  0,235 ms->Evaluate XPath
  0,094 ms->Evaluate XPath
  0,078 ms->Evaluate XPath
  0,094 ms->Evaluate XPath
  0,078 ms->Evaluate XPath
  0,094 ms->Evaluate XPath

  The great thing about these results is that the XML file was 100 MB and I
  ran the program with the -Xmx64m JVM option (just enough to hold the
  30 MB of parsing info and the 16 MB buffer).
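
The buffer abstraction described in points 1) to 3) can be sketched roughly as follows. Only the names IByteBuffer and UniByteBuffer come from this message; the method names byteAt and length, the class ChunkedFileByteBuffer, and the demo class are assumptions made for illustration, and the actual patch may look quite different.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ByteBufferDemo {
    public static void main(String[] args) throws IOException {
        byte[] xml = "<root><a>1</a></root>".getBytes(StandardCharsets.UTF_8);
        Path tmp = Files.createTempFile("sample", ".xml");
        Files.write(tmp, xml);
        IByteBuffer inMemory = new UniByteBuffer(xml);
        try (ChunkedFileByteBuffer chunked = new ChunkedFileByteBuffer(tmp.toString(), 8)) {
            // Both buffers expose the same bytes; only the storage differs.
            System.out.println((char) inMemory.byteAt(1) + " " + (char) chunked.byteAt(1));
        }
        Files.delete(tmp);
    }
}

// Hypothetical shape of the interface the parser would read through.
interface IByteBuffer {
    byte byteAt(int index);
    int length();
}

// Whole document in one array, as in the unmodified parser.
class UniByteBuffer implements IByteBuffer {
    private final byte[] data;
    UniByteBuffer(byte[] data) { this.data = data; }
    public byte byteAt(int index) { return data[index]; }
    public int length() { return data.length; }
}

// Keeps a single fixed-size chunk of the file in memory and reloads on a miss.
class ChunkedFileByteBuffer implements IByteBuffer, AutoCloseable {
    private final RandomAccessFile file;
    private final byte[] chunk;
    private final int length;
    private long chunkStart = -1; // file offset of chunk[0]; -1 means nothing loaded
    private int chunkLen = 0;

    ChunkedFileByteBuffer(String path, int chunkSize) throws IOException {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        file = new RandomAccessFile(path, "r");
        length = (int) file.length(); // sketch: assumes the file is smaller than 2 GB
        chunk = new byte[chunkSize];
    }

    public byte byteAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index " + index);
        if (chunkStart < 0 || index < chunkStart || index >= chunkStart + chunkLen) {
            load(index - (index % chunk.length)); // align to a chunk boundary
        }
        return chunk[(int) (index - chunkStart)];
    }

    private void load(long start) {
        try {
            int len = (int) Math.min(chunk.length, length - start);
            file.seek(start);
            file.readFully(chunk, 0, len);
            chunkStart = start;
            chunkLen = len;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public int length() { return length; }
    @Override public void close() throws IOException { file.close(); }
}
```

In this sketch the chunk array plays the role of the 16 MB buffer mentioned above. Every byte access goes through an interface call, and the chunked variant adds a range check plus an occasional disk read, which would be consistent with the slower parse times reported, while XPath evaluation mostly works on the parsing info and touches few bytes.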

  Well, as I said before, I can send you a technical description of the
  changes, their pros and cons, and the code.

  cheers,
  Fernando

  On 3/7/07, Fernando Gonzalez <fer...@gm...> wrote:
    Hi Jimmy,

    Thanks for your response.

    I think I'm using version 2.0, since I have tested the
    "VTDGen.writeIndex" method. I looked for another solution because I
    cannot remove the original XML file, so I would have to store the XML
    twice: the original XML file and the file with the XML, VTD and LCs
    created by "VTDGen.writeIndex". As I'm dealing with really big XML
    files, that's a drawback.

    Yes, you're right, I have added code. Just three or four lines. If
    you're interested I can explain my solution thoroughly. As for the XPath
    performance, I think that's a classloader issue. I will check that and
    report the results.

    greetings,
    Fernando

    On 3/6/07, Jimmy Zhang < cra...@co...> wrote:
      Hey Fernando, Thanks for the email.. I am glad VTD-XML is helpful.
      My question: Which version are you using?
      If you are currently using 2.0, it contains the indexing feature that
      might accomplish just what is described in your email.

      Your solution is to separate the XML from the VTD and LCs, which I
      think you must have added code to do...

      VTD+XML (as in version 2.0) packages the XML, VTD and LCs into
      a single file... which should also work

      The only suspicious part is that the XPath performance dropped in
      your case... which shouldn't happen

      Buffer reuse is useful if your app instantiates a VTDGen to
      sequentially process many incoming XML documents...

      If you deal with only one XML doc, buffer reuse won't make a big
      difference.

      I think you might be interested in first investigating the
      persistence feature in 2.0, and there is a directory under code
      examples...
      Cheers,
      Jimmy

        ----- Original Message -----
        From: Fernando Gonzalez
        To: vtd...@li...
        Sent: Tuesday, March 06, 2007 1:23 AM
        Subject: [Vtd-xml-users] Storing parsing info

        Hello,

        First of all, I would like to congratulate you on your project. I
        really think it's great.

        Second, I want to use the Java version of VTD-XML for a certain task.
        I have succeeded, but I don't know whether I have done it the right
        way or whether there is a better one. Can you give me some advice?

        I want to evaluate some XPath expressions on a lot of big XML files
        (~100 MB and larger), so memory efficiency is critical. The first
        idea that comes to mind is to keep a VTDGen object for each XML file,
        but that leaves every XML document loaded in memory in the
        "protected byte[] XMLDoc;" attribute of the VTDGen class. So each
        time I have to evaluate an XPath expression on an XML file, I have to
        read the file, parse it, evaluate the XPath, and set the VTDGen
        object to null so the garbage collector can free the memory.

        I have obtained these results reading a big XML file (~100 MB):

        360 ms reading file
        1890 ms parsing file
        32 ms evaluating a XPath expression
        93 ms showing results
        total = 2375 milliseconds

        Where the second step ("parsing file") means:
        VTDGen vg = new VTDGen();
        vg.setDoc(b);
        vg.parse(true);

        To speed up the process, I have stored the parsing information in
        a file. After that I can read the XML file and the parsing
        information file, evaluate the XPath expression, and close everything
        again in a shorter time:
        344 ms reading the file
        422 ms reading parsing information
        125 ms evaluating a XPath expression
        93 ms showing results
        total = 984 milliseconds

        I think the result is good enough, but maybe there's a better
        solution than mine. I have stored the parsing info by serializing the
        whole VTDGen object except the XMLDoc attribute. Then I retrieve the
        object from disk and set the XMLDoc attribute, like this:

                    // Restore the serialized parsing info (everything but XMLDoc)
                    ObjectInputStream ois = new ObjectInputStream(new FileInputStream(PARSING_INFO));
                    vg = (MyVTDGen) ois.readObject();
                    ois.close();
                    // Reload the XML bytes; readFully() keeps reading until b2 is full
                    DataInputStream dis2 = new DataInputStream(new FileInputStream(TEST_XML));
                    byte[] b2 = new byte[(int) f.length()];
                    dis2.readFully(b2);
                    dis2.close();
                    vg.setXML(b2); // This method only sets the XMLDoc attribute
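
One way to serialize an object while leaving out a single large field is Java's transient modifier. The message does not say how XMLDoc was excluded, so the following is only a sketch of that general technique. ParseInfo is a hypothetical stand-in for the serializable VTDGen subclass, and its tokens field merely represents the VTD records and location caches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class TransientDocDemo {
    // Hypothetical stand-in: only the parse info is written out.
    static class ParseInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        long[] tokens;            // represents the VTD records and location caches
        transient byte[] xmlDoc;  // skipped by serialization, reattached after loading

        void setXML(byte[] doc) { this.xmlDoc = doc; }
    }

    // Serializes the parse info; the transient xmlDoc field is not written.
    static byte[] store(ParseInfo info) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(info);
        }
        return bytes.toByteArray();
    }

    // Deserializes the parse info and reattaches the document bytes,
    // which come from the original XML file rather than from the stored data.
    static ParseInfo load(byte[] stored, byte[] xml) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(stored))) {
            ParseInfo info = (ParseInfo) ois.readObject();
            info.setXML(xml);
            return info;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root/>".getBytes(StandardCharsets.UTF_8);
        ParseInfo info = new ParseInfo();
        info.tokens = new long[] {1L, 2L, 3L};
        info.setXML(xml);

        byte[] stored = store(info);
        ParseInfo restored = load(stored, xml);
        System.out.println("tokens restored: " + restored.tokens.length);
        System.out.println("document reattached: " + (restored.xmlDoc == xml));
    }
}
```

The stored form then holds only the parsing information, so the original XML file stays the single copy of the document on disk.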

        Is this solution good? Is there a better one? Can "Buffer reuse"
        solve my problem?

        best regards,
        Fernando

        _______________________________________________
        Vtd-xml-users mailing list
        Vtd...@li...
        https://lists.sourceforge.net/lists/listinfo/vtd-xml-users