From: Jimmy Z. <cra...@co...> - 2007-03-07 19:16:34
Fernando, it is interesting that you have substituted byte[] with IByteBuffer... Since
there is a level of indirection, the slight slowdown should be expected... I would certainly
be interested in your approach to the issue, so feel free to send me the code...
Cheers,
Jimmy
----- Original Message -----
From: Fernando Gonzalez
To: Jimmy Zhang
Sent: Wednesday, March 07, 2007 7:49 AM
Subject: Re: [Vtd-xml-users] Storing parsing info
Hi Jimmy,
While writing the following I realized that it may be quite complicated to understand, since you don't know exactly what changes I have made. My tests are not thorough either, so maybe the best option is to submit a technical description of the changes, the pros and cons, the code, and that kind of thing.
I have been testing the XPath performance problem and it seems to be a classloader issue. As you can see in the following log, the slowest XPath evaluation is the first one, no matter how the parsing information is obtained.
391 ms->Load XML
2125 ms->Parse XML
31 ms->Evaluate XPath
0 ms->Evaluate XPath
0 ms->Evaluate XPath
453 ms->Store parse info
0 ms->Clear parse info
313 ms->Read Parse info
0 ms->Evaluate XPath
0 ms->Evaluate XPath
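A minimal sketch of how the first-call cost in a log like the one above can be separated from steady-state cost. The `StepTimer` class and the stand-in workload are hypothetical and not part of VTD-XML; in the thread the timed task would be the XPath evaluation. The idea is simply to time several runs with `System.nanoTime()` and average all but the first, which absorbs class loading and JIT warm-up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical timing helper; not part of VTD-XML. */
final class StepTimer {
    /** Runs the task `runs` times and returns each elapsed time in milliseconds. */
    static <T> List<Double> time(Supplier<T> task, int runs) {
        List<Double> elapsedMs = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.get();
            elapsedMs.add((System.nanoTime() - start) / 1_000_000.0);
        }
        return elapsedMs;
    }

    /** Averages every run except the first, which includes class loading and warm-up. */
    static double steadyStateMs(List<Double> elapsedMs) {
        if (elapsedMs.size() < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        double sum = 0;
        for (double ms : elapsedMs.subList(1, elapsedMs.size())) {
            sum += ms;
        }
        return sum / (elapsedMs.size() - 1);
    }

    public static void main(String[] args) {
        // Stand-in workload (an assumption); replace with the real evaluation.
        List<Double> runs = time(() -> String.valueOf(Math.sqrt(12345.678)), 5);
        for (double ms : runs) {
            System.out.printf("%.3f ms->Evaluate%n", ms);
        }
        System.out.printf("%.3f ms->steady state%n", steadyStateMs(runs));
    }
}
```

Using `System.nanoTime()` rather than `System.currentTimeMillis()` also avoids the 0 ms readings that a coarse millisecond clock produces for very short steps.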
I have also been working on something more. I have made some changes to VTD and have achieved the following.
1) The byte[] of the XML file is accessed through an interface (IByteBuffer).
2) When I use the UniByteBuffer implementation, parsing is slightly slower:
391 ms->Load XML
2109 ms->Parse XML (vs 1890 ms I obtained accessing directly the byte[] buffer)
0,172 ms->Evaluate XPath
0,078 ms->Evaluate XPath
0,094 ms->Evaluate XPath
0,078 ms->Evaluate XPath
0,078 ms->Evaluate XPath
3) When I use an implementation that loads chunks as they are needed, parsing the file is much slower, but evaluating an XPath expression takes the same time. The advantage of this approach is that there is no need to load the whole XML file into memory. I obtained the following results:
25406 ms->Parse XML
406 ms->Store parse info
0,156 ms->Evaluate XPath
0,093 ms->Evaluate XPath
0,078 ms->Evaluate XPath
0,093 ms->Evaluate XPath
0,094 ms->Evaluate XPath
0,078 ms->Evaluate XPath
500 ms->Read Parse info
0,235 ms->Evaluate XPath
0,094 ms->Evaluate XPath
0,078 ms->Evaluate XPath
0,094 ms->Evaluate XPath
0,078 ms->Evaluate XPath
0,094 ms->Evaluate XPath
The great thing about these results is that the XML file was 100 MB and I ran the program with the -Xmx64m JVM option (just enough to hold the 30 MB of parsing info and the 16 MB buffer).
Well, as I said before, I can send you a technical description of the changes, the pros and cons, and the code.
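To make the three variants above concrete, here is a rough sketch of what such an abstraction could look like. Only the names `IByteBuffer` and `UniByteBuffer` come from this message; the method names (`byteAt`, `length`), the `ChunkedByteBuffer` class and its single-window caching strategy are assumptions for illustration, not the code actually used in these tests.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** Read-only view over the XML bytes. Method names are assumptions. */
interface IByteBuffer {
    byte byteAt(int index);
    int length();
}

/** Whole document in one array: fast, but costs one extra call per access. */
final class UniByteBuffer implements IByteBuffer {
    private final byte[] data;

    UniByteBuffer(byte[] data) { this.data = data; }

    public byte byteAt(int index) { return data[index]; }
    public int length() { return data.length; }
}

/** Hypothetical chunked variant: keeps a single fixed-size window in memory. */
final class ChunkedByteBuffer implements IByteBuffer, AutoCloseable {
    private final RandomAccessFile file;
    private final byte[] chunk;
    private final int length;
    private int chunkStart = -1; // file offset of chunk[0]; -1 = nothing loaded
    private int chunkLen = 0;    // number of valid bytes in chunk

    ChunkedByteBuffer(String path, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        chunk = new byte[chunkSize];
        try {
            file = new RandomAccessFile(path, "r");
            length = Math.toIntExact(file.length());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public byte byteAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (chunkStart < 0 || index < chunkStart || index >= chunkStart + chunkLen) {
            load(index); // cache miss: replace the window
        }
        return chunk[index - chunkStart];
    }

    public int length() { return length; }

    /** Loads the chunk-aligned window that contains the given index. */
    private void load(int index) {
        try {
            chunkStart = (index / chunk.length) * chunk.length;
            chunkLen = Math.min(chunk.length, length - chunkStart);
            file.seek(chunkStart);
            file.readFully(chunk, 0, chunkLen);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a design like this, memory use is bounded by the chunk size rather than the file size, at the price of a disk read on every window miss, which is consistent with parsing being much slower while XPath evaluation on the already-built parsing info stays fast.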
cheers,
Fernando
On 3/7/07, Fernando Gonzalez <fer...@gm...> wrote:
Hi Jimmy,
Thanks for your response.
I think I'm using version 2.0, since I have tested the "VTDGen.writeIndex" method. I looked for another solution because I cannot remove the original XML file, so I would have to store the XML twice: once in the original XML file and once in the file with the XML, VTD and LCs created by "VTDGen.writeIndex". As I'm dealing with really big XML files, that's a drawback.
Yes, you're right, I have added code. Just three or four lines. If you're interested I can explain my solution thoroughly. About the XPath performance, I think that's a classloader issue. I will check that and report the results.
greetings,
Fernando
On 3/6/07, Jimmy Zhang <cra...@co...> wrote:
Hey Fernando, thanks for the email. I am glad VTD-XML is helpful.
My question: which version are you using?
If you are currently using 2.0, it contains the indexing feature that might
accomplish just what is described in your email.
Your solution is to separate the XML from the VTD and LCs, so I think you
must have added code to do that...
VTD+XML (as in version 2.0) packages the XML, VTD and LCs into
a single file... which should also work.
The only suspicious part is that the XPath performance dropped in your case... which shouldn't happen.
Buffer reuse is useful if your app instantiates a VTDGen to sequentially
process many incoming XML documents...
If you deal with only one XML doc... buffer reuse won't make a big difference.
I think you might be interested in first investigating the persistence feature in
2.0, and there is a directory under code examples...
Cheers,
Jimmy
----- Original Message -----
From: Fernando Gonzalez
To: vtd...@li...
Sent: Tuesday, March 06, 2007 1:23 AM
Subject: [Vtd-xml-users] Storing parsing info
Hello,
First of all I would like to congratulate you on your project; I really think it's great.
Second, I want to use the Java VTD-XML to do a certain task. I have succeeded, but I don't know whether I have done it the right way or whether there is a better one. Can you give me some advice?
I want to evaluate some XPath expressions on a lot of files of this size and larger, so memory efficiency is critical. The first idea that comes to mind is to have a VTDGen object for each XML file, but this solution leads to having all the XML files loaded in memory in the "protected byte[] XMLDoc;" attribute of the VTDGen class. So each time I have to evaluate an XPath expression on an XML file, I have to read the XML file, parse it, evaluate the XPath and set the VTDGen object to null so that the garbage collector frees the memory.
I have obtained these results reading a big XML file (~100Mb):
360 ms reading file
1890 ms parsing file
32 ms evaluating a XPath expression
93 ms showing results
total = 2375 milliseconds
Where the second step ("parsing file") means:
VTDGen vg = new VTDGen();
vg.setDoc(b);
vg.parse(true);
To speed up the process I have stored the parsing information in a file. After that I can read the XML file and the parsing information file, evaluate the XPath expression and close everything again in a shorter time:
344 ms reading the file
422 ms reading parsing information
125 ms evaluating a XPath expression
93 ms showing results
total = 984 milliseconds
I think the result is good enough, but maybe there's a better solution than mine. I have stored the parsing info by serializing the whole VTDGen object except the XMLDoc attribute. Then I retrieve the object from disk and set the XMLDoc attribute, like this:
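// Restore the serialized parsing info (everything except XMLDoc)
ObjectInputStream ois = new ObjectInputStream(new FileInputStream(PARSING_INFO));
vg = (MyVTDGen) ois.readObject();
ois.close();
// Reload the raw XML bytes and attach them to the restored object
File f = new File(TEST_XML);
byte[] b2 = new byte[(int) f.length()];
DataInputStream fis2 = new DataInputStream(new FileInputStream(f));
fis2.readFully(b2); // a plain read() may return before the buffer is full
fis2.close();
vg.setXML(b2); // This method only sets the XMLDoc attribute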
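For readers who want to try the idea without the library, the following is a self-contained sketch of the "serialize everything except the XML bytes" technique using Java's `transient` keyword. The `ParseInfo` class and its `records` field are hypothetical stand-ins for the serializable VTDGen subclass; only `setXML` mirrors a name used above, and nothing here is VTD-XML's own API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a serializable parser-state object. */
final class ParseInfo implements Serializable {
    private static final long serialVersionUID = 1L;

    final long[] records;    // placeholder for the parsing info
    transient byte[] xmlDoc; // transient: skipped by Java serialization

    ParseInfo(long[] records, byte[] xmlDoc) {
        this.records = records;
        this.xmlDoc = xmlDoc;
    }

    /** Re-attaches the XML bytes after deserialization. */
    void setXML(byte[] xmlDoc) { this.xmlDoc = xmlDoc; }

    /** Serializes this object; the XML bytes are not written. */
    byte[] store() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserializes the parsing info and attaches the separately loaded XML. */
    static ParseInfo restore(byte[] stored, byte[] xmlDoc) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(stored))) {
            ParseInfo info = (ParseInfo) in.readObject();
            info.setXML(xmlDoc);
            return info;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The sketch writes to a byte array for brevity; swapping in `FileOutputStream` and `FileInputStream` gives the on-disk variant described in this message.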
Is this solution good? Is there a better one? Can "Buffer reuse" solve my problem?
best regards,
Fernando
_______________________________________________
Vtd-xml-users mailing list
Vtd...@li...
https://lists.sourceforge.net/lists/listinfo/vtd-xml-users