vtd-xml-users Mailing List for VTD-XML: The Future of XML Processing (Page 31)

OK, I see... it seems you can be sure that the "chunks" of the GML files contain what the user would need...
But in general, if a chunk doesn't have what you are looking for, you will have to load another chunk, then another, and that could mean a lot of disk activity.
As an alternative, would it be possible to split the GML into small chunks that are themselves well-formed GML files, then index them individually?
So instead of dealing with 10 big GML files, you would split them into 100 smaller GML files, and the algorithm you describe may still work.
  ----- Original Message -----
  From: Fernando Gonzalez
  To: vtd...@li...
  Sent: Tuesday, March 20, 2007 2:39 AM
  Subject: Re: [Vtd-xml-users] Storing parsing info

  On 3/20/07, Jimmy Zhang <cra...@co...> wrote:
    So what you are trying to accomplish is to load all the GML docs into memory at once...
    I guess you can simply index all those files to avoid parsing...
    but I still don't seem to understand the benefits of reading the parse info and a chunk of the XML file..

  Quite near. What I need is to access a random feature at any time at as low a cost as possible. That would be possible by loading all the GML docs into memory, but the GML files are very big, so I cannot do it.

  As that solution wasn't suitable for my problem, I thought of opening one file at a time (using buffer reuse), and then it occurred to me that I could save parsing time by storing the parse info. As I said before, I cannot delete the GML. Storing the GML twice would waste disk space. I'm talking about an environment where the user can have a lot of digital cartography on their computer. Disk space is quite a bottleneck. It could be valid, but storing only the parse info was so easy that I did it and obtained a better solution (for my environment).

  There is a use case where the user doesn't work with the files directly, but with a spatial region. In this case, the GML files and other spatial data are "layers", so the user can work with a lot of files at the same time. These files can be in formats other than GML (satellite images, different raster or vector formats), and these can bring the system into an even more memory-constrained situation. That's what led me to load chunks of the GML file.

  The workflow is the following:
  * I open a file with the chunk approach
  * I parse the file (loading it with the chunk approach takes a long time, but that's no problem)
  * I store the parse info
  The user asks for information:
  * I load the parse info
  * I load the chunk
  * I return the requested information
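
The "load the chunk" step above can be sketched with plain Java I/O. This is only an illustration, not Fernando's code and not part of VTD-XML: the class `ChunkReader`, the method `readChunk`, and the idea that the stored parse info yields a byte offset and length per feature are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ChunkReader {
    /**
     * Reads only the bytes [offset, offset + length) of the file.
     * The offset and length are assumed to come from previously stored parse info.
     */
    static byte[] readChunk(Path file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] chunk = new byte[length];
            raf.seek(offset);
            raf.readFully(chunk); // throws EOFException if the span runs past the end of the file
            return chunk;
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny stand-in for a GML file; it is pure ASCII, so char offsets equal byte offsets.
        String doc = "<a><f>one</f><f>two</f></a>";
        Path file = Files.createTempFile("layer", ".gml");
        Files.write(file, doc.getBytes(StandardCharsets.UTF_8));

        String feature = "<f>two</f>";
        byte[] chunk = readChunk(file, doc.indexOf(feature), feature.length());
        System.out.println(new String(chunk, StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

The file is opened, one span is read, and the file is closed again, so memory use is bounded by the size of the requested feature rather than by the size of the GML file.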

  I want to speed up the requests for information because the user can ask for a map image with 20 GML files, and the map code is something like this:

  for each gml file
    find which "features" are inside the map bounds (the GML has been spatially indexed beforehand)
    get those features from the GML (random access) (load parse info + load chunk + return info)
    draw the features on an image
  next gml file
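
The first step of the loop above (finding the features inside the map bounds) could look roughly like the following. It is a sketch under assumptions: the thread does not say how the spatial index is built, so the example uses a plain list of bounding boxes scanned linearly, and the names `MapQuery`, `Entry`, and `featuresInBounds` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class MapQuery {
    /** Hypothetical index entry: a feature's bounding box plus its byte span in the GML file. */
    static final class Entry {
        final double minX, minY, maxX, maxY;
        final long offset;
        final int length;

        Entry(double minX, double minY, double maxX, double maxY, long offset, int length) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
            this.offset = offset; this.length = length;
        }

        /** True if this bounding box overlaps (or touches) the query rectangle. */
        boolean intersects(double qMinX, double qMinY, double qMaxX, double qMaxY) {
            return minX <= qMaxX && maxX >= qMinX && minY <= qMaxY && maxY >= qMinY;
        }
    }

    /** Linear scan over the index; a real spatial index (grid, R-tree) would avoid visiting every entry. */
    static List<Entry> featuresInBounds(List<Entry> index,
                                        double qMinX, double qMinY, double qMaxX, double qMaxY) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : index) {
            if (e.intersects(qMinX, qMinY, qMaxX, qMaxY)) {
                hits.add(e);
            }
        }
        return hits;
    }
}
```

Each hit carries the offset and length needed to fetch just that feature from the file, which is the random-access step of the loop.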

  Maybe this will make things a bit clearer. This screenshot (http://www.gvsig.gva.es/fileadmin/conselleria/images/Documentacion/capturas/raster_shp_dgn_750.gif) shows a program that uses the library. On the left you can see all the loaded (from the user's point of view) files: four "dgn" files, one "shp" and seven "ecw" files. Many operations on the map are performed over *every* file listed on the left, so I don't care how much time it takes to put all those files on the left (generating parse info, etc.). I care how much time it takes to read the information after they are loaded (again, from the user's point of view).

  Well, I hope it's clear enough. Notice that I'm not proposing to change the way VTD-XML works; I'm proposing to add new ways.

  greetings,
  Fernando

      ----- Original Message -----
      From: Fernando Gonzalez
      To: vtd...@li...
      Sent: Monday, March 19, 2007 2:56 AM
      Subject: Re: [Vtd-xml-users] Storing parsing info

      Well, jeje, the computer is new but I don't think my disk is that fast. I think Java or the operating system must be caching something, because the first time I load the file it takes a bit more than 2 seconds, and after the first load it only takes 300ms to read the file...
      I have no experience doing benchmarks and maybe I'm missing something. That's why I attached the program.

      "So if you can't delete the original XML files, can you compress them and store them away (archiving)?"

      I can neither delete nor archive the GML file because in this context it won't be rare for it to be read by two different programs at the same time... It's difficult to find an open source program that does everything you need. For example, in a development context, there may be a map server serving a map image based on a GML file while you are opening it to see some data in it.

      "The other issue you raised is buffer reuse. To reuse internal buffers of VTDGen, you can call setDoc_BR(...). But there is more you can do... you can in fact reuse the byte array containing the XML document."
      Buffer reuse absolutely solves my memory constraints. But the problem I see with buffer reuse is that it will force me to read and parse the whole XML file every time the user asks for information on another XML file, won't it? If I read the XML file in chunks and store/read the parse information, each time the user asks for information on another XML file I only have to read the parse info and a chunk of the XML file.
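
The byte-array reuse mentioned in the quote can be illustrated without VTD-XML itself. The sketch below only shows the idea of loading successive files into one shared array; the class `ReusableBuffer` and its methods are hypothetical. With VTD-XML, the loaded bytes would then presumably be handed to `VTDGen` via `setDoc_BR(...)`, which is not shown here so that the example stays self-contained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class ReusableBuffer {
    private byte[] buffer;

    ReusableBuffer(int initialCapacity) {
        buffer = new byte[initialCapacity];
    }

    /**
     * Loads the file into the shared array, reallocating only when the file
     * is larger than the current capacity. Returns the number of valid bytes.
     */
    int load(Path file) throws IOException {
        long size = Files.size(file);
        if (size > Integer.MAX_VALUE - 8) {
            throw new IOException("file too large for a single array: " + file);
        }
        if (size > buffer.length) {
            buffer = new byte[(int) size];
        }
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (total < size && (n = in.read(buffer, total, (int) size - total)) > 0) {
                total += n;
            }
        }
        return total;
    }

    /** The shared array; only the first bytes reported by the last load() call are valid. */
    byte[] array() {
        return buffer;
    }
}
```

This keeps memory flat across files, but, as Fernando points out, every switch to another file still costs a full read (and, with a parser attached, a full parse).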

      To show you my point of view:
      The "user asking for another XML file" may be a map server that =
reads some big GML files and draws its spatial information in a map =
image. If each time the map server draws a GML file and "changes" to the =
next it takes 2 seconds or so, the drawing of the map (all the GML =
files) takes too much time.=20

      best regards,
      Fernando

      On 3/19/07, Jimmy Zhang <cra...@co...> wrote:

        What intrigues me about Fernando's test results is that it only takes 300ms to read a 100MB file? He's got a super fast disk...
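
For scale, 100MB in 300ms is roughly 333MB/s, which suggests the repeated reads were served from the operating system's file cache rather than from the disk. One way to check this is to time several consecutive reads of the same file. The sketch below is an assumption-laden illustration, not the benchmark program attached to the thread; `ReadTimer` and `timeReads` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReadTimer {
    /** Reads the whole file `runs` times and returns the elapsed time of each read in milliseconds. */
    static long[] timeReads(Path file, int runs) throws IOException {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            byte[] data = Files.readAllBytes(file);
            millis[i] = (System.nanoTime() - start) / 1_000_000;
            if (data.length == 0) {
                System.err.println("warning: file is empty, timings are meaningless");
            }
        }
        return millis;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReadTimer <file>");
            return;
        }
        long[] millis = timeReads(Paths.get(args[0]), 5);
        for (int i = 0; i < millis.length; i++) {
            System.out.println("read " + (i + 1) + ": " + millis[i] + " ms");
        }
    }
}
```

If the first read is much slower than the rest, caching is the likely explanation, although JVM warm-up also contributes to the first measurement. A true cold-cache number requires a file that has not been read since boot.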
          ----- Original Message -----
          From: Rodrigo Cunha
          To: Jimmy Zhang
          Cc: Fernando Gonzalez ; vtd...@li...
          Sent: Sunday, March 18, 2007 8:40 PM
          Subject: Re: [Vtd-xml-users] Storing parsing info

          In fact the idea occurred to me in the past too... but VTD is so fast at reading large files anyway! With a fast processor I think we might be disk-limited rather than processor-limited. Still, if the code is already written, the option seems cute enough to keep :-)

          Since I mainly deal with large files requiring a lot of processing, this has not been an issue. Others, in different environments, might disagree.

          Jimmy Zhang wrote:
            Fernando, the option of storing VTD in a separate file is open. I attached the technical document from your last email, and am also interested in the suggestions/comments from the mailing list ...

      _______________________________________________
      Vtd-xml-users mailing list
      Vtd...@li...
      https://lists.sourceforge.net/lists/listinfo/vtd-xml-users

vtd-xml-users — This mailing list is for users or people interested in VTD-XML