Re: [Vtd-xml-users] Storing parsing info

On 3/20/07, Jimmy Zhang <cra...@co...> wrote:
>
>  So what you are trying to accomplish is to load all the GML docs into
> memory at once...
> I guess you can simply index all those files to avoid parsing...
> but I still don't seem to understand the benefits of reading the parse info
> and a chunk of the XML file...
>

Quite near. What I need is to access a random feature at any time at as low
a cost as possible. That would be possible by loading all the GML docs into
memory, but the GML files are very big, so I cannot do it.

As that solution wasn't suitable for my problem, I thought of opening one
file at a time (using buffer reuse), and then it occurred to me that I could
save parsing time by storing the parse info. As I said before, I cannot delete
the GML. Storing the GML twice would waste disk space. I'm talking about an
environment where the user can have a lot of digital cartography on his
computer, so disk space is quite a bottleneck. Storing it twice could be
valid, but storing only the parse info was so easy that I did it and obtained
a better solution (for my environment).

There is a use case where the user doesn't work with the files directly, but
with a spatial region. In this case, the GML files and other spatial data
are "layers", so the user can work with a lot of files at the same time.
These files can be in formats other than GML (satellite images, different
raster or vector formats), and these can bring the system to an even more
memory-constrained situation. That's what led me to load chunks of the GML
file.

The workflow is the following:
* I open a file with the chunk approach
* I parse the file (loading it with the chunk approach takes a long time, but
that's no problem)
* I store the parse info
When the user asks for information:
* I load the parse info
* I load the chunk
* I return the requested information
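
A minimal sketch of that workflow in plain Java, not using the VTD-XML API.
Here the "parse info" is reduced to one (byte offset, byte length) record per
feature, written to a side file; the class name, method names and record
layout are all assumptions made for illustration. The parse step reads the
whole file once for simplicity, does a naive scan for an attribute-free tag,
and assumes a UTF-8 document; only the query step is kept cheap.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ChunkedFeatureStore {
    /** Parse step: records {byte offset, byte length} of every <tag>...</tag> element. */
    static List<long[]> parse(Path xmlFile, String tag) throws IOException {
        // ISO-8859-1 maps each byte to one char, so char indexes equal byte offsets.
        String doc = new String(Files.readAllBytes(xmlFile), StandardCharsets.ISO_8859_1);
        String open = "<" + tag + ">", close = "</" + tag + ">";
        List<long[]> records = new ArrayList<>();
        int start = doc.indexOf(open);
        while (start >= 0) {
            int end = doc.indexOf(close, start);
            if (end < 0) break; // unterminated element: stop indexing
            end += close.length();
            records.add(new long[] {start, end - start});
            start = doc.indexOf(open, end);
        }
        return records;
    }

    /** Store step: writes a count followed by fixed-size (long offset, int length) records. */
    static void storeParseInfo(Path indexFile, List<long[]> records) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(indexFile)))) {
            out.writeInt(records.size());
            for (long[] r : records) {
                out.writeLong(r[0]);
                out.writeInt((int) r[1]);
            }
        }
    }

    /** Query step: reads one index record, then only that byte range of the document. */
    static String readFeature(Path xmlFile, Path indexFile, int feature) throws IOException {
        long offset;
        int length;
        try (RandomAccessFile idx = new RandomAccessFile(indexFile.toFile(), "r")) {
            int count = idx.readInt();
            if (feature < 0 || feature >= count) {
                throw new IndexOutOfBoundsException("no such feature: " + feature);
            }
            idx.seek(4L + 12L * feature); // 4-byte count, then 12 bytes per record
            offset = idx.readLong();
            length = idx.readInt();
        }
        byte[] chunk = new byte[length];
        try (RandomAccessFile xml = new RandomAccessFile(xmlFile.toFile(), "r")) {
            xml.seek(offset);
            xml.readFully(chunk);
        }
        return new String(chunk, StandardCharsets.UTF_8);
    }
}
```

Because the records have a fixed size, answering a request touches only 16
bytes of the side file plus the bytes of the requested feature, however large
the GML file is.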

I want to speed up these requests because the user can ask for a map image
built from 20 GML files, and the map code is something like this:

for each gml file
  find which "features" are inside the map bounds (the GML has been
  spatially indexed beforehand)
  get those features from the GML by random access (load parse info +
  load chunk + return info)
  draw the features on an image
next gml file
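
The same loop as a small Java sketch. The `Layer` interface, `Bounds` class
and method names are hypothetical and only illustrate the shape of the code;
the drawing step is replaced by collecting the features that would be drawn.

```java
import java.util.ArrayList;
import java.util.List;

class MapRenderSketch {
    /** Axis-aligned bounding box in map coordinates. */
    static final class Bounds {
        final double minX, minY, maxX, maxY;

        Bounds(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }

        boolean intersects(Bounds o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }
    }

    /** One GML file as seen by the map code (hypothetical interface). */
    interface Layer {
        List<Integer> featuresIn(Bounds mapBounds); // answered by the spatial index
        String loadFeature(int featureId);          // load parse info + chunk, return info
    }

    /** Collects, layer by layer, the features that would be drawn on the image. */
    static List<String> collectVisible(List<Layer> layers, Bounds mapBounds) {
        List<String> toDraw = new ArrayList<>();
        for (Layer layer : layers) {
            for (int id : layer.featuresIn(mapBounds)) {
                toDraw.add(layer.loadFeature(id));
            }
        }
        return toDraw;
    }
}
```

The cost of one map request is the sum over all layers of the `loadFeature`
calls, which is why the per-feature access time matters more than the
one-time cost of building the parse info.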

Maybe this will make things a bit clearer. This screenshot (
http://www.gvsig.gva.es/fileadmin/conselleria/images/Documentacion/capturas/raster_shp_dgn_750.gif)
shows a program that uses the library. On the left you can see all the
loaded (from the user's point of view) files: four "dgn" files, one "shp" and
seven "ecw" files. A lot of the operations done on the map are done over
*every* file listed on the left, so I don't care how much time it takes to put
all those files on the left (generating parse info, etc.). I care how much
time it takes to read the information after they are loaded (again, from the
user's point of view).

Well, I hope it's clear enough. Note that I'm not proposing to change the
way VTD-XML works; I'm proposing to add new ways of using it.

greetings,
Fernando

----- Original Message -----
> *From:* Fernando Gonzalez <fer...@gm...>
> *To:* vtd...@li...
> *Sent:* Monday, March 19, 2007 2:56 AM
> *Subject:* Re: [Vtd-xml-users] Storing parsing info
>
> Well, hehe, the computer is new but I don't think my disk is that fast. I
> think Java or the operating system must be caching something, because the
> first time I load the file it takes a bit more than 2 seconds and after the
> first load, it only takes 300ms to read the file...
> I have no experience doing benchmarks and maybe I am missing
> something. That's why I attached the program.
>
> "So if you can't delete the original XML files, can you compress them and store
> them away (archiving)?"
> I can neither delete nor archive the GML file because in this context it is
> not rare for two different programs to be reading it at the same time...
> It's difficult to find an open source program that does everything you need.
> For example, in a development context, there may be a map server serving a
> map image based on a GML file while you are opening it to see some data in
> it.
>
> "The other issue you raised is buffer reuse. To reuse internal buffers of VTDGen,
> you can call setDoc_BR(...). But there is more you can do...
> you can in fact reuse the byte array containing the XML document."
> Buffer reuse absolutely solves my memory constraints. But the problem I see
> with buffer reuse is that it will force me to read and parse the whole XML
> file every time the user asks for information on another XML file, won't it?
> If I read the XML file in chunks and I store/read the parse information,
> each time the user asks for information on another XML file I only have to
> read the parse info and a chunk of the XML file.
>
> To show you my point of view:
> The "user asking for another XML file" may be a map server that reads some
> big GML files and draws its spatial information in a map image. If each time
> the map server draws a GML file and "changes" to the next it takes 2 seconds
> or so, the drawing of the map (all the GML files) takes too much time.
>
> best regards,
> Fernando
>
>
> On 3/19/07, Jimmy Zhang <cra...@co...> wrote:
> >
> >
> > What intrigues me with Fernando's test results is that it only takes
> > 300ms to read a 100MB
> > file? He got a super fast disk...
> >
> > ----- Original Message -----
> > *From:* Rodrigo Cunha <rn...@gm...>
> > *To:* Jimmy Zhang <cra...@co...>
> > *Cc:* Fernando Gonzalez <fer...@gm...> ; vtd...@li...
> >
> > *Sent:* Sunday, March 18, 2007 8:40 PM
> > *Subject:* Re: [Vtd-xml-users] Storing parsing info
> >
> > In fact the idea occurred to me in the past also... but VTD is so fast
> > reading large files anyway! With a fast processor I think we might be
> > disk-limited rather than processor-limited. Still, if the code is
> > already written, the option seems cute enough to keep :-)
> >
> > Since I mainly deal with large files requiring lots of processing this
> > has not been an issue. Others, in different environments, might disagree.
> >
> > Jimmy Zhang wrote:
> >
> > Fernando,  The option for storing VTD in a separate file  is open.
> > I attached  the technical document from your last email, and am also
> > interested in the suggestions/comments from the mailing list ...
> >
> >
> >
> _______________________________________________
> Vtd-xml-users mailing list
> Vtd...@li...
> https://lists.sourceforge.net/lists/listinfo/vtd-xml-users