From: Tatu S. <cow...@ya...> - 2006-08-18 05:11:21
--- Jimmy Zhang <cra...@co...> wrote:

> One of the early emails from din sush asks about how
> to split a large file into smaller ones.
>
> Then I thought about it and felt that a better
> solution (than current VTD-XML or Woodstox)
> can indeed be built....
>
> The basic idea is to record only the offset and
> length of an element when splitting, so it doesn't
> need to read the whole thing into memory like
> VTD-XML did, nor does it need to perform
> decoding/re-encoding and string creation like Pull....
>
> After retaining the offset and length, just copy the
> file segment into separate files....
>
> What do you guys think?

For the specific splitting task that would be good. Maybe such a
utility could be written, perhaps being passed an XPath expression
defining where to split the file (like, defining the root nodes of the
resulting docs?). And you could probably use much of the VTD-XML code
as the core of such a tool?

That should be a very fast and memory-efficient solution.

-+ Tatu +-