From: Rodrigo C. <rn...@gm...> - 2007-02-14 01:40:42
Hello there!

About 10 months ago I started a topic on the discussion forum about the need for true random access and location storing in VTD-XML. Currently we only have a push/pop interface. At the time I had no compelling reason to advise such a change, from a purely stack-oriented approach to a more flexible one. I changed the API, but my changes were not merged into the project, both for lack of a compelling reason and because of a somewhat bad design.

Now, after using VTD-XML for a few months to work with huge and complex files, I have a reason: position caching. Let me give an example, taken from a real problem I faced:

    <document>
      [...]
      [...]
      <nes>
        <ne>
          <name>XPTO</name>
          [...]complex structure describing NE[...]
          <level1>
            <level2>
              [....]
              <nice_indexing_attribute>
            </level2>
          </level1>
        </ne>
        [...] a bunch of NEs... [...]
        [...]
        [...]
        [...]
      </nes>
      [...]
      [...]
      [...]
      <tpaths>
        <tpath>
          [...]complex structure describing tpath[...]
          <level1>
            <a few more levels>
              <level4>
                [....]
                <nice_indexing_attribute>
                <pointer to nice NE attribute>
                [....]
              </level4>
          </level1>
          <level1>
            <a few more levels>
              <level4>
                [....]
                <pointer to nice NE attribute>
                [....]
              </level4>
          </level1>
        </tpath>
        [...] more paths... [...]
      </tpaths>
      [...]
    </document>

In order to navigate the file efficiently and produce interactive results, I was forced to maintain position caches for both <ne> and <tpath>, indexed by those nice, deeply nested attributes. For example, a task that took 36 seconds using unaided navigation can now be done in 1 or 2 seconds.

I had previously changed the API to allow multiple stacks and context export but, as previously mentioned, keeping a context unrelated to a VTDNav object does not make much sense.
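To make the caching idea concrete, here is a minimal sketch in plain Java. It is not VTD-XML code: the list stands in for the parsed document, the class Ne and both method names are made up for illustration, and a plain integer stands in for whatever a real saved location would hold.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PositionCacheSketch {
    // Hypothetical stand-in for one <ne> element; only its deeply nested key is modelled.
    static final class Ne {
        final String key;
        Ne(String key) { this.key = key; }
    }

    // Unaided navigation: visit every <ne> until the key matches.
    static int findByScan(List<Ne> nes, String key) {
        for (int pos = 0; pos < nes.size(); pos++) {
            if (nes.get(pos).key.equals(key)) {
                return pos;
            }
        }
        return -1;
    }

    // One pass over the document, recording where each key lives.
    static Map<String, Integer> buildCache(List<Ne> nes) {
        Map<String, Integer> cache = new HashMap<>();
        for (int pos = 0; pos < nes.size(); pos++) {
            cache.put(nes.get(pos).key, pos);
        }
        return cache;
    }

    public static void main(String[] args) {
        List<Ne> nes = new ArrayList<>();
        for (int i = 0; i < 100000; i++) {
            nes.add(new Ne("ne-" + i));
        }
        Map<String, Integer> cache = buildCache(nes);

        int viaScan = findByScan(nes, "ne-99999");          // walks the whole list
        int viaCache = cache.getOrDefault("ne-99999", -1);  // single hash lookup
        System.out.println(viaScan + " " + viaCache);       // prints "99999 99999"
    }
}
```

The scan costs one full walk per lookup, while the cache pays for a single walk up front. That is the whole trick, but it only works if a stored position can later be jumped back to directly.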
Perhaps a better interface would be something like:

    NavContext VTDNav.getCtx();             // returns a context
    boolean VTDNav.setPos(NavContext ctx);  // sets internal navigation registers from context
    VTDNav NavContext.getNav();             // gets the VTDNav object this context belongs to

The context would internally point at its VTDNav, so that each could check the other when needed. setPos could throw an exception if an unrelated context is passed in, or it could simply return "false". Additionally, contexts should support the interfaces needed to keep them in hash tables efficiently, for example... but that's not normally a problem.

I'm currently using this kind of approach for caching and true random access with my previous interface that exported multiple stacks, but that is cumbersome, heavy and error-prone. A lighter interface like the one I'm proposing now, and better implemented, would be much better, and cleaner too.

Any comments?

--
Rodrigo Cunha
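P.S. A rough sketch of how the three proposed calls could fit together, under heavy assumptions: ToyNav is a made-up stand-in for VTDNav whose whole cursor is reduced to a single token index, and moveTo(), current() and position() exist only for this illustration. Only getCtx, setPos and getNav come from the proposal above, and here setPos takes the "return false" option rather than throwing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for VTDNav; the cursor is simplified to one token index.
final class ToyNav {
    private int cursor;

    void moveTo(int tokenIndex) { cursor = tokenIndex; }
    int current() { return cursor; }

    // Proposed getCtx(): snapshot the current position.
    NavContext getCtx() { return new NavContext(this, cursor); }

    // Proposed setPos(): restore a snapshot, refusing contexts owned by another navigator.
    boolean setPos(NavContext ctx) {
        if (ctx == null || ctx.getNav() != this) {
            return false;
        }
        cursor = ctx.position();
        return true;
    }
}

// Immutable, with equals/hashCode, so contexts behave well inside hash tables.
final class NavContext {
    private final ToyNav owner;
    private final int position;

    NavContext(ToyNav owner, int position) {
        this.owner = owner;
        this.position = position;
    }

    // Proposed getNav(): the navigator this context belongs to.
    ToyNav getNav() { return owner; }
    int position() { return position; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof NavContext)) {
            return false;
        }
        NavContext other = (NavContext) o;
        return owner == other.owner && position == other.position;
    }

    @Override
    public int hashCode() {
        return 31 * System.identityHashCode(owner) + position;
    }
}

class NavContextDemo {
    public static void main(String[] args) {
        ToyNav nav = new ToyNav();
        Map<String, NavContext> cache = new HashMap<>();

        nav.moveTo(42);                  // pretend this is the <ne> named XPTO
        cache.put("XPTO", nav.getCtx());
        nav.moveTo(7);                   // wander off somewhere else

        boolean restored = nav.setPos(cache.get("XPTO"));
        System.out.println(restored + " " + nav.current());       // prints "true 42"

        ToyNav other = new ToyNav();
        System.out.println(other.setPos(cache.get("XPTO")));      // prints "false"
    }
}
```

Because the context carries its owner, a mismatch is caught at setPos time instead of silently corrupting the cursor of another navigator.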