Is the statement below a typo about FileAccessData? Should it have said that AccessData uses main memory?
 
I am using FileAccessData to keep track of state between crawls.
 
Should I switch from FileAccessData to RepositoryAccessData?
 
From: Christiaan Fluit <christiaan.fluit@ad...> - 2007-03-16 03:28
> I'll see if I can find where the bug comes from. But for now, FileAccessData
> is ok. I'm not even sure I need the ModelAccessData; maybe something DB-based
> will be used later. I'll look for the bug, but I'm new to this whole stack so
> I'm not sure I can :)

The big drawback of FileAccessData is that it keeps all information in
main memory. For crawling file systems this can be a lot but still
doable on a decent machine. For web crawling it will definitely not
scale because WebCrawler uses the AccessData instance to keep track of
the entire hypertext graph structure, i.e. not only the visited pages
but also the links between them. This can blow up your application quite
easily.
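
To make the memory concern concrete, here is a minimal sketch of what
keeping a link graph in main memory amounts to. It is not Aperture code:
the class InMemoryLinkGraph and all of its methods are hypothetical and
only illustrate that storage grows with the number of links as well as
the number of visited pages.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is NOT Aperture's AccessData API.
class InMemoryLinkGraph {
    // One entry per visited page, each holding the set of URLs it links to.
    private final Map<String, Set<String>> outgoingLinks = new HashMap<>();

    // Record a visited page together with the links found on it.
    void recordPage(String url, Collection<String> links) {
        outgoingLinks.computeIfAbsent(url, k -> new HashSet<>()).addAll(links);
    }

    // True if the page itself has been visited (not merely linked to).
    boolean isKnown(String url) {
        return outgoingLinks.containsKey(url);
    }

    // Number of visited pages held in memory.
    int pageCount() {
        return outgoingLinks.size();
    }

    // Number of link edges held in memory, on top of the pages themselves.
    long linkCount() {
        long total = 0;
        for (Set<String> targets : outgoingLinks.values()) {
            total += targets.size();
        }
        return total;
    }

    public static void main(String[] args) {
        InMemoryLinkGraph graph = new InMemoryLinkGraph();
        graph.recordPage("http://example.org/a",
                Arrays.asList("http://example.org/b", "http://example.org/c"));
        graph.recordPage("http://example.org/b",
                Arrays.asList("http://example.org/a"));
        System.out.println(graph.pageCount() + " pages, "
                + graph.linkCount() + " links");
    }
}
```

Every page adds a map entry and every link adds a string to a set, so
on a large web crawl the edge count, not the page count, tends to
dominate the heap.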

ModelAccessData works on any Model, i.e. also on a Model wrapping a
NativeStore. This prevents such large memory requirements.

Note that there is also a RepositoryAccessData class in src/examples,
which is an AccessData implementation operating directly on top of a
Repository, thus skipping the Model abstraction layer. You may want to
give that a try. We (Aduna) use our own tweaked version of this class.


Regards,

Chris

