Welcome to Help
I'm part of a project that wants to use your open source febrl project to deduplicate and link a large number of addresses stored in a database.
We would like to share our improvements (if any) and our experience with you, but first we need some information, because our machine hung while deduplicating 5 million records.
We are using one machine with two Intel Xeon 2.4 GHz processors and 2 GB of RAM, running Windows 2000 Server. The hard disk where we put the shelve files has 126 GB free.
The records are in SQL Server on the same machine.
Using shelve, we now put on disk every dictionary that was previously written to memory, because in earlier tests the machine became very slow during the comparison step due to RAM usage.
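To make the approach concrete, here is a minimal sketch of keeping a dictionary on disk with Python's standard shelve module. It is not the febrl code itself: the file name, the helper functions and the sample records are all hypothetical and only illustrate the idea.

```python
import os
import shelve
import tempfile


def store_records(path, records):
    """Write (record_id, record) pairs to an on-disk shelf instead of a dict in RAM."""
    with shelve.open(path) as db:
        for rec_id, rec in records:
            # shelve keys must be strings, so the numeric id is converted.
            db[str(rec_id)] = rec


def load_record(path, rec_id):
    """Read a single record back from the shelf without loading the rest."""
    with shelve.open(path, flag="r") as db:
        return db[str(rec_id)]


# Hypothetical sample data standing in for address records.
tmp_dir = tempfile.mkdtemp()
shelf_path = os.path.join(tmp_dir, "records_shelf")

store_records(shelf_path, [
    (1, {"street": "Via Roma 1", "city": "Roma"}),
    (2, {"street": "Via Milano 2", "city": "Milano"}),
])

first = load_record(shelf_path, 1)
print(first["city"])
```

Every access goes through the disk, so this trades RAM usage for I/O time on each lookup.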
During our 5 million record test the machine apparently stopped after nearly 19 hours (the resulting shelve file was 5 GB), i.e. it became unstable and very slow to respond, so we killed the process.
During a 500 thousand record deduplication test the loading step took 25 minutes, so I think loading 5 million records shouldn't have taken more than 5-6 hours.
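The estimate can be checked with a small calculation. This sketch assumes that loading time grows linearly with the number of records, which is only an assumption; the function name is hypothetical.

```python
def estimate_minutes(known_records, known_minutes, target_records):
    """Scale a measured time linearly to a larger number of records."""
    return known_minutes * target_records / known_records


# Measured: 500 thousand records loaded in 25 minutes.
estimated_minutes = estimate_minutes(500_000, 25, 5_000_000)
estimated_hours = estimated_minutes / 60

print(f"{estimated_minutes:.0f} minutes, about {estimated_hours:.1f} hours")
```

Under that assumption the load alone comes to roughly 4 hours, which is within the 5-6 hour bound and far below the 19 hours observed.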
Unfortunately we have no written log because we switched off every "useless" resource.
We would like to know whether you have experience with this number of records, so that we can tell whether the problem was febrl or the machine.
Thanks in advance,
Simone Capra - Italy