From: Albert-jan R. <fo...@ya...> - 2008-07-30 07:30:43
Hi Dinu,

I also had MemoryErrors before with Febrl 0.3, and I resolved them by dividing the data. One of my blocking variables was date of birth, so I chopped my data into 12 pieces, one per month. Then I wrote a script to put the results back together. In my case it was even a blessing in disguise, because it allowed me to run one session on 12 different computers, slashing the run time from 24h to something like 3h.

HTH,
Albert-Jan

--- On Tue, 7/29/08, Dinu Corbu <d....@gr...> wrote:

> From: Dinu Corbu <d....@gr...>
> Subject: [Febrl-list] Isuues with Febrl 4.02
> To: feb...@li...
> Date: Tuesday, July 29, 2008, 11:07 AM
>
> Hi there,
>
> I did some work on exploring Febrl version 4.02 and the record linking
> literature, in view of preparing a deduplication project that I have to
> complete soon.
>
> Not everything went well, and I hope somebody can help me with some
> useful answers to the following list of issues:
>
> 1. It appeared that the installation process was successful. However,
>    when starting Febrl, the following message appears in the shell that
>    opens behind the GUI:
>
>      WARNING: root: Cannot import Numeric and PyML modules
>
> 2. When trying to run a de-duplication project on a dataset of 1,069,472
>    records, it progressed to 7%, and after that Febrl froze, with the
>    shell behind the GUI showing the message “MemoryError”.
>
> 3. When trying to run a de-duplication project on a dataset of 130,213
>    records, it progressed to 61% and gave the message “MemoryError”.
>
> 4. When playing with the data set of 10,000 records
>    “dataset_A_10000.csv” that is in the installation folder
>    “C:\Febrl4\febrl-0.4.02\data\dedup-dsgen”, everything worked.
>    However, I verified the results of several comparison functions
>    (among them Jaro and Winkler), and I found that Febrl gave values
>    that were lower than what I computed.
>
> I would be grateful to anyone who can give me some advice on how these
> issues can be solved.
>
> Regards,
>
> Dinu
>
> ---------------------------------------------------------------------
> Dinu Corbu
> Senior Research Assistant
>
> Ph. +61 (07) 373 55600
> Fax +61 (07) 373 56812
>
> Key Centre for Ethics, Law, Justice and Governance
> Griffith University
> Mt Gravatt campus
> Messines Ridge Road, Mt Gravatt, QLD, 4122, Australia
> ---------------------------------------------------------------------
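
For anyone wanting to try the divide-and-merge workaround Albert-Jan describes in his reply, here is a rough sketch of the idea in plain Python. It is not his script and does not use any Febrl API. The field name `date_of_birth`, the YYYYMMDD date layout, the `part_NN.csv` file names and all function names are assumptions made up for illustration. Note also that splitting this way only loses nothing if every candidate pair shares the same birth month, i.e. if date of birth is part of every blocking key you use; pairs that would only be found through some other blocking variable are dropped.

```python
import csv
from collections import defaultdict


def birth_month(dob):
    """Return the month (1-12) of a date string, or 0 if it cannot be parsed.

    Assumes a YYYYMMDD layout; this is an illustrative guess, adapt it to
    whatever format your own data uses.
    """
    digits = dob.strip()
    if len(digits) == 8 and digits.isdigit():
        month = int(digits[4:6])
        if 1 <= month <= 12:
            return month
    return 0  # bucket for missing or malformed dates


def split_by_month(rows, dob_field="date_of_birth"):
    """Group rows (dicts) into buckets keyed by birth month."""
    buckets = defaultdict(list)
    for row in rows:
        # Treat a missing field the same as an empty one.
        buckets[birth_month(row.get(dob_field) or "")].append(row)
    return dict(buckets)


def write_buckets(buckets, fieldnames, prefix="part"):
    """Write each bucket to its own CSV file and return the file names."""
    names = []
    for month, rows in sorted(buckets.items()):
        name = "%s_%02d.csv" % (prefix, month)
        with open(name, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        names.append(name)
    return names


def merge_results(result_lists):
    """Concatenate the per-bucket match results into one list."""
    merged = []
    for results in result_lists:
        merged.extend(results)
    return merged
```

Each file written by `write_buckets` can then be deduplicated as a separate, smaller Febrl project (on one machine in turn, or on several in parallel), and the per-file match lists passed to `merge_results` afterwards. Records with an unparseable date end up in bucket 0 and need to be handled separately.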