From: Dinu C. <d....@gr...> - 2008-07-29 09:07:39
<FONT face="Default Sans Serif,Verdana,Arial,Helvetica,sans-serif" size=2><DIV><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT face="Default Serif,Times New Roman,Times,serif" size=3>Hi there,</FONT></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT size=3><FONT face="Default Serif,Times New Roman,Times,serif">I have been exploring Febrl version 0.4.02 and the record linkage literature in preparation for a deduplication project that I have to complete soon. <SPAN style="mso-spacerun: yes"> </SPAN></FONT></FONT></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT face="Default Serif,Times New Roman,Times,serif" size=3>Not everything went well, and I hope somebody can help me with the following issues:</FONT></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><OL style="MARGIN-TOP: 0cm" type=1> <LI class=MsoNormal style="MARGIN: 0cm 0cm 0pt; mso-list: l0 level1 lfo1; tab-stops: list 36.0pt"><FONT face="Default Serif,Times New Roman,Times,serif" size=3>The installation appeared to be successful. 
However, when Febrl starts, the following message appears in the shell window that opens behind the GUI: </FONT></LI></OL><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt 18pt"><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt 18pt"><FONT size=3><FONT face="Default Serif,Times New Roman,Times,serif"><SPAN style="mso-spacerun: yes"> </SPAN><B style="mso-bidi-font-weight: normal">WARNING: root: Cannot import Numeric and PyML modules <o:p></o:p></B></FONT></FONT></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt 18pt"><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><OL style="MARGIN-TOP: 0cm" type=1 start=2> <LI class=MsoNormal style="MARGIN: 0cm 0cm 0pt; mso-list: l0 level1 lfo1; tab-stops: list 36.0pt"><FONT size=3><FONT face="Default Serif,Times New Roman,Times,serif"><SPAN style="mso-spacerun: yes"> </SPAN>When I tried to run a de-duplication project on a dataset of 1,069,472 records, it progressed to 7% and then Febrl froze, with the shell behind the GUI showing the message “MemoryError”.</FONT></FONT></LI></OL><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt 18pt"><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><OL style="MARGIN-TOP: 0cm" type=1 start=3> <LI class=MsoNormal style="MARGIN: 0cm 0cm 0pt; mso-list: l0 level1 lfo1; tab-stops: list 36.0pt"><FONT face="Default Serif,Times New Roman,Times,serif" size=3>When I tried to run a de-duplication project on a dataset of 130,213 records, it progressed to 61% and then gave the message “MemoryError”.</FONT></LI></OL><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><OL style="MARGIN-TOP: 0cm" type=1 start=4> <LI class=MsoNormal style="MARGIN: 0cm 0cm 0pt; mso-list: l0 level1 lfo1; tab-stops: list 36.0pt"><FONT size=3><FONT face="Default Serif,Times New Roman,Times,serif">When working with the data set of 10,000 records 
“dataset_A_10000.csv”, located in the installation folder “C:\Febrl4\febrl-0.4.02\data\dedup-dsgen”, everything worked. However, when I checked the results of several comparison functions (among them Jaro and Winkler), I found that Febrl gave values lower than the ones I computed myself.<SPAN style="mso-spacerun: yes"> </SPAN></FONT></FONT></LI></OL><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT face="Default Serif,Times New Roman,Times,serif" size=3>I would be grateful to anyone who can give me some advice on how these issues can be resolved.</FONT></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT face="Default Serif,Times New Roman,Times,serif" size=3>Regards,</FONT></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT face="Default Serif,Times New Roman,Times,serif" size=3>Dinu</FONT></P><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><o:p><FONT face="Default Serif,Times New Roman,Times,serif" size=3> </FONT></o:p></P><BR><DIV>---------------------------------------------------------------------<BR>Dinu Corbu<BR>Senior Research Assistant<BR><BR>Ph. +61 (07) 373 55600 <BR>Fax +61 (07) 373 56812<BR><BR>Key Centre for Ethics, Law, Justice and Governance<BR>Griffith University<BR>Mt Gravatt campus<BR>Messines Ridge Road, Mt Gravatt, QLD, 4122, Australia<BR>---------------------------------------------------------------------<BR></DIV></DIV></FONT>
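For anyone wanting to reproduce the comparison in item 4, here is a minimal Python sketch of the commonly published Jaro and Winkler definitions. It is not Febrl's implementation: the function names `jaro` and `winkler`, the matching window of half the longer string's length minus one, the prefix weight of 0.1 and the four-character prefix limit are all assumptions taken from the usual textbook formulation. Implementations that vary any of these details can return different values for the same pair of strings, so a gap between two tools does not by itself show which one is wrong.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1] (textbook formulation, not Febrl's code)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Assumed window: equal characters count as matching only if their
    # positions differ by at most this many places.
    window = max(max(len1, len2) // 2 - 1, 0)

    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that line up in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3.0


def winkler(s1: str, s2: str,
            prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Winkler's adjustment: boost the Jaro value for a shared prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


if __name__ == "__main__":
    print(round(jaro("MARTHA", "MARHTA"), 4))     # 0.9444
    print(round(winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Running the same string pairs through this sketch and through Febrl, and noting where the values diverge, may help narrow down which detail differs.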