Hi George,
I’m trying to run Meta-blocking on DBpedia (Clean-Clean; 2009vs2007).
Notes:
I’m running on an Intel Xeon (Ten-Core) E5-2670v2 2.50 GHz with 48GB RAM.
Workflow:
TokenBlocking -> SizeBlockPruning (-> ComparisonBasedBlockPruning) -> Meta-blocking WNP (Jaccard Similarity)
Since I had memory issues trying to yield the actual blocks with Meta-blocking, I’m using an OnTheFlyWNP (inspired by an answer you gave in another thread - I attach the file).
Now, the problem is that the running time is huge. I see from your thesis that the running time should be ~10 hours for the Materialisation + Restructure Time, but in my setting it takes more than 24 hours (I limited the execution time to 24 hours, so I can’t give the final running time).
Additional notes:
After the ComparisonBasedBlockPruning the aggregate cardinality is ~3.5E10, which is even less than the 5.68E10 baseline in your thesis. So, maybe I’m missing something.
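For readers following along, the on-the-fly idea in the workflow above can be sketched roughly as follows. This is not the attached file and not the framework's code; every class, method and array name here is a hypothetical stand-in. It assumes Clean-Clean blocks stored as plain int arrays, weights each pair with the Jaccard similarity of the two entities' block lists, and, as a simplification, prunes only from the D1 side, using the mean edge weight of each D1 entity's neighbourhood as the local threshold. Retained comparisons go straight to a callback instead of being materialised as new blocks.

```java
import java.util.HashMap;
import java.util.Map;

final class OnTheFlyWnpSketch {

    /** Hypothetical callback that receives each retained comparison immediately. */
    interface ComparisonConsumer {
        void accept(int d1Entity, int d2Entity, double weight);
    }

    /** Aggregate cardinality for Clean-Clean blocks: sum of |D1 part| * |D2 part| per block. */
    static long aggregateCardinality(int[][] d1InBlock, int[][] d2InBlock) {
        long total = 0;
        for (int b = 0; b < d1InBlock.length; b++) {
            total += (long) d1InBlock[b].length * d2InBlock[b].length;
        }
        return total;
    }

    /** Jaccard similarity of two block lists, given the number of blocks they share. */
    static double jaccard(int common, int blocksOfFirst, int blocksOfSecond) {
        return (double) common / (blocksOfFirst + blocksOfSecond - common);
    }

    /**
     * Simplified WNP pass over D1 neighbourhoods only (an assumption of this sketch).
     * blocksOfD1[e]     = ids of the blocks containing D1 entity e
     * d2InBlock[b]      = ids of the D2 entities in block b
     * blockCountOfD2[e] = number of blocks containing D2 entity e
     */
    static void prune(int[][] blocksOfD1, int[][] d2InBlock, int[] blockCountOfD2,
                      ComparisonConsumer out) {
        Map<Integer, Integer> commonBlocks = new HashMap<>();
        for (int e1 = 0; e1 < blocksOfD1.length; e1++) {
            // Count shared blocks with every co-occurring D2 entity.
            commonBlocks.clear();
            for (int b : blocksOfD1[e1]) {
                for (int e2 : d2InBlock[b]) {
                    commonBlocks.merge(e2, 1, Integer::sum);
                }
            }
            if (commonBlocks.isEmpty()) {
                continue;
            }
            // First pass: local threshold = mean weight of this neighbourhood.
            double sum = 0.0;
            for (Map.Entry<Integer, Integer> n : commonBlocks.entrySet()) {
                sum += jaccard(n.getValue(), blocksOfD1[e1].length, blockCountOfD2[n.getKey()]);
            }
            double threshold = sum / commonBlocks.size();
            // Second pass: emit the edges that reach the threshold; nothing is stored.
            for (Map.Entry<Integer, Integer> n : commonBlocks.entrySet()) {
                double w = jaccard(n.getValue(), blocksOfD1[e1].length, blockCountOfD2[n.getKey()]);
                if (w >= threshold) {
                    out.accept(e1, n.getKey(), w);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Toy input: 3 blocks, 2 entities in D1, 3 entities in D2.
        int[][] d1InBlock = {{0, 1}, {0}, {1}};
        int[][] d2InBlock = {{0}, {0, 1}, {2}};
        int[][] blocksOfD1 = {{0, 1}, {0, 2}};
        int[] blockCountOfD2 = {2, 1, 1};
        System.out.println("aggregate cardinality: " + aggregateCardinality(d1InBlock, d2InBlock));
        prune(blocksOfD1, d2InBlock, blockCountOfD2,
              (e1, e2, w) -> System.out.println(e1 + " - " + e2 + " : " + w));
    }
}
```

Memory stays bounded by the size of one neighbourhood, but every shared block is visited twice per entity, so the running time still grows with the aggregate cardinality of the input blocks.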
Thank you,
Giovanni
Hi Giovanni,
for some reason, I only saw your post just now. Do you still have problems with Meta-blocking? If yes, let me know how I can help you. Yesterday, I uploaded a new version of Meta-blocking that is much faster (https://sourceforge.net/p/erframework/svn/HEAD/tree/trunk/BlockingFramework/src/MetaBlocking/FastImplementations/).
Kind regards,
George
The theoretical background of the new Meta-blocking implementation is explained in this paper:
https://www.researchgate.net/publication/290324691_Scaling_Entity_Resolution_to_Large_Heterogeneous_Data_with_Enhanced_Meta-blocking
Combining the new implementation with Block Filtering (also presented in this paper) reduces the overhead time for processing DBpedia to a couple of hours.
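As a rough illustration of why Block Filtering helps, here is a minimal sketch that assumes it keeps each entity only in a fixed ratio of its smallest blocks, which shrinks the input before Meta-blocking runs. The class and method names, the ratio value, the rounding and the keep-at-least-one rule are all assumptions made for this example, not the framework's API; the paper linked above is the authoritative description.

```java
import java.util.Arrays;
import java.util.Comparator;

final class BlockFilteringSketch {

    /**
     * blocksOfEntity[e]   = ids of the blocks containing entity e
     * blockCardinality[b] = number of comparisons in block b
     * ratio               = share of each entity's blocks to keep (illustrative value)
     * Returns, per entity, the ids of its smallest blocks, in ascending cardinality order.
     */
    static int[][] filter(int[][] blocksOfEntity, long[] blockCardinality, double ratio) {
        int[][] filtered = new int[blocksOfEntity.length][];
        for (int e = 0; e < blocksOfEntity.length; e++) {
            // Box the ids so they can be sorted with a custom comparator.
            Integer[] sorted = Arrays.stream(blocksOfEntity[e]).boxed().toArray(Integer[]::new);
            Arrays.sort(sorted, Comparator.comparingLong((Integer b) -> blockCardinality[b]));
            // Keep at least one block for any entity that has one (assumption of this sketch).
            int keep = Math.min(sorted.length,
                                Math.max(1, (int) Math.round(ratio * sorted.length)));
            filtered[e] = new int[keep];
            for (int i = 0; i < keep; i++) {
                filtered[e][i] = sorted[i];
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        int[][] blocksOfEntity = {{0, 1, 2, 3}, {2}};
        long[] blockCardinality = {100, 4, 50, 9};
        int[][] result = filter(blocksOfEntity, blockCardinality, 0.5);
        for (int[] blocks : result) {
            System.out.println(Arrays.toString(blocks));
        }
    }
}
```

Dropping an entity from its largest blocks removes the comparisons that are least likely to be matches, so the later pruning step has far fewer edges to weigh.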