Is there an easy way of constructing a list of entity profiles that were not recommended for a match?
For example, take a dataset with 100 entities. After a BlockingFramework run, we get a list of 60 entities that are recommended for the subsequent entity matching step, which decides whether they are duplicates or not.
I am looking for a list of the 40 entities that are not part of the purged blocks. Does the program keep track of this list, or do I need to write a method that excludes the final entities from the original entity profiles to get the list of unmatched entities?
Best Regards,
Gerard
The method Utilities.BlockStatistics.getEntities() does the opposite of what you are describing: given a block collection, it identifies the entities that are placed in at least one block. The profiles you are looking for are those whose entity ids increment the variables singletonEntities (Dirty ER) and singletonEntitiesD1, singletonEntitiesD2 (Clean-Clean ER).
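If you end up writing the exclusion step yourself, it could look roughly like the sketch below. This is not code from the framework: the class UnblockedEntities, the method findUnblocked, and the modelling of each block as a plain int[] of entity ids numbered from 0 are all assumptions made for illustration. It marks every entity id that occurs in some block and then collects the ids that were never marked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of the framework's API.
class UnblockedEntities {

    // Returns the ids in [0, totalEntities) that appear in none of the blocks.
    // Assumption: each block is a plain int[] of entity ids, numbered from 0.
    static List<Integer> findUnblocked(int totalEntities, List<int[]> blocks) {
        // Mark every entity that is placed in at least one block.
        BitSet blocked = new BitSet(totalEntities);
        for (int[] block : blocks) {
            for (int entityId : block) {
                blocked.set(entityId);
            }
        }

        // Collect the ids that were never marked.
        List<Integer> unblocked = new ArrayList<>();
        for (int id = blocked.nextClearBit(0); id < totalEntities; id = blocked.nextClearBit(id + 1)) {
            unblocked.add(id);
        }
        return unblocked;
    }

    public static void main(String[] args) {
        // Toy input: 5 entities, two blocks covering entities 0, 1 and 3.
        List<int[]> blocks = Arrays.asList(new int[]{0, 1}, new int[]{1, 3});
        System.out.println(findUnblocked(5, blocks)); // prints [2, 4]
    }
}
```

For Clean-Clean ER, the same idea would presumably be applied once per dataset, with a separate id range and a separate set of marked ids for each of the two sources.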
Hope this helps.
Best regards,
George