List of entities that are not part of the purged blocks

Gerard
2014-10-19
2014-10-21
  • Gerard

    Gerard - 2014-10-19

    Hi George,

    Is there an easy way of constructing a list of entity profiles that were not recommended for a match?

    For example, let's take a dataset with 100 entities. After a BlockingFramework run, we get a list of 60 entities that are recommended for a further entity matching process to decide whether they are duplicates or not.

    I am looking for a list of those 40 entities that are not part of the purged blocks. Does the program keep track of this list or do I need to write a method to exclude the final entities from the original entity profiles to get the list of unmatched entities?

    Best Regards,

    Gerard

     
  • Anonymous

    Anonymous - 2014-10-21

    Hi Gerard,

    The method Utilities.BlockStatistics.getEntities() does the opposite of what you are describing: given a block collection, it identifies the entities that are placed in at least one block. The entity ids of the profiles you are looking for are those that increment the variables singletonEntities (Dirty ER) and singletonEntitiesD1, singletonEntitiesD2 (Clean-Clean ER).
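The exclusion approach Gerard mentions can be sketched roughly as follows. This is only an illustration, not code from the framework: the class UnblockedEntities and the method findUnblocked are hypothetical names, each block is represented as a plain int[] of entity ids, and the ids are assumed to be consecutive integers from 0 to totalEntities - 1. For Clean-Clean ER the same idea would presumably be applied once per dataset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of the framework.
class UnblockedEntities {

    // Returns the ids (0 .. totalEntities - 1) that appear in none of the given blocks.
    // Assumption: each int[] holds the entity ids placed in one block.
    static List<Integer> findUnblocked(int totalEntities, List<int[]> blocks) {
        // Mark every entity that occurs in at least one block.
        BitSet inSomeBlock = new BitSet(totalEntities);
        for (int[] block : blocks) {
            for (int entityId : block) {
                inSomeBlock.set(entityId);
            }
        }

        // Collect the unmarked ids, i.e. the complement of the marked set.
        List<Integer> unblocked = new ArrayList<>();
        for (int id = inSomeBlock.nextClearBit(0);
                id < totalEntities;
                id = inSomeBlock.nextClearBit(id + 1)) {
            unblocked.add(id);
        }
        return unblocked;
    }

    public static void main(String[] args) {
        // Five entities; entities 0, 1 and 3 occur in some block.
        List<int[]> blocks = Arrays.asList(new int[]{0, 1}, new int[]{1, 3});
        System.out.println(findUnblocked(5, blocks)); // prints [2, 4]
    }
}
```

In the 100-entity example from the question, passing 100 and the purged blocks to such a helper would yield the 40 ids that were left out of every block.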

    Hope this helps.

    Best regards,
    George

     
