How does Balie extract its entities? Does it use an algorithm, or is it a brute-force search through the lexicon lists provided?
Hello,
It uses a mix of brute force, hand-made rules, and machine learning:
> brute-force lexicon lookup: in the superclass "NamedEntityRecognition.java"
> hand-made rule system: in "NamedEntityRecognitionNerf.java", steps 1 to 10, except step 6
> machine learning: in "NamedEntityRecognitionNerf.java", step 6
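To make the first item concrete, here is a minimal sketch of what a brute-force lexicon lookup can look like. It is not taken from Balie: the class name, the method names, the toy lexicon entries, and the type labels (CITY, FIRST_NAME) are all assumptions made for illustration. The point is only that every token is looked up and all candidate types are kept, so an ambiguous word ends up with several types for the later steps to resolve.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Balie's code: every name and entry here is an assumption.
final class LexiconLookupSketch {

    // Toy lexicon: lower-cased surface form -> entity types it may carry.
    static Map<String, Set<String>> buildLexicon() {
        Map<String, Set<String>> lexicon = new HashMap<>();
        addEntry(lexicon, "paris", "CITY");
        addEntry(lexicon, "paris", "FIRST_NAME");
        addEntry(lexicon, "ottawa", "CITY");
        return lexicon;
    }

    static void addEntry(Map<String, Set<String>> lexicon, String form, String type) {
        lexicon.computeIfAbsent(form, k -> new TreeSet<>()).add(type);
    }

    // Brute-force pass: look up every token and keep all candidate types.
    static List<Set<String>> lookup(List<String> tokens, Map<String, Set<String>> lexicon) {
        List<Set<String>> candidates = new ArrayList<>();
        for (String token : tokens) {
            String key = token.toLowerCase(Locale.ROOT);
            candidates.add(lexicon.getOrDefault(key, new TreeSet<>()));
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Paris", "visited", "Ottawa");
        System.out.println(lookup(tokens, buildLexicon()));
        // prints [[CITY, FIRST_NAME], [], [CITY]]
    }
}
```

In this toy run "Paris" keeps two candidate types, which is the kind of ambiguity the rule steps and the classifier step are there to resolve.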
Excerpt from "NamedEntityRecognitionNerf.java":
// 1. check for entities ambiguous with common nouns
CheckEntityNounAmbiguityNerf();
// 2. temporarily map to final types
m_TokenList.MapNewNETypes(m_Mapping);
// 3. check adjacent entities sharing a super-type (e.g., a first name and a last name)
PreCheckEntityBoundaryNerf();
// 4. revert to intermediate types mapping
m_TokenList.MapNewNETypes(intermediateTagSet);
// 5. resolve alias network and fix some entity types
CheckEntityEntityAmbiguityNerf();
// 6. apply classifier if more than one type remains
ApplyEntityEntityClassifiersNerf();
// 7. apply defensive rules for very ambiguous types
CheckVeryAmbiguousTypes();
// 8. map final types
m_TokenList.MapNewNETypes(m_Mapping);
// 9. check boundaries with disambiguated types
PostCheckEntityBoundaryNerf();
//10. resolve unknown capitalized words in alias network
CheckUnknownCapitalizedExt();
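
The ordering in the excerpt (rules first, classifier only "if more than one type remains") can be sketched roughly as follows. This is an illustration under stated assumptions, not Balie's implementation: the class, the single "Mr." rule, the type labels, and the toy classifier are all hypothetical stand-ins for the real rule system and trained model.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical sketch, not Balie's code: names, rule, and classifier are assumptions.
final class DisambiguationSketch {

    // Assumed hand-made rule: after the title "Mr.", drop the CITY reading
    // as long as another candidate type is left.
    static Set<String> applyRules(Set<String> candidates, String previousToken) {
        Set<String> remaining = new TreeSet<>(candidates);
        if ("Mr.".equals(previousToken) && remaining.size() > 1) {
            remaining.remove("CITY");
        }
        return remaining;
    }

    // Rules run first; the classifier is consulted only if more than one type remains.
    static String resolve(Set<String> candidates, String previousToken,
                          Function<Set<String>, String> classifier) {
        Set<String> remaining = applyRules(candidates, previousToken);
        if (remaining.isEmpty()) {
            return "NONE"; // no lexicon hit, so nothing to disambiguate
        }
        if (remaining.size() == 1) {
            return remaining.iterator().next();
        }
        return classifier.apply(remaining);
    }

    // Stand-in for a trained model: it simply prefers CITY when available.
    static String toyClassifier(Set<String> remaining) {
        return remaining.contains("CITY") ? "CITY" : remaining.iterator().next();
    }

    public static void main(String[] args) {
        Set<String> paris = new TreeSet<>(Arrays.asList("CITY", "FIRST_NAME"));
        System.out.println(resolve(paris, "Mr.", DisambiguationSketch::toyClassifier)); // prints FIRST_NAME
        System.out.println(resolve(paris, "in", DisambiguationSketch::toyClassifier));  // prints CITY
    }
}
```

In the first call the rule alone settles the type, so the classifier is never invoked; in the second call no rule applies and the decision falls through to the classifier, mirroring the role of step 6 among the rule-based steps.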