
NER algorithm

Help
2008-02-18
2013-04-01
  • Nobody/Anonymous

    How does Balie extract its entities? Does it use an algorithm, or is it a brute-force lookup through the lexicon lists provided?

     
    • Nobody/Anonymous

      Hello,

      It uses a mix of brute force, hand-made rules, and machine learning:

      > brute force lexicon lookup: in the superclass "NamedEntityRecognition.java"
      > hand-made rule system: in "NamedEntityRecognitionNerf.java", steps 1 to 10 except step 6
      > machine learning: in "NamedEntityRecognitionNerf.java", step 6
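
To make the first item concrete, here is a minimal sketch of what a brute-force lexicon lookup can look like. It is not Balie's code: the class `LexiconLookupSketch`, the `Candidate` type, the `maxLen` parameter, and the sample lexicon entries are all hypothetical and exist only for illustration. The sketch tries every token span at each position, longest first, and records every type the lexicon lists for a match, which is why later disambiguation steps are needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is NOT Balie's NamedEntityRecognition class.
public class LexiconLookupSketch {

    /** A candidate entity: token span [start, end) plus every type the lexicon lists for it. */
    public static final class Candidate {
        public final int start;
        public final int end;
        public final String text;
        public final Set<String> types;

        Candidate(int start, int end, String text, Set<String> types) {
            this.start = start;
            this.end = end;
            this.text = text;
            this.types = types;
        }
    }

    /**
     * Brute-force lookup: at each position, try spans of up to maxLen tokens,
     * longest first, and keep the first span that appears in the lexicon.
     */
    public static List<Candidate> lookup(List<String> tokens,
                                         Map<String, Set<String>> lexicon,
                                         int maxLen) {
        List<Candidate> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            boolean matched = false;
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 1; len--) {
                String span = String.join(" ", tokens.subList(i, i + len));
                Set<String> types = lexicon.get(span);
                if (types != null) {
                    found.add(new Candidate(i, i + len, span, types));
                    i += len; // continue after the matched span
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                i++; // no entry starts here; move to the next token
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up lexicon entries; an ambiguous entry carries several types.
        Map<String, Set<String>> lexicon = Map.of(
            "New York", Set.of("city", "state"),
            "Paris", Set.of("city", "first_name"));
        List<String> tokens = List.of("Paris", "visited", "New", "York");
        for (Candidate c : lookup(tokens, lexicon, 3)) {
            System.out.println(c.text + " [" + c.start + "," + c.end + ") types=" + c.types);
        }
    }
}
```

Ambiguous matches such as "Paris" above keep all their candidate types, so a lookup like this only proposes entities; it does not decide between types.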

      Excerpt from "NamedEntityRecognitionNerf.java":

                  // 1. check for entities ambiguous with common nouns
                  CheckEntityNounAmbiguityNerf();

                  // 2. temporary map final types
                  m_TokenList.MapNewNETypes(m_Mapping);
                 
                  // 3. check adjacent entities sharing a super-type (e.g., a first name and a last name)
                  PreCheckEntityBoundaryNerf();
                 
                  // 4. revert to intermediate types mapping
                  m_TokenList.MapNewNETypes(intermediateTagSet);

                  // 5. resolve alias network and fix some entity types
                  CheckEntityEntityAmbiguityNerf();

                  // 6. apply classifier if more than one type remains
                  ApplyEntityEntityClassifiersNerf();
                 
                  // 7. apply defensive rules for very ambiguous types
                  CheckVeryAmbiguousTypes();
                 
                  // 8. map final types
                  m_TokenList.MapNewNETypes(m_Mapping);
                 
                  // 9. check boundaries with disambiguated types
                  PostCheckEntityBoundaryNerf();
                 
                  // 10. resolve unknown capitalized words in alias network
                  CheckUnknownCapitalizedExt();
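
The general pattern in the excerpt, rules first and a classifier only "if more than one type remains", can be sketched as follows. This is an illustration under stated assumptions, not Balie's implementation: the class `StagedDisambiguationSketch`, the `resolve` method, the sample rule, and the placeholder classifier (which simply picks the alphabetically first type and is not a trained model) are all hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; none of these names come from Balie.
public class StagedDisambiguationSketch {

    /** Example hand-made rule (made up): discard the "first_name" reading. */
    public static final UnaryOperator<Set<String>> DROP_FIRST_NAME = types -> {
        Set<String> narrowed = new LinkedHashSet<>(types);
        narrowed.remove("first_name");
        return narrowed;
    };

    /** Stand-in for a trained classifier: picks the alphabetically first type. */
    public static final Function<Set<String>, String> PLACEHOLDER_CLASSIFIER =
        types -> Collections.min(types);

    /**
     * Applies each rule in order to narrow the candidate types, then calls the
     * classifier only if more than one type is still left.
     * Returns null when there are no candidate types at all.
     */
    public static String resolve(Set<String> candidateTypes,
                                 List<UnaryOperator<Set<String>>> rules,
                                 Function<Set<String>, String> classifier) {
        Set<String> remaining = new LinkedHashSet<>(candidateTypes);
        for (UnaryOperator<Set<String>> rule : rules) {
            Set<String> narrowed = rule.apply(remaining);
            // Ignore a rule that would eliminate every remaining type.
            if (!narrowed.isEmpty()) {
                remaining = narrowed;
            }
        }
        if (remaining.isEmpty()) {
            return null;
        }
        if (remaining.size() == 1) {
            return remaining.iterator().next(); // rules alone were enough
        }
        return classifier.apply(remaining); // fall back to the classifier
    }

    public static void main(String[] args) {
        Set<String> types = new LinkedHashSet<>(List.of("city", "first_name"));
        String resolved = resolve(types, List.of(DROP_FIRST_NAME), PLACEHOLDER_CLASSIFIER);
        System.out.println(resolved); // prints "city": the rule leaves a single type
    }
}
```

Keeping the classifier as a last resort means the cheap, predictable rules handle the common cases, and the learned component only sees entities that are still ambiguous.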

       
