
#308 hfst-proc --weight-classes seems to just pick the first analysis

future
open
nobody
1
2015-08-27
2015-08-27
No

As

    $ echo allaskuvllain guollebivdu| hfst-proc -W sme-nob.automorf.hfst |tr '/ ' '\n' 
    ^allaskuvllain
    alas<n><sem_body><sggencmp><cmp>+kuvlla<n><pl><loc>~10~
    alas<n><sem_body><sggencmp><cmp>+kuvlla<n><sg><com>~10~
    allaskuvla<n><sem_edu_org><pl><loc>~0~
    allaskuvla<n><sem_edu_org><sg><com>~0~
    allat<adj><attr><cmp>+skuvla<n><sem_edu_org><pl><loc>~10~
    allat<adj><attr><cmp>+skuvla<n><sem_edu_org><sg><com>~10~$
    ^guollebivdu
    guollebivdu<n><sem_act><sg><nom>~0~
    guolli<n><sem_ani><sgnomcmp><cmp>+bivdu<n><sem_act><sg><nom>~10~$

shows, this fst has weights that should lower the priority of any analysis with a <cmp> tag.

However, when we use hfst-proc's --weight-classes 1 to select only the best analyses, it seems to pick the first weight class that appears in the output rather than the lowest-weight one:

    $ echo allaskuvllain guollebivdu| hfst-proc --weight-classes 1 sme-nob.automorf.hfst |tr '/ ' '\n'
    ^allaskuvllain
    alas<n><sem_body><sggencmp><cmp>+kuvlla<n><pl><loc>
    alas<n><sem_body><sggencmp><cmp>+kuvlla<n><sg><com>$
    ^guollebivdu
    guollebivdu<n><sem_act><sg><nom>$

(so for each word, it picks the first set of analyses that have the same weight?)

Discussion

  • Kevin Brubeck Unhammer

    hfst-proc/formatter.cc says

      for(int i=0;i<maxAnalyses && it != new_finals.end();i++,it++) {
          // this condition filters out > maxWeightClasses
          if(dynamic_cast<const LookupPathW*>(*it) != NULL) { // if we actually have weights
              Weight current_weight = dynamic_cast<const LookupPathW*>(*it)->get_weight();
              if (classes_found == -1) { // we're just starting
                  classes_found = 1;
                  last_weight_class = current_weight;
              } else if (last_weight_class != current_weight) { // we might want to ignore the rest due to weight classes
                  last_weight_class = current_weight;
                  ++classes_found;
              }
              if (classes_found > maxWeightClasses) {
                  break; // don't insert any more
              }
          }
          clipped_finals.insert(*it);
    

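    So it should've sorted the analyses by weight first: the loop starts a new weight class every time the weight changes in iteration order, so on unsorted input it keeps whichever class comes first rather than the lowest-weight one.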

     
  • Kevin Brubeck Unhammer

    should be fixed in -r4426
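
The sort-then-clip idea from the discussion above can be sketched as follows. This is only an illustration under stated assumptions, not the change made in r4426 (which is not shown in this ticket): `Analysis` and `clip_by_weight_classes` are hypothetical names, and a plain `std::vector` stands in for hfst-proc's own containers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one weighted analysis; not an hfst-proc type.
struct Analysis {
    std::string form;
    float weight;
};

// Keep only the analyses that fall into the `max_weight_classes` lowest
// distinct weights (hypothetical helper, not hfst-proc's API).
std::vector<Analysis> clip_by_weight_classes(std::vector<Analysis> analyses,
                                             int max_weight_classes) {
    assert(max_weight_classes >= 0);

    // Sort by ascending weight before counting classes; stable_sort keeps
    // the original relative order among analyses of equal weight.
    std::stable_sort(analyses.begin(), analyses.end(),
                     [](const Analysis& a, const Analysis& b) {
                         return a.weight < b.weight;
                     });

    std::vector<Analysis> kept;
    int classes_found = 0;
    float last_weight = 0.0f;
    for (const Analysis& a : analyses) {
        // A new weight class starts at the first item and at every
        // change of weight in the now sorted sequence.
        if (classes_found == 0 || a.weight != last_weight) {
            last_weight = a.weight;
            ++classes_found;
        }
        if (classes_found > max_weight_classes) {
            break; // everything after this point is heavier
        }
        kept.push_back(a);
    }
    return kept;
}
```

Called with the allaskuvllain analyses in the order they appear in the report and `max_weight_classes` set to 1, this sketch keeps the two weight-0 `allaskuvla` readings instead of the `alas...+kuvlla` compounds, because the class count only starts after the sort.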

     