
#118 test if MAX_COMBINATIONS can be increased without too much slow-down

closed
nobody
2017-04-18
2017-03-03
No

In lttoolbox/fst_processor.cc, from line 763 onwards, we have:

FSTProcessor::compoundAnalysis(wstring input_word, bool uppercase, bool firstupper) {
  const int MAX_COMBINATIONS = 500;
  …
    if(current_state.size() > MAX_COMBINATIONS) {

Was this limit picked out of the air, or was it well tested? Presumably computers are faster now than when it was first written, so it may be ripe for increasing. We should compare time and memory usage with e.g. 500 vs 1000 vs 2000 vs 4000 on a large corpus and with different, large analysers (nob, deu, others?), and possibly increase it to a size that doesn't hurt too much.
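
The comparison described above could be scripted roughly as follows. This is only a sketch under assumptions: MAX_COMBINATIONS is a compile-time constant, so each value needs its own build of lt-proc, and the build-N/lt-proc paths below are hypothetical names for those builds. The lt-proc -e invocation is the one used for compound analysis; the helper names measure and compare are made up for illustration. It relies on os.wait4, so it is Unix-only, and needs Python 3.9 or later.

```python
import os
import subprocess
import sys
import time


def measure(cmd, corpus_path):
    """Run cmd with the corpus on stdin; return (seconds, peak RSS, exit code)."""
    with open(corpus_path, "rb") as corpus:
        start = time.perf_counter()
        proc = subprocess.Popen(cmd, stdin=corpus, stdout=subprocess.DEVNULL)
        # os.wait4 reports resource usage for this one child only.
        _, status, usage = os.wait4(proc.pid, 0)
        elapsed = time.perf_counter() - start
    # Record the exit code on the Popen object, since we reaped the child ourselves.
    proc.returncode = os.waitstatus_to_exitcode(status)
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    return elapsed, usage.ru_maxrss, proc.returncode


def compare(builds, analyser, corpus_path):
    """builds maps a MAX_COMBINATIONS value to the lt-proc binary compiled with it."""
    results = {}
    for limit, binary in sorted(builds.items()):
        results[limit] = measure([binary, "-e", analyser], corpus_path)
    return results


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical build locations: one lt-proc per recompiled value of the constant.
    builds = {n: f"./build-{n}/lt-proc" for n in (500, 1000, 2000, 4000)}
    analyser, corpus = sys.argv[1], sys.argv[2]
    for limit, (secs, rss, code) in compare(builds, analyser, corpus).items():
        print(f"{limit}\t{secs:.2f}s\t{rss}\texit={code}")
```

It would be run with the compiled analyser and the corpus as its two arguments, for example deu.automorf.bin and a plain-text dump of dewiki. A single run per value is noisy, so repeating each measurement a few times and comparing medians would give a fairer picture.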

Discussion

  • Venkat Parthasarathy

    From reading the source code, I understand that this function (compoundAnalysis) is called when lt-proc -e is run. So the task is to run lt-comp on a standard dictionary, then run lt-proc on a large corpus (cat large_corpus | lt-proc -e bin_file_generated.bin), and compare time and memory usage for different values of the constant MAX_COMBINATIONS. I don't understand how analysers play a role. Can you please elaborate on that?

     

    Last edit: Venkat Parthasarathy 2017-03-13
    • Kevin Brubeck Unhammer

      Your bin_file_generated.bin is a morphological analyser compiled as a finite-state transducer. Such files are typically named something like deu.automorf.bin (for apertium-deu).

       
    • Venkat Parthasarathy

      So, we consider large analysers by compiling those dictionaries (like apertium-deu.deu.dix or apertium-nob.nob.dix)?

       
      • Kevin Brubeck Unhammer

        apertium-get apertium-deu (or -nob) will give you that (if you have apertium-all-dev installed)

         

        Last edit: Kevin Brubeck Unhammer 2017-03-13
  • Flammie Pirinen

    Flammie Pirinen - 2017-04-03

    I routinely run dewiki through apertium-deu, if you need testing.

     
  • Flammie Pirinen

    Flammie Pirinen - 2017-04-13

    I did some unscientific testing with apertium-deu and dewiki, and it seems to me that MAX_COMBINATIONS does not become a bottleneck, however high it is set.

     
  • Flammie Pirinen

    Flammie Pirinen - 2017-04-18
    • status: open --> pending
     
  • Flammie Pirinen

    Flammie Pirinen - 2017-04-18

    Fixed?

     
  • Kevin Brubeck Unhammer

    • status: pending --> closed
     
  • Kevin Brubeck Unhammer

    It seems to have been scientifically tested.

     
