Speed Similarity
2009-03-17
2013-01-23
  • Paul-Armand Verhaegen

    As a note to self (and others):

    I've seen large differences in speed when calculating similarities between a large number of word pairs passed to similarity.pl with the --file option. In my case every word was compared to every other word, e.g. with a file like:

    roll roll
    roll cutting
    roll feeding
    roll length
    ...
    cutting cutting
    cutting feeding
    cutting length
    ...
    feeding feeding
    feeding length
    ...

    I don't know why, but it is much faster if you sort the file by the second field (sort +1 -2 inputfile > outputfile).
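
    For anyone who wants to try this: newer versions of sort may reject the old "+1 -2" key syntax, and "sort -k2,2 inputfile > outputfile" is the equivalent. Below is a minimal Python sketch that builds the all-pairs list and orders it by the second word before it is written out for --file. The function names (pairs_sorted_by_second, to_lines) and the one-pair-per-line "word1 word2" layout are my own assumptions for illustration, not part of similarity.pl, and the sketch says nothing about why the sorted order runs faster.

```python
from itertools import combinations_with_replacement


def pairs_sorted_by_second(words):
    """Build every unordered pair (including self-pairs) and order by the second word.

    Hypothetical helper for illustration; not part of WordNet::Similarity.
    """
    pairs = combinations_with_replacement(words, 2)
    # Sort on the second field only. Python's sort is stable, so pairs that
    # share a second word keep their original relative order.
    return sorted(pairs, key=lambda pair: pair[1])


def to_lines(pairs):
    """Format pairs as 'word1 word2' lines (assumed input layout for --file)."""
    return "\n".join(f"{first} {second}" for first, second in pairs)


if __name__ == "__main__":
    words = ["roll", "cutting", "feeding", "length"]
    print(to_lines(pairs_sorted_by_second(words)))
```

    The printed lines can be redirected into a file and passed to similarity.pl in place of the unsorted list, so that all pairs sharing the same second word sit next to each other.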

    Best regards,
    Paul-Armand Verhaegen

     
    • Ted Pedersen - 2009-07-04

      This is a fascinating observation, and off the top of my head I can't explain this behavior... but we'll check into it!

      Cordially,
      Ted

       