Speed Similarity

  • Paul-Armand Verhaegen

    As a note to self (and others):

    I've seen large differences in speed when calculating similarities for a large number of word pairs passed to similarity.pl with the --file option. In my case every word was compared to every other word, as in this excerpt of the file:

    roll roll
    roll cutting
    roll feeding
    roll length
    cutting cutting
    cutting feeding
    cutting length
    feeding feeding
    feeding length
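A pair file in the layout above (each word paired with itself and with every later word) can be generated mechanically. The sketch below is a hypothetical helper, not part of similarity.pl; the function name `pair_lines` and the word list are assumptions for illustration. It uses `itertools.combinations_with_replacement`, which yields exactly that ordering, and so also emits a final `length length` line that the excerpt above does not show.

```python
from itertools import combinations_with_replacement


def pair_lines(words):
    """Hypothetical helper: return one 'word1 word2' line per unordered pair,
    including self-pairs, with each word paired with itself and every later word."""
    return [f"{a} {b}" for a, b in combinations_with_replacement(words, 2)]


if __name__ == "__main__":
    # Example word list taken from the post; replace with your own vocabulary.
    words = ["roll", "cutting", "feeding", "length"]
    for line in pair_lines(words):
        print(line)
```

Redirecting the output (for example `python make_pairs.py > inputfile`, where the script name is assumed) gives a file in the one-pair-per-line form shown above.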

    I don't know why, but it is much faster if you sort the file on the second field (sort +1 -2 inputfile > outputfile, which is equivalent to sort -k2,2 inputfile > outputfile in current sort syntax).
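For anyone preparing the pair file from a script rather than the shell, the following is a rough Python analogue of sorting on the second field. It is a sketch under assumptions: the function name `sort_by_second_field` is hypothetical, and the ordering is plain string comparison with a stable sort, whereas `sort -k2,2` applies locale collation and breaks ties by comparing whole lines unless `-s` is given, so tie order may differ. It says nothing about why the sorted order runs faster.

```python
def sort_by_second_field(lines):
    """Hypothetical helper: stable sort of 'word1 word2' lines on the second word.
    Lines with equal second words keep their original relative order."""
    return sorted(lines, key=lambda line: line.split()[1])


if __name__ == "__main__":
    # The example pairs from the post, in their original order.
    pairs = [
        "roll roll", "roll cutting", "roll feeding", "roll length",
        "cutting cutting", "cutting feeding", "cutting length",
        "feeding feeding", "feeding length",
    ]
    for line in sort_by_second_field(pairs):
        print(line)
```

With the example pairs, all lines ending in `cutting` come first, then `feeding`, then `length`, and `roll roll` comes last.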

    Best regards,
    Paul-Armand Verhaegen

    • Ted Pedersen

      Ted Pedersen - 2009-07-04

      This is a fascinating observation, and off the top of my head I can't explain this behavior, but we'll look into it!


