Best Metrics: question

wouter
2007-03-06
2013-04-25
  • wouter
    2007-03-06

    Hello,

    We are currently developing a plagiarism detection tool and are thinking about using the SimMetrics package to evaluate word similarity.
    The specific use we have in mind is comparing documents that hold very little text, e.g. slides.

    Can you recommend a metric or share some thoughts?

    Thanks

     
    • ReverendSam
      2007-03-09

      If there is very little text and you are checking against other similar submissions, then it may be a good idea. But is this on a slide-by-slide basis, or is it a large number of slides compared against a large number of slides?

       
      • wouter
        2007-03-09

        We would have a lot of slides in a database (we use Lucene), and the idea would be to compare ONE given slide against the database to come up with exact or partial matches.
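
To make the one-slide-against-many idea concrete, here is a minimal sketch in plain Java. It does not use the SimMetrics or Lucene APIs; the class `SlideMatcher`, its methods, and the 0.5 threshold are all hypothetical names and values chosen for illustration. It scores the query slide against each stored slide with a token-set Jaccard similarity (shared words divided by total distinct words), which is only one of several metrics that could suit short texts, and reports every slide at or above the threshold.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of SimMetrics or Lucene.
class SlideMatcher {

    // Split text into a set of lowercase word tokens (letters and digits only).
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Jaccard similarity: |A ∩ B| / |A ∪ B|, in the range 0.0 to 1.0.
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0; // two empty slides are treated as identical
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    // Return the indices of all stored slides whose score meets the threshold.
    static List<Integer> matches(String query, List<String> slides, double threshold) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < slides.size(); i++) {
            if (jaccard(query, slides.get(i)) >= threshold) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> stored = List.of(
            "Plagiarism detection with string metrics",
            "Introduction to databases",
            "String metrics for plagiarism detection");
        String query = "plagiarism detection with string metrics";
        // Assumed threshold of 0.5; a real tool would need to tune this.
        System.out.println(matches(query, stored, 0.5));
    }
}
```

A score of 1.0 corresponds to an exact match on the word set, and lower scores to partial matches. With a large database, scanning every slide this way would likely be slow, so one option is to let the search index return a shortlist of candidates first and apply the similarity metric only to those.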