Document similarity in MinorThird?

  • sameendra

    sameendra - 2012-05-18

    I have a training dataset of web pages related to a particular domain (one class). When I feed a new document (web page) to the classifier/clusterer, I want a similarity score that tells me how closely it is related to my training domain. How do I do this in MinorThird?
    Thank you.

  • Frank Lin

    Frank Lin - 2012-05-23

    MinorThird does not come with clustering methods (though you can write one using the API). Most of the classifiers do not compare instance-instance similarity directly; KnnClassifier/Learner is one that does, and by default it uses a cosine-related similarity function so you can readily apply it to documents. You can modify the KnnClassifier class to output or log the similarity scores you want when classifying a new document.
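    For illustration, here is a minimal sketch of a kNN-style cosine similarity score. It is written in Python so that it is self-contained; it does not use any MinorThird API. The tokenizer, the function names (tokenize, term_vector, cosine, domain_score), the raw term-frequency weighting and the choice to average the top-k similarities are all assumptions made for this example. Inside MinorThird, the closest counterpart would be the similarity values KnnClassifier computes between a new instance and the stored training instances.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (deliberately simple)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def term_vector(text):
    """Bag-of-words vector of raw term frequencies (TF-IDF could be swapped in)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as unrelated
    return dot / (norm_a * norm_b)


def domain_score(training_docs, new_doc, k=3):
    """Hypothetical helper: mean cosine similarity of new_doc to its k nearest training docs."""
    query = term_vector(new_doc)
    sims = sorted((cosine(query, term_vector(d)) for d in training_docs), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


# Usage: a made-up one-class "training domain" and two candidate pages.
training = [
    "machine learning classifier training data",
    "text classification with naive bayes classifier",
    "learning to classify web pages",
]
print(domain_score(training, "a classifier for web pages", k=2))
print(domain_score(training, "recipe for chocolate cake", k=2))
```

    The first page shares terms with the training set and gets a positive score; the second shares none and scores 0.0. Averaging over the top k neighbours, rather than taking only the single best match, makes the score less sensitive to one near-duplicate training page; a threshold on this score could then serve as a simple "in domain / out of domain" decision.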
