I have a training dataset of web pages related to a particular domain (one class). I want to get a similarity score, i.e. a measure of how closely a new document (web page) is related to my training domain, when I feed it to the classifier/clusterer. How do I do this in MinorThird?
MinorThird does not come with clustering methods (though you can write one using the API). Most of the classifiers do not compare instance-to-instance similarity directly; KnnClassifier/Learner is one that does, and by default it uses a cosine-related similarity function, so you can readily apply it to documents. You can modify the KnnClassifier class to output or log the similarity scores you want when classifying a new document.
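To make the idea concrete, here is a minimal standalone sketch of the kind of score a k-nearest-neighbor approach could give you: cosine similarity between term-count vectors, averaged over the k most similar training pages. It does not use the MinorThird API at all, and it is not what KnnClassifier does internally; the class name `DomainSimilaritySketch`, its methods, the simple tokenizer, and the "mean of the top k" scoring rule are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of MinorThird.
class DomainSimilaritySketch {

    // Lowercase the text, split on anything that is not a-z or 0-9,
    // and count how often each token occurs.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Euclidean length of a term-count vector.
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine similarity of two term-count vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Assumed scoring rule: mean cosine similarity between the new document
    // and its k most similar training documents.
    static double domainScore(String newDoc, List<String> trainingDocs, int k) {
        Map<String, Integer> query = termCounts(newDoc);
        double[] sims = new double[trainingDocs.size()];
        for (int i = 0; i < sims.length; i++) {
            sims[i] = cosine(query, termCounts(trainingDocs.get(i)));
        }
        Arrays.sort(sims); // ascending, so the best matches are at the end
        int n = Math.min(k, sims.length);
        if (n <= 0) {
            return 0.0;
        }
        double total = 0.0;
        for (int i = sims.length - n; i < sims.length; i++) {
            total += sims[i];
        }
        return total / n;
    }
}
```

Calling `DomainSimilaritySketch.domainScore(newPageText, trainingPages, 5)` would return a value between 0 and 1, where higher means the new page shares more vocabulary with its closest training pages. Raw term counts are used here for brevity; TF-IDF weighting is a common refinement.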