Unsupervised TXT classifier

This program addresses two of the most common problems with existing classification algorithms: over-training, and a shortage of training data for each category. Rather than assigning documents to predefined categories, it treats each TXT file as a category of its own. In this respect it resembles clustering, but it is not strictly a clustering algorithm, because some training is involved.

The summarizer from Classifier4J has been adapted to accept two inputs (call them A and B). The summarizer is trained on A in order to summarize document B, and vice versa. This extracts the relevant structure of both documents (which avoids over-training). The two results are then compared using vector-space analysis, which gives the degree to which one document belongs to the other (which compensates for the shortage of information).

The method can also be used to build user-defined classes by merging the texts of chosen categories and then calculating the distances between documents, but this step is optional.
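The cross-summarization idea can be sketched in plain Java without Classifier4J. This is not the project's code: the adapted two-input summarizer is not published here, so the class and method names (`CrossSummarySimilarity`, `crossSummarise`, `similarity`) are hypothetical, and the scoring rule is an assumption loosely modeled on frequency-based summarization. Each sentence of B is scored by how often its words occur in A, the top `n` sentences are kept, the same is done in the other direction, and the two summaries are compared by cosine similarity of their term-frequency vectors.

```java
import java.util.*;
import java.util.stream.*;

/** Hypothetical sketch of cross-summarization plus vector-space comparison. */
public class CrossSummarySimilarity {

    // Lower-cased word tokens; anything that is not a letter or digit separates words.
    public static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    // Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace.
    public static List<String> sentences(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            String t = s.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String w : words(text)) tf.merge(w, 1, Integer::sum);
        return tf;
    }

    // Sum of the trainer's word counts for every word in the sentence.
    static int score(String sentence, Map<String, Integer> model) {
        int s = 0;
        for (String w : words(sentence)) s += model.getOrDefault(w, 0);
        return s;
    }

    // "Train" on `trainer`, then keep the n highest-scoring sentences of `target`,
    // returned in their original order (ties keep the earlier sentence).
    public static String crossSummarise(String trainer, String target, int n) {
        Map<String, Integer> model = termFrequencies(trainer);
        List<String> sents = sentences(target);
        List<Integer> chosen = IntStream.range(0, sents.size()).boxed()
                .sorted(Comparator.comparingInt((Integer i) -> score(sents.get(i), model)).reversed())
                .limit(n)
                .sorted()
                .collect(Collectors.toList());
        return chosen.stream().map(sents::get).collect(Collectors.joining(" "));
    }

    // Cosine similarity of term-frequency vectors; 0 when either text has no words.
    public static double cosine(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a), tb = termFrequencies(b);
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : ta.entrySet()) {
            dot += e.getValue() * (double) tb.getOrDefault(e.getKey(), 0);
            na += e.getValue() * (double) e.getValue();
        }
        for (int v : tb.values()) nb += (double) v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Summarize B using A's word model and A using B's, then compare the two summaries.
    public static double similarity(String a, String b, int n) {
        String summaryOfB = crossSummarise(a, b, n);
        String summaryOfA = crossSummarise(b, a, n);
        return cosine(summaryOfA, summaryOfB);
    }

    public static void main(String[] args) {
        String a = "Cats purr when content. Cats sleep most of the day. Dogs bark at strangers.";
        String b = "My cat sleeps on the sofa. Cats purr loudly. The weather was cold.";
        System.out.printf("similarity(A, B) = %.3f%n", similarity(a, b, 2));
    }
}
```

To approximate the user-defined classes mentioned above, one could concatenate the texts of a chosen category into a single string and pass it as A. Cosine similarity is used here only as a common vector-space measure; the project may weight terms differently.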

Project Samples

Number of sentences needed for accurate results: approximately 60 sentences are required.
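The sample above reports that roughly 60 sentences are needed. A minimal pre-check, assuming the figure applies to each input document, could count sentences with the JDK's `java.text.BreakIterator` before running a comparison. `SentenceCheck`, `countSentences`, `hasEnoughText` and the `MIN_SENTENCES` constant are hypothetical names, not part of the project.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Hypothetical pre-check that an input text is long enough to classify. */
public class SentenceCheck {

    // Approximate minimum taken from the project sample; assumed to apply per document.
    public static final int MIN_SENTENCES = 60;

    // Counts non-blank sentence segments found by the JDK sentence BreakIterator.
    public static int countSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int count = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (!text.substring(start, end).trim().isEmpty()) count++;
        }
        return count;
    }

    public static boolean hasEnoughText(String text) {
        return countSentences(text) >= MIN_SENTENCES;
    }

    public static void main(String[] args) {
        String text = "The summarizer needs enough material. Short texts give unstable results.";
        if (!hasEnoughText(text)) {
            System.out.println("Only " + countSentences(text)
                    + " sentences; about " + MIN_SENTENCES + " are recommended.");
        }
    }
}
```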

License

Creative Commons Attribution ShareAlike License V3.0

Additional Project Details

Intended Audience

Education, Developers, Testers

Programming Language

Java

Related Categories

Java Artificial Intelligence Software, Java Information Analysis Software, Java Linguistics Software

Registered

2013-12-18

Similar Business Software

Lilac

Lilac is an open source tool that enables data and AI practitioners to improve their products by improving their data. Understand your data with powerful search and filtering. Collaborate with your team on a single, centralized dataset. Apply best practices for data curation, like removing...

IBM watsonx Assistant

IBM watsonx Assistant (Formerly Watson Assistant) is a market-leading enterprise conversational AI platform that allows you to build intelligent virtual and voice assistants that can provide customers with fast, consistent and accurate answers across any messaging platform, application, device...

theGist

theGist lets you cut through your work noise with personalized summaries for Gmail and Slack. Summarize Slack channel and threads, on-demand or in a Daily Digest. Clear your inbox in seconds with an actionable categorized summary of your Gmail, directly from Slack. We don't change anything in...