This project classifies 20,000 messages into 20 different classes. The dataset can be found here: https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups. Four machine learning algorithms (Naïve Bayes, Logistic Regression, Regularized Logistic Regression, and Support Vector Machine (SVM)) were implemented, and their accuracies on the training and test datasets were compared. Arguably, one of the most important aspects of solving this problem is getting the dataset into the appropriate format. Each of these algorithms expects its own data format; the specific formats and how to reconstruct the entire dataset are illustrated in other sections below. Of all the methods, SVM using LIBSVM [1] achieved the highest classification accuracy across the 20 classes. All of the algorithm implementations were written in MATLAB.
Download the code and Report here.
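
As a rough illustration of what "appropriate data set format" can mean for the SVM case, the sketch below turns one tokenized message into a line of the sparse text format that LIBSVM reads (`<label> <index>:<value> ...`, with ascending feature indices). It is written in Python for brevity and is not the project's MATLAB code; the function name `to_libsvm_line`, the toy vocabulary, and the use of raw word counts as feature values are all assumptions made for this example.

```python
from collections import Counter

def to_libsvm_line(label, tokens, vocab):
    """Format one document as a LIBSVM sparse line: '<label> <index>:<count> ...'.

    `vocab` maps each known word to a 1-based feature index (hypothetical helper
    structure for this sketch). Words missing from `vocab` are ignored.
    """
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    # LIBSVM expects indices in ascending order; zero-valued features are omitted.
    features = " ".join(f"{idx}:{cnt}" for idx, cnt in sorted(counts.items()))
    return f"{label} {features}".rstrip()

# Toy vocabulary and message, made up purely for illustration.
vocab = {"space": 1, "orbit": 2, "hockey": 3, "goal": 4}
line = to_libsvm_line(15, ["orbit", "space", "orbit", "unknown"], vocab)
print(line)  # prints "15 1:1 2:2"
```

Writing one such line per message produces a file that LIBSVM's training tools can read directly; the other three algorithms may need a different layout, as noted above.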

Additional Project Details

Registered

2016-01-21