This project classifies 20,000 messages into 20 different classes. The dataset can be found here: https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups. Four machine learning algorithms (Naïve Bayes, Logistic Regression, Regularized Logistic Regression, and Support Vector Machine (SVM)) were implemented, and their accuracies on the training and test datasets were compared. Arguably, one of the most important aspects of solving this problem is putting the dataset into the appropriate format. Each of these algorithms expects its own data format; the specific formats and how to reconstruct the entire dataset are illustrated in other sections below. Of all the methods, SVM using LIBSVM [1] produced the highest classification accuracy across the 20 classes. All of the algorithms were implemented in Matlab.
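To illustrate what a per-algorithm data format can look like, here is a minimal sketch of turning raw messages into the sparse text format that LIBSVM reads (`<label> <index>:<value> ...`, with feature indices in ascending order). It is written in Python for brevity rather than the project's Matlab, and it is not the author's code: the whitespace tokenizer, the raw word-count features, the helper names `build_vocabulary` and `to_libsvm_line`, and the two toy messages are all assumptions made for the example.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign each distinct token a 1-based feature index (hypothetical helper)."""
    vocab = {}
    for text in documents:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def to_libsvm_line(label, text, vocab):
    """Format one message as '<label> <index>:<count> ...' (hypothetical helper)."""
    # Count how often each known token occurs; tokens outside the vocabulary are skipped.
    counts = Counter(vocab[t] for t in text.lower().split() if t in vocab)
    # LIBSVM expects feature indices in ascending order.
    features = " ".join(f"{index}:{count}" for index, count in sorted(counts.items()))
    return f"{label} {features}".rstrip()


# Toy stand-ins for newsgroup messages; labels 1 and 2 stand for two of the 20 classes.
docs = ["the gpu renders the scene", "the team won the game"]
labels = [1, 2]

vocab = build_vocabulary(docs)
lines = [to_libsvm_line(y, d, vocab) for y, d in zip(labels, docs)]
print("\n".join(lines))
```

Only non-zero counts are written, which keeps the file small when each message uses a tiny fraction of the full vocabulary.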
Download the code and report here.
classify-20-NG-with-4-ML-Algo
The problem involves classifying 20,000 messages into 20 different classes
Brought to you by:
emmyt