Name | Modified | Size | Downloads / Week |
---|---|---|---|
spark-classifier_1.3-SNAPSHOT.zip | 2018-03-19 | 127.0 MB | |
README.txt | 2018-03-19 | 3.3 kB | |
spark-classifier_SourceCode1.3.zip | 2018-03-19 | 17.6 kB | |
Totals: 3 Items | | 127.1 MB | 0 |
Overview:

This program can be used for both training and classification. You can train a model and then query it through a RESTful web service (Jetty and javaspark based) that exposes classification/prediction as a service. Refer to the details below according to how you plan to use the program.

Install:

1. Download the package "spark-classifier_1.3-SNAPSHOT.zip".
2. Unzip the pre-built distribution and follow the details below.
3. Understand the folder structure of the release after unzipping:

* spark-classifier_\<version>
  * /lib: contains all dependent jars
  * /conf: contains classifier.properties; please review this file before running the program
  * /model: the default model path, where the model is both saved (after training) and read (by the classification service). You must have write access to this folder
  * /spark-classifier-\<version>.jar: the main driver jar

Configuration:

Currently the Random Forest and Multilayer Perceptron classifiers are supported. Set the algorithm in "conf/classifier.properties":

    # Currently supported algorithm RANDOM_FOREST or MULTILEVEL_PERCEPTRON
    classifier.algorithm=MULTILEVEL_PERCEPTRON
    #classifier.algorithm=RANDOM_FOREST

The feature and label settings each take a comma (,) separated list of columns. A '*' in the label setting means all columns will be predicted. Feature columns are skipped if they also appear in the predict or label columns.

    classifier.featurecols=Number,Follow up
    #### list of labels to be predicted
    #### '*' will process all the columns
    classifier.labelcols=Root Cause
    #classifier.labelcols=L1, L2, L3...

Train the model:

    cmd > java -cp spark-classifier-<version>-SNAPSHOT.jar:lib/*:conf org.arrahtech.classifier.ClassifierTrainer

The input file name and the output model location can be defined in `conf/classifier.properties`. By default, the above command assumes that `conf/classifier.properties` is set up correctly.
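Since training depends on `conf/classifier.properties` being set up correctly, a quick sanity check before running the trainer may help. The following is a minimal Python sketch, not part of the distribution: the helper names (`parse_properties`, `split_columns`, `check_config`) are hypothetical, the parser handles only plain `key=value` lines and `#` comments rather than the full Java properties syntax, and the checks cover only the three settings shown above.

```python
# Hypothetical pre-flight check for conf/classifier.properties.
# Simplified parser: plain key=value lines and '#' comments only.

SUPPORTED_ALGORITHMS = {"RANDOM_FOREST", "MULTILEVEL_PERCEPTRON"}

SAMPLE = """\
# Currently supported algorithm RANDOM_FOREST or MULTILEVEL_PERCEPTRON
classifier.algorithm=MULTILEVEL_PERCEPTRON
#classifier.algorithm=RANDOM_FOREST
classifier.featurecols=Number,Follow up
#### '*' will process all the columns
classifier.labelcols=Root Cause
"""


def parse_properties(text):
    """Return a dict of key -> value, skipping blank and comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def split_columns(value):
    """Split a comma separated column list, trimming spaces around names."""
    return [col.strip() for col in value.split(",") if col.strip()]


def check_config(props):
    """Return a list of human-readable problems; empty means nothing found."""
    problems = []
    algorithm = props.get("classifier.algorithm", "")
    if algorithm not in SUPPORTED_ALGORITHMS:
        problems.append("unsupported classifier.algorithm: %r" % algorithm)
    if not split_columns(props.get("classifier.featurecols", "")):
        problems.append("classifier.featurecols is empty")
    if not split_columns(props.get("classifier.labelcols", "")):
        problems.append("classifier.labelcols is empty")
    return problems


props = parse_properties(SAMPLE)
print(split_columns(props["classifier.featurecols"]))
print(check_config(props))
```

To check a real file, read it with `open("conf/classifier.properties").read()` and pass the text to `parse_properties`.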
Use the model to predict or classify:

    cmd > java -cp spark-classifier-<version>-SNAPSHOT.jar:lib/*:conf org.arrahtech.service.ClassifierService

This starts the default Jetty server, which accepts POST requests. You can then post to the RESTful API:

    http://localhost:4567/classify/<algorithm_name>/<label_name> -d jsonfile

Here \<algorithm_name> can be "random_forest" or "multilevel_perceptron", and \<label_name> is the label column name (the column for which the model was trained) in your training dataset. The JSON file contains the feature columns and their values, which are the input for prediction or classification.

    cmd > curl -XPOST http://localhost:4567/classify/random_forest/LABEL1 -d '[{ "FeatureField1":"FeatureField1VALUE", "FeatureField2":" FeatureField2VALUE", "FeatureField3":" FeatureField3VALUE"}]'

Response JSON:

    [{ "classifiedLabel": "PredictedValue", "probability": "0.951814884316891" }]

Things to Remember:

1. Presently it takes only a txt file with a field separator.
2. Null is replaced by NULLVALUE, as null cannot be used in the model.
3. multilevel_perceptron does not give the probability of the predicted value. This feature is available in the latest Apache Spark version.
4. Currently label_name must not contain the hyphen '-' character.
5. If there is a space in the label column name, use '%20' for the space.
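As a rough illustration of the request format and the label-name notes above, here is a small Python client sketch using only the standard library. The function names (`build_classify_url`, `build_request`, `classify`) are hypothetical and not part of the package; the base URL is the default address shown in the curl example, and the response is assumed to be JSON shaped like the sample response.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:4567"  # default address used in the curl example


def build_classify_url(algorithm, label, base_url=BASE_URL):
    """Build /classify/<algorithm_name>/<label_name>, encoding spaces as %20."""
    if "-" in label:
        # Per "Things to Remember": hyphens in label_name are not supported.
        raise ValueError("label name must not contain '-': %r" % label)
    return "%s/classify/%s/%s" % (base_url, algorithm, quote(label, safe=""))


def build_request(algorithm, label, rows, base_url=BASE_URL):
    """Prepare a POST whose body is a JSON list of feature-name/value objects."""
    body = json.dumps(rows).encode("utf-8")
    # No Content-Type header is set; urllib then uses its default for POST
    # data (application/x-www-form-urlencoded), the same default as curl -d.
    return Request(build_classify_url(algorithm, label, base_url),
                   data=body, method="POST")


def classify(algorithm, label, rows, base_url=BASE_URL):
    """Send the request to a running ClassifierService and decode the reply."""
    with urlopen(build_request(algorithm, label, rows, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no running server:
print(build_classify_url("random_forest", "Root Cause"))
# prints http://localhost:4567/classify/random_forest/Root%20Cause
```

With the service running, a call such as `classify("random_forest", "LABEL1", [{"FeatureField1": "FeatureField1VALUE"}])` would send the same kind of request as the curl command and return the decoded response list.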