WikiPage use_command_line modified by Shashank

Shashank — Thu, 26 Jul 2012 21:27:42 -0000

WikiPage use_command_line modified by Shashank

Shashank — Thu, 26 Jul 2012 20:16:37 -0000

--- v1
+++ v2
@@ -44,3 +44,8 @@
 
 
 In the above example, we have added two additional features. For article 1, the values of the additional features are 22 and 550. For article two, the values are 30 and 739. Simple Classifier will automatically pick up these additional feature values!
+
+Normalization parameters
+======================
+
+You can control how your text is normalized by editing the normalizer.properties file.

WikiPage use_command_line modified by Shashank

Shashank — Thu, 26 Jul 2012 17:26:25 -0000

These are the instructions for using Simple Classifier from the command line. First, download Simple Classifier from its project page on SourceForge. Once you have downloaded the Simple Classifier zip, extract it into a directory. Let's say the path to the directory where Simple Classifier was extracted is $SCWD. Navigate your terminal/command prompt to $SCWD. We will be issuing commands here. For all of the following steps, you need to add simpleclassifier.jar to your classpath.

Running the Simple Classifier cross validator
===========================

Let's assume we are dealing with three classes: 'class1', 'class2', and 'class3'. Add the cases for each class to a separate file, one case per line, for example class1.txt, class2.txt, and class3.txt. Note that none of the file names can contain a comma (,), since commas separate the file names in the command below.

It is important that each case be contained in one line. If your text has multiple paragraphs, they must be concatenated so that all the text appears on one line (possibly a very long line!). It is also important that your text does not contain tabs (\t in programming jargon!). Tabs are used to identify additional features; see the next section for more details.

Then issue the following command to run the Simple Classifier cross validator:

    java edu.uwm.bionlp.simpleclassifier.MultiClassCrossValidator class1.txt,class2.txt,class3.txt class1_name,class2_name,class3_name 100 1 weka.classifiers.functions.SMO mutual_information 10

In the above command, the first and second arguments (class1.txt,class2.txt,class3.txt class1_name,class2_name,class3_name) are the input files and class names. All other arguments are optional.

The first argument (class1.txt,class2.txt,class3.txt) contains the paths to the files containing the cases for each class, separated by commas. The second argument (class1_name,class2_name,class3_name) contains the corresponding class names, separated by commas.

The third argument (100) is the number of top features to use for training. In the example above, we are training on the top 100 features. Default value is 100. The fourth argument (1) is the type of n-grams to use for training. Here we are using unigrams (hence, 1), which means we are training on individual words. If we wanted to train on unigrams and bigrams, we would use 1,2 as the argument. Default value is 1.

The fifth argument (weka.classifiers.functions.SMO) is the Weka class to use for training. Default value is weka.classifiers.functions.SMO. The sixth argument (mutual_information) is the feature selection algorithm to use. We can use either mutual_information or chi_squared. Default value is mutual_information.

The seventh and last argument (10) is the number of folds to use for cross-validation. Default value is 10, for ten-fold cross-validation.

Running the Simple Classifier cross validator with additional features
=====================

You can add additional NUMERIC features for Simple Classifier to use. For example, you might want to include a feature such as the length of the text during training. All you have to change is the input file. Additional features are added after the text, separated by tabs. So your text can look something like this:

    This is text of article 1.	22	550
    This is text of article 2.	30	739
    ...

In the above example, we have added two additional features. For article 1, the values of the additional features are 22 and 550. For article 2, the values are 30 and 739. Simple Classifier will automatically pick up these additional feature values!
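
As a rough illustration of the input format and command described above, here is a minimal Python sketch for preparing case lines and assembling the cross validator arguments. The helper names `format_case` and `build_command` are hypothetical and are not part of Simple Classifier. The sketch assumes that additional features follow the text separated by tabs, and that simpleclassifier.jar is already on your classpath.

```python
import re


def format_case(text, features=()):
    """Return one input line: the text on a single line, followed by
    tab-separated numeric features (assumed layout, see the note above)."""
    # Collapse every run of whitespace (newlines, tabs, repeated spaces)
    # into one space, so the case fits on one line and has no tabs of its own.
    one_line = re.sub(r"\s+", " ", text).strip()
    fields = [one_line]
    for value in features:
        # Additional features must be numeric; float() raises ValueError if not.
        float(value)
        fields.append(str(value))
    return "\t".join(fields)


def build_command(class_files, class_names, top_features=100, ngrams="1",
                  classifier="weka.classifiers.functions.SMO",
                  selection="mutual_information", folds=10):
    """Return the argument list for the cross validator command.
    Defaults mirror the default values listed on this page."""
    if len(class_files) != len(class_names):
        raise ValueError("need exactly one class name per input file")
    for path in class_files:
        # File names are joined with commas, so they cannot contain one.
        if "," in path:
            raise ValueError("file name contains a comma: " + path)
    return [
        "java",
        "edu.uwm.bionlp.simpleclassifier.MultiClassCrossValidator",
        ",".join(class_files),
        ",".join(class_names),
        str(top_features),
        ngrams,
        classifier,
        selection,
        str(folds),
    ]
```

For example, `format_case("This is text\nof article 1.", [22, 550])` produces a single line with the text followed by the two feature values, and the list returned by `build_command(["class1.txt", "class2.txt"], ["class1_name", "class2_name"])` could be passed to `subprocess.run` from $SCWD.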
