From: Amit J. <ami...@gm...> - 2008-11-04 13:21:35
Hi,

1. The meaning of True+, Marked+ and Actual+ is as follows:

True+: the tokens that the CRF model correctly tagged as class (say) 5.
Marked+: the tokens that the CRF model marked as class 5.
Actual+: the tokens that are actually labeled as class 5 in the manually tagged test data.

Precision = True+/Marked+
Recall = True+/Actual+

I am not sure how you counted the tokens belonging to class 5. The preprocessor removes certain delimiters, and these should not be counted.

2. label_in_result_table = original_label - 1

This is correct. In the training/test data the labels start from 1, but internally they start from 0.

3. ConcatRegexFeatures has been used in different applications to identify the regular-expression patterns that appear in a given window. The feature is self-documented and should be easy to understand. Please let me know if you are looking for specific information about this feature.

I would suggest downloading this code: http://www.it.iitb.ac.in/~amitj/software/CRFAppl.tar.gz. It is a sample application built using the template given in the CRF documentation (http://crf.sourceforge.net/introduction/) and is easy to understand.

-regards
Amit

On Tue, Nov 4, 2008 at 5:38 PM, David Batista <dsb...@gm...> wrote:
> First of all, thanks for providing this software package to the
> community. I've been working with it for a few weeks now, and I have
> a few questions:
>
> I did a run (train, test, calc) using the CVS version on a file with
> 10 labels, numbered 1 to 10. Checking the results, there is
> something I don't understand:
>
> How are True+, Marked+ and Actual+ counted?
>
> For instance:
>
> the file used for testing (test.tagged) had 431 occurrences of label 5;
> the output file (in the out/ dir) had 166 recognized occurrences
> of label 5.
>
> The results show, for label 5:
>
> True+  Marked+  Actual+
>   160      331      738
>
> Can anyone enlighten me on what these results mean?
>
>
> Another thing: I've noticed that labelling the input text starting at
> label 0 results in an ArrayIndexOutOfBoundsException, so I
> started labelling the input text at label 1. However, the results
> table shown at the end starts with label 0, so I'm assuming that
> the labels in the results are:
>
> label_in_result_table = original_label - 1
>
> Is that right?
>
>
> One last question: has anyone used the ConcatRegexFeatures.java class
> for generating features based on regular expressions? It seems to be a
> great way of training the CRF for specific expressions and words.
>
>
> Best regards,
>
> --
> ./david
>
> _______________________________________________
> Crf-users mailing list
> Crf...@li...
> https://lists.sourceforge.net/lists/listinfo/crf-users
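Points 1 and 2 of the reply above can be made concrete with a small sketch. It assumes the gold and predicted labels are already aligned one entry per token, with the delimiters the preprocessor drops already removed, and that the labels in the files are 1-based. The helpers label_counts and precision_recall are hypothetical illustrations, not part of the CRF package. Because True+, Marked+ and Actual+ count tokens, they need not match a count of labelled occurrences in the file.

```python
from collections import Counter


def label_counts(gold, predicted):
    """Per-label token counts (True+, Marked+, Actual+).

    Hypothetical helper: gold and predicted are aligned lists of 1-based
    labels, one entry per token, with delimiter tokens already removed.
    Counts are keyed by the 0-based index shown in the results table.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    true_pos, marked, actual = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        # label_in_result_table = original_label - 1
        g0, p0 = g - 1, p - 1
        actual[g0] += 1      # token really has this label
        marked[p0] += 1      # model assigned this label
        if g0 == p0:
            true_pos[g0] += 1  # model assigned the correct label
    return true_pos, marked, actual


def precision_recall(true_pos, marked, actual):
    """Precision = True+/Marked+, Recall = True+/Actual+ (0.0 if undefined)."""
    precision = true_pos / marked if marked else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall
```

For example, with gold = [5, 5, 3, 5, 1, 5] and predicted = [5, 3, 3, 5, 5, 2], label 5 appears in row 4 with True+ = 2, Marked+ = 3 and Actual+ = 4, giving precision 2/3 and recall 0.5. Plugging in the counts from the question, precision_recall(160, 331, 738) gives a precision of about 0.483 and a recall of about 0.217.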
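For point 3, here is a minimal Python sketch of the general idea of a feature built from the regular-expression patterns that appear in a window of tokens. It is not the ConcatRegexFeatures.java implementation. The pattern set, the window size, the PAD marker for positions outside the sentence and the underscore joining are all assumptions made for illustration.

```python
import re

# Hypothetical pattern set; the real class defines its own patterns.
PATTERNS = [
    ("CAPS", re.compile(r"[A-Z][a-z]+")),
    ("DIGITS", re.compile(r"[0-9]+")),
    ("PUNCT", re.compile(r"[^\w\s]+")),
]


def token_pattern(token):
    """Name of the first pattern that matches the whole token, else 'OTHER'."""
    for name, regex in PATTERNS:
        if regex.fullmatch(token):
            return name
    return "OTHER"


def concat_regex_feature(tokens, i, left=1, right=1):
    """Join the pattern names of tokens[i-left .. i+right] into one string.

    Positions that fall outside the token sequence are marked 'PAD'.
    """
    parts = []
    for j in range(i - left, i + right + 1):
        parts.append(token_pattern(tokens[j]) if 0 <= j < len(tokens) else "PAD")
    return "_".join(parts)
```

With tokens = ["Paris", ",", "May", "2008"], concat_regex_feature(tokens, 2) returns "PUNCT_CAPS_DIGITS". In a CRF, a string like this would typically act as a binary feature that fires when that window pattern occurs at position i, which is why such features can help the model pick up specific expressions.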