
BAD PREDICT

Forum: Help
Creator: Frank
Created: 2011-04-07
Updated: 2013-04-11
  • Frank

    Frank - 2011-04-07

    Hello, this is my problem:

    In my software there are 4 categories: SPORT, RELIGION, POLITIC, MOTOR

    For each category I have a training set of 70 files, and a test set of 30 files.

    I create 4 models, one for each category.

    1)
    To create the model for category X, I balance the training set with the 70 files of category X and 23 files from each of the other categories.
    So I have a balanced training set with about 50% YES and 50% NO (the other three categories).

    Then I balance the test set with the same procedure.
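
A minimal sketch of the balancing described in step 1, for illustration only. The file names and the `balance` helper are hypothetical; the post does not say how the 23 files per other category are chosen, so random sampling is an assumption here.

```python
import random

def balance(files_by_category, target, per_other, seed=0):
    """Return (path, label) pairs: every file of `target` labelled YES,
    plus `per_other` sampled files from each other category labelled NO."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rows = [(path, "YES") for path in files_by_category[target]]
    for category, paths in files_by_category.items():
        if category == target:
            continue
        # Assumption: the negatives are drawn at random from each other category.
        rows += [(path, "NO") for path in rng.sample(paths, per_other)]
    rng.shuffle(rows)
    return rows

# Hypothetical file names standing in for the 70 training files per category.
categories = ["SPORT", "RELIGION", "POLITIC", "MOTOR"]
training = {c: [f"{c.lower()}_{i:02d}.txt" for i in range(70)] for c in categories}

balanced = balance(training, "SPORT", per_other=23)
```

With 70 positives and 3 x 23 = 69 negatives, the set is close to, but not exactly, 50/50.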

    2)
    I select the set of the 1000 most significant features for the YES category, using the chi-square method.
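
For reference, a sketch of chi-square feature ranking on a 2x2 contingency table (feature present/absent versus YES/NO). This is the textbook statistic, not necessarily the exact variant used in the post; the function names and the toy documents are made up for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square for a 2x2 table:
    a = YES docs containing the ngram, b = NO docs containing it,
    c = YES docs without it,           d = NO docs without it."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # the ngram occurs in every document or in none
    return n * (a * d - c * b) ** 2 / denom

def select_features(docs, labels, k):
    """docs: list of sets of ngrams; labels: parallel list of 'YES'/'NO'.
    Returns the k ngrams with the highest chi-square score."""
    vocab = set().union(*docs)
    n_yes = labels.count("YES")
    n_no = labels.count("NO")
    scores = {}
    for ngram in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if y == "YES" and ngram in doc)
        b = sum(1 for doc, y in zip(docs, labels) if y == "NO" and ngram in doc)
        scores[ngram] = chi_square(a, b, n_yes - a, n_no - b)
    # Sort by descending score, then alphabetically to break ties deterministically.
    return sorted(vocab, key=lambda g: (-scores[g], g))[:k]

# Toy data, invented for this example.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "team"}, {"vote", "law"}]
labels = ["YES", "YES", "NO", "NO"]
top = select_features(docs, labels, 2)
```

Note that the statistic is symmetric: an ngram that appears only in NO documents ("vote" above) scores as high as one that appears only in YES documents, so a top-1000 list is not automatically specific to the YES category.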

    3)

    I build the training-set file, file.dat (used to create the model), like this:

    Each row holds the feature (ngram) values of one document of the training set:

    FOR EACH FEATURE "ngram" OF THE SELECTED SET,
    IF THE DOCUMENT CONTAINS THE FEATURE, ngram = 1.0
    ELSE ngram = 0.0
    Then, at the end of the row, I write "YES" if the document belongs to category X, else "NO".

    ex.: ngram1 = 1.0 ngram2 = 0.0 ngram3 = 1.0 ……… …… ngramN = 0.0 yes
    .
    .
    .
    ngram1 = 0.0 ngram2 = 0.0 ngram3 = 0.0 ……… …… ngramN = 1.0 no
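
A sketch of how such rows could be generated. It assumes the file is read as whitespace-separated tokens with the label as the last token, and that each feature is written as a single `name=value` token with no spaces around `=`; whether the reader used here expects exactly that is not confirmed in the post. The `to_row` helper is hypothetical.

```python
def to_row(doc_ngrams, selected, label):
    """Build one row: a name=value token per selected ngram, then the label.
    doc_ngrams: set of ngrams found in the document
    selected:   ordered list of selected features
    label:      "YES", "NO", or "?" for a document to predict"""
    tokens = []
    for ngram in selected:
        # Assumption: spaces inside a word ngram would split the token,
        # so they are replaced with underscores.
        name = ngram.replace(" ", "_")
        value = 1.0 if ngram in doc_ngrams else 0.0
        tokens.append(f"{name}={value}")
    tokens.append(label)
    return " ".join(tokens)

selected = ["goal", "vote"]
row = to_row({"goal", "match"}, selected, "YES")
```

If the rows are split on whitespace, a layout such as `ngram1 = 1.0` yields three separate tokens (`ngram1`, `=`, `1.0`) per feature, and labels written as `yes` in one place and `YES` in another count as different outcomes, so both points are worth checking against the actual files.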

    4) I create the test-set file, file.test, in the same way.

    5)
    The file to predict looks like this:

    ngram1 = 1.0 ngram2 = 0.0 ngram3 = 1.0 ……… …… ngramN = 0.0 ?

    But when I run opennlp.maxent with a model created from files of this type, the result for the document to predict is always NO,

    and this is the output:

    Model Diverging: loglikelihood decreased
    Model Diverging: loglikelihood decreased
    Model Diverging: loglikelihood decreased

    RELIGION EVALUATION:
    Precision  0.48979592
    Recall     0.48979592
    F-Measure     0.48979592
    RELIGION prediction:
    For context:
    YES  NO

    MOTOR EVALUATION:
    Precision  0.48979592
    Recall     0.48979592
    F-Measure     0.48979592
    MOTOR prediction:
    For context:
    NO  YES

    SPORT EVALUATION:
    Precision  0.48979592
    Recall     0.48979592
    F-Measure     0.48979592
    SPORT prediction:
    For context:
    NO  YES

    POLITIC EVALUATION:
    Precision  0.48979592
    Recall     0.48979592
    F-Measure     0.48979592
    POLITIC prediction:
    For context:
    NO  YES
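
To cross-check figures like the ones above, here is a sketch of precision, recall and F-measure computed for the YES class from gold and predicted labels. These are the standard definitions; how the evaluation in the post computes its numbers is not shown, so this is only a reference point. The function name and sample labels are invented.

```python
def precision_recall_f(gold, predicted, positive="YES"):
    """Standard precision, recall and F1 for the `positive` label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

gold = ["YES", "YES", "NO", "NO"]
mixed = precision_recall_f(gold, ["YES", "NO", "YES", "NO"])
always_no = precision_recall_f(gold, ["NO", "NO", "NO", "NO"])
```

Under these definitions, a classifier that always answers NO gets zero precision and recall for YES, and F equals precision and recall whenever the two are equal. Identical values across four separately trained models are therefore worth investigating alongside the "Model Diverging" warnings.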

    What am I doing wrong?

    Can anyone help me?

    Thanks.

     
  • Joern Kottmann

    Joern Kottmann - 2011-05-18

    Hi, the project has moved to Apache. Please repost your question on the user mailing list;
    see our new website for details on how to subscribe:
    incubator.apache.org/opennlp

    Thanks,
    Jörn

     
