
Feature presence in Maxent training data

Help
frank
2008-05-19
2013-04-11
  • frank

    frank - 2008-05-19

    Hi,

    I'm trying to use the Maxent package for text classification, but I got confused
    when writing the training data. I use unigram features with their presence as the
    value: the value is 1 if the feature exists in the document and 0 if it doesn't.

    I tried two different approaches to the feature representation:

    First, if the feature exists in the document, I write 1_featurelabel to record its presence, and if it doesn't, I write 0_featurelabel to record its absence.
    example --> 1_a 1_b 0_c 0_d 1_e topic1

    Second, if the feature exists in the document, I write 1_featurelabel, and if it doesn't, I don't write anything.
    example --> 1_a 1_b 1_e topic1
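A minimal Python sketch of the two representations described above. It assumes each training line is a list of space-separated feature labels followed by the outcome label, as in the examples; the helper names `encode_with_absence` and `encode_presence_only` are hypothetical and are not part of the Maxent package.

```python
# Hypothetical helpers that build one training line in the layout used in the
# examples above: space-separated feature labels, with the outcome label last.


def encode_with_absence(doc_tokens, vocabulary, outcome):
    """Representation 1: mark every vocabulary word as present (1_) or absent (0_)."""
    present = set(doc_tokens)
    feats = [("1_" if w in present else "0_") + w for w in vocabulary]
    return " ".join(feats + [outcome])


def encode_presence_only(doc_tokens, vocabulary, outcome):
    """Representation 2: list only the vocabulary words that occur in the document."""
    present = set(doc_tokens)
    feats = ["1_" + w for w in vocabulary if w in present]
    return " ".join(feats + [outcome])


if __name__ == "__main__":
    vocab = ["a", "b", "c", "d", "e"]
    doc = ["a", "e", "b", "a"]  # repeated tokens collapse to one presence feature
    print(encode_with_absence(doc, vocab, "topic1"))   # 1_a 1_b 0_c 0_d 1_e topic1
    print(encode_presence_only(doc, vocab, "topic1"))  # 1_a 1_b 1_e topic1
```

Iterating over the vocabulary (rather than the document) keeps the feature order stable, which makes the two outputs easy to compare line by line.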

    Which of these representations is correct?

    Thanks

     
    • Thomas Morton

      Thomas Morton - 2008-05-19

      Hi,
         The first approach is the most typical, as the absence of a feature is implicitly modeled as that feature getting zero weight.  The model automatically assigns each feature you list a value of 1, so you don't need to encode that in your features (not that it will hurt anything the way it is).
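To illustrate why absent features need no explicit encoding, here is a minimal Python sketch of a conditional maximum-entropy (log-linear) scorer with binary features, p(y|x) = exp(sum_i lambda_i * f_i(x,y)) / Z(x). The function `outcome_probs` and the weight values are hypothetical, chosen for illustration; they are not taken from the Maxent package or from a trained model.

```python
import math


def outcome_probs(active_features, weights, outcomes):
    """Return p(outcome | context) for each outcome.

    active_features: feature labels listed on the line (each has value 1).
    weights: hypothetical {(feature, outcome): lambda} table.
    Any feature not listed has value 0, so lambda * 0 adds nothing to the score.
    """
    # Unnormalized log-score: sum of lambda_i * f_i(x, y) over active features only.
    scores = {
        y: sum(weights.get((f, y), 0.0) for f in active_features) for y in outcomes
    }
    # Normalizer Z(x) sums exp(score) over all outcomes.
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}


if __name__ == "__main__":
    weights = {
        ("1_a", "topic1"): 1.0,
        ("1_c", "topic1"): 5.0,  # never used below: 1_c is absent from the document
        ("1_a", "topic2"): 0.5,
    }
    probs = outcome_probs(["1_a", "1_b", "1_e"], weights, ["topic1", "topic2"])
    print(round(probs["topic1"], 4))  # -> 0.6225
```

Because a feature that is not listed has value 0, the weight for `1_c` never enters the score here. In this sketch, an explicit `0_c` token would simply be another feature string with its own weight, separate from `1_c`.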

      Hope this helps...Tom

       
    • frank

      frank - 2008-05-21

      Hi,

      From your answer, I understand that I don't need to encode the absence of a feature in my data,
      so the second approach is the one I should choose. Is that right? Just to clarify.

      Thanks for your help.

       
      • Thomas Morton

        Thomas Morton - 2008-05-21

        Correct.  I think I referred to the two approaches the wrong way round in my last post...Tom

         
    • frank

      frank - 2008-05-23

      Hi,

      OK, I got it. That's really helpful. Thanks.

       
