The OpenNLP Maximum Entropy Package / Discussion / Help: Feature presence in Maxent training data

frank - 2008-05-19

Hi,

I'm trying to use the Maxent package for text classification, but I got confused
when writing the training data. I use unigram features for the experiment, with their
presence as the value: 1 if the feature exists in the document and 0
if it doesn't.

I tried 2 different approaches to feature representation:

First, if the feature exists in the document I write 1_featurelabel to record its existence, and if it doesn't I write 0_featurelabel to record its non-existence.
example --> 1_a 1_b 0_c 0_d 1_e topic1

Second, if the feature exists in the document I write 1_featurelabel, and if it doesn't, I don't write anything.
example --> 1_a 1_b 1_e topic1
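
For comparison, here is a minimal sketch of how the two representations above could be generated from a tokenized document. It assumes the line layout shown in the examples (space-separated features followed by the label); the function names and the `vocabulary` argument are hypothetical and not part of the Maxent package.

```python
def presence_line(tokens, label):
    """Second representation: list only the features present in the document."""
    present = sorted(set(tokens))  # presence only, so duplicates are collapsed
    return " ".join([f"1_{t}" for t in present] + [label])


def explicit_line(tokens, label, vocabulary):
    """First representation: write 1_x or 0_x for every feature in the vocabulary."""
    present = set(tokens)
    feats = [f"{1 if t in present else 0}_{t}" for t in sorted(vocabulary)]
    return " ".join(feats + [label])


if __name__ == "__main__":
    doc = ["a", "b", "e", "a"]            # hypothetical tokenized document
    vocab = ["a", "b", "c", "d", "e"]     # hypothetical feature vocabulary
    print(presence_line(doc, "topic1"))         # 1_a 1_b 1_e topic1
    print(explicit_line(doc, "topic1", vocab))  # 1_a 1_b 0_c 0_d 1_e topic1
```

Note that the first representation grows with the vocabulary size for every document, while the second grows only with the number of distinct words in the document.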

Which one of the representations is correct?

Thanks

- Thomas Morton - 2008-05-19
  
  Hi,
  The first approach is the most typical as the lack of presence is implicitly modeled as getting zero weight. The model will automatically assign the feature a value of 1 so you don't need to encode that in your features (not that it will hurt anything the way it is).
  
  Hope this helps...Tom
  
- frank - 2008-05-21
  
  Hi,
  
  From your answer, I don't need to encode the lack of presence in my data. So the second
  approach is the one I should choose. Is that right? Just to clarify.
  
  Thanks for your help.
  
  - Thomas Morton - 2008-05-21
    
    Correct. I think I referred to them incorrectly in my last post...Tom
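    
    The reason absent features can be left out can be sketched with the usual conditional maxent form, assuming binary indicator features. The symbols below are notation introduced here for illustration, not taken from the Maxent package: x is a document, c a topic, f_i(x) is 1 if feature i is present in x and 0 otherwise, and \lambda_{i,c} is the learned weight for feature i and topic c.
    
    ```latex
    p(c \mid x) = \frac{\exp\left(\sum_{i} \lambda_{i,c}\, f_i(x)\right)}
                       {\sum_{c'} \exp\left(\sum_{i} \lambda_{i,c'}\, f_i(x)\right)},
    \qquad
    \sum_{i} \lambda_{i,c}\, f_i(x) = \sum_{i \,:\, f_i(x) = 1} \lambda_{i,c}
    ```
    
    Every term with f_i(x) = 0 drops out of the sum, so listing only the present features gives the same score. By contrast, writing 0_c in a training line introduces a separate feature named "0_c" that is itself present, so it gets its own weights instead of contributing zero.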
    
- frank - 2008-05-23
  
  Hi,
  
  Ok. I got it. It's really helpful. Thanks.
  