It seems like the context predicates only permit nominal feature values (e.g., previous="the"). I would like to use numeric features as well (e.g., distance-from-verb=5). Can MaxEnt handle this?
Thanks!
Hi,
There are a couple of things you can do. You can have features like distance-from-verb=5, but there is no inherent relationship between this and distance-from-verb=4; each will be assigned its own parameter.
Another option is to repeat the feature and you'll get a counting-like effect:
distance-from-verb=4 -> distance-from-verb distance-from-verb distance-from-verb distance-from-verb
Now this feature will have a single parameter, but it will be 4 times as strong in the case above.
Finally, the upcoming 2.5 release adds support for real-valued features so that you can do things like: distance-from-verb=4 or distance-from-verb=3.5 and get behavior like the above case.
This code is basically ready to go as I just finished regression testing it last night, and I'm planning on putting out the 2.5 release soon. I still need to review the docs and double-check that the examples work as written. In the meantime you could just check out the trunk. See the main method of RealValueFileEventStream or the docs for how to use this new feature.
Hope this helps...Tom
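The repetition trick Tom describes can be sketched as a small helper. This is just an illustration (the class and method names here are hypothetical, not part of the MaxEnt API): it expands a feature of the form name=k into k copies of the bare name, so a standard nominal-feature model effectively counts it k times.

```java
import java.util.Collections;
import java.util.List;

public class FeatureRepeater {
    /**
     * Expands a feature of the form name=k (k a non-negative integer)
     * into k copies of the bare feature name. A feature without "="
     * is passed through unchanged.
     */
    public static List<String> expand(String feature) {
        int eq = feature.lastIndexOf('=');
        if (eq < 0) {
            return Collections.singletonList(feature);
        }
        String name = feature.substring(0, eq);
        int count = Integer.parseInt(feature.substring(eq + 1));
        return Collections.nCopies(count, name);
    }

    public static void main(String[] args) {
        // distance-from-verb=4 becomes four copies of distance-from-verb
        System.out.println(expand("distance-from-verb=4"));
    }
}
```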
Hi Tom,
Thanks for the quick reply. Glad to hear it's being added; however, will it be possible to mix numeric with nominal features in the same event, or will events be limited to either (1) the previous behavior, or (2) numeric feature values only?
Best,
Matt
Hi,
The feature types can be mixed. Numeric features have a specific syntax, name=value. If a feature doesn't have that syntax, it is treated as it was before, which is effectively name=1.
Hope this helps...Tom
Hi Tom,
I'm a little unclear on the feature specification. You say the numeric feature syntax is "name=value". That's the syntax I currently use for nominal features in 2.4 (e.g., "currentword=that"). Will numeric/nominal features share the same syntax ("name=value") in 2.5, the difference being whether or not the value can be parsed as a floating point number? Or some other way...?
Thanks,
Matt
Hi,
For the "nominal features," as you call them, there is no syntax; only strings that are identical are treated as the same feature. I too tend to use the pattern you describe, but it's just for readability.
For numeric features you have to use the RealValue event streams and data indexers to get the new behavior. If a feature doesn't have a numeric value, it will be given a value of one (which is the "nominal feature" behavior), but this is figured out by trying to parse the float and catching the NumberFormatException. This will be slow, and you'll see an error message every time it happens. If you need numeric features, I would change your other features to use a different syntax.
Hope this helps...Tom
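The parsing behavior Tom describes can be sketched roughly like this. This is a toy illustration of the fallback, not the actual RealValueFileEventStream code, and the names are hypothetical: a token matching name=value with a parseable float becomes a real-valued feature; anything else is treated as a nominal feature with value 1.

```java
public class FeatureParser {
    /** A feature name plus its real value (1.0f when no numeric value is given). */
    public static final class Feature {
        public final String name;
        public final float value;
        Feature(String name, float value) { this.name = name; this.value = value; }
    }

    public static Feature parse(String token) {
        int eq = token.lastIndexOf('=');
        if (eq >= 0) {
            try {
                float v = Float.parseFloat(token.substring(eq + 1));
                return new Feature(token.substring(0, eq), v);
            } catch (NumberFormatException e) {
                // Not a numeric value: fall through and treat the whole
                // token as a nominal feature with value 1, as described above.
            }
        }
        return new Feature(token, 1.0f);
    }
}
```

Note that a nominal feature like currentword=that keeps the "=" as part of its name, since nominal features have no syntax; only identical strings match.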
Okay, that makes sense. Thanks for the clarification.
-Matt
Tom,
I am looking at the RealValueFileEventStream as described in this thread, and I would like to verify that I understand how to call eval on a model that has been created using RealValueFileEventStream. It seems that one should call model.eval(String[] context, float[] values) such that for each entry in context there is an entry in values, where nominal entries in context would have a corresponding value of 1.
Also, do you have some intuition about what kinds of float values are ideal for a maxent model? I am considering some features that have probability values (between 0 and 1); are these likely to be effective? In the example earlier in this thread the values were 4 and 3.5. Should I be looking to use float values greater than 1 (e.g., between 0 and 100)? Any insight that would give me some intuition about good float-valued features would be appreciated. Thanks!
Philip
Hi,
> It seems that one should call model.eval(String[] context, float[] values) such that for each entry in context there is an entry in values where nominal entries in context would have a corresponding value of 1.
That's correct. You put your real values in, or 1 if you are treating it as a binary feature.
I haven't had much of a chance to play with this feature so I don't have any intuition on values or ranges. I know some papers on re-ranking parses have used the log of a probability rather than the actual probability but that may have been to avoid issues with very small probabilities rather than anything else.
Also there was a bug fix to this that I checked in on 8/21 so make sure you've updated since then.
Hope this helps...Tom
Tom,
I am having trouble understanding how to use real values to build a model. I have two questions:
Why are negative values disallowed?
Why can't I build a model with a single context/feature that discriminates between two outcomes using the value of that feature? For example, if I provide the following training data, it trains for two iterations and prints the following:
1: .. loglikelihood=-16.635532333438686 0.5
2: .. loglikelihood=-27.363613831481224 0.5
I feel like I am having a fundamental disconnect in understanding what it means to have real-valued contexts/features. Any insight would be appreciated.
Thanks,
Philip
A hello=1234.0
A hello=1000.0
A hello=900.0
A hello=1500.0
A hello=2000.0
A hello=1235.0
A hello=1001.0
A hello=901.0
A hello=1501.0
A hello=2001.0
A hello=1502.0
A hello=2003.0
B hello=10.0
B hello=8.0
B hello=60.0
B hello=80.0
B hello=4.0
B hello=11.0
B hello=5.0
B hello=61.0
B hello=81.0
B hello=7.0
B hello=82.0
B hello=3.0
Hi Philip,
Sorry it has taken a while for me to get back to this. I borrowed from the future in terms of time to get the release out and had to pay that back before I could work on this again.
So I've tracked down a couple of bugs in the real-valued stuff. I just fixed them and checked the changes into trunk. Please update.
To answer your question about the meaning of the values, here is an example (which I used to track down the bugs). The following real-valued data set is equivalent to the following non-real-valued data set:
real-valued data set
hello=5.0 A
hello=1.0 B
non-real-valued data set
hello hello hello hello hello A
hello B
So it's basically another way of saying how many times a feature occurs, and if that count is fractional you can now represent it. This also means that the model has only two parameters, (hello, A) and (hello, B), regardless of how many different values appear in your data.
I also have a request. Given the above, can you submit a unit test that builds the two models above (in memory):
1) one with a regular event space
2) the other with a real-valued event space
and then checks that the distributions across the two models' events are the same to within an epsilon of each other?
Let me know if you can and I'll add that test in before I roll a new release with this fix.
Thanks...Tom
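The equivalence Tom describes can be checked numerically with a toy two-parameter log-linear model (hand-rolled here for illustration; this is not the OpenNLP code): raising a parameter to the power 5.0 is the same as multiplying five copies of it together, so both encodings yield identical distributions.

```java
public class RealVsRepeated {
    // p(A | hello=v) = aA^v / (aA^v + aB^v), the two-parameter model above.
    public static double pAReal(double aA, double aB, double v) {
        return Math.pow(aA, v) / (Math.pow(aA, v) + Math.pow(aB, v));
    }

    // Same event encoded as n repeated copies of the bare feature "hello":
    // the parameter is multiplied in once per occurrence.
    public static double pARepeated(double aA, double aB, int n) {
        double prodA = 1.0, prodB = 1.0;
        for (int i = 0; i < n; i++) {
            prodA *= aA;
            prodB *= aB;
        }
        return prodA / (prodA + prodB);
    }
}
```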
I think it is not accurate to say that maxent handles numeric features as asked above. For the example given, it may be that for a certain data set a verb is almost always followed by an adverb, which is almost always followed by a preposition. A feature like distance-from-verb is not going to help discriminate whether the word following the verb is an adverb and the second word from the verb is a preposition. The first of the two bug reports to follow will demonstrate this. I think the way you should describe numeric features in opennlp maxent is as weights for a binary feature.
I have created the unit test you suggested above and will post it shortly on the bug tracker. It seems that the model is the same regardless of which event stream you use, so that is good. However, I am not satisfied with the results that print out. More details in the bug report.
Hi,
I understand what you are saying. When you look at the data, it is obvious that there is a cut point between the two outcomes based on the values.
You could model this with boolean features like f1 (if x > 100 => 1 else 0) and f2 (if x <= 100 => 1 else 0).
The model's inability to distinguish outcomes based on a single feature is a function of the formalism:
p(a|b) = 1/Z(b) prod_j a_j^f_j(a,b)
The algorithm selects values for the weights a_j. With a single feature there will be the same number of weights as outcomes, one contributing to each outcome. For the simplest case:
hello=5.0 A
hello=1.0 B
p(A|hello=5) = 1/Z a_hello_A^5
p(B|hello=5) = 1/Z a_hello_B^5
p(A|hello=1) = 1/Z a_hello_A^1
p(B|hello=1) = 1/Z a_hello_B^1
There are no values for a_hello_A and a_hello_B that will give more weight to A in the first case and more to B in the second case.
In the sense that the portion of the formula represented by f() enforced a range of {0, 1} in the previous implementation and now supports a range over the non-negative reals, features now support real values.
I think this will be helpful in the coref code, where as you get farther from the antecedent it becomes less likely, but not impossible, that it is the referent.
Thanks for the unit test. I'll incorporate the testRealValuedWeightsVsRepeatWeighting test.
Hope this helps...Tom
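Tom's argument can be checked numerically. With a single feature, p(A | hello=x) = 1/(1 + (a_B/a_A)^x), which is monotone in x, so no choice of positive weights can prefer A at one value and B at another. Thresholded indicator features of the kind suggested above, each with its own weight per outcome, do separate the data. This is a self-contained sketch with made-up weight names, not the library's trainer:

```java
public class SingleFeatureLimit {
    // p(A | hello=x) under the single-feature model with weights aA, aB:
    // aA^x / (aA^x + aB^x) = 1 / (1 + (aB/aA)^x), monotone in x.
    public static double pA(double aA, double aB, double x) {
        return 1.0 / (1.0 + Math.pow(aB / aA, x));
    }

    // Thresholded encoding: replace the raw value with two indicators,
    // f1 = [x > 100] and f2 = [x <= 100], each with its own weight per outcome.
    public static double pAThresholded(double wHighA, double wHighB,
                                       double wLowA, double wLowB, double x) {
        double numerA = (x > 100) ? wHighA : wLowA;
        double numerB = (x > 100) ? wHighB : wLowB;
        return numerA / (numerA + numerB);
    }
}
```

Whatever the weight ratio, pA(aA, aB, 5.0) and pA(aA, aB, 1.0) fall on the same side of 0.5, which is exactly the limitation in Philip's hello=... data set; the thresholded encoding can put the A-range and B-range on opposite sides.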
Tom,
Thank you for the detailed response and for being kind when I expose my ignorance! :)
For what it's worth, I ran the same data in the unit test through Mallet's maxent implementation (--trainer MaxEnt) and got the same results. The decision tree, of course, found the cut point and worked. There is another trainer called MCMaxEnt that was able to work with the data set, so it must have a different underlying function.
Tom,
I just wanted to verify: does the downloadable version 2.5.1 have all of the bug fixes associated with real-valued features? I couldn't remember whether they all ended up in that release or were only available via CVS. I'm not ready to move to the work-in-progress that is 3.0.0.
Thanks,
Philip
Hi,
Yeah, they all made it into that release and are ported to 3.0, which is nowhere close to ready for prime time. There is a branch for 2.4, and currently there are no known outstanding issues with it. Thanks...Tom