
GIS iteration

ilophblue
2008-08-12
2013-04-11
  • ilophblue

    ilophblue - 2008-08-12

    I have a problem with the GIS iteration: when it reaches the 10th iteration, the probability p(a|b) becomes NaN. I have read Adwait's paper, especially the part that defines the next estimate of the probability function based on the new alphas.

    P^(n)(a|b) = 1/Z(b) * PROD_j ( Alpha_j^(n) )^Fj(a,b)

    For example :
        a  |        b       |   F
    -----------------------------------
    Adwait |  person_unique | 1,2,3,4,5 ("Adwait" has a TRUE value for person_unique in feature no. 1-5)
    Adwait |  person_unique | 1,2,3,4,5
    Adwait |  person_unique | 1,2,3,4,5
    Adwait |  person_start  | 6,7 ("Adwait" has a TRUE value for person_start in feature no. 6-7)

    In the 1st iteration, I have:
    p(Adwait|person_unique) = 0.5
    p(Adwait|person_start) = 0.5

    but after the next few iterations:
    p(Adwait|person_unique) = 0.001
    p(Adwait|person_start) = 0.009

    Is it true that the more features it has (the more alphas to multiply), the smaller the value of p(a|b) becomes in the next iteration? If so, that doesn't fit the training corpus, where the value of p(Adwait|person_unique) must be greater than p(Adwait|person_start). Any help?

    Thank you so much!
    Dee
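
    A likely cause of the NaN: GIS multiplies many alphas together, and the raw product can overflow or underflow long before iteration 10. Below is a minimal GIS sketch (illustrative only, not the Maxent package's actual code; the predicate names are made up) that works in log space and normalizes per context, so the NaN never appears:

```python
import math

# Toy training events: (active context predicates, observed outcome, count).
# Predicate names are invented for illustration.
events = [
    (("word=Adwait", "cap=True"), "person_start", 8),
    (("word=Adwait", "cap=True"), "person_unique", 2),
]
outcomes = ["person_start", "person_unique"]

# Each (predicate, outcome) pair is one model feature f_j(b, a).
feats = sorted({(p, a) for ctx, _, _ in events for p in ctx for a in outcomes})
idx = {f: j for j, f in enumerate(feats)}

def active(ctx, a):
    return [idx[(p, a)] for p in ctx]

C = max(len(ctx) for ctx, _, _ in events)  # GIS constant: max active features
N = sum(n for _, _, n in events)

# Empirical feature expectations E~[f_j].
emp = [0.0] * len(feats)
for ctx, a, n in events:
    for j in active(ctx, a):
        emp[j] += n / N

log_alpha = [0.0] * len(feats)  # log space: products of alphas become sums

def p_cond(ctx):
    """p(a|b) = product of alphas over Z(b), computed stably in log space."""
    scores = [sum(log_alpha[j] for j in active(ctx, a)) for a in outcomes]
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

for _ in range(50):
    # Model feature expectations E[f_j] under the current alphas.
    model = [0.0] * len(feats)
    for ctx, _, n in events:
        for a, pa in zip(outcomes, p_cond(ctx)):
            for j in active(ctx, a):
                model[j] += (n / N) * pa
    # GIS update: alpha_j *= (E~[f_j] / E[f_j])^(1/C)
    for j in range(len(feats)):
        if emp[j] > 0 and model[j] > 0:
            log_alpha[j] += (math.log(emp[j]) - math.log(model[j])) / C

print(p_cond(("word=Adwait", "cap=True")))  # ~[0.8, 0.2]
```

    Because every (context, outcome) pair here activates the same number of features, this toy converges to the empirical distribution p(person_start|b) = 0.8 almost immediately; more alphas do not by themselves make p(a|b) shrink, since Z(b) renormalizes each context.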

     
    • Thomas Morton

      Thomas Morton - 2008-08-13

      Hi,
         I'm confused by your event space.  Usually if you have p(a|b), then "a" is an outcome and "b" is the context of that outcome.  You need at least 2 outcomes or it doesn't make sense to build a classifier, but I see only one in the examples, and the numbers at the end labeled F don't make any sense to me. Please clarify.  Thanks...Tom

       
    • ilophblue

      ilophblue - 2008-08-17

      Sorry for the confusion, and sorry for taking so long to reply; I did something stupid. I was checking the OpenNLP forum, not the Maxent forum... ^^

      A is the set of possible classes (e.g. person_start, person_unique), and B is the set of possible contexts.
      In the training corpus, I have tagged 8 occurrences of "Adwait" as person_start and 2 as person_unique (my training corpus is a set of articles in which a person's name can appear many times).

      So my event space, which contains all (context, class) pairs, will have:

      1. Adwait - person_start
      2. Adwait - person_unique

      And I have 8 features (f1, f2, ..., f8).
      For the 1st event, (Adwait - person_start), the features fired are f1, f2, f3, f4, f5, f6, f7.
      For the 2nd event, (Adwait - person_unique), the features fired are f7 and f8.

      Before the iterations, p(Adwait, person_start) = 0.5 and p(Adwait, person_unique) = 0.5.
      After a few iterations, p(Adwait, person_start) = 0.1 and p(Adwait, person_unique) = 0.9.

      In the training corpus, p(Adwait, person_start) is 8/10, so why does the probability get smaller after every iteration? Is it because of multiplying every feature's alpha? Does this help? I hope so.

      Thanks,
      dee

       
      • Thomas Morton

        Thomas Morton - 2008-08-21

        Hi,
           Ok, let me summarize what I'm reading to make sure I get this.  You have a series of events.  There are 8 features and 2 outcomes (person_start and person_unique).  You have 10 events in total, which are based on the lexical token "Adwait" and distribute as 8 person_start and 2 person_unique.  You are asking why, for some event involving this lexical token "Adwait", the model is predicting 0.9 for person_unique and 0.1 for person_start.

          This last part of the question is where I think there is some misunderstanding.  The prediction is based on the features (f1..f8) alone.  If the lexical token being tagged is represented as a feature, then it must be the other features that are overriding the weight of the "word=Adwait" feature.

          If you are not representing the word as one of your features, then the model won't try to model the distributional relationship between the outcome and the word.  My guess is that the prediction is consistent with the distribution of the features (f1..f8) and the outcomes.

          Also, it's not clear which event you are watching as the model trains that is moving towards person_start=0.1, person_unique=0.9, or what set of features are active for that event. If you can, post some of your events so I can see what you are actually modeling, and that should help.

        Hope this helps...Tom
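
        Tom's point can be seen with a small counting sketch (all words and predicate names below are invented for illustration): the best any conditional model can do is match the empirical p(outcome | context), so if the word itself is not a predicate, events from different words with identical features pool into one context:

```python
from collections import defaultdict

# Hypothetical labeled events: (word, other predicates, outcome).
raw = (
    [("Adwait", ("cap=True",), "person_start")] * 8
    + [("Adwait", ("cap=True",), "person_unique")] * 2
    + [("Bob", ("cap=True",), "person_unique")] * 30
)

def empirical(events, use_word):
    """The best any conditional model can do: match p(outcome | context)."""
    counts = defaultdict(lambda: defaultdict(int))
    for word, preds, outcome in events:
        # Optionally add the word itself as a predicate of the context.
        ctx = preds + (("word=" + word,) if use_word else ())
        counts[ctx][outcome] += 1
    return {ctx: {o: n / sum(cs.values()) for o, n in cs.items()}
            for ctx, cs in counts.items()}

# Without a word predicate, Adwait's and Bob's events pool into one context,
# so p(person_unique | cap=True) = 32/40 = 0.8 -- even though Adwait alone
# is 8/10 person_start.
print(empirical(raw, use_word=False))
# With the word predicate, the model can separate the two words:
# p(person_start | cap=True, word=Adwait) = 8/10 = 0.8.
print(empirical(raw, use_word=True))
```

        This reproduces the symptom in the thread: a token that is 8/10 person_start in isolation can still be predicted mostly person_unique if the word feature is absent and the shared features point the other way.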

         
    • ilophblue

      ilophblue - 2008-08-27

      I think I understand what you are talking about; some mistakes also appear in my training corpus. Anyway, can I have exactly the same person name more than once in the training corpus? For example:

      1. <START> Bob <END> buy the computer.
      2. <START> Bob <END> goes to school.
      3. I will call <START> Bob <END> tomorrow.

      or MUST there be only one "Bob" in the training corpus?

      Thanks!

       
      • Thomas Morton

        Thomas Morton - 2008-08-31

        Hi,
          For a named-entity task, which is what your data appears to be for, the same name can occur in multiple sentences or even in the same sentence.  By default, a name needs to occur at least five times in the data to be included in the model.
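
        The five-occurrence threshold Tom mentions is a frequency cutoff applied to predicates before training. A minimal sketch of the idea (the threshold value and all names are illustrative, not the package's actual code):

```python
from collections import Counter

def apply_cutoff(events, cutoff=5):
    """Drop context predicates seen fewer than `cutoff` times in the data."""
    counts = Counter(p for ctx, _ in events for p in ctx)
    return [([p for p in ctx if counts[p] >= cutoff], outcome)
            for ctx, outcome in events]

# "word=Alice" occurs only 3 times, so it is filtered out of its events;
# "word=Bob" (5 times) and "cap=True" (8 times) survive.
events = ([(["word=Bob", "cap=True"], "person_start")] * 5
          + [(["word=Alice", "cap=True"], "person_start")] * 3)
filtered = apply_cutoff(events)
```

        Rare names below the cutoff are still tagged, but only through whatever other predicates remain in their contexts.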

        Hope this helps...Tom

         
