Hello,
I'm a beginner, and I would be glad if you could help me understand the relationship between features, contextual predicates and events.
I think I understand the basic Maximum Entropy theory - i.e. the concept of a feature as presented in Ratnaparkhi's dissertation. However, I can't quite figure out how it maps to the implementation in OpenNLP maxent. I went through the HOWTO, which is very clear, except that it skips over exactly the part I'm interested in (... "you really don't need to know the theoretical side to start selecting features with opennlp.maxent" ...).
I really need this because I want to understand HOW features are internally represented - or rather, what is represented, and in what data structure.
I think I get the idea that a feature is a boolean-valued function F of:
1) ONE outcome;
2) ONE OR MORE contextual predicates (say a, b, c, ..., where C = {a, b, c, ...} is known as the CONTEXT).
If all the contextual predicates in a feature return TRUE, and "outcome" is the one requested, then we say that the feature is ACTIVE: F = 1.
Is this correct?
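For concreteness, the definition above could be sketched in plain Python (this is an illustration of the definition as stated, not OpenNLP code; the predicate and outcome names are made up):

```python
# A feature pairs ONE outcome with a set of contextual predicates, and is
# active (F = 1) only when every predicate holds in the context AND the
# candidate outcome matches.

def make_feature(predicates, outcome):
    # predicates: set of predicate names; outcome: label. Both hypothetical.
    def f(context, candidate_outcome):
        active = predicates <= context and candidate_outcome == outcome
        return 1 if active else 0
    return f

f = make_feature({"w=the"}, "DT")
print(f({"w=the", "prev=of"}, "DT"))  # 1: predicate holds, outcome matches
print(f({"w=the", "prev=of"}, "VB"))  # 0: outcome differs
```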
Then the problems begin :). I guess what happens is that, both in Ratnaparkhi's implementation and in yours, "features" as such don't exist anymore: instead, each "contextual predicate" is called a "feature".
Correct?
Now, supposing that holds, how does the previous map to the concept of an "event"? From what I know, an event is a single line of text of the form
cont_pred1 cont_pred2 ... cont_predi OUTCOME
Where are the features in this event? Is it maybe event = feature? Or is the relationship more complex?
Sorry if this may sound trivial to you, I'm probably lacking some necessary background for this.
The thing is, I need to know how the whole thing is organised in terms of data structures for a project of mine. Is each event hashed in memory? Or, is each feature hashed?
Thanks very much!
Mike
Hi,
This is a reasonable question which has confused many people (myself included). In the paper, a feature f(a,b) is the combination of a contextual predicate, b, and an outcome it occurs with, a. So f is something like 'the word is "the" AND the outcome is "DT"', even though in the event you just say w=the.
Each contextual-predicate/outcome pair is associated with a weight/parameter. Thus, a particular contextual predicate can have one weight for one outcome (like a high weight for f(w=the, outcome=DT)) and a very different weight for a different outcome (like f(w=the, outcome=VB)).
The data structure looks like:
event -> list(contextual_predicates), outcome
contextual_predicate -> name, (map(outcomes) -> parameters)
Note: most contextual predicates don't occur with every outcome; their parameters are 0 and thus not explicitly represented.
The exact data structures used in the code are highly optimized, so these relationships may not be immediately apparent.
Hope this helps...Tom
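The layout Tom describes could be sketched like this in Python (an illustrative sketch only: the names and weights are invented, and OpenNLP's actual Java structures are optimized differently):

```python
# An event is a list of contextual predicates plus an outcome; each predicate
# sparsely maps the outcomes it was seen with to parameters.

events = [
    (["w=the", "prev=of"], "DT"),  # one event per line of training data
]

params = {
    "w=the":   {"DT": 2.1, "VB": -0.7},  # invented weights for illustration
    "prev=of": {"DT": 0.4},
}

def weight(pred, outcome):
    # Predicate/outcome pairs never seen together default to 0 (not stored).
    return params.get(pred, {}).get(outcome, 0.0)

print(weight("w=the", "DT"))    # 2.1
print(weight("prev=of", "VB"))  # 0.0 -- never seen together, so not stored
```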
Hi Tom, thanks for your prompt reply. Let me just ask a few more questions to check if I get this straight.
[Tom]
...features, f(a,b) are the combination of a contextual predicate, b, and an outcome that they occur with, a. So f is something like 'the word is "the" AND the outcome is "DT"', even though in the event you just say w=the.
[/Tom]
Right, so in other words both the contextual predicate and the outcome define the feature: if either of the two changes, we have a different feature. Is this true?
Do you also mean that two features, provided they have different outcomes, MAY share the same contextual predicate? (I was thinking of, say, a "pool" of contextual predicates from which you pick to build several features.)
[Tom]
Each contextual predicate outcome pair is associated with a weight/parameter.
[/Tom]
I'm a little unsure here. Are you talking about the output of the feature? That is, the boolean 1/0 we have in the case of binary features (which is instead real-valued when features are not boolean)?
Because in the context of maximum entropy I see that people also call "weights" the lambdas of the model: the values we get as the outcome of training, provided by GIS (or a similar algorithm).
[Tom]
Thus, a particular contextual predicate can have one weight for one outcome (like a high weight for f(w=the, outcome=DT)) and a very different weight for a different outcome (like f(w=the, outcome=VB)).
[/Tom]
Hmm, this seems to reinforce the first answer. So, what really is an event? Is it a collection of contextual predicates ALL SHARING a common outcome? In the following example event
cp1 cp2 ... cpi OUT
I see the following features
f1 = (cp1, OUT)
f2 = (cp2, OUT)
...
fi = (cpi, OUT)
Makes sense?
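The reading above could be sketched as a one-liner (plain Python, not OpenNLP code; the cp/OUT names follow the thread's own placeholder example):

```python
# One event line "cp1 cp2 ... cpi OUT" yields one feature per
# (contextual predicate, outcome) pair.

def event_to_features(line):
    *predicates, outcome = line.split()
    return [(cp, outcome) for cp in predicates]

print(event_to_features("cp1 cp2 cp3 OUT"))
# [('cp1', 'OUT'), ('cp2', 'OUT'), ('cp3', 'OUT')]
```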
Ok, thank you again so much for your help Tom and keep up the good work!
> Right, so in other words both the contextual predicate and the outcome define the feature: if either of the
> two changes, what we have is a different feature. Is this true?
Yes.
> Do you also mean that two features, provided they have a different outcome, MAY have the same contextual
> predicate? ( I was thinking, say, of a "pool" of contextual predicates from which you pick to build several
> features )
Yes. In the POS case, prev_word=the will occur with several outcomes.
> Because in the context of maximum entropy I see that people call "weights" also the lambdas of the model:
> the values we get as the outcome of the training, that are provided by the GIS (or similar) algorithm.
Yeah, weights, parameters, lambdas are all the same thing.
The rest makes sense, and it seems like you understand it...Tom
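Putting the thread together, the role the lambdas play at prediction time could be sketched as follows (an illustrative sketch with invented lambdas, not trained values and not OpenNLP's API): a maxent model scores each outcome by summing the lambdas of the features active in the context, then normalizing over all outcomes.

```python
import math

# Invented (predicate, outcome) -> lambda table for illustration.
lambdas = {
    ("w=the", "DT"): 2.0,
    ("w=the", "VB"): -1.0,
    ("prev=of", "DT"): 0.5,
}
outcomes = ["DT", "VB"]

def prob(context, outcome):
    def score(o):
        # (predicate, outcome) pairs never seen together contribute 0.
        return math.exp(sum(lambdas.get((cp, o), 0.0) for cp in context))
    return score(outcome) / sum(score(o) for o in outcomes)

p = prob(["w=the", "prev=of"], "DT")
# Both active features favour DT here, so p is close to 1.
```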