Help
uml
2007-02-20
2013-04-09
• uml - 2007-02-20

It took me a while to figure out what TADM is doing.

TADM learns a maximum entropy model whose event probabilities match the empirical probabilities in the input event space. However, the probabilities associated with individual features may not match those in the input file.

Can I set up TADM to fit a model whose feature probabilities match those in the input file?

This is like a constraint-satisfaction model learning problem: each feature, together with its empirical probability, is viewed as a constraint on the underlying unknown model.
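To make the "features as constraints" idea concrete, here is a minimal sketch in plain Python. It does not use TADM, and nothing in it reflects TADM's input format or options. The three target probabilities are made-up numbers for illustration, the features are assumed to be simple indicators f_i(x) = x_i over three binary variables, and the fitting method is ordinary gradient ascent rather than whatever estimator TADM applies.

```python
import itertools
import math

# Event space: every assignment of three binary variables (8 events).
EVENTS = list(itertools.product([0, 1], repeat=3))

# Hypothetical target feature probabilities P(x_i = 1); illustration only.
TARGETS = [0.7, 0.4, 0.2]


def distribution(weights):
    """Return p(x) proportional to exp(sum_i w_i * x_i), in EVENTS order."""
    scores = [math.exp(sum(w * xi for w, xi in zip(weights, x))) for x in EVENTS]
    z = sum(scores)
    return [s / z for s in scores]


def feature_expectations(probs):
    """Model expectation of each indicator feature f_i(x) = x_i."""
    return [sum(p * x[i] for p, x in zip(probs, EVENTS)) for i in range(3)]


def fit(targets, lr=0.5, iters=2000):
    """Gradient ascent; the gradient is target minus model expectation."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(iters):
        expected = feature_expectations(distribution(weights))
        weights = [w + lr * (t - e) for w, t, e in zip(weights, targets, expected)]
    return weights


weights = fit(TARGETS)
probs = dict(zip(EVENTS, distribution(weights)))
marginals = feature_expectations(list(probs.values()))

for event, p in probs.items():
    print(event, round(p, 4))
print("fitted feature probabilities:", [round(m, 4) for m in marginals])
```

With only these three constraints, the fitted distribution gives nonzero probability to all eight events and factorizes into the product of the three marginals, which is the least committal joint distribution that still satisfies the constraints.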

Thanks!
brook

• uml - 2007-02-20

My question was confusing. If we look at the empirical feature probabilities within the event space defined in the input file, they do still match the probabilities generated by TADM.

I guess my question should be: suppose I want to model three binary random variables (x1, x2, x3), and the sample I have contains only one event, say (x1=1, x2=1, x3=0). How do I then learn the model (the joint distribution)?

What confuses me is this: assume we want to model some P(X). If we know all possible events and their frequencies, why do we still need to fit a model?
The model should be used to make predictions for unobserved events. Am I right? It seems to me that any model learned by TADM can only make predictions for events in the input file (which have already been observed!!)

Thanks for any clarification.

• Jason Baldridge - 2007-02-21

If you do know the entire probability space, then you don't need to fit a model. You have it by definition...

I suspect there is confusion w.r.t. what is an event and so on. With discriminative maxent models, you know all possible outcomes, but you don't know what all possible data points look like. (E.g., we know all possible labels for a text classification problem, but we will see new documents not in the training material that have some overlap in terms of the features which are active for them. The weights assigned by maxent help us figure out where the new document lies in the probability space, as long as there is some overlap.)
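A rough sketch of the text classification case, in plain Python rather than TADM, may help here. The four training documents, the two labels, and the new document are all invented for illustration, and the trainer is a bare-bones gradient ascent on the conditional log-likelihood with no regularization, so treat it as a toy and not as how TADM estimates weights.

```python
import math
from collections import defaultdict

LABELS = ["sports", "politics"]

# Hypothetical training documents, each a set of active word features.
TRAIN = [
    ({"goal", "match", "team"}, "sports"),
    ({"team", "coach", "win"}, "sports"),
    ({"vote", "election", "party"}, "politics"),
    ({"party", "senate", "vote"}, "politics"),
]


def predict(weights, words):
    """Conditional maxent: p(y | doc) proportional to exp(sum of w[(word, y)])."""
    scores = {
        y: math.exp(sum(weights.get((w, y), 0.0) for w in words)) for y in LABELS
    }
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}


def train(data, lr=0.5, iters=200):
    """Gradient ascent: observed minus expected count of each (word, label) feature."""
    weights = defaultdict(float)
    for _ in range(iters):
        grad = defaultdict(float)
        for words, gold in data:
            probs = predict(weights, words)
            for w in words:
                for y in LABELS:
                    grad[(w, y)] += (1.0 if y == gold else 0.0) - probs[y]
        for key, g in grad.items():
            weights[key] += lr * g / len(data)
    return weights


weights = train(TRAIN)

# A document that never appeared in training but shares "coach" and "goal" with it.
# "budget" is unseen, so it has weight 0 for every label.
new_doc = {"coach", "goal", "budget"}
new_probs = predict(weights, new_doc)
print(new_probs)
```

The new document is not an event from the training data, yet the model still assigns it a distribution over labels because two of its features carry learned weights. A document with no overlapping features at all would get a uniform distribution in this sketch.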

It might help if you have a look at the third homework from my computational linguistics II course last fall:

http://comp.ling.utexas.edu/jbaldrid/courses/2006/cl2/cl2-hw3.tgz

Jason