#9 GISModel Breakage

Status: open-fixed
Owner: nobody
Labels: None
Priority: 9
Updated: 2010-08-11
Created: 2010-08-06
Creator: James Kosin
Private: No

The latest updates in CVS break the models and training.

The issue is with line 174...

The patch below reverts the modification that breaks the models.

Discussion

  • James Kosin
    2010-08-06

    GISModel patch

     
  • James Kosin
    2010-08-06

    • priority: 5 --> 9
     
  • James Kosin
    2010-08-06

    The old line:
    prior[oid] = Math.exp(prior[oid]*model.getConstantInverse());

    the new line in CVS that breaks:
    prior[oid] = Math.exp(prior[oid]);

    The comment says we are using the constant inverse when we are not using the correction constant... However, this constant inverse helps scale the training... without it the models quickly diverge and training stops having made essentially no progress, producing unusable models in most cases.
    With the OpenNLP tools, the models almost always diverge within 2-3 iterations or end up with NaN log-likelihood values.

    Can we get an explanation as to why using getConstantInverse() when getCorrectionParam() is 0 is not expected?
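    For readers following along, here is a minimal standalone sketch of the normalization step in question (illustrative names and values only, not the actual GISModel code), contrasting the old line's 1/C scaling with the new unscaled version:

    ```java
    public class EvalSketch {
        // Turns summed parameter values into normalized outcome probabilities,
        // scaling by constantInverse (1/C) inside the exponent as the old line did.
        static double[] normalize(double[] sums, double constantInverse) {
            double[] prior = new double[sums.length];
            double normal = 0.0;
            for (int oid = 0; oid < sums.length; oid++) {
                // old line: prior[oid] = Math.exp(prior[oid] * model.getConstantInverse());
                // new line: prior[oid] = Math.exp(prior[oid]);  (constantInverse == 1.0)
                prior[oid] = Math.exp(sums[oid] * constantInverse);
                normal += prior[oid];
            }
            for (int oid = 0; oid < prior.length; oid++) {
                prior[oid] /= normal; // normalize to a probability distribution
            }
            return prior;
        }

        public static void main(String[] args) {
            double[] sums = {2.0, 4.0, 6.0};              // illustrative parameter sums
            double[] scaled = normalize(sums, 1.0 / 3.0); // old behavior, C = 3
            double[] unscaled = normalize(sums, 1.0);     // new behavior, no scaling
            // Scaling by 1/C flattens the distribution, so predictions are less extreme.
            System.out.println(scaled[2] + " vs " + unscaled[2]);
        }
    }
    ```

    The scaled version gives a flatter distribution over outcomes, which is the "helps scale the training" effect described above.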

     
  • Joern Kottmann
    2010-08-07

    The reason for the change was a fix to the code path that is executed when a Gaussian prior is used during training, because that path was broken before.

    So we have to wait until Jason can look into it.

    Jörn

     
  • James Kosin
    2010-08-09

    I'm sorry for coming across as harsh... and I do know there is a risk of things breaking when using CVS code.

    I'm willing to discuss the issue; however, for now I'm returning to the "broken" method that seems to work... until we get a clear patch and an explanation of any changes we may need to make to the OpenNLP code to accommodate them.

    What appears to be happening now is that the train() function originally calls with 0 for the constant parameter and the inverse of the count for the constant-inverse parameter. This second parameter seems to serve two purposes.

    The current result, however, seems to be that the models are trained in one or two passes. This causes the models to behave erratically at best.

    Maybe we should at least come up with a good model to test Maxent with in the future, to prevent model breakage like this from affecting the outcome of the model.

    Thanks.

     
  • I've reviewed the code, and what appears to have happened is that the use of the correction constant has been moved from the parameter update portion (in GISTrainer) into the probability of the classes for each event (in GISModel). I frankly don't understand why that happened (the 1.2 version was correct). It should be used for the update:

    alpha_j^(n+1) = alpha_j^n (empirical_expectation_of_feature_j / model_expectation_of_feature_j )^(1/C)

    Where alpha_j is the parameter for feature_j and C is the correction constant. Raising to the power 1/C makes for smaller steps. It seems that the incorrect placement in GISModel was having a somewhat similar effect of making for less exuberant predictions for any given class; this appears to have roughly worked in terms of still finding good models (which I'm a bit surprised by, actually). However, I just put the correction constant back to where it is supposed to be, which meant making the following change in GISTrainer:

    params[pi].updateParameter(aoi,(Math.log(observed[aoi]) - Math.log(model[aoi])));

    becomes:

    params[pi].updateParameter(aoi,((Math.log(observed[aoi]) - Math.log(model[aoi]))/evalParams.getCorrectionConstant()));

    For the datasets I am working with, models trained with this fix perform more accurately. Note that you probably saw divergence because the trainer was taking overly aggressive parameter-update steps and shot into a lower-likelihood region.
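    To make the step-size point concrete, here is a toy numeric sketch (illustrative values only, not OpenNLP code) of the multiplicative update alpha <- alpha * (observed/model)^(1/C) described above:

    ```java
    public class StepSizeSketch {
        public static void main(String[] args) {
            double observed = 8.0, model = 2.0; // empirical vs. model expectation
            double C = 4.0;                     // correction constant

            // Full step (the buggy update): multiplies alpha by observed/model = 4.0
            // per iteration in log space.
            double fullStep = Math.log(observed) - Math.log(model);

            // Scaled step (the fixed update): multiplies alpha by
            // (observed/model)^(1/C) = 4^(1/4), roughly 1.414 -- a much smaller step.
            double scaledStep = (Math.log(observed) - Math.log(model)) / C;

            System.out.println(Math.exp(fullStep));   // aggressive update factor
            System.out.println(Math.exp(scaledStep)); // smaller, stabler update factor
        }
    }
    ```

    Repeated full-size steps of this magnitude are what can overshoot into a lower-likelihood region, matching the divergence reported earlier in the thread.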

    The gaussian update correctly uses the correction constant, which is why it was *not* working appropriately with the GISModel code that said:

    prior[oid] = Math.exp(prior[oid]*model.getConstantInverse());

    (You can actually use massive values for the gaussian sigma and it will find an okay model, but not the desired one.)

    I'm going to commit the updated files -- can you try training and see what happens with model convergence and performance on the datasets you are using?

    FWIW, I recommend using the gaussian update -- it means not having to fiddle with feature cutoffs or number of iterations. You do need to choose a reasonable sigma, but this is usually a much less sensitive choice.

    Jason

     
    • status: open --> pending-fixed
     
  • Joern Kottmann
    2010-08-10

    After reviewing the referenced papers and our implementation I came to the same conclusion as Jason.

    If we use GIS without smoothing, do we then need the correction feature or not? It looks to me like it was used at some point and then disabled. Do you know more about that?

    I will do testing with the training data I have and report here later.

    Thanks,
    Jörn

     
  • Joern Kottmann
    2010-08-10

    After testing a little with the Name Finder, I could not see a difference in the tagged data between the versions before and after your fix, as long as the model is re-trained. Sadly, I think retraining will not be possible in time for the 1.5 release for the Thai and Spanish models.

    We either drop Spanish and Thai support or we keep backward compatibility with already trained models. If we choose to keep backward compatibility, it might be tricky to integrate your fix for the Gaussian smoothing case.

    Jörn

     
  • Joern Kottmann
    2010-08-10

    After a short discussion with Tom, I think we should not remove the correction constant from the calculations
    in the GISModel.eval method, because that breaks backward compatibility with already trained models.

    New models trained with 3.0 can just write 1 for this constant into the model.

    The calculation performed would be:
    prior[oid] = Math.exp(prior[oid]) * 1/C; // where C is 1 for new models

    We then have to update the training code slightly so that it does not store the correction constant
    used during training in the model. That is the wrong place for it anyway, in my opinion; it could be an
    instance variable in GISTrainer or simply passed to the static method doing the calculations.

    Any opinions?

    Jörn
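
    Jörn's backward-compatibility proposal could look roughly like this (a sketch under the stated assumption that new model files store C = 1; the method and variable names are hypothetical, not committed code):

    ```java
    public class BackwardCompatSketch {
        // Proposed eval step: keep the correction constant C in the calculation,
        // but have new models write C = 1 so the extra factor becomes a no-op.
        static double evalOutcome(double sumOfParams, double correctionConstant) {
            return Math.exp(sumOfParams) * (1.0 / correctionConstant);
        }

        public static void main(String[] args) {
            double newModel = evalOutcome(2.0, 1.0); // new model: C = 1, plain exp()
            double oldModel = evalOutcome(2.0, 3.5); // old model: stored C still applied
            System.out.println(newModel + " " + oldModel);
        }
    }
    ```

    With C = 1 the result reduces to Math.exp(sumOfParams), so newly trained models are unaffected while old model files keep evaluating as before.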

     