Adding Unseen outcomes

Help
2008-08-27
2013-04-11
  • Daniel Neiberg

    Daniel Neiberg - 2008-08-27

    Hi,

    I want to create a unigram language model that is dependent on a second modality, in this case position. For example:

    pos=home word=hello hello
    pos=home word=hello hello
    pos=home word=goodbye goodbye
    pos=out word=hello hello

    Is this an appropriate way to structure the data?

    How do I add unseen outcomes, for example a word "thanks" that does not occur in the training data?

     
    • Thomas Morton

      Thomas Morton - 2008-08-31

      Hi,
         There are a couple of things to say.  Your setup looks fine from a "technically correct" perspective.

        Unfortunately, maxent is poorly suited for language modeling or really any task where there are a large number of outcomes (say more than 100).  This is because the code is set up to always produce the distribution for all outcomes in order to normalize that distribution.
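To make the normalization cost concrete, here is a minimal sketch of how a maxent model turns feature weights into a probability distribution. It is not the library's actual code; the function name `outcome_distribution` and the `weights` layout (a dict keyed by `(feature, outcome)`) are assumptions made for illustration. The point is that the normalizer sums over every outcome, so each prediction costs time proportional to the number of outcomes.

```python
import math


def outcome_distribution(weights, active_features, outcomes):
    """Return p(outcome | context) for every outcome (illustrative only).

    weights: dict mapping (feature, outcome) -> float; a hypothetical layout.
    active_features: the features present in the current context.
    outcomes: every outcome the model knows about.
    """
    # One score per outcome: the sum of the weights of the active features.
    scores = {
        o: sum(weights.get((f, o), 0.0) for f in active_features)
        for o in outcomes
    }
    # Subtract the maximum score before exponentiating, for numerical stability.
    max_score = max(scores.values())
    exp_scores = {o: math.exp(s - max_score) for o, s in scores.items()}
    # The normalizer needs a term for every outcome, even if only one
    # probability is wanted. This is the cost that grows with vocabulary size.
    z = sum(exp_scores.values())
    return {o: v / z for o, v in exp_scores.items()}


# Made-up weights, purely for demonstration.
weights = {("pos=home", "hello"): 1.2, ("pos=home", "goodbye"): 0.4}
dist = outcome_distribution(weights, ["pos=home"], ["hello", "goodbye", "thanks"])
```

With a word vocabulary as the outcome set, `outcomes` can easily hold tens of thousands of entries, and all of them are scored on every call.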

        For unknown outcomes you need to simulate their occurrence in the training data.  To do this you might convert some selection of your data (say all words occurring just once) to be treated as unknown.
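The conversion described above could be sketched as follows. This is a preprocessing step run before training, not part of the maxent API. It assumes the event format from the question (space-separated context features followed by the outcome as the last token). The function name `mark_rare_as_unknown`, the `UNKNOWN` placeholder, and the choice to also rewrite the matching `word=` feature are all assumptions made for illustration.

```python
from collections import Counter

UNKNOWN = "UNKNOWN"  # hypothetical placeholder outcome; any unused token works


def mark_rare_as_unknown(lines, min_count=2, unknown=UNKNOWN):
    """Rewrite events whose outcome occurs fewer than min_count times.

    Each line is 'feature feature ... outcome'. Rare outcomes are replaced
    by the placeholder so the model sees training events for an unknown word.
    """
    events = [line.split() for line in lines if line.strip()]
    counts = Counter(tokens[-1] for tokens in events)
    converted = []
    for tokens in events:
        *features, outcome = tokens
        if counts[outcome] < min_count:
            # Assumption: a 'word=<outcome>' feature is rewritten as well, so
            # the context stays consistent with the new outcome.
            features = [
                "word=" + unknown if f == "word=" + outcome else f
                for f in features
            ]
            outcome = unknown
        converted.append(" ".join(features + [outcome]))
    return converted


training_lines = [
    "pos=home word=hello hello",
    "pos=home word=hello hello",
    "pos=home word=goodbye goodbye",
    "pos=out word=hello hello",
]
converted = mark_rare_as_unknown(training_lines)
```

On the four example events, only "goodbye" occurs once, so its event becomes `pos=home word=UNKNOWN UNKNOWN` and the others are unchanged. At prediction time, a word such as "thanks" that was never seen would be mapped to the same placeholder.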

      Hope this helps...Tom

       
