How to use POSDictionary at run time?

Help
2013-10-07
2013-10-10
  • Jung-wei Fan

    Jung-wei Fan - 2013-10-07

    Hi - I am able to construct a POSDictionary but cannot figure out how to
    supply it to a POSTaggerME for tagging. The documentation is not clear on
    this. Any advice?

    Thanks,
    Fan

     
  • Jung-wei Fan

    Jung-wei Fan - 2013-10-09

    I figured it out. A custom tag dictionary, if supplied at training time, is built into the model, so it does not have to be passed to the tagger separately at run time (the code I put together is pasted below).
    One thing to note, however: the dev manual also describes a command-line training approach with POSTaggerTrainer, which according to the API doc is deprecated and appears to generate model files that cannot be loaded at run time. This probably deserves verification and an update as needed.

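    import java.io.BufferedOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import opennlp.tools.postag.POSDictionary;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerFactory;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.postag.WordTagSampleStream;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    /** Trains a POS model with a custom tag dictionary (class name is arbitrary). */
    public class TrainWithTagDictionary {
        public static void main(String[] args) throws IOException {
            String trainingFileName = args[0];
            String tagDictionaryFileName = args[1];
            String outputModelFileName = args[2];

            POSModel model;
            // try-with-resources closes both input streams even if training fails
            try (InputStream dataIn = new FileInputStream(trainingFileName);
                 InputStream dictIn = new FileInputStream(tagDictionaryFileName)) {
                // load the training instances (one sentence per line, word_TAG tokens)
                ObjectStream<String> lineStream =
                        new PlainTextByLineStream(dataIn, "UTF-8");
                ObjectStream<POSSample> trainingSamples =
                        new WordTagSampleStream(lineStream);

                // create a POSTaggerFactory with null ngram dictionary and
                // a custom POSDictionary
                POSDictionary tagDictionary = POSDictionary.create(dictIn);
                POSTaggerFactory factory = new POSTaggerFactory(null, tagDictionary);

                // create the model; the tag dictionary becomes part of it
                model = POSTaggerME.train("en", trainingSamples,
                        TrainingParameters.defaultParams(), factory);
            }

            // write the model only after training has succeeded
            try (OutputStream modelOut = new BufferedOutputStream(
                    new FileOutputStream(outputModelFileName))) {
                model.serialize(modelOut);
            }
        }
    }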
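
To make the idea of a dictionary "built into the model" concrete, here is a minimal, library-free sketch. It is not OpenNLP code: `ToyModel`, `train` and `tag` are hypothetical names invented for illustration, and the scores are made up. It only assumes the general behaviour of a tag dictionary, namely that a word listed in the dictionary may receive only the tags listed for it, while unlisted words are unrestricted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy illustration only; none of these types belong to OpenNLP. */
public class TagDictionarySketch {

    /** Hypothetical stand-in for a trained model that carries its tag dictionary. */
    static final class ToyModel {
        final Map<String, Set<String>> tagDictionary;

        ToyModel(Map<String, Set<String>> tagDictionary) {
            this.tagDictionary = tagDictionary;
        }
    }

    /** "Training": the dictionary is copied into the model object. */
    static ToyModel train(Map<String, Set<String>> tagDictionary) {
        return new ToyModel(new HashMap<>(tagDictionary));
    }

    /**
     * "Tagging": returns the best-scoring tag. For words found in the model's
     * dictionary, only the tags listed there are considered. Returns null if
     * no candidate tag is permitted.
     */
    static String tag(ToyModel model, String word, Map<String, Double> scores) {
        Set<String> allowed = model.tagDictionary.get(word);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (allowed != null && !allowed.contains(e.getKey())) {
                continue; // tag not permitted for this dictionary word
            }
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dict = new HashMap<>();
        dict.put("can", new HashSet<>(Arrays.asList("MD", "NN")));
        ToyModel model = train(dict);

        // made-up scores standing in for a statistical model's output
        Map<String, Double> scores = new HashMap<>();
        scores.put("VB", 0.5);
        scores.put("MD", 0.3);
        scores.put("NN", 0.2);

        System.out.println(tag(model, "can", scores)); // prints "MD"
        System.out.println(tag(model, "run", scores)); // prints "VB"
    }
}
```

Note that `tag` receives only the model, never the dictionary itself. That mirrors the point of the post above: once the dictionary is part of the model, the run-time side needs nothing beyond the model.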
    
     
    • Thilo

      Thilo - 2013-10-10

      OpenNLP has moved to Apache: http://opennlp.apache.org/
      You can direct your questions and comments there.

      --Thilo

