Specify sentence breaks for sentence detector

Forum: Help
Creator: cuernavaca
Created: 2010-05-24
Updated: 2013-04-16
  • cuernavaca

    cuernavaca - 2010-05-24

    Hi,
    I am wondering whether I can specify patterns that the sentence detector must treat as sentence breaks. For example, in my corpus, every sentence ends with a period followed by two spaces.

    Thanks very much,
    Yuan

     
  • niceday

    niceday - 2010-05-25

    Given what you have described, the detector should already be able to find those sentences, since they end with periods.

    I ran it myself on these sentences and it seemed to work OK: "The dog crossed the road.  The cat is fluffy.  The house is big."

    However, if you want to use sentence endings other than the default ones '. ! ? " )', you would have to train a new model. I am not sure exactly how to go about this, but what you probably want to look at is the method

    train(File inFile, int iterations, int cut, EndOfSentenceScanner scanner)

    in the class SentenceDetectorME.
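
    If the corpus really is as regular as described (every sentence ends with a period followed by two spaces), a plain rule might be enough without training anything. Below is a minimal sketch in plain Java that does not use OpenNLP at all; the class name TwoSpaceSplitter and its split method are made up for illustration, and it assumes a period followed by two spaces never occurs inside a sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: splits text wherever a period
// is followed by two spaces, as in the corpus described in the question.
class TwoSpaceSplitter {
    // Matches two spaces that come directly after a period. The lookbehind
    // keeps the period attached to the sentence it ends.
    private static final Pattern BREAK = Pattern.compile("(?<=\\.) {2}");

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        for (String piece : BREAK.split(text)) {
            String s = piece.trim(); // drop stray spaces around each piece
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "The dog crossed the road.  The cat is fluffy.  The house is big.";
        for (String s : split(text)) {
            System.out.println(s);
        }
    }
}
```

    Because the rule needs two spaces, a period followed by a single space (as in "Dr. Smith") is left alone. For anything less regular than that, training a model as described above is probably the safer route.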

    hope it helps

    mark
