Making Training Corpus for NameFind

ilophblue
2008-05-05
2013-04-16
  • ilophblue

    ilophblue - 2008-05-05

    I want to build a training corpus for NameFind. As I understand it, each line must contain only one <START> <END> tag pair to learn from. For example:
    I want to go to <START> New York <END>.

    New York is a place. I am building my training corpus from articles, so can I use an example like this:

    Simon wants to go to <START> New York <END>.

    This line contains both a person, "Simon", and a place, "New York", but I tag only the place entity because I want the model to learn New York as a place. Is that kind of example acceptable? Thank you.

    Best regards,
    Dee

     
    • Thomas Morton

      Thomas Morton - 2008-05-06

      Hi,
         On one line there should only be examples of one specific type: person, location, etc.  If there are multiple people on a single line, that is fine.  So using your example you would use data like:

      Simon wants to go to <START> New York <END> .

      to train the location model, and data like:

      <START> Simon <END> wants to go to New York .

      to train the person model.

      Hope this helps...Tom
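
      The one-type-per-model rule above can be sketched in code. This is only an illustration, not part of OpenNLP: the span format (start token, end token exclusive, entity type) and the helper name tag_line are assumptions made for this example. It writes a training line for one entity type and leaves entities of every other type untagged.

      ```python
      from typing import List, Tuple

      # Hypothetical span format: (start_token, end_token_exclusive, entity_type)
      Span = Tuple[int, int, str]


      def tag_line(tokens: List[str], spans: List[Span], entity_type: str) -> str:
          """Return one training line that tags only spans of entity_type."""
          starts = {s for s, e, t in spans if t == entity_type}
          ends = {e for s, e, t in spans if t == entity_type}
          out = []
          for i, tok in enumerate(tokens):
              if i in starts:
                  out.append("<START>")
              out.append(tok)
              if i + 1 in ends:
                  out.append("<END>")
          # Joining with single spaces keeps tags separated from words.
          return " ".join(out)


      tokens = "Simon wants to go to New York .".split()
      spans = [(0, 1, "person"), (5, 7, "location")]

      location_line = tag_line(tokens, spans, "location")  # for the location model
      person_line = tag_line(tokens, spans, "person")      # for the person model
      print(location_line)
      print(person_line)
      ```

      Each output line would go into the training file for its own model, so the same sentence appears once in the location file and once in the person file.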

       
    • ilophblue

      ilophblue - 2008-05-11

      Another example :

      1. Simon, Susan, and Susie want to go to New York.
      ---- Since they are different people, not one person with a first, middle, and last name, I will have 3 lines of training data. Is that right?
      The training file is:
      <START> Simon <END>, Susan, and Susie want to go to New York.
      Simon, <START> Susan <END>, and Susie want to go to New York.
      Simon, Susan, and <START> Susie <END> want to go to New York.

      2. I put all of my money in Hongkong Shanghai Bank Corporation (HSBC).
      ---- To train the organization entity, which tagging should I use (training file 1 or training file 2)?

      training file 1 :
      line 1 : I put all of my money in <START> Hongkong Shanghai Bank Corporation (HSBC) <END>.

      training file 2 :
      line 1 : I put all of my money in <START> Hongkong Shanghai Bank Corporation <END> (HSBC).
      line 2 : I put all of my money in Hongkong Shanghai Bank Corporation (<START> HSBC <END>).

      Thank you and regards. Dee.

       
      • Thomas Morton

        Thomas Morton - 2008-05-12

        Hi,
           It should be one line, as they are all the same type (person):

        <START> Simon <END> , <START> Susan <END> , and <START> Susie <END> want to go to New York .

          I don't know what the annotation guidelines are for the bank case.  I would go with training file 1, or:

        I put all of my money in <START> Hongkong Shanghai Bank Corporation <END> ( <START> HSBC <END> ) .

        Be sure to put spaces between tags and words.  Hope this helps...Tom
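
        The spacing advice can be checked mechanically before training. The following is a minimal sketch, not an OpenNLP utility: space_tags and tags_balanced are hypothetical helper names. It only fixes whitespace around the tags and checks that they pair up; it does not tokenize the rest of the sentence, so punctuation attached to a word is left alone.

        ```python
        import re

        # Matches a tag together with any whitespace touching it.
        TAG = re.compile(r"\s*(<START>|<END>)\s*")


        def space_tags(line: str) -> str:
            """Put exactly one space on each side of every <START> and <END> tag."""
            spaced = TAG.sub(r" \1 ", line)
            # Collapse repeated whitespace and trim the ends.
            return " ".join(spaced.split())


        def tags_balanced(line: str) -> bool:
            """True if every <START> is closed by an <END> before the next <START>."""
            is_open = False
            for tag in re.findall(r"<START>|<END>", line):
                if tag == "<START>":
                    if is_open:
                        return False
                    is_open = True
                else:
                    if not is_open:
                        return False
                    is_open = False
            return not is_open


        raw = ("I put all of my money in <START>Hongkong Shanghai Bank "
               "Corporation<END> (<START>HSBC<END>).")
        fixed = space_tags(raw)
        print(fixed)
        print(tags_balanced(fixed))
        ```

        Running a check like this over the whole training file is a cheap way to catch lines where a tag is glued to a word or an <END> is missing.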

         
