From: Amit J. <ami...@gm...> - 2006-06-30 08:30:32
|
Hi,

There is nothing wrong with the package. You need to set the Java option for the maximum heap size. See "man java" on a Linux box for more help. Otherwise, try the following command:

    java -Xmx1000M iitb.Segment.Segment all -f samples/us50.conf

Hope this helps.

-amit
|
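One way to confirm that the -Xmx setting above actually reached the JVM is to print the heap ceiling from inside Java. This is a minimal sketch and not part of the CRF package; the class name HeapCheck is made up for illustration, and it relies only on the standard Runtime.maxMemory() call.

```java
// Hypothetical helper (not from the CRF package): reports the heap ceiling
// that the running JVM was actually given.
final class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as "java -Xmx1000M HeapCheck" should report a figure near 1000 MB, possibly somewhat below it depending on the JVM. If the number is far smaller, the option was probably not passed to the process that does the training.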
From: Anthony L. <ant...@ya...> - 2006-07-01 17:26:01
|
Hmm, it looks like only a couple of people in the world are using this package.

--- Anthony Liu <ant...@ya...> wrote:

> Hi, Thanks.
>
> I used only half of my training data and then the training was successful.
>
> However, when I test it, I got the following exception (scroll down for more please.)
>
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
>     at iitb.CRF.Viterbi.viterbiSearch(Viterbi.java:168)
>     at iitb.CRF.Viterbi.bestLabelSequence(Viterbi.java:137)
>     at iitb.CRF.CRF.apply(CRF.java:118)
>     at iitb.Segment.Segment.segment(Segment.java:191)
>     at iitb.Segment.Segment.doTest(Segment.java:252)
>     at iitb.Segment.Segment.test(Segment.java:236)
>     at iitb.Segment.Segment.main(Segment.java:58)
>
> I am actually using this Segment application for my named entity recognition project. I am not sure if it is gonna work.
>
> My training data are composed of annotated corpus from Linguistics Data Consortium (LDC) of Univ of Pennsylvania. The corpus has a total of 37 tags (or labels depending upon how you call it). I've modified the configuration file us50.conf so that numlabels=37.
>
> The reason that I am trying to use the Segment application for my NER project is only because I have a hard time figuring out what components I need to write upon this package for a named entity recognition task. The documentation is a little bit confusing.
>
> It is highly appreciated if you guys could share your ideas about using this package for NER tasks.
>
> My training data look like below (faked by me in 2 minutes, so pls just assume its validity).
>
> Bangalore/LOC ,/PUNC which/WH is/IS essentially/ADV a/A brand/NN that/P has/VB3 been/VBP created/VBP painfully/ADV in/P the/DT last/ADJ so/ADV many/ADJ years/NN, is/IS fast/ADV disappearing/VBG ,/PUNC "/PUNC said/VBD Rajendra Misra/PER, managing/ADJ director/NN of/P private/ABJ equity/NN firm/NN Tenet Holdings Private/ORG ./PUNC
>
> Any good idea to share? Thanks.
>
> --- "Roger P. Menezes" <ro...@ya...> wrote:
>
> > > Roger P. Menezes wrote:
> > >
> > > > The number of features looks very large to me. The sample application by default has only 220 features. Even for some complex tasks (our IE applications) that I have been running, it generates no more than 8000 features. You may want to take a look at that. Anyways, you can also try the "java -Xmx<size>" option to increase the heap size allocated to JVM.
> > > >
> > > > -regards,
> > > > Roger
> > > >
> > > > Anthony Liu wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Any users of Sunita's Java CRF package out there?
> > > > >
> > > > > I am using the sample Segment application to train a corpus of 2.7M bytes with 37 labels.
> > >
> > > I'm sorry, I didn't consider the above. Yes, the number of features may not be unusual then. Increasing the heap size may be the last resort.
> > >
> > > Roger
|
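A cheap sanity check before retesting is to count the distinct tags in the annotated text and compare that count with the numlabels value in the configuration file. Nothing in this thread confirms that the ArrayIndexOutOfBoundsException comes from a label mismatch, so treat that as an assumption. The sketch below is not part of the CRF package: LabelCounter and countLabels are hypothetical names, and the parsing assumes the token/TAG layout of the sample sentence quoted above, not whatever input format the Segment application actually reads.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not from the CRF package): tallies the tags found in
// whitespace-separated token/TAG text.
final class LabelCounter {
    /** Returns each tag with the number of times it occurs in the text. */
    static Map<String, Integer> countLabels(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String item : text.trim().split("\\s+")) {
            int slash = item.lastIndexOf('/');
            if (slash < 0) {
                continue; // untagged word, e.g. the first word of a multi-word entity
            }
            // Keep only the letters/digits right after the slash, so "NN," counts as "NN".
            int end = slash + 1;
            while (end < item.length() && Character.isLetterOrDigit(item.charAt(end))) {
                end++;
            }
            if (end > slash + 1) {
                counts.merge(item.substring(slash + 1, end), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "Bangalore/LOC ,/PUNC which/WH is/IS said/VBD Rajendra Misra/PER, ./PUNC";
        int numLabels = 37; // the value the poster set in us50.conf
        Map<String, Integer> counts = countLabels(sample);
        System.out.println(counts);
        if (counts.size() > numLabels) {
            System.out.println("More distinct tags than numlabels allows");
        }
    }
}
```

Running the same count over both the training and the test files also shows whether the test data contains tags that never appear in training.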
From: Sunita S. <su...@it...> - 2006-07-03 05:14:06
|
Could you let me know if this is the latest CVS checkout or the latest cut release (which is somewhat old)?

This code is primarily a single-faculty effort, provided "as is" to daring users.

The semi-CRF stuff should work, and there are several options to reduce the number of features, etc. Let me know what version of the code you are using. In general, using the latest CVS checkout is recommended.

> _______________________________________________
> Crf-users mailing list
> Crf...@li...
> https://lists.sourceforge.net/lists/listinfo/crf-users
|
From: Anthony L. <ant...@ya...> - 2006-07-13 16:33:32
|
Dear Dr. Sarawagi,

Thanks a lot for your kind reply, and I am sorry for the late response; I just returned from a two-week trip to New York.

I believe I downloaded the latest version from SourceForge, and I have yet to figure out how to do a CVS checkout.
|
From: Anthony L. <ant...@ya...> - 2006-07-16 19:33:41
|
Will it help to increase the JVM heap size? I am training again by issuing

    java -Xms256M -Xmx1024M iitb.blahblahblah

Not sure yet if it's gonna work, since it'll take a little while before anything happens.
|
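The two options in that command do different things: -Xms sets the heap size the JVM starts with, and -Xmx sets the ceiling it may grow to. To see how close a long training run gets to that ceiling, the standard Runtime methods can be logged periodically. The sketch below is an illustration only; HeapUsage and its methods are hypothetical names and are not part of the CRF package.

```java
// Hypothetical helper (not from the CRF package): summarises current heap use.
final class HeapUsage {
    static final long MB = 1024L * 1024L;

    /** Heap currently occupied by objects, in megabytes. */
    static long usedMegabytes() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** One-line summary: used heap, heap currently reserved, and the -Xmx ceiling. */
    static String summary() {
        Runtime rt = Runtime.getRuntime();
        return "used=" + usedMegabytes() + "MB reserved=" + rt.totalMemory() / MB
                + "MB max=" + rt.maxMemory() / MB + "MB";
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

If the used figure keeps climbing toward the max figure during training, a larger -Xmx or fewer features is likely needed; if it stays well below, the heap setting is probably not the limiting factor.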