From: Amit J. <ami...@gm...> - 2006-07-14 04:31:47
Hi,

It is easy to do a CVS checkout of the CRF package
(http://sourceforge.net/cvs/?group_id=105386):

cvs -d:pserver:ano...@cr...:/cvsroot/crf login
cvs -z3 -d:pserver:ano...@cr...:/cvsroot/crf co -P CRF

-amit

On 7/14/06, crf...@li... <crf...@li...> wrote:
> Today's Topics:
>
>    1. Re: I got java.lang.OutOfMemoryError with Sunita's CRF
>       package (Anthony Liu)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 13 Jul 2006 09:33:26 -0700 (PDT)
> From: Anthony Liu <ant...@ya...>
> Subject: Re: [Crf-users] I got java.lang.OutOfMemoryError with
>         Sunita's CRF package
> To: Sunita Sarawagi <su...@it...>
> Cc: crf...@li...
> Message-ID: <200...@we...>
> Content-Type: text/plain; charset=iso-8859-1
>
> Hi, Dear Dr. Sarawagi,
>
> Thanks a lot for your kind reply, and I am sorry for the late
> response; I just returned from a two-week trip to New York.
>
> I believe I downloaded the latest version from SourceForge, and I
> have yet to figure out how to do a CVS checkout.
>
> --- Sunita Sarawagi <su...@it...> wrote:
>
> > Could you let me know if this is the latest cvs checkout or the
> > latest cut release (which is somewhat old)?
> >
> > This is primarily a single faculty effort code, provided "as is"
> > to daring users.
> >
> > The semi-crf stuff should work, and there are several options to
> > reduce the number of features, etc. Let me know what version of
> > the code you are using. In general, using the latest cvs checkout
> > is recommended.
> >
> > Anthony Liu wrote:
> > > Hmm, it looks like only a couple of people in the world are
> > > using this package.
> > >
> > > --- Anthony Liu <ant...@ya...> wrote:
> > >
> > >> Hi, Thanks.
> > >>
> > >> I used only half of my training data and then the training was
> > >> successful.
> > >>
> > >> However, when I tested it, I got the following exception
> > >> (scroll down for more, please):
> > >>
> > >> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
> > >>     at iitb.CRF.Viterbi.viterbiSearch(Viterbi.java:168)
> > >>     at iitb.CRF.Viterbi.bestLabelSequence(Viterbi.java:137)
> > >>     at iitb.CRF.CRF.apply(CRF.java:118)
> > >>     at iitb.Segment.Segment.segment(Segment.java:191)
> > >>     at iitb.Segment.Segment.doTest(Segment.java:252)
> > >>     at iitb.Segment.Segment.test(Segment.java:236)
> > >>     at iitb.Segment.Segment.main(Segment.java:58)
> > >>
> > >> I am actually using this Segment application for my named
> > >> entity recognition project. I am not sure if it is going to
> > >> work.
> > >>
> > >> My training data are composed of an annotated corpus from the
> > >> Linguistic Data Consortium (LDC) of the Univ of Pennsylvania.
> > >> The corpus has a total of 37 tags (or labels, depending on what
> > >> you call them). I've modified the configuration file us50.conf
> > >> so that numlabels=37.
> > >>
> > >> The reason I am trying to use the Segment application for my
> > >> NER project is only that I have a hard time figuring out which
> > >> components I need to write on top of this package for a named
> > >> entity recognition task. The documentation is a little bit
> > >> confusing.
> > >>
> > >> It would be highly appreciated if you could share your ideas
> > >> about using this package for NER tasks.
> > >>
> > >> My training data look like the sample below (faked by me in 2
> > >> minutes, so please just assume its validity).
> > >>
> > >> Bangalore/LOC ,/PUNC which/WH is/IS essentially/ADV a/A
> > >> brand/NN that/P has/VB3 been/VBP created/VBP painfully/ADV
> > >> in/P the/DT last/ADJ so/ADV many/ADJ years/NN, is/IS fast/ADV
> > >> disappearing/VBG ,/PUNC "/PUNC said/VBD Rajendra Misra/PER,
> > >> managing/ADJ director/NN of/P private/ABJ equity/NN firm/NN
> > >> Tenet Holdings Private/ORG ./PUNC
> > >>
> > >> Any good idea to share? Thanks.
> > >>
> > >>> --- "Roger P. Menezes" <ro...@ya...> wrote:
> > >>>
> > >>>> Roger P. Menezes wrote:
> > >>>>
> > >>>>> The number of features looks very large to me. The sample
> > >>>>> application by default has only 220 features. Even for some
> > >>>>> complex tasks (our IE applications) that I have been
> > >>>>> running, it generates no more than 8000 features. You may
> > >>>>> want to take a look at that. Anyways, you can also try the
> > >>>>> "java -Xmx<size>" option to increase the heap size
> > >>>>> allocated to JVM.
> > >>>>>
> > >>>>> -regards,
> > >>>>> Roger
> > >>>>>
> > >>>>> Anthony Liu wrote:
> > >>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> Any users of Sunita's Java CRF package out there?
> > >>>>>>
> > >>>>>> I am using the sample Segment application to train a
> > >>>>>> corpus of 2.7M bytes with 37 labels.
> > >>>>
> > >>>> I'm sorry, I didn't consider the above. Yes, the number of
> > >>>> features may not be unusual then. Increasing the heap size
> > >>>> may be the last resort.
> > >>>>
> > >>>> Roger
>
> ------------------------------
>
> _______________________________________________
> Crf-users mailing list
> Crf...@li...
> https://lists.sourceforge.net/lists/listinfo/crf-users
>
> End of Crf-users Digest, Vol 2, Issue 5
> ***************************************
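
A note on the quoted word/TAG sample: since the configuration sets numlabels=37, it may be worth counting how many distinct tags the data really contains. The sketch below is a standalone illustration, not part of the iitb.CRF package; the class and method names (TagCounter, countTags, untagged) are made up for this example. It assumes whitespace-separated tokens with the tag after the last "/", which may not be the format the Segment application actually reads. Whether a tag-count mismatch has anything to do with the ArrayIndexOutOfBoundsException in the thread is only a guess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for inspecting word/TAG data; not part of iitb.CRF. */
final class TagCounter {

    /** Counts how often each tag occurs in one line of word/TAG tokens. */
    static Map<String, Integer> countTags(String line) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : line.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            // Skip tokens with no tag, e.g. the first word of "Rajendra Misra/PER".
            if (slash < 0 || slash == token.length() - 1) {
                continue;
            }
            counts.merge(token.substring(slash + 1), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the non-empty tokens that carry no "/TAG" suffix. */
    static List<String> untagged(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (!token.isEmpty() && (slash < 0 || slash == token.length() - 1)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A shortened piece of the sample sentence from the thread.
        String sample = "Bangalore/LOC ,/PUNC which/WH is/IS fast/ADV "
                + "said/VBD Rajendra Misra/PER, ./PUNC";
        Map<String, Integer> counts = countTags(sample);
        System.out.println(counts.size() + " distinct tags: " + counts);
        System.out.println("untagged tokens: " + untagged(sample));
    }
}
```

Two things show up when this runs on the sample: "Misra/PER," yields the tag "PER," (comma included), which counts as a different tag from "PER", and the first words of multi-word entities carry no tag at all. Either could change the tag inventory relative to the 37 expected, so the data may need cleaning before the count is compared with numlabels.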