From: Amit J. <ami...@gm...> - 2007-01-25 04:24:20
Hi,

A small correction. There is an error in the example given in point (2). The correct one is:

Now consider the global feature vector F() for the different possible label sequences:

a)
For Y = [ 0 0 0 0 ]
F(Y|X) = f(y=0, pos=0) + f(y=0, pos=1) + f(y=0, pos=2) + f(y=0, pos=3)
F(Y|X) = [ 1 0 ] + [ 0 0 ] + [ 1 0 ] + [ 0 0 ]
F(Y|X) = [ 2 0 ]

b)
For Y = [ 1 0 0 0 ]
F(Y|X) = f(y=1, pos=0) + f(y=0, pos=1) + f(y=0, pos=2) + f(y=0, pos=3)
F(Y|X) = [ 0 1 ] + [ 0 0 ] + [ 1 0 ] + [ 0 0 ]
F(Y|X) = [ 1 1 ]

-regards
Amit

On 1/25/07, Amit Jaiswal <ami...@gm...> wrote:
>
> Hi,
> A quick reply for now (a more detailed reply will follow in a day or two).
>
> 1. If there are n classes, then there are n features for a particular
> "property". If you look at the definition of a feature, it is a function
> of both the property you want to represent and the state/label. For
> example, if the property is "isCapitalized" and there are 2 classes, then
> there will be 2 features (with different feature ids):
> i) isCapitalized is true and state = 0 (featureID = 0)
> ii) isCapitalized is true and state = 1 (featureID = 1)
>
> 2. If you look at the maximum log-likelihood equation in the paper
> (Shallow Parsing), you will see that the numerator contains the label
> sequence of each training instance, and the denominator contains a
> normalizing term over all possible label sequences.
>
> At a given position (say x=0), we fire features for all the states in
> which the feature is true (in all the FeatureTypes classes).
>
> Then in the CRF trainer, while computing F(Y|X) for a particular label
> sequence Y, we take only those feature values whose state matches the
> state at that position in the label sequence.
>
> For example, say n=2 [ 0 = other, 1 = NounPhrase ], the feature is
> "isCapitalized", and the data sequence is "Today is Thursday ."
> Training data = Y = [ 1 0 1 0 ]
>
> We fire 2 features at each position in the sequence.
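Concretely, the corrected computation from point (2) above can be sketched in a few lines of illustrative Python. This is not the Java package's API, just the arithmetic of summing per-position feature vectors into the global F(Y|X):

```python
# Illustrative sketch (not the CRF package's actual code): the global
# feature vector F(Y|X) for the "isCapitalized" example with 2 labels
# (0 = other, 1 = NounPhrase). X = "Today is Thursday ." is capitalized
# at positions 0 and 2.

def local_features(is_capitalized, y, num_labels=2):
    """Per-position feature vector: one slot per (isCapitalized, label)
    pair. Slot y fires only when the word is capitalized AND its label
    in the candidate sequence is y."""
    f = [0] * num_labels
    if is_capitalized:
        f[y] = 1
    return f

def global_features(capitalized, labels, num_labels=2):
    """F(Y|X): the sum of the per-position feature vectors."""
    F = [0] * num_labels
    for cap, y in zip(capitalized, labels):
        for i, v in enumerate(local_features(cap, y, num_labels)):
            F[i] += v
    return F

caps = [True, False, True, False]           # "Today is Thursday ."
print(global_features(caps, [0, 0, 0, 0]))  # [2, 0]
print(global_features(caps, [1, 0, 0, 0]))  # [1, 1]
```

The two printed vectors match cases (a) and (b) of the corrected example.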
> So at pos = 0, where the word is capitalized, the features we fire are:
> i) isCapitalized=1 and y=0 (featureId=0)
> ii) isCapitalized=1 and y=1 (featureId=1)
>
> Note that the length of the feature vector = 2.
>
> Similarly, at the rest of the positions we fire features depending on
> whether the word at that position is capitalized or not.
>
> Now consider the global feature vector F() for the different possible
> label sequences:
> a)
> For Y = [ 0 0 0 0 ]
> F(Y|X) = f(y=0, pos=0) + f(y=0, pos=1) + f(y=0, pos=2) + f(y=0, pos=3)
> F(Y|X) = [ 1 0 ] + [ 0 0 ] + [ 1 0 ] + [ 0 0 ]
> F(Y|X) = [ 2 0 ]
>
> b)
> For Y = [ 1 0 0 0 ]
> F(Y|X) = f(y = 0, pos=0) + f(y=0, pos=1) + f(y=0, pos=2) + f(y=0, pos=3)
> F(Y|X) = [ 0 1 ] + [ 0 0 ] + [ 1 0 ] + [ 0 0 ]
> F(Y|X) = [ 1 1 ]
>
> If you look at the equations in the paper, it is this global feature
> vector F(,) which is used. So for each possible label sequence Y, such an
> F(Y|X) needs to be computed; thus we fire features for all the possible
> states, and the CRF trainer takes care of generating the proper F(Y|X)
> for a particular label sequence Y.
>
> 3. The answer to your second question is fairly simple. All the features
> are mapped to a contiguous array and each feature is given a unique id.
> You can look at the iitb.Model.FeatureGenImpl class for the
> implementation details.
>
> 4. Now about unseen words encountered during testing. WordFeatures is
> the feature type that fires all the word features. There is an integer
> parameter called RARE_THRESHOLD. Any word that is not seen at least
> RARE_THRESHOLD times in the training data is considered a rare/unknown
> word and is not fired as a feature. There is another feature type called
> UnknownFeatures which is fired only for such rare words.
>
> So UnknownFeatures is fired for any word that is seen only in the test
> data (because its frequency in the training data would be 0 and thus
> less than RARE_THRESHOLD).
>
> 5. The last question is a bit confusing.
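The rare-word handling described in point 4 can be sketched as follows. This is illustrative Python, not the package's actual WordFeatures/UnknownFeatures classes, and the RARE_THRESHOLD value here is just an example:

```python
# Illustrative sketch of the rare-word idea from point 4 (names and the
# threshold value are examples, not the package's actual code): words
# seen fewer than RARE_THRESHOLD times in training do not get their own
# word feature; a shared unknown-word feature fires for them instead.
from collections import Counter

RARE_THRESHOLD = 2  # example value

def build_vocab(training_words):
    """Keep only words frequent enough in training to get a word feature."""
    counts = Counter(training_words)
    return {w for w, c in counts.items() if c >= RARE_THRESHOLD}

def word_feature(word, vocab):
    """Return the feature fired for this word. An unseen test word has
    training count 0 < RARE_THRESHOLD, so it always maps to the
    unknown-word feature."""
    return ("WORD", word) if word in vocab else ("UNKNOWN",)

vocab = build_vocab(["the", "the", "market", "market", "rally"])
print(word_feature("the", vocab))    # ('WORD', 'the')
print(word_feature("rally", vocab))  # ('UNKNOWN',)  seen only once
print(word_feature("zebra", vocab))  # ('UNKNOWN',)  never seen in training
```

The last line shows why test-only words land on the unknown-word feature: their training frequency is necessarily below the threshold.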
> First try to understand the meaning of the feature vector and how it is
> implemented in the CRF package. If you need any specific detail about
> any class/package, then please send a mail.
>
> 6. Excellent documentation on using the CRF package is available at
> http://crf.sourceforge.net/introduction/
>
> Hope this helps.
>
> -amit
>
> On 1/25/07, crf...@li... <crf...@li...> wrote:
> >
> > Today's Topics:
> >
> >    1. CRF Implementation (deb...@th...)
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Wed, 24 Jan 2007 09:13:50 -0500
> > From: <deb...@th...>
> > Subject: [Crf-users] CRF Implementation
> > To: <crf...@li...>
> >
> > Dear Dr. Sarawagi,
> >
> > I am working in the Information Extraction research group of Thomson
> > Corp. and recently got a chance to use the CRF package that you have
> > created. Thanks for the excellent implementation; it is a very useful
> > module, and personally I find it more comfortable than any other
> > available implementation.
> >
> > However, I have a few very basic doubts regarding the code, especially
> > about the usage and values of the weight vector (lambda) during the
> > training procedure. I would be grateful if you could clarify them.
> > Firstly, I started working with the same data corpus (address
> > sequences) used in the sample example. From my understanding of the
> > original McCallum paper, I thought the lambda (weight) vector would
> > have the same length as the number of feature functions. I generated 4
> > state feature functions (depending on the address data) and 3
> > transition (emission) feature functions, so the weight vector has a
> > length of 7 in my case. Whenever I find any feature in the training
> > data (that is, if the data satisfies a particular boolean feature
> > function), I update the lambda at that feature function's index.
> > Whereas in the Java CRF implementation, the weight vector has a length
> > equal to the total number of possible features (I think it is 220).
> > This is a little confusing for me.
> >
> > Secondly, I noticed that the generated features are composed of some
> > functions (all caps, the alphanumeric property, the "word" itself,
> > etc.). Based on the "feature index" into the weight vector, you apply
> > the Viterbi algorithm. My doubt is: while finding the index into the
> > weight vector during evaluation, how do you match the index of the
> > trained weight vector? In the training implementation every word is a
> > feature in your case; if some unseen word occurs during testing (which
> > is very possible), do you index this word as an "unseen feature"?
> >
> > Lastly, many words can be represented by various features (such as the
> > "word" itself, alphanumeric or not, starts with caps, etc.). While
> > finding the index into the weight vector during evaluation, how do you
> > select which feature index to use? Do you give more weight to any
> > particular feature (say, if the word matches the training data, it has
> > higher priority) than the others?
> > Thanks in advance,
> >
> > Regards,
> > Debanjan
> >
> > End of Crf-users Digest, Vol 5, Issue 3
> > ***************************************