## crf-users

Showing 2 results of 2

Re: [Crf-users] Crf-users Digest, Vol 5, Issue 3
From: Amit Jaiswal - 2007-01-25 04:24:20
Attachments: Message as HTML

```
Hi,

A small correction. There is an error in the example given in point (2).
The correct version is:

Now consider the global feature vector F() for different possible label
sequences:

a) For Y = [ 0 0 0 0 ]
   F(Y|X) = f(y=0, pos=0) + f(y=0, pos=1) + f(y=0, pos=2) + f(y=0, pos=3)
   F(Y|X) = [ 1 0 ] + [ 0 0 ] + [ 1 0 ] + [ 0 0 ]
   F(Y|X) = [ 2 0 ]

b) For Y = [ 1 0 0 0 ]
   F(Y|X) = f(y=1, pos=0) + f(y=0, pos=1) + f(y=0, pos=2) + f(y=0, pos=3)
   F(Y|X) = [ 0 1 ] + [ 0 0 ] + [ 1 0 ] + [ 0 0 ]
   F(Y|X) = [ 1 1 ]

-regards
Amit
```
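The arithmetic in Amit's corrected example can be verified with a short script. This is a hypothetical sketch, not code from the iitb CRF package: the function names are invented, and the only feature property modeled is `isCapitalized` with two states, exactly as in the "Today is Thursday ." example.

```python
# Sketch (hypothetical; not from the iitb CRF package): compute the global
# feature vector F(Y|X) for one property ("isCapitalized") and 2 states.
# featureId 0 = (isCapitalized, y=0), featureId 1 = (isCapitalized, y=1).

words = ["Today", "is", "Thursday", "."]

def local_features(word, y):
    """Per-position feature vector f(y, pos): a 1 at the index of the
    (property, state) pair that fires, 0 elsewhere."""
    f = [0, 0]
    if word[0].isupper():   # the "isCapitalized" property holds
        f[y] = 1
    return f

def global_feature_vector(words, labels):
    """F(Y|X): the position-wise sum of the local feature vectors."""
    total = [0, 0]
    for word, y in zip(words, labels):
        f = local_features(word, y)
        total = [a + b for a, b in zip(total, f)]
    return total

print(global_feature_vector(words, [0, 0, 0, 0]))  # [2, 0]
print(global_feature_vector(words, [1, 0, 0, 0]))  # [1, 1]
```

The two printed vectors match cases (a) and (b) of the corrected example: only "Today" and "Thursday" are capitalized, so each label sequence distributes those two firings between the two (property, state) slots.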
Re: [Crf-users] Crf-users Digest, Vol 5, Issue 3
From: Amit Jaiswal - 2007-01-25 04:17:20
Attachments: Message as HTML

```
Hi,

A quick reply for now (I will give a more detailed reply in a day or two).

1. If there are n classes, then there are n features for a particular
"property". If you look at the definition of a feature, it is a function
of both the property that you want to represent and the state/label. For
example, if the property is "isCapitalized" and there are 2 classes, then
there will be 2 features (with different feature ids):
  i)  isCapitalized is true and state = 0 (featureID = 0)
  ii) isCapitalized is true and state = 1 (featureID = 1)

2. If you look at the maximum log-likelihood equation in the paper
(Shallow Parsing), you will see that the numerator contains the label
sequence of each training instance, and the denominator contains a
normalizing term over all possible label sequences.

For a given position (say x=0), we fire features for all the states in
which the feature holds (this happens in all the FeatureTypes classes).

Then, in the CRF trainer, while computing F(Y|X) for a particular label
sequence Y, we take only those feature values whose state matches the
state present in that label sequence.

For example, say n=2 [0 = other, 1 = NounPhrase], feature =
"isCapitalized", and data sequence = "Today is Thursday ."
Training data = Y = [ 1 0 1 0 ]

We fire 2 features at each position in the sequence. So at pos = 0, where
the word is capitalized, the features we fire are:
  i)  isCapitalized=1 and y=0 (featureId=0)
  ii) isCapitalized=1 and y=1 (featureId=1)

Note that the length of the feature vector = 2.

Similarly, at the remaining positions, we fire features according to
whether the word at that position is capitalized or not.

Now consider the global feature vector F() for different possible label
sequences:

a) For Y = [ 0 0 0 0 ]
   F(Y|X) = f(y=0, pos=0) + f(y=0, pos=1) + f(y=0, pos=2) + f(y=0, pos=3)
   F(Y|X) = [ 1 0 ] + [ 0 0 ] + [ 1 0 ] + [ 0 0 ]
   F(Y|X) = [ 2 0 ]

b) For Y = [ 1 0 0 0 ]
   F(Y|X) = f(y=0, pos=0) + f(y=0, pos=1) + f(y=0, pos=2) + f(y=0, pos=3)
   F(Y|X) = [ 0 1 ] + [ 0 0 ] + [ 1 0 ] + [ 0 0 ]
   F(Y|X) = [ 1 1 ]

If you look at the equations in the paper, it is this global feature
vector F(,) that is used. For each possible label sequence Y, such an
F(Y|X) needs to be computed; thus we fire features for all the possible
states, and the CRF trainer takes care of generating the proper F(Y|X)
for a particular label sequence Y.

3. The answer to your second question is fairly simple. All the features
are mapped to a contiguous array, and each feature is given a unique id.
You can look at the iitb.Model.FeatureGenImpl class for the
implementation details.

4. Now, about unseen words encountered during testing. WordFeatures is
the feature type that fires all the word features. There is an integer
parameter called RARE_THRESHOLD. Any word that is not seen at least
RARE_THRESHOLD times in the training data is considered a rare/unknown
word and is not fired as a feature. There is another feature type called
UnknownFeatures which is fired only for such rare words.

So, UnknownFeatures is fired for any word that is seen only in the test
data (because its frequency in the training data would be 0 and thus
less than RARE_THRESHOLD).

5. The last question is a bit confusing. First try to understand the
meaning of the feature vector and how it is implemented in the CRF
package. If you need any specific detail about any class/package, please
send a mail.

6. Excellent documentation on using the CRF package is available at
http://crf.sourceforge.net/introduction/

Hope this helps.

-amit

On 1/25/07, crf-users-request@... <crf-users-request@...> wrote:
>
> Today's Topics:
>
>    1. CRF Implementation (debanjan.ghosh@...)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 24 Jan 2007 09:13:50 -0500
> Subject: [Crf-users] CRF Implementation
> To: <crf-users@...>
>
> Dear Dr. Sarawagi,
>
> I am working in the Information Extraction research group of Thomson
> Corp. and recently got a chance to use the CRF package that you have
> created. Thanks for the excellent implementation; it is a very useful
> module, and personally I find it more comfortable than any other
> available implementation.
>
> However, I have a few very basic doubts about the code, especially
> about the usage and values of the weight vector (lambda) during the
> training procedure. I would be grateful if you could clarify them.
>
> Firstly, I started working with the same data corpus (address
> sequences) used in your sample example. From my understanding of the
> original McCallum paper, I thought the lambda (weight) vector would
> have the same length as the number of feature functions. I generated 4
> state feature functions (based on the address data) and 3 transition
> (emission) feature functions, so the weight vector has length 7 in my
> case. Whenever I find any feature in the training data (that is, if
> the data passes a particular feature function, which is a boolean
> function), I update the lambda at that feature function's index.
> Whereas in the Java CRF implementation, the weight vector has the
> length of the total number of possible features (I think it is 220).
> This is a little confusing to me.
>
> Secondly, I noticed that the generated features are composed of some
> functions (all caps, alphanumeric property, the "word", etc.). Based
> on the "feature index" into the weight vector, you apply the Viterbi
> algorithm. My doubt is: while computing the index into the weight
> vector during evaluation, how do you match it against the index of the
> trained weight vector? In your training implementation every word is a
> feature; if some unseen word (which is very likely) occurs during
> testing, do you index this word as an "unseen feature"?
>
> Lastly, many words can be represented by various features (the "word",
> alphanumeric or not, starts with caps, etc.). While finding the index
> of the weight vector to match during evaluation, how do you select
> which feature index to use (from the weight vector)? Do you give more
> weight to any particular feature (say, if the word matches the
> training data, it has higher priority) than the others?
>
> Thanks in advance,
>
> Regards,
> Debanjan
```
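The RARE_THRESHOLD behaviour described in point 4 can be sketched as follows. This is an illustration of the idea only, not the actual WordFeatures/UnknownFeatures implementation; the function names and the tuple encoding of features are invented for this example.

```python
# Sketch (hypothetical; not the iitb CRF package's WordFeatures /
# UnknownFeatures classes): words below a frequency threshold in the
# training data fall back to a single catch-all unknown-word feature.
from collections import Counter

RARE_THRESHOLD = 2  # a word seen fewer than this many times is "rare"

def build_vocab(training_words):
    """Count word frequencies over the training data."""
    return Counter(training_words)

def fire_word_feature(word, vocab):
    """Return the word feature to fire for `word`: the word itself if it
    is frequent enough, otherwise the unknown-word feature. An unseen
    test word has frequency 0, so it always falls below the threshold."""
    if vocab[word] >= RARE_THRESHOLD:
        return ("WORD", word)
    return ("UNKNOWN",)

vocab = build_vocab(["street", "street", "city", "street", "city"])
print(fire_word_feature("street", vocab))  # ('WORD', 'street')
print(fire_word_feature("Mumbai", vocab))  # ('UNKNOWN',)
```

This also answers the "unseen feature" question in the quoted digest: a word that appears only at test time never gets a word feature of its own, so its weight-vector index is simply that of the shared unknown-word feature.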
