I will take an example to explain the difference between feature label and
Suppose you have 4 labels (numbered from 0), NN, J, VB and DT, for POS
tagging. A sequence can start with only certain labels, and you
want to encode this fact as a feature. (Let's call it
There will be four features in all for this feature type (one for each label).
While training this feature type, you determine which labels occur at
pos = 0 by looking at the actual labels in the training data.
Suppose it turns out that only labels 0 and 3 (i.e. NN and DT) occur at the
starting position.
Now, while firing features in startScanFeaturesAt(DataSequence dataSeq, int
prev, int pos):
If pos != 0, you do not fire any features and you return false.
Implicitly, that means all the feature values for pos != 0 are 0.
For pos = 0, you fire two features, one for label = 0 and one for
label = 3, each with feature value = 1.
For the rest of the labels no feature is fired, which implicitly means that
their feature value is 0. You can achieve the same effect by firing all four
features with these values:
i) yend = 0, val = 1, ystart = -1, id = 0
ii) yend = 1, val = 0, ystart = -1, id = 1
iii) yend = 2, val = 0, ystart = -1, id = 2
iv) yend = 3, val = 1, ystart = -1, id = 3
To clarify, the actual labels of the data are seen only during
training. (In a FeatureType, the actual labels are seen only in the
train() method, not while firing features.)
When you fire features, you fire them for every label for which
the feature holds. If you search the previous postings on this
mailing list, you will find some explanation of this.
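
The walk-through above can be sketched in a few lines of standalone Java. This is only an illustration under stated assumptions, not the package's real classes: StartFeatureSketch, Fired, and the simplified startScanFeaturesAt(int pos) signature are hypothetical, and training data is passed as plain arrays of label ids instead of DataSequence objects.

```java
// Hypothetical stand-in for a "start label" feature type.
// It mimics the train / startScanFeaturesAt / hasNext / next pattern only.
class StartFeatureSketch {
    /** One fired feature: yend = label at pos, ystart = -1 (no previous label). */
    static final class Fired {
        final int yend, ystart, id;
        final double val;
        Fired(int yend, int ystart, double val, int id) {
            this.yend = yend; this.ystart = ystart; this.val = val; this.id = id;
        }
    }

    private final boolean[] seenAtStart; // true if the label began a training sequence
    private int nextLabel;               // next label to fire during a scan

    StartFeatureSketch(int numLabels) {
        seenAtStart = new boolean[numLabels];
        nextLabel = numLabels; // nothing to fire before a scan is started
    }

    /** Training: the only place where the actual labels are looked at. */
    void train(int[][] labelSequences) {
        for (int[] seq : labelSequences) {
            if (seq.length > 0) seenAtStart[seq[0]] = true;
        }
    }

    /** Starts a scan at pos; returns false when no feature fires there. */
    boolean startScanFeaturesAt(int pos) {
        nextLabel = (pos == 0) ? 0 : seenAtStart.length;
        skipUnseen();
        return hasNext();
    }

    boolean hasNext() { return nextLabel < seenAtStart.length; }

    /** Returns the next fired feature; one feature id per label. */
    Fired next() {
        Fired f = new Fired(nextLabel, -1, 1.0, nextLabel);
        nextLabel++;
        skipUnseen();
        return f;
    }

    // Labels never seen at the start are skipped, so their value is implicitly 0.
    private void skipUnseen() {
        while (nextLabel < seenAtStart.length && !seenAtStart[nextLabel]) nextLabel++;
    }

    public static void main(String[] args) {
        StartFeatureSketch f = new StartFeatureSketch(4); // NN, J, VB, DT
        f.train(new int[][] {{0, 2, 1}, {3, 0, 2}});      // sequences start with NN and DT
        if (f.startScanFeaturesAt(0)) {
            while (f.hasNext()) {
                Fired x = f.next();
                System.out.println("yend=" + x.yend + " val=" + x.val
                        + " ystart=" + x.ystart + " id=" + x.id);
            }
        }
    }
}
```

Note that firing never consults the true label at pos: the scan simply enumerates every label the feature holds for, which here is every label recorded in train().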