Hi, is there some file format for entering sparse feature arrays? I have training data with 13K features, most of which are zeros.
Craig
Hi,
The default formats (such as the ones shown in the sample data) are designed to be sparse: zero-valued features are not explicitly represented. Any feature that is listed is considered to occur once, and a feature that occurs more than once in an event is simply listed as many times as it occurs. For instance, if I'm building a "how good was my day" classifier and I think the number of meetings I have is a good indicator, then one day might look like:
so-so: traffic meeting meeting coding no_traffic ice_cream
indicating that I had two meetings. Notice there is no representation of the "snowboarding" feature (which is highly correlated with good days), since it didn't occur on this day.
Thus, if you have event data with zero-valued features, simply remove those features when you construct your event space.
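As a rough illustration of that conversion, here is a minimal Python sketch that turns one dense row of counts into a sparse event line. The function name `to_sparse_event` and the feature list are hypothetical, and the "outcome: feature feature ..." layout is simply copied from the example line above, so check it against the sample data before relying on it.

```python
def to_sparse_event(outcome, feature_names, counts):
    """Build one event line, omitting zero-valued features.

    A feature with count n is written n times, following the convention
    described above. The "outcome: f1 f2 ..." layout is an assumption
    taken from the example line in this thread.
    """
    tokens = []
    for name, count in zip(feature_names, counts):
        if count < 0 or count != int(count):
            raise ValueError(f"{name}: count must be a non-negative integer")
        # A count of zero adds nothing, so the feature is dropped entirely.
        tokens.extend([name] * int(count))
    return f"{outcome}: " + " ".join(tokens)


# Hypothetical dense row: one count per feature, zeros included.
names = ["traffic", "meeting", "coding", "snowboarding", "ice_cream"]
dense = [1, 2, 1, 0, 1]

line = to_sparse_event("so-so", names, dense)
print(line)  # so-so: traffic meeting meeting coding ice_cream
```

With 13K features that are mostly zero, each line written this way holds only the handful of features that actually occurred. Real-valued features would need a different encoding, which this sketch does not attempt.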
Take a look at the samples, since they go through this in much more detail. Hope this helps... Tom