I have a question regarding the formatting of the sparse matrix file and the labels:
What format should each of them be in?
The input features I use are ngrams derived from some strings. So far I have tried the following formats:
1. A binary representation (0,1) of the ngrams (where each column denotes an ngram and each row a data instance) without any extra information.
2. A sparse unique integer representation (where each integer indicates the presence of a particular ngram) without any extra information.
In both cases the labels are formatted according to the ARFF format (with just the label attribute).
In both cases I get a "Not an obj" error.
Should the sparse matrix file include some extra information (e.g. formatted according to arff)? Or should the label file include the extra information?
Currently, we use a JSON-based format for sparse matrices.
For example, this 4x3 matrix
1.1 0 2.2
0 3.3 0
0 0 0
0 0 7
would be encoded in sparse format as
Each row specifies a column-index followed by a value. Most of the
tools in the waffles_sparse application expect the input features
to be provided in this JSON-based sparse format, and the output
labels to be provided in ARFF format (as a dense matrix).
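The row-wise idea described above (each row listing column-index/value pairs for its non-zero entries) can be sketched in Python. This is only an illustration under assumptions: the function name to_sparse_rows is hypothetical, zero-based column indices are assumed, and the exact JSON layout that waffles_sparse expects (any wrapper object or field names) is not shown in this thread, so the output below should not be taken as a valid waffles file.

```python
import json


def to_sparse_rows(dense):
    """Hypothetical helper: for each row, keep [column_index, value] pairs
    for the non-zero entries only (zero-based column indices assumed)."""
    return [
        [[col, val] for col, val in enumerate(row) if val != 0]
        for row in dense
    ]


# The 4x3 example matrix from the post above.
dense = [
    [1.1, 0, 2.2],
    [0, 3.3, 0],
    [0, 0, 0],
    [0, 0, 7],
]

sparse_rows = to_sparse_rows(dense)

# Generic JSON dump of the pairs; NOT necessarily the waffles file layout.
print(json.dumps(sparse_rows))
```

An all-zero row becomes an empty list, which is what makes this representation compact for ngram features where most columns are absent in any given instance.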
I have not really done much work with sparse matrices, so our tools
and documentation in this area are in need of much improvement.
Thanks for the quick answer! Now I know how to rewrite my data...