I have a question regarding the formatting of a sparse matrix file and the labels:
What should the format be?
The input features I use are n-grams derived from some strings. So far I have tried the following formats:
1. A binary (0/1) representation of the n-grams (each column denotes an n-gram and each row a data instance), without any extra information.
2. A sparse representation using unique integers (each integer indicates the presence of a particular n-gram), without any extra information.
In both cases the labels are formatted as ARFF (with just the label attribute).
In both cases I get a "Not an obj" error.
Should the sparse matrix file include some extra information (e.g., be formatted as ARFF)? Or should the labels file include the extra information?
Jorn
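The question above describes two ways of encoding n-gram features. As a rough illustration only, the Python sketch below builds both from a list of strings. It assumes character n-grams with n = 3 (the post does not say whether the n-grams are over characters or words), and the function names `char_ngrams`, `build_vocabulary`, `to_binary_rows` and `to_index_rows` are hypothetical, not part of Waffles.

```python
from typing import Dict, List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Return the overlapping character n-grams of `text`, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocabulary(texts: List[str], n: int = 3) -> Dict[str, int]:
    """Assign each distinct n-gram a column index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for gram in char_ngrams(text, n):
            if gram not in vocab:
                vocab[gram] = len(vocab)
    return vocab


def to_binary_rows(texts: List[str], vocab: Dict[str, int], n: int = 3) -> List[List[int]]:
    """Representation 1: one dense 0/1 row per string, one column per n-gram."""
    rows = []
    for text in texts:
        row = [0] * len(vocab)
        for gram in char_ngrams(text, n):
            row[vocab[gram]] = 1
        rows.append(row)
    return rows


def to_index_rows(texts: List[str], vocab: Dict[str, int], n: int = 3) -> List[List[int]]:
    """Representation 2: for each string, the sorted indices of the n-grams it contains."""
    return [sorted({vocab[g] for g in char_ngrams(text, n)}) for text in texts]


texts = ["banana", "bandana"]  # hypothetical input strings
vocab = build_vocabulary(texts)
print(to_binary_rows(texts, vocab))  # [[1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 1, 1]]
print(to_index_rows(texts, vocab))   # [[0, 1, 2], [0, 1, 3, 4, 5]]
```

Neither output carries any header or metadata, which matches the "without any extra information" description in the question.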
Currently, we use a JSON-based format for sparse matrices.
For example, this 4x3 matrix
would be encoded in sparse format as
In each row, every column index is followed by the value stored at that column. Most of the
tools in the waffles_sparse application expect the input features
to be provided in this JSON-based sparse format, and the output
labels to be provided in ARFF format (as a dense matrix).
I have not really done much work with sparse matrices, so our tools
and documentation in this area are in need of much improvement.
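The reply describes sparse rows as column indexes each followed by a value, with the labels kept in a dense ARFF file. The Python sketch below shows one way to produce both. The exact JSON layout that waffles_sparse reads is not reproduced in this thread, so the flat `[index, value, index, value, ...]` array per row used here is an assumption, as are the function names `to_sparse_rows` and `labels_to_arff` and the ARFF relation/attribute names; check against the Waffles documentation before feeding the output to the tools.

```python
import json
from typing import List, Sequence


def to_sparse_rows(dense: Sequence[Sequence[float]]) -> List[list]:
    """For each row, list every nonzero element as a column index followed by its value.

    ASSUMPTION: this flat [col, val, col, val, ...] layout illustrates the idea
    described above; it is not guaranteed to match the JSON waffles_sparse expects.
    """
    sparse = []
    for row in dense:
        entries: list = []
        for col, value in enumerate(row):
            if value != 0:  # zeros are simply omitted
                entries.extend([col, value])
        sparse.append(entries)
    return sparse


def labels_to_arff(labels: Sequence[str], relation: str = "labels", attribute: str = "class") -> str:
    """Write the labels as a dense ARFF matrix with a single nominal attribute.

    The relation and attribute names are hypothetical placeholders. Labels are
    assumed to be simple tokens (no spaces or commas), so no quoting is done.
    """
    classes = sorted(set(labels))
    lines = [
        f"@RELATION {relation}",
        f"@ATTRIBUTE {attribute} {{{','.join(classes)}}}",
        "@DATA",
    ]
    lines.extend(labels)
    return "\n".join(lines) + "\n"


# Arbitrary example values, not the matrix from the original reply.
dense = [[1, 0, 0, 0], [0, 0, 2.5, 0], [0, 7, 0, 3]]
print(json.dumps(to_sparse_rows(dense)))  # [[0, 1], [2, 2.5], [1, 7, 3, 3]]
print(labels_to_arff(["spam", "ham", "spam"]))
```

A dense 0/1 n-gram matrix (the first representation in the question) can be passed straight to `to_sparse_rows`; every stored value is then 1. The features and the labels end up in two separate files, one JSON and one ARFF, with the same number of rows in the same order.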
Anonymous - 2012-12-29
Hi,
Thanks for the quick answer! Now I know how to rewrite my data...
Regards,
Jorn