Sparse representation format

Anonymous
2012-12-29
2012-12-30

  • Anonymous
    2012-12-29

    Hi,

    I have a question regarding the formatting of a sparse matrix file and the labels:
    What should it be?
    The input features I use are ngrams derived from some strings. So far I tried the following formats:
    1. A binary representation (0/1) of the ngrams, where each column denotes an ngram and each row a data instance, without any extra information.
    2. A sparse representation of unique integers (where each integer indicates the presence of a particular ngram), without any extra information.
    In both cases the labels are formatted according to the ARFF format (with just the label attribute).
    In both cases I get "Not an obj" error.

    Should the sparse matrix file include some extra information (e.g. formatted according to arff)? Or should the label file include the extra information?

    Jorn

     
  • Mike Gashler
    2012-12-29

    Currently, we use a JSON-based format for sparse matrices.
    For example, this 4x3 matrix

    1.1  0  2.2
     0  3.3  0
     0   0   0
     0   0   7
    

    would be encoded in sparse format as

    {
     "def":0,
     "cols":3,
     "rows":
     [
      [0,1.1,2,2.2],
      [1,3.3],
      [],
      [2,7]
     ]
    }
    

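    Each row is a flat list of pairs: a column index followed by the
    value in that column. Entries that are not listed take the default
    value given by "def" (0 here), which is why the all-zero third row
    is just []. Most of the tools in the waffles_sparse application
    expect the input features to be provided in this JSON-based sparse
    format, and the output labels to be provided in ARFF format (as a
    dense matrix).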
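
    For anyone generating these files by script, here is a minimal Python sketch of the encoding. It assumes, based on the example above, that entries omitted from a row take the "def" value, and it writes column indices in increasing order (whether the loader requires that ordering is an assumption). The helper names dense_to_sparse_json and sparse_json_to_dense are hypothetical and are not part of Waffles.

```python
import json

def dense_to_sparse_json(matrix, default=0):
    """Encode a dense matrix (list of equal-length rows) in the JSON sparse layout.

    Each row becomes a flat list of alternating column-index/value pairs,
    listing only entries that differ from `default` (hypothetical helper).
    """
    cols = len(matrix[0]) if matrix else 0
    rows = []
    for row in matrix:
        if len(row) != cols:
            raise ValueError("all rows must have the same number of columns")
        pairs = []
        for col, value in enumerate(row):
            if value != default:
                pairs.extend([col, value])  # column index, then its value
        rows.append(pairs)
    return {"def": default, "cols": cols, "rows": rows}

def sparse_json_to_dense(doc):
    """Inverse of dense_to_sparse_json: fill unlisted entries with doc["def"]."""
    dense = []
    for pairs in doc["rows"]:
        row = [doc["def"]] * doc["cols"]
        for i in range(0, len(pairs), 2):
            row[pairs[i]] = pairs[i + 1]
        dense.append(row)
    return dense

# The 4x3 example matrix from the post.
example = [
    [1.1, 0, 2.2],
    [0, 3.3, 0],
    [0, 0, 0],
    [0, 0, 7],
]
doc = dense_to_sparse_json(example)
print(json.dumps(doc))
# prints {"def": 0, "cols": 3, "rows": [[0, 1.1, 2, 2.2], [1, 3.3], [], [2, 7]]}
```

    The decoder is included only to check the round trip; json.dumps adds spaces after separators, which assumes the loader accepts any valid JSON.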

    I have not really done much work with sparse matrices, so our tools
    and documentation in this area are in need of much improvement.
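
    Applied to the setup in the question (binary ngram-presence features plus a label-only ARFF file), a rough sketch might look like the following. Character trigrams, the relation and attribute names ("labels", "class"), and all helper names are assumptions for illustration, and labels are assumed to be simple tokens that need no ARFF quoting.

```python
import json

def ngrams(text, n=3):
    """Return the character n-grams of a string (hypothetical feature choice)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def build_vocabulary(texts, n=3):
    """Map each distinct n-gram to a column index, in first-seen order."""
    vocab = {}
    for text in texts:
        for gram in ngrams(text, n):
            vocab.setdefault(gram, len(vocab))
    return vocab

def features_to_sparse_json(texts, vocab, n=3):
    """Binary presence features: for each n-gram present, emit its column, then 1."""
    rows = []
    for text in texts:
        present = sorted({vocab[g] for g in ngrams(text, n) if g in vocab})
        pairs = []
        for col in present:
            pairs.extend([col, 1])
        rows.append(pairs)
    # Absent n-grams are not listed; they take the default value 0.
    return {"def": 0, "cols": len(vocab), "rows": rows}

def labels_to_arff(labels, relation="labels", attribute="class"):
    """Write the labels as a dense ARFF file with a single nominal attribute.

    Assumes label strings contain no spaces, commas, or quotes.
    """
    values = sorted(set(labels))
    lines = ["@RELATION " + relation,
             "@ATTRIBUTE " + attribute + " {" + ",".join(values) + "}",
             "@DATA"]
    lines.extend(labels)
    return "\n".join(lines) + "\n"

texts = ["abcd", "bcde"]
labels = ["pos", "neg"]
vocab = build_vocabulary(texts)
print(json.dumps(features_to_sparse_json(texts, vocab)))
print(labels_to_arff(labels), end="")
```

    Each present ngram is written as its column index followed by 1, and absent ngrams fall back to the default 0, which carries the same information as the binary representation in the question. A bare list of ngram indices (the second format tried) would need a value after each index to fit this layout.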

     

  • Anonymous
    2012-12-29

    Hi,

    Thanks for the quick answer! Now I know how to rewrite my data...

    Regards,
    Jorn

     

