Sparse representation format

Help
Anonymous
2012-12-29
2012-12-30
  • Anonymous

    Anonymous - 2012-12-29

    Hi,

    I have a question about the sparse matrix file and the labels: how should they be formatted?
    The input features I use are n-grams derived from some strings. So far I have tried the following formats:
    1. A binary representation (0,1) of the n-grams (where each column denotes an n-gram and each row a data instance), without any extra information.
    2. A sparse unique-integer representation (where each integer indicates the presence of a particular n-gram), without any extra information.
    In both cases the labels are formatted according to the ARFF format (with just the label attribute),
    and in both cases I get a "Not an obj" error.

    Should the sparse matrix file include some extra information (e.g. formatted according to arff)? Or should the label file include the extra information?

    Jorn

     
  • Mike Gashler

    Mike Gashler - 2012-12-29

    Currently, we use a JSON-based format for sparse matrices.
    For example, this 4x3 matrix

    1.1  0  2.2
     0  3.3  0
     0   0   0
     0   0   7
    

    would be encoded in sparse format as

    {
     "def":0,
     "cols":3,
     "rows":
     [
      [0,1.1,2,2.2],
      [1,3.3],
      [],
      [2,7]
     ]
    }
    

    Each row is a flat list of column-index/value pairs, so [0,1.1,2,2.2]
    puts 1.1 in column 0 and 2.2 in column 2. Most of the tools in the
    waffles_sparse application expect the input features to be provided
    in this JSON-based sparse format, and the output labels to be
    provided in ARFF format (as a dense matrix).
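
As a rough illustration of the encoding described above, here is a minimal Python sketch that produces the same JSON layout from a dense matrix or from per-instance n-gram indices. It is not part of Waffles: the function names `dense_to_sparse_json` and `ngram_sets_to_sparse_json` are hypothetical, and it assumes that "def" is the value taken by every cell not listed in a row, which is inferred from the example rather than from documentation.

```python
import json


def dense_to_sparse_json(matrix, default=0):
    """Encode a dense, row-major matrix in the JSON sparse layout shown above.

    Assumption: "def" is the value of every cell that a row does not list.
    """
    cols = len(matrix[0]) if matrix else 0
    rows = []
    for row in matrix:
        entries = []
        for col_index, value in enumerate(row):
            if value != default:
                # Each stored cell becomes a column-index/value pair.
                entries.extend([col_index, value])
        rows.append(entries)
    return {"def": default, "cols": cols, "rows": rows}


def ngram_sets_to_sparse_json(instances, num_ngrams):
    """Encode presence features: one collection of n-gram indices per instance."""
    rows = []
    for ngram_ids in instances:
        entries = []
        for ngram_id in sorted(set(ngram_ids)):
            entries.extend([ngram_id, 1])  # 1 marks "n-gram present"
        rows.append(entries)
    return {"def": 0, "cols": num_ngrams, "rows": rows}


dense = [
    [1.1, 0, 2.2],
    [0, 3.3, 0],
    [0, 0, 0],
    [0, 0, 7],
]
encoded = dense_to_sparse_json(dense)
print(json.dumps(encoded))

# Three instances over a hypothetical vocabulary of 5 n-grams.
presence = ngram_sets_to_sparse_json([[0, 3], [], [4, 1]], num_ngrams=5)
print(json.dumps(presence))
```

The first call reproduces the 4x3 example; the second shows how a list of n-gram indices per data instance might map onto the same structure, with 1 stored for each n-gram that is present.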

    I have not really done much work with sparse matrices, so our tools
    and documentation in this area are in need of much improvement.

     
  • Anonymous

    Anonymous - 2012-12-29

    Hi,

    Thanks for the quick answer! Now I know how to rewrite my data...

    Regards,
    Jorn

     
