Re: [Clus-general] [Clus-devel] Clus sparse ARFF file implementation

Hi Jan,

Unfortunately I can't send my dataset. The problem I had first was that
ARFF uses 0-based indices for attributes in a sparse dataset, but Clus
expects 1-based. The second problem I had was that my labels were not being
read properly. I was using categorical labels with possible values in {0,
1} and only the 1 values were actually explicitly written in the ARFF file.
For some reason, Clus was detecting all my labels as having a value of 1.

The only way I was able to resolve the issues was to not use a sparse
dataset.
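
For anyone hitting the same problem, here is a minimal sketch of what I understand the sparse format to mean. Indices are 0-based, and any attribute not listed in a row takes the default value: 0 for numeric attributes and, as far as I know, the first declared value for nominal ones (which is "0" for labels declared as {0, 1}). The class and method names are made up for illustration, and the parser is deliberately simplistic: it assumes no quoted values containing commas.

```java
import java.util.Arrays;

public class SparseRowDemo {

    /**
     * Expands one sparse ARFF data row such as "{1 1, 3 1}" into a dense
     * array of values. Indices are read as 0-based; every attribute that is
     * not listed in the row is set to defaultValue.
     */
    public static String[] toDense(String sparseRow, int numAttributes, String defaultValue) {
        String[] dense = new String[numAttributes];
        Arrays.fill(dense, defaultValue);

        // Strip the surrounding braces.
        String body = sparseRow.trim();
        body = body.substring(1, body.length() - 1).trim();
        if (body.isEmpty()) {
            return dense; // "{}" means every attribute has the default value
        }

        // Each entry is "<index> <value>"; entries are separated by commas.
        for (String entry : body.split(",")) {
            String[] parts = entry.trim().split("\\s+", 2);
            int index = Integer.parseInt(parts[0]);
            dense[index] = parts[1].trim();
        }
        return dense;
    }

    public static void main(String[] args) {
        // Five attributes; only indices 1 and 3 are written explicitly.
        System.out.println(String.join(",", toDense("{1 1, 3 1}", 5, "0")));
        // prints 0,1,0,1,0
    }
}
```

A reader that treats the same indices as 1-based would place those two values one attribute too early, which matches the first problem I described.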

Regards,
Pascal

On 28 February 2013 14:26, Jan Struyf <jan...@st...> wrote:

> Dear Pascal,
>
> I'm not sure if I follow your explanation below. Can you send (part of) a
> data set that does not work with Clus? Then I'll try to fix this bug and
> make a new version available.
>
> Best Regards,
>
> Jan
>
>
> On Feb 27, 2013 13:35 "Pascal Brandt" <psb...@gm...> wrote:
>
> Hi Bernard,
>
> I wrote the script below [code segment 1] to convert my ARFF file from
> 0-based to 1-based, but I was still having problems because my label
> attributes were not being read properly. I have 11 categorical labels with
> the possible values {0, 1}. Weka generates a sparse ARFF file where only
> the 1 values are actually written to file.
>
> The only way I could get Clus to work properly was to convert my dataset
> to non-sparse before exporting my ARFF file [code segment 2].
>
> I hope this helps someone else out in the future.
>
> Ciao,
> Pascal
>
> p.s. Is there any documentation regarding the information that gets dumped
> to the console when generating trees? It seems only the contents of the
> output file are (somewhat) documented?
>
> [code segment 1]
> #!/bin/bash
> # Shift the attribute indices of a sparse ARFF file from 0-based to 1-based.
> # Usage: <this script> input.arff > output.arff
>
> gawk '
> BEGIN { FS = ","; found_data = 0 }
> {
>     if (!found_data) {
>         # Header section: copy unchanged up to and including @data.
>         print $0
>         if (tolower($1) == "@data")
>             found_data = 1
>     } else {
>         for (i = 1; i <= NF; i++) {
>             field = $i
>             # The first number in each field is the attribute index.
>             if (match(field, /[0-9]+/)) {
>                 idx = substr(field, RSTART, RLENGTH) + 1
>                 field = substr(field, 1, RSTART - 1) idx substr(field, RSTART + RLENGTH)
>             }
>             printf "%s%s", field, (i == NF ? "\n" : ",")
>         }
>     }
> }' "$1"
>
> [code segment 2]
> import weka.core.Instances;
> import weka.filters.Filter;
> import weka.filters.unsupervised.instance.SparseToNonSparse;
>
> Instances newData = null;
>
> try {
>     // New instance of the filter.
>     SparseToNonSparse stns = new SparseToNonSparse();
>     // Inform the filter about the dataset.
>     stns.setInputFormat(trainingData);
>     // Apply the filter.
>     newData = Filter.useFilter(trainingData, stns);
> } catch (Exception e) {
>     logger.info("Error converting from sparse to non-sparse: " + e.getMessage());
> }
>
>
> On 27 February 2013 14:25, Bernard Zenko <ber...@ij...> wrote:
>
>> Dear Pascal,
>>
>> many thanks for this bug report! At the moment, we're not using any bug
>> tracking system, so the clus-devel mailing list is the right place to
>> report bugs.
>>
>> Regards, Bernard
>>
>>
>>
>> On 26.2.13 12:12, Pascal Brandt wrote:
>>
>>> Hi,
>>>
>>> Firstly, my apologies if I'm directing this email to the wrong audience.
>>> I've just tried to use Clus with a sparse ARFF file and have seen that
>>> it uses a 1-based indexing system for the attributes as opposed to the
>>> 0-based system defined here
>>> <http://weka.wikispaces.com/ARFF+%28book+version%29>.
>>> If there's an issue/bug tracking system used to manage development of
>>> this project I'd be happy to log a bug for this.
>>>
>>> Regards,
>>> Pascal