From: Pascal B. <psb...@gm...> - 2013-02-28 13:20:14
Hi Jan,
Unfortunately I can't send my dataset. The problem I had first was that
ARFF uses 0-based indices for attributes in a sparse dataset, but Clus
expects 1-based. The second problem I had was that my labels were not being
read properly. I was using categorical labels with possible values in {0,
1} and only the 1 values were actually explicitly written in the ARFF file.
For some reason, Clus was detecting all my labels as having a value of 1.
The only way I was able to resolve the issues was to not use a sparse
dataset.
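
To make the two problems concrete, here is a minimal sketch (not Clus or Weka code) of how a sparse row is meant to be read: each entry is an "index value" pair, the index is 0-based, and any attribute that is not listed keeps a default value. The class name SparseArffRow and the method toDense are hypothetical, and the default of "0" for omitted entries is an assumption that matches the {0, 1} labels described above.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Weka or Clus.
class SparseArffRow {

    /**
     * Expands one sparse ARFF data row such as "{1 1, 3 1}" into a dense,
     * comma-separated row. Indices are read as 0-based (the ARFF convention);
     * attributes that are not listed get defaultValue (an assumption here).
     */
    static String toDense(String sparseRow, int numAttributes, String defaultValue) {
        String[] values = new String[numAttributes];
        Arrays.fill(values, defaultValue);

        // Strip the surrounding braces.
        String body = sparseRow.trim();
        body = body.substring(1, body.length() - 1).trim();

        if (!body.isEmpty()) {
            for (String pair : body.split(",")) {
                // Each pair is "<index> <value>".
                String[] parts = pair.trim().split("\\s+", 2);
                int index = Integer.parseInt(parts[0]);
                values[index] = parts[1];
            }
        }
        return String.join(",", values);
    }

    public static void main(String[] args) {
        // Attributes 1 and 3 are set to 1; the other three default to 0.
        System.out.println(toDense("{1 1, 3 1}", 5, "0")); // prints 0,1,0,1,0
    }
}
```

A reader that treats the same indices as 1-based would place every value one attribute too early, and a reader that ignores the default would lose the implicit 0 labels.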
Regards,
Pascal
On 28 February 2013 14:26, Jan Struyf <jan...@st...> wrote:
> Dear Pascal,
>
> I'm not sure if I follow your explanation below. Can you send (part of) a
> data set that does not work with Clus? Then I'll try to fix this bug and
> make a new version available.
>
> Best Regards,
>
> Jan
>
>
> On Feb 27, 2013 13:35 "Pascal Brandt" <psb...@gm...> wrote:
>
> Hi Bernard,
>
> I wrote the script below [code segment 1] to convert my ARFF file from
> 0-based to 1-based, but I was still having problems because my label
> attributes were not being read properly. I have 11 categorical labels with
> the possible values {0, 1}. Weka generates a sparse ARFF file where only
> the 1 values are actually written to file.
>
> The only way I could get Clus to work properly was to convert my dataset
> to non-sparse before exporting my ARFF file [code segment 2].
>
> I hope this helps someone else out in the future.
>
> Ciao,
> Pascal
>
> p.s. Is there any documentation regarding the information that gets dumped
> to the console when generating trees? It seems only the contents of the
> output file are (somewhat) documented?
>
> [code segment 1]
> #!/bin/bash
> # Shift the attribute indices of a sparse ARFF file from 0-based to 1-based.
>
> gawk '
> BEGIN { FS = ","; found_data = 0 }
> {
>     if (!found_data) {
>         # Header section: copy through unchanged up to and including @data.
>         print $0
>         if (tolower($1) == "@data")
>             found_data = 1
>     } else {
>         for (i = 1; i <= NF; i++) {
>             new_str = $i
>             # The first number in each "index value" pair is the attribute index.
>             if (match($i, /[0-9]+/)) {
>                 idx = substr($i, RSTART, RLENGTH) + 1
>                 new_str = substr($i, 1, RSTART - 1) idx substr($i, RSTART + RLENGTH)
>             }
>             printf "%s%s", new_str, (i == NF ? "\n" : ",")
>         }
>     }
> }' "$1"
>
> [code segment 2]
> import weka.core.Instances;
> import weka.filters.Filter;
> import weka.filters.unsupervised.instance.SparseToNonSparse;
>
> Instances newData = null;
>
> try {
>     SparseToNonSparse stns = new SparseToNonSparse(); // new instance of filter
>     stns.setInputFormat(trainingData);                // inform filter about dataset
>     newData = Filter.useFilter(trainingData, stns);   // apply filter
> } catch (Exception e) {
>     logger.info("Error converting from sparse to non-sparse: " + e.getMessage());
> }
>
>
> On 27 February 2013 14:25, Bernard Zenko <ber...@ij...> wrote:
>
>> Dear Pascal,
>>
>> many thanks for this bug report! At the moment, we're not using any bug
>> tracking system, so the clus-devel mailing list is the right place to
>> report bugs.
>>
>> Regards, Bernard
>>
>>
>>
>> On 26.2.13 12:12, Pascal Brandt wrote:
>>
>>> Hi,
>>>
>>> Firstly, my apologies if I'm directing this email to the wrong audience.
>>> I've just tried to use Clus with a sparse ARFF file and have seen that
>>> it uses a 1-based indexing system for the attributes as opposed to the
>>> 0-based system defined here
>>> <http://weka.wikispaces.com/ARFF+%28book+version%29>.
>>> If there's an issue/bug tracking system used to manage development of
>>> this project I'd be happy to log a bug for this.
>>>
>>> Regards,
>>> Pascal
>>>
>>>
>>> _______________________________________________
>>> Clus-devel mailing list
>>> Clu...@li...
>>> https://lists.sourceforge.net/lists/listinfo/clus-devel
>>>
>>>
>