From: Pascal B. <psb...@gm...> - 2013-02-27 12:36:48
Hi Bernard,
I wrote the script below [code segment 1] to convert the attribute indices
in my sparse ARFF file from 0-based to 1-based, but I still had problems
because my label attributes were not being read properly. I have 11
categorical labels with the possible values {0, 1}, and Weka generates a
sparse ARFF file in which only the 1 values are actually written.
The only way I could get Clus to work properly was to convert my dataset to
non-sparse before exporting my ARFF file [code segment 2].
I hope this helps someone else out in the future.
Ciao,
Pascal
p.s. Is there any documentation regarding the information that gets dumped
to the console when generating trees? It seems only the contents of the
output file are (somewhat) documented?
[code segment 1]
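#!/bin/bash
# Shift the attribute indices of a sparse ARFF file from 0-based to 1-based.
# Usage: ./script.sh input.arff > output.arff
gawk '
BEGIN { FS = ","; found_data = 0 }
{
    if (!found_data) {
        # Header section: copy through unchanged up to the @data line.
        print $0
        if (tolower($1) == "@data")   # ARFF keywords are case-insensitive
            found_data = 1
    } else {
        for (i = 1; i <= NF; i++) {
            field = $i
            # The first integer in each field is the attribute index.
            # match() also handles the first field, which starts with "{".
            if (match(field, /[0-9]+/)) {
                idx = substr(field, RSTART, RLENGTH) + 1
                field = substr(field, 1, RSTART - 1) idx substr(field, RSTART + RLENGTH)
            }
            # Use an explicit format so "%" in the data is printed literally.
            printf "%s%s", field, (i == NF ? "\n" : ",")
        }
    }
}' "$1"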
[code segment 2]
// Needs weka.core.Instances, weka.filters.Filter and
// weka.filters.unsupervised.instance.SparseToNonSparse
Instances newData = null;
try {
    SparseToNonSparse stns = new SparseToNonSparse();  // new instance of filter
    stns.setInputFormat(trainingData);                 // inform filter about dataset
    newData = Filter.useFilter(trainingData, stns);    // apply filter
} catch (Exception e) {
    logger.info("Error converting from sparse to non-sparse: " + e.getMessage());
}
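To illustrate why the index base matters, here is a minimal plain-Java sketch of how one sparse data row expands into a dense one. It is only an illustration and not how Weka or Clus implement it: the class SparseRowSketch and the method toDense are hypothetical names, and the sketch assumes a well-formed row in braces with unquoted values that contain no commas or spaces.

```java
import java.util.Arrays;

public class SparseRowSketch {

    /**
     * Expands one sparse ARFF data row such as "{1 1, 3 1}" into a dense
     * comma-separated row. Entries missing from the sparse row are filled
     * with defaultValue. indexBase is 0 or 1, depending on how the reader
     * interprets the attribute indices.
     */
    static String toDense(String sparseRow, int numAttributes,
                          int indexBase, String defaultValue) {
        String[] dense = new String[numAttributes];
        Arrays.fill(dense, defaultValue);

        String body = sparseRow.trim();
        // Strip the surrounding braces (assumed to be present).
        body = body.substring(1, body.length() - 1).trim();
        if (!body.isEmpty()) {
            for (String entry : body.split(",")) {
                // Each entry has the form "<index> <value>".
                String[] parts = entry.trim().split("\\s+", 2);
                int index = Integer.parseInt(parts[0]) - indexBase;
                dense[index] = parts[1];
            }
        }
        return String.join(",", dense);
    }

    public static void main(String[] args) {
        // The same sparse row, read with 0-based and with 1-based indices.
        System.out.println(toDense("{1 1, 3 1}", 4, 0, "0")); // 0,1,0,1
        System.out.println(toDense("{1 1, 3 1}", 4, 1, "0")); // 1,0,1,0
    }
}
```

The two calls in main show that a reader assuming the wrong base shifts every value by one attribute, which is the mismatch described in the original report.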
On 27 February 2013 14:25, Bernard Zenko <ber...@ij...> wrote:
> Dear Pascal,
>
> many thanks for this bug report! At the moment, we're not using any bug
> tracking system, so the clus-devel mailing list is the right address to
> report bugs.
>
> Regards, Bernard
>
>
>
> On 26.2.13 12:12, Pascal Brandt wrote:
>
>> Hi,
>>
>> Firstly, my apologies if I'm directing this email to the wrong audience.
>> I've just tried to use Clus with a sparse ARFF file and have seen that
>> it uses a 1-based indexing system for the attributes as opposed to the
>> 0-based system defined here
>> <http://weka.wikispaces.com/ARFF+%28book+version%29>.
>> If there's an issue/bug tracking system used to manage development of
>> this project I'd be happy to log a bug for this.
>>
>> Regards,
>> Pascal