| 2013 | Jan | Feb (4) | Mar (1) | Apr (1) | May | Jun | Jul | Aug (2) | Sep | Oct (1) | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2014 | Jan (1) | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec (1) |
| 2015 | Jan (1) | Feb (1) | Mar (1) | Apr | May | Jun | Jul | Aug | Sep (1) | Oct (1) | Nov | Dec |
| 2016 | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug (1) | Sep | Oct (1) | Nov | Dec |
| 2018 | Jan | Feb | Mar | Apr (2) | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|
From: Luca A. <luc...@ut...> - 2018-04-23 23:37:14
|
Good evening,
I am a PhD student from Tartu University, Estonia. I started using CLUS a few days ago. The program works, but I would like to use a time-series dataset. I saw a time-series example in the manual, but my dataset is a little different. I tried several possibilities but I still encounter problems. Before asking more, I would like to understand whether my way of creating the ARFF file is correct. Here is a small facsimile of the structure of my dataset:
sasdate   RPI    INDPRO  UNRATE
1/1/1959  11234  456     4
2/1/1959  11456  567     3,6
3/1/1959  11768  678     3,2
.....     .....  .....   .......
I was able to create a simple ARFF file, and I tried to change the attribute type to time series, but I still have the problem. What do you think is the correct way to proceed?
Thank you very much for your help.
Best regards,
Luca Alfieri - PhD student - Tartu University |
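A minimal sketch of one preparatory step for a table like the one in this message, under stated assumptions: ARFF numeric values use a dot as decimal separator, so values such as "3,6" need to become "3.6" before the rows can be written as comma-separated `@DATA` lines. The class and method names (`ArffSketch`, `toArff`) are hypothetical, the first column is declared as `string` purely as an assumption, and the sketch does not cover how Clus expects a time-series attribute to be declared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArffSketch {
    /**
     * Turns whitespace-separated rows into plain ARFF text.
     * Assumption: the first column is kept as a string attribute,
     * every other column is numeric with a possible decimal comma.
     */
    static String toArff(String relation, String[] names, List<String> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@RELATION ").append(relation).append('\n');
        sb.append("@ATTRIBUTE ").append(names[0]).append(" string\n");
        for (int i = 1; i < names.length; i++) {
            sb.append("@ATTRIBUTE ").append(names[i]).append(" numeric\n");
        }
        sb.append("@DATA\n");
        for (String row : rows) {
            String[] cells = row.trim().split("\\s+");
            List<String> out = new ArrayList<>();
            out.add(cells[0]);                          // date column, unchanged
            for (int i = 1; i < cells.length; i++) {
                out.add(cells[i].replace(',', '.'));    // decimal comma -> decimal dot
            }
            sb.append(String.join(",", out)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = { "sasdate", "RPI", "INDPRO", "UNRATE" };
        List<String> rows = Arrays.asList(
            "1/1/1959 11234 456 4",
            "2/1/1959 11456 567 3,6",
            "3/1/1959 11768 678 3,2");
        System.out.print(toArff("macro", names, rows));
    }
}
```

Normalizing the separator first matters because a decimal comma would otherwise be read as a field separator and shift every following column.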
|
From: Carolin V. <a.c...@we...> - 2018-04-09 09:30:26
|
Hello, I have just downloaded Clus to use on my data. I have trouble running the software as I keep getting the error message that the arff file cannot be found (File not found: java.io.FileNotFoundException: 'weather.arff'). I have checked and the path is correct and there is no protection on the file. Would you have any suggestions what could be the issue? Thank you! Carolin (PhD student) |
|
From: Oladele I. Z. <zan...@gm...> - 2016-10-26 19:47:30
|
Dear,
I have downloaded the package and decompressed it on my desktop. After running the installation command in a terminal, I got the message below. Please help me.
MacBook-Pro-de-IBILOLA:~ IBILOLA$ java -jar$Volumes/Macintosh HD/Users/IBILOLA/Bureau/Clus/Clus.jar weather.s
Unrecognized option: -jar/Macintosh
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Best regards |
|
From: 寂. <944...@qq...> - 2016-08-07 10:14:22
|
To Whom It May Concern,
First of all, thank you all very much in advance. I have a multi-label classification task and I am trying to use Clus for it. The first problem is that the Clus manual doesn’t contain all features, and I don't know whether the output model is an RF-PCT model or a single PCT model with no ensemble. I did choose the RF model in the settings file, but the output model looks like a single PCT tree. Second, can I get more details about the metrics used to evaluate the output model? The only metric I see in the output file is ACCURACY. If I want other metrics such as SubsetAccuracy or Hamming Loss, what should I put in the settings file?
Thanks in advance!
Best regards,
Zhimeng Luo
Here are my settings and output model:
-------------------------------------------------------------------------------------------------------------
SETTINGS:
[General]
RandomSeed = 0
[Data]
File = IMDB-F1.arff
TestSet = 0.333
PruneSet = None
[Attributes]
Target = 1-28
Weights = 1
[Model]
MinimalWeight = 5
[Tree]
FTest = [0.001,0.01]
Heuristic = VarianceReduction
[Ensemble]
EnsembleMethod = RForest
VotingType = Majority
[Output]
WritePredictions = {Train,Test}
TrainErrors = Yes
AllFoldErrors = y
WriteCurves = Yes
[Hierarchical]
Type = TREE
WType = ExpAvgParentWeight
HSeparator = /
WParam = 0.75
-------------------------------------------------------------------------------------------------------------
MODEL:
Pruned Model
************
vg > 0.0
+--yes: [0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0] [548.0,715.0,759.0,637.0,774.0,662.0,763.0,674.0,776.0,444.0,776.0,772.0,673.0,776.0,659.0,776.0,776.0,701.0,506.0,776.0,774.0,769.0,696.0,775.0,558.0,772.0,757.0,776.0]: 776
+--no: [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] [76489.0,73927.0,71623.0,75068.0,77397.0,59131.0,77680.0,74596.0,79610.0,74957.0,79557.0,77055.0,72174.0,78986.0,76476.0,58173.0,79676.0,50653.0,72698.0,68521.0,77784.0,78120.0,74474.0,79437.0,76679.0,79528.0,78518.0,78292.0]: 79877
------------------------------------------------------------------------------------------------------------- |
|
From: Hendrik B. <hen...@cs...> - 2015-10-07 09:59:38
|
Dear Lea,
I'm not sure if Clus can produce this output. It is possible that the tree shown in the paper has been edited for readability. I can't say for sure though, it's too long ago. If someone has different information, please cc me on that.
Best regards,
Hendrik
On 29 Sep 2015, at 12:40, Lea Brohsonn <Lea...@ru...> wrote:
> Dear Clus group,
>
> unfortunately the clus manual doesn’t contain all features. I am trying
> to get an output that is similar to the one shown in Table 5 in the
> paper ‘Relational Ranking with Predictive Clustering Trees‘ written by
> Dzeroski, Todorovski and Blockeel. How do I get an output in form of a
> greater relation (eg. C50 < ripper < ltree < C50boost)?
>
> Thanks in advance for your help!
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Clus-general mailing list
> Clu...@li...
> https://lists.sourceforge.net/lists/listinfo/clus-general
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm |
|
From: Lea B. <Lea...@ru...> - 2015-09-29 10:40:58
|
Dear Clus group, unfortunately the clus manual doesn’t contain all features. I am trying to get an output that is similar to the one shown in Table 5 in the paper ‘Relational Ranking with Predictive Clustering Trees‘ written by Dzeroski, Todorovski and Blockeel. How do I get an output in form of a greater relation (eg. C50 < ripper < ltree < C50boost)? Thanks in advance for your help! |
|
From: Nair b. N. Y. <nai...@ya...> - 2015-03-09 14:35:27
|
Dear Clus group,
I am currently looking for one or more very large multi-label datasets, in terms of both attributes and labels. A dataset with thousands of attributes and thousands of labels would be sufficient. Do you know of any dataset with these characteristics?
Thanks in advance,
Noureddine Yassine NAIR BENREKIA
PhD student at Orange Labs, France |
|
From: Nair b. N. Y. <nai...@ya...> - 2015-02-13 10:54:25
|
Dear,
I have a multi-label regression problem and I want to use RF-PCT (random forest of clustering trees) for that. So far I have only tried RF-PCT on classification problems, using the settings file below and this command line: java -jar Clus.jar -forest data.s
What do I need to change or add to run a random forest of regression trees instead? Presumably the voting type, and what else?
Thanks in advance,
Noureddine Yassine NAIR BENREKIA
PhD student
-----------------------------------
[Data]
File = data_train.arff
TestSet = data_test.arff
[Attributes]
Target = 1-20
[Tree]
Heuristic = VarianceReduction
[Output]
AllFoldModels = No
AllFoldErrors = Yes
WritePredictions = Test
[Ensemble]
Iterations = 100
EnsembleMethod = RForest
VotingType = ProbabilityDistribution
SelectRandomSubspaces = 52
Optimize = Yes |
|
From: Massimo G. <mas...@gm...> - 2015-01-10 14:28:17
|
Dear Sir,
I'm writing to report a bug that occurs when the output file FILE_NAME.test.pred.arff is created: if the name of an attribute is too long, the name is concatenated with the type. Example:
Attribute name: Rules-Pruned-pSub_Status=Awaiting_Assignment_Sub_Status=In_Progress
In the ARFF file: @ATTRIBUTE Rules-Pruned-pSub_Status=Awaiting_Assignment_Sub_Status=In_Progressnumeric
With best regards,
Massimo Guarascio |
|
From: Nair b. N. Y. <nai...@ya...> - 2014-12-10 14:56:16
|
Hello,
I have a question about the training computational complexity of the random forest of clustering trees (RF-PCT) algorithm. In the papers, the authors mention that it is around O(100 x n x m_i x log(n)), where
1. m_i: number of features selected at each node, representing 10% of the original number of features
2. n: number of training examples
However, this seems to ignore the number of labels. If we imagine having to learn from a large dataset with hundreds or thousands of labels, how would you rewrite this training computational complexity? Is it O(100 x n x q x m_i x log(n)), where q is the number of labels?
Thanks!
NAIR BENREKIA Noureddine Yassine
PhD student at Orange Labs, Lannion (France) |
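To make the comparison in this question concrete, the two candidate cost expressions can be written side by side. This only restates the formulas from the message (with the constant 100 taken as given there); whether the label factor q really enters linearly is exactly the open question, so the ratio below holds only under that assumption.

```latex
\[
C_{\text{stated}} = 100 \cdot n \cdot m_i \cdot \log n ,
\qquad
C_{\text{proposed}} = 100 \cdot n \cdot q \cdot m_i \cdot \log n ,
\qquad
\frac{C_{\text{proposed}}}{C_{\text{stated}}} = q .
\]
```

Under the proposed form, a dataset with q = 1000 labels would cost about 1000 times as much to train on as the stated bound suggests, for the same n and m_i.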
|
From: Nair b. N. Y. <nai...@ya...> - 2014-01-10 16:36:28
|
Hi,
I am using the Clus library for some experiments. I have evaluated RF-PCT (Random Forest of Predictive Clustering Trees), but I am not able to interpret the obtained results without your help. I used the Music dataset. The settings file (Music.s) and the output file (Music.test.pred.arff) are shown below.
Thanks in advance,
Yacine
---------------------------------
Music.s :
[General]
RandomSeed = 1
[Data]
TestSet = Music_Test.arff
File = Music_Train.arff
XVal = 5
[Attributes]
Target = 1-6
[Model]
[Tree]
Heuristic = VarianceReduction
[Ensemble]
Iterations = 10
EnsembleMethod = RForest
VotingType = ProbabilityDistribution
[Output]
WritePredictions = {Test}
-------------------------------------
Music.test.pred.arff
@RELATION 'Music: -C 6-predictions'
@ATTRIBUTE amazed-suprised {1,0}
@ATTRIBUTE happy-pleased {1,0}
@ATTRIBUTE relaxing-clam {1,0}
@ATTRIBUTE quiet-still {1,0}
@ATTRIBUTE sad-lonely {1,0}
@ATTRIBUTE angry-aggresive {1,0}
@ATTRIBUTE Original-p-amazed-suprised {1,0}
@ATTRIBUTE Original-p-happy-pleased {1,0}
@ATTRIBUTE Original-p-relaxing-clam {1,0}
@ATTRIBUTE Original-p-quiet-still {1,0}
@ATTRIBUTE Original-p-sad-lonely {1,0}
@ATTRIBUTE Original-p-angry-aggresive {1,0}
@ATTRIBUTE Original-p-amazed-suprised-1 numeric
@ATTRIBUTE Original-p-amazed-suprised-0 numeric
@ATTRIBUTE Original-p-happy-pleased-1 numeric
@ATTRIBUTE Original-p-happy-pleased-0 numeric
@ATTRIBUTE Original-p-relaxing-clam-1 numeric
@ATTRIBUTE Original-p-relaxing-clam-0 numeric
@ATTRIBUTE Original-p-quiet-still-1 numeric
@ATTRIBUTE Original-p-quiet-still-0 numeric
@ATTRIBUTE Original-p-sad-lonely-1 numeric
@ATTRIBUTE Original-p-sad-lonely-0 numeric
@ATTRIBUTE Original-p-angry-aggresive-1 numeric
@ATTRIBUTE Original-p-angry-aggresive-0 numeric
@ATTRIBUTE Original-models string
@ATTRIBUTE Pruned-p-amazed-suprised {1,0}
@ATTRIBUTE Pruned-p-happy-pleased {1,0}
@ATTRIBUTE Pruned-p-relaxing-clam {1,0}
@ATTRIBUTE Pruned-p-quiet-still {1,0}
@ATTRIBUTE Pruned-p-sad-lonely {1,0}
@ATTRIBUTE Pruned-p-angry-aggresive {1,0}
@ATTRIBUTE Pruned-p-amazed-suprised-1 numeric
@ATTRIBUTE Pruned-p-amazed-suprised-0 numeric
@ATTRIBUTE Pruned-p-happy-pleased-1 numeric
@ATTRIBUTE Pruned-p-happy-pleased-0 numeric
@ATTRIBUTE Pruned-p-relaxing-clam-1 numeric
@ATTRIBUTE Pruned-p-relaxing-clam-0 numeric
@ATTRIBUTE Pruned-p-quiet-still-1 numeric
@ATTRIBUTE Pruned-p-quiet-still-0 numeric
@ATTRIBUTE Pruned-p-sad-lonely-1 numeric
@ATTRIBUTE Pruned-p-sad-lonely-0 numeric
@ATTRIBUTE Pruned-p-angry-aggresive-1 numeric
@ATTRIBUTE Pruned-p-angry-aggresive-0 numeric
@ATTRIBUTE Pruned-models string
@DATA
0,1,1,0,0,0,0,1,1,0,0,0,0.0,2.0,1.0,1.0,2.0,0.0,0.0,2.0,0.0,2.0,0.0,2.0,"135",0,1,1,0,0,0,2.0,26.0,19.0,9.0,25.0,3.0,2.0,26.0,1.0,27.0,1.0,27.0,"22"
1,0,0,0,0,1,0,1,0,0,1,1,0.0,2.0,1.0,1.0,0.0,2.0,0.0,2.0,1.0,1.0,1.0,1.0,"99",1,0,0,0,0,1,60.0,24.0,12.0,72.0,0.0,84.0,0.0,84.0,5.0,79.0,63.0,21.0,"16"
0,1,0,0,0,1,1,0,0,0,0,1,18.0,0.0,0.0,18.0,0.0,18.0,0.0,18.0,0.0,18.0,18.0,0.0,"89",1,0,0,0,0,1,60.0,24.0,12.0,72.0,0.0,84.0,0.0,84.0,5.0,79.0,63.0,21.0,"16"
0,0,1,0,0,0,0,1,1,0,0,0,0.0,13.0,12.0,1.0,12.0,1.0,0.0,13.0,0.0,13.0,0.0,13.0,"33",0,0,1,0,0,0,10.0,82.0,34.0,58.0,74.0,18.0,17.0,75.0,23.0,69.0,10.0,82.0,"5"
.
.
.
On Tuesday, 22 October 2013 at 15:18, Denny Verbeeck <Den...@cs...> wrote:
Dear Yacine,
Thank you for your interest in Clus! Clus is designed to be run from the command line, without the need to modify the source files. In general, you provide a settings file (e.g. settings.s), and then you can run Clus at the command line with the following command:
java -jar Clus.jar settings.s
Section 4 of the manual is dedicated to this settings file; there you will find the options for selecting a dataset, the target attributes, the ensemble method, etc. The other sections of the manual provide examples of how to run Clus and describe additional command-line parameters that may be of interest (e.g. to run ensembles you have to specify the -forest option as a command-line parameter and have an [Ensemble] section in the settings file).
Kind regards,
Denny Verbeeck
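For readers who still want to fill the `args` vector programmatically, as asked in the quoted question, the reply above suggests the vector only needs the `-forest` option followed by the settings file. A minimal sketch under that assumption; the class and method names (`ClusArgsSketch`, `forestArgs`, `commandLine`) are hypothetical, and nothing here calls into Clus itself.

```java
import java.util.Arrays;

public class ClusArgsSketch {
    /** Argument vector equivalent to: java -jar Clus.jar -forest <settingsFile> */
    static String[] forestArgs(String settingsFile) {
        return new String[] { "-forest", settingsFile };
    }

    /** Renders the equivalent command line, for logging or documentation. */
    static String commandLine(String jar, String[] args) {
        return "java -jar " + jar + " " + String.join(" ", args);
    }

    public static void main(String[] unused) {
        String[] args = forestArgs("settings.s");
        System.out.println(Arrays.toString(args));
        System.out.println(commandLine("Clus.jar", args));
    }
}
```

The choice of ensemble method itself stays in the `[Ensemble]` section of the settings file, so the argument vector does not change between methods.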
On Oct 17, 2013, at 18:26 , Nair benrekia Noureddine Yacine wrote:
>Hi Denny,
>
>I would like to evaluate a Random Forest of Predictive Clustering Trees (RF-PCT) and a Bagging of PCTs on some multi-label datasets, and for that I need your help, please.
>
>Below is part of the main class (Clus.java). The vector args holds the parameters required for running an algorithm. It is also possible to fill in the vector right afterwards (shown in green), but which entries of the vector are necessary to run RF-PCT and Bagging of PCT?
>
>Thanks so much,
>
>Yacine NAIR BENREKIA
>
>public static void main(String[] args) {
>try {
>args[0] = "";
>args[1] = "";
>args[2] = "";
> ...
>ClusOutput.printHeader();
>Clus clus = new Clus();
>Settings sett = clus.getSettings();
>CMDLineArgs cargs = new CMDLineArgs(clus);
>cargs.process(args);
>if (cargs.hasOption("copying")) {
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm for more information. |
|
From: Madeleine S. <mad...@in...> - 2013-10-21 11:28:37
|
Hi,
I tried to run Clus using ensembles instead of a single tree. Using random forests as ensemble method, I noticed that this setting does not seem to work in combination with pruning. The output only contains two models: the default model and the original model, but no pruned model. My settings file looks as follows:
[General]
Verbose = 0
[Attributes]
Descriptive = 1-242
Target = 243-282
Clustering = 243-282
[Tree]
Heuristic = VarianceReduction
PruningMethod = C4.5
[Ensemble]
Iterations = 10
EnsembleMethod = RForest
[Data]
File = train.arff
TestSet = test.arff
Did I miss anything? With regard to the single-tree case, are there pruning methods other than C4.5 that work in combination with classification?
Thank you and best regards,
Madeleine
--
Dipl.-Wirt.-Inf. Madeleine Seeland
Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Boltzmannstr. 3
85748 Garching b. München, Germany
Room: MI 01.09.040
Phone: +49 (89) 289-19443
Email: mad...@in...
Web: http://wwwkramer.in.tum.de/people/seeland |
|
From: Hendrik B. <hen...@cs...> - 2013-08-20 18:18:10
|
Dear Martin,
I don't know if anyone has answered you yet. I can't answer the pruning question. But regarding your question about learning with missing labels:
- Learning from data with lots of missing labels is related to "semi-supervised learning".
- Most methods for semi-supervised learning essentially use the information that is in the distribution of the unlabeled instances to get improved performance.
- Such a setting can be mimicked in Clus by using VarianceReduction with variance being defined as a mix of variance in the predictor space and in the target space. I believe there is a setting in Clus that allows you to list the attributes that should be used for the variance computation. By including non-target attributes here, Clus will behave more like a semi-supervised learner, which means it may work better on your data. No guarantee though - this much depends on the data.
Hope this helps.
Best,
Hendrik
On 15 Aug 2013, at 12:17, Martin Guetlein wrote:
> Hi,
>
> First of all: hello to everybody on the list, and thanks to the
> authors of Clus to provide such a nice tool.
>
> We are using Clus as one possible MLC method, applied to our
> toxicological dataset. The data has around 700 instances, around 25
> labels and lots (!) of missing labels.
>
> Unfortunately Clus (and the other MLC) methods do not perform as good
> as we would like to (Clus predictions have only around 60 percent
> accuracy and auc).
>
> Are there any settings especially suited for datasets with many
> missing labels? So far, I tried using GainRatio instead of
> VarianceReduction as heuristic.
>
> I have disabled pruning, and it showed no effect (identical results
> with Pruning=C4.5). Is there an explanation for this?
>
> Thanks and kind regards,
> Martin
>
> P.S.:
> I have adjusted the Clus source code to commons-math3-3.2 and
> weka-3-7-6, if s.o. is interested, I can share my changes
>
> --
> Dipl-Inf. Martin Gütlein
> Phone:
> +49 (0)761 203 8442 (office)
> +49 (0)177 623 9499 (mobile)
> Email:
> gue...@in... |
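The idea in Hendrik's reply, a variance computed over a chosen list of attributes that may include non-target ones, can be pictured with a small sketch. This is not Clus's implementation: the class and method names are hypothetical, missing labels are assumed to be encoded as NaN and simply skipped, and attributes are left unnormalized for brevity.

```java
public class MixedVarianceSketch {
    /** Population variance of one column, ignoring missing (NaN) entries. */
    static double variance(double[][] rows, int col) {
        double sum = 0.0;
        int n = 0;
        for (double[] row : rows) {
            if (!Double.isNaN(row[col])) { sum += row[col]; n++; }
        }
        if (n == 0) return 0.0;               // nothing observed in this column
        double mean = sum / n;
        double squares = 0.0;
        for (double[] row : rows) {
            if (!Double.isNaN(row[col])) {
                double d = row[col] - mean;
                squares += d * d;
            }
        }
        return squares / n;
    }

    /** Sum of the per-column variances over the attributes chosen for the heuristic. */
    static double summedVariance(double[][] rows, int[] cols) {
        double total = 0.0;
        for (int c : cols) total += variance(rows, c);
        return total;
    }

    public static void main(String[] args) {
        // column 0: descriptive attribute, column 1: label with one missing value
        double[][] rows = { { 1.0, 0.0 }, { 3.0, 1.0 }, { 5.0, Double.NaN } };
        System.out.println(summedVariance(rows, new int[] { 1 }));     // targets only
        System.out.println(summedVariance(rows, new int[] { 0, 1 }));  // descriptive + target
    }
}
```

With targets only, the third row contributes nothing because its label is missing; once the descriptive column is included, that row still influences the score, which is the semi-supervised effect described in the reply.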
|
From: Martin G. <mar...@gm...> - 2013-08-15 10:18:00
|
Hi,
First of all: hello to everybody on the list, and thanks to the authors of Clus for providing such a nice tool.
We are using Clus as one possible MLC method, applied to our toxicological dataset. The data has around 700 instances, around 25 labels and lots (!) of missing labels.
Unfortunately, Clus (and the other MLC methods) do not perform as well as we would like (Clus predictions have only around 60 percent accuracy and AUC).
Are there any settings especially suited for datasets with many missing labels? So far, I tried using GainRatio instead of VarianceReduction as heuristic.
I have disabled pruning, and it showed no effect (identical results with Pruning=C4.5). Is there an explanation for this?
Thanks and kind regards,
Martin
P.S.: I have adjusted the Clus source code to commons-math3-3.2 and weka-3-7-6. If someone is interested, I can share my changes.
--
Dipl-Inf. Martin Gütlein
Phone:
+49 (0)761 203 8442 (office)
+49 (0)177 623 9499 (mobile)
Email: gue...@in... |
|
From: Jan S. <jan...@st...> - 2013-03-01 07:39:39
|
Dear Pascal,
I'm not sure if I follow your explanation below. Can you send (part of)
a data set that does not work with Clus? Then I'll try to fix this bug
and make a new version available.
Best Regards,
Jan
On Feb 27, 2013 13:35 "Pascal Brandt" <psb...@gm...> wrote:
> Hi Bernard,
>
>
>
> I wrote the script below [code segment 1] to convert my ARFF file from
> 0-based to 1-based, but I was still having problems because my label
> attributes were not being read properly. I have 11 categorical labels
> with the possible values {0, 1}. Weka generates a sparse ARFF file
> where only the 1 values are actually written to file.
>
>
>
>
> The only way I could get Clus to work properly was to convert my
> dataset to non-sparse before exporting my ARFF file [code segment 2].
>
>
>
>
> I hope this helps someone else out in the future.
>
>
>
>
> Ciao,
>
> Pascal
>
>
>
>
> p.s. Is there any documentation regarding the information that gets
> dumped to the console when generating trees? It seems only the
> contents of the output file are (somewhat) documented?
>
>
>
>
> [code segment 1]
> #!/bin/bash
> # Shift the attribute indices of a sparse ARFF file from 0-based to 1-based.
> # The ARFF file is passed as the first argument; output goes to stdout.
> gawk '
> BEGIN { FS = ","; found_data = "FALSE" }
> {
>     if (found_data == "FALSE") {
>         print $0                          # copy the header unchanged
>         if (tolower($1) == "@data")       # the keyword is case-insensitive
>             found_data = "TRUE"
>     } else {
>         line = ""
>         for (i = 1; i <= NF; i++) {
>             field = $i
>             # the first number in each field is the attribute index
>             if (match(field, /[0-9]+/)) {
>                 idx = substr(field, RSTART, RLENGTH) + 1
>                 field = substr(field, 1, RSTART - 1) idx substr(field, RSTART + RLENGTH)
>             }
>             line = line (i > 1 ? "," : "") field
>         }
>         print line
>     }
> }' "$1"
>
>
>
>
> [code segment 2]
> import weka.core.Instances;
> import weka.filters.Filter;
> import weka.filters.unsupervised.instance.SparseToNonSparse;
>
> Instances newData = null;
> try {
>     SparseToNonSparse stns = new SparseToNonSparse();  // new instance of filter
>     stns.setInputFormat(trainingData);                 // inform filter about dataset
>     newData = Filter.useFilter(trainingData, stns);    // apply filter
> } catch (Exception e) {
>     logger.info("Error converting from sparse to non-sparse: " + e.getMessage());
> }
>
>
>
>
> On 27 February 2013 14:25, Bernard Zenko <ber...@ij...> wrote:
>
> > Dear Pascal,
> >
> > many thanks for this bug report! At the moment, we're not using any
> > bug tracking system, so clus-devel mailing list is the right address
> > report bugs.
> >
> > Regards, Bernard
> >
> >
> >
> > On 26.2.13 12:12, Pascal Brandt wrote:
> >
> >
> > > Hi,
> > >
> > > Firstly, my apologies if I'm directing this email to the wrong
> > > audience.
> > > I've just tried to use Clus with a sparse ARFF file and have seen
> > > that
> > > it uses a 1-based indexing system for the attributes as opposed to
> > > the
> > > 0-based system defined here
> > >
> > > > <http://weka.wikispaces.com/ARFF+%28book+version%29>. If there's a
> > >
> > > issue/bug tracking system used to manage development of this
> > > project I'd
> > > be happy to log a bug for this.
> > >
> > > Regards,
> > > Pascal
> > > _______________________________________________
> > > Clus-devel mailing list
> > > <Clu...@li...>
> > > <https://lists.sourceforge.net/lists/listinfo/clus-devel>
> > >
> > >
>
> |
|
From: Pascal B. <psb...@gm...> - 2013-02-28 13:20:14
|
Hi Jan,
Unfortunately I can't send my dataset. The problem I had first was that
ARFF uses 0-based indices for attributes in a sparse dataset, but Clus
expects 1-based. The second problem I had was that my labels were not being
read properly. I was using categorical labels with possible values in {0,
1} and only the 1 values were actually explicitly written in the ARFF file.
For some reason, Clus was detecting all my labels as having a value of 1.
The only way I was able to resolve the issues was to not use a sparse
dataset.
Regards,
Pascal
On 28 February 2013 14:26, Jan Struyf <jan...@st...> wrote:
> Dear Pascal,
>
> I'm not sure if I follow your explanation below. Can you send (part of) a
> data set that does not work with Clus? Then I'll try to fix this bug and
> make a new version available.
>
> Best Regards,
>
> Jan
>
>
> On Feb 27, 2013 13:35 "Pascal Brandt" <psb...@gm...> wrote:
>
> Hi Bernard,
>
> I wrote the script below [code segment 1] to convert my ARFF file from
> 0-based to 1-based, but I was still having problems because my label
> attributes were not being read properly. I have 11 categorical labels with
> the possible values {0, 1}. Weka generates a sparse ARFF file where only
> the 1 values are actually written to file.
>
> The only way I could get Clus to work properly was to convert my dataset
> to non-sparse before exporting my ARFF file [code segment 2].
>
> I hope this helps someone else out in the future.
>
> Ciao,
> Pascal
>
> p.s. Is there any documentation regarding the information that gets dumped
> to the console when generating trees? It seems only the contents of the
> output file are (somewhat) documented?
>
> [code segment 1]
> #!/bin/bash
> # Shift the attribute indices of a sparse ARFF file from 0-based to 1-based.
> # The ARFF file is passed as the first argument; output goes to stdout.
> gawk '
> BEGIN { FS = ","; found_data = "FALSE" }
> {
>     if (found_data == "FALSE") {
>         print $0                          # copy the header unchanged
>         if (tolower($1) == "@data")       # the keyword is case-insensitive
>             found_data = "TRUE"
>     } else {
>         line = ""
>         for (i = 1; i <= NF; i++) {
>             field = $i
>             # the first number in each field is the attribute index
>             if (match(field, /[0-9]+/)) {
>                 idx = substr(field, RSTART, RLENGTH) + 1
>                 field = substr(field, 1, RSTART - 1) idx substr(field, RSTART + RLENGTH)
>             }
>             line = line (i > 1 ? "," : "") field
>         }
>         print line
>     }
> }' "$1"
>
> [code segment 2]
> import weka.core.Instances;
> import weka.filters.Filter;
> import weka.filters.unsupervised.instance.SparseToNonSparse;
>
> Instances newData = null;
> try {
>     SparseToNonSparse stns = new SparseToNonSparse();  // new instance of filter
>     stns.setInputFormat(trainingData);                 // inform filter about dataset
>     newData = Filter.useFilter(trainingData, stns);    // apply filter
> } catch (Exception e) {
>     logger.info("Error converting from sparse to non-sparse: " + e.getMessage());
> }
>
>
> On 27 February 2013 14:25, Bernard Zenko <ber...@ij...> wrote:
>
>> Dear Pascal,
>>
>> many thanks for this bug report! At the moment, we're not using any bug
>> tracking system, so clus-devel mailing list is the right address report
>> bugs.
>>
>> Regards, Bernard
>>
>>
>>
>> On 26.2.13 12:12, Pascal Brandt wrote:
>>
>>> Hi,
>>>
>>> Firstly, my apologies if I'm directing this email to the wrong audience.
>>> I've just tried to use Clus with a sparse ARFF file and have seen that
>>> it uses a 1-based indexing system for the attributes as opposed to the
>>> 0-based system defined here
>>> <http://weka.wikispaces.com/ARFF+%28book+version%29>.
>>> If there's a
>>>
>>> issue/bug tracking system used to manage development of this project I'd
>>> be happy to log a bug for this.
>>>
>>> Regards,
>>> Pascal
>
|
|
From: Pascal B. <psb...@gm...> - 2013-02-27 12:36:48
|
Hi Bernard,
I wrote the script below [code segment 1] to convert my ARFF file from
0-based to 1-based, but I was still having problems because my label
attributes were not being read properly. I have 11 categorical labels with
the possible values {0, 1}. Weka generates a sparse ARFF file where only
the 1 values are actually written to file.
The only way I could get Clus to work properly was to convert my dataset to
non-sparse before exporting my ARFF file [code segment 2].
I hope this helps someone else out in the future.
Ciao,
Pascal
p.s. Is there any documentation regarding the information that gets dumped
to the console when generating trees? It seems only the contents of the
output file are (somewhat) documented?
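[code segment 1]
#!/bin/bash
# Shift the attribute indices of a sparse ARFF file from 0-based to 1-based.
# The ARFF file is passed as the first argument; output goes to stdout.
gawk '
BEGIN { FS = ","; found_data = "FALSE" }
{
    if (found_data == "FALSE") {
        print $0                          # copy the header unchanged
        if (tolower($1) == "@data")       # the keyword is case-insensitive
            found_data = "TRUE"
    } else {
        line = ""
        for (i = 1; i <= NF; i++) {
            field = $i
            # the first number in each field is the attribute index
            if (match(field, /[0-9]+/)) {
                idx = substr(field, RSTART, RLENGTH) + 1
                field = substr(field, 1, RSTART - 1) idx substr(field, RSTART + RLENGTH)
            }
            line = line (i > 1 ? "," : "") field
        }
        print line
    }
}' "$1"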
[code segment 2]
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.instance.SparseToNonSparse;

Instances newData = null;
try {
    SparseToNonSparse stns = new SparseToNonSparse();  // new instance of filter
    stns.setInputFormat(trainingData);                 // inform filter about dataset
    newData = Filter.useFilter(trainingData, stns);    // apply filter
} catch (Exception e) {
    logger.info("Error converting from sparse to non-sparse: " + e.getMessage());
}
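The index shift performed by code segment 1 can also be sketched in Java, for readers who already preprocess their data with Weka. This is an illustration under stated assumptions, not part of Clus or Weka: the class and method names (`SparseIndexShift`, `shiftIndices`) are hypothetical, and a sparse data row is assumed to look like `{index value, index value, ...}` with unquoted values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SparseIndexShift {
    // An attribute index is the integer that follows '{' or ',' and precedes whitespace.
    private static final Pattern INDEX = Pattern.compile("([{,]\\s*)(\\d+)(\\s)");

    /** Shifts every attribute index of one sparse ARFF data row from 0-based to 1-based. */
    static String shiftIndices(String dataLine) {
        if (!dataLine.trim().startsWith("{")) {
            return dataLine;                  // dense rows and header lines stay untouched
        }
        Matcher m = INDEX.matcher(dataLine);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            long shifted = Long.parseLong(m.group(2)) + 1;
            String replacement = m.group(1) + shifted + m.group(3);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(shiftIndices("{0 1, 3 1,10 1}"));
    }
}
```

Only the index of each pair is touched; the value after the whitespace is copied as is, so label values such as 1 are not shifted along with the indices.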
On 27 February 2013 14:25, Bernard Zenko <ber...@ij...> wrote:
> Dear Pascal,
>
> many thanks for this bug report! At the moment, we're not using any bug
> tracking system, so clus-devel mailing list is the right address report
> bugs.
>
> Regards, Bernard
>
>
>
> On 26.2.13 12:12, Pascal Brandt wrote:
>
>> Hi,
>>
>> Firstly, my apologies if I'm directing this email to the wrong audience.
>> I've just tried to use Clus with a sparse ARFF file and have seen that
>> it uses a 1-based indexing system for the attributes as opposed to the
>> 0-based system defined here
>> <http://weka.wikispaces.com/ARFF+%28book+version%29>.
>> If there's a
>>
>> issue/bug tracking system used to manage development of this project I'd
>> be happy to log a bug for this.
>>
>> Regards,
>> Pascal
|
|
From: Bernard Z. <ber...@ij...> - 2013-02-27 12:28:01
|
Dear Pascal,
many thanks for this bug report! At the moment, we're not using any bug tracking system, so the clus-devel mailing list is the right address to report bugs.
Regards, Bernard
On 26.2.13 12:12, Pascal Brandt wrote:
> Hi,
>
> Firstly, my apologies if I'm directing this email to the wrong audience.
> I've just tried to use Clus with a sparse ARFF file and have seen that
> it uses a 1-based indexing system for the attributes as opposed to the
> 0-based system defined here
> <http://weka.wikispaces.com/ARFF+%28book+version%29>. If there's a
> issue/bug tracking system used to manage development of this project I'd
> be happy to log a bug for this.
>
> Regards,
> Pascal |
|
From: Pascal B. <psb...@gm...> - 2013-02-26 11:14:28
|
Hi,
Firstly, my apologies if I'm directing this email to the wrong audience. I've just tried to use Clus with a sparse ARFF file and have seen that it uses a 1-based indexing system for the attributes as opposed to the 0-based system defined here <http://weka.wikispaces.com/ARFF+%28book+version%29>. If there's an issue/bug tracking system used to manage development of this project, I'd be happy to log a bug for this.
Regards,
Pascal |