Can you give me a complete example of how to train a GDecisionTree within a
program, using an array of floats with one target attribute, and how to predict
results with it? I'm not sure what I should do, because there are no examples
on the official site and some things aren't obvious to me.
Here is an example. Please let me know if this is not clear, or if you want it
to do something else.
// Train a decision tree
GData d(3); // 3 columns: 2 feature columns plus 1 label column
d.newRows(5); // 5 training rows
d.row(0)[0] = 1.4; d.row(0)[1] = 7.3; d.row(0)[2] = 4.0;
d.row(1)[0] = 2.4; d.row(1)[1] = 6.2; d.row(1)[2] = 4.0;
d.row(2)[0] = 5.1; d.row(2)[1] = 5.8; d.row(2)[2] = 5.0;
d.row(3)[0] = 2.4; d.row(3)[1] = 4.7; d.row(3)[2] = 5.0;
d.row(4)[0] = 0.2; d.row(4)[1] = 6.3; d.row(4)[2] = 4.0;
GRand prng(0);
GDecisionTree tree(&prng);
tree.train(&d, 1); // train; 1 label column at the end
// Test it
double v[3];
v[0] = 5.0; v[1] = 5.7; // feature values
tree.predict(v, v + 2); // the prediction is written into v[2]
cout << "Prediction: " << v[2] << "\n";
Hi. I discovered Waffles a few days ago when I was looking for a decision tree
classifier implementation.
I could use decision trees through your command-line "learner" utility, but I
didn't manage to understand your API well enough to use them from my code.
Your last example has been very helpful, and I want to ask you a question about
it: what modifications should I make if I need to work with categorical
attributes? And what if the target attribute is categorical?
Thanks in advance! I would appreciate a lot more simple examples like this
explaining the API usage!
Regards from Argentina.
The GDecisionTree class automatically handles both continuous and nominal
attributes, so all you need to do is put nominal (categorical) attributes in
your data, and it should just work.
Often, people like to store their data in the text-based ARFF format. Here is
an example ARFF file:
@RELATION mydata
@ATTRIBUTE x { red, green, blue }
@ATTRIBUTE y continuous
@ATTRIBUTE z { true, false }
@DATA
red,3.4,true
green,1.0,false
red,2.9,true
blue,5.5,true
Then, you can load this data and train a decision tree with code like this:
GData* pData = GData::loadArff("mydata.arff");
Holder<GData> hData(pData);
GRand prng(0);
GDecisionTree tree(&prng);
tree.train(pData, 1);
The first line loads the data. The second line makes sure it gets deleted
later. The third line makes a pseudo-random number generator. The fourth line
makes a decision tree. The fifth line trains it.
You could also construct this same data in code like this:
GMixedRelation* pRel = new GMixedRelation();
pRel->addAttr(3); // x: 3 categories
pRel->addAttr(0); // y: continuous
pRel->addAttr(2); // z: 2 categories
sp_relation spRel = pRel;
GData data(spRel);
data.newRows(4);
data.row(0)[0] = 0; data.row(0)[1] = 3.4; data.row(0)[2] = 0; // red,3.4,true
data.row(1)[0] = 1; data.row(1)[1] = 1.0; data.row(1)[2] = 1; // green,1.0,false
data.row(2)[0] = 0; data.row(2)[1] = 2.9; data.row(2)[2] = 0; // red,2.9,true
data.row(3)[0] = 2; data.row(3)[1] = 5.5; data.row(3)[2] = 0; // blue,5.5,true
GRand prng(0);
GDecisionTree tree(&prng);
tree.train(&data, 1);
(Sorry, I should have built that code before I posted it; I originally wrote
"addAttribute" instead of "addAttr" and passed "pData" instead of "&data" to
train.)
Hi, I also have a question about the GDecisionTree class. I am writing a
program in which the tree will be drawn. After I get the tree, how can I get
information about how it looks? I mean the nodes, leaves, and split functions.
The waffles_plot tool has a command called "printdecisiontree" that will print
an ASCII representation of the decision tree model to the console. Here is an
example of how to use it:
waffles_learn train mydata.arff decisiontree > dt.twt
waffles_plot printdecisiontree dt.twt mydata.arff
...if you want to do it in code, instead of using the command-line tools, the
interface you want may not be implemented. If you take a look at how the
GDecisionTree::print method works, you will see where such an interface could
be added.
I have read all the documentation, but I still can't figure out what that
random value stands for when you create a tree. In the example above:
GRand prng(0);
GDecisionTree tree(&prng);
Why is it so important that one is always needed to create a tree? And why will
it always be equal to "0" in this example?
Sorry, I just noticed that I never replied to this question. I'm sure it is
now too late, but I will answer in case someone else has the same question...
The value "0" is a seed for the pseudo-random number generator (PRNG). The
GDecisionTree class requires a reference to a PRNG because the user may specify
for it to make random divisions. If you use the tree with its default
parameters, it will not really use the PRNG that you supply to the constructor
(except maybe in obscure cases, like to break ties, etc.).
I suppose I could add a constructor that does not require this parameter, but
then the user would be limited to using only options that never require a PRNG
for any reason.
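For example, if you did want different behavior on each run (say, with random
divisions enabled), you could seed the PRNG with the clock instead of a
constant. This is just a sketch that assumes nothing beyond the GRand(seed)
constructor already shown:
#include <ctime>
GRand prng((unsigned int)time(NULL)); // a different seed on every run
GDecisionTree tree(&prng);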