Random forest nominal variables

Help
Anonymous
2014-05-30
2014-06-01
  • Anonymous

    Anonymous - 2014-05-30

    Is the Random Forest algorithm in Waffles able to handle nominal features with a large number of distinct values (more than 100)? If I pass such a dataset to waffles_learn train randomforest, will the data be transformed in some way before the random forest algorithm is called?

    Thanks,
    Ganesh

     
  • Mike Gashler

    Mike Gashler - 2014-05-31

    There is no theoretical limit, but I have not done much testing with this case. I suspect that random forest would do much better in this case with binary divisions. Example:

    waffles_learn crossvalidate mushroom.arff bag 50 decisiontree -random 1 -binary end

    (If it does not work, please send me repro steps so I can debug it and make it work.)

    There would be no automatic transformation of the data, since the GDecisionTree implicitly supports both nominal and continuous features and labels.
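
    To illustrate why binary divisions might help with a high-cardinality nominal attribute, here is a minimal Python sketch. It is not Waffles code: the function names and the toy data are hypothetical, and it assumes that a binary division tests one value against all the others, which this thread does not confirm for GDecisionTree.

```python
from collections import defaultdict


def multiway_split(rows, attr):
    """Split into one branch per distinct value of a nominal attribute."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[attr]].append(row)
    return dict(branches)


def binary_split(rows, attr, value):
    """Split into two branches: rows where attr == value, and all the rest."""
    match = [r for r in rows if r[attr] == value]
    rest = [r for r in rows if r[attr] != value]
    return match, rest


# Toy data (hypothetical): 120 rows, one nominal attribute with 120 levels.
rows = [{"city": f"c{i}", "label": i % 2} for i in range(120)]

multi = multiway_split(rows, "city")
match, rest = binary_split(rows, "city", "c7")

# Number of branches and the size of the largest branch after a multiway split.
print(len(multi), max(len(b) for b in multi.values()))
# Sizes of the two branches after a binary split.
print(len(match), len(rest))
```

    In this toy case the multiway split leaves a single row in every branch, so nothing remains to split on further down the tree, while the binary split keeps 119 rows together for later divisions.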

     
  • Anonymous

    Anonymous - 2014-06-01

    Thanks for the response. Random forest does work in a test example I created that has a nominal variable with nearly 100 levels. Specifically, I ran:

    ~/waffles/bin/waffles_learn train training_data.arff randomforest 100 -samples 2
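
    For anyone who wants to repeat a test like this, the following Python sketch writes a small ARFF file containing one nominal attribute with 150 levels. The file name, attribute names, and the rule that generates the labels are all hypothetical; the actual training data used in this post is not shown in the thread.

```python
import random


def write_nominal_arff(path, n_levels=150, n_rows=1000, seed=0):
    """Write a synthetic ARFF file with one high-cardinality nominal attribute.

    All names and the labeling rule are made up for illustration.
    """
    rng = random.Random(seed)
    levels = [f"v{i}" for i in range(n_levels)]
    with open(path, "w") as f:
        f.write("@RELATION high_cardinality_test\n\n")
        f.write("@ATTRIBUTE category {" + ",".join(levels) + "}\n")
        f.write("@ATTRIBUTE x real\n")
        f.write("@ATTRIBUTE class {no,yes}\n\n")
        f.write("@DATA\n")
        for _ in range(n_rows):
            level = rng.randrange(n_levels)
            x = rng.random()
            # The label depends mostly on the nominal level; x flips it
            # for roughly 10% of rows so the problem is not trivial.
            label = "yes" if (level % 2 == 0) != (x > 0.9) else "no"
            f.write(f"v{level},{x:.4f},{label}\n")
    return levels


levels = write_nominal_arff("nominal_test.arff")
```

    The resulting nominal_test.arff could then be substituted for training_data.arff in the command used in this post.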

     
