[gplab-users] classification of 120 length by 30 cases matrix From: James Griffin - 2007-07-07 19:20
```
I am currently having problems getting useful output from GPLAB. I realise this is due to my understanding of the problem, and wondered if anyone could help me get started. I have some useful ideas and have already classified my data using fuzzy clustering. In addition, I would like to classify the data using GP, as I have worked on this for the last 4 months and don't really want to say it can't classify the data when in fact it's down to my understanding and not the strategy behind GP.

The problem:

I have 30 cases of data. For argument's sake, the first ten (1-10) represent classification phenomenon 1, the next ten (11-20) represent classification phenomenon 2, and the last ten (21-30) represent classification phenomenon 3. In short, I wish to segregate classes 1 to 3 using 'if', 'else', 'greater than' and 'less than' rules. I use two text inputs: one with the data (a 120*30 matrix) and one with the targets 1 to 3. I have got useful rules from tree regression software (Matlab) but not from GP. I believe it's my fitness function (standard GP toolbox), and terminals are currently set to nil. The functions are as stated above (I also implemented my own 'min', 'max', 'kurtosis' and 'stddev', although these just confused matters).

GPLAB is converging; however, I am having problems interpreting the output string/tree for classifying the 3 classes of data (it would be useful to get an output similar to the NN GP output against desired output). Each of the 30 cases is 120 pts in size, and each class is significantly different from the others, i.e. the first class has its first 30 pts larger than the 2nd batch, and the 3rd batch has pts 60-90 larger than the other classes.

Based on this problem, I would like to use the fitness measure to continually reward a correct classification and penalize a misclassification. Perhaps a figure of merit could be associated to ensure the best individual is found. Can anyone help me get started with such a problem? I.e. what functions and terminals do they think are appropriate? I am looking at le, gr and myif, but I am not sure what terminals to use. The fitness function should check whether the rules/tree segregate the data, giving the correct classification. To this end, I believe something similar to the ant demo is required, but not as complex.

I would like to express my thanks to Sara for producing such a user-friendly and powerful toolbox. Once I get into this in greater detail, I would like to share some ideas I have for enabling your GP toolbox to handle n-dimensionally huge data, although automating may take some effort, and that's of course if you haven't already done this!

Again, I would be grateful for any help the GP community can give, as I have been mucking around with the toolbox for 2 months now and I can't seem to replicate John Koza/Daniel Howard's GP classification ideas (they classify imagery data, although mine is mechanical measurements and Acoustic Emission, and a lot smaller in size due to n-dimensional reduction).
```
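A fitness measure of the kind described above (reward a correct classification, penalize a misclassification) could look roughly like the following. This is only a minimal Python sketch, not GPLAB code and not GPLAB's fitness interface: the names `misclassification_fitness`, `make_case` and `hand_rule` are hypothetical, the data are synthetic stand-ins shaped like the 30 cases by 120 points described in the message (stored here as one row per case), and rounding the raw program output to the nearest label 1..3 is an assumption about how a numeric tree output might be mapped to a class.

```python
import math
from statistics import mean


def misclassification_fitness(program, cases, targets, n_classes=3):
    """Count misclassified cases (lower is better, 0 means all correct).

    `program` is any callable taking one case (a sequence of numbers)
    and returning a number; here it stands in for an evolved GP tree.
    """
    errors = 0
    for features, target in zip(cases, targets):
        raw = program(features)
        if not math.isfinite(raw):
            # NaN or infinite output cannot be mapped to a class: penalize it.
            errors += 1
            continue
        # Assumption: round to the nearest label and clip into 1..n_classes.
        predicted = min(max(int(round(raw)), 1), n_classes)
        if predicted != target:
            errors += 1
    return errors


def make_case(lo, hi):
    """Synthetic 120-point case with points lo..hi-1 set high (toy data)."""
    return [1.0 if lo <= i < hi else 0.0 for i in range(120)]


# Toy stand-in for the data: 10 cases per class, one row per case.
cases = [make_case(0, 30)] * 10 + [make_case(0, 0)] * 10 + [make_case(60, 90)] * 10
targets = [1] * 10 + [2] * 10 + [3] * 10


def hand_rule(x):
    """Hand-written rule of the if/greater-than kind, for comparison."""
    if mean(x[0:30]) > 0.5:
        return 1.0
    if mean(x[60:90]) > 0.5:
        return 3.0
    return 2.0


print(misclassification_fitness(hand_rule, cases, targets))
```

Counting errors keeps the measure easy to read: a value of 0 means every case landed in its target class, so it doubles as a simple figure of merit for picking the best individual. A weighted version (different penalties per class) would be a straightforward variation.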
 Re: [gplab-users] classification of 120 length by 30 cases matrix From: Sara Silva - 2007-07-11 23:06
```
Dear James,

Sorry for the delay in replying. I am currently writing my PhD thesis so I am VERY busy. But I will try to help.

You say that GPLAB does converge. Does it mean that it reaches an optimal solution according to your fitness function? Also, I cannot imagine what type of problem you are having in interpreting the final tree. Can you give an example?

Cheers,
Sara
```
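On the question of interpreting a final tree, it may help to see how a small tree built from the functions named in the thread (myif, gr, le) can be evaluated on one case and printed as nested rules. The following is a hedged Python sketch, not GPLAB output and not GPLAB's tree format: the tuple representation, the terminal names `X1`..`X120` (one per data point), the example `TREE`, and the semantics given to `myif`, `gr` and `le` are all assumptions made for illustration.

```python
def evaluate(node, x):
    """Evaluate a tree on one case x (a sequence of 120 numbers)."""
    if isinstance(node, (int, float)):
        return float(node)                    # numeric constant
    if isinstance(node, str):
        return float(x[int(node[1:]) - 1])    # terminal "X<i>", 1-based index
    op, *args = node
    if op == "gr":                            # assumed: 1 if a > b else 0
        return 1.0 if evaluate(args[0], x) > evaluate(args[1], x) else 0.0
    if op == "le":                            # assumed: 1 if a <= b else 0
        return 1.0 if evaluate(args[0], x) <= evaluate(args[1], x) else 0.0
    if op == "myif":                          # assumed: if-then-else on a 0/1 condition
        cond, then_branch, else_branch = args
        chosen = then_branch if evaluate(cond, x) != 0.0 else else_branch
        return evaluate(chosen, x)
    raise ValueError(f"unknown function: {op}")


def expr(node):
    """Render a condition or leaf as a short readable expression."""
    if not isinstance(node, tuple):
        return str(node)
    op, *args = node
    if op == "gr":
        return f"{expr(args[0])} > {expr(args[1])}"
    if op == "le":
        return f"{expr(args[0])} <= {expr(args[1])}"
    return f"{op}({', '.join(expr(a) for a in args)})"


def to_rules(node, indent=0):
    """Render nested myif nodes as indented if/else rules."""
    pad = "    " * indent
    if isinstance(node, tuple) and node[0] == "myif":
        _, cond, then_branch, else_branch = node
        return (f"{pad}if {expr(cond)}:\n" + to_rules(then_branch, indent + 1)
                + f"{pad}else:\n" + to_rules(else_branch, indent + 1))
    return f"{pad}output {expr(node)}\n"


# Hypothetical tree: class 1 if point 10 is high, else class 3 if point 75 is high, else class 2.
TREE = ("myif", ("gr", "X10", 0.5), 1, ("myif", ("gr", "X75", 0.5), 3, 2))

case = [0.0] * 120
case[74] = 1.0                                # point 75 is high
print(evaluate(TREE, case))
print(to_rules(TREE))
```

Printing the tree as rules alongside its output on each case gives a side-by-side view of predicted against desired class, which is the kind of comparison asked for in the original message.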