[gplab-users] classification of 120 length by 30 cases matrix
From: James G. <jam...@ho...> - 2007-07-07 19:20:57
I am currently having problems getting useful output from GPLAB. I realise this is due to my own understanding of the problem, and I wondered if anyone could help me get started. I have some useful ideas and have already classified my data using fuzzy clustering. In addition, I would like to classify the data using GP. I have worked on this for the last 4 months and don't really want to say that GP can't classify the data when in fact the failure is down to my understanding and not the strategy behind GP.

The problem: I have 30 cases of data. For the sake of argument, the first ten (1-10) represent classification phenomenon 1, the next ten (11-20) represent classification phenomenon 2, and the last ten (21-30) represent classification phenomenon 3. In short, I wish to separate classes 1 to 3 using if, else, greater-than and less-than rules. I use two text inputs: one with the data (a 120*30 matrix) and one with the targets, 1 to 3.

I have obtained useful rules from tree regression software (Matlab), but not from GP. I believe the cause is my fitness function (the standard one from the GP toolbox), and my terminals are currently set to nil. The functions are as stated above (I also implemented my own min, max, kurtosis and stddev, although these just confused matters). GPLAB is converging, but I am having problems interpreting the output string/tree for classifying the 3 classes of data (it would be useful to get an output similar to the NN one, i.e. GP output against desired output).

Each of the 30 cases is 120 points long, and each class is significantly different from the others. For example, the first class has its first 30 points larger than those of the second class, and the third class has points 60-90 larger than both of the other classes.

Based on this problem, I would like the fitness measure to reward each correct classification and penalize each misclassification. Perhaps a figure of merit could be associated with each individual to ensure the best one is found. Can anyone help me get started with such a problem?
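To make the fitness idea above concrete, here is a minimal sketch in Python rather than GPLAB/MATLAB code. It does not use any GPLAB API. The names `classification_fitness`, `make_case` and `hand_rule`, the `reward` and `penalty` weights, and the toy data (flat signals with one raised block per class) are all assumptions made up for illustration. Only the layout (30 cases of 120 points, targets 1 to 3) comes from the description above.

```python
from typing import Callable, List, Sequence


def classification_fitness(
    classify: Callable[[Sequence[float]], int],
    cases: Sequence[Sequence[float]],
    targets: Sequence[int],
    reward: float = 1.0,
    penalty: float = 1.0,
) -> float:
    """Lower is better: each hit subtracts `reward`, each miss adds `penalty`."""
    score = 0.0
    for case, target in zip(cases, targets):
        if classify(case) == target:
            score -= reward
        else:
            score += penalty
    return score


def make_case(label: int) -> List[float]:
    """Toy 120-point case (invented data, not the real measurements)."""
    case = [0.0] * 120
    if label == 1:
        case[0:30] = [1.0] * 30    # class 1: first 30 points raised
    elif label == 3:
        case[60:90] = [1.0] * 30   # class 3: points 60-90 raised
    return case                    # class 2: flat


targets = [1] * 10 + [2] * 10 + [3] * 10
cases = [make_case(t) for t in targets]


def hand_rule(case: Sequence[float]) -> int:
    """A hand-written if/else rule standing in for an evolved tree."""
    if sum(case[0:30]) / 30 > 0.5:
        return 1
    if sum(case[60:90]) / 30 > 0.5:
        return 3
    return 2


perfect = classification_fitness(hand_rule, cases, targets)
constant = classification_fitness(lambda case: 1, cases, targets)
print(perfect, constant)
```

With these weights, a rule that classifies all 30 cases correctly scores -30.0, while a rule that always answers class 1 scores 10.0 (10 hits, 20 misses), so the measure separates useful individuals from trivial ones. Raising `penalty` relative to `reward` is one possible way to express a figure of merit that punishes misclassification more heavily.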
In particular, I would like to know which functions and terminals people think are appropriate. I am looking at le, gr and myif as functions, but I am not sure which terminals to use. The fitness function should check whether the rules/tree separate the data and give the correct classification. To this end, I believe something similar to the ant demo is required, but not as complex.

I would like to express my thanks to Sara for producing such a user-friendly and powerful toolbox. Once I get into this in greater detail, I would like to share some ideas I have for enabling the GP toolbox to handle very large n-dimensional data, although automating this may take some effort, and that is of course if you haven't already done it!

Again, I would be grateful for any help the GP community can give. I have been mucking around with the toolbox for 2 months now and can't seem to replicate John Koza's and Daniel Howard's GP classification ideas (they classify imagery data, whereas mine is mechanical measurements and Acoustic Emission, and a lot smaller in size due to n-dimensional reduction).
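One possible way to picture a function and terminal set for this kind of rule tree is sketched below in Python (not GPLAB code). Everything here is an assumption for illustration: the semantics given to `le`, `gr` and `myif`, the terminal naming `X1` to `X120` for the 120 points of a case, the numeric constants, and the nested-tuple tree encoding. The toolbox's own definitions may differ.

```python
from typing import Sequence, Union

# A tree is a constant, a terminal name such as "X10", or (op, arg, ...).
Tree = Union[float, str, tuple]


def evaluate(tree: Tree, case: Sequence[float]) -> float:
    """Recursively evaluate a rule tree on one 120-point case."""
    if isinstance(tree, (int, float)):
        return float(tree)                      # constant terminal
    if isinstance(tree, str):
        return float(case[int(tree[1:]) - 1])   # "X10" -> 10th point (1-based)
    op, *args = tree
    vals = [evaluate(arg, case) for arg in args]
    if op == "le":                              # assumed: 1.0 if a <= b else 0.0
        return 1.0 if vals[0] <= vals[1] else 0.0
    if op == "gr":                              # assumed: 1.0 if a > b else 0.0
        return 1.0 if vals[0] > vals[1] else 0.0
    if op == "myif":                            # assumed: myif(cond, then, else)
        return vals[1] if vals[0] != 0.0 else vals[2]
    raise ValueError(f"unknown function: {op}")


# Example tree: if X10 > 0.5 then class 1, else if X75 > 0.5 then class 3, else class 2.
rule = ("myif", ("gr", "X10", 0.5), 1.0,
        ("myif", ("gr", "X75", 0.5), 3.0, 2.0))

# Invented toy cases: one raised block per class, flat for class 2.
class1 = [1.0] * 30 + [0.0] * 90
class2 = [0.0] * 120
class3 = [0.0] * 60 + [1.0] * 30 + [0.0] * 30

print(evaluate(rule, class1), evaluate(rule, class2), evaluate(rule, class3))
```

In this sketch the terminals are the input points themselves plus a few constants (thresholds and the class labels 1, 2, 3), so the value returned at the root can be compared directly with the target class.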