From: Joerg W. <we...@in...> - 2004-04-19 16:54:07
Greetings,

> well, then the first question - what about Weka performance ?
> (It eats a lot of memory when working with large data sets)
>
> > R is similar and a long time ago i've used the interface under Java very
> > shortly ... we're matlab based !
>
> i like using matlab and it is quite usefull;
> but matlab itself is not open source , it could be obstacle

Same as for the representation of molecules. Why?

1111. Weka splits everything into attributes and instances, and it distinguishes between nominal and numeric attributes. This costs memory, but it is quite useful, because from a series such as 1, NaN, 3, 4, 2, 1 alone it is not clear whether this is a nominal classification or a numeric regression problem!

I understand your point. In fact, I have implemented a DescriptorMatrix class for JOELib (joelib.desc.data) which holds only the matrix with descriptor names and molecules, but this causes a lot of problems for algorithm development, because the interface cannot distinguish the above series by default. I simply used a matrix2weka mapping tool. That's why a student of mine developed a second interface, so that both possibilities are available; it also holds the molecules directly in a Weka-related context. For my current problem I need a wild mix of nominal and numeric attributes, and it is much clearer if the attributes already carry this information, so I do not have to implement helper classes for both cases every time.

2222. In general, it is useful to cache data sets (already available as DescriptorMatrixCache) to avoid multiple copies in memory. The cross-validation data can be taken from the cached versions. Furthermore, optimization algorithms need a common database-like interface or caching mechanism, so that required data sets are loaded only once (a singleton class interface).

3333. Java cannot compete with the fast matrix operations of R or Matlab; where specifically optimized code is needed, those should be used. Java has Jama and COLT, and some Weka add-ons use them, but they can never be compared to assembler-optimized code.
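To illustrate point 1111, here is a minimal Java sketch of a matrix-to-Weka mapping, assuming the target is Weka's ARFF text format (`@relation`, `@attribute name numeric`, `@attribute name {labels}`, `@data`, and `?` for missing values). The class `MatrixToArff`, its `Type` enum and the `toArff` method are hypothetical and are not JOELib's matrix2weka tool. The point it shows: the same descriptor column, such as the series 1, NaN, 3, 4, 2, 1, must be declared as numeric or nominal by the caller, because the raw matrix cannot tell. Attribute and relation names are assumed to be simple identifiers that need no ARFF quoting.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical matrix-to-ARFF sketch; names are illustrative, not JOELib's matrix2weka.
// A plain descriptor matrix cannot tell whether a column is nominal or numeric,
// so the caller has to supply the type of every column explicitly.
final class MatrixToArff {
    enum Type { NUMERIC, NOMINAL }

    static String toArff(String relation, String[] names, Type[] types, double[][] rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append('\n');
        for (int c = 0; c < names.length; c++) {
            sb.append("@attribute ").append(names[c]).append(' ');
            if (types[c] == Type.NUMERIC) {
                sb.append("numeric");
            } else {
                // Nominal: the distinct non-missing values become the label set,
                // in order of first appearance.
                Set<String> labels = new LinkedHashSet<>();
                for (double[] row : rows) {
                    if (!Double.isNaN(row[c])) {
                        labels.add(format(row[c]));
                    }
                }
                sb.append('{').append(String.join(",", labels)).append('}');
            }
            sb.append('\n');
        }
        sb.append("@data\n");
        for (double[] row : rows) {
            StringJoiner line = new StringJoiner(",");
            for (double v : row) {
                // NaN is written as '?', the ARFF marker for a missing value.
                line.add(Double.isNaN(v) ? "?" : format(v));
            }
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    // Writes moderately sized integral values without a trailing ".0",
    // so nominal labels read as 1, 2, 3 rather than 1.0, 2.0, 3.0.
    private static String format(double v) {
        if (Math.abs(v) < 1e15 && v == Math.rint(v)) {
            return Long.toString((long) v);
        }
        return Double.toString(v);
    }
}
```

Written as NUMERIC, the single column is declared `@attribute y numeric`; written as NOMINAL, it becomes `@attribute y {1,3,4,2}`. The `@data` section is identical in both cases, which is why the type has to travel with the attribute instead of being guessed from the values.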
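For point 2222, a minimal sketch of a singleton data-set cache, assuming data sets are plain `double[][]` matrices identified by a string key. `DataSetCache`, `get`, `loadCount` and `fold` are hypothetical names, not the DescriptorMatrixCache API. The loader runs only on the first request for a key (via `ConcurrentHashMap.computeIfAbsent`), and cross-validation folds are built as lists of references to the cached rows, so no row data is copied. Folds use a simple round-robin assignment without shuffling or stratification, an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical singleton cache: each data set is loaded at most once and then
// shared by cross-validation and optimization code. Not the DescriptorMatrixCache API.
final class DataSetCache {
    private static final DataSetCache INSTANCE = new DataSetCache();

    private final Map<String, double[][]> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger(); // counts actual loader calls

    private DataSetCache() { }

    static DataSetCache getInstance() {
        return INSTANCE;
    }

    /** Returns the cached data set; the loader is invoked only on the first request for this key. */
    double[][] get(String key, Function<String, double[][]> loader) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }

    /**
     * Rows of fold {@code fold} (0-based) out of {@code k}, assigned round-robin.
     * The list holds references to the cached row arrays, not copies.
     */
    static List<double[]> fold(double[][] data, int k, int fold) {
        List<double[]> rows = new ArrayList<>();
        for (int i = fold; i < data.length; i += k) {
            rows.add(data[i]);
        }
        return rows;
    }
}
```

Because every caller goes through `getInstance()`, an optimization loop that evaluates many parameter settings reuses the same loaded matrix instead of reading it again.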
Kind regards,
Joerg

Joerg Kurt Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. (E. Hemingway)
Never mistake action for meaningful action. (Hugo Kubinyi, 2004)