joelib-devel Mailing List for JOELib/JOELib2 (Page 7)

Greetings,

> Well, then the first question: what about Weka performance?
> (It eats a lot of memory when working with large data sets.)
> 
> > R is similar, and a long time ago I used the interface under Java very
> > briefly ... we're Matlab based!
> 
> I like using Matlab and it is quite useful,
> but Matlab itself is not open source, which could be an obstacle.
The same holds for the representation of molecules.

WHY?
1111.
Weka splits everything into attributes and instances, and it distinguishes
nominal from numeric attributes. This costs memory, but it is quite useful,
because it is not clear from a series such as
1,NaN,3,4,2,1
whether this is a nominal classification or a numeric regression problem!
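
To make the ambiguity concrete, here is a minimal sketch in plain Java.
It does not use the Weka or JOELib APIs; the class SeriesInterpretation
and its methods are hypothetical names chosen only for illustration. It
reads the same series once as nominal labels and once as numeric values:

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: one raw series, two possible readings. */
class SeriesInterpretation {

    /** Nominal reading: each distinct token is a class label, "NaN" is missing. */
    static Set<String> nominalLabels(String csv) {
        Set<String> labels = new LinkedHashSet<>();
        for (String token : csv.split(",")) {
            String t = token.trim();
            if (!t.equals("NaN")) {
                labels.add(t);
            }
        }
        return labels;
    }

    /** Numeric reading: tokens are parsed as doubles, "NaN" becomes Double.NaN. */
    static double[] numericValues(String csv) {
        String[] tokens = csv.split(",");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i].trim());
        }
        return values;
    }

    /** Mean over the non-missing numeric values (NaN if all are missing). */
    static double meanIgnoringMissing(double[] values) {
        double sum = 0.0;
        int count = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        String series = "1,NaN,3,4,2,1";
        // Nominal reading: a set of class labels.
        System.out.println("labels = " + nominalLabels(series));
        // Numeric reading: a regression target with one missing value.
        System.out.println("mean   = " + meanIgnoringMissing(numericValues(series)));
    }
}
```

Nothing in the raw data selects one reading over the other; that
information has to come from the attribute declaration.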

I understand your point. In fact, I've implemented a DescriptorMatrix class
for JOELib (joelib.desc.data) which holds only the matrix with descriptor
names and molecules, but this causes a lot of problems for algorithm
development, because the interface cannot distinguish the two readings of
the above series by default.
I simply used a matrix2weka mapping tool.

That's why a student of mine developed a second interface that offers
both possibilities and also holds the molecules directly in a Weka-related
context.

For my current problem I need a wild mix of nominal and numeric attributes,
and it is clearer if the attributes already hold this information, so I do
not always have to implement helper classes for both cases.

2222.
In general it is useful to cache data sets (already available as
DescriptorMatrixCache) to avoid multiple copies in memory. The
cross-validation sets can then be taken from the cached versions.
Furthermore, optimization algorithms need a common DB-like interface
or caching mechanism to load the required data sets only once (singleton
class interface).
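
The caching idea can be sketched roughly as follows. This is not the
JOELib DescriptorMatrixCache API; DataSetCache, get, getLoadCount and
testRows are hypothetical names, and the loader is an assumed callback
standing in for whatever actually reads a data set:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.IntStream;

/** Hypothetical singleton cache: each named data set is loaded only once. */
final class DataSetCache {
    private static final DataSetCache INSTANCE = new DataSetCache();

    private final Map<String, double[][]> cache = new HashMap<>();
    private int loadCount = 0;

    private DataSetCache() {
    }

    static DataSetCache getInstance() {
        return INSTANCE;
    }

    /** Returns the cached matrix, calling the loader only on the first request. */
    synchronized double[][] get(String name, Function<String, double[][]> loader) {
        double[][] data = cache.get(name);
        if (data == null) {
            data = loader.apply(name);
            cache.put(name, data);
            loadCount++;
        }
        return data;
    }

    synchronized int getLoadCount() {
        return loadCount;
    }

    /** Row indices of one test fold; folds index into the shared matrix, no copy. */
    static int[] testRows(int numRows, int numFolds, int fold) {
        return IntStream.range(0, numRows)
                .filter(i -> i % numFolds == fold)
                .toArray();
    }

    public static void main(String[] args) {
        DataSetCache c = DataSetCache.getInstance();
        double[][] first = c.get("demo", n -> new double[][] {{1, 2}, {3, 4}, {5, 6}});
        double[][] second = c.get("demo", n -> new double[][] {{0}});
        // The second loader is never called; both references are the same array.
        System.out.println(first == second);
        System.out.println(testRows(first.length, 2, 0).length);
    }
}
```

Because the folds are only index lists into the cached matrix, every
cross-validation run and every optimization step shares one copy of the
data in memory.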

3333.
It is not possible to compete on fast matrix operations; for those, R or
Matlab should be used, because specifically optimized code is needed.
Java has Jama and COLT, and some Weka add-ons use them, but this can
never be compared to assembler-optimized code.

Kind regards, Joerg

Joerg Kurt Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW:    http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action.
                                    (E. Hemingway)

Never mistake action for meaningful action.
                               (Hugo Kubinyi,2004)                         


joelib-devel — JOELib-Development