[Pattern-recognition] Starting up...

I am sending this message to check that everybody has subscribed 
to the SourceForge Numerical Cruncher project list.

In order to get started, we should begin by developing the subsystems 
which will provide the necessary infrastructure for the rest of the 
project to proceed. I would like each of you to choose the subsystems 
you are most interested in. This way, we will be able to develop the following 
subsystems in parallel:

- Data modeling subsystem: We can assume we will work on numeric 
datasets which could come from different data sources, so we could use 
the composite design pattern and implement several wrappers (such as the 
JDBC, ASCII, and raw image wrappers already implemented, plus wrappers for 
other standards such as DSTP, XML, etc.). This system should provide the basic 
sequential and random access methods to the patterns/records/tuples in 
each dataset. Continuous/numeric attributes should be separated from 
categorical/nominal ones.

- Process modeling subsystem: Base abstract classes for the kinds of 
techniques we will implement to solve classification, regression, and 
clustering problems. For instance, classification could be seen as a 
particular case of regression, while clustering is a special case of 
classification. All of them can be considered as agents/processes which 
take some kind of input and generate the appropriate output. This 
subsystem should support introspection (i.e. the ability to discover the 
internal structure of the different agents, such as parameters and their 
kind).

- Bridges: Once the previous subsystem interface is defined, we should 
also develop bridges to existing systems and collections of algorithms 
such as WEKA, MLC++, etc.

- User interface subsystem: Using reflection, we should be able to 
generate standard windows/web interfaces for the available components in 
the system (datasets & pattern recognition algorithms). The main design 
principle here should be "generate, don't code", so that the development 
of new algorithms would require NO interface work (unless required for 
particular applications of our framework).

- Framework infrastructure subsystem: We should also develop some 
infrastructure to decouple the components in our system from the actual 
method call techniques (simple method invocation, RMI, CORBA...) and 
provide some transparency to the users of the algorithms we implement 
(e.g. location transparency). This could be useful if we want our system 
to control its own performance by assigning resources to competing 
processes, and even to work in distributed environments.

- Last, but not least, some of us will have to develop the documentation 
which will make our system usable. This is essential if we want our 
system to grow and more people to collaborate in the development of 
techniques and tools for pattern recognition / machine learning.
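
To make the introspection idea in the process modeling and user interface 
items above a bit more concrete, here is a minimal Java sketch. Nothing in 
it is part of the existing code: the Parameter annotation, the Process base 
class, the ThresholdClassifier example, and the describe/set helpers are all 
hypothetical names chosen for illustration. It only shows how reflection 
could discover an algorithm's parameters and their types, which is the 
information a generated interface would need.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class IntrospectionSketch {

    /** Hypothetical marker for a user-tunable parameter of an algorithm. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Parameter {
        String description();
    }

    /** Hypothetical base class: a process maps an input pattern to an output. */
    abstract static class Process {
        abstract double[] apply(double[] input);
    }

    /** Toy algorithm used only to exercise the introspection helpers. */
    static class ThresholdClassifier extends Process {
        @Parameter(description = "decision threshold")
        double threshold = 0.5;

        @Parameter(description = "index of the attribute to test")
        int attribute = 0;

        @Override
        double[] apply(double[] input) {
            return new double[] { input[attribute] >= threshold ? 1.0 : 0.0 };
        }
    }

    /** Lists "name : type (description)" for every annotated field, sorted by name. */
    static List<String> describe(Process p) {
        List<String> lines = new ArrayList<>();
        for (Field f : p.getClass().getDeclaredFields()) {
            Parameter meta = f.getAnnotation(Parameter.class);
            if (meta == null) {
                continue; // not a declared parameter, so not exposed
            }
            lines.add(f.getName() + " : " + f.getType().getSimpleName()
                    + " (" + meta.description() + ")");
        }
        Collections.sort(lines); // reflection does not guarantee field order
        return lines;
    }

    /** Sets an annotated parameter by name, as a generated form might do. */
    static void set(Process p, String name, Object value) {
        try {
            Field f = p.getClass().getDeclaredField(name);
            if (f.getAnnotation(Parameter.class) == null) {
                throw new IllegalArgumentException(name + " is not a parameter");
            }
            f.setAccessible(true);
            f.set(p, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot set " + name, e);
        }
    }

    public static void main(String[] args) {
        ThresholdClassifier c = new ThresholdClassifier();
        for (String line : describe(c)) {
            System.out.println(line);
        }
        set(c, "threshold", 0.8);
        System.out.println(c.apply(new double[] { 0.7 })[0]);
    }
}
```

A generated window or web form could loop over the result of describe() to 
build one input field per parameter and call set() when the user submits, 
so a new algorithm would only need to declare its parameters.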

I hope we will do a great job, learn a lot, and have a good time while 
working on this project.

Best regards,

	Fernando