2006-12-07 16:49:59 UTC
A configuration, in the context of the classifier (not to be confused with config files), is a combination of features that make up a prototype. The evolution of the classifier design is a long story, but the final solution was a software implementation of a hardware implementation of a software speed-up to a *slow* software implementation. So the training data as it stands today contains up to 32 different configs for each class, which loosely represent the 32 different fonts used for the training data. When making a full match, the classifier finds the best-matched configuration of features. It is basically a way of allowing some features to be present some of the time without assuming statistical independence between them.
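To make the idea concrete, here is a minimal sketch of picking the best-matched configuration. It is not the real classifier code: the types, the 32-bit mask layout, and the scoring rule (mean evidence over the features in a config) are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not Tesseract's actual data structures.
constexpr std::size_t kMaxConfigs = 32;  // up to 32 configs per class

struct ProtoFeature {
  std::uint32_t config_mask;  // bit c set => this feature belongs to config c
};

struct ConfigMatch {
  int config;    // index of the best config, or -1 if no config has features
  double score;  // mean evidence over the features of that config
};

// evidence[i] is how well prototype feature i matched the unknown (0..1).
// Assumed scoring rule: a config's score is the mean evidence of its features.
ConfigMatch BestConfig(const std::vector<ProtoFeature>& features,
                       const std::vector<double>& evidence) {
  assert(features.size() == evidence.size());
  std::array<double, kMaxConfigs> sum{};
  std::array<int, kMaxConfigs> count{};
  for (std::size_t i = 0; i < features.size(); ++i) {
    for (std::size_t c = 0; c < kMaxConfigs; ++c) {
      if (features[i].config_mask & (std::uint32_t{1} << c)) {
        sum[c] += evidence[i];
        ++count[c];
      }
    }
  }
  ConfigMatch best{-1, 0.0};
  for (std::size_t c = 0; c < kMaxConfigs; ++c) {
    if (count[c] == 0) continue;  // config unused by this class
    const double score = sum[c] / count[c];
    if (best.config < 0 || score > best.score) {
      best = {static_cast<int>(c), score};
    }
  }
  return best;
}
```

A feature that only appears in some fonts just has fewer bits set in its mask, so its absence does not penalize the configs that never contained it.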
The Starbase side is functional and used to work when I was at HP. I originally designed an 'sbdaemon' program that handled all the interaction with X Windows in a separate process, and enabled you to zoom & scroll whatever had been sent to it from a client, in this case Tesseract. Unbeknownst to me, the Starbase people came up with a similar idea at about the same time and called it DisplayList or something like that. I later ported the thing to MS Windows and had it working on *both* platforms simultaneously via a bunch of ifdefs. Unfortunately HP doesn't want to release either version to open source as they are still used internally.
I would welcome any attempt to add the missing half of this functionality and bring it to the open source community. It requires rewriting the parts of sbdaemon that are actually used, i.e. the ability to draw images, basic 2D vector graphics and text onto a surface that can be zoomed and scrolled, perhaps with OpenGL or something else that might easily work on multiple platforms. Connecting it to Tesseract has 2 possible routes:
1. Reverse engineer the requirements of the server from the code that we have from the client.
2. Implement a subclass of WINFD and replace the code in grphics.cpp with your own; then you can write your own IPC interface any way you like. In this case you might be able to get away with not running it in a separate process, which would simplify the design, but it might take major changes to Tesseract.
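For anyone considering either route, the core of the missing server is roughly a display list plus a view transform. The sketch below is only an illustration of that idea under my own assumptions: every name in it (Point, View, DrawSurface, DisplayList) is hypothetical, it is not the WINFD or sbdaemon interface, and image drawing is left out for brevity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the WINFD or sbdaemon interface.
struct Point { double x, y; };

// Maps document coordinates to screen coordinates for a given zoom and scroll.
struct View {
  double zoom = 1.0;
  Point scroll{0.0, 0.0};  // document point shown at the screen origin
  Point ToScreen(Point p) const {
    return {(p.x - scroll.x) * zoom, (p.y - scroll.y) * zoom};
  }
};

struct Line { Point from, to; };
struct Text { Point at; std::string str; };

// Anything that can be drawn on: a real window, or a recorder.
class DrawSurface {
 public:
  virtual ~DrawSurface() = default;
  virtual void DrawLine(Point from, Point to) = 0;
  virtual void DrawText(Point at, const std::string& str) = 0;
};

// Records drawing commands in document coordinates so they can be
// replayed whenever the user zooms or scrolls.
class DisplayList : public DrawSurface {
 public:
  void DrawLine(Point from, Point to) override { lines_.push_back({from, to}); }
  void DrawText(Point at, const std::string& str) override {
    texts_.push_back({at, str});
  }
  // Replays every recorded command onto target, transformed by view.
  void Replay(const View& view, DrawSurface* target) const {
    for (const Line& l : lines_) {
      target->DrawLine(view.ToScreen(l.from), view.ToScreen(l.to));
    }
    for (const Text& t : texts_) {
      target->DrawText(view.ToScreen(t.at), t.str);
    }
  }
  std::size_t Size() const { return lines_.size() + texts_.size(); }

 private:
  std::vector<Line> lines_;
  std::vector<Text> texts_;
};
```

The client only ever appends to the list; zooming or scrolling just changes the View and calls Replay again, which is why the client does not need to know anything about the window system.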
BTW, the next release is still in the pipeline. I am waiting for someone to finish a project to ifdef out all the graphics stuff, while leaving it all functional, to increase the portability to other operating systems.
Ray.