Re: [Jocr-devels] GOCR improvements
From: Dmitry K. <dm...@ma...> - 2010-09-13 14:26:18
Dear GOCR community!

I am coming back to the mailing list with the same set of issues. A while ago I submitted a set of patches which (I suppose) we all agreed are OK to be applied to HEAD. However, they are still not there. In particular:

* unicode.h.patch
  Solves the INFINITY constant clash with math.h.
* list.h.patch
  Solves a conflict of types with STL.

I think the above two fixes are trivial. If they are not, let me know how I can improve the situation.

* Makefile.in.patch
  Excludes CVS directories from the distribution tarball. Note here: it is also necessary to remove all makefiles, as they are autogenerated from the Makefile.in files. This should be done in the "clean" or "distclean" rule.
* Makefile.src.in.patch
  Provides extra rules to build a library. I see no other way at the moment than to provide all headers together with the library. However, the best way is to define a clean API (= gocr.h) which does not depend on e.g. "config.h". As I mentioned below, the candidate functions for the API are:

    job_init(&job);
    job_free(&job);
    pgm2asc(&job);

I hope that the GOCR developers will not block the proposal above concerning GOCR improvement for another three months. I myself am ready to allocate some time to work on further GOCR improvements, for example:

- Eliminate the global variable "job_t *JOB"
- Enable logging to STDERR only if the job->cfg.verbose flag is ON

but in order to be sure that my changes do not break the core, I would like the GOCR developers to provide basic tests for the project (via a "make test" rule).

I also provide a complete set of patches and improvements to the Debian packaging, which I hope Cosimo will accept for the next release.

Thank you in advance for any feedback!

On 14.07.2010 15:10, Dmitry Katsubo wrote:
> Hi Igor!
>
> I am pushing the discussion to the mailing list, as someone might get
> interested.
>
> I agree with Joerg's remark that we should define a clean interface
> for the library.
> Right now we are using only a few functions:
>
> job_init(&job);
> job_free(&job);
> pgm2asc(&job);
>
> but <pgm2asc.h> refers to all the other headers, thus causing all of them (even
> config.h) to be included in the package. I think this is easy to solve.
>
> Just a few comments from my side:
>
> On 13.07.2010 18:47, Igor Filippov wrote:
>> Joerg,
>>
>> What is your time frame for 0.49 release? I would like to synchronize
>> OSRA release if possible. Also, there are several requests if you have
>> time
>> - My request on the GOCR list from January 14, 2010 to check the reason
>> for recognition rates dropping since version 0.45 - very important for
>> me
>
> I am happy to help here. It looks like this message never hit the
> mailing list (at least I was not able to locate it in [1]; I found only your
> message "segmenting characters from touching line graphics"). Shall I
> use the same image (apodaca.png) to localize the problem? I am ready to
> produce a simple test case. If you mean another image, please send it to me.
>
> [1] https://sourceforge.net/mailarchive/forum.php?forum_name=jocr-devels
>
>> My requests on the mail list from March 30, 2009:
>> - INFINITY macro in unicode.h - it does indeed conflict with math.h!
>> Should be easy enough to rename it to something unique?
>>
>> - "struct list" in list.h also conflicts with STL objects, easily solved
>> by renaming it to "struct list_s".
>
> The above two are easy fixes, and I have already sent the patch. Anyway, I
> include the complete Debian package.
>
>> - global variable job_t *JOB - I understand it won't be easy to get rid
>> of this one, but perhaps it can be added to the "TODO" list?
>
> I think I'll get a time slot to work on this, but definitely later. I
> also think that there are some memory leaks (either in the library, or OSRA
> fails to clean up correctly).
>
>> - Sometimes libPgm2asc spits out warnings and errors to stderr, this is
>> very unwelcome behavior. I would prefer the library to be silent on
>> stderr.
>> For me the only output I'm interested in is either recognized
>> character(s) or "_" (as unrecognized character), any other side effects
>> of running the library only get in the way.
>
> That is true. Because of that, in the main module of OSRA we need to
> close STDERR before processing; however, it should be used to display
> application-related problems. I have suppressed only one such logging call,
> but refactoring all cases is not easy, as the code here and there uses
> different approaches to logging:
>
> if (job->cfg.verbose) {
>
> if (job->cfg.verbose & 16) { // constant should be used
>
> g_debug(fprintf(stderr," start frame:");)
>
> MSG( fprintf(stderr,"ad %d", ad); )
>
>> - More straightforward way to build the library libPgm2asc - right now a
>> user has to set up CPPFLAGS and LDFLAGS to add "-fPIC" flag to get "make
>> libs" to compile. Also I'd like an option to have only static library
>> built.
>
> I think this is already in.
>
>> So sorry to bother you with this, but as you can see some of the
>> requests have been hanging there for over a year, and I would think
>> at least a few would have been fairly easy to resolve...
>>
>> Thank you for the absolutely essential open source OCR library - could
>> not have proceeded with my own project without it!
>>
>> Igor
>>
>> On Tue, 2010-06-29 at 16:54 -0400, Joerg Schulenburg wrote:
>>>
>>> The global job is a relic of a rewritten version; it will be eliminated
>>> stepwise. I simply had not enough time to rewrite everything.
>>>
>>> Joerg

-- 
With best regards,
Dmitry
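
For illustration only, the logging cleanup discussed in this thread (one approach instead of several, a named constant instead of the bare 16, and silence on STDERR unless the verbose flag asks for output) could be sketched roughly as follows. This is not GOCR code: demo_job_t, demo_log, LOG_GENERAL and LOG_FRAMES are hypothetical names invented for the sketch, and only the job->cfg.verbose field and the value 16 come from the messages above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical category bits; GOCR itself does not define these names. */
enum {
    LOG_GENERAL = 1,   /* ordinary verbose output (assumed bit value)     */
    LOG_FRAMES  = 16   /* stands in for the bare "& 16" test quoted above */
};

/* Minimal stand-in for the part of the job structure used here. */
typedef struct {
    struct { int verbose; } cfg;
} demo_job_t;

/*
 * Write a formatted message to `out` only if the job has enabled the
 * given category. Returns the number of characters written, or 0 when
 * the message is suppressed.
 */
int demo_log(const demo_job_t *job, int category, FILE *out,
             const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(out != NULL);
    if (job == NULL || (job->cfg.verbose & category) == 0)
        return 0;               /* library stays silent by default */

    va_start(ap, fmt);
    n = vfprintf(out, fmt, ap);
    va_end(ap);
    return n;
}
```

With a helper of this kind, a call such as demo_log(job, LOG_FRAMES, stderr, "ad %d", ad) would replace each of the four styles quoted above, and a caller that leaves verbose at 0 would get nothing on STDERR. Passing the job explicitly also avoids any dependence on a global job pointer.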