jocr-devels Mailing List for Optical Character Recognition (GOCR)
Status: Alpha
Brought to you by:
joerg10
From: Moshe F. <pa...@gm...> - 2011-11-01 11:53:27

When I wrote about downsizing, I meant downsizing GOCR itself, not developing software of my own.

$5000 is not NOTHING. It is not a lot, but it IS something. And if GOCR stays open source, then I'm paying for something which is not mine.

So it's a WIN-WIN if it fits the AMR (Automatic Meter Reading) industry and is used by everyone in that industry. Once it's used extensively, problems will come up, and then the companies will be happy to pay for an advanced version with some special requirements, or to pay for a specific fix.

Moshe
From: Moshe F. <pa...@gm...> - 2011-10-30 17:24:35

A client of mine is willing to pay $5000 for a subset of JOCR that is as small as possible (it will be used in microcontrollers) and that can read AMR (Automatic Meter Reading) numbers, including the "in between" situations. It should also report the possible results when a digit cannot be determined (e.g. 5 or 6; 1-2 or 7-8).

We can supply sample images.

Is that possible?

The license will stay as it is now (open source, commercial use allowed as long as due acknowledgement is given, correct?)

Thanks,
Moshe Flam
From: Stephen B. <ste...@mc...> - 2011-04-22 06:37:53
|
Hi All, In running the following command in a cygwin terminal ------------------------------------------------------------------------ Admin@Shammah /cygdrive/p/R_PNM $ gocr -m 130 -p /etc/gocr/db -m 4 A.pnm ---------------------------------------------------------------------- I get --------------------------------------------------------------------------------- # show box + environment # show box x= 122 405 d= 13 17 r= 6 0 # show pattern x= 97 403 d= 63 34 t= 1 1 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,......####...,,,,,,,,,,,,,,,,,,,,,,,,,<- ,,,,,,,,,,,,,,,,,,,,,,,,,...########..,,,,,,,,,,,,,,,,,,,,,,,,O ,,,,,,,,,,,,,,,,,,,,,,,,,..#########..,,,,,,,,,,,,,,,,,,,,,,,,O ,,,,,,,,,,,,,,,,,,,,,,,,,.###....###..,,,,,,,,,,,,,,,,,,,,,,,,O ,,,,,,,,,,,,,,,,,,,,,,,,,###......###.,,,,,,,,,,,,,,,,,,,,,,,,O ,,,,,,,,,,,,,,,,,,,,,,,,,###.......###,,,,,,,,,,,,,,,,,,,,,,,,O ,,,,,,,,,,,,,,,,,,,,,,,,,###.......###,,,,,,,,,,,,,,,,,,,,,,,,,< ,,,,,,,,,,,,,,,,,,,,,,,,,###.......###,,,,,,,,,,,,,,,,,,,,,O,,, ,,,,,,,,,,,,,,,,,,,,,,,,,##........###,,,,,,,,,,,,,,,,,,,,,O,,, ,,,,,,,,,,,,,,,,,,,,,,,,,..........###,,,,,,,,,,,,,,,,,,,,,O,,, ,,,,,,,,,,,,,,,,,,,,,,,,,.........####,,,,,,,,,,,,,,,,,,,,,O,,O ,,,,,,,,,,,,,,,,,,,,,,,,,........####.,,,,,,,,,,,,,,,,,,,,,O,,O ,,,,,,,,,,,,,,,,,,,,,,,,,.......#####.,,,,,,,,,,,,,,,,,,,,,OO,O ,,,,,,,,,,,,,,,,,,,,,,,,,......#####..,,,,,,,,,,,,,,,,,,,,,OO,, ,,,,,,,,,,,,,,,,,,,,,,,,,.....#####...,,,,,,,,,,,,,,,,,,,,,OO,, ,,,,,,,,,,,,,,,,,,,,,,,,,.....####....,,,,,,,,OOO,OOO,,,,,,O,,, ,,,,,,,,,,,,,,,,,,,,,,,,,...#.#.......,,,,,,,,OOO,OOO,,,,,,OO,, - ,,,,,,,,,,,,,,,,,,,,,,,,,,,OO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,OO,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,O,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,O,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,O,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,OO,, ,,,,,,,,,,,,,,,,,,,,,,,,,,OO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
,,,,,,,,,,,,,,,,,,,,,,,,,OOOOO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,O ,,,,,,,,,,,,,,,,,,,,,,,,,OOOOOO,,OOO,,,,,,,,,,,,,,,,,,,,,,,,,,O ,,,,,,,,,,,,,,,,,,,,,,,,,OOOOOOOOOOOO,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,OOOOOOOOOO,,,,,,,,,,,,,,,,,,,,,,,,,,,< ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,< The above pattern was not recognized. Enter UTF8 char or string for above pattern. Leave empty if unsure. Press RET at the end (ALT+RET to store into RAM only) : ------------------------------------------------------------------------------------------------- Of course in a fixed font terminal this is easily recognized as "2 -" How do I get gocr to look at a smaller part of this image? Thanks in advance for your help. Yours Sincerely Stephen Grant Brown |
From: Stephen B. <ste...@mc...> - 2011-04-15 12:27:21

Hi All,

I have successfully compiled gocr-0.49.tar.gz under Microsoft Windows Vista using cygwin.

In a cygwin terminal, the following command does not stop and ask the user to extend the database, although some characters are recognized. Is the following command correct?

djpeg digan.jpg | gocr -m 130

Where is the database stored?
Where is the gocr user manual?

Yours Sincerely
Stephen Grant Brown
From: Dmitry K. <dm...@ma...> - 2010-10-05 11:04:19

Hi Jörg!

(1) I also ask you to add the following patch to mainstream. The reason for the patch is that the binaries should be removed with "make clean". It also contains some important changes for library / headers installation. See attachment.

(2) The tarball also contains some temporary files, e.g.

~/gocr# find . -name '.#*' -o -name '*~' -o -name '*.orig'
./bin/gocr_chk.sh~
./include/config.h~
./include/config.h.in~
./include/version.h~
./src/barcode.c.orig
./doc/.#Makefile.1.6
./man/man1/gocr.1~
./man/Makefile.in~
./examples/.#Makefile.1.22

I think "make distclean" should remove them, no?

(3) I have several reports about a memory leak in the list_app() function at list.c:112. valgrind report:

> ==29705== 2,628 (2,244 direct, 384 indirect) bytes in 187 blocks are definitely lost in loss record 192 of 209
> ==29705== 2,660 (2,280 direct, 380 indirect) bytes in 190 blocks are definitely lost in loss record 193 of 209
> ==29705== 3,538 (3,024 direct, 514 indirect) bytes in 252 blocks are definitely lost in loss record 198 of 209
> ==29705== 3,585 (3,072 direct, 513 indirect) bytes in 256 blocks are definitely lost in loss record 199 of 209
> ==29705== 5,883 (5,040 direct, 843 indirect) bytes in 420 blocks are definitely lost in loss record 201 of 209
> ==29705== 8,261 (7,068 direct, 1,193 indirect) bytes in 589 blocks are definitely lost in loss record 206 of 209
> ==29705== at 0x4823C4C: malloc (vg_replace_malloc.c:195)
> ==29705== by 0x1DF427: list_app (list.c:112)
> ==29705== by 0x1DE733: store_boxtree_lines (lines.c:355)
> ==29705== by 0x1D3971: pgm2asc (pgm2asc.c:3158)
> ==29705== by 0x18D642: get_atom_label(Magick::Image const&, Magick::ColorGray const&, int, int, int, int, double, int, int) (osra_ocr.cpp:215)

list_app() allocates memory for the element:

if ( !(e = (Element *)malloc(sizeof(Element))) )

for the list job->res.linelist in the function store_boxtree_lines(). However, freeing of this list is commented out in job_free_image() (see job.c:79):

/* FIMXE jb: free lists
 * list_free( &job->res.linelist );
 * list_free( &job->tmp.dblist );
 */

Maybe this causes so many reports.

I hope this helps you a bit :)

On 24.09.2010 10:17, Joerg Schulenburg wrote:
>
> Because --exclude-cvs is a newer option (My version did not know it), I
> prefer to not add it at this moment. But I removed the CVS directories
> and created a new jocr-tarball.
> Because I dont use CVS anymore (its easier for me to live without SF),
> that should never reappear in future tarballs.
>
> Yes, you could help.
> I would solve the global variable problem itself, because the whole
> program is attached by the problem and I think checking a patch is of
> the same effort like solving the problem itself. I did it not until yet,
> because I know that this is very time consuming which is a problem if
> one dont have much time for hacking.
>
> I would very welcome if someone points me to the precise source of
> memory leaks and security problems and give good samples to reproduce it
> and hints or patches to solve it.
> This would give me more time to improve the quality.
>
> Jörg

--
With best regards,
Dmitry
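The leak pattern described in point (3) can be shown in isolation. What follows is a minimal sketch, not GOCR's list.c: the names `DemoList`, `DemoElement`, `demo_list_app` and `demo_list_free` are hypothetical stand-ins, and the sketch assumes the list owns only its nodes, not the data they point to. Each append allocates one node with malloc, so a list that is never walked and freed loses one block per element, which is the shape of the valgrind report above.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not GOCR's real list types. */
typedef struct DemoElement {
    struct DemoElement *next;
    void *data;               /* payload stays owned by the caller in this sketch */
} DemoElement;

typedef struct {
    DemoElement *head;
    DemoElement *tail;
    int n;                    /* number of nodes currently allocated */
} DemoList;

/* Append one element; returns 0 on success, 1 if malloc fails. */
int demo_list_app(DemoList *l, void *data)
{
    DemoElement *e = malloc(sizeof *e);
    if (!e)
        return 1;
    e->data = data;
    e->next = NULL;
    if (l->tail)
        l->tail->next = e;
    else
        l->head = e;
    l->tail = e;
    l->n++;
    return 0;
}

/* Release every node. Skipping this call leaks one malloc block per element. */
void demo_list_free(DemoList *l)
{
    DemoElement *e = l->head;
    while (e) {
        DemoElement *next = e->next;  /* read the link before freeing the node */
        free(e);
        e = next;
    }
    l->head = l->tail = NULL;
    l->n = 0;
}
```

In this sketch the cleanup routine of the owning structure would call `demo_list_free` once per list; whether re-enabling the commented-out calls in job.c is safe depends on ownership details that the message does not cover.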
From: Joerg S. <Joerg.Schulenburg@URZ.Uni-Magdeburg.DE> - 2010-09-24 09:01:42

> JOCR is unable to detect '@' character and it is converting 0 (Zero)
> as upper case 'O'. Is there any way that I can modify or configure JOCR to
> recognize '@' and other characters which are not being correctly read by
> JOCR.

If you feel that the cause is bad recognition rather than bad scan quality, send a small sample plus a short description file directly to me. Also, please always try the newest version, http://www.ovgu.de/jschulen/ocr/jocr.tgz, before asking for improvements.

You can set a filter -C 0-9 if you want to detect numbers only. Distinguishing 0 from O is mostly a context problem, which GOCR supports poorly or not at all.

Joerg
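Since the reply calls 0 versus O a context problem, one option is a post-processing pass over the recognized text, outside GOCR. The sketch below is an assumption about how such a pass might look, not something GOCR provides: the hypothetical function `fix_zero_context` looks at each maximal run of digits and capital 'O' and rewrites 'O' to '0' only if the run contains at least one real digit.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical post-processing helper; not part of GOCR.
 * Rewrites 'O' to '0' inside runs of [0-9O] that contain a digit. */
void fix_zero_context(char *s)
{
    size_t i = 0;

    while (s[i] != '\0') {
        size_t start = i;
        int has_digit = 0;

        /* Collect a maximal run of digits and capital 'O'. */
        while (s[i] == 'O' || isdigit((unsigned char)s[i])) {
            if (s[i] != 'O')
                has_digit = 1;
            i++;
        }
        if (i == start) {       /* current character is outside any run */
            i++;
            continue;
        }
        if (has_digit) {        /* numeric context: treat every 'O' as zero */
            for (size_t k = start; k < i; k++)
                if (s[k] == 'O')
                    s[k] = '0';
        }
    }
}
```

This is only a heuristic: a word such as "OK" is left alone because its run contains no digit, but a mixed token like "2O" is always read as "20", which may be wrong for some inputs.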
From: Dmitry K. <dm...@ma...> - 2010-09-23 23:06:51

Hi Joerg,

Thank you for the reply!

Initially I checked the sources in the official repository:

http://jocr.cvs.sourceforge.net/viewvc/jocr/jocr/

but indeed, the patches are in the tarball you've mentioned. Why can't you commit them?

I also suggest adding "--exclude-vcs" in all places in the Makefile where you create a tarball, e.g.

--- a/Makefile.in
+++ b/Makefile.in
@@ -153,7 +153,7 @@
  -rm -rf jocr/src/api
 # -rm -rf jocr/CVS jocr/*/CVS jocr/*/*/CVS # CVS tree
  -rm -rf jocr/Makefile jocr/src/Makefile jocr/include/config.h
- tar chzf ../jocr.tgz jocr
+ tar --exclude-vcs chzf ../jocr.tgz jocr
  -gpg -ab --default-key 0x53BDFBE3 ../jocr.tgz # .asc
  -cp ../jocr.tgz ../jocr.tgz.`date +%y%m%d` # backup, remove later
  ls -l ../jocr.tgz{,.asc}

I wonder if we can help you somehow. I also suppose there are memory leaks in the list structures that GOCR allocates in memory. They are visible in the valgrind output. I would like to have a closer look at them, if you could manage the removal of the global JOB variable. Maybe I can help you with the last one? I tried to analyze it, but I was a bit confused, because global and non-global approaches are both used in some places.

Joerg Schulenburg wrote on 23.09.2010 00:18:
>
> Hi,
>
> Your patches are there.
> Did you checked http://www.ovgu.de/jschulen/ocr/jocr.tgz ?
> But I have to do some other improvements to avoid worse recognition
> before releasing 0.49 official.
>
> Joerg
>
> On Mon, 13 Sep 2010, Dmitry Katsubo wrote:
>
>> Dear GOCR community!
>>
>> I come back to maillist again with the same set of issues. A while ago I
>> have submitted a set of patches, which (I suppose) we all agreed are OK
>> to be applied to the HEAD. However, they are still not there.
>>
>> In particular:
>>
>> * unicode.h.patch
>> Solves INFINITY constant clash with math.h
>> * list.h.patch
>> Solves a conflict of types with STL.
>>
>> I think, above two fixes are trivial. If they are not, let me know how
>> can I improve the situation.
>>
>> * Makefile.in.patch
>> Excludes CVS directories from distribution tarball.
>> Note here: it is also necessary to remove all makefiles, as they are
>> autogenerated from Makefile.in files. This should be done in "clean" or
>> "distclean" rule.
>>
>> * Makefile.src.in.patch
>> Provides extra rules to build a library. I see no other way at the
>> moment than to provide all headers together with the library. However,
>> the best way is to define a clean API (= gocr.h), which does not depend
>> on e.g. "config.h". As I mentioned below the candidate functions for API
>> are:
>>
>> job_init(&job);
>> job_free(&job);
>> pgm2asc(&job);
>>
>> I hope that GOCR developers will not block the proposal above concerning
>> GOCR improvement for another three months.
>>
>> I myself am ready to allocate some time to work on further GOCR
>> improvements, for example:
>> - Eliminate the global variable "job_t *JOB"
>> - Enable the logging information to STDERR only if job->cfg.verbose flag
>> is ON
>> but in order to be sure that my changes do not break the core, I would
>> like GOCR developers to provide basic tests for the project (via "make
>> test" rule).
>>
>> I also provide a complete set of patches and improvements to Debian
>> packaging, which I hope Cosimo will accept for the next release.
>>
>> Thank you in advance for any feedback!

--
With best regards,
Dmitry
From: Steve <pro...@gm...> - 2010-09-23 10:10:31

Hello,

Thanks for accepting me to the mailing list. I am a user of JOCR, not a developer, and my programming level is beginner. I need some help with making changes to JOCR. I looked on Google for support and other forums for JOCR, but did not find any helpful forum or board to post my question.

I am using JOCR on Fedora 9. I have been given the job of converting images into text. JOCR is unable to detect the '@' character, and it converts 0 (zero) to an upper case 'O'. Is there any way that I can modify or configure JOCR to recognize '@' and other characters which are not being read correctly?

If anyone can help me with this, I would be thankful.

Thanks
Steve
From: Dmitry K. <dm...@ma...> - 2010-09-13 14:26:18

Dear GOCR community!

I come back to the mailing list again with the same set of issues. A while ago I submitted a set of patches, which (I suppose) we all agreed are OK to be applied to the HEAD. However, they are still not there.

In particular:

* unicode.h.patch
  Solves INFINITY constant clash with math.h
* list.h.patch
  Solves a conflict of types with STL.

I think the above two fixes are trivial. If they are not, let me know how I can improve the situation.

* Makefile.in.patch
  Excludes CVS directories from the distribution tarball.
  Note here: it is also necessary to remove all makefiles, as they are autogenerated from Makefile.in files. This should be done in the "clean" or "distclean" rule.

* Makefile.src.in.patch
  Provides extra rules to build a library. I see no other way at the moment than to provide all headers together with the library. However, the best way is to define a clean API (= gocr.h), which does not depend on e.g. "config.h". As I mentioned below, the candidate functions for the API are:

  job_init(&job);
  job_free(&job);
  pgm2asc(&job);

I hope that GOCR developers will not block the proposal above concerning GOCR improvement for another three months.

I myself am ready to allocate some time to work on further GOCR improvements, for example:
- Eliminate the global variable "job_t *JOB"
- Enable the logging information to STDERR only if the job->cfg.verbose flag is ON
but in order to be sure that my changes do not break the core, I would like GOCR developers to provide basic tests for the project (via a "make test" rule).

I also provide a complete set of patches and improvements to Debian packaging, which I hope Cosimo will accept for the next release.

Thank you in advance for any feedback!

On 14.07.2010 15:10, Dmitry Katsubo wrote:
> Hi Igor!
>
> I am pushing the discussion to maillist, as it might happen someone get
> interested.
>
> I agree with Joerg's remark, that we should define a clean interface
> for the library. Right now we are using quite few functions:
>
> job_init(&job);
> job_free(&job);
> pgm2asc(&job);
>
> but <pgm2asc.h> refers all other headers, thus causing all of them (even
> config.h) to be included into the package. I think this is easy to solve.
>
> Just few comments from my side:
>
> On 13.07.2010 18:47, Igor Filippov wrote:
>> Joerg,
>>
>> What is your time frame for 0.49 release? I would like to synchronize
>> OSRA release if possible. Also, there are several requests if you have
>> time
>> - My request on the GOCR list from January 14, 2010 to check the reason
>> for recognition rates dropping since version 0.45 - very important for
>> me
>
> I am happy to help here. It looks like this message never hit the
> maillist (at least I was not able to locate it in [1], I found only your
> message "segmenting characters from touching line graphics"). Shall I
> use the same image (apodaca.png) to localize the problem? I am ready to
> produce a simple test-case. If you mean another image, send me it then.
>
> [1] https://sourceforge.net/mailarchive/forum.php?forum_name=jocr-devels
>
>> My requests on the mail list from March 30, 2009:
>> - INFINITY macro in unicode.h - it does indeed conflict with math.h!
>> Should be easy enough to rename it to something unique?
>>
>> - "struct list" in list.h also conflicts with STL objects, easily solved
>> by renaming it to "struct list_s".
>
> Above two are easy fixes and the patch I've already sent. Anyway, I
> include the complete debian package.
>
>> - global variable job_t *JOB - I understand it won't be easy to get rid
>> of this one, but perhaps it can be added to the "TODO" list?
>
> I think, I'll get a time slot to work on this, but definitely later. I
> also think, that there are some memory leaks (either in library or OSRA
> fails to cleanup correctly).
>
>> - Sometimes libPgm2asc spits out warnings and errors to stderr, this is
>> very unwelcome behavior. I would prefer the library to be silent on
>> stderr. For me the only output I'm interested in is either recognized
>> character(s) or "_" (as unrecognized character), any other side effects
>> of running the library only get in the way.
>
> That is true. Because of that in the main module of OSRA we need to
> close STDERR before processing, however it should be used to display
> application-related problems. I have suppressed only one such logging,
> but refactoring all cases is not easy, as the code here and there uses
> different approaches to logging:
>
> if (job->cfg.verbose) {
>
> if (job->cfg.verbose & 16) { // constant should be used
>
> g_debug(fprintf(stderr," start frame:");)
>
> MSG( fprintf(stderr,"ad %d", ad); )
>
>> - More straightforward way to build the library libPgm2asc - right now a
>> user has to set up CPPFLAGS and LDFLAGS to add "-fPIC" flag to get "make
>> libs" to compile. Also I'd like an option to have only static library
>> built.
>
> I think, this is already in.
>
>> So sorry to bother you with this, but as you can see some of the
>> requests have been hanging there for over a year, and I would think
>> at least a few would have been fairly easy to resolve...
>>
>> Thank you for the absolutely essential open source OCR library - could
>> not have proceed with my own project without it!
>>
>> Igor
>>
>> On Tue, 2010-06-29 at 16:54 -0400, Joerg Schulenburg wrote:
>>>
>>> The global job is a relict of a rewritten version, it will be eliminated
>>> stepwise. I simply had not enough time to rewrite everything.
>>>
>>> Joerg

--
With best regards,
Dmitry
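The logging proposal (write to STDERR only when the verbose flag is on, through one mechanism with a named constant instead of a bare "& 16") could be sketched as follows. This is an illustration under stated assumptions, not GOCR code: `demo_job_t`, `DEMO_V_BASIC`, `DEMO_V_FRAME` and `demo_log` are hypothetical names, and the real job_t has many more fields than the single `cfg.verbose` member modelled here.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical names for illustration; GOCR's real job_t is much larger. */
enum { DEMO_V_BASIC = 1, DEMO_V_FRAME = 16 };  /* named bits instead of "& 16" */

typedef struct {
    struct { int verbose; } cfg;
} demo_job_t;

/* Print to stderr only if one of the requested verbosity bits is set.
 * Returns the number of characters written, or 0 when suppressed. */
int demo_log(const demo_job_t *job, int mask, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (!(job->cfg.verbose & mask))
        return 0;               /* library stays silent by default */
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n < 0 ? 0 : n;
}
```

A caller that leaves `cfg.verbose` at 0 would then get no output on stderr from any call site routed through this one function, which is the behavior requested for library users.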
From: Carl K. <ca...@pe...> - 2010-09-11 04:17:21

$ gocr shot0001.png
l NVALl D

That should be "INVALID" - what options will make it see it?

flv$ gocr -C INVALID -s 1 shot0001.png
I N VA L I D

--
Carl K
From: Ryan S. <rya...@us...> - 2010-04-20 09:07:17
|
Hello, I notice on the GOCR homepage that you had trouble uploading the latest GOCR release to SourceForge, and don't like their new interface and wasting time dealing with all their "improvements". Have you considered moving the project to Google Code? The developer of the Pure programming language had similar complaints about SourceForge, and has found Google Code quite pleasant by comparison. He switched from SourceForge to Google Code within 2 days of the suggestion. Thread asking for other hosting options and discussing some features of Google Code: https://sourceforge.net/mailarchive/forum.php?thread_name=48DFC587.6090503%40t-online.de&forum_name=pure-lang-users Thread announcing switch to Google Code: https://sourceforge.net/mailarchive/forum.php?thread_name=48E1085C.6010602%40t-online.de&forum_name=pure-lang-users |
From: Carl K. <ca...@pe...> - 2010-03-01 04:51:11

What can this list tell me about reading which bubbles are filled in on a form? AKA Scantron, "use a #2 pencil."

The goal is to write an app that lets user #1 enter some questions and answers into a database; the app prints out a form, user #2 fills in the bubbles, user #3 scans the form, and the answers are appended to the database. A bar code on the page or on each question will help with multi-page issues.

--
Carl K
From: Igor F. [Contr] <ig...@he...> - 2009-12-17 18:20:35
|
Hello, I discussed this feature request with Antonio (of OCRAD), and I am wondering if GOCR developers might get interested in this as well. Basically I would find it tremendously useful if a character which touches a line of graphics (such as for example a bond in a chemical structure touches the character(s) of atomic label) could be detected and recognized. I have a few sample images if there is any developer who is interested to work on this feature. This is a hard problem and while there are some commercial products which can deal with segmentation of this nature I found no open source code which can do this. I am looking to use it in OSRA project, but this could be useful for other applications as well - e.g. OCR for maps and engineering drawings. Please let me know if there is anyone interested in working on this. Regards, Igor Filippov |
From: Joerg S. <Joerg.Schulenburg@URZ.Uni-Magdeburg.DE> - 2009-08-04 15:37:11

Hi all,

Because I am unable to upload to SourceForge, I contact you directly.

GOCR is an OCR (Optical Character Recognition) program, developed under the GNU Public License. It converts scanned images of text back to text files.

Please look at: http://jocr.sourceforge.net/download.html

History: (Changes,ChangeLog)

0.48 Jul09
  fix buffer overflow introduced in 0.46 for filenames
  add codabar barcode
  fix bug, removing melted serifs
  add patch by Chris Lee, i25 barcode recognition + modifications
  fix some false positive numbers "34" (video, gas meter)
  fix problems with 2zZ4 for 10x10 screen font
  better debug output for :;,.
  remove examples, doc and libs part from configure (see below)
  remove doc and examples from the (make install) part
    to reduce dependencies (gs and transfig is not needed for rpm/ebuild)
    gocr only may depend from netpbm, but can live without too
    this will help to install gocr on "exotic" (nonlinux) platforms
  fix gentoo app-text/gocr Bug 243250 src/Makefile: $(CC) $(LDFLAGS) ...

Hope, you find it useful.

Regards, Joerg.

- http://www-e.uni-magdeburg.de/jschulen/ocr/
- PGP 1024D/53BDFBE3, 3816 B803 D578 F5AD 12FD FE06 5D33 0C49 53BD FBE3
From: Joerg S. <Joerg.Schulenburg@URZ.Uni-Magdeburg.DE> - 2009-07-03 14:44:25

Hi,

gocr is a command-line program. You don't have to install it, and you don't get a graphical user interface. You have to open a shell and call gocr from there. You can also install graphical user interfaces for gocr.

What you see is a shell window opening, showing the help text, and closing.

I am sorry that I cannot give much support for Windows.

Joerg.

On Thu, 2 Jul 2009, Stephen Grant Brown wrote:

> Hi All,
>
> Joerg, sorry about the lack of information previously sent.
>
> When running gocr047.exe on a Windows Vista Home Premium machine with
> Intel(R) Core(TM) 2 CPU 4400 @2.00GHz with 1.00 GB ram, a window opens up too
> quickly for me to read and then gocr047 does not produce any thing else on
> the screen.. Windows Task Manager does not show this program under the
> applications tab.
>
> What can I do to install this program?
>
> Yours Sincerely Stephen Grant Brown
From: Stephen G. B. <s_g...@mc...> - 2009-07-02 02:53:30

Hi All,

Joerg, sorry about the lack of information previously sent.

When running gocr047.exe on a Windows Vista Home Premium machine with Intel(R) Core(TM) 2 CPU 4400 @2.00GHz with 1.00 GB RAM, a window opens up too quickly for me to read, and then gocr047 does not produce anything else on the screen. Windows Task Manager does not show this program under the Applications tab.

What can I do to install this program?

Yours Sincerely
Stephen Grant Brown
From: Joerg <Joerg.Schulenburg@URZ.Uni-Magdeburg.DE> - 2009-04-10 09:29:41

Without -m 4 there is no layout analysis (meaning detection of columns, text blocks, etc.) except the line recognition, which is necessary for the detection process and cannot be switched off.

If you need a better interface to gocr, feel free to make suggestions. I know that the current solution is just a development of a first "5 minutes thinking" idea and was not the main focus of my work.

joerg

On Tue, 7 Apr 2009, Songhua Xu wrote:

> Hello,
>
> I am developing a new OCR layout analysis algorithm and am currently
> using GOCR as the backend recognition engine. To test the performance of
> my new algorithm, I need to turn off all the layout analysis processings
> in GOCR.
>
> Thus I wonder can any experienced user or developer here suggest how to
> thoroughly turn off the layout analysis component in GOCR? I know using
> the option "-m 4" will enforce the use of the layout analysis component.
> But I don't know how to explicitly turn off all the layout analysis
> steps. Your suggestions will be greatly appreciated!
>
> With best regards,
> Yours sincerely,
> Songhua
From: Songhua Xu <son...@ya...> - 2009-04-07 21:39:39
|
Hello, I am developing a new OCR layout analysis algorithm and am currently using GOCR as the backend recognition engine. To test the performance of my new algorithm, I need to turn off all the layout analysis processings in GOCR. Thus I wonder can any experienced user or developer here suggest how to thoroughly turn off the layout analysis component in GOCR? I know using the option "-m 4" will enforce the use of the layout analysis component. But I don't know how to explicitly turn off all the layout analysis steps. Your suggestions will be greatly appreciated! With best regards, Yours sincerely, Songhua |
From: Igor F. [Contr] <ig...@he...> - 2009-03-31 15:11:49

Dear Joerg,

Perhaps it would be possible to consider the following feature/update requests for some future release of GOCR:

- INFINITY macro in unicode.h - it does indeed conflict with math.h! Should be easy enough to rename it to something unique?

- "struct list" in list.h also conflicts with STL objects, easily solved by renaming it to "struct list_s".

- global variable job_t *JOB - I understand it won't be easy to get rid of this one, but perhaps it can be added to the "TODO" list?

- Sometimes libPgm2asc spits out warnings and errors to stderr, this is very unwelcome behavior. I would prefer the library to be silent on stderr. For me the only output I'm interested in is either recognized character(s) or "_" (as unrecognized character), any other side effects of running the library only get in the way.

Best regards and thank you for the wonderful tool!
Igor
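The first two requests come down to keeping a library's public names out of the way of standard ones. A minimal sketch of the renaming idea follows, under the assumption that the clashing macro names a Unicode code point (the message only says it lives in unicode.h). The identifiers `DEMO_UC_INFINITY`, `demo_list_s` and `demo_is_infinity_sign` are hypothetical and are not the names used in GOCR.

```c
#include <math.h>   /* C99 math.h defines its own INFINITY macro */

/* Hypothetical project-prefixed macro: U+221E is the Unicode infinity sign.
 * With a prefix it can coexist with the INFINITY from math.h. */
#define DEMO_UC_INFINITY 0x221EUL

/* A suffixed or prefixed struct tag avoids clashing with other "list" types. */
struct demo_list_s {
    int n;
};

/* Returns 1 if the code point is the infinity sign, 0 otherwise. */
int demo_is_infinity_sign(unsigned long code)
{
    return code == DEMO_UC_INFINITY;
}
```

With the prefix in place, a translation unit can include both this header-style code and math.h without a macro redefinition warning.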
From: Igor F. [Contr] <ig...@he...> - 2009-03-30 22:23:24

Joerg,

> using -fPIC and tell if it works, otherwise check all .o files if the code
> is Position-Independent. I dont know how. May be using file or nm like:

Ah, thanks - it works! Here is how I got it to compile:

./configure CPPFLAGS=-fPIC LDFLAGS=-fPIC
make libs

Would it be possible to add -fPIC automatically in case of "make libs"? I think it would make sense.

Thank you,
Igor
From: Joerg S. <Joerg.Schulenburg@URZ.Uni-Magdeburg.DE> - 2009-03-30 21:09:43

If I interpret the output correctly, the static lib was built successfully (cd src;make libPgm2asc.a). Only the shared lib failed. See

http://www.gentoo.org/proj/en/base/amd64/howtos/index.xml?part=1&chap=3

It's complicated even for me, and that was the reason to throw it away in 0.46. It works on my PC (x86 32bit gcc-4.0). You should try compiling everything using -fPIC and tell me if it works; otherwise check all .o files to see whether the code is position-independent. I don't know how. Maybe using file or nm like:

file src/pgm2asc.o
src/pgm2asc.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
                              ~~~~~~~~~~~

looks ok to me. What do you get?

Regards, Jörg.

On Mon, 30 Mar 2009, Igor Filippov [Contr] wrote:

> Dear Joerg,
>
> I've tried building the libPgm2asc.a library in 0.47 (by the way - thank
> you for adding it back), but no success so far. Attached is the output
> of "make libs". Am I missing something obvious?
>
> Thank you,
> Igor
>
> On Sun, 2009-03-29 at 17:48 +0200, Joerg Schulenburg wrote:
>> thanks for the report. Will be fixed in 0.47
>>
>> Joerg
>>
>> On Wed, 25 Feb 2009, Mark Hammond wrote:
>>
>>>> When an image with no contrast is processed, the thresholding function
>>>> in otsu.c divides by zero:
>>>
>>> ...
>>>
>>>> Attached is a test image that causes the problem and a proposed patch
>>>> to fix it.
>>>
>>> I can confirm this; the spambayes project is using gocr to try and detect
>>> image spam, and the gocr process crashed attempting to scan the attachments
>>> to this message - which is the first time I've seen happen :)
>>>
>>> Cheers,
>>>
>>> Mark
From: Igor F. [Contr] <ig...@he...> - 2009-03-30 19:55:45

Dear Joerg,

I've tried building the libPgm2asc.a library in 0.47 (by the way - thank you for adding it back), but no success so far. Attached is the output of "make libs". Am I missing something obvious?

Thank you,
Igor

On Sun, 2009-03-29 at 17:48 +0200, Joerg Schulenburg wrote:
> thanks for the report. Will be fixed in 0.47
>
> Joerg
>
> On Wed, 25 Feb 2009, Mark Hammond wrote:
>
> >> When an image with no contrast is processed, the thresholding function
> >> in otsu.c divides by zero:
> >
> > ...
> >
> >> Attached is a test image that causes the problem and a proposed patch
> >> to fix it.
> >
> > I can confirm this; the spambayes project is using gocr to try and detect
> > image spam, and the gocr process crashed attempting to scan the attachments
> > to this message - which is the first time I've seen happen :)
> >
> > Cheers,
> >
> > Mark
From: Joerg S. <Joerg.Schulenburg@URZ.Uni-Magdeburg.DE> - 2009-03-29 16:23:15

thanks for the report. Will be fixed in 0.47

Joerg

On Wed, 25 Feb 2009, Mark Hammond wrote:

>> When an image with no contrast is processed, the thresholding function
>> in otsu.c divides by zero:
>
> ...
>
>> Attached is a test image that causes the problem and a proposed patch
>> to fix it.
>
> I can confirm this; the spambayes project is using gocr to try and detect
> image spam, and the gocr process crashed attempting to scan the attachments
> to this message - which is the first time I've seen happen :)
>
> Cheers,
>
> Mark
From: Mark H. <mha...@sk...> - 2009-02-25 05:09:40

> When an image with no contrast is processed, the thresholding function
> in otsu.c divides by zero:

...

> Attached is a test image that causes the problem and a proposed patch
> to fix it.

I can confirm this; the spambayes project is using gocr to try and detect image spam, and the gocr process crashed attempting to scan the attachments to this message - which is the first time I've seen that happen :)

Cheers,

Mark
From: Heath N. C. <hnc...@cs...> - 2009-02-25 04:49:54

Hello,

When an image with no contrast is processed, the thresholding function in otsu.c divides by zero:

-----
hncaldwell@grey:/tmp $ gocr bad.ppm
# thresholdValue out of range 155..155, reset to 155
Floating point exception
-----

Attached is a test image that causes the problem and a proposed patch to fix it.

--
Heath Caldwell - hnc...@cs...
Operating Systems Analyst - California State Polytechnic University, Pomona
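The failure mode reported here is typical of Otsu thresholding: with a single gray level, one of the two classes is always empty, and computing its mean divides by zero. The sketch below shows one common guard. It is a generic implementation written for illustration, not GOCR's otsu.c and not the attached patch (which is not part of the archive); the function name `demo_otsu_threshold` and the fallback of returning the only gray level present are assumptions.

```c
#include <stddef.h>

/* Sketch of Otsu's method over 8-bit pixels with a zero-contrast guard.
 * Not GOCR's otsu.c; name and return convention are assumptions.
 * Pixels with value <= the returned threshold form the first class. */
int demo_otsu_threshold(const unsigned char *pix, size_t n)
{
    unsigned long hist[256] = {0};
    unsigned long w_bg = 0;
    double sum_all = 0.0, sum_bg = 0.0, best_var = -1.0;
    int t, best_t = -1, first = -1;

    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        hist[pix[i]]++;
    for (t = 0; t < 256; t++) {
        sum_all += (double)t * (double)hist[t];
        if (first < 0 && hist[t] > 0)
            first = t;              /* lowest gray level that occurs */
    }
    for (t = 0; t < 256; t++) {
        unsigned long w_fg;
        double m_bg, m_fg, var;

        w_bg += hist[t];
        sum_bg += (double)t * (double)hist[t];
        w_fg = (unsigned long)n - w_bg;
        if (w_bg == 0 || w_fg == 0)
            continue;               /* empty class: its mean would divide by zero */
        m_bg = sum_bg / (double)w_bg;
        m_fg = (sum_all - sum_bg) / (double)w_fg;
        var = (double)w_bg * (double)w_fg * (m_bg - m_fg) * (m_bg - m_fg);
        if (var > best_var) {       /* maximize between-class variance */
            best_var = var;
            best_t = t;
        }
    }
    /* No valid split means zero contrast: fall back to the single level. */
    return best_t < 0 ? first : best_t;
}
```

For an image in which every pixel is 155, every candidate split is skipped and the function returns 155 instead of crashing; how the caller should treat such a blank image is a separate decision.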