Re: [Jocr-devels] urgent OCR project idea with potential cash support
From: Joerg <Joe...@UR...> - 2006-11-01 23:30:20
The resolution of 14509.gif is too low. I can't even read it without
guessing. Maybe you did some conversion to GIF that lowered the
resolution? Parsing can be done for this kind of form, but I don't know
if I can find the time to do it in a few days.

Joerg.

On Tue, 31 Oct 2006, Matt Williamson wrote:

> OCR has a role to play in the current US election cycle.
> There is serious concern that at least one Senatorial
> Candidate (Joe Lieberman) has campaign expenditures
> that *might* be fraudulent. The data to show that it
> is at minimum unusual are publicly available and filed
> on standard forms but in scanned .pdf or gif images
> only. (An example of the type of form I would like to
> parse and scan is here
> http://images.nictusa.com/showimg/14509.gif). Given
> that there are thousands of pages, the effort to
> process these forms, extract the data into an
> analyzable database in an unbiased fashion is all but
> impossible to accomplish manually at this point. Thus
> my hope for OCR.
>
> I would like to be able to pull out data from the form
> fields and compile it into a database. (The end
> result would be to compare line level expenditures of
> Senate candidates. At least one candidate has what
> looks to be a very unusual pattern). Anyone have any
> recommendations?
>
> Any recommendations you have for using GOCR or
> any other product, commercial or otherwise, would be
> greatly appreciated.
>
> I'm sure I could arrange for significant financial
> support for the project if this job could be done in a
> couple days. Tall order I know, but it would
> definitely create giant media buzz in the US and
> around the world for GOCR.
>
> Sincerely,
>
> Matt Williamson