Eye needs more work!
In order to advance the software's recognition capabilities, I want to test and train Eye on actual input.
You can send us scans that you actually want to be recognized, and we can get back to you with the recognized text.
You can also send us some material just to help improve Eye. I'll be grateful!
As for the content, anything with text in it is fine. Primarily I am thinking of scanned documents of any kind. Screenshots with text in them are also good.
I am asking for submissions because if I grab things off the web, they will typically be copyright-protected (or as I prefer to call it: "copyright-infected"). Copyright creates a stupid web of uncertainty that I have no intention of getting close to. (I believe that copyright will vanish from the face of the earth soon anyway.)
So please make sure that your material is copyright-free or that you are confident that copyright is not an issue.
To submit material, send an email to firstname.lastname@example.org, or post a reply in this forum with a link to your image(s). Please also state whether the material can be made public.
(Material that can be made public is of course the best because it contributes to a public set of training data for Eye.)
Thanks everyone and have a great day :)
http://history.dcs.ed.ac.uk/archive/scans/ has a lot of material and there are no copyright problems. (And I have more that's not online, and more that is still to be scanned.)
I've emailed you separately as well with an idea to improve the recognition of this specific type of document.
Hi Graham, thanks, that is good stuff. Why are you sure that it is copyright-free? Because of the material's age?
When we started the project some years ago, we went through the official University channels and got an assurance from the highest level that we could publish any of the Edinburgh University software from the 1970s. Also, we have had many files released from old personal archives by people who have signed a release form. It's not going to be an issue.
Good work on that demo on your home page showing the page segmented into a grid to isolate the text from the fixed-pitch listing! That was quick!
Alright, it really looks like you've got your bases covered there!
Regarding the original post: I think right now I've probably got more than enough material to work with. However, anyone so inclined is of course still free to post or send any images they would like to get OCR'd. Or just go ahead and tell us about your experiences with Eye. How does it perform on your images?