On behalf of Bob Coret:
I'd like to mention Gaenovium to you: a genealogy technology conference,
by genealogy technologists for genealogy technologists. It's an
international conference in Leiden, organised by Bob Coret and Tamura Jones.
I think it might be of interest to you! More info is available at
> I've been wondering recently if anyone is working on semi-automatic
> extraction of significant works using OCR for genealogy purposes. For
> example I have piles of letters, some of them typed, that my
> grandparents have about their genealogy. I do not have time to read
> them all. But if they could be scanned and 'intelligently' searched
> for recurring keywords then that would be a great help.
To tell you the truth, I'm quite disappointed by the lack of progress in
that area. There are all sorts of programs that can help you organize
pictures and recognize faces, like Picasa, most of them online today, but
the next step, i.e. reading text and making it searchable, hardly seems
to have been taken. I know of one exception, and that's Microsoft
OneDrive, which does run OCR on the scans that I keep there. I don't
think the recognized text is indexed, though.
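For the keyword part of the question, here is a rough Python sketch. It assumes the letters have already been run through some OCR tool that writes one plain-text file per scan (that OCR step is not shown). The stop-word list and the names `recurring_keywords` and `load_ocr_texts` are hypothetical. It counts how many letters each word appears in, so names and places that recur across the pile rise to the top.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately tiny stop-word list; a real one would depend
# on the language(s) the letters are written in.
STOP_WORDS = {"the", "and", "that", "with", "from", "this", "have", "were", "they"}

def recurring_keywords(texts, top_n=20, min_len=4):
    """Return (word, n_letters) pairs: how many letters each word occurs in."""
    counts = Counter()
    for text in texts:
        # Letters only (accented ones included), lowercased; each word is
        # counted at most once per letter.
        words = set(re.findall(r"[^\W\d_]+", text.lower()))
        counts.update(w for w in words if len(w) >= min_len and w not in STOP_WORDS)
    return counts.most_common(top_n)

def load_ocr_texts(folder):
    """Yield the contents of every .txt file in `folder` (assumed OCR output)."""
    for path in sorted(Path(folder).glob("*.txt")):
        yield path.read_text(encoding="utf-8", errors="replace")
```

Usage would be something like `recurring_keywords(load_ocr_texts("letters"))`, where `letters` is a hypothetical folder of OCR output.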
I would love to have a folder where I can store scanned documents and
have them scanned for text, just like a media player scans and organizes
my music, preferably on Linux, because that is my genealogical workbench
today. Instead, the only progress that I see is that our desktop
software gets more sophisticated data models. Gramps is ahead of most of
the competition there, but it still more or less forces me to keep my
data confined to database cells. And that's kind of weird, because the
technology to access data that lives outside the program is there.
That's what media players and photo albums use too: they read metadata
from the actual files and rescan when appropriate. And that should be
easy with open document formats too, don't you think?
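The media-player approach, where the files themselves stay the primary data and a program merely keeps an index that it refreshes when files change, might be sketched in Python roughly like this. It assumes the OCR text already sits next to the scans as .txt files. The class name `FolderIndex` and its methods are hypothetical, and comparing modification times is just one simple way to decide when a rescan is needed.

```python
import re
from pathlib import Path

class FolderIndex:
    """Media-player-style index over a folder of text files (e.g. OCR
    output). A rescan only re-reads files whose modification time changed
    and forgets files that have disappeared."""

    def __init__(self, folder):
        self.folder = Path(folder)
        self.entries = {}  # path -> (mtime in ns, set of lowercase words)

    def rescan(self):
        """Update the index; return how many files were (re)read."""
        seen, reread = set(), 0
        for path in self.folder.rglob("*.txt"):
            seen.add(path)
            mtime = path.stat().st_mtime_ns
            cached = self.entries.get(path)
            if cached is None or cached[0] != mtime:
                text = path.read_text(encoding="utf-8", errors="replace")
                words = set(re.findall(r"[^\W\d_]+", text.lower()))
                self.entries[path] = (mtime, words)
                reread += 1
        for path in list(self.entries):
            if path not in seen:  # deleted or moved away
                del self.entries[path]
        return reread

    def search(self, *keywords):
        """Return the sorted paths of files containing all keywords."""
        wanted = {k.lower() for k in keywords}
        return sorted(p for p, (_, words) in self.entries.items() if wanted <= words)
```

A real tool would call `rescan()` periodically or on file-change notifications (inotify on Linux, for instance) rather than by hand.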
One of my fantasies is to have a source book in LibreOffice that lets me
format sources any way I like, paste them from web pages, and so forth,
and in which I can use headings and styles that Gramps recognizes as
source titles, authors, etc. I like that idea, because I feel much more
comfortable paging through a large text than clicking from one citation
to another. And if I could have things that way, I'd like to put links
to persons in there too, so that indi: shows a person in Gramps just
like http: shows a web page in Firefox. One could imagine something
similar in Evernote, which uses an open document format too. It's all
XML.
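Because an .odt file is a ZIP archive whose body lives in `content.xml`, the source-book idea could at least be prototyped with Python's standard library. The sketch below assumes hypothetical paragraph styles named "Source Title" and "Source Author" (LibreOffice stores them in the XML as `Source_20_Title` and `Source_20_Author`). It treats `indi:` hyperlinks as person IDs. This is not an existing Gramps feature, and inline elements such as `text:s` (runs of spaces) and `text:tab` are simply dropped when the text is joined.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical mapping from paragraph style names to source fields.
STYLE_TO_FIELD = {"Source_20_Title": "title", "Source_20_Author": "author"}

def q(prefix, name):
    """Build an ElementTree '{namespace}name' tag or attribute key."""
    return f"{{{NS[prefix]}}}{name}"

def read_source_book(odt_path):
    with zipfile.ZipFile(odt_path) as z:
        root = ET.fromstring(z.read("content.xml"))

    # LibreOffice often writes automatic styles (P1, P2, ...) that inherit
    # from the named style, so map those back to their parent style.
    parent = {}
    for st in root.iterfind("office:automatic-styles/style:style", NS):
        if st.get(q("style", "parent-style-name")):
            parent[st.get(q("style", "name"))] = st.get(q("style", "parent-style-name"))

    body = root.find("office:body/office:text", NS)
    if body is None:  # not a text document
        return []
    sources, current = [], None
    for el in body.iter():
        if el.tag not in (q("text", "p"), q("text", "h")):
            continue
        style = el.get(q("text", "style-name"), "")
        field = STYLE_TO_FIELD.get(parent.get(style, style))
        text = "".join(el.itertext()).strip()
        if field == "title":  # a title paragraph starts a new source
            current = {"title": text, "persons": []}
            sources.append(current)
        elif field and current is not None:
            current[field] = text
        if current is not None:  # collect indi: links under the current source
            for a in el.iterfind(".//text:a", NS):
                href = a.get(q("xlink", "href"), "")
                if href.startswith("indi:"):
                    current["persons"].append(href[len("indi:"):])
    return sources
```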
This lack of progress, or lack of fantasy maybe, is the primary reason
why I won't attend the Gaenovium conference myself. I know there will be
smart people talking there, but each of their projects is still more or
less confined to a single domain, whereas I think we should recognize
that we're part of an ecosystem.
Come to think of it, RootsMagic is in my ecosystem too, and it has a
find-everything tool. Wouldn't that be nice in Gramps?