Re: [Hebmorph-thinktank] hebmorph searching the bible
From: Efraim F. <efr...@gm...> - 2011-06-10 12:55:23
Hi,

On 06/10/2011 03:49 AM, Itamar Syn-Hershko wrote:
>
> On 10/06/2011 04:42, Efraim Feinstein wrote:
>
>> I have a demo set up at
>> http://shell.jewishliturgy.org:8080/code/apps/builder, where you can see
>> hebmorph searching the Westminster Leningrad Codex (Tanach).
> Nice work! (although it would be nice to waive the requirement of
> username/password for searches)

Sorry about the login requirement. I know it's annoying. The purpose of
the app isn't actually search.

>
>> One thing I notice on the first pass:
>> hebmorph searches sometimes return large numbers of results that do not
>> contain the search term or anything resembling it. In my code, I filter
>> them out. I'm not sure whether this is a bug/feature/unknown side-effect
>> in eXist, Lucene, or hebmorph.
> Can you give some examples?

I'll dig up some unfiltered examples over the weekend. As I said, the
interface I'm using is eXist, so I'm not sure exactly where the
extraneous results are coming from. What would be useful to help debug it?

>
> Generally speaking, HebMorph's strength is with modern texts. Tanach
> is usually using a bit of a different Hebrew that may not be supported
> well by the dictionary at its base. This can all be tuned of course,
> but it may be the issue that you're seeing.

I was actually pleasantly surprised at how *well* it works with Biblical
Hebrew, considering that it is based on modern Hebrew spelling and
grammar. It certainly does much better than any of the other analyzers.

What would it take to add to the dictionary? Although I don't have lots
of time to work on this, we do have a reasonably complete public domain
biblical dictionary (that is, word list + parts of speech). It wouldn't
help with the unique biblical grammatical forms, non-Academia spelling,
or Aramaic, but it could get us a bit farther along.

Thanks,

-- 
---
Efraim Feinstein
Lead Developer
Open Siddur Project
http://opensiddur.net
http://wiki.jewishliturgy.org
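
For readers following the thread, the post-filtering mentioned in the message could look roughly like the sketch below. It is not the code used in the demo, which runs inside eXist and is not shown in this message; it is a minimal Python illustration under stated assumptions. The function names (`strip_marks`, `resembles`, `filter_hits`) are hypothetical, and "resembling the search term" is approximated very crudely as "some word in the hit contains the query's letters once vowel points and cantillation marks are removed".

```python
import unicodedata

MAQAF = "\u05be"  # Hebrew hyphen that joins words in the biblical text


def strip_marks(text: str) -> str:
    """Remove vowel points and cantillation marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def tokens(text: str) -> list[str]:
    """Split a verse into bare-letter words, treating maqaf as a word break."""
    return strip_marks(text).replace(MAQAF, " ").split()


def resembles(query: str, hit_text: str) -> bool:
    """Assumed heuristic: some word in the hit contains the bare query letters."""
    bare_query = strip_marks(query)
    return any(bare_query in token for token in tokens(hit_text))


def filter_hits(query: str, hits: list[str]) -> list[str]:
    """Keep only the hits that pass the crude resemblance check."""
    return [hit for hit in hits if resembles(query, hit)]
```

A substring test like this keeps prefixed forms (for example a word with a leading ו or ה) but will miss inflections that change the stem, so it is only useful for separating plausible matches from clearly unrelated ones when collecting examples for debugging.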