From: Marcus <stm...@gm...> - 2010-03-24 09:34:37
Hi,

I read that Lucene is somehow able to build indexes not only on the db, but also on external files. Does anyone have experience with this?

And what about PDF files stored in eXist: would the Lucene index be able to index those as well? In my project the customer has a small number of PDFs that he wants to upload and have integrated into the full-text search. It would be great if eXist were able to do that.

Thanks,
Marcus
From: Wolfgang M. <wol...@ex...> - 2010-03-24 10:14:33
Hi,

> And what about PDF files stored in eXist: would the Lucene index be
> able to index those as well?

This is not implemented yet, though it could be added easily. We would just need a trigger to update the index and another XQuery function to handle the searching.

> In my project the customer has a small number of PDFs that he wants to
> upload and have integrated into the full-text search. It would be great
> if eXist were able to do that.

Well, if you need the feature short term, maybe you could convince the customer to sponsor it. I'll be happy to work out an estimate, but I guess the whole feature can be implemented in one or two days.

Wolfgang
From: Marcus <stm...@gm...> - 2010-03-24 11:08:34
Thanks for the quick response.

>> And what about PDF files stored in eXist: would the Lucene index be
>> able to index those as well?
>
> This is not implemented yet, though it could be added easily. We would
> just need a trigger to update the index and another XQuery function to
> handle the searching.
>
>> In my project the customer has a small number of PDFs that he wants to
>> upload and have integrated into the full-text search. It would be great
>> if eXist were able to do that.
>
> Well, if you need the feature short term, maybe you could convince the
> customer to sponsor it. I'll be happy to work out an estimate, but I
> guess the whole feature can be implemented in one or two days.

We may need it during the next 3-6 months. There might be two ways.

1. How much do you think this will cost, so I can ask them about extra money? Or, if I could use the feature for another project as well, I might sponsor it myself.

2. The other way would be that I implement it myself. This might take a bit longer than two days, since I'm not so familiar with the eXist details, but I'm not afraid to try if someone could give me some starting points in the code. And of course I would then provide the feature to you for free as well.

So I would guess, depending on the costs, I can try first, and if I don't get it to work, sponsoring would be the way to go. So what do you think about my idea, and how much would the sponsoring be?

Greets,
Marcus
From: Roy W. <gar...@ya...> - 2010-03-24 11:42:48
Marcus wrote:

> Thanks for the quick response.
>
>>> And what about PDF files stored in eXist: would the Lucene index be
>>> able to index those as well?
>>
>> This is not implemented yet, though it could be added easily. We would
>> just need a trigger to update the index and another XQuery function to
>> handle the searching.
>>
>>> In my project the customer has a small number of PDFs that he wants to
>>> upload and have integrated into the full-text search. It would be great
>>> if eXist were able to do that.
>>
>> Well, if you need the feature short term, maybe you could convince the
>> customer to sponsor it. I'll be happy to work out an estimate, but I
>> guess the whole feature can be implemented in one or two days.
>
> We may need it during the next 3-6 months. There might be two ways.
>
> 1. How much do you think this will cost, so I can ask them about extra
> money? Or, if I could use the feature for another project as well, I
> might sponsor it myself.
>
> 2. The other way would be that I implement it myself. This might take a
> bit longer than two days, since I'm not so familiar with the eXist
> details, but I'm not afraid to try if someone could give me some
> starting points in the code. And of course I would then provide the
> feature to you for free as well.
>
> So I would guess, depending on the costs, I can try first, and if I
> don't get it to work, sponsoring would be the way to go. So what do you
> think about my idea, and how much would the sponsoring be?
>
> Greets, Marcus

I'm interested in this too, so I would like to see the numbers.

--
Roy
From: Wolfgang M. <wol...@ex...> - 2010-03-24 12:10:48
> We may need it during the next 3-6 months. There might be two ways.
>
> 1. How much do you think this will cost, so I can ask them about extra
> money?

I'll send you and Roy an estimate in a private email.

> 2. The other way would be that I implement it myself. This might take a
> bit longer than two days, since I'm not so familiar with the eXist
> details, but I'm not afraid to try if someone could give me some
> starting points in the code. And of course I would then provide the
> feature to you for free as well.

Sure. The necessary tasks as I see them right now would be:

* create a "lucene" trigger which waits for binary resources being added to the db
* integrate PDFBox to extract the text of the PDF and pass it to Lucene's indexer
* create a query function based on the existing classes in eXist's Lucene module

This should all be integrated with eXist's Lucene index module, so we can reuse the core classes to handle concurrency and the like.

Wolfgang
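The three tasks above can be sketched as one small pipeline. This is only a language-neutral illustration in Python, not eXist code: the class `BinaryIndex`, its methods, and the `Extractor` type are hypothetical names, the toy inverted index stands in for Lucene's indexer, and the `extract` callable marks the place where PDFBox would go.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Optional, Set

# Hypothetical stand-in for the PDFBox step: raw bytes in, plain text out.
Extractor = Callable[[bytes], str]


class BinaryIndex:
    """Toy inverted index standing in for Lucene's indexer (not the Lucene API)."""

    def __init__(self, extract: Extractor) -> None:
        self._extract = extract
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> Set[str]:
        # Very naive analysis: lowercase and split on non-word characters.
        return set(re.findall(r"\w+", text.lower()))

    def on_store(self, path: str, data: bytes) -> None:
        """Trigger hook: called when a binary resource is added or replaced."""
        self.on_remove(path)  # drop postings left over from an older version
        for term in self._tokens(self._extract(data)):
            self._postings[term].add(path)

    def on_remove(self, path: str) -> None:
        """Trigger hook: called when a binary resource is deleted."""
        for paths in self._postings.values():
            paths.discard(path)

    def search(self, query: str) -> Set[str]:
        """Query function: paths whose text contains every query term."""
        result: Optional[Set[str]] = None
        for term in self._tokens(query):
            hits = self._postings.get(term, set())
            result = set(hits) if result is None else result & hits
        return result if result is not None else set()


# Usage: a UTF-8 decoder plays the role of the text extractor here.
index = BinaryIndex(lambda data: data.decode("utf-8"))
index.on_store("/db/docs/a.pdf", b"Full-text search with Lucene")
matches = index.search("lucene search")
```

Keeping the extractor as a plain callable is a deliberate choice in this sketch: the trigger and query parts do not need to know which document format produced the text.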
From: Evgeny G. <gaz...@gm...> - 2010-03-24 12:15:52
What about plain text as well?

-----
Evgeny
From: Wolfgang M. <wol...@ex...> - 2010-03-24 15:27:09
> What about plain text also?

Sure, plain text can be handled by the same code (without the PDF step).

Wolfgang
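One way to read "the same code without the PDF step" is a lookup from media type to text extractor, where plain text only needs decoding. The following Python sketch is an assumption about how that could look, not eXist's implementation; `EXTRACTORS`, `extract_plain_text`, `extract_pdf_text`, and `text_for_indexing` are hypothetical names, and the PDF extractor is left as a placeholder.

```python
from typing import Callable, Dict

Extractor = Callable[[bytes], str]


def extract_plain_text(data: bytes) -> str:
    # Plain text needs no conversion step, only decoding.
    return data.decode("utf-8", errors="replace")


def extract_pdf_text(data: bytes) -> str:
    # Placeholder for the PDF step (PDFBox in the thread); not implemented here.
    raise NotImplementedError("plug a PDF text extractor in here")


# Hypothetical registry: media type -> function producing indexable text.
EXTRACTORS: Dict[str, Extractor] = {
    "text/plain": extract_plain_text,
    "application/pdf": extract_pdf_text,
}


def text_for_indexing(media_type: str, data: bytes) -> str:
    """Return the text to hand to the indexer for a stored binary resource."""
    try:
        extractor = EXTRACTORS[media_type]
    except KeyError:
        raise ValueError(f"no extractor registered for {media_type}") from None
    return extractor(data)
```

Everything after the extractor (indexing and querying) stays identical for both formats, which is why supporting plain text adds almost no work.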