[Exist-development] Suggestion to update to tika-app-0.9.jar


Hi all,

Tika 0.8 suffers from a well-documented problem in which spaces are
stripped from extracted PDF content [1].  This problem was fixed in
0.9 [2], which was released back in February.  Since eXist 1.5dev
trunk ships the 0.8 jar, I ran into it myself: the text extracted from
PDFs had no spaces.  When I replaced the 0.8 jar with 0.9, the problem
was gone.  I was using content:get-metadata-and-content() to extract
text from a PDF file stored in the db, and I saw the problem with both
PDFs I tested under 0.8.
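
For anyone who wants to check whether their own extracted text shows
this symptom, here is a rough sketch of a heuristic.  It is not part of
eXist or Tika: the function name, the 40-character minimum, and the 5%
threshold are all assumptions chosen for illustration, and it only
looks at text that has already been extracted.

```python
def looks_space_stripped(text: str, min_space_ratio: float = 0.05) -> bool:
    """Return True if `text` has suspiciously little whitespace for prose.

    Hypothetical helper: the threshold is a guess, not a measured value.
    """
    stripped = text.strip()
    # Very short strings give too little evidence either way.
    if len(stripped) < 40:
        return False
    # Count all whitespace characters (spaces, tabs, newlines).
    whitespace = sum(1 for ch in stripped if ch.isspace())
    return whitespace / len(stripped) < min_space_ratio


# Simulate the symptom by removing the spaces from a normal sentence.
good = "Tika 0.8 suffers from a well-documented problem in which spaces are stripped."
bad = good.replace(" ", "")

print(looks_space_stripped(good))
print(looks_space_stripped(bad))
```

Running this over the text extracted under each jar would be a quick way
to confirm the difference without reading the output by eye.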

I'd suggest that we update to 0.9.  I'd be happy to update trunk if
the core devs give me the go-ahead, if it would help [3], but perhaps
there are other considerations here I'm not aware of.  Thoughts?

Cheers,
Joe

[1] https://issues.apache.org/jira/browse/TIKA-548
[2] http://www.apache.org/dist/tika/CHANGES-0.9.txt
[3] http://tika.apache.org/download.html
