From: Wolfgang M. <wol...@ex...> - 2011-08-06 08:34:37
Hi,

I noticed the same issue of whitespace being lost. An update would be great.

Wolfgang

On 06.08.2011 06:16, "Joe Wicentowski" <jo...@gm...> wrote:
> Hi all,
>
> Tika 0.8 suffers from a well-documented problem, in which spaces are
> stripped from PDF content [1]. This problem was fixed in 0.9 [2],
> which was released back in February. Since eXist 1.5dev trunk
> contains 0.8, I experienced this problem where extracted PDF text had
> no spaces. When I replaced the 0.8 jar with 0.9, the problem was
> gone. I was using content:get-metadata-and-content() to extract text
> from a PDF file stored in the db. I experienced the problem with both
> PDFs I tested under 0.8.
>
> I'd suggest that we update to 0.9. I'd be happy to update trunk if
> the core devs give me the heads up, if it would help [3], but perhaps
> there are other considerations here I'm not aware of. Thoughts?
>
> Cheers,
> Joe
>
> [1] https://issues.apache.org/jira/browse/TIKA-548
> [2] http://www.apache.org/dist/tika/CHANGES-0.9.txt
> [3] http://tika.apache.org/download.html