Error when web crawling Solaris directory
I have an issue when crawling Office 2013 files as part of a web crawl. The text analyzer returns the underlying ASCII text as the content field, rather than human-readable content.
I've done a little investigation: the problem doesn't occur on file crawls, and it only seems to happen on the deployment platform, a Solaris machine.
I have OSS v1.5.3 deployed on Solaris 10 (x64) running Java 1.7.0-b147. The file repository is hosted by an Apache2 web server. The same OSS runs fine on a Linux Mint VM unless the file repository is on Solaris. Likewise, OSS deployed on Solaris doesn't experience the problem with a file repository hosted on Linux or Windows.
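Since the behaviour seems to follow the web server host rather than the crawler host, one thing that might be worth comparing is the Content-Type header each Apache2 instance sends for the Office files. This is only a guess at a diagnostic, not a confirmed cause. The sketch below is a minimal, hypothetical helper: the function names `mime_matches` and `fetch_content_type` are made up for illustration, and the URLs are whatever documents you pass on the command line.

```python
import os
import sys
import urllib.request

# Standard MIME types for Office Open XML files (Office 2007 and later).
OOXML_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def mime_matches(url_path, content_type):
    """Return True if content_type is the standard MIME type for the file extension."""
    ext = os.path.splitext(url_path)[1].lower()
    expected = OOXML_TYPES.get(ext)
    if expected is None or content_type is None:
        return False
    # Drop parameters such as "; charset=..." before comparing.
    return content_type.split(";")[0].strip().lower() == expected


def fetch_content_type(url):
    """Issue a HEAD request and return the Content-Type header, or None if absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type")


if __name__ == "__main__":
    # Hypothetical usage: pass the same document's URL on each server.
    for url in sys.argv[1:]:
        ctype = fetch_content_type(url)
        status = "OK" if mime_matches(url, ctype) else "MISMATCH"
        print(url, ctype, status)
```

Running it against the same document on the Solaris-hosted and Linux-hosted repositories would show whether the two servers describe the file differently. If the headers are identical, this particular explanation can be ruled out.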
Best wishes