Error when web crawling Solaris directory

An open source search engine with RESTful API and crawlers

Brought to you by: emmanuel_keller

#195 Error when web crawling Solaris directory

Milestone: v1.4

Status: open

Owner: nobody

Labels: None

Priority: 1

Updated: 2014-07-04

Created: 2014-07-04

Creator: Tony Dixon

Private: No

I have an issue when crawling Office 2013 files as part of a web crawl. The text analyzer returns the underlying ASCII text as the content field, rather than human-readable content.

I've done a little investigation: the problem doesn't occur on file crawls, and it only seems to happen on the deployment platform, a Solaris machine.

I have OSS v1.5.3 deployed on Solaris 10 (x64) running Java 1.7.0-b147. The file repository is hosted by an Apache2 web server. The same OSS version runs fine on a Linux Mint VM unless the file repository is on Solaris. Likewise, OSS deployed on Solaris doesn't experience the problem with a file repository hosted on Linux or Windows.
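Since the problem follows the Solaris-hosted file repository rather than the OSS host, one possible way to narrow it down is to compare the HTTP response headers that each web server returns for the same Office file. The sketch below is only a diagnostic aid under that assumption: it has not been confirmed that headers are the cause, and the function names, the header list, and the example URLs are all hypothetical.

```python
from urllib.request import Request, urlopen

# Headers that commonly influence how a client interprets a document.
# This selection is an assumption, not something confirmed for this ticket.
HEADERS_OF_INTEREST = ("content-type", "content-encoding")


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return the response headers with lowercase names."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def diff_headers(first, second, keys=HEADERS_OF_INTEREST):
    """Return {header: (first_value, second_value)} for headers that differ."""
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }


# Example usage (placeholder URLs, replace with the real repository hosts):
# solaris = fetch_headers("http://solaris-host.example/docs/report.docx")
# linux = fetch_headers("http://linux-host.example/docs/report.docx")
# print(diff_headers(solaris, linux))
```

If the two servers report different values for the same file, that would point at the web server configuration on the repository host; if they match, the cause lies elsewhere.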

Best wishes
