I have an issue when crawling Office 2013 files as part of a web crawl. The text analyzer returns the underlying ASCII text as the content field, rather than human-readable content.
After a little investigation, I found that the problem doesn't occur on file crawls, and it only seems to happen on the deployment platform, a Solaris machine.
I have OSS v1.5.3 deployed on Solaris 10 (x64) running Java 1.7.0-b147. The file repository is hosted by an Apache2 web server. The same OSS runs fine on a Linux Mint VM unless the file repository is on Solaris. Likewise, OSS deployed on Solaris doesn't experience the problem with a file repository hosted on Linux or Windows.
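
Since the problem seems to follow the Solaris-hosted Apache2 repository rather than OSS itself, one thing that may be worth comparing is the Content-Type header each web server sends for the same Office file. This is only a guess at the cause: it assumes the web crawler chooses its parser from that header, which I haven't confirmed. A minimal Python sketch of the check follows; the function names are made up for illustration, and the MIME types are the standard registered ones for Office Open XML files.

```python
import urllib.request

# Standard MIME types for Office Open XML files.
OOXML_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def expected_type(url):
    """Return the MIME type expected for the URL's extension, or None."""
    path = url.split("?", 1)[0].lower()  # ignore any query string
    for ext, mime in OOXML_TYPES.items():
        if path.endswith(ext):
            return mime
    return None


def type_matches(url, content_type_header):
    """True if the server's Content-Type agrees with the file extension."""
    expected = expected_type(url)
    if expected is None or content_type_header is None:
        return False
    # Drop parameters such as "; charset=UTF-8" before comparing.
    actual = content_type_header.split(";", 1)[0].strip().lower()
    return actual == expected


def fetch_content_type(url):
    """Send a HEAD request and return the Content-Type header (may be None)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.headers.get("Content-Type")
```

To use it, call `fetch_content_type` on the same document served from the Solaris host and from the Linux or Windows host, then pass each result to `type_matches`. If only the Solaris server returns something like `text/plain`, that would point at the Apache2 MIME configuration on that machine rather than at OSS.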