IndexingAttachments

Ulf Dittmer

Building a JForum version that can index and search most document types

Up to version 2.7.0, JForum used the Apache Tika library to index attachments. However, Tika's large number of dependencies roughly tripled the size of the war file, so indexing was changed to cover only text and PDF documents. If you have many attachments in structured file formats (such as Microsoft Office or OpenDocument), you can build a JForum version that uses Tika. This requires two changes:

  • Remove the PDFBox dependency from pom.xml and add the Tika dependency. Here is the source code diff that shows what needs to be done.

  • Change the createDocument method in the LuceneIndexer class and adjust some import statements according to this diff.

Then you can build the war file by running "mvn package".
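
As a rough illustration of the pom.xml step, the dependency swap might look something like the fragment below. This is only a sketch and not the actual diff: the PDFBox coordinates are assumed to be the standard ones, the tika.version property is a hypothetical placeholder you would have to define yourself, and the right Tika artifact depends on the Tika release (tika-parsers bundles the parsers in Tika 1.x, while Tika 2.x ships them as tika-parsers-standard-package). Follow the linked diff for the exact change.

```xml
<!-- Assumption: the existing PDFBox entry to delete uses the standard
     coordinates org.apache.pdfbox:pdfbox. Remove that whole dependency element. -->

<!-- Assumption: add Tika in its place. tika.version is a hypothetical
     property; define it under properties or write the version inline. -->
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>${tika.version}</version>
</dependency>
```

Expect the resulting war file to be considerably larger, since Tika's transitive dependencies are pulled in again.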


Related

Wiki: Documentation
