Question on msword parsing and Nutch/Lucene

  • Ahmed Hammad


    I'm currently working with Nutch (a search engine based on Lucene) and I'm using it to parse and index Word documents. There are a few limitations in Lucene with document parsing, so I looked for alternatives, which led me to Aperture.

    I know that Aperture can extract MS Word document meta tags. Which tags does that cover, exactly? In any Word document, File > Properties has a Custom tab where you can add custom properties. Can Aperture read those custom properties, or does it only read the standard fields on the Summary tab, i.e. Title, Subject, Author, etc.?

    Also, can Aperture be easily integrated with Lucene?

    Thanks a lot for the help.