Question on msword parsing and Nutch/Lucene

2009-01-29
2013-05-13
  • Ahmed Hammad
    2009-01-29

    Hello,

    I'm currently working with Nutch (a search engine based on Lucene) and using it to parse and index Word documents. There are a few limitations in Lucene with document parsing, so I looked for alternatives, which led me to Aperture.

    I know that Aperture can extract MS Word document meta tags. Which tags does that refer to exactly? In any Word document, you can go to File > Properties, click the Custom tab, and add custom properties there. Can Aperture read those custom properties, or does it stick to the standard fields on the Summary tab, i.e. Title, Subject, Author, etc.?

    Also, can Aperture be easily integrated with Lucene?

    Thanks a lot for the help.

    Cheers