How do I index additional fields in HTML? For example, I would like to include the <title> in the index. I have not been able to get the JTidyHtmlIndexer to do this at all. I was able to get NekoHtmlIndexer to do it with the following:
<luceneField name="fullText" xpathSelect="//*" type="Text" ocurSep="|" />
<luceneField name="title" xpathSelect="//TITLE" type="Text" ocurSep="|" />
But this only works if the <title> tag is not namespace qualified in the source document. Some of my HTML files explicitly declare a namespace.
In this case, xpathSelect="//TITLE" is incorrect because it does not specify a namespace. I tried:
<luceneField name="title" xpathSelect="//xhtml:TITLE" type="Text" ocurSep="|" />
This does not work because there is no declaration for the xhtml namespace prefix. While debugging through the code, I found that only namespaces declared in the INDEXED document are passed to Jaxen. But the xhtml namespace prefix needs to be declared in LiusConfig.xml for the xpathSelect expression to always work.
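To illustrate what I mean by declaring the prefix outside the indexed document, here is a minimal sketch using the JDK's own javax.xml.xpath API rather than Lius or Jaxen. The class name PrefixBindingDemo, the method selectTitle, and the sample document are made up for illustration; the point is only that the xhtml prefix is bound by the caller, while the document itself uses a default namespace with no prefix at all. How Lius would need to expose this in LiusConfig.xml is something I have not worked out.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: binds the "xhtml" prefix in the XPath engine,
// independently of whatever prefixes the parsed document declares.
class PrefixBindingDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Namespace context supplied by the caller, not by the document.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI) ? "xhtml" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
        }
    };

    // Returns the text of the first element matched by //xhtml:TITLE,
    // or an empty string if nothing matches.
    static String selectTitle(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace matching
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XHTML_CONTEXT);
            return xpath.evaluate("//xhtml:TITLE", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<HTML xmlns=\"" + XHTML_NS + "\">"
                + "<HEAD><TITLE>Namespaced</TITLE></HEAD></HTML>";
        System.out.println(selectTitle(xml));
    }
}
```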
I also tried
<luceneField name="title" xpathSelect="//*[local-name(.)='TITLE']" type="Text" ocurSep="|" />
which should work regardless of the namespace of the indexed document. However, I get the error "ERROR [main] (XmlFileIndexer.java:148) - Function :local-name".
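As a sanity check that the expression itself is valid XPath 1.0, here is a small sketch that evaluates it with the JDK's javax.xml.xpath API instead of Lius and Jaxen. The class name LocalNameTitleDemo, the evaluate helper, and the two sample documents are hypothetical. It only shows the behaviour I expect from local-name(); it does not explain why the function fails to resolve inside XmlFileIndexer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: compares an unprefixed name test with a
// local-name() predicate on documents with and without a namespace.
class LocalNameTitleDemo {

    // Parses xml with namespace awareness and returns the string value of
    // the first node selected by expression (empty string if none match).
    static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String plain = "<HTML><HEAD><TITLE>Plain</TITLE></HEAD></HTML>";
        String namespaced = "<HTML xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<HEAD><TITLE>Namespaced</TITLE></HEAD></HTML>";
        String byLocalName = "//*[local-name(.)='TITLE']";

        // An unprefixed name test only selects elements in no namespace.
        System.out.println("[" + evaluate(plain, "//TITLE") + "]");
        System.out.println("[" + evaluate(namespaced, "//TITLE") + "]");

        // local-name() ignores the namespace, so both documents match.
        System.out.println("[" + evaluate(plain, byLocalName) + "]");
        System.out.println("[" + evaluate(namespaced, byLocalName) + "]");
    }
}
```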
Your help is much appreciated.