
#44 New Lucene indexer

open
nobody
None
5
2005-12-06
2005-12-06
Steve Pugh
No

This patch (optionally) replaces the default lucene
indexer with a new system. This is to solve various
issues we had, such as items not being removed from the
index correctly when unpublished, and non-stable links
appearing in the index. In addition, we found that the
index did not repair itself once it went wrong - one
could only resolve issues by doing a full re-index.

In order for the patch to work, a number of changes are
required to ccm-core. These changes define a new
parameter "waf.lucene.m_index_synchronizer" which
allows the user to change which indexer he would like
to use. After the patch to ccm-core is applied, there
should be no visible change to the operation of aplaws
unless this parameter is changed.

If you then want to use the new indexer, you can use the
attached example "IndexerNew", which I have put in
ccm-ldn-search.
By setting
waf.lucene.m_index_synchronizer=com.arsdigita.london.search.lucene.IndexerNew,
aplaws will use this new indexer.
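
For readers unfamiliar with this kind of pluggable parameter, here is a
minimal sketch of how a class name held in a configuration value could be
turned into an indexer instance. It is not the code from the ccm-core patch:
the class IndexerLoaderSketch, the method load, the NoOpIndexer stand-in, and
the use of Runnable as the indexer type are all assumptions made for
illustration. Only the parameter name comes from the description above.

```java
import java.util.Properties;

public class IndexerLoaderSketch {

    // Parameter name taken from the patch description.
    public static final String PARAM = "waf.lucene.m_index_synchronizer";

    /** Hypothetical stand-in for an indexer implementation. */
    public static class NoOpIndexer implements Runnable {
        @Override
        public void run() {
            // A real indexer would synchronize the lucene index here.
        }
    }

    /**
     * Returns the indexer named by the parameter, or the supplied default
     * when the parameter is unset, so behaviour is unchanged by default.
     */
    public static Runnable load(Properties config, Runnable defaultIndexer) {
        String className = config.getProperty(PARAM);
        if (className == null || className.trim().isEmpty()) {
            return defaultIndexer;
        }
        try {
            // Assumption: the indexer class has a public no-argument constructor.
            Class<?> cls = Class.forName(className.trim());
            return (Runnable) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load indexer " + className, e);
        }
    }
}
```

Falling back to the default when the parameter is unset mirrors the statement
above that nothing changes unless the parameter is changed.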

The new indexer works by comparing the lucene index
with the contents of the database.

1. any records in the lucene index but no longer in the
database are deleted from the index.
2. any records in the database but not in the index are
added to the index.
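
The two steps above amount to taking set differences in both directions.
The following is a rough sketch of that comparison only, not the attached
IndexerNew code: the class IndexSyncSketch, the SyncPlan holder, and the use
of plain string ids are assumptions for illustration, and the real indexer
would read ids from lucene and the database rather than from in-memory sets.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class IndexSyncSketch {

    /** Holds the ids that one synchronization pass would act on. */
    public static final class SyncPlan {
        public final Set<String> toDelete;
        public final Set<String> toAdd;

        SyncPlan(Set<String> toDelete, Set<String> toAdd) {
            this.toDelete = toDelete;
            this.toAdd = toAdd;
        }
    }

    public static SyncPlan plan(Set<String> indexedIds, Set<String> databaseIds) {
        // Step 1: in the index but no longer in the database -> delete from the index.
        Set<String> toDelete = new TreeSet<>(indexedIds);
        toDelete.removeAll(databaseIds);

        // Step 2: in the database but not in the index -> add to the index.
        Set<String> toAdd = new TreeSet<>(databaseIds);
        toAdd.removeAll(indexedIds);

        return new SyncPlan(toDelete, toAdd);
    }

    public static void main(String[] args) {
        Set<String> indexed = new HashSet<>(Set.of("1", "2", "3"));
        Set<String> live = new HashSet<>(Set.of("2", "3", "4"));
        SyncPlan p = plan(indexed, live);
        System.out.println("delete from index: " + p.toDelete);
        System.out.println("add to index: " + p.toAdd);
    }
}
```

Because each pass recomputes both differences from the current state, an
index that has drifted is corrected on the next run without a full re-index.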

This ensures that the database and lucene index are
brought back into sync on every indexer run. It also
means that the lucene_docs table is no longer necessary,
and problems of trying to synchronize the indexes on
multiple aplaws instances are neatly avoided. There is a
possible downside in that this method may be more
computationally intensive, but this can be mitigated by
setting waf.lucene.interval to a larger value.

Note the new indexer also stores the stable link id of
live documents. This id can be used by the search
front-end to ensure only stable links are used.

At Bristol, we have been using this indexer for some
weeks and have found it to be a great improvement.

Discussion

  • Steve Pugh

    Steve Pugh - 2005-12-06

    ccm-core patch

     
  • Steve Pugh

    Steve Pugh - 2005-12-06

    See also new indexer (attached)
    suggest to put in:
    ccm-ldn-search/src/com/arsdigita/london/search/lucene/

     
  • Steve Pugh

    Steve Pugh - 2005-12-23

    new lucene indexer

     
