There is an article in the current Linux Journal on jsfind, "a small JavaScript program of about 500 lines" that works as an indexer.  The author intends it for CD-ROM searches, but it is open source.  It also works with the aforementioned SWISH-E.


At 05:26 PM 11/11/2003 -0500, Eric Lease Morgan wrote:
On 11/11/03 12:10 PM, Walter Lewis <> wrote:

> I'm looking for an indexing engine for a set of projects that crosses the line
> between structured and unstructured documents (stored in a database in a
> combination of char and text fields).  The projects will probably be built in
> PHP, although that is still up for negotiation. The full text searching
> functions emerging in MySQL are one option; those in Postgres are a second.  I
> found lots of references to Eric Lease Morgan's

While I do not think it has a PHP interface, I would advocate the use of
SWISH-E.

SWISH-E is a single binary that indexes as well as searches. It can be run
from the command line or through a library/CGI script. It runs on Windows as
well as Unix. It comes with C and Perl APIs. The Perl API is object
oriented. It can index structured data like HTML files, XML files, as well
as streams of text from databases or plain ol' text files. Using a "helper"
application, it can index things like Word, PDF, and image files. The
indexes it creates can be merged with other indexes it creates. It can
search multiple indexes simultaneously. It supports field searching,
freetext searching, phrase searching, Boolean searching, nested queries,
soundex searching, user-defined sorting of results as well as relevance
ranking. It builds really easily. By taking advantage of various
command-line switches, a dictionary of terms can be created and ultimately used
to implement automatic spelling corrections a la the Google "Did you mean?"
service. Since it is more of a toolkit as opposed to an application, it is
easy to examine the incoming query, munge it if necessary to improve
retrieval, and return results.
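
To make the "Did you mean?" idea concrete, here is a minimal Python sketch. It does not call SWISH-E at all: it assumes the dictionary of indexed terms has already been dumped to a plain list, and it uses the fuzzy matcher in Python's standard library to propose a correction. The names `suggest` and `terms` are made up for illustration.

```python
import difflib


def suggest(query, terms, cutoff=0.75):
    """Return a corrected query string, or None if nothing needs correcting.

    `terms` stands in for a word list exported from the index (an assumption;
    how that list is produced is outside this sketch).
    """
    vocabulary = sorted({t.lower() for t in terms})
    known = set(vocabulary)
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in known:
            corrected.append(word)
            continue
        # Ask for the single closest known term above the similarity cutoff.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(word)  # no plausible correction; leave as typed
    return " ".join(corrected) if changed else None


if __name__ == "__main__":
    terms = ["library", "catalog", "indexing", "metadata"]
    hint = suggest("libary catalog", terms)
    if hint:
        print("Did you mean: " + hint + "?")
```

A real front end would probably skip Boolean operators and field names before checking words, and would run the corrected query only if the original returned few or no hits.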

There are only two things it does not do that I wish it did. First, it does
not index Unix mbox files very well, but I can get the same functionality by
first creating a Hypermail archive and indexing that. Second, it is
difficult to return part of a document and/or highlight search terms in
search results.
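
For the mbox case, the same "convert first, then index" idea can be approximated without Hypermail. The sketch below is an assumption-laden stand-in, not what Hypermail does: it uses Python's standard `mailbox` module to write each message to its own small HTML file, which an HTML-aware indexer could then be pointed at. The function name, file naming scheme, and the `author` meta tag are illustrative choices only.

```python
import html
import mailbox
from pathlib import Path


def _plain_text(msg):
    """Return the first text/plain part of a message as a string."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset label in the message
                return payload.decode("utf-8", errors="replace")
    return ""


def mbox_to_html(mbox_path, out_dir):
    """Write one HTML file per message and return the list of paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(str(mbox_path))
    try:
        for i, msg in enumerate(box):
            subject = html.escape(str(msg.get("Subject", "(no subject)")))
            sender = html.escape(str(msg.get("From", "")), quote=True)
            body = html.escape(_plain_text(msg))
            page = (
                "<html><head><title>" + subject + "</title>\n"
                '<meta name="author" content="' + sender + '">\n'
                "</head><body><pre>" + body + "</pre></body></html>\n"
            )
            target = out / ("msg%05d.html" % i)
            target.write_text(page, encoding="utf-8")
            written.append(target)
    finally:
        box.close()
    return written
```

Putting the subject in the title and the sender in a meta tag is meant to leave room for field searching later, assuming the indexer is configured to pick those up.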

I endorse SWISH-E. It is an unsung hero.

Eric Lease Morgan
University Libraries of Notre Dame

(574) 631-8604


Edward Iglesias
Technical Services Librarian/ILS Coordinator
phone: 504.864.7838
J. Edgar and Louise S. Monroe Library
Loyola University New Orleans