Have you looked at this JIRA ticket?

http://vufind.org/jira/browse/VUFIND-454

This contains the solution we use at Villanova.  I plan on porting it to 2.0 in the fairly near future (search refactoring is nearly done, so porting VU customizations is the next step).

When I port this, I will likely change the schema somewhat; I think it would benefit from using dynamic fields in place of some of the current hard-coded ones.

In my opinion, the main weakness here is the indexing tool, which was designed to be simple rather than robust: it simply captures sitemap.xml files and XSL-transforms them into Solr documents, a slow and memory-intensive operation (though at the scale of Villanova's websites, it works just fine).  We handle deleted pages with the help of timestamps in the index: every re-index updates every record, so anything that wasn't touched during indexing still carries an older timestamp once the process finishes, and gets deleted on that basis.
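For illustration only, here is a minimal Python sketch of the two ideas described above: reading page URLs out of a sitemap.xml file, and building a Solr delete-by-query that removes records not touched since the start of the indexing run. This is not the code attached to the JIRA ticket (which uses XSLT). The function names and the `last_indexed` field name are assumptions made for the example; substitute whatever timestamp field your schema actually defines.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> URLs listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def stale_delete_query(run_start, field="last_indexed"):
    """Build a Solr XML delete-by-query for records older than run_start.

    `field` is a hypothetical timestamp field that is rewritten on every
    re-index. The {* TO x} range is exclusive, so records stamped exactly
    at run_start are kept.
    """
    stamp = run_start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"<delete><query>{field}:{{* TO {stamp}}}</query></delete>"
```

In a real run you would record the start time before indexing, re-index every page found in the sitemaps, then post the delete message to the Solr update handler and commit.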

- Demian

From: Tod Olson [tod@uchicago.edu]
Sent: Tuesday, March 26, 2013 8:04 PM
To: vufind-tech@lists.sourceforge.net Tech
Subject: [VuFind-Tech] Website spidering

Are other sites integrating website searching into their VuFind installations?

I need to arrange website searching for our upcoming VuFind implementation, so I'm looking for any advice on what has worked well (or even not so well) for others. Anything you would care to say about spidering software, indexing choices, and method of integration with VuFind and Solr would be appreciated.

-Tod


Tod Olson <tod@uchicago.edu>
Systems Librarian     
University of Chicago Library