Have you looked at this JIRA ticket?
This contains the solution we use at Villanova. I plan on porting it to 2.0 in the fairly near future (search refactoring is nearly done, so porting VU customizations is the next step).
When I port this, I will likely change the schema somewhat; I think it would benefit from using dynamic fields in place of some of the current hard-coded ones.
In my opinion, the main weakness here is the indexing tool, which was designed to be simple rather than robust: it simply captures sitemap.xml files and XSL-transforms them into Solr documents, a slow and memory-intensive operation (though at the scale of Villanova's websites, it works just fine). We handle deleted records with the help of timestamps in the index: every re-index updates every record, and anything left after the process that wasn't touched during indexing gets deleted based on its timestamp.
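To make the timestamp-based deletion idea concrete, here is a minimal sketch of the two pieces involved: pulling page URLs out of a sitemap.xml, and building the Solr delete-by-query that removes anything not re-stamped during the current run. The core/field names (`index_date`) and helper names are illustrative assumptions, not the actual VuFind code.

```python
# Sketch of the sitemap-driven indexing flow described above. The field name
# "index_date" and the helper functions are assumptions for illustration only.
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_from_sitemap(xml_text):
    """Extract page URLs (<loc> elements) from a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

def stale_record_query(index_start, date_field="index_date"):
    """Build a Solr delete-by-query matching records NOT touched in this run:
    every re-index stamps each record with the current time, so anything whose
    timestamp still predates the start of indexing was deleted upstream."""
    return "%s:[* TO %s]" % (date_field,
                             index_start.strftime("%Y-%m-%dT%H:%M:%SZ"))

# Example run: record the start time, index every sitemap URL (omitted here),
# then delete whatever kept an older timestamp.
start = datetime(2013, 3, 26, tzinfo=timezone.utc)
sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.edu/page1</loc></url>
  <url><loc>http://example.edu/page2</loc></url>
</urlset>"""
print(urls_from_sitemap(sitemap))
# → ['http://example.edu/page1', 'http://example.edu/page2']
print(stale_record_query(start))
# → index_date:[* TO 2013-03-26T00:00:00Z]
```

The delete query would then be POSTed to Solr's update handler as `<delete><query>…</query></delete>` after the crawl finishes, which is what makes the "anything left untouched gets deleted" step work without tracking deletions explicitly.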
From: Tod Olson [email@example.com]
Sent: Tuesday, March 26, 2013 8:04 PM
To: firstname.lastname@example.org Tech
Subject: [VuFind-Tech] Website spidering
Are other sites integrating website searching into their VuFind installations?
I need to arrange website searching for our upcoming VuFind implementation, so I'm looking for any advice on what has worked well (or even not so well) for others. Anything you would care to say about spidering software, indexing choices, and methods of integration with VuFind and Solr would be appreciated.
University of Chicago Library