Aperture can index by URL or from the local file system.  My current proof of concept in VuFind uses Aperture’s web crawler tool to pull down content, but switching to the local file reader instead is a very minor change if you need it.


I don’t currently have plans to add PDF archiving to VuFind as a generic feature – my perspective on VuFind is that it is primarily an indexing and searching tool, and file management is better left to content management systems.  However, if you wanted to do something like this as part of an XSLT indexing process, it wouldn’t be too hard – just create a custom function that harvests the PDF to a web-accessible directory, then drop the resulting URL into the Solr index.
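
To make the idea concrete, here is a minimal sketch of such a harvesting function. It is written in Python purely for illustration and is not VuFind code; the names harvest_pdf and build_solr_fields, the target directory, the public base URL, and the Solr field names "id" and "url" are all assumptions that you would adapt to your own setup and schema.

```python
import hashlib
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def harvest_pdf(source_url, target_dir, public_base_url):
    """Copy the document at source_url into target_dir and return its public URL.

    target_dir is assumed to be served by the web server at public_base_url.
    """
    os.makedirs(target_dir, exist_ok=True)

    # Derive a stable file name from the source URL so that re-harvesting
    # the same document overwrites the old copy instead of duplicating it.
    digest = hashlib.sha1(source_url.encode("utf-8")).hexdigest()
    extension = os.path.splitext(urlparse(source_url).path)[1] or ".pdf"
    filename = digest + extension
    local_path = os.path.join(target_dir, filename)

    # urlopen handles http(s):// sources as well as file:// sources.
    with urllib.request.urlopen(source_url) as response, \
            open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)

    return public_base_url.rstrip("/") + "/" + filename


def build_solr_fields(record_id, source_url, target_dir, public_base_url):
    """Return the fields to add to the Solr document for this record.

    The field names here are hypothetical; match them to your schema.
    """
    return {
        "id": record_id,
        "url": harvest_pdf(source_url, target_dir, public_base_url),
    }
```

In an XSLT-driven import, the equivalent function would be called from the stylesheet for each record, and its return value would populate the URL field of the generated Solr document.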


- Demian


From: mikan.d.dspace listmail [mailto:mikan.dspace@gmail.com]
Sent: Tuesday, October 26, 2010 3:11 AM
To: Demian Katz
Cc: vufind-tech@lists.sourceforge.net
Subject: Re: [VuFind-Tech] Full text indexing with Aperture


Hi Demian,

This sounds very interesting indeed. Can Aperture index full-text files via URL, or do they need to be present locally? And if full texts are indexed, why not offer the ability to store the actual PDF file in VuFind as well? Any plans to make this happen? :)

Good Work!

2010/10/25 Demian Katz <demian.katz@villanova.edu>



Just a quick update – I have just built upon my XSLT work from last week by integrating the Java Aperture library with VuFind.  This makes it possible to harvest documents like PDFs or Word files and extract their text contents directly into the Solr index.  It was easier to get it working than I expected, though I did run into one apparent bug in Aperture’s shell scripts under Linux!  See notes here:




It may be useful to do something similar for SolrMarc-based imports – see http://vufind.org/jira/browse/VUFIND-274 for details.


Let me know if you have questions about this – I’m sure if anyone starts using this in earnest, we’ll need to make some further adjustments for improved stability…  but as a proof of concept, it seems to work quite nicely!


- Demian
