Is it possible to pipe the output from a crawler to another application (e.g. Solr) rather than writing to a file? For example, when crawling a website I want to pipe each retrieved document directly to a Solr indexing process rather than writing it to a file and then having to get Solr to read these files.
Yes, you'd have to implement your own crawler handler, which receives the crawled data objects, converts their metadata to SolrDocuments, and pushes them to a Solr instance. There is a DrupalCrawlerHandler in the aperture webserver module, which pushes content from Aperture to Drupal. You might have a look at it for ideas.
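To make that more concrete, here is a minimal sketch of such a handler. It assumes Aperture's `CrawlerHandler` interface (method names like `objectNew` are from the Aperture API as I recall them; check your version's javadoc) and SolrJ's `HttpSolrServer`/`SolrInputDocument` for the push side. The field names (`id`, `content`) and the Solr URL are placeholders you would adapt to your schema; this is not a drop-in implementation.

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.semanticdesktop.aperture.accessor.DataObject;
import org.semanticdesktop.aperture.crawler.Crawler;

/**
 * Sketch of a CrawlerHandler that forwards each crawled DataObject
 * straight to a Solr instance instead of writing it to disk.
 * Only the relevant callback is shown; a real handler must also
 * implement the other CrawlerHandler methods (crawlStarted,
 * objectChanged, objectRemoved, ...).
 */
public class SolrCrawlerHandler /* implements CrawlerHandler */ {

    private final SolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr"); // placeholder URL

    // Called by the crawler for every newly discovered resource.
    public void objectNew(Crawler crawler, DataObject object) {
        try {
            SolrInputDocument doc = new SolrInputDocument();
            // Use the resource URI as the unique key; map whatever
            // metadata your Solr schema expects from object.getMetadata().
            doc.addField("id", object.getID().toString());
            doc.addField("content", object.getMetadata().toString());
            solr.add(doc);        // push to the index
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            object.dispose();     // release the DataObject's resources
        }
    }

    // Commit once the crawl finishes rather than per document.
    public void crawlStopped(Crawler crawler) {
        try {
            solr.commit();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

Committing once at the end of the crawl (or periodically) rather than per document keeps indexing throughput reasonable; Solr's autoCommit settings can serve the same purpose.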