Piping output from a crawler to another app

  • Scotch Egg

    Scotch Egg - 2011-03-04

    Is it possible to pipe the output from a crawler to another application (e.g. Solr) rather than writing to a file? For example, when crawling a website I want to pipe each retrieved document directly to a Solr indexing process, rather than writing each one to a file and then having Solr read those files.

  • Antoni Mylka

    Antoni Mylka - 2011-03-19

    Yes, you'd have to implement your own crawler handler, which would receive the data objects, convert their metadata to SolrDocuments, and push them to a Solr instance. There is a DrupalCrawlerHandler in the Aperture webserver module, which pushes content from Aperture to Drupal. You might have a look at it for ideas.
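
A rough sketch of the shape such a handler could take, assuming nothing about the real Aperture or SolrJ APIs: every type and method name below (CrawledObject, IndexSink, IndexingHandler, onDocument, onCrawlFinished) is a hypothetical stand-in, not the actual CrawlerHandler interface. In a real handler the crawler's own callbacks would call the conversion step, and the sink would wrap a Solr client instead of a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the data object a crawler passes to its handler.
final class CrawledObject {
    final String uri;
    final Map<String, String> metadata;
    final String fullText; // may be null when no text was extracted

    CrawledObject(String uri, Map<String, String> metadata, String fullText) {
        this.uri = uri;
        this.metadata = metadata;
        this.fullText = fullText;
    }
}

// Hypothetical stand-in for the indexing side, e.g. a thin wrapper around a Solr client.
interface IndexSink {
    void add(Map<String, Object> document);
    void commit();
}

// Converts each crawled object to a flat field map and pushes it straight
// to the sink, so nothing is written to intermediate files.
final class IndexingHandler {
    private final IndexSink sink;
    private final int commitEvery;
    private int pending = 0;

    IndexingHandler(IndexSink sink, int commitEvery) {
        this.sink = sink;
        this.commitEvery = commitEvery;
    }

    // Would be called from the crawler's "new or changed object" callback.
    void onDocument(CrawledObject object) {
        sink.add(toDocument(object));
        pending++;
        if (pending >= commitEvery) {
            sink.commit();
            pending = 0;
        }
    }

    // Would be called when the crawl ends; flushes any uncommitted documents.
    void onCrawlFinished() {
        if (pending > 0) {
            sink.commit();
            pending = 0;
        }
    }

    // Field names ("id", "text") are assumptions; a real index schema decides them.
    static Map<String, Object> toDocument(CrawledObject object) {
        Map<String, Object> doc = new LinkedHashMap<>(object.metadata);
        doc.put("id", object.uri); // set after the metadata so the URI always wins
        if (object.fullText != null) {
            doc.put("text", object.fullText);
        }
        return doc;
    }
}

// In-memory sink, useful for trying the handler without a running index.
final class InMemorySink implements IndexSink {
    final List<Map<String, Object>> documents = new ArrayList<>();
    int commits = 0;

    @Override
    public void add(Map<String, Object> document) {
        documents.add(document);
    }

    @Override
    public void commit() {
        commits++;
    }
}
```

Keeping the sink behind an interface means the conversion logic can be tested without a Solr instance. Committing in batches rather than per document is a design choice made for this sketch, not something the thread prescribes.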

