Asyncio-based Python framework for building fast web crawling spiders
dude (uncomplicated data extraction): a simple framework
The Saxon XSLT and XQuery processor, developed by Saxonica
ML-based HTML scraper that learns extraction rules from examples
Dynamically generate documents from templates
Built for enterprise-level and highly customized websites
Apache LDAP Directory Manager
Information Manager (split/analyze/compare/combine).
Framework for search and display of heterogeneous document collections.
Monitoring of websites with spider and email notifications
Online gallery factory
Data migration/conversion library based on STX and XSLT transformation