Fennel is a library of data storage and processing components written in C++. It is developed as a sub-project of The Eigenbase Project, and also serves as a substrate for the Farrago project.
The XAnswer project aims to coordinate efforts in developing the various components of XQuery processing engines by means of a "standard" API, specifications, and protocols.
Functional XML parsing framework: SAX/DOM and SXML parsers with support for XML Namespaces and validation. Related to SSAX are SXPath queries and SXML transformations, with applications to XML/HTML authoring and literate Scheme and XML programming.
"distribution" is a message and data processing tool. It processes information through a graph of processors. It may be used to build mailing lists, fax gateways, email filters, PDF mailing combinators, report systems, and many other processes.
Linux-based e-commerce server. Written entirely in Perl, with a PostgreSQL database backend, page templates using HTML::Mason, and image processing with ImageMagick.
XDBP is a daemon application that lets clients make database queries without using RDBMS-specific or operating-system-specific drivers and libraries. The system works by translating custom XML data sent over sockets into the specified database calls.
Project MOVED to Codehaus. The new location is http://esper.codehaus.org. The Esper project aims to provide a general-purpose facility in Java for complex event processing of real-time, high-throughput data streams.
Lightweight system for running a weblog. Features include multiple authors, topics, Trackback, and RSS, among others. TruBlog is easy to install, has strong caching mechanisms, is localisable, and produces valid XHTML. Theming is done through CSS.
TM4J is a topic map engine implemented entirely in Java. Topic maps are a standard paradigm for the interchange of knowledge structures. This project aims to produce a complete suite of tools for creating, processing and publishing topic map information.
dbXML is a Native XML Database (NXD). NXDs are databases that store XML using an internalized format for faster overall processing. dbXML was developed using the Java 2 Standard Edition version 1.4.
PHP class that creates a web form for a database table and handles all processing of the form, including data validation, inserts, updates, and deletes. Very customisable and flexible. Uses ADODB to connect to databases, and simple templates.
Project to create a unified FAQ XML format, along with the software needed to convert it to various formats, such as multiple forms of HTML, TeX, PDF, text files, etc. Useful for most "FAQ keepers" on various forums and discussion lists.
Manage your bibliography with this tool: export to BibTeX and HTML, shortcuts for citations in LaTeX code, internationalization... Due to lack of time this project is stalled; please see JabRef at http://jabref.sourceforge.net/.
The most powerful non-commercial translation memory software (TM tool) with enhanced capabilities, like networking/collaboration (http, rpc), encoding conversion, project management capabilities, email capability with attachments, file tree diff etc.
Talisman is an interpreter for a logical markup language. This language contains the content and logic of a web (or, in the future, Java Swing) based user interface, including arbitrary datatypes and processing actions.
UMDS is a small, strong, and extensible project. Its model offers a uniform conceptual view of diverse data as a sequence of bits, and a uniform method of storing and processing this sequence on external storage.
MacBibTk is a Mac compatible version of Peter Corke's tkbibtex (release 9), a BibTeX file editor and browser. BibTeX is a reference/citation system for use with LaTeX. MacBibTk runs on all platforms with Tcl/Tk ports.
This is a sample implementation of the TransQuery processing model, enabling the use of XSLT as a query language over multiple XML documents. See http://www.xmlportfolio.com/transquery for more info.