netcdf-tools is a set of tools for creating netCDF files. It supports command-line use on both Windows and Unix, as well as direct use as a Java library. Written by CSIRO Australia and funded by the ANDS Australian Research Data Commons Project.
SPIDR (Space Physics Interactive Data Resource) is a distributed database and application server network, built to select, visualize and model historical space weather data. SPIDR is a web-application and a grid of data mining web-services.
A Java API binding of the IETF Mime-Dir and vCard RFCs. This package defines implementation-neutral Mime-Dir and vCard Java interfaces for exposing data objects corresponding to those defined by the RFCs.
The gateway is an open source JavaEE application developed by the Vermont Dept of Taxes. It provides a web services framework for accepting Streamlined Sales Tax registrations and returns. It also includes a web interface for submitting transmissions.
A toy XML-aware (but otherwise generic and extensible) content management system demonstrating how to do sophisticated management of versioned hyperdocuments with a focus on issues of import and export of compound documents (e.g., XInclude-based).
The Wicket Stuff project makes third-party components available for the Wicket web component framework. Its subprojects contain integrations for Spring, Groovy, Hibernate, Velocity and other popular Java open source projects.
The SSAF ("Secure Search And Forwards") is a dirt-simple standalone web app for inexpensive and secure information sharing. Any uploaded record may be forwarded to an intended destination, and may also be stashed in a searchable repository.
Memomics Forge is a meta-project for software that utilizes the Memomics Semantic Service.
Memomics Semantic Service provides semantic data which can be embedded in applications via webservices.
The tool validates a particular metadata format. Specifically, it checks three things: 1. Big5 character encoding; 2. whether the file is a well-formed XML document; 3. other specifications for our own purposes.
An XML carver that can carve damaged or non-standard XML out of any file. It rebuilds the XML tree and reports the offsets of all the carved XML data. This tool was developed for the DFRWS 2010 Forensics Challenge.
ExtractData is a program that scans your files for specific types of data to isolate and extract. Extract lists of email addresses, person names, addresses and other kinds of data from multiple files in a single pass.
Raken, a web service controller, is based on JSON as a data definition language. It uses a simple protocol and supports optional asynchronous interaction, recovery, localization, security, caching, batch, and multi-part messaging.
A GUI-based text annotation tool for creating and visualizing annotations. It uses a flexible stand-off XML data format, and has advanced and customizable methods for information and relation visualization.
The JotScale project is a cluster of servers implementing a highly scalable object storage system combined with a high-performance HTTP server and a subscription system. JotScale now depends upon Mojasi (same trunk). Pure Java. Source only.
primeHandle is a set of data management tools in support of the PrIMe Initiative (http://primekinetics.org). primeHandle includes graphical user interfaces for searching through data collections, editing and submitting data to the PrIMe Data Depository.
Angur is an XML visualization utility that helps users visualize XML files as node graphs and generate XML visually, without any XML knowledge.
TeXGrapher is an application for drawing graphs and exporting them as code for GraphTeX, a LaTeX package that provides macros for specifying drawings within a LaTeX source file.
Defuddle is a data translation engine that supports mapping arbitrary ASCII and binary file formats to a data model defined in XML Schema, in a manner similar to, but not compliant with, the Data Format Description Language (http://www.ogf.org/dfdl/).
A stand-alone editor using Mediawiki markup language to generate HTML code. You can create and preview pages written using Mediawiki markup (i.e. Wikipedia pages) while off-line.
A lightweight HTML editor, developed in Java. So far it includes tag highlighting and previewing in the default browser. It will eventually include many automatic HTML code insertion methods. Great for those learning HTML.