The Informa library provides a convenient Java API for handling news channels and metadata about them. Different syntax formats (RSS 0.91, 1.0, 2.0 and Atom 0.3, 1.0) for feeds are supported. Also support for channel information descriptions (OPML) avail
Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin for the Heritrix crawler and was developed as part of the GSoC06 program.
This forum software is a Java-based discussion forum that uses JDBC to store data in a database. It is available in different languages and has features for easy integration into a site and easy forum administration.
JAMP provides several functions to index and manage your media files on resources such as storage systems or DVDs. The user interface is web-based and written entirely in Java.
LIMO stands for Lucene Index Monitor. It is a web application that gives basic information about indexes used by the Lucene search engine (http://lucene.apache.org). It allows you to browse and search the index, and reconstruct stored fields.
Pandora's Jar enables timeshifting on Pandora-based (www.pandora.com) radio stations. A distributed voting system provides error correction and detection of damaged files. Playlist-per-station support lets you replay by genre in addition to by time.
DOP Software’s mission is to streamline the processes of waste and recycling businesses by providing them with dynamic, comprehensive software and services that increase productivity and quality of performance.
Photospace is an open platform for searching, viewing and annotating digital media in time and space. Photospace integrates easily with photo blogs, map blogs, SOAP clients, RSS and RDF readers and your own custom applications.
Aracnis is a Java-based framework for building distributed web spiders. These spiders can be used for a variety of tasks, such as screen scraping and link integrity checking.
Glue is a WSMO-compliant discovery engine that aims to provide an efficient system for managing and discovering semantically described Web Services.
OJAX provides:
- a meta-search service with a highly dynamic AJAX-based user interface
- an OAI-PMH harvester to harvest multiple repositories into a single Lucene index
- an easy-to-use, highly discoverable user interface for searching that index
The project consists of two parts. The first is a J2ME app that collects information such as photo, position, speed and course from GPS and transfers it to the web server. The second is a web app that lets you manage and display the received data using GoogleMap
webspider provides a mechanism for fetching content from the web. With the extended classes, you can do the following things:
1. grab URLs from a specified base URL
2. analyze the contents of a list of URLs
3. get specific files from the web
4. blablabla
PDFBox is a Java PDF Library. This project will allow access to all of the components in a PDF document. More PDF manipulation features will be added as the project matures. This ships with a utility to take a PDF document and output a text file.
A front end to the Swedish public transportation search engine. It makes the same requests as the WAP page would, but with the added functions of a standard J2ME app.
SCAM is a development environment for building metadata stores for RDF and the Semantic Web. SCAM is built upon international technology and metadata standards such as RDF, Dublin Core, IEEE/LOM and IMS.
This project intends to create an indexing search engine for knowledge management. The primary objective is to apply an information retrieval core and implement knowledge discovery techniques such as data mining algorithms and text mining.