Deploy pre-built tools that crawl websites, extract structured data, and feed your applications. Reliable web data without maintaining scrapers.
Automate web data collection with cloud tools that handle anti-bot measures, browser rendering, and data transformation out of the box. Extract content from any website, push to vector databases for RAG workflows, or pipe directly into your apps via API. Schedule runs, set up webhooks, and connect to your existing stack. Free tier available, then scale as you need to.
Explore 10,000+ tools
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.
MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
CekTKP creates many short URLs (tinyurl.com, bit.ly, and many more) from one website. It can also extract the original URL from an already-shortened one, which is useful for previewing exactly where a short URL will redirect before you click it.
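Resolving a short URL before visiting it amounts to following the redirect chain without loading the final page. A minimal sketch of that logic, with the fetch step injected as a callable so it can be demonstrated against a stubbed redirect chain instead of the live network (all URLs below are hypothetical, not CekTKP's own):

```python
# Sketch: follow Location-style redirects to preview a short URL's target.
# fetch_head(url) should return the redirect target, or None if there is none;
# injecting it keeps the traversal logic independent of any HTTP library.

from typing import Callable, Optional

def resolve_short_url(url: str,
                      fetch_head: Callable[[str], Optional[str]],
                      max_hops: int = 10) -> str:
    """Follow redirects until none remain, guarding against loops."""
    seen = set()
    current = url
    for _ in range(max_hops):
        if current in seen:            # redirect loop guard
            break
        seen.add(current)
        target = fetch_head(current)   # next hop, or None at the final page
        if target is None:
            break
        current = target
    return current

# Stubbed redirect chain standing in for real HEAD requests:
chain = {
    "https://tinyurl.com/abc": "https://bit.ly/xyz",
    "https://bit.ly/xyz": "https://example.com/article",
}
print(resolve_short_url("https://tinyurl.com/abc", chain.get))
# → https://example.com/article
```

In a real tool, `fetch_head` would issue an HTTP HEAD request and read the `Location` header.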
The Backlinkchecker analyzes how many inbound links point to your website. It's developed in PHP, JavaScript, and MySQL, using the ExtJS JavaScript library and the Snoopy PHP library (snoopy.php).
FoxBAT was made in an attempt to see whether the Naïve Bayesian filtering commonly used for spam filtering could be employed in a World Wide Web context. The application consists of a Firefox extension (.XPI package) and a Perl server script.
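The filtering technique the entry refers to can be shown in a toy form: count word frequencies per class, then score new text by summing per-word log-likelihoods with Laplace smoothing. This is a generic naïve Bayes sketch with invented training data, not FoxBAT's actual implementation:

```python
# Toy naïve Bayesian text classifier, the technique FoxBAT explores for
# web content. Training data and labels below are invented for illustration.

import math
from collections import Counter

def train(docs):
    """docs: list of (label, text). Returns per-class word counts."""
    counts = {}
    for label, text in docs:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def classify(text, counts):
    """Pick the label with the highest summed log-likelihood
    (Laplace smoothing; equal class priors assumed)."""
    vocab = set()
    for c in counts.values():
        vocab.update(c)
    best_label, best_score = None, float("-inf")
    for label, c in counts.items():
        total = sum(c.values())
        score = 0.0
        for word in text.lower().split():
            score += math.log((c[word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train([
    ("spam", "win free money now free prize"),
    ("ham", "meeting agenda for the project review"),
])
print(classify("free money prize", model))  # → spam
```

Applying the same idea to web pages, as FoxBAT does, mainly means swapping email bodies for page text and surfacing the verdict in the browser.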
Ideal for lending professionals looking for a feature-rich loan management system
Bryt Software is ideal for lending professionals looking for a feature-rich loan management system that is intuitive and easy to use. We are a 100% cloud-based software-as-a-service offering. We believe in providing our customers with fair and honest pricing: our monthly fees are based on your number of users, and we charge a minimal implementation fee.
The purpose of this project is to gauge website page-loading speed from the client's perspective through a web management system, Leech. As its tagline puts it, "Leech bot reaches hosts via URL information and sucks HTTP responses!"
SpongeStats is an analysis tool for your website, developed in PHP/MySQL/AJAX, for visualizing traffic statistics and analyzing a site's search-engine ranking.
MWQL is an extension to MediaWiki that provides end users with a language for structural queries, so that they can build dynamic pages like those seen in the Special pages of Wikipedia.
Ruya is a Python-based breadth-first, level-limited, delayed, event-driven crawler for crawling English and Japanese websites. It is targeted solely at developers who want crawling functionality and crawl control in their projects via an API.
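The crawl style the entry describes — breadth-first traversal, bounded by level, with a politeness delay between fetches — can be sketched compactly. The link extraction is injected as a callable so the traversal runs against a stubbed site graph here; this illustrates the technique, not Ruya's API:

```python
# Sketch: breadth-first, level-limited crawl with an optional delay
# between fetches (the "delayed" crawling the entry mentions).
# get_links(url) returns the outgoing links of a page; here it is a stub.

import time
from collections import deque

def bfs_crawl(start, get_links, max_level=2, delay=0.0):
    """Return {url: level} for every page within max_level hops of start."""
    visited = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        level = visited[url]
        if level >= max_level:
            continue                   # do not expand beyond the level limit
        if delay:
            time.sleep(delay)          # politeness pause between fetches
        for link in get_links(url):
            if link not in visited:
                visited[link] = level + 1
                queue.append(link)
    return visited

# Stubbed site graph (hypothetical paths) standing in for real pages:
site = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": [],
    "/c": ["/d"],
}
print(bfs_crawl("/", lambda u: site.get(u, []), max_level=2))
# → {'/': 0, '/a': 1, '/b': 1, '/c': 2}
```

An event-driven crawler like Ruya would additionally invoke user callbacks on each fetched page rather than just recording levels.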
Realistic Workplace Simulations that Show Applicant Skills in Action
Skillfully transforms hiring through AI-powered skill simulations that show you how candidates actually perform before you hire them. Our platform helps companies cut through AI-generated resumes and rehearsed interviews by validating real capabilities in action. Through dynamic, job-specific simulations and skill-based assessments, companies like Bloomberg and McKinsey have cut screening time by 50% while dramatically improving hire quality.
Aracnis is a Java-based framework for building distributed web spiders. These spiders can be used to accomplish a variety of tasks, for example, screen scraping and link integrity checking.
JLinkCheck is an Ant task written in Java for checking links in websites. Rather than checking a single page, it crawls a whole site like a spider and generates a report in XML and (X)HTML. JReptator will be its successor, with many more features.
Sperowider Website Archiving Suite is a set of Java applications whose primary purpose is to spider dynamic websites and create static, distributable archives with a full-text search index usable by an associated Java applet.
InSite is a website management tool written in Perl. It checks link integrity and does some basic content monitoring of your site's files directly on the local disk, which gives it a huge speed advantage over similar tools.
ASpider is a robust, featureful, multi-threaded CLI web spider written in Java using Apache Commons HttpClient v3.0. It downloads any files matching your given MIME types from a website, attempts to match email addresses with regular expressions by default, and logs all results using log4j.
404SEF is a component for the Mambo CMS (4.5.x now, 4.6.x soon) that provides human-readable URLs. It works with Apache and IIS, returns a proper 404 status code for missing content, logs 404 errors, and supports user-defined custom redirection via special shortcuts.
An automatic link-management program with three functions: list the links in the database in HTML format, add links to the database using a browser, and optionally check for bad links (via a cron job). This eliminates the need for the "Report bad link" feature found on too many websites.
Like social bookmarking, it allows users to share their bookmarks online; like a wiki, anyone can freely edit links. Export and import bookmarks with your browser. Many other features: RSS and Atom feeds, URL checking, popular categories, XLink, ...
The main function of this script is to shorten long website URLs, converting them into easy-to-remember short ones. [htaccess, mod_rewrite, XHTML 1.0 Strict, CSS 1, JS 1.2, PHP 5.x, MySQL 4.x]
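The core of such a shortener is mapping a database row id to a compact code and back, with mod_rewrite routing the short path to the lookup script. A common scheme is base-62 encoding of the auto-increment id; this sketch illustrates that convention, not necessarily this script's exact implementation:

```python
# Sketch: base-62 encoding of a numeric row id into a short URL code,
# and the inverse lookup. The alphabet below is a common convention.

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    """Turn a numeric row id into a short code (e.g. 125 -> '21')."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode(code: str) -> int:
    """Inverse of encode: short code back to the row id."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode(125))             # → 21
print(decode(encode(98765)))   # → 98765
```

On the server side, an .htaccess rule like `RewriteRule ^([0-9a-zA-Z]+)$ redirect.php?c=$1` would hand the code to a script that decodes it and looks up the original URL.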
A content management system which allows web developers to create and organize a collection of URLs (a.k.a. - a link farm) using a searchable labeling system.