Showing 26 open source projects for "internet archive web crawler"

View related business solutions
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Application Monitoring That Won't Slow Your App Down Icon
    Application Monitoring That Won't Slow Your App Down

    AppSignal's Rust-based agent is lightweight and stable. Already running in thousands of production apps.

    Full APM with errors, performance, logs, and uptime monitoring. 99.999% uptime SLA on the platform itself.
    Start Free
  • 1
    WebMagic

    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a scalable crawler framework. It covers the whole lifecycle of crawler, downloading, url management, content extraction and persistent. It can simplify the development of a specific crawler. WebMagic is a simple but scalable crawler framework. You can develop a crawler easily based on it. WebMagic has a simple core with high flexibility, a simple API for html extracting. It also provides annotation with POJO to customize a crawler, and no configuration is needed. Some other...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Heritrix

    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    LinkAce

    LinkAce

    LinkAce is a self-hosted archive to collect links of your favorite web

    LinkAce is a self-hosted archive to collect links of your favorite websites. Save articles to read them later, tools to use in your next project, or historical content to archive it for the long term. LinkAce comes with a lot of features while keeping a clean and minimal interface. It provides a long-term archive to store links to websites, media files or anything else with a valid URL. The user is able to categorize the added links to be able to find them later, and share lists of links...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    SiteOne Crawler (desktop app)

    SiteOne Crawler (desktop app)

    A free, feature-rich web analyzer and exporter/cloner you will love!

    A free in-depth website analyzer providing audits of security, performance, SEO, accessibility and other technical aspects. Available as a desktop application for Windows/macOS/Linux and as a CLI tool for advanced users and CI/CD processes. It also includes an offline web page exporter (website clone, mirror).
    Downloads: 7 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    miniblink49

    miniblink49

    Lighter, faster browser kernel of blink to integrate HTML UI in apps

    miniblink is an open source, one file, small browser widget based on chromium. By using C interface, you can create a browser with just some line code. miniblink is an open source, single-file, and currently the smallest known chromium-based browser control. Through its exported pure C interface, a browser control can be created in a few lines of code. C++, C#, Delphi and other language calls (support C++, C#, Delphi language to call). Embedded Nodejs, support electron (with Nodejs, can run...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    ZnetDK 4 Mobile

    ZnetDK 4 Mobile

    Responsive Web App development full-stack framework in PHP and JS

    Develop easily in PHP, MySQL and JavaScript your business web app for mobile and tablet devices from a ready-to-customize starter app, fully Responsive and Installable (PWA). No need to start up from scratch. You just have to add the views that fullfill your business requirements and connect them to the user navigation menu. All the features expected for a mobile business application are already developed! Authenticate your users through the built-in login form. Show them only the pages...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Kisekae UltraKiss

    Kisekae UltraKiss

    Kisekae UltraKiss is a full featured integrated development environmen

    UltraKiss is a computer program that implements the Kisekae Set system, KiSS, a Japanese graphics system originally developed to facilitate costume changes on virtual dolls. UltraKiss was developed to help artists build their KiSS sets. It is a full featured viewer for all KiSS dolls, games, and visual applications. It is also a complete graphical development environment for creating KiSS applications. It fully implements the FKiSS event driven programming language up to and including...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 8
    Gerapy

    Gerapy

    Distributed Crawler Management Framework Based on Scrapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Someone who has worked as a crawler with Python may use Scrapy. Scrapy is indeed a very powerful crawler framework. It has high crawling efficiency and good scalability. It is basically a necessary tool for developing crawlers using Python. If you use Scrapy as a crawler, then of course we can use our own host to crawl when crawling, but when the crawl is very large, we can’t...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Goutte

    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 10
    ReconSpider

    ReconSpider

    Most Advanced Open Source Intelligence (OSINT) Framework

    ...Reconnaissance is a mission to obtain information by various detection methods, about the activities and resources of an enemy or potential enemy, or geographic characteristics of a particular area. A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering).
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    CEF Python

    CEF Python

    Python bindings for the Chromium Embedded Framework (CEF)

    Python bindings for the Chromium Embedded Framework (CEF). CEF Python is an open source project founded by Czarek Tomczak in 2012 to provide Python bindings for the Chromium Embedded Framework (CEF). The Chromium project focuses mainly on Google Chrome application development while CEF focuses on facilitating embedded browser use cases in third-party applications. Lots of applications use CEF control, there are more than 100 million CEF instances installed around the world. There are...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 12
    HTML5 Editor Online

    HTML5 Editor Online

    Edit yours files html, css and js directly in your PC or Mac

    Edit your files html, css and javascript directly into your PC or Mac, save the result in your file system. See a preview of web design, and if you want you can also export it as pdf or html format. ===== Android app Deluxe edition: https://play.google.com/store/apps/details?id=com.ulm.html5editorfortablet Android app PRO edition: https://play.google.com/store/apps/details?id=com.ulm.html5editor Chrome app: HTML, XHTML and XML editor - https://goo.gl/bYR7fF ===== == Update...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    FireTeX: LaTeX Editor and Compiler

    FireTeX: LaTeX Editor and Compiler

    Edit Your files LaTeX and tex

    FireTeX, web based LaTeX editor complete, is a powerful, intuitive and stocked with useful functions for exporting the results in three useful formats. An editor with LaTeX compiler, highlight code, advanced search / replace and filesystem API HTML5. ======== Android app available on Play Store > https://play.google.com/store/apps/details?id=com.ulmdesign.ulmtex ======== Update 30.06.2017 Windows 7 and later and macOS 10.9 and later are supported. == Browser Extensions == Add-on...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    phoneutria
    A Java Web crawler: multi-threaded, scalable, with high performance, extensible and polite. It can be used to crawl and index any web or enterprise domain and is configurable through a XML configuration file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Piggydb

    Piggydb

    Piggydb helps you have more fun with knowledge creation.

    Piggydb is a flexible and scalable knowledge building platform that supports a heuristic or bottom-up approach to discover new concepts or ideas based on your input. You can begin with using it as a flexible outliner, diary or notebook, and as your database grows, Piggydb helps you to shape or elaborate your own knowledge. Piggydb is a Web application provided as a self-contained package that contains a Web server and database engine. With Piggydb, you can create highly structured content...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16

    sitecheck

    Modular web site spider for web developers.

    More than just a link checker, sitecheck is a website spider (also known as a crawler) which can assist with SEO by testing an entire site plus both inbound links from search engines and outbound links to other sites for the following issues: looping redirects (HTTP 301/302), broken links (HTTP 404), server errors (HTTP 500), spelling mistakes, low readability scores (using the Flesch Reading Ease test), missing/empty/duplicate meta tags, duplicate content, slow page speed, W3C validation...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Zoozle Search & Download Suchmaschine

    Zoozle Search & Download Suchmaschine

    Zoozle 2008 - 2010 Webpage, Tools and SQL Files

    Download search engine and directory with Rapidshare and Torrent - zoozle Download Suchmaschine All The files that run the World Leading German Download Search Engine in 2010 with 500 000 unique visitors a day - all the tools you need to set up a clone. Code Contains: - PHP Files for zoozle - Perl Crawler for gathering new content to database and all other cool tools i have...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    This Amazon Mechanical Turk (MTurk) SDK is no longer under active development as of 1/1/2018. The latest MTurk SDKs, with support for most common programming languages, are now a part of the Amazon Web Service (AWS) SDKs at https://aws.amazon.com/tools/. With the AWS SDKs, you can also connect to MTurk using the AWS Command Line Interface (https://aws.amazon.com/cli/), allowing you to perform operations from your Windows, Mac, or Linux command line without having to write code. If you...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Customizable browser based (text/web(WYSIWYG)) file editors environment in PHP (GPL Licensed) with loads of features. (tested only in firefox)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    HttpdBase4J is an embeddable Java web server framework that supports HTTP, HTTPS, templated content and serving content from inside an archive.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    DoxMentor4J is a standalone cross platform Web/Ajax based documentation library that is fully searchable and may be hosted in the file system, in an archive or embedded in the Java classpath.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    ** Guys I have built a much more powerful Fully Featured CMS system at: https://github.com/MacdonaldRobinson/FlexDotnetCMS Macs CMS is a Flat File ( XML and SQLite ) based AJAX Content Management System. It focuses mainly on the Edit In Place editing concept. It comes with a built in blog with moderation support, user manager section, roles manager section, SEO / SEF URL
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    This is MySQL Proxy service written in PHP. All outputs of the queries are represented as XML data. This service can be placed in the DMZ of your server. Connect your application data layer to this service. Example in C# is bundled in the archive.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    A toolkit for crawling information from web pages by combining different kinds of "actions". Actions are simple operations such as navigation to a specified url or extraction of text from the html. Also available is a graphic user interface.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    LogCrawler is an ANT task for automatic testing of web applications. Using a HTTP crawler it visits all pages of a website and checks the server logfiles for errors. Use it as a "smoketest" with your CI system like CruiseControl.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB