Showing 1069 open source projects for "web crawler source code"

  • 1
    FinalRecon

    All-in-one Python web reconnaissance tool for fast target analysis

    FinalRecon is an all-in-one web reconnaissance tool written in Python that helps security professionals gather information about a target website quickly and efficiently. It combines multiple reconnaissance techniques into a single command-line utility so users do not need to run several separate tools to collect similar data. FinalRecon focuses on providing a fast overview of a web target while maintaining accuracy in the collected results. It includes modules for gathering server...
    Downloads: 2 This Week
  • 2
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website....
    Downloads: 2 This Week
  • 3
    SiteOne Crawler (desktop app)

    A free, feature-rich web analyzer and exporter/cloner you will love!

    A free in-depth website analyzer providing audits of security, performance, SEO, accessibility and other technical aspects. Available as a desktop application for Windows/macOS/Linux and as a CLI tool for advanced users and CI/CD processes. It also includes an offline web page exporter (website clone, mirror).
    Downloads: 8 This Week
  • 4
    QueryList

    Progressive PHP web crawler framework with jQuery-like DOM parsing

    QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks. QueryList supports common data...
    Downloads: 0 This Week
  • 5
    Floorp Browser

    All source code for Floorp Browser version 10 and later

    ...Floorp will be updated every 4 weeks, with security updates provided before each Firefox release. We don't collect personal information from users. We don't track users. We don't sell user data. We have no affiliation with any advertising companies. Floorp's source code is entirely open, allowing anyone to view it and contribute to the project. Not only is the browser itself open source, but the build environment is as well.
    Downloads: 115 This Week
  • 6
    diskover-community

    Open source file indexing & storage analytics powered by Elasticsearch

    Diskover Community Edition is an open source file system indexing and storage analytics platform designed to help organizations understand and manage large volumes of file data. It crawls file systems and indexes metadata using Elasticsearch, enabling fast search, analysis, and organization of files stored across different storage systems. It allows administrators and users to explore file structures, monitor storage usage, and gain insights into how data is distributed across...
    Downloads: 1 This Week
  • 7
    Stylus for Chrome

    Stylus - Userstyles Manager

    Stylus is a fork of the popular Stylish extension for Chrome, also compatible with Firefox as a WebExtension, which can be used to restyle the web. Not "ish", but "us", as in "us" the actual users. Stylus is based on the source code of Stylish version 1.5.2, the most up-to-date version before the original developer stopped working on the project. The objective in creating Stylus was to remove any and all analytics, and return to a more user-friendly UI. ...
    Downloads: 22 This Week
  • 8
    eslint-plugin-compat

    Check the browser compatibility of your code

    Lint the browser compatibility of your code. Browser targets are configured using browserslist, which you can set in your package.json. If no configuration is found, browserslist defaults to "> 0.5%, last 2 versions, Firefox ESR, not dead". To declare polyfills, add them to the settings section of your ESLint config, naming the object and, if one exists, the property. Toolchains for native platforms, like iOS and Android, have had API linting from the start. It's about time that...
    Downloads: 3 This Week
  • 9
    news-please

    Python tool for crawling and extracting structured data from news site

    news-please is an open source news crawler and information extraction tool designed to collect and structure articles from online news websites. It provides an integrated pipeline that crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from a news outlet when given only the root URL of a site. ...
    Downloads: 0 This Week
  • 10
    PostgREST

    REST API for any Postgres database

    ...Object-relational mapping is a leaky abstraction leading to slow imperative code. The PostgREST philosophy establishes a single declarative source of truth: the data itself. It’s easier to ask PostgreSQL to join data for you and let its query planner figure out the details than to loop through rows yourself. It’s easier to assign permissions to db objects than to add guards in controllers. (This is especially true for cascading permissions in data dependencies.)
    Downloads: 14 This Week
  • 11
    Ungoogled Chromium

    A lightweight approach to removing Google web service dependency

    In descending order of significance (i.e. most important objective first): ungoogled-chromium is Google Chromium, sans dependency on Google web services. ungoogled-chromium retains the default Chromium experience as closely as possible. Unlike other Chromium forks that have their own visions of a web browser, ungoogled-chromium is essentially a drop-in replacement for Chromium. ungoogled-chromium features tweaks to enhance privacy, control, and transparency. However, almost all of these...
    Downloads: 19 This Week
  • 12
    Maxun

    Small event-delegation library for decoupling event binding and handli

    Maxun named JsAction by Google serves as a lightweight event delegation library built in JavaScript. It allows developers to separate the logic of binding events from the code that handles those events, helping to keep DOM event wiring cleaner and more maintainable. It is archived and marked as read-only, indicating that the project is no longer actively maintained or intended for production use. The README states that ongoing development has migrated into a larger framework under the...
    Downloads: 18 This Week
  • 13
    Proton VPN Browser Extension

    Proton VPN Browser Extension

    The Proton VPN Browser Extension repository houses the code for an official browser extension that lets users quickly secure their web browsing traffic through ProtonVPN from within browsers like Firefox and Chrome without routing all system traffic. This extension provides users with a convenient way to encrypt and anonymize HTTP requests, hide IP addresses, and prevent tracking while browsing, acting independently of the full OS-level VPN clients. Because browser extensions are constrained...
    Downloads: 92 This Week
  • 14
    Coraza

    OWASP Coraza WAF is a golang modsecurity compatible firewall library

    ...Coraza runs the OWASP Core Rule Set (CRS) to protect your web applications from a wide range of attacks, including the OWASP Top Ten, with a minimum of false alerts. CRS protects from many common attack categories including: SQL Injection (SQLi), Cross Site Scripting (XSS), PHP & Java Code Injection, HTTPoxy, Shellshock, Scripting/Scanner/Bot Detection & Metadata & Error Leakages. Coraza is a library at its core, with many integrations to deploy on-premise Web Application Firewall instances.
    Downloads: 2 This Week
  • 15
    skycaiji

    Open source web scraping system for automated data collection tasks

    SkyCaiji is an open source web scraping and data collection system designed to gather information from websites through configurable extraction rules. It focuses on simplifying the process of building crawlers by allowing users to visually define scraping rules rather than writing complex code. It can collect structured or unstructured data from many types of webpages and automate the extraction process for large datasets.
    Downloads: 2 This Week
  • 16
    WireGuard Easy

    The easiest way to run WireGuard VPN + Web-based Admin UI

    WireGuard Easy is a streamlined solution for deploying and managing a WireGuard VPN server through a web-based interface, designed to eliminate the complexity typically associated with manual VPN configuration. It combines the WireGuard backend with a user-friendly admin dashboard, allowing users to control clients, monitor connections, and generate configuration files without interacting directly with command-line tools. The project is commonly deployed using Docker, making installation...
    Downloads: 1 This Week
  • 17
    Betterfox

    Firefox user.js for optimal privacy and security

    Betterfox is an opinionated configuration profile for Mozilla Firefox designed to improve everyday web browsing by making the browser faster, more private, and more secure without relying on external add-ons or third-party code. Rather than being a separate browser, it consists of a curated set of preference tweaks (user.js settings) that users apply to their Firefox profile to optimize performance and harden privacy settings. The project focuses on a minimal-impact, maximum-effect approach,...
    Downloads: 55 This Week
  • 18
    Takes

    True object-oriented Java web framework without NULLs

    Takes is a true object-oriented and immutable Java 8 web development framework. Make sure that UTF-8 encoding is set on the command line: the entire framework relies on your default Java encoding, which is not necessarily UTF-8. To be sure, always set it on the command line with the file.encoding Java argument. We decided not to hard-code "UTF-8" in our code mostly because this would be against the entire idea of Java localization, according to which a user always should have a...
    Downloads: 0 This Week
  • 19
    Scrapling

    An adaptive Web Scraping framework

    Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling,...
    Downloads: 3 This Week
  • 20
    webiny

    Enterprise open-source serverless CMS

    Enterprise open-source serverless CMS. Includes a headless CMS, page builder, form builder and file manager. Easy to customize and expand. Deploys to AWS. Create GraphQL APIs, full-stack applications and websites. Deploy with single command to your AWS. Runs on services like AWS Lambda and DynamoDB. Highly-scalable & highly-available out of the box. You get a full-stack project with a GraphQL API and a React frontend that you can use to start building. Write custom apps and business logic...
    Downloads: 1 This Week
  • 21
    Sanic

    Async Python 3.6+ web server/framework

    Build fast, run fast with Sanic! Sanic is a Python 3.6+ web server and web framework designed to go fast. It provides a way to get a highly performant HTTP server up and running fast, while also making it easy to build, expand, and eventually scale. Sanic aspires to be as simple as possible while delivering the performance that you require. It allows the usage of the async/await syntax added in Python 3.5, so your code is guaranteed to be non-blocking and speedy. It's also ASGI compliant,...
    Downloads: 0 This Week
  • 22
    JSONView

    A web extension that helps you view JSON documents in the browser

    Normally when encountering a JSON document (content type application/json), Firefox simply prompts you to download the file. With the JSONView extension, JSON documents are shown in the browser similar to how XML documents are shown. The document is formatted, highlighted, and arrays and objects can be collapsed. Even if the JSON document contains errors, JSONView will still show the raw text. JSONView is a Web extension...
    Downloads: 8 This Week
  • 23
    kimuraframework

    AI-first Ruby framework for building fast, flexible web scraping spide

    Kimurai is an open source web scraping framework written in Ruby that simplifies the process of building automated data extraction tools. It provides a clean domain-specific language that allows developers to define scraping logic and data schemas with minimal boilerplate code. Kimurai can use AI-assisted extraction to identify where data resides in HTML pages, automatically generating selectors that are cached for future use so subsequent scraping runs operate with pure Ruby performance. ...
    Downloads: 3 This Week
  • 24
    Ghostery

    Ghostery Browser Extension for Firefox, Chrome, Opera and Edge

    Ghostery helps you browse smarter by giving you control over ads and tracking technologies to speed up page loads, eliminate clutter, and protect your data. This is the unified code repository for the Ghostery browser extensions in Chrome, Firefox, Opera and Edge. Browse the web safer, faster, and with fewer annoying ads. Equipped with award-winning AI anti-tracking technology to browse the web safely and quickly. Ghostery helps you stay informed about what companies are tracking you by listing the...
    Downloads: 14 This Week
  • 25
    WordPress

    Just a mirror of the WordPress subversion repository

    WordPress is one of the world’s most widely used content management systems (CMS), powering blogs, websites, and increasingly web apps. It offers a flexible architecture of themes and plugins, where users can extend functionality or customize layout without touching core code. The administrative dashboard includes post and page editors, media library, user roles, plugin/theme installation, and site settings. Through its REST API and headless mode, WordPress also serves as a backend for...
    Downloads: 25 This Week