Showing 13 open source projects for "scraping"

  • 1
    HeadlessX

    The undetected self-hosted browser automation platform

    HeadlessX is an open-source, self-hosted browser automation platform designed to run headless browsers for tasks such as web scraping, automation, and testing. The system provides a centralized service that allows developers to programmatically control browser sessions and extract data from websites through a structured API. It is built using modern technologies including Node.js, Next.js, TypeScript, and Playwright, and uses a specialized browser engine called Camoufox based on Firefox. ...
    Downloads: 0 This Week
  • 2
    Ulixee Hero

    The web browser built for scraping

    It's the first modern headless browser designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in Node.js, allowing you to bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning-fast rendering.
    Downloads: 0 This Week
  • 3
    Browserless

    Deploy headless browsers in Docker

    ...It lets developers connect existing Puppeteer and Playwright code to remote browser sessions over WebSocket, which helps move heavy browser work away from local machines or application servers. The project also provides REST APIs for common automation tasks such as screenshots, PDF generation, scraping, crawling, and content export. Browserless is useful for teams that need scalable browser execution for testing, data collection, rendering, or AI-agent browsing workflows. Its deployment model supports self-hosting, private infrastructure, queues, concurrency controls, and enterprise-oriented configuration. The project’s main value is turning browser automation into a managed service layer that can be reused across applications and workflows.
    Downloads: 12 This Week
  • 4
    MDCx

    Movie metadata scraper and organizer for media libraries and NFO

    ...It includes several interfaces, allowing users to operate it through a graphical desktop application, a browser-based web interface, or command-line utilities depending on their workflow. Its architecture separates core scraping logic from the user interfaces, allowing the same metadata processing system to be reused across different modes.
    Downloads: 1 This Week
  • 5
    crawlee

    A web scraping and browser automation library for Node.js

    Crawlee is a web scraping and browser automation library. It helps you build reliable crawlers. Fast. Crawlee won't fix broken selectors for you (yet), but it helps you build and maintain your crawlers faster. When a website adds JavaScript rendering, you don't have to rewrite everything, only switch to one of the browser crawlers. When you later find a great API to speed up your crawls, flip the switch back.
    Downloads: 0 This Week
  • 6
    Firecrawl

    Turn entire websites into LLM-ready markdown or structured data

    Crawl and convert any website into LLM-ready markdown or structured data. Built by Mendable.ai and the Firecrawl community. Includes powerful scraping, crawling, and data extraction capabilities. Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each. No sitemap is required.
    Downloads: 3 This Week
  • 7
    Happy DOM

    Happy DOM is a JavaScript implementation of a web browser

    Happy DOM is a JavaScript implementation of a web browser without its graphical user interface. It includes many web standards from WHATWG DOM and HTML. The goal of Happy DOM is to emulate enough of a web browser to be useful for testing, scraping web sites, and server-side rendering. Happy DOM focuses heavily on performance and can be used as an alternative to JSDOM. Happy DOM now supports Declarative Shadow DOM which can be used for server-side rendering of web components. This package makes it possible to use Happy DOM with Jest.
    Downloads: 0 This Week
  • 8
    douyin

    Open source Douyin crawler for collecting and downloading public data

    ...It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages. DouyinCrawler supports both automated scraping and batch operations to process multiple targets efficiently. It also integrates with the Aria2 download utility to enable large-scale downloading of videos and images associated with collected content. It includes multiple usage modes such as a desktop GUI, a web service interface, and a command line tool for flexible deployment. ...
    Downloads: 2 This Week
  • 9
    Ayakashi

    The next generation web scraping framework

    The web has changed. Gone are the days when raw HTML parsing scripts were the proper tool for the job. JavaScript and single-page applications are now the norm. Demand for data scraping and automation is higher than ever, from business needs to data science and machine learning. Our tools need to evolve. Ayakashi helps you build scraping and automation systems, simple or sophisticated, that are easy to build, highly performant, maintainable, and built for change. ...
    Downloads: 0 This Week
  • 10
    SecretAgent

    The web scraper that's nearly impossible to block

    SecretAgent is a headless browser that’s nearly impossible to detect, which it achieves by emulating real users. It also has powerful auto-replay functionality that lets you create and debug scripts in record-setting time.
    Downloads: 0 This Week
  • 11
    Browser Pool

    A Node.js library to easily manage and rotate a pool of web browsers

    ...We created Browser Pool because we regularly needed to execute tasks concurrently in many headless browsers and their pages, but we did not want to worry about launching browsers, closing them, restarting them after crashes, and so on. We also wanted to easily and reliably manage the whole browser/page lifecycle. You can use Browser Pool for scraping the internet at scale, testing your website in multiple browsers at the same time, or launching web automation robots.
    Downloads: 4 This Week
  • 12
    Scylla

    Intelligent proxy pool for collecting and managing public proxies

    Scylla is an open source proxy pool system designed to collect, validate, and manage large numbers of public proxy servers for use in web scraping and data extraction workflows. It automatically crawls the internet to discover proxy IP addresses and evaluates their availability and reliability before adding them to a usable pool. It includes a JSON API that allows developers and applications to retrieve proxy information programmatically, making it easier to integrate proxy rotation into scraping tools or automation scripts. ...
    Downloads: 22 This Week
  • 13
    Chromeless

    Chrome automation made simple. Runs locally or headless on AWS Lambda

    Chromeless is an open-source JavaScript library designed to simplify browser automation by controlling a Chrome or Chromium browser through an easy-to-use API. The project was created to make headless browser scripting more accessible for tasks such as automated testing, web scraping, and screenshot generation. Instead of manually interacting with browser debugging protocols, developers can use Chromeless commands to navigate pages, fill forms, click elements, and extract information programmatically. The library supports running Chrome locally during development or executing headless browser sessions remotely on cloud infrastructure such as AWS Lambda. ...
    Downloads: 2 This Week