scraping free download

Showing 27 open source projects for "scraping"

1

HeadlessX

The undetected self-hosted browser automation platform

HeadlessX is an open-source, self-hosted browser automation platform designed to run headless browsers for tasks such as web scraping, automation, and testing. The system provides a centralized service that allows developers to programmatically control browser sessions and extract data from websites through a structured API. It is built using modern technologies including Node.js, Next.js, TypeScript, and Playwright, and uses a specialized browser engine called Camoufox based on Firefox. ...

Downloads: 0 This Week

Last Update: 2026-03-25
See Project
2

Ulixee Hero

The web browser built for scraping

Hero is the first modern headless browser designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS, allowing you to bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning-fast rendering.

Downloads: 0 This Week

Last Update: 2025-09-08
See Project
3

Browserless

Deploy headless browsers in Docker

...It lets developers connect existing Puppeteer and Playwright code to remote browser sessions over WebSocket, which helps move heavy browser work away from local machines or application servers. The project also provides REST APIs for common automation tasks such as screenshots, PDF generation, scraping, crawling, and content export. Browserless is useful for teams that need scalable browser execution for testing, data collection, rendering, or AI-agent browsing workflows. Its deployment model supports self-hosting, private infrastructure, queues, concurrency controls, and enterprise-oriented configuration. The project’s main value is turning browser automation into a managed service layer that can be reused across applications and workflows.

Downloads: 12 This Week

Last Update: 1 day ago
See Project
4

crawlee

A web scraping and browser automation library for Node.js

Crawlee is a web scraping and browser automation library. It helps you build reliable crawlers. Fast. Crawlee won't fix broken selectors for you (yet), but it helps you build and maintain your crawlers faster. When a website adds JavaScript rendering, you don't have to rewrite everything, only switch to one of the browser crawlers. When you later find a great API to speed up your crawls, flip the switch back.

Downloads: 0 This Week

Last Update: 2026-02-06
See Project
5

Symfony Panther

A browser testing and web crawling library for PHP and Symfony

Symfony Panther is a browser testing and web scraping tool that allows developers to interact with websites programmatically. It uses headless Chrome or Firefox to automate browser tasks, making it suitable for end-to-end testing and data extraction. Panther integrates well with Symfony and PHPUnit, allowing developers to write comprehensive tests for web applications.

Downloads: 0 This Week

Last Update: 2026-01-08
See Project
6

Linkedin Scraper

A library that scrapes LinkedIn for user data

Linkedin Scraper is a library that scrapes LinkedIn for user data. Versions 2.0.0 and earlier are called linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper. LinkedIn has recently blocked people from viewing certain profiles without having previously signed in. Setting scrape=False therefore stops the library from scraping the profile automatically, although Chrome will still open the LinkedIn page. You can log in and log out, and the cookie will stay in the...

Downloads: 1 This Week

Last Update: 2026-04-10
See Project
7

Rod

A DevTools driver for web automation and scraping

Rod is a high-level driver for the DevTools Protocol. It's widely used for web automation and scraping. Rod can automate most things in the browser that can be done manually. Its chained context design makes it intuitive to time out or cancel a long-running task. It auto-waits for elements to be ready, is debugging-friendly (automatic input tracing, remote monitoring of headless browsers), is thread-safe for all operations, and automatically finds or downloads a browser.

Downloads: 0 This Week

Last Update: 2024-07-12
See Project
8

chromedp

A faster, simpler way to drive browsers supporting the Chrome DevTools Protocol

...Because it communicates directly with Chrome’s debugging interface, chromedp offers high performance and reliable automation compared with tools that rely on intermediary drivers. It is frequently used for web scraping, automated testing, performance monitoring, and browser-based data extraction workflows.

Downloads: 0 This Week

Last Update: 2026-03-23
See Project
9

CloakBrowser

Stealth Chromium that passes every bot detection test

...The project integrates with Playwright and Puppeteer while preserving familiar automation workflows for developers. It also supports isolated browser profiles with configurable fingerprints, making it useful for testing, automation research, scraping, QA, and multi-profile browser environments. The ecosystem includes a self-hosted browser profile manager that functions as an open-source alternative to commercial anti-detect browsers.

Downloads: 40 This Week

Last Update: 2026-05-21
See Project
10

Lightpanda Browser

Lightpanda: the headless browser designed for AI and automation

Lightpanda is an open-source headless browser designed specifically for automation, artificial intelligence workflows, and large-scale web interaction tasks. Unlike traditional browsers that include full graphical rendering engines meant for human users, Lightpanda is built from scratch to operate entirely in headless mode, focusing only on the components required for programmatic web interaction. This design allows it to execute JavaScript and interact with web pages while avoiding the...

Downloads: 11 This Week

Last Update: 4 days ago
See Project
11

Automa

A Chrome extension for automating your browser by connecting blocks

Automa is a browser extension for browser automation. From auto-filling forms and performing repetitive tasks to taking screenshots and scraping data from websites, it's up to you what you want to do with this extension. Automa provides various kinds of blocks that help you automate, and all you need to do is connect them. Want your workflow to run every day or every time you visit a specific website? You can set the workflow trigger on the trigger block.

Downloads: 9 This Week

Last Update: 2025-08-11
See Project
12

Ferrum

Headless Chrome Ruby API

...Because of this low-level access, Ferrum offers greater flexibility and performance compared to traditional WebDriver-based automation tools. It is commonly used for tasks such as web scraping, automated testing, crawling, and screenshot or PDF generation.

Downloads: 1 This Week

Last Update: 2026-03-23
See Project
13

Happy DOM

Happy DOM is a JavaScript implementation of a web browser

Happy DOM is a JavaScript implementation of a web browser without its graphical user interface. It includes many web standards from WHATWG DOM and HTML. The goal of Happy DOM is to emulate enough of a web browser to be useful for testing, scraping web sites, and server-side rendering. Happy DOM focuses heavily on performance and can be used as an alternative to JSDOM. Happy DOM now supports Declarative Shadow DOM which can be used for server-side rendering of web components. This package makes it possible to use Happy DOM with Jest.

Downloads: 0 This Week

Last Update: 2026-04-13
See Project
14

Cuprite

Headless Chrome/Chromium driver for Capybara

...By communicating directly with Chromium-based browsers through the DevTools protocol, Cuprite enables faster and more reliable browser automation for testing and scraping tasks. The driver integrates seamlessly with Capybara, allowing developers to write feature tests that simulate real user interactions with web applications. Because it uses headless Chrome by default, Cuprite is well suited for automated test environments and continuous integration pipelines. Developers can also run tests with a visible browser window for debugging purposes during development.

Downloads: 0 This Week

Last Update: 2026-03-14
See Project
15

Scrapy-Redis

Redis-based components for Scrapy

You can start multiple spider instances that share a single Redis queue. This is best suited for broad multi-domain crawls. Scraped items get pushed into a Redis queue, meaning that you can start as many post-processing processes as needed, all sharing the items queue. Components include a scheduler with a duplication filter, an item pipeline, and base spiders. The default request serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between Python versions. Version...

Downloads: 0 This Week

Last Update: 2024-07-06
See Project
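
The shared queue and duplication filter described in the Scrapy-Redis entry above can be illustrated with a small sketch. This is not Scrapy-Redis code and uses none of its API: SharedQueue, fingerprint, and run_spiders are hypothetical names, and an in-memory deque and set stand in for the Redis structures that a real deployment would share across processes.

```python
import hashlib
from collections import deque


class SharedQueue:
    """In-memory stand-in for a shared Redis request queue (hypothetical)."""

    def __init__(self):
        self._pending = deque()  # URLs waiting to be crawled
        self._seen = set()       # fingerprints of every URL ever enqueued

    @staticmethod
    def fingerprint(url: str) -> str:
        # Hash the URL so the duplicate filter stores fixed-size keys.
        return hashlib.sha1(url.encode("utf-8")).hexdigest()

    def push(self, url: str) -> bool:
        """Enqueue url unless it was seen before; return True if enqueued."""
        fp = self.fingerprint(url)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        self._pending.append(url)
        return True

    def pop(self):
        """Return the next URL, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None


def run_spiders(queue: SharedQueue, spider_names):
    """Let several 'spiders' take turns pulling work from the same queue."""
    handled = []
    while True:
        progressed = False
        for name in spider_names:
            url = queue.pop()
            if url is None:
                continue
            handled.append((name, url))
            progressed = True
        if not progressed:
            return handled
```

Because every spider pulls from one queue and the filter is consulted on push, a URL submitted twice is crawled once, no matter which spider instance picks it up.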
16

Goutte

Goutte, a simple PHP Web Scraper

Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method.

Downloads: 0 This Week

Last Update: 2023-04-01
See Project
17

SecretAgent

The web scraper that's nearly impossible to block

SecretAgent is a headless browser that’s nearly impossible to detect. It achieves this by emulating real users. It also has powerful auto-replay functionality that lets you create and debug scripts in record-setting time.

Downloads: 0 This Week

Last Update: 2023-08-14
See Project
18

Browser Pool

A Node.js library to easily manage and rotate a pool of web browsers

...We created Browser Pool because we regularly needed to execute tasks concurrently in many headless browsers and their pages, but we did not want to worry about launching browsers, closing browsers, restarting them after crashes and so on. We also wanted to easily and reliably manage the whole browser/page lifecycle. You can use Browser Pool for scraping the internet at scale, testing your website in multiple browsers at the same time or launching web automation robots.

Downloads: 4 This Week

Last Update: 2023-06-12
See Project
19

Erik

Erik is a headless browser based on WebKit

Erik is a headless browser based on WebKit, written in Swift, allowing developers to run functional tests and manipulate web pages using JavaScript.

Downloads: 0 This Week

Last Update: 2025-01-29
See Project
20

jBrowserDriver

A programmable, embeddable web browser driver

jBrowserDriver is a programmable, embeddable web browser driver compatible with the Selenium WebDriver specification, implemented in pure Java and based on WebKit.

Downloads: 0 This Week

Last Update: 2025-01-29
See Project
21

Chromeless

Chrome automation made simple. Runs locally or headless on AWS Lambda

Chromeless is an open-source JavaScript library designed to simplify browser automation by controlling a Chrome or Chromium browser through an easy-to-use API. The project was created to make headless browser scripting more accessible for tasks such as automated testing, web scraping, and screenshot generation. Instead of manually interacting with browser debugging protocols, developers can use Chromeless commands to navigate pages, fill forms, click elements, and extract information programmatically. The library supports running Chrome locally during development or executing headless browser sessions remotely on cloud infrastructure such as AWS Lambda. ...

Downloads: 2 This Week

Last Update: 2026-03-13
See Project
22

WKZombie

WKZombie is a Swift framework for iOS/OSX to navigate within websites

WKZombie is a Swift framework for iOS/OSX to navigate within websites and collect data without the need for a user interface or API; in other words, a headless browser. It can be used to run automated tests/snapshots and manipulate websites using JavaScript. WKZombie is an iOS/OSX web browser without a graphical user interface. It was developed as an experiment in order to familiarize myself with using functional concepts written in Swift 4. It incorporates WebKit (WKWebView) for rendering and...

Downloads: 0 This Week

Last Update: 2023-06-29
See Project
23

Surf

Stateful programmatic web browsing in Go

Surf is a Go library that implements a virtual web browser, allowing developers to programmatically interact with web pages as a real browser would.

Downloads: 0 This Week

Last Update: 2025-01-29
See Project
24

RoboBrowser

On-the-fly web scraper

RoboBrowser is a WebKit-powered browser built for web scraping. It loads the requested web page, saves the page source to disk, and passes its path to a PHP script as the first parameter.

Downloads: 0 This Week

Last Update: 2016-09-18
See Project
25

webStraktor

...It adheres to the Robots Exclusion Protocol and it can be configured to operate in an anonymous way by connecting to the predominant types of web proxy servers. webStraktor extends the functionality of web crawlers, spiders or bots by integrating scraping and crawling capabilities.

Downloads: 0 This Week

Last Update: 2014-04-25
See Project
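
As a rough illustration of what adhering to the Robots Exclusion Protocol means for a crawler such as the one described in the webStraktor entry above, the sketch below filters candidate URLs against a robots.txt file using Python's standard urllib.robotparser module. It does not reflect webStraktor's own implementation; build_parser, allowed_urls, the EXAMPLE_ROBOTS rules, and the "demo-bot" user agent are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls):
    """Keep only the URLs that the rules permit for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    rules = build_parser(EXAMPLE_ROBOTS)
    candidates = [
        "https://example.com/index.html",
        "https://example.com/private/data.html",
    ]
    for url in allowed_urls(rules, "demo-bot", candidates):
        print(url)
```

A crawler would typically run this check before each request, so that disallowed paths are never fetched in the first place.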