Showing 93 open source projects for "scrape"

  • 1
    Linkedin Scraper

    A library that scrapes LinkedIn for user data

    ...Version 2.0.0 and earlier is called linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper. The reason is that LinkedIn has recently blocked people from viewing certain profiles without having previously signed in. So with scrape=False, the profile is not scraped automatically, but Chrome still opens the LinkedIn page. You can log in and log out, and the cookie will stay in the browser without affecting your profile views. When you then run person.scrape(), it scrapes the profile and closes the browser. A Chrome driver is created by default; if you pass in a driver, that one is used instead.
    Downloads: 0 This Week
  • 2
    JMX Exporter

    A process for exposing JMX Beans via HTTP for Prometheus consumption

    JMX to Prometheus exporter: a collector that can configurably scrape and expose MBeans of a JMX target. This exporter is intended to be run as a Java agent, exposing an HTTP server and serving metrics of the local JVM. It can also be run as a standalone HTTP server that scrapes remote JMX targets, but this has various disadvantages, such as being harder to configure and being unable to expose process metrics (e.g., memory and CPU usage).
    Downloads: 6 This Week
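The exporter's output is plain text in the Prometheus exposition format, served over HTTP. To show roughly what that format looks like, here is a minimal Python sketch; it is not the JMX Exporter's code (which runs as a Java agent), and the metric names, values, and the serve() helper are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical gauges; a real exporter would read these from JMX MBeans.
METRICS = {
    "jvm_memory_heap_used_bytes": ("Used heap memory in bytes.", 123456789.0),
    "jvm_threads_current": ("Current number of live threads.", 42.0),
}


def render_exposition(metrics):
    """Render {name: (help, value)} in the Prometheus text exposition format.

    Help text is not escaped here, so it must not contain backslashes or newlines.
    """
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_exposition(METRICS).encode("utf-8")
        self.send_response(200)
        # Content type of the text exposition format, version 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics on the given (hypothetical) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() would expose the sample gauges at http://localhost:8000/metrics; a real exporter would refresh the values on each scrape instead of returning a fixed dict.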
  • 3
    Elasticsearch Exporter

    Elasticsearch stats exporter for Prometheus

    Prometheus exporter for various metrics about Elasticsearch, written in Go. The exporter fetches information from an Elasticsearch cluster on every scrape, so a scrape interval that is too short can impose load on ES master nodes, particularly if you run with --es.all and --es.indices. We suggest you measure how long fetching /_nodes/stats and /_all/_stats takes for your ES cluster to determine whether your scrape interval is too short. As a last resort, you can scrape this exporter using a dedicated job with its own scrape interval. ...
    Downloads: 1 This Week
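One way to follow the suggestion to time /_nodes/stats and /_all/_stats is a small script like the sketch below. It assumes an unauthenticated cluster at http://localhost:9200 (Elasticsearch's default HTTP port); authentication and TLS are not handled, and the 50% threshold in interval_too_short is an arbitrary assumption rather than guidance from the exporter's authors.

```python
import time
import urllib.request

# Assumption: an unauthenticated cluster reachable at this address.
ES_URL = "http://localhost:9200"
# The two endpoints the exporter's documentation suggests timing.
STATS_PATHS = ("/_nodes/stats", "/_all/_stats")


def time_fetch(url, timeout=30.0):
    """Return the seconds taken to fetch and fully read `url`."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.perf_counter() - start


def interval_too_short(durations, scrape_interval_s, max_fraction=0.5):
    """Heuristic (an assumption): flag the interval if the stats fetches
    together take more than `max_fraction` of it."""
    return sum(durations) > scrape_interval_s * max_fraction


def check_cluster(base_url=ES_URL, scrape_interval_s=30.0):
    """Time each stats endpoint once and apply the heuristic."""
    durations = [time_fetch(base_url + path) for path in STATS_PATHS]
    return durations, interval_too_short(durations, scrape_interval_s)
```

Running check_cluster() a few times at different load levels gives a better picture than a single measurement.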
  • 4
    Parsera

    Lightweight library for scraping websites with LLMs

    Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.
    Downloads: 0 This Week
  • 5
    Automa

    A Chrome extension for automating your browser by connecting blocks

    ...Try a workflow from the marketplace: dozens of workflows have been shared by Automa users, and you can add and customize them. Auto-fill forms, do a repetitive task, take a screenshot, or scrape website data; the choice is yours. You can even schedule when the automation will execute! Browse the Automa marketplace, where you can share and download workflows with others.
    Downloads: 28 This Week
  • 6
    rvest

    Simple web scraping for R

    rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, and is inspired by libraries like Beautiful Soup and RoboBrowser. If you're scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you're respecting robots.txt and not hammering the site with too many requests.
    Downloads: 2 This Week
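rvest and polite are R packages; the sketch below only illustrates the same two courtesies in Python with the standard library's urllib.robotparser: checking robots.txt before fetching, and spacing out requests. The PoliteGate class and its 5-second fallback delay are assumptions, not part of either package.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteGate:
    """Check robots.txt rules and keep a minimum gap between requests to one host."""

    def __init__(self, robots_txt, user_agent, default_delay=5.0):
        self.user_agent = user_agent
        self.parser = RobotFileParser()
        # parse() takes the robots.txt body as a list of lines.
        self.parser.parse(robots_txt.splitlines())
        delay = self.parser.crawl_delay(user_agent)
        # Honour Crawl-delay when present; otherwise fall back to an assumed default.
        self.delay = float(delay) if delay is not None else default_delay
        self._last_request = None

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch `url`."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait_turn(self):
        """Sleep just long enough to keep `self.delay` seconds between requests."""
        now = time.monotonic()
        if self._last_request is not None:
            remaining = self.delay - (now - self._last_request)
            if remaining > 0:
                time.sleep(remaining)
        self._last_request = time.monotonic()
```

In a crawl loop you would skip any URL for which allowed() is false and call wait_turn() before each request that is allowed.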
  • 7
    DocSearch

    The easiest way to add search to your documentation

    ...DocSearch understands how the user input fits into the context of your project and instantly presents the most relevant content with fewer interactions than any other method. With a design very close to the native experience on mobile, it leverages users' familiarity with the interaction patterns of each OS. We scrape your documentation or technical blog, configure the Algolia application, and send you a small snippet to integrate DocSearch into your website, along with an invite to your fully configured Algolia application. It's that simple: you don't need to configure any settings or even have an Algolia account.
    Downloads: 2 This Week
  • 8
    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command-line arguments. By scraping and reviewing regularly, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers.
    Downloads: 1 This Week
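JobFunnel's own duplicate detection is not described in this listing. As an illustration only, the sketch below merges newly scraped postings into a master list keyed on a normalized title and company, then writes the result as CSV; the column names and the dedupe key are hypothetical.

```python
import csv
import io

# Hypothetical columns for the master CSV.
FIELDS = ["title", "company", "location", "url"]


def dedupe_key(job):
    """Normalize whitespace and case so trivially different copies collide."""
    title = " ".join(job["title"].split()).lower()
    company = " ".join(job["company"].split()).lower()
    return (title, company)


def merge_jobs(master, scraped):
    """Append scraped jobs whose key is not already in the master list."""
    seen = {dedupe_key(job) for job in master}
    merged = list(master)
    for job in scraped:
        key = dedupe_key(job)
        if key not in seen:
            seen.add(key)
            merged.append(job)
    return merged


def to_csv(rows):
    """Serialize rows (dicts with FIELDS keys) to CSV text with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

An exact-key match like this misses reposts with reworded titles; fuzzier matching would be needed to catch those.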
  • 9
    crawlee

    A web scraping and browser automation library for Node.js

    ...It keeps your proxies healthy by rotating them smartly with good fingerprints that make your crawlers look human-like. It's not unblockable, but it will save you money in the long run. Crawlee is built by people who scrape for a living and use it every day to scrape millions of pages. Meet our community on Discord. We believe websites are best scraped in the language they're written in. Crawlee runs on Node.js and it's built in TypeScript to improve code completion in your IDE, even if you don't use TypeScript yourself.
    Downloads: 0 This Week
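Crawlee is a Node.js library, so the Python sketch below is not its API; it only illustrates the general idea of rotating through proxies and retiring ones that keep failing. The ProxyRotator class and its failure threshold are assumptions, and browser fingerprinting is not modelled.

```python
class ProxyRotator:
    """Round-robin over proxies, retiring any that fail too often in a row."""

    def __init__(self, proxies, max_failures=3):
        self.healthy = list(proxies)
        self.failures = {p: 0 for p in self.healthy}
        self.max_failures = max_failures
        self._next = 0

    def get(self):
        """Return the next healthy proxy in rotation."""
        if not self.healthy:
            raise RuntimeError("no healthy proxies left")
        proxy = self.healthy[self._next % len(self.healthy)]
        self._next += 1
        return proxy

    def report(self, proxy, ok):
        """Record a request outcome; a success resets the failure count."""
        if proxy not in self.healthy:
            return
        if ok:
            self.failures[proxy] = 0
        else:
            self.failures[proxy] += 1
            if self.failures[proxy] >= self.max_failures:
                self.healthy.remove(proxy)
```

Resetting the count on success means only consecutive failures retire a proxy, which tolerates the occasional flaky request.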
  • 10
    Prometheus Redis Metrics Exporter

    Prometheus Exporter for Redis Metrics. Supports Redis 2.x, 3.x, 4.x, 5

    ...If authentication is needed for the Redis instances, you can set the password via the exporter's --redis.password command-line option (this means you can currently use only one password across the instances you scrape this way; use several exporters if this is a problem). If your Redis instance requires authentication, there are several ways to supply a username (new in Redis 6.x with ACLs) and a password.
    Downloads: 0 This Week
  • 11
    Polymarket Data

    Polymarket Data Retriever that fetches, processes, and structures data

    ...The system operates as a multi-stage pipeline that integrates data from both off-chain APIs and on-chain event sources, enabling users to reconstruct full trading activity including markets, order events, and executed trades. It begins by fetching market metadata such as questions, outcomes, and trading volumes, then proceeds to scrape order-filled events from a GraphQL-based subgraph, and finally transforms these raw events into structured trade-level records with calculated prices and directions. One of its key strengths is its ability to run incrementally and resume operations automatically, making it suitable for long-running data collection without duplication or data loss.
    Downloads: 9 This Week
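The ability to run incrementally and resume usually comes down to persisting a cursor after each batch is committed. A minimal sketch, assuming a hypothetical fetch_page(after_id) that returns events sorted by a numeric "id" field; this is not the project's code or the subgraph's schema.

```python
import json
import os
import tempfile


def load_cursor(path):
    """Return the last committed id, or None on a fresh start."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)["last_id"]


def save_cursor(path, last_id):
    """Write to a temp file, then rename, so a crash never leaves a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"last_id": last_id}, f)
    os.replace(tmp, path)


def run_incremental(fetch_page, handle, checkpoint_path):
    """Process pages after the saved cursor; return how many events were handled.

    fetch_page(after_id) -> list of events sorted by "id" (empty when caught up).
    """
    cursor = load_cursor(checkpoint_path)
    processed = 0
    while True:
        page = fetch_page(cursor)
        if not page:
            break
        for event in page:
            handle(event)
            processed += 1
        cursor = page[-1]["id"]
        save_cursor(checkpoint_path, cursor)
    return processed
```

If the process dies mid-page, that page is fetched again on restart, so handle should be idempotent (for example, an upsert keyed on the event id) to avoid duplicates.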
  • 12
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 14 This Week
  • 13
    Ferret

    Declarative web scraping

    A web scraping system aiming to simplify data extraction from the web. Ferret has an easy-to-learn declarative query language that lets you focus on the data you need. Ferret can scrape JS-rendered pages, handle all page events, and emulate user interactions. It was designed as a library from the ground up, so it can be easily embedded into any Go application. Ferret uses Chrome/Chromium via the Chrome DevTools Protocol to handle dynamically rendered web pages. Ferret is extremely extensible, and creating custom functions and types is straightforward. ...
    Downloads: 1 This Week
  • 14
    Actionbook

    Browser action engine for AI agents. 10× faster, resilient by design

    Actionbook is an AI-centric automation framework that equips intelligent agents with the ability to interact with real, live web pages in a reliable and scalable way, eliminating the guesswork involved in navigating modern dynamic sites. Instead of having agents blindly scrape HTML or try to click things, Actionbook supplies up-to-date action manuals and verified DOM structure, letting agents know exactly how to click, type, and navigate complex interfaces such as SPAs or streaming UIs. Because the action manuals codify expected flows and DOM targets, this design makes browsing up to 10× faster and far more resilient than ad-hoc approaches that break on minor page changes. ...
    Downloads: 8 This Week
  • 15
    Scweet

    Scrape tweets, profiles, followers and following from Twitter/X

    Scweet is a Python-based Twitter/X scraping library and CLI designed to collect tweets, profile timelines, followers, following lists, and user profile data without requiring the official Twitter/X API or a developer account. Instead of depending on deprecated unauthenticated scraping methods, it works by using X’s web GraphQL API together with authenticated browser cookies, which gives it a more current and practical approach for data extraction. The project supports a broad set of...
    Downloads: 4 This Week
  • 16
    kimuraframework

    AI-first Ruby framework for building fast, flexible web scraping spiders

    Kimurai is an open source web scraping framework written in Ruby that simplifies the process of building automated data extraction tools. It provides a clean domain-specific language that allows developers to define scraping logic and data schemas with minimal boilerplate code. Kimurai can use AI-assisted extraction to identify where data resides in HTML pages, automatically generating selectors that are cached for future use so subsequent scraping runs operate with pure Ruby performance....
    Downloads: 3 This Week
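The idea of inferring selectors once and caching them for later runs can be sketched as below. Kimurai is written in Ruby and this Python class is not its API; infer_selector stands in for a hypothetical AI-assisted step, and the JSON persistence format is an assumption.

```python
import json


class SelectorCache:
    """Cache inferred CSS selectors per (site, field) so inference runs only once."""

    def __init__(self, infer_selector):
        # infer_selector(site, field, html) -> str is a hypothetical, expensive
        # step (e.g. a model call); it is only invoked on a cache miss.
        self._infer = infer_selector
        self._cache = {}

    def selector_for(self, site, field, html):
        key = (site, field)
        if key not in self._cache:
            self._cache[key] = self._infer(site, field, html)
        return self._cache[key]

    def dump(self):
        """Serialize the cache so later runs can skip inference entirely."""
        return json.dumps({f"{s}|{f}": sel for (s, f), sel in self._cache.items()})

    @classmethod
    def load(cls, data, infer_selector):
        """Rebuild a cache from dump() output (site names must not contain '|')."""
        cache = cls(infer_selector)
        for key, sel in json.loads(data).items():
            site, field = key.split("|", 1)
            cache._cache[(site, field)] = sel
        return cache
```

After the first run, every lookup is a plain dictionary hit, which is the point of caching: the expensive step is paid once per site and field.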
  • 17
    SimpDL

    A tool to scrape images from SimpCity

    SimpDL is an open-source media downloading tool designed to retrieve content from subscription-based or creator platforms, focusing on simplicity and ease of use. It enables users to download images, videos, and other media associated with specific creators or accounts, often through authenticated sessions. The project emphasizes a straightforward workflow where users provide login credentials or tokens, and the tool handles the retrieval and storage of content automatically. It is designed...
    Downloads: 0 This Week
  • 18
    Rod

    A Devtools driver for web automation and scraping

    Rod is a high-level driver for the DevTools Protocol, widely used for web automation and scraping. Rod can automate most things in the browser that can be done manually. Its chained context design makes it intuitive to time out or cancel long-running tasks. It automatically waits for elements to be ready. It is debugging-friendly, with automatic input tracing and remote monitoring of the headless browser. All operations are thread-safe. It can automatically find or download a browser. High-level helpers like WaitStable, WaitRequestIdle,...
    Downloads: 0 This Week
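Rod is a Go library and its auto-wait is built in. One common way to express "wait until an element is ready" is polling with a deadline, sketched below in Python; wait_until and its default timeout and interval are assumptions, not Rod's implementation.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns a truthy value, then return that value.

    Raises TimeoutError if the condition is still unmet after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

Returning the predicate's value lets a caller write element = wait_until(lambda: find_element("#submit")), where find_element is a hypothetical lookup that returns None until the element exists.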
  • 19
    DenchClaw

    Fully Managed OpenClaw Framework for all knowledge work ever

    ...It can ingest data from sources such as Google Drive, Notion, Gmail, and CRM platforms, consolidating everything into a centralized workspace for analysis and action. One of its most distinctive capabilities is its ability to use the user’s existing browser session, enabling it to log into services, scrape data, and perform actions like outreach or research as if it were the user.
    Downloads: 4 This Week
  • 20
    Firecrawl MCP Server

    Adds powerful web scraping and search to Cursor and Claude

    firecrawl-mcp-server is the official MCP integration for Firecrawl that brings high-recall web scraping, crawling, and search into IDEs and agent runtimes. It exposes tools for single-page scrape, multi-URL batch jobs, site discovery, and search enrichment, returning cleaned, structured content suitable for downstream LLM reasoning. The server is designed to run with Firecrawl’s hosted API or self-hosted deployments, making it flexible for enterprise data-governance requirements. Built-in behaviors include JavaScript rendering, automatic retries, and streamable HTTP so long pages and large crawls can flow incrementally into agents. ...
    Downloads: 0 This Week
  • 21
    Roach

    The complete web scraping toolkit for PHP

    Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well. It’s your all-in-one resource for web scraping in PHP. Roach doesn’t depend on a specific framework. Instead, you can use the core package on its own or install one of the framework-specific adapters. ...
    Downloads: 0 This Week
  • 22
    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag soup; it will create a sensible parse tree. The parser will make...
    Downloads: 0 This Week
  • 23
    Soketi

    Just another simple, fast, and resilient open-source WebSockets server

    Ever dreamed about serverless WebSockets? Soketi can be deployed to Cloudflare Workers, all around the world and closer to your users, with the same Pusher protocol. Powered by Cloudflare's Durable Objects and KV, it can achieve great speeds at the edge for your users.
    Downloads: 0 This Week
  • 24
    SOCKS5 of Death
    SOCKS5 of Death is a SOCKS5 proxy scraper. It can scrape from 10 different sites, tests the proxies, removes dead ones, and exports the results to TXT or CSV.
    Downloads: 4 This Week
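A sketch of the "test, remove dead, export" steps in Python, assuming proxies are (host, port) pairs; it is not the project's code. The probe performs only the SOCKS5 method-negotiation step from RFC 1928 (the client offers "no authentication" and a usable server answers 0x05 0x00), which is a weaker check than routing a real request through the proxy.

```python
import csv
import io
import socket

# RFC 1928 greeting: version 5, one method offered, method 0x00 (no authentication).
GREETING = b"\x05\x01\x00"


def greeting_accepted(reply):
    """True if the server answered version 5 and selected 'no authentication'."""
    return len(reply) == 2 and reply[0] == 0x05 and reply[1] == 0x00


def probe_socks5(host, port, timeout=5.0):
    """Connect and perform only the SOCKS5 method-negotiation step."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(GREETING)
            return greeting_accepted(sock.recv(2))
    except OSError:  # refused, unreachable, or timed out
        return False


def keep_alive(proxies, probe=probe_socks5):
    """Drop proxies whose probe fails."""
    return [(host, port) for host, port in proxies if probe(host, port)]


def export_txt(proxies):
    return "".join(f"{host}:{port}\n" for host, port in proxies)


def export_csv(proxies):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["host", "port"])
    writer.writerows(proxies)
    return buf.getvalue()
```

Probing many proxies one at a time is slow; in practice the probes would typically run concurrently, for example with a thread pool.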
  • 25
    Overdrive Ebook Scraper

    Perform OCR on an Overdrive Read ebook to convert it to plain text.

    *This project has moved to GitHub.*
    Downloads: 0 This Week