27 projects for "scraping" with 2 filters applied:

  • 1
    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 0 This Week
  • 2
    Python-Spider

    Python3 web crawler practice

    ...As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe concurrency or scheduling to crawl multiple pages, and techniques to handle common web-scraping issues. For people wanting to get hands-on with building scrapers, collecting data, or learning how to navigate web programming in Python, this repository acts as a didactic reference or starting point. Because it’s published publicly under an open license, users are free to fork and adapt the code.
    Downloads: 0 This Week
  • 3
    Zendriver

    A blazing fast, async-first, undetectable webscraping

    Zendriver is a modern Python web automation and scraping framework that leverages the Chrome DevTools Protocol to provide fast, asynchronous control over real browser instances. Unlike traditional tools that rely on Selenium or WebDriver, Zendriver communicates directly with the browser through CDP, enabling higher performance and more precise control over browser behavior. The framework is designed to be difficult to detect by anti-bot systems, making it suitable for advanced scraping and automation use cases where stealth is important. ...
    Downloads: 1 This Week
  • 4
    Parsera

    Lightweight library for scraping web-sites with LLMs

    Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.
    Downloads: 1 This Week
  • 5
    Symfony DomCrawler

    Eases DOM navigation for HTML and XML documents

    Symfony DomCrawler is a PHP component that provides powerful tools for navigating and extracting data from HTML and XML documents. It allows developers to parse, filter, and manipulate web pages using CSS selectors and XPath expressions. DomCrawler is widely used for web scraping, testing, and processing structured content, and integrates well with other Symfony components like BrowserKit.
    Downloads: 0 This Week
  • 6
    Jikan REST

    The REST API for Jikan

    Jikan REST is an unofficial RESTful API for MyAnimeList.net, providing access to anime, manga, and user data by scraping the website. It allows developers to integrate MyAnimeList data into their applications without relying on the official API.
    Downloads: 0 This Week
  • 7
    Article Extractor

    To extract main article from given URL with Node.js

    A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.
    Downloads: 0 This Week
  • 8
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 23 This Week
  • 9
    Helium

    Lighter web automation with Python

    ...It replaces verbose boilerplate code with natural language-like API calls such as click("Login") or write("hello", into="Name"). Helium manages browser setup, waits, and teardown, enabling quick development of scripts for testing, scraping, or task automation without requiring deep Selenium knowledge.
    Downloads: 0 This Week
  • 10
    DrissionPage

    Python based web automation tool. Powerful and elegant

    DrissionPage is a Python-based automation framework that blends the capabilities of Selenium for browser automation with Requests-HTML for fast, headless web data extraction. It enables seamless switching between browser-controlled and headless HTTP sessions within the same interface. Ideal for web scraping, testing, and automation, DrissionPage is lightweight and highly efficient, offering more flexibility than standard Selenium or Requests usage alone.
    Downloads: 0 This Week
  • 11
    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 0 This Week
  • 12
    reCAPTCHA

    PHP client library for reCAPTCHA, a free service

    ...The ecosystem supports mobile and enterprise variants, but the repo focuses on common web integrations and best practices for verifying the token securely. Deployed correctly, reCAPTCHA reduces credential stuffing, bot sign-ups, and scraping without degrading the experience for typical users.
    Downloads: 2 This Week
  • 13
    chrome-remote-interface

    Chrome Debugging Protocol interface for Node.js

    ...By connecting to a running browser instance with remote debugging enabled, developers can inspect network activity, manipulate pages, capture screenshots, automate navigation, and analyze performance metrics. The library is commonly used for building browser automation scripts, testing tools, web scraping systems, and custom developer tooling that interacts directly with Chrome internals. Unlike higher-level tools such as Puppeteer, Chrome Remote Interface focuses on exposing the raw protocol in a more flexible and transparent way.
    Downloads: 0 This Week
  • 14
    User Agents

    A JavaScript library for generating random user agents with data

    User Agents is a JavaScript library that generates realistic and up-to-date user agent strings and browser fingerprints based on real-world usage data. The library is designed to help developers simulate authentic browser traffic patterns, which is particularly useful in web scraping, testing, and automation scenarios. Unlike simpler random user agent generators, it uses frequency-weighted datasets to ensure that generated values reflect how browsers are actually used in the wild. The dataset is updated automatically on a daily basis, ensuring that generated user agents remain current and relevant over time. ...
    Downloads: 0 This Week
  • 15
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 8 This Week
  • 16
    prometheus-net

    .NET library to instrument your code with Prometheus metrics

    This is a .NET library for instrumenting your applications and exporting metrics to Prometheus.
    Downloads: 0 This Week
  • 17
    Twint

    An advanced Twitter scraping & OSINT tool written in Python

    Twint is an advanced open-source Twitter scraping and OSINT tool written in Python that extracts tweets, user data, followers, likes, and more—without relying on Twitter’s API—making it highly useful for researchers, analysts, and hobbyists who want to bypass rate limits and access public Twitter data.
    Downloads: 0 This Week
  • 18
    RobotsDisallowed

    A curated list of the most common and most interesting robots.txt

    RobotsDisallowed is a public catalog that tracks websites and organizations explicitly blocking AI and web-scraping crawlers in their robots.txt or related mechanisms. It focuses on documenting the growing trend of content owners asserting control over how their data is used for model training and automated harvesting. The project aggregates domains, notes the targeted bots or user agents, and surfaces patterns for researchers, policymakers, and tool builders.
    Downloads: 0 This Week
  • 19
    Enlive

    Selector-based templating and transformation system for Clojure

    Enlive is a Clojure library for HTML templating, transformation, and scraping, supporting composable manipulation of HTML/XML in a functional style. It allows selecting, transforming, and generating HTML fragments using CSS selectors, and supports server-side template composition, dynamic pages, and content rewriting. By default selector-transformation pairs are run sequentially. When you know that several transformations are independent, you can now specify (as an optimization) to process them in lockstep. ...
    Downloads: 0 This Week
  • 20
    WKZombie

    WKZombie is a Swift framework for iOS/OSX to navigate within websites

    WKZombie is a Swift framework for iOS/OSX to navigate within websites and collect data without the need of a User Interface or API, also known as a Headless browser. It can be used to run automated tests/snapshots and manipulate websites using Javascript. WKZombie is an iOS/OSX web-browser without a graphical user interface. It was developed as an experiment in order to familiarize myself with using functional concepts written in Swift 4. It incorporates WebKit (WKWebView) for rendering and...
    Downloads: 0 This Week
  • 21

    TXR

    Text scraping and data munging language.

    NOTE: TXR used SourceForge for hosting binary downloads only. As of July 26, 2016, TXR uses the site Bintray (bintray.com) for hosting binary downloads. Do not look for new releases here! TXR combines a text scraping language with an innovative Lisp dialect geared toward data munging. TXR cribs ideas from modern scripting languages, multiple Lisp dialects, functional languages, and Unix tools.
    Downloads: 0 This Week
  • 22
    Node Crawler

    Web Crawler/Spider for NodeJS + server-side jQuery

    Most powerful, popular and production crawling/scraping package for Node, happy hacking.
    Downloads: 0 This Week
  • 23
    datalus
    PHP web API designed to simplify object handling (loading, saving, querying, displaying, and editing), abstract the data from its display structure and layout, and allow the target data to be delivered in any supported format without special logic.
    Downloads: 0 This Week
  • 24
    Aracnis is a Java-based framework for building distributed web spiders. These spiders can be used to accomplish a variety of tasks, for example, screen scraping and link integrity checking.
    Downloads: 0 This Week
  • 25
    This is a simple, small application for users of orkut.com that makes their scraping a fun, different experience...
    Downloads: 0 This Week