Showing 20 open source projects for "scraping"

View related business solutions
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 1
    X-Crawl

    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Python-Spider

    Python-Spider

    Python3 web crawler practice

    ...As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe concurrency or scheduling to crawl multiple pages, and techniques to handle common web-scraping issues. For people wanting to get hands-on with building scrapers, collecting data, or learning how to navigate web programming in Python, this repository acts as a didactic reference or starting point. Because it’s published publicly under an open license, users are free to fork and adapt the code.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Parsera

    Parsera

    Lightweight library for scraping web-sites with LLMs

    Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Symfony DomCrawler

    Symfony DomCrawler

    Eases DOM navigation for HTML and XML documents

    Symfony DomCrawler is a PHP component that provides powerful tools for navigating and extracting data from HTML and XML documents. It allows developers to parse, filter, and manipulate web pages using CSS selectors and XPath expressions. DomCrawler is widely used for web scraping, testing, and processing structured content, and integrates well with other Symfony components like BrowserKit.
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    Jikan REST

    Jikan REST

    The REST API for Jikan

    Jikan REST is an unofficial RESTful API for MyAnimeList.net, providing access to anime, manga, and user data by scraping the website. It allows developers to integrate MyAnimeList data into their applications without relying on the official API. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Article Extractor

    Article Extractor

    To extract main article from given URL with Node.js

    A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Helium

    Helium

    Lighter web automation with Python

    ...It replaces verbose boilerplate code with natural language-like API calls such as click("Login") or write("hello", into="Name"). Helium manages browser setup, waits, and teardown, enabling quick development of scripts for testing, scraping, or task automation without requiring deep Selenium knowledge.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Spatie Crawler

    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    reCAPTCHA

    reCAPTCHA

    PHP client library for reCAPTCHA, a free service

    ...The ecosystem supports mobile and enterprise variants, but the repo focuses on common web integrations and best practices for verifying the token securely. Deployed correctly, reCAPTCHA reduces credential stuffing, bot sign-ups, and scraping without degrading the experience for typical users.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 10
    My Python Eggs

    My Python Eggs

    Python Examples

    ...Rather than being a single cohesive application, it functions as a repository of utilities that demonstrate how Python can be used to solve everyday problems and automate repetitive tasks. The scripts cover a wide range of topics, including file management, networking, system monitoring, web scraping, and even simple games, making it a versatile learning resource. Many of the programs are designed to reduce manual workload by automating tasks such as renaming files, scanning directories, or checking system information. The repository also includes examples of more advanced concepts like multithreading, API interaction, and GUI development, providing a gradual learning curve for beginners.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    User Agents

    User Agents

    A JavaScript library for generating random user agents with data

    User Agents is a JavaScript library that generates realistic and up-to-date user agent strings and browser fingerprints based on real-world usage data. The library is designed to help developers simulate authentic browser traffic patterns, which is particularly useful in web scraping, testing, and automation scenarios. Unlike simpler random user agent generators, it uses frequency-weighted datasets to ensure that generated values reflect how browsers are actually used in the wild. The dataset is updated automatically on a daily basis, ensuring that generated user agents remain current and relevant over time. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    prometheus-net

    prometheus-net

    .NET library to instrument your code with Prometheus metrics

    This is a .NET library for instrumenting your applications and exporting metrics to Prometheus.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Goutte

    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Zero Site Protector

    Zero Site Protector

    Human verification & attack prevention for website security

    The zero-site-protector plugin is a powerful security tool for your website that provides multiple layers of protection to safeguard against unauthorized access and potential attacks. The plugin includes features such as human verification, which ensures that only legitimate users are able to access your site. It also includes protection against various types of attacks such as cross-site scripting (XSS) and SQL injection. Additionally, the plugin allows you to block access to your...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Riverpod

    Riverpod

    A reactive caching and data-binding framework

    Riverpod is a modern, framework-agnostic state management and dependency injection library for Dart and Flutter that emphasizes compile-time safety and explicit data flow. Instead of tying state to widget trees, it models state as providers that can be read anywhere, making logic more testable and decoupled from UI. It supports synchronous state, async data (futures/streams) with built-in loading/error handling, and advanced patterns like derived state and provider scopes. Riverpod’s design...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    RobotsDisallowed

    RobotsDisallowed

    A curated list of the most common and most interesting robots.txt

    RobotsDisallowed is a public catalog that tracks websites and organizations explicitly blocking AI and web-scraping crawlers in their robots.txt or related mechanisms. It focuses on documenting the growing trend of content owners asserting control over how their data is used for model training and automated harvesting. The project aggregates domains, notes the targeted bots or user agents, and surfaces patterns for researchers, policymakers, and tool builders.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Enlive

    Enlive

    Selector-based templating and transformation system for Clojure

    Enlive is a Clojure library for HTML templating, transformation, and scraping, supporting composable manipulation of HTML/XML in a functional style. It allows selecting, transforming, and generating HTML fragments using CSS selectors, and supports server-side template composition, dynamic pages, and content rewriting. By default selector-transformation pairs are run sequentially. When you know that several transformations are independent, you can now specify (as an optimization) to process them in lockstep. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Node Crawler

    Node Crawler

    Web Crawler/Spider for NodeJS + server-side jQuery

    Most powerful, popular and production crawling/scraping package for Node, happy hacking.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19

    HTML DOM Parser

    HTML parser which can be used for screen-scraping applications

    htmldom parses the HTML file and provides methods for iterating and searching the parse tree in a similar way as Jquery. To report bugs please mail me at bhimsen.pes@gmail.com
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20

    dataflowkit

    Golang framework for scraping data from web pages

    Golang Web Scraper library for extracting data from web pages. Save results as CSV, JSON, XML
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB