Showing 46 open source projects for "scraping"

  • 1
    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 0 This Week
  • 2
    Colly

    Elegant Scraper and Crawler Framework for Golang

    ...Clean API. Fast (>1k requests/sec on a single core). Manages request delays and maximum concurrency per domain. Automatic cookie and session handling. Sync/async/parallel scraping. Distributed scraping. Caching, automatic encoding of non-unicode responses. Robots.txt support. Google App Engine support.
    Downloads: 0 This Week
  • 3
    Python-Spider

    Python3 web crawler practice

    ...As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe concurrency or scheduling to crawl multiple pages, and techniques to handle common web-scraping issues. For people wanting to get hands-on with building scrapers, collecting data, or learning how to navigate web programming in Python, this repository acts as a didactic reference or starting point. Because it’s published publicly under an open license, users are free to fork and adapt the code.
    Downloads: 0 This Week
  • 4
    Parsera

    Lightweight library for scraping web-sites with LLMs

    Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.
    Downloads: 1 This Week
  • 5
    Zendriver

    A blazing fast, async-first, undetectable webscraping

    Zendriver is a modern Python web automation and scraping framework that leverages the Chrome DevTools Protocol to provide fast, asynchronous control over real browser instances. Unlike traditional tools that rely on Selenium or WebDriver, Zendriver communicates directly with the browser through CDP, enabling higher performance and more precise control over browser behavior. The framework is designed to be difficult to detect by anti-bot systems, making it suitable for advanced scraping and automation use cases where stealth is important. ...
    Downloads: 0 This Week
  • 6
    Symfony DomCrawler

    Eases DOM navigation for HTML and XML documents

    Symfony DomCrawler is a PHP component that provides powerful tools for navigating and extracting data from HTML and XML documents. It allows developers to parse, filter, and manipulate web pages using CSS selectors and XPath expressions. DomCrawler is widely used for web scraping, testing, and processing structured content, and integrates well with other Symfony components like BrowserKit.
    Downloads: 0 This Week
  • 7
    mtail

    Extract internal monitoring data from application logs

    ...It fills a monitoring niche by being the glue between applications that do not export their own internal state (other than via logs) and existing monitoring systems, so that system operators do not need to patch those applications to instrument them or write custom extraction code for every such application. The extraction is controlled by mtail programs, which define patterns and actions. Metrics are exported for scraping by a collector in JSON or Prometheus format over HTTP, or can be periodically sent to a collectd, StatsD, or Graphite collector socket. Precompiled binaries for released versions are available on the Releases page on GitHub. Using the latest production release binary is the recommended way of installing mtail.
    Downloads: 4 This Week
  • 8
    Elasticsearch Exporter

    Elasticsearch stats exporter for Prometheus

    ...The exporter fetches information from an Elasticsearch cluster on every scrape, so too short a scrape interval can impose load on ES master nodes, particularly if you run with --es.all and --es.indices. We suggest you measure how long fetching /_nodes/stats and /_all/_stats takes for your ES cluster to determine whether your scrape interval is too short. As a last resort, you can scrape this exporter using a dedicated job with its own scrape interval. Command-line parameters start with a single - for versions less than 1.1.0rc1. Username and password can be passed either directly in the URI or through the ES_USERNAME and ES_PASSWORD environment variables. ...
    Downloads: 1 This Week
  • 9
    saml2aws

    CLI tool which enables you to login and retrieve AWS credentials

    CLI tool which enables you to log in and retrieve AWS temporary credentials using ADFS or PingFederate Identity Providers. Aside from Okta, most of the providers in this project use screen scraping to log users into SAML. This isn't ideal, and hopefully vendors will make this easier in the future.
    Downloads: 17 This Week
  • 10
    Jikan REST

    The REST API for Jikan

    Jikan REST is an unofficial RESTful API for MyAnimeList.net, providing access to anime, manga, and user data by scraping the website. It allows developers to integrate MyAnimeList data into their applications without relying on the official API.
    Downloads: 0 This Week
  • 11
    Article Extractor

    To extract main article from given URL with Node.js

    A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.
    Downloads: 0 This Week
  • 12
    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 2 This Week
  • 13
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 21 This Week
  • 14
    Async PHP

    Easily run code asynchronously

    ...It helps optimize performance by executing long-running or resource-intensive tasks concurrently, instead of sequentially. The library is easy to use and integrates well with existing PHP applications, making it suitable for batch processing, data scraping, or any scenario where concurrency can boost efficiency.
    Downloads: 0 This Week
  • 15
    Helium

    Lighter web automation with Python

    ...It replaces verbose boilerplate code with natural language-like API calls such as click("Login") or write("hello", into="Name"). Helium manages browser setup, waits, and teardown, enabling quick development of scripts for testing, scraping, or task automation without requiring deep Selenium knowledge.
    Downloads: 0 This Week
  • 16
    DrissionPage

    Python based web automation tool. Powerful and elegant

    DrissionPage is a Python-based automation framework that blends the capabilities of Selenium for browser automation with Requests-HTML for fast, headless web data extraction. It enables seamless switching between browser-controlled and headless HTTP sessions within the same interface. Ideal for web scraping, testing, and automation, DrissionPage is lightweight and highly efficient, offering more flexibility than standard Selenium or Requests usage alone.
    Downloads: 0 This Week
  • 17
    reCAPTCHA

    PHP client library for reCAPTCHA, a free service

    ...The ecosystem supports mobile and enterprise variants, but the repo focuses on common web integrations and best practices for verifying the token securely. Deployed correctly, reCAPTCHA reduces credential stuffing, bot sign-ups, and scraping without degrading the experience for typical users.
    Downloads: 2 This Week
  • 18
    My Python Eggs

    Python Examples

    ...Rather than being a single cohesive application, it functions as a repository of utilities that demonstrate how Python can be used to solve everyday problems and automate repetitive tasks. The scripts cover a wide range of topics, including file management, networking, system monitoring, web scraping, and even simple games, making it a versatile learning resource. Many of the programs are designed to reduce manual workload by automating tasks such as renaming files, scanning directories, or checking system information. The repository also includes examples of more advanced concepts like multithreading, API interaction, and GUI development, providing a gradual learning curve for beginners.
    Downloads: 0 This Week
  • 19
    Gooo

    Toolkit for developing web applications in Vue, Templ, and Go

    Gooo is an open-source project that focuses on providing tools and utilities for interacting with and analyzing online content, particularly in the context of automation and data retrieval workflows. The repository appears to function as a lightweight utility toolkit that can be adapted for specific use cases such as scraping, automation, or content processing. It is structured to allow developers to customize and extend its functionality depending on their needs, rather than acting as a fully packaged end-user application. The project emphasizes simplicity and flexibility, enabling users to integrate its components into scripts or larger systems. While not as feature-heavy as enterprise frameworks, it serves as a foundation for experimentation and rapid prototyping in data extraction or automation tasks. ...
    Downloads: 0 This Week
  • 20
    chrome-remote-interface

    Chrome Debugging Protocol interface for Node.js

    ...By connecting to a running browser instance with remote debugging enabled, developers can inspect network activity, manipulate pages, capture screenshots, automate navigation, and analyze performance metrics. The library is commonly used for building browser automation scripts, testing tools, web scraping systems, and custom developer tooling that interacts directly with Chrome internals. Unlike higher-level tools such as Puppeteer, Chrome Remote Interface focuses on exposing the raw protocol in a more flexible and transparent way.
    Downloads: 0 This Week
  • 21
    User Agents

    A JavaScript library for generating random user agents with data

    User Agents is a JavaScript library that generates realistic and up-to-date user agent strings and browser fingerprints based on real-world usage data. The library is designed to help developers simulate authentic browser traffic patterns, which is particularly useful in web scraping, testing, and automation scenarios. Unlike simpler random user agent generators, it uses frequency-weighted datasets to ensure that generated values reflect how browsers are actually used in the wild. The dataset is updated automatically on a daily basis, ensuring that generated user agents remain current and relevant over time. ...
    Downloads: 0 This Week
  • 22
    BrowserBox

    Remote isolated browser API for security

    Remote isolated browser API for security, automation visibility and interactivity. Run on our cloud, or bring your own. Full scope double reverse web proxy with a multi-tab, mobile-ready browser UI frontend. Plus co-browsing, advanced adaptive streaming, secure document viewing and more, but only in the Pro version. BrowserBox is a full-stack component for a web browser that runs on a remote server, with a UI you can embed on the web. BrowserBox lets you provide controllable access to web...
    Downloads: 1 This Week
  • 23
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 15 This Week
  • 24
    prometheus-net

    .NET library to instrument your code with Prometheus metrics

    This is a .NET library for instrumenting your applications and exporting metrics to Prometheus.
    Downloads: 0 This Week
  • 25
    Quoter

    Quoter - The Console Based Stock Quote Tool

    Quoter is a small command-line tool to fetch stock quotes. To minimize HTML scraping, it retrieves quotes from IEXCloud. You can sign up for free and get 500k stock quotes per month. Please check their usage agreements prior to signing up and ensure you are allowed to use their service. After getting an account, log into the dashboard to see your API tokens. You'll need the secret token to use this program.
    Downloads: 1 This Week