Web Scrapers for Mac

View 24 business solutions

Browse free open source Web Scrapers and projects for Mac below. Use the toggles on the left to filter open source Web Scrapers by OS, license, language, programming language, and project status.

  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    Kemono Downloader

    Kemono Downloader

    Kemono Downloader - A cross-platform Python app built with PyQt6

    Welcome to Kemono Downloader, a versatile Python-based desktop application built with PyQt6, designed to download content from Kemono.su. This tool enables users to archive individual posts or entire creator profiles from services like Patreon, Fanbox, and more, supporting a wide range of file types with customizable settings and advanced features.
    Leader badge
    Downloads: 1,670 This Week
    Last Update:
    See Project
  • 2
    WFDownloader App

    WFDownloader App

    Free batch downloader for image, wallpaper, video, audio, document,

    Use as an image gallery, wallpaper, audio/music, video, document, and other media bulk downloader from supported websites. Also use to download sequential website urls that have a certain pattern (e.g. image01.png to image100.png). Also use app's built-in site crawler for advanced link search or extraction. There is also special support for forum media and open directory downloading. It's a programmable downloader and also works with password protected sites. Say goodbye to downloading one by one. Go to the Help menu or check out website to get started. Note that this cross-platform version requires Java (minimum version Java 8) to be installed on your Operating System. For non-java required OS specific versions, check app's website.
    Leader badge
    Downloads: 333 This Week
    Last Update:
    See Project
  • 3
    Scylla

    Scylla

    Intelligent proxy pool for collecting and managing public proxies

    Scylla is an open source proxy pool system designed to collect, validate, and manage large numbers of public proxy servers for use in web scraping and data extraction workflows. It automatically crawls the internet to discover proxy IP addresses and evaluates their availability and reliability before adding them to a usable pool. It includes a JSON API that allows developers and applications to retrieve proxy information programmatically, making it easier to integrate proxy rotation into scraping tools or automation scripts. Scylla also runs a built-in HTTP forward proxy server that can dynamically select a recently validated proxy whenever a request is made. In addition to the API, the system provides a web-based interface where users can view available proxies and monitor their global distribution through a visual dashboard. It is commonly used by developers who need scalable proxy management when gathering data from the internet or building datasets for machine learning.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 4
    Python API for JMComic

    Python API for JMComic

    Python crawler and API for downloading JMComic albums and images

    JMComic-Crawler-Python is a Python library and crawler framework designed to programmatically access and download comic content from the JMComic platform. It provides a structured API that allows developers to retrieve albums, chapters, and images using simple Python code while handling the necessary network requests and data processing behind the scenes. It supports both web-based and mobile API interfaces, enabling flexible interaction with the platform depending on the available endpoints. Its architecture includes components for configuration management, download orchestration, and client communication, allowing users to automate the retrieval of manga chapters or entire albums. It includes command-line functionality and configuration files so users can customize download behavior, directory structures, and performance settings without modifying code. It also supports plugin-based extensions that allow additional processing.
    Downloads: 14 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 5
    Scrapy

    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring and automated testing.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 6
    BrowserBox

    BrowserBox

    Remote isolated browser API for security

    Remote isolated browser API for security, automation visibility and interactivity. Run-on our cloud, or bring your own. Full scope double reverse web proxy with a multi-tab, mobile-ready browser UI frontend. Plus co-browsing, advanced adaptive streaming, secure document viewing and more! But only in the Pro version. BrowserBox is a full-stack component for a web browser that runs on a remote server, with a UI you can embed on the web. BrowserBox lets your provide controllable access to web resources in a way that's both more sandboxed than, and less restricted than, traditional web <iframe> elements. Build applications that need cross-origin access, while delivering complex user stories that benefit from an encapsulated browser abstraction. Since the whole stack is written in JavaScript you can easily extend it to suit your needs. The technology that puts unrestricted browser capabilities within reach of a web app has never before existed in the open.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 7
    UI.Vision RPA

    UI.Vision RPA

    Open-Source RPA Software (formerly Kantu)

    The UI Vision RPA software is the tool for visual process automation, codeless UI test automation, web scraping and screen scraping. Automate tasks on Windows, Mac and Linux. The UI Vision RPA core is open-source with enterprise security. The free and open-source browser extension can be extended with local apps for desktop UI automation. UI.Vision RPA's computer-vision visual UI testing commands allow you to write automated visual tests with UI.Vision RPA - this makes UI.Vision RPA the first and only Chrome and Firefox extension (and Selenium IDE) that has "👁👁 eyes". A huge benefit of doing visual tests is that you are not just checking one element or two elements at a time, you’re checking a whole section or page in one visual assertion. The visual UI testing and browser automation commands of UI.Vision RPA help web designers and developers to verify and validate the layout of websites and canvas elements.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 8
    Bili23 Downloader

    Bili23 Downloader

    Cross platform GUI tool for downloading videos from Bilibili sites

    Bili23-Downloader is an open source desktop application designed for downloading video content from the Bilibili platform. It provides a graphical interface that allows users to download various types of media including user-uploaded videos, series episodes, movies, and other hosted content. It focuses on ease of use with a zero-configuration setup, making it accessible to both beginners and experienced users. It supports high performance downloads through multi-threading and includes resume capabilities so interrupted downloads can continue without starting over. It can parse different types of links such as standard video pages, short links, and collection or activity pages to automatically retrieve downloadable media. It also allows users to choose video resolution, audio quality, and encoding format based on the available sources. Additional features include downloading subtitles, comments, metadata, and artwork associated with videos.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 9
    go-dork

    go-dork

    Fast Go-based CLI scanner for running automated search engine dorks

    go-dork is an open source command-line tool designed to automate search engine dorking and reconnaissance tasks. Written in the Go programming language, it focuses on speed and efficiency when executing advanced search queries across multiple search engines. It allows users to run specialized queries, often referred to as “dorks,” to discover publicly exposed data, misconfigurations, or potentially vulnerable resources. It supports several major search engines and enables users to switch between them depending on the target or query requirements. go-dork can retrieve results from multiple pages of search results and process them sequentially for broader coverage during scans. go-dork also supports custom HTTP headers and proxy configuration, which can help users work around restrictions such as captchas or filtering mechanisms. Because it is a command-line tool, it can be integrated into automation pipelines or chained with other security tools to streamline reconnaissance workflows.
    Downloads: 8 This Week
    Last Update:
    See Project
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • 10
    CyberScraper 2077

    CyberScraper 2077

    A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

    CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 11
    EasySpider

    EasySpider

    A visual no-code/code-free web crawler/spider

    A visual code-free/no-code web crawler/spider, supporting both Chinese and English.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 12
    Firecrawl

    Firecrawl

    Turn entire websites into LLM-ready markdown or structured data

    Crawl and convert any website into LLM-ready markdown or structured data. Built by Mendable.ai and the Firecrawl community. Includes powerful scraping, crawling, and data extraction capabilities. Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each. No sitemap is required.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 13
    single-file-cli

    single-file-cli

    CLI tool to save complete web pages as single self-contained HTML file

    SingleFile CLI is an open source command-line tool designed to save complete web pages as a single self-contained HTML file. It captures the rendered page in a headless browser and embeds all required resources directly into the output document, including stylesheets, scripts, images, and fonts. By consolidating every dependency into one file, it allows users to preserve a faithful copy of a web page that can be viewed offline without requiring external assets. SingleFile CLI works by controlling a browser through the Chrome DevTools Protocol, rendering the page before extracting and packaging all necessary resources. This approach helps ensure that the saved page closely matches the original appearance and functionality. SingleFile CLI can be used for automated archiving, research, documentation, or offline reading workflows where preserving a page exactly as displayed is important.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    CEF Python

    CEF Python

    Python bindings for the Chromium Embedded Framework (CEF)

    Python bindings for the Chromium Embedded Framework (CEF). CEF Python is an open source project founded by Czarek Tomczak in 2012 to provide Python bindings for the Chromium Embedded Framework (CEF). The Chromium project focuses mainly on Google Chrome application development while CEF focuses on facilitating embedded browser use cases in third-party applications. Lots of applications use CEF control, there are more than 100 million CEF instances installed around the world. There are numerous use cases for CEF. Use it as a modern HTML5 based rendering engine that can act as a replacement for classic desktop GUI frameworks. Think of it as Electron for Python. Embed a web browser widget in a classic Qt / GTK / wxPython desktop application. Use it for automated testing of web applications with more advanced capabilities than Selenium web browser automation due to CEF low level programming APIs.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    bilibili-manga-downloader

    bilibili-manga-downloader

    Download and manage Bilibili Manga chapters with GUI downloader

    BiliBili-Manga-Downloader is an open source desktop application designed to download manga chapters from the Bilibili Manga platform for offline reading and local management. It was created to address limitations of the web reading experience, such as intrusive advertisements, inconvenient image zooming, and inconsistent navigation during reading sessions. It provides a graphical user interface that allows users to search for manga titles using keywords, view detailed information about available series, and select chapters to download. BiliBili-Manga-Downloader supports multi-threaded downloading to improve performance and includes progress tracking with estimated time remaining for active downloads. It also offers multiple output formats, allowing chapters to be saved as image folders or compressed comic archive formats suitable for local manga readers.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 16
    jsoup

    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make every attempt to create a clean parse from the HTML you provide, regardless of whether the HTML is well-formed or not. You have HTML in a Java String, and you want to parse that HTML to get at its contents, or to make sure it's well formed, or to modify it. The String may have come from user input, a file, or from the web.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    katana

    katana

    Fast CLI web crawler for discovering endpoints in modern web apps

    Katana is an open source command-line web crawling and spidering framework developed by ProjectDiscovery. It is designed to efficiently crawl websites and web applications in order to discover endpoints, resources, and other useful information that may not be easily visible through manual browsing. Katana focuses on speed and automation, making it suitable for use in security reconnaissance workflows and automated pipelines. Katana supports both standard HTTP crawling and headless browser crawling, allowing it to navigate modern web applications that rely heavily on JavaScript. Through headless browsing, it can analyze dynamic content and single-page applications built with modern frameworks, improving its ability to uncover hidden paths and assets. Katana offers flexible configuration options such as depth control, concurrency limits, and filtering mechanisms to refine results and manage scanning scope.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 18
    owllook

    owllook

    Vertical novel search engine with unified reading and tracking tools

    Owllook is an open source vertical search engine designed for discovering and reading online novels from multiple sources. Instead of redirecting users to different sites, the system parses content from many novel platforms and presents it in a unified reading interface. It focuses on providing a simple and comfortable reading experience with features such as searching for books, following updates, bookmarking chapters, and maintaining a personal bookshelf. It aggregates results from multiple search engines and applies parsing rules to extract novel metadata, chapters, and content in a consistent format. Owllook also includes functionality for tracking reading history, displaying rankings based on search activity, and recommending books using a similarity-based approach. Owllook is built using asynchronous technologies to support efficient data retrieval and responsive interactions while reading or searching.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 19
    pandora-box

    pandora-box

    Lightweight cross-platform desktop client for managing Mihomo proxies

    Pandora-Box is a lightweight desktop client designed to provide a graphical interface for the Mihomo proxy core. It allows users to manage proxy configurations and subscriptions through a simple and user-friendly interface rather than working directly with configuration files. Pandora-Box supports multiple proxy protocols and provides tools to organize and control network routing rules. It is designed to work for both casual users who want an easy setup and advanced users who need more control over proxy behavior. It also supports automatic rule grouping and features such as TUN mode to enable system-wide proxy routing. Pandora-Box focuses on delivering a clean interface with practical features for importing, managing, and converting proxy subscriptions. Pandora-Box combines a desktop interface with backend components to create a functional proxy management environment that simplifies complex networking configurations.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 20
    proxypool

    proxypool

    Proxy crawler that aggregates, tests, and serves usable proxy nodes

    proxypool is an open source proxy aggregation tool that automatically collects proxy node information from publicly available sources on the internet. It crawls different sources such as Telegram channels, subscription links, and publicly accessible web pages to gather proxy configurations. After collecting these nodes, proxypool processes them by removing duplicates and verifying whether each node is functional. proxypool then provides a usable list of proxy nodes that have passed availability checks. proxypool supports several popular proxy protocols, allowing it to work with multiple types of proxy infrastructures. The behavior of the crawler and the sources it scans can be configured through configuration files, enabling users to customize how nodes are gathered and maintained. It also supports scheduled crawling to continuously update the proxy list and keep the pool current with newly discovered nodes.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 21
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 20 This Week
    Last Update:
    See Project
  • 22
    CommunityScrapers

    CommunityScrapers

    This is a public repository containing scrapers

    Stash Community Scrapers is a large open-source collection of metadata extraction tools designed to work with the Stash media management platform, enabling automated scraping of content information from various online sources. The repository contains hundreds of scraper definitions written primarily in YAML and Python, each tailored to extract structured metadata such as titles, performers, tags, and media details from specific websites. These scrapers integrate directly into Stash, allowing users to enrich their media libraries with accurate and detailed information without manual entry. The project supports both automatic installation through in-app feeds and manual configuration for advanced use cases. Some scrapers require additional configuration such as API keys or cookies, highlighting its flexibility and adaptability to different sources.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 23
    ECommerceCrawlers

    ECommerceCrawlers

    Collection of Python ecommerce and website crawler examples projects

    ECommerceCrawlers is a collection of practical Python web crawler projects designed to gather data from a variety of ecommerce platforms, websites, and online services. It aggregates many independent crawler examples created by contributors and organized into separate subprojects that target specific sites or data sources. These examples demonstrate how to build and operate web scrapers capable of collecting structured information such as product listings, news content, job postings, social media data, and other publicly available web data. It aims to help developers understand the full workflow of web scraping, including request simulation, data extraction, storage, and handling anti-scraping techniques. It includes crawlers for platforms such as ecommerce marketplaces, blogging platforms, recruitment sites, and social networks, providing real-world practice scenarios. Developers can study the individual project documentation to understand the analysis process.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24
    bt-btt

    bt-btt

    Guide and resources for accessing and using the U3C3 BitTorrent site

    BT-btt is a repository that provides information and guidance related to the U3C3 BitTorrent resource site and its magnet-based content search ecosystem. It primarily serves as documentation describing how the site works, how users can access it, and how magnet link resources are organized and retrieved. It explains how BitTorrent and magnet link downloads operate, including the role of trackers and distributed hash table (DHT) networks in locating peers and downloading files. BT-btt also discusses different ways users can search for torrent resources, including strategies for improving search results when dealing with multiple language variants or different character encodings. Additional documentation explains why certain statistics such as active download counts may not always be accurate when DHT-based downloads are used.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 25
    instagram-profilecrawl

    instagram-profilecrawl

    Instagram profile crawler that extracts posts, tags, and stats

    instagram-profilecrawl is a Python-based automation script designed to collect publicly available information from Instagram profiles. It crawls profile data such as follower counts, post information, hashtags, and other engagement-related metadata. It operates by automating a web browser using Selenium and performing requests to gather structured information from the platform. instagram-profilecrawl can analyze multiple usernames in a single run and store the extracted information locally in structured formats such as JSON. The collected data can include profile metadata, post details, engagement metrics, and commenter activity, allowing users to analyze account behavior or monitor profile growth over time. It also provides scripts for downloading images from crawled profiles and logging statistics into CSV files for tracking metrics like followers, likes, and comments. Authentication is optional, meaning the crawler can access public profile data without logging in.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB