Browse free open source Go Web Scrapers and projects below. Use the toggles on the left to filter open source Go Web Scrapers by OS, license, language, programming language, and project status.

  • 1
    Lux

    Fast Go CLI tool for downloading videos from many streaming sites

    Lux is an open source command-line tool designed for downloading videos from a wide variety of online media platforms. Written in the Go programming language, the project focuses on providing a fast and lightweight downloader that can retrieve media content directly from supported websites. Lux works by extracting video information from a given page and downloading the available streams to the user’s system. Lux supports downloading individual videos as well as playlists and can display multiple available quality options before the user selects which stream to download. It includes features for resuming interrupted downloads, allowing users to continue large downloads without starting over. It also provides network-related options such as proxy support and cookies to access restricted or authenticated content. With its modular architecture and command-line interface, Lux can function both as a standalone downloader and as a library.
    Downloads: 14 This Week
  • 2
    katana

    Fast CLI web crawler for discovering endpoints in modern web apps

    Katana is an open source command-line web crawling and spidering framework developed by ProjectDiscovery. It is designed to efficiently crawl websites and web applications in order to discover endpoints, resources, and other useful information that may not be easily visible through manual browsing. Katana focuses on speed and automation, making it suitable for use in security reconnaissance workflows and automated pipelines. Katana supports both standard HTTP crawling and headless browser crawling, allowing it to navigate modern web applications that rely heavily on JavaScript. Through headless browsing, it can analyze dynamic content and single-page applications built with modern frameworks, improving its ability to uncover hidden paths and assets. Katana offers flexible configuration options such as depth control, concurrency limits, and filtering mechanisms to refine results and manage scanning scope.
    Downloads: 8 This Week
  • 3
    go-dork

    Fast Go-based CLI scanner for running automated search engine dorks

    go-dork is an open source command-line tool designed to automate search engine dorking and reconnaissance tasks. Written in the Go programming language, it focuses on speed and efficiency when executing advanced search queries across multiple search engines. It allows users to run specialized queries, often referred to as “dorks,” to discover publicly exposed data, misconfigurations, or potentially vulnerable resources. It supports several major search engines and enables users to switch between them depending on the target or query requirements. go-dork can retrieve results from multiple pages of search results and process them sequentially for broader coverage during scans. go-dork also supports custom HTTP headers and proxy configuration, which can help users work around restrictions such as captchas or filtering mechanisms. Because it is a command-line tool, it can be integrated into automation pipelines or chained with other security tools to streamline reconnaissance workflows.
    Downloads: 6 This Week
  • 4
    GoSpider

    Fast web spider written in Go

    GoSpider is a fast web spider written in Go. It brute-forces and parses sitemap.xml, parses robots.txt, and generates and verifies links from JavaScript files (Link Finder). It finds AWS S3 buckets and subdomains in the response source, and it fetches URLs from the Wayback Machine, Common Crawl, VirusTotal, and AlienVault. Output is formatted to be easy to grep, Burp input is supported, and multiple sites can be crawled in parallel.
    Downloads: 5 This Week
  • 5
    proxypool

    Proxy crawler that aggregates, tests, and serves usable proxy nodes

    proxypool is an open source proxy aggregation tool that automatically collects proxy node information from publicly available sources on the internet. It crawls different sources such as Telegram channels, subscription links, and publicly accessible web pages to gather proxy configurations. After collecting these nodes, proxypool processes them by removing duplicates and verifying whether each node is functional. proxypool then provides a usable list of proxy nodes that have passed availability checks. proxypool supports several popular proxy protocols, allowing it to work with multiple types of proxy infrastructures. The behavior of the crawler and the sources it scans can be configured through configuration files, enabling users to customize how nodes are gathered and maintained. It also supports scheduled crawling to continuously update the proxy list and keep the pool current with newly discovered nodes.
    Downloads: 4 This Week
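The collect, de-duplicate, and verify flow described for proxypool can be illustrated with a minimal Go sketch. This is not proxypool's code: the `node` type, the `dedupe` and `filterAlive` helpers, and the plain TCP reachability check are all assumptions made for illustration, and a real availability check would be protocol-aware.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// node is a hypothetical, simplified proxy entry (not proxypool's own type).
type node struct {
	Protocol string
	Addr     string // host:port
}

// dedupe keeps the first occurrence of each (protocol, address) pair.
func dedupe(nodes []node) []node {
	seen := make(map[node]struct{})
	var out []node
	for _, n := range nodes {
		if _, ok := seen[n]; ok {
			continue
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

// filterAlive runs check on every node concurrently and returns the nodes
// that pass, preserving the input order.
func filterAlive(nodes []node, check func(node) bool) []node {
	ok := make([]bool, len(nodes))
	var wg sync.WaitGroup
	for i, n := range nodes {
		wg.Add(1)
		go func(i int, n node) {
			defer wg.Done()
			ok[i] = check(n) // each goroutine writes its own slot
		}(i, n)
	}
	wg.Wait()
	var out []node
	for i, n := range nodes {
		if ok[i] {
			out = append(out, n)
		}
	}
	return out
}

// tcpCheck is an assumed, simplistic availability test: it only reports
// whether a TCP connection to the node's address succeeds within one second.
func tcpCheck(n node) bool {
	conn, err := net.DialTimeout("tcp", n.Addr, time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	nodes := dedupe([]node{
		{"socks5", "127.0.0.1:1080"},
		{"socks5", "127.0.0.1:1080"}, // duplicate, dropped by dedupe
		{"http", "127.0.0.1:8080"},
	})
	for _, n := range filterAlive(nodes, tcpCheck) {
		fmt.Println(n.Protocol, n.Addr)
	}
}
```

Passing the check as a function keeps the filtering logic independent of any one protocol, so a stricter test can be swapped in without touching the rest.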
  • 6
    Crawlab

    Distributed web crawler admin platform for spiders management

    Crawlab is a Golang-based distributed web crawler management platform that supports various languages, including Python, NodeJS, Go, Java, and PHP, and various web crawler frameworks, including Scrapy, Puppeteer, and Selenium. It can be started with a single docker-compose command, in which case you do not even have to configure the MongoDB database yourself. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS, and worker nodes. The master node and worker nodes communicate with each other via gRPC (an RPC framework). Tasks are scheduled by the task scheduler module in the master node and received by the task handler module in worker nodes, which executes them in task runners. Task runners are processes that run spider or crawler programs, and they can also send data through gRPC (integrated in the SDK) to other data sources, e.g. MongoDB.
    Downloads: 3 This Week
  • 7
    crawley

    The unix-way web crawler

    crawley crawls web pages and prints any link it can find. It uses a fast HTML SAX parser (powered by golang.org/x/net/html) and has a small (below 1500 SLOC), idiomatic, 100% test-covered codebase. It grabs most useful resource URLs (pictures, videos, audio, forms, etc.). Found URLs are streamed to stdout and guaranteed to be unique, with fragments omitted. Scan depth is configurable (limited by starting host and path; 0 by default). It can crawl rules and sitemaps from robots.txt, offers a brute mode that scans HTML comments for URLs (this can lead to bogus results), honors the HTTP_PROXY / HTTPS_PROXY environment variables and handles proxy authentication, and has a directory-only scan mode (aka fast-scan).
    Downloads: 3 This Week
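The "unique, with fragments omitted" guarantee in the crawley description can be illustrated with a short Go sketch using only the standard library. It is not taken from crawley's source; the `normalize` and `uniqueURLs` helper names are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// normalize parses raw and returns it without its fragment, so that
// "page#a" and "page#b" are treated as the same URL.
func normalize(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Fragment = ""
	u.RawFragment = ""
	return u.String(), nil
}

// uniqueURLs returns the normalized links in first-seen order, skipping
// duplicates and entries that fail to parse.
func uniqueURLs(links []string) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, l := range links {
		n, err := normalize(l)
		if err != nil {
			continue
		}
		if _, ok := seen[n]; ok {
			continue
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

func main() {
	links := []string{
		"https://example.com/a#top",
		"https://example.com/a#bottom", // same page, different fragment
		"https://example.com/b",
	}
	for _, u := range uniqueURLs(links) {
		fmt.Println(u)
	}
}
```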
  • 8
    goclone

    Fast CLI tool for cloning entire websites for local browsing offline

    goclone is a command-line utility designed to download and mirror complete websites to a local directory for offline access. It retrieves HTML pages, stylesheets, JavaScript files, images, and other assets from a target site and stores them on the user’s computer. It preserves the original site’s structure by maintaining relative links between pages, allowing the mirrored copy to function similarly to the live version when opened locally. Once a site has been cloned, users can browse the pages offline and navigate between them as if they were viewing the site online. goclone is written in Go and uses goroutines to download files concurrently. goclone can also optionally start a local web server to serve the mirrored files for a more realistic browsing experience. The command-line interface supports configuration options such as proxy settings, custom user agents, and cookies, giving users flexibility when cloning websites.
    Downloads: 3 This Week
  • 9
    Geziyor

    Blazing fast Go framework for web crawling and data scraping tasks

    Geziyor is a high-performance web crawling and web scraping framework built for the Go programming language. It is designed to help developers crawl websites and extract structured information from web pages efficiently. It focuses on speed and scalability, allowing large numbers of requests to be processed concurrently. Geziyor supports use cases such as data mining, monitoring web content, and automated testing workflows. It provides a flexible architecture where developers define parsing functions that process responses and extract the desired data. Geziyor includes features for managing requests, handling cookies, respecting robots rules, and exporting collected data in multiple formats. With built-in tools for caching, metrics collection, and proxy management, it enables developers to build robust and customizable scraping systems using Go.
    Downloads: 2 This Week
  • 10
    Ferret

    Declarative web scraping

    Ferret is a web scraping system that aims to simplify data extraction from the web. Its easy-to-learn declarative query language lets you focus on the data you need and abstracts away the technical details and complexity of the underlying technologies. Ferret can scrape JS-rendered pages, handle all page events, and emulate user interactions, using Chrome/Chromium via the Chrome DevTools Protocol to handle dynamically rendered web pages. It was designed as a library from the ground up, so it can be easily embedded into any Go application. Ferret is portable, fast, and extensible: creating custom functions and types is straightforward.
    Downloads: 1 This Week
  • 11
    Maxun

    Small event-delegation library for decoupling event binding and handling

    Maxun named JsAction by Google serves as a lightweight event delegation library built in JavaScript. It allows developers to separate the logic of binding events from the code that handles those events, helping to keep DOM event wiring cleaner and more maintainable. It is archived and marked as read-only, indicating that the project is no longer actively maintained or intended for production use. The README states that ongoing development has migrated into a larger framework under the Angular project. It includes modules for dispatching events, for capturing native events, for custom event details, and for action flows. Because it is purely JavaScript (and uses HTML for test harnesses), it is suited for web browsers and front-end use. Although deprecated, it can still serve as a reference for how to architect event delegation and binding abstractions.
    Downloads: 1 This Week
  • 12
    pandora-box

    Lightweight cross-platform desktop client for managing Mihomo proxies

    Pandora-Box is a lightweight desktop client designed to provide a graphical interface for the Mihomo proxy core. It allows users to manage proxy configurations and subscriptions through a simple and user-friendly interface rather than working directly with configuration files. Pandora-Box supports multiple proxy protocols and provides tools to organize and control network routing rules. It is designed to work for both casual users who want an easy setup and advanced users who need more control over proxy behavior. It also supports automatic rule grouping and features such as TUN mode to enable system-wide proxy routing. Pandora-Box focuses on delivering a clean interface with practical features for importing, managing, and converting proxy subscriptions. Pandora-Box combines a desktop interface with backend components to create a functional proxy management environment that simplifies complex networking configurations.
    Downloads: 1 This Week
  • 13
    GOPA

    GOPA, a spider written in Golang, for Elasticsearch

    GOPA is a spider written in Golang for Elasticsearch. It is lightweight with a low footprint (the memory requirement should be 100MB), easy to deploy with no runtime or dependency required, and easy to use, with out-of-the-box features that need no programming or scripting ability. There are two options for getting it: download the pre-built package or compile it yourself. Besides Elasticsearch, Gopa does not require any other dependencies; simply run ./gopa to start the program. It is safe to press Ctrl+C to stop a running Gopa: it saves a checkpoint so that you can restore the job later.
    Downloads: 0 This Week
  • 14
    Till

    DataHen Till is a companion tool to your existing web scraper

    DataHen Till is a companion tool to your existing web scraper that makes it scalable, maintainable, and harder to block, with minimal code changes to your scraper. It integrates with any scraper in 5 minutes. Web scraping is usually easy to get started with, especially on a small scale, but it gets much harder as you scale up. Scraping 10,000 records can easily be done with simple scraper scripts in any programming language, but to scrape millions of pages you need to architect and build features into your scraping script that allow you to scale, maintain, and unblock your scrapers. Scraping millions or even billions of records requires much more planning: it is not simply a matter of running your existing scraper script on a machine with more CPU and RAM.
    Downloads: 0 This Week
  • 15
    crawlergo

    Headless Chrome crawler for collecting URLs for vulnerability scans

    crawlergo is a browser-based web crawler designed to collect URLs and request data that can be used by web vulnerability scanning tools. It uses a Chrome headless environment to render web pages and observe behavior during the DOM rendering stage in order to capture as many accessible endpoints as possible. By monitoring the page lifecycle and interacting with web elements, the crawler automatically triggers JavaScript events and navigational actions that would normally occur during real user interaction. It also automatically fills and submits forms, helping discover hidden routes or parameters that might otherwise be missed by traditional crawlers. crawlergo includes a built-in URL de-duplication system that removes repeated or pseudo-static links while maintaining fast crawling speeds for large websites. crawlergo also analyzes page content to extract links and resources from multiple sources, including JavaScript files, comments, and configuration files.
    Downloads: 0 This Week
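To make the idea of de-duplicating repeated or pseudo-static links concrete, here is one possible approach sketched in Go. It is an assumption for illustration, not crawlergo's actual algorithm: numeric path segments such as `/news/123.html` collapse to a placeholder, and query strings are compared by parameter names only. The `signature` and `dedupe` names are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// isNumeric reports whether s is a non-empty run of ASCII digits.
func isNumeric(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// signature reduces a URL to a coarse key: host, path with numeric segments
// replaced by "{id}", and the sorted query parameter names without values.
func signature(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	segs := strings.Split(u.Path, "/")
	for i, s := range segs {
		base := strings.TrimSuffix(s, ".html")
		if isNumeric(base) {
			segs[i] = "{id}" + s[len(base):] // keep any ".html" suffix
		}
	}
	keys := make([]string, 0, len(u.Query()))
	for k := range u.Query() {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return u.Host + strings.Join(segs, "/") + "?" + strings.Join(keys, "&"), nil
}

// dedupe keeps the first URL seen for each signature and skips URLs that
// fail to parse.
func dedupe(urls []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, raw := range urls {
		sig, err := signature(raw)
		if err != nil || seen[sig] {
			continue
		}
		seen[sig] = true
		out = append(out, raw)
	}
	return out
}

func main() {
	for _, u := range dedupe([]string{
		"https://example.com/news/123.html",
		"https://example.com/news/456.html", // same shape as the first
		"https://example.com/list?page=1",
		"https://example.com/list?page=2", // same parameter names
		"https://example.com/about",
	}) {
		fmt.Println(u)
	}
}
```

A coarse key like this trades completeness for speed: pages that differ only by an ID are fetched once, which suits endpoint discovery but would be wrong for content scraping.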
  • 16
    dataflowkit

    Golang framework for scraping data from web pages

    Golang web scraper library for extracting data from web pages. Saves results as CSV, JSON, or XML.
    Downloads: 0 This Week
  • 17
    gocrawl

    Polite concurrent web crawler library for Go with flexible hooks

    gocrawl is a lightweight web crawling library written in the Go programming language that enables developers to build custom web crawlers and data extraction tools. gocrawl focuses on providing a minimal yet powerful crawling engine that can be easily extended and adapted for different web scraping or indexing tasks. It is designed to be polite when accessing websites by respecting crawling rules such as robots.txt policies and applying crawl delays for each host. It executes requests concurrently using Go’s goroutines, allowing efficient and scalable page retrieval across multiple URLs. Developers have full control over the crawling workflow, including which URLs are visited, inspected, and processed during execution. gocrawl integrates with HTML parsing tools so responses can be inspected and queried in a structured way while crawling. Instead of implementing a full search indexing pipeline, the library provides the core crawling engine and extension hooks.
    Downloads: 0 This Week