27 projects for "linux proxy scraper" with 2 filters applied:

  • 1
    pandora-box

    Lightweight cross-platform desktop client for managing Mihomo proxies

    Pandora-Box is a lightweight desktop client designed to provide a graphical interface for the Mihomo proxy core. It allows users to manage proxy configurations and subscriptions through a simple and user-friendly interface rather than working directly with configuration files. Pandora-Box supports multiple proxy protocols and provides tools to organize and control network routing rules. It is designed to work for both casual users who want an easy setup and advanced users who need more...
    Downloads: 14 This Week
  • 2
    goclone

    Fast CLI tool for cloning entire websites for local browsing offline

    goclone is a command-line utility designed to download and mirror complete websites to a local directory for offline access. It retrieves HTML pages, stylesheets, JavaScript files, images, and other assets from a target site and stores them on the user’s computer. It preserves the original site’s structure by maintaining relative links between pages, allowing the mirrored copy to function similarly to the live version when opened locally. Once a site has been cloned, users can browse the...
    Downloads: 6 This Week
  • 3
    tumblr-crawler

    Python crawler to download photos and videos from Tumblr blogs

    tumblr-crawler is an open source Python-based utility designed to download media content from Tumblr blogs. It provides a script that automatically retrieves photos and videos from specified Tumblr sites and saves them locally for offline access. Users can specify one or multiple blogs to crawl by editing a configuration file or by passing parameters through the command line. Once executed, the script fetches media from the Tumblr API and stores the downloaded files in folders named after...
    Downloads: 0 This Week
  • 4
    MDCx

    Movie metadata scraper and organizer for media libraries and NFO

    MDCx is an open source media metadata scraping and organization tool designed to automate the process of collecting detailed information for movie files. It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports...
    Downloads: 2 This Week
  • 5
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...
    Downloads: 0 This Week
  • 6
    rnet

    Python HTTP client with TLS and HTTP/2 fingerprint emulation support

    rnet is an ergonomic and modular Python HTTP client designed for developers who need advanced control over network requests and protocol behavior. It provides a flexible API for making HTTP requests while supporting both asynchronous and blocking workflows, allowing it to integrate easily into different Python applications and runtimes. rnet focuses on low-level protocol customization, giving users fine-grained control over TLS and HTTP/2 configuration in order to emulate specific browser...
    Downloads: 0 This Week
  • 7
    gain

    Asyncio-based Python framework for building fast web crawling spiders

    Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and...
    Downloads: 0 This Week
  • 8
    Geziyor

    Blazing fast Go framework for web crawling and data scraping tasks

    Geziyor is a high-performance web crawling and web scraping framework built for the Go programming language. It is designed to help developers crawl websites and extract structured information from web pages efficiently. It focuses on speed and scalability, allowing large numbers of requests to be processed concurrently. Geziyor supports use cases such as data mining, monitoring web content, and automated testing workflows. It provides a flexible architecture where developers define parsing...
    Downloads: 0 This Week
  • 9
    Scweet

    Scrape tweets, profiles, followers and following from Twitter/X

    Scweet is a Python-based Twitter/X scraping library and CLI designed to collect tweets, profile timelines, followers, following lists, and user profile data without requiring the official Twitter/X API or a developer account. Instead of depending on deprecated unauthenticated scraping methods, it works by using X’s web GraphQL API together with authenticated browser cookies, which gives it a more current and practical approach for data extraction. The project supports a broad set of...
    Downloads: 0 This Week
  • 10
    QueryList

    Progressive PHP web crawler framework with jQuery-like DOM parsing

    QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks. QueryList supports common data...
    Downloads: 0 This Week
  • 11
    Scrapling

    An adaptive Web Scraping framework

    Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling,...
    Downloads: 0 This Week
  • 12
    Lux

    Fast Go CLI tool for downloading videos from many streaming sites

    Lux is an open source command-line tool designed for downloading videos from a wide variety of online media platforms. Written in the Go programming language, the project focuses on providing a fast and lightweight downloader that can retrieve media content directly from supported websites. Lux works by extracting video information from a given page and downloading the available streams to the user’s system. Lux supports downloading individual videos as well as playlists and can display...
    Downloads: 9 This Week
  • 13
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. Its design, inspired by Flask, aims to let you build a web scraper in just a few lines of code with an easy-to-learn syntax. Dude is currently in Pre-Alpha, so expect breaking changes. You can run your scraper from the command line by passing URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.
    Downloads: 0 This Week
  • 14
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages....
    Downloads: 0 This Week
  • 15
    autocrawler

    Multiprocess Selenium crawler for downloading images by keywords

    AutoCrawler is a Python-based image crawling tool designed to automatically download large numbers of images from search engines using automated browser interaction. It uses Selenium and a Chrome browser driver to navigate image search pages and collect image sources based on keywords provided by the user. AutoCrawler supports multiprocess and multithreaded downloading, which allows it to retrieve images faster by running several tasks simultaneously. Users provide search terms through a...
    Downloads: 0 This Week
  • 16
    go-dork

    Fast Go-based CLI scanner for running automated search engine dorks

    go-dork is an open source command-line tool designed to automate search engine dorking and reconnaissance tasks. Written in the Go programming language, it focuses on speed and efficiency when executing advanced search queries across multiple search engines. It allows users to run specialized queries, often referred to as “dorks,” to discover publicly exposed data, misconfigurations, or potentially vulnerable resources. It supports several major search engines and enables users to switch...
    Downloads: 5 This Week
  • 17
    Tholian Stealth

    Secure, Peer-to-Peer, Private and Automateable Web Browser

    Tholian Stealth is an open-source privacy-focused web browser and automation platform designed to combine secure browsing, web scraping, and proxy functionality into a unified system. It aims to prioritize user privacy and autonomy by minimizing tracking, blocking unnecessary requests, and restricting potentially harmful web technologies such as JavaScript execution. The platform operates as both a browser and a network service, capable of acting as a proxy, scraper, and content filtering system for other applications. ...
    Downloads: 0 This Week
  • 18
    mlscraper

    ML-based HTML scraper that learns extraction rules from examples

    mlscraper is a Python library designed to automatically extract structured data from HTML pages without requiring developers to manually write CSS selectors or XPath rules. Instead of defining extraction logic by hand, users provide a few examples of the data they want to retrieve from a webpage. It analyzes those examples within the HTML document and determines patterns or rules that can be used to extract the same type of information from similar pages. Once trained, the generated scraper...
    Downloads: 2 This Week
  • 19
    pspider

    Simple Python framework for building multithreaded web crawlers

    PSpider is a lightweight web crawling framework written in Python designed to simplify the development of custom web spiders. It focuses on providing an easy-to-understand architecture while still supporting concurrent crawling for improved performance. It uses a multithreaded model that separates the crawling workflow into several components responsible for fetching, parsing, and saving data. Tasks are managed through queues, allowing different parts of the crawler to process work...
    Downloads: 0 This Week
  • 20
    Scylla

    Intelligent proxy pool for collecting and managing public proxies

    Scylla is an open source proxy pool system designed to collect, validate, and manage large numbers of public proxy servers for use in web scraping and data extraction workflows. It automatically crawls the internet to discover proxy IP addresses and evaluates their availability and reliability before adding them to a usable pool. It includes a JSON API that allows developers and applications to retrieve proxy information programmatically, making it easier to integrate proxy rotation into...
    Downloads: 16 This Week
  • 21
    ast-hook-forjs-re

    AST-based JavaScript reverse engineering and variable tracing toolkit

    ast-hook-for-js-RE is an open source JavaScript reverse engineering toolkit designed to help analysts locate and understand client-side encryption logic used by web applications. It works by intercepting browser traffic through a local proxy server and modifying JavaScript code before it executes in the browser. Using Abstract Syntax Tree (AST) transformations, it injects hook functions into the code to monitor variable assignments and other runtime changes during execution. This allows...
    Downloads: 0 This Week
  • 22
    proxypool

    Proxy crawler that aggregates, tests, and serves usable proxy nodes

    proxypool is an open source proxy aggregation tool that automatically collects proxy node information from publicly available sources on the internet. It crawls different sources such as Telegram channels, subscription links, and publicly accessible web pages to gather proxy configurations. After collecting these nodes, proxypool processes them by removing duplicates and verifying whether each node is functional. proxypool then provides a usable list of proxy nodes that have passed...
    Downloads: 0 This Week
  • 23
    GoogleScraper

    Python tool for scraping search engine results from many providers

    GoogleScraper is a Python-based tool designed to automatically collect and process search engine results from multiple providers. It enables developers and researchers to programmatically query search engines and extract useful information such as links, titles, and result descriptions. GoogleScraper supports several major search engines and can be used to gather structured datasets from search result pages for further analysis. It provides two different scraping approaches: sending direct...
    Downloads: 1 This Week
  • 24
    ProxyBroker

    Asynchronous tool for finding and checking public proxy servers

    ProxyBroker is an open source Python tool designed to automatically discover and verify public proxy servers from many online sources. It operates asynchronously, allowing it to gather and test large numbers of proxies efficiently while performing multiple checks concurrently. It collects proxy addresses from dozens of providers and evaluates whether they are functional and suitable for use. It supports several proxy protocols, including HTTP, HTTPS, SOCKS4, and SOCKS5, making it flexible...
    Downloads: 1 This Week
  • 25
    mzitu

    Python crawler that downloads image galleries and analyzes titles

    mzitu is a Python-based web crawling project designed to automatically download and organize image galleries from a specific photography site. It demonstrates how to build a scraper that navigates gallery pages, retrieves image links, and saves the images locally in a structured directory layout. It focuses on automating the collection of large sets of images by programmatically parsing page content and iterating through gallery entries. mzitu also includes a simple analysis script that...
    Downloads: 0 This Week