Showing 79 open source projects for "scraping"

  • 1
    Toonily Downloader

    A python tool for downloading manga from Toonily

    Toonily Downloader is a Python-based scraping and downloading tool designed specifically for manga and manhwa hosted on Toonily, enabling users to fetch entire series efficiently while preserving original image quality and structure. It provides both a command-line interface and a graphical user interface, making it accessible for both technical and non-technical users. The software supports downloading full series or selected chapters by parsing Toonily URLs and organizing content into clean, chapter-based directories. ...
    Downloads: 13 This Week
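    The chapter-based directory layout described above can be sketched with the standard library. This is a minimal illustration under assumptions, not Toonily Downloader's code: the URL pattern (`/webtoon/<series>/chapter-<n>/`), the `downloads` root folder, and the `chapter_dir` helper are all hypothetical. Chapter numbers are zero-padded so that folders sort in reading order.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical URL layout (an assumption): /webtoon/<series-slug>/chapter-<n>/
CHAPTER_RE = re.compile(r"^/webtoon/(?P<series>[^/]+)/chapter-(?P<num>\d+(?:\.\d+)?)/?$")


def chapter_dir(url: str, root: str = "downloads") -> PurePosixPath:
    """Map a chapter URL to <root>/<series>/chapter-<NNN>."""
    match = CHAPTER_RE.match(urlparse(url).path)
    if match is None:
        raise ValueError(f"not a chapter URL: {url}")
    whole, _, frac = match["num"].partition(".")
    # Zero-pad the integer part so chapter-010 sorts after chapter-009
    name = f"chapter-{int(whole):03d}" + (f".{frac}" if frac else "")
    return PurePosixPath(root) / match["series"] / name
```

    Images for each chapter would then be saved inside the returned folder, keeping their original files untouched.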
  • 2
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 24 This Week
  • 3
    Web RPA

    Web Robotics Process Automation Tool

    ...The system focuses on simplicity and flexibility, allowing automation without requiring complex infrastructure. It supports interaction with web elements, navigation flows, and dynamic content handling, making it suitable for scraping and automation scenarios. WebRPA can be integrated into larger systems or used as a standalone tool for automating browser-based operations. Its lightweight design ensures efficient execution while maintaining adaptability for different use cases. Overall, it provides a practical solution for automating web workflows and repetitive tasks.
    Downloads: 2 This Week
  • 4
    Helium

    Lighter web automation with Python

    ...It replaces verbose boilerplate code with natural language-like API calls such as click("Login") or write("hello", into="Name"). Helium manages browser setup, waits, and teardown, enabling quick development of scripts for testing, scraping, or task automation without requiring deep Selenium knowledge.
    Downloads: 0 This Week
  • 5
    DrissionPage

    Python based web automation tool. Powerful and elegant

    DrissionPage is a Python-based automation framework that blends the capabilities of Selenium for browser automation with Requests-HTML for fast, headless web data extraction. It enables seamless switching between browser-controlled and headless HTTP sessions within the same interface. Ideal for web scraping, testing, and automation, DrissionPage is lightweight and highly efficient, offering more flexibility than standard Selenium or Requests usage alone.
    Downloads: 0 This Week
  • 6
    Xianyu Intelligent Monitor Bot

    AI tool for real-time monitoring and analysis of Goofish listings

    ai-goofish-monitor is an open source automation tool designed to monitor listings on the Goofish second-hand marketplace and analyze them using artificial intelligence. It combines browser automation with AI-based analysis to automatically search, collect, and evaluate newly posted items that match a user’s purchase criteria. It uses Playwright to simulate real user interactions with the marketplace, allowing the system to retrieve product data and track updates in near real time....
    Downloads: 8 This Week
  • 7
    Yahoo! Finance market data downloader

    ...finance decommissioned their historical data API, many programs that relied on it stopped working. yfinance aims to solve this problem by offering a reliable, threaded, and Pythonic way to download historical market data from Yahoo! Finance. It originally offered a temporary fix by scraping the data from Yahoo! Finance and returning it in the same format as pandas_datareader's get_data_yahoo(), keeping code changes in existing software to a minimum. The latest version of yfinance is a complete rewrite of the library, offering a reliable and more Pythonic way to download historical market data from Yahoo! Finance at up to 1-minute granularity. ...
    Downloads: 2 This Week
  • 8
    douyin

    Open source Douyin crawler for collecting and downloading public data

    ...It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages. DouyinCrawler supports both automated scraping and batch operations to process multiple targets efficiently. It also integrates with the Aria2 download utility to enable large-scale downloading of videos and images associated with collected content. It includes multiple usage modes such as a desktop GUI, a web service interface, and a command line tool for flexible deployment. ...
    Downloads: 2 This Week
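    Handing downloads to Aria2 usually goes through its JSON-RPC interface. The sketch below is an assumption about how such an integration could look, not DouyinCrawler's implementation: it builds an `aria2.addUri` request (prepending the `token:<secret>` parameter that aria2c expects when started with `--rpc-secret`) and posts it to aria2c's default endpoint, `http://localhost:6800/jsonrpc`, which only exists while `aria2c --enable-rpc` is running. The helper names and example URLs are hypothetical.

```python
import json
import urllib.request

ARIA2_ENDPOINT = "http://localhost:6800/jsonrpc"  # aria2c's default RPC port


def add_uri_payload(url, directory, filename=None, secret=None, request_id="1"):
    """Build a JSON-RPC 2.0 request asking aria2 to download `url` into `directory`."""
    options = {"dir": directory}
    if filename:
        options["out"] = filename  # aria2 option for the output file name
    params = [[url], options]
    if secret:
        params.insert(0, f"token:{secret}")  # required when aria2c uses --rpc-secret
    return {"jsonrpc": "2.0", "id": request_id, "method": "aria2.addUri", "params": params}


def send(payload, endpoint=ARIA2_ENDPOINT):
    """POST the request to a running aria2c and return the decoded JSON response."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(endpoint, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

    On success, aria2 puts the GID of the new download in the response's `result` field, which can later be used to query progress.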
  • 9
    linkedin2username

    Generate probable usernames from LinkedIn company employee lists

    linkedin2username is an open source OSINT (Open Source Intelligence) tool designed to generate lists of potential usernames by scraping employee information from a company’s LinkedIn page. It logs into LinkedIn using valid user credentials and collects publicly visible employee names associated with a specified organization. Using these names, it automatically generates multiple possible username formats that organizations commonly use for accounts or email addresses.
    Downloads: 1 This Week
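    The name-to-username step can be illustrated with a short sketch. The formats below are common corporate conventions chosen for illustration; they are assumptions and not necessarily the exact list linkedin2username generates, and `username_candidates` is a hypothetical helper.

```python
import re


def username_candidates(full_name: str) -> list[str]:
    """Return common username formats for a 'First [Middle] Last' name."""
    # Lowercase each word and drop punctuation such as hyphens and apostrophes
    words = [re.sub(r"[^a-z]", "", w.lower()) for w in full_name.split()]
    words = [w for w in words if w]
    if len(words) < 2:
        return []  # a single name is not enough to build first/last formats
    first, last = words[0], words[-1]
    formats = [
        f"{first}.{last}",    # jane.doe
        f"{first[0]}{last}",  # jdoe
        f"{first}{last[0]}",  # janed
        f"{first}{last}",     # janedoe
        f"{last}.{first}",    # doe.jane
        first,                # jane
    ]
    # Remove duplicates while keeping the order above
    return list(dict.fromkeys(formats))
```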
  • 10
    My Python Eggs

    Python Examples

    ...Rather than being a single cohesive application, it functions as a repository of utilities that demonstrate how Python can be used to solve everyday problems and automate repetitive tasks. The scripts cover a wide range of topics, including file management, networking, system monitoring, web scraping, and even simple games, making it a versatile learning resource. Many of the programs are designed to reduce manual workload by automating tasks such as renaming files, scanning directories, or checking system information. The repository also includes examples of more advanced concepts like multithreading, API interaction, and GUI development, providing a gradual learning curve for beginners.
    Downloads: 0 This Week
  • 11
    yt-fts

    Search all of YouTube from the command line

    yt-fts, short for YouTube Full Text Search, is an open-source command-line tool that enables users to search the spoken content of YouTube videos by indexing their subtitles. The program automatically downloads subtitles from a specified YouTube channel using the yt-dlp utility and stores them in a local SQLite database. Once indexed, users can perform full-text searches across all transcripts to quickly locate keywords or phrases mentioned within the videos. The tool returns search results...
    Downloads: 0 This Week
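    A local full-text subtitle index of this kind maps naturally onto SQLite's FTS5 extension. The following is a minimal sketch under assumptions: the `subtitles` table, its columns, and the helper names are hypothetical and do not reflect yt-fts's actual schema, and it needs a SQLite build with FTS5 enabled (true of most Python distributions).

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index of (video_id, start_time, text) subtitle lines."""
    conn = sqlite3.connect(":memory:")
    # UNINDEXED columns are stored but not tokenized, so only `text` is searchable
    conn.execute(
        "CREATE VIRTUAL TABLE subtitles USING fts5(video_id UNINDEXED, start_time UNINDEXED, text)"
    )
    conn.executemany("INSERT INTO subtitles (video_id, start_time, text) VALUES (?, ?, ?)", rows)
    return conn


def search(conn, query):
    """Return matching lines, best first (FTS5's built-in `rank`, bm25 by default)."""
    return conn.execute(
        "SELECT video_id, start_time, text FROM subtitles WHERE subtitles MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
```

    Keeping the timestamp next to each line is what lets a search hit point back to the exact moment in the video.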
  • 12
    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first by avoiding the noise caused by recurring elements (headers, footers, links/blogroll etc.) and second by including information such as author and date in order to make sense of the data. ...
    Downloads: 0 This Week
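    The idea of dropping recurring page elements before keeping the main text can be shown with a deliberately simplified, standard-library sketch. It is not Trafilatura's algorithm, which relies on much richer heuristics; the list of skipped tags and the `extract_main_text` helper are assumptions for illustration. For real pages, Trafilatura's own `extract()` function is the tool to use.

```python
from html.parser import HTMLParser

# Elements treated as boilerplate in this sketch (an assumption, not Trafilatura's rules)
BOILERPLATE_TAGS = {"header", "footer", "nav", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when outside every boilerplate element
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```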
  • 13
    Grab Framework Project

    Web Scraping Framework

    Grab is a Python framework for building web scrapers. With Grab you can build web scrapers of varying complexity, from simple 5-line scripts to complex asynchronous crawlers processing millions of web pages. Grab provides an API for performing network requests and for handling the received content, e.g., interacting with the DOM tree of the HTML document. Its single request/response API lets you build a network request, perform it, and work with the received content. The API is...
    Downloads: 0 This Week
  • 14
    Scrapy-Redis

    Redis-based components for Scrapy

    You can start multiple spider instances that share a single Redis queue, which makes it best suited for broad multi-domain crawls. Scraped items get pushed into a Redis queue, meaning that you can start as many post-processing processes as needed, all sharing the items queue. Components: Scheduler + Duplication Filter, Item Pipeline, Base Spiders. The default requests serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between Python versions. Version...
    Downloads: 0 This Week
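    Because the serializer only needs `loads` and `dumps`, a replacement can be a plain module. The sketch below is a hypothetical JSON-based alternative to pickle that stays readable across Python versions. Plain `json` is not enough on its own because serialized requests can contain bytes (typically the body, and header names and values), so the sketch tags bytes as base64 and stores dicts with non-string keys as key/value pairs. Selecting such a module is done through the scheduler serializer setting (`SCHEDULER_SERIALIZER` in the scrapy-redis README); verify the exact name against the version you use.

```python
"""Hypothetical JSON-based request serializer exposing loads/dumps."""
import base64
import json


def _encode(obj):
    if isinstance(obj, bytes):
        return {"__bytes__": base64.b64encode(obj).decode("ascii")}
    if isinstance(obj, dict):
        if all(isinstance(k, str) for k in obj):
            return {k: _encode(v) for k, v in obj.items()}
        # JSON object keys must be strings, so keep non-str keys (e.g. bytes) as pairs
        return {"__pairs__": [[_encode(k), _encode(v)] for k, v in obj.items()]}
    if isinstance(obj, (list, tuple)):
        return [_encode(v) for v in obj]  # tuples come back as lists
    return obj


def _decode(obj):
    if isinstance(obj, list):
        return [_decode(v) for v in obj]
    if isinstance(obj, dict):
        # Caveat of this sketch: a real dict whose only key is a marker would be misread
        if set(obj) == {"__bytes__"}:
            return base64.b64decode(obj["__bytes__"])
        if set(obj) == {"__pairs__"}:
            return {_decode(k): _decode(v) for k, v in obj["__pairs__"]}
        return {k: _decode(v) for k, v in obj.items()}
    return obj


def dumps(obj) -> bytes:
    return json.dumps(_encode(obj)).encode("utf-8")


def loads(data):
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    return _decode(json.loads(data))
```

    The trade-off against pickle is a somewhat larger payload in exchange for a format any Python version (or another language) can read.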
  • 15
    rnet

    Python HTTP client with TLS and HTTP/2 fingerprint emulation support

    ...It is powered by the underlying wreq engine and is built with performance and modularity in mind. rnet also supports advanced networking capabilities such as proxy rotation, connection pooling, and streaming transfers, which make it suitable for automation, scraping, and high-performance networking.
    Downloads: 0 This Week
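    Proxy rotation itself is a simple round-robin idea. The sketch below shows it in plain Python under stated assumptions: it is not rnet's API (rnet's own proxy options should be used in practice), the proxy URLs are placeholders, and `ProxyRotator` is a hypothetical helper that skips proxies marked as failed.

```python
from itertools import cycle


class ProxyRotator:
    """Hand out proxies in round-robin order, skipping ones marked as failed."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        if not self._proxies:
            raise ValueError("at least one proxy is required")
        self._failed = set()
        self._cycle = cycle(self._proxies)

    def mark_failed(self, proxy):
        self._failed.add(proxy)

    def pick(self):
        # Try each proxy at most once per call so this never loops forever
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if proxy not in self._failed:
                return proxy
        raise RuntimeError("all proxies have been marked as failed")
```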
  • 16
    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and improving markdown conversion, reflecting active community use in research flows. ...
    Downloads: 0 This Week
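    For comparison with the server's search tool, the sketch below queries the public arXiv API directly (the documented `export.arxiv.org/api/query` endpoint, which returns an Atom feed). It is an illustrative assumption, not arxiv-mcp-server's code; the helper names are hypothetical, and the network fetch is left out so that parsing can be checked against a static feed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the feed


def build_query_url(terms: str, max_results: int = 5) -> str:
    """Search all fields for `terms`, returning at most `max_results` entries."""
    params = {"search_query": f"all:{terms}", "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_entries(atom_xml: str) -> list[dict]:
    """Extract id, title, and abstract from each <entry>, collapsing whitespace."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        papers.append({
            "id": entry.findtext(f"{ATOM}id", default="").strip(),
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "summary": " ".join(entry.findtext(f"{ATOM}summary", default="").split()),
        })
    return papers
```

    Fetching is a single `urllib.request.urlopen(build_query_url(...))` call; arXiv's API terms ask clients to space out their requests.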
  • 17
    pagodo

    Automate Google Hacking Database scraping and searching

    pagodo automates Google searching for potentially vulnerable web pages and applications on the Internet. It replaces the manual process of running Google dork searches in a web browser. It has two parts: ghdb_scraper.py, which retrieves the latest Google dorks, and pagodo.py, which leverages the information gathered by ghdb_scraper.py. This version of pagodo also has native HTTP(S) and SOCKS5 proxy support, so no more wrapping it in a tool like...
    Downloads: 0 This Week
  • 18
    YoungerSibling

    YoungerSibling: Cross-platform OSINT tool for quick data gathering.

    ...It provides a set of useful tools to perform tasks like searching the web, performing lookups (Google search, IP lookup, username lookup, etc.), and extracting metadata from images, directly from the terminal. This project aims to help students, developers, and hobbyists learn about web scraping, API usage, and terminal interaction with Python.
    Downloads: 0 This Week
  • 19
    OnionSearch

    Search multiple Tor .onion engines at once and collect hidden links.

    OnionSearch is a Python-based command-line tool designed to collect and aggregate links from multiple search engines on the Tor network. The script works by scraping results from a variety of .onion search services, allowing users to perform a single query while gathering results from many sources at once. This approach helps researchers and investigators locate hidden services more efficiently without manually querying each individual search engine. It is primarily intended for educational use and open-source intelligence (OSINT) research involving the Tor network. ...
    Downloads: 7 This Week
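    Merging results from several engines mostly comes down to extracting addresses and removing duplicates. This is a minimal sketch, not OnionSearch's implementation: the engine names and input strings are hypothetical, only 56-character v3 `.onion` addresses are matched, and fetching result pages over Tor is out of scope.

```python
import re

# v3 onion addresses: 56 base32 characters (a-z, 2-7) followed by ".onion"
ONION_RE = re.compile(r"\b[a-z2-7]{56}\.onion\b")


def aggregate(results_by_engine: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each unique .onion address to the engines that returned it."""
    seen: dict[str, list[str]] = {}
    for engine, texts in results_by_engine.items():
        for text in texts:
            # Lowercase first so the same address in different cases is merged
            for address in ONION_RE.findall(text.lower()):
                engines = seen.setdefault(address, [])
                if engine not in engines:
                    engines.append(engine)
    return seen
```

    Recording which engines returned each address gives a rough signal of how widely indexed a hidden service is.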
  • 20
    CrossLinked

    LinkedIn employee enumeration tool using search engine scraping

    CrossLinked is an open source LinkedIn enumeration tool designed to collect employee names associated with a target organization. Instead of accessing LinkedIn directly or relying on its API, it performs search engine scraping using services such as Google and Bing to discover public LinkedIn profile results. By analyzing these search results, CrossLinked extracts employee names and processes them into usable formats for security assessments or reconnaissance activities. This approach allows the tool to operate without credentials, authentication tokens, or LinkedIn account access. ...
    Downloads: 3 This Week
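    Turning a search result into a name, and the name into an address or username, can be sketched as follows. The result-title layout (`Name - Role - Company | LinkedIn`), the helper names, and the `{first}`, `{last}`, `{f}`, `{l}` placeholders are assumptions for illustration, not CrossLinked's code.

```python
import re


def name_from_title(title):
    """Return (first, last) from a title like 'Jane Doe - Engineer - Acme | LinkedIn', or None."""
    # The name is the text before the first ' - ', ' | ', or comma (e.g. ', CISSP')
    name_part = re.split(r"\s+[-|]\s+|,", title, maxsplit=1)[0]
    words = [re.sub(r"[^a-z]", "", w.lower()) for w in name_part.split()]
    words = [w for w in words if w]
    if len(words) < 2:
        return None
    return words[0], words[-1]


def apply_format(fmt, first, last):
    """Fill a naming template such as '{first}.{last}@example.com'."""
    return fmt.format(first=first, last=last, f=first[0], l=last[0])
```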
  • 21
    URS (Universal Reddit Scraper)

    A comprehensive Reddit scraping command-line tool written in Python

    URS (Universal Reddit Scraper) is a comprehensive Reddit scraping command-line tool written in Python that integrates multiple features. All files except those generated by the wordcloud tool are exported to JSON by default. Whether you are using URS for enterprise or personal use, the author is very interested in hearing about your use case and how it has helped you achieve a goal.
    Downloads: 0 This Week
  • 22
    scraper-with-chatgpt
    scraper-with-chatgpt is a data scraping tool that extracts information from various online sources, including Google SERP, Google Maps, Shopify, Zillow, and more. Through a user-friendly interface, you can scrape data and save it in JSON or Excel format. It uses the scrape-it.cloud API.
    Downloads: 0 This Week
  • 23
    Pentest-Tools

    A collection of custom security tools for quick needs.

    Pentest-Tools is a collection of penetration testing scripts and utilities designed to help security professionals and ethical hackers perform vulnerability assessments. It includes a wide range of tools for tasks like web scraping, reconnaissance, data extraction, and network analysis. The suite is modular, allowing users to choose the tools that best fit their specific pentesting needs, from web application analysis to network penetration testing.
    Downloads: 0 This Week
  • 24
    Moriarty Project

    Web-based OSINT tool for investigating phone number information

    ...It allows users to input a phone number and analyze various details related to that number through multiple investigation features. It performs information gathering by scraping data from online sources to retrieve insights such as owner information, spam risk, and related web references. Users can select specific investigation features to run individually or execute all available checks at once depending on their needs. Moriarty Project operates through a browser-based interface and includes multithreading improvements that help speed up the investigation process. ...
    Downloads: 10 This Week
  • 25
    Pattern

    Web mining module for Python, with tools for scraping

    ...The project integrates multiple capabilities into a single framework that allows developers to collect, process, and analyze textual data from the web. It includes modules for web scraping and crawling that can retrieve information from sources such as social media platforms, search engines, and online knowledge bases. In addition to data mining features, the library offers natural language processing functionality including part-of-speech tagging, sentiment analysis, and n-gram extraction. The framework also includes machine learning algorithms that support classification, clustering, and vector space modeling for text analysis tasks. ...
    Downloads: 1 This Week
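    N-gram extraction, one of the text-analysis features listed above, is easy to illustrate without dependencies. The sketch below is not Pattern's API; the tokenizer (lowercased letters, digits, and apostrophes) and the `ngrams` helper are simplifying assumptions.

```python
import re


def ngrams(text: str, n: int = 2) -> list[tuple[str, ...]]:
    """Return the contiguous word n-grams of `text`, lowercased, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Sliding window of width n; empty when there are fewer than n tokens
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

    Counting these tuples (for example with `collections.Counter`) is the usual next step toward frequency-based text features.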