169 projects for "process" with 2 filters applied:

  • 1
    skycaiji

    Open source web scraping system for automated data collection tasks

    SkyCaiji is an open source web scraping and data collection system designed to gather information from websites through configurable extraction rules. It focuses on simplifying the process of building crawlers by allowing users to visually define scraping rules rather than writing complex code. It can collect structured or unstructured data from many types of webpages and automate the extraction process for large datasets. SkyCaiji is designed to run on a variety of hosting environments including local machines, shared hosting environments, and cloud servers. ...
    Downloads: 3 This Week
  • 2
    gain

    Asyncio-based Python framework for building fast web crawling spiders

    Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results.
    Downloads: 2 This Week
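The concurrent-request pattern described for Gain can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not Gain's actual API: `fetch`, `crawl`, and `max_concurrency` are hypothetical names, and `fetch` simulates a network call with a short sleep instead of using aiohttp.

```python
import asyncio


async def fetch(url):
    # Hypothetical stand-in for a real HTTP request; no network access.
    await asyncio.sleep(0.01)
    return f"<html><title>{url}</title></html>"


async def crawl(urls, max_concurrency=3):
    # The semaphore caps how many simulated requests are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url):
        async with semaphore:
            html = await fetch(url)
        # Extract one structured field from the page body.
        title = html.split("<title>")[1].split("</title>")[0]
        return {"url": url, "title": title}

    # gather() runs the workers concurrently and keeps input order.
    return await asyncio.gather(*(worker(u) for u in urls))


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results = asyncio.run(crawl(urls))
```

Bounding concurrency with a semaphore is one common design choice; a real spider would also need retries, politeness delays, and link discovery.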
  • 3
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    ...Spider can operate concurrently across many pages, allowing it to gather large datasets in a short period of time. Spider also provides mechanisms for subscribing to crawl events so developers can process page data such as URLs, status codes, or HTML content as it is discovered. It supports advanced capabilities such as headless browser rendering, background crawling tasks, and configurable rules that control crawl depth or ignored paths. These capabilities make the project suitable for building search indexers, data extraction pipelines, and SEO analysis tools.
    Downloads: 2 This Week
  • 4
    douyin

    Open source Douyin crawler for collecting and downloading public data

    ...It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages. DouyinCrawler supports both automated scraping and batch operations to process multiple targets efficiently. It also integrates with the Aria2 download utility to enable large-scale downloading of videos and images associated with collected content. It includes multiple usage modes such as a desktop GUI, a web service interface, and a command line tool for flexible deployment. In addition to data collection, it supports incremental updates so users can track and gather newly published content without reprocessing previously collected data.
    Downloads: 13 This Week
  • 5
    QueryList

    Progressive PHP web crawler framework with jQuery-like DOM parsing

    QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks.
    Downloads: 0 This Week
  • 6
    Nipe

    An engine to make Tor network your default gateway

    Nipe is a Perl-based engine whose primary aim is to make the Tor network act as the default gateway for outgoing traffic. In practice, it configures system firewall rules (iptables) and network routing so that almost all IPv4 traffic is redirected through Tor. The tool provides commands such as install, start, stop, restart, and status to manage its behavior. When “start” is issued, it sets up the necessary rules; when “stop” is used, it attempts to remove those rules. Nipe currently...
    Downloads: 7 This Week
  • 7
    Sing-box

    Sing-box multi-protocol proxy tool

    Sing-box is a shell-script-based deployment project for setting up multi-protocol sing-box proxy environments across different hosting platforms and VPS scenarios. It is built around quick, automated installation rather than manual configuration, making it useful for users who want a bundled setup process. The project supports several proxy protocols, including VLESS Reality, VMess over WebSocket and TLS, Hysteria2, and Tuic. It also includes optional Nezha monitoring integration, which can help users observe node status from a monitoring panel. The scripts reference support for platforms such as Serv00, CT8, Hostuno, VPS environments, and Alpine-based systems. ...
    Downloads: 4 This Week
  • 8
    MDCx

    Movie metadata scraper and organizer for media libraries and NFO

    MDCx is an open source media metadata scraping and organization tool designed to automate the process of collecting detailed information for movie files. It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems.
    Downloads: 4 This Week
  • 9
    Weibo Crawler

    Python crawler for collecting and downloading Sina Weibo user data

    weibo-crawler is a Python-based data collection tool designed to retrieve information from Sina Weibo user accounts. It automates the process of gathering posts, user profile details, and engagement metrics from one or more target accounts. weibo-crawler can extract comprehensive information about users, including profile attributes such as nickname, follower count, following count, and account metadata. It also captures detailed data about each post, including the content, publishing time, topics, mentions, likes, reposts, and comments. ...
    Downloads: 3 This Week
  • 10
    Browserless

    The headless Chrome/Chromium driver on top of Puppeteer

    Browserless is an open-source headless browser automation library and service built on top of Puppeteer that simplifies the process of running and scaling Chromium-based browser tasks in production environments. It provides a high-level API for interacting with headless Chrome, allowing developers to perform operations such as generating PDFs, capturing screenshots, extracting text or HTML, and automating web navigation. The project is designed to act as a production-ready abstraction layer over Puppeteer, offering improved reliability, error handling, and scalability for real-world applications. ...
    Downloads: 2 This Week
  • 11
    Geziyor

    Blazing fast Go framework for web crawling and data scraping tasks

    ...Geziyor supports use cases such as data mining, monitoring web content, and automated testing workflows. It provides a flexible architecture where developers define parsing functions that process responses and extract the desired data. Geziyor includes features for managing requests, handling cookies, respecting robots rules, and exporting collected data in multiple formats. With built-in tools for caching, metrics collection, and proxy management, it enables developers to build robust and customizable scraping systems using Go.
    Downloads: 2 This Week
  • 12
    Toapi

    Convert websites into structured APIs automatically with a Python tool

    Toapi is a Python library designed to transform ordinary websites into usable API services. Instead of building a traditional web crawler that collects and stores data before exposing it through an API, Toapi simplifies the process by allowing developers to define data structures that automatically generate an API layer from existing web pages. It works by parsing HTML content from a source site and mapping selected elements into structured data that can be returned as JSON through API endpoints. Developers define items and routes that determine how web pages are parsed and how the resulting data is exposed through the API interface. ...
    Downloads: 2 This Week
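The idea of mapping selected HTML elements into JSON can be illustrated with the standard library. This is a rough sketch and does not use Toapi's own item or route classes; `TitleExtractor`, `page_to_json`, and the `h2.title` selection rule are assumptions made for the example.

```python
import json
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text of every <h2 class="title"> element (hypothetical rule)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


def page_to_json(html):
    # Parse the page, then expose the selected elements as a JSON payload.
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return json.dumps({"items": [{"title": t.strip()} for t in parser.titles]})


page = '<div><h2 class="title">First post</h2><p>x</p><h2 class="title">Second post</h2></div>'
payload = page_to_json(page)
```

An API layer would return a payload like this from an endpoint; serving it over HTTP is left out to keep the sketch self-contained.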
  • 13
    SEO GEO Content Engine

    Professional SEO and GEO content workflows for brands, SaaS teams

    ...It builds on tools like GEO Content Writer and SEO GEO Audit to create an end-to-end workflow for producing and refining search-optimized content. The system automates the process of identifying opportunities, generating content, and validating its effectiveness based on modern search criteria. It emphasizes alignment with both traditional SEO and AI-driven search systems, ensuring content remains competitive in evolving environments. The engine supports iterative improvement, allowing content to be continuously updated and refined based on performance data. ...
    Downloads: 1 This Week
  • 14
    Factorio

    Factorio headless server in a Docker container

    ...Factorio is a factory-building simulation game in which players automate production lines, research technologies, and manage complex industrial systems, and the repository focuses specifically on hosting the game server in a containerized environment. By packaging the server into a Docker image, the project simplifies the process of deploying and maintaining multiplayer servers across different operating systems and cloud environments. The container automatically handles dependencies, configuration, and updates, making it easier for system administrators and hobbyists to run dedicated Factorio servers. Users can configure server settings, maps, mods, and saved games through mounted volumes and environment variables within the container.
    Downloads: 1 This Week
  • 15
    single-file-cli

    CLI tool to save complete web pages as a single self-contained HTML file

    SingleFile CLI is an open source command-line tool designed to save complete web pages as a single self-contained HTML file. It captures the rendered page in a headless browser and embeds all required resources directly into the output document, including stylesheets, scripts, images, and fonts. By consolidating every dependency into one file, it allows users to preserve a faithful copy of a web page that can be viewed offline without requiring external assets. SingleFile CLI works by...
    Downloads: 2 This Week
  • 16
    news-please

    Python tool for crawling and extracting structured data from news sites

    news-please is an open source news crawler and information extraction tool designed to collect and structure articles from online news websites. It provides an integrated pipeline that crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from a news outlet when given only the root URL of a...
    Downloads: 2 This Week
  • 17
    tumblr-crawler

    Python crawler to download photos and videos from Tumblr blogs

    tumblr-crawler is an open source Python-based utility designed to download media content from Tumblr blogs. It provides a script that automatically retrieves photos and videos from specified Tumblr sites and saves them locally for offline access. Users can specify one or multiple blogs to crawl by editing a configuration file or by passing parameters through the command line. Once executed, the script fetches media from the Tumblr API and stores the downloaded files in folders named after...
    Downloads: 1 This Week
  • 18
    videodl

    Lightweight Python tool for downloading videos from many platforms

    ...Videodl can integrate with external command-line utilities to improve downloading performance, handle streaming formats such as HLS, and manage encrypted or segmented media streams. Additional utilities can also enable faster downloads, resume interrupted transfers, and process complex playlist structures.
    Downloads: 1 This Week
  • 19
    yudao-cloud

    New Cloud version of Ruoyi-Vue-Pro optimized to refactor all features

    ...It delivers a full-stack solution that combines a Spring-based backend, MyBatis Plus for data access, and a Vue + Element-based admin front-end, along with user-facing mini-programs. The system targets enterprise scenarios and includes modules for RBAC-based dynamic permissions, multi-tenant SaaS capabilities, data permissions, and workflow/process engines. On top of the core platform, it provides integrated subsystems such as third-party login, payment, SMS, e-commerce, CRM, ERP and even AI large-model integrations. The README highlights a wide ecosystem of supporting components like Redis, MySQL, Elasticsearch, RocketMQ, Nacos, Seata, and more, giving a ready-made foundation for complex distributed applications. ...
    Downloads: 2 This Week
  • 20
    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based jobs. feapder supports features such as breakpoint resume, allowing crawlers to continue from where they stopped without losing progress. ...
    Downloads: 0 This Week
  • 21
    wombat

    Lightweight Ruby DSL for scraping structured data from web pages

    Wombat is a lightweight web crawling and scraping library written in Ruby that focuses on extracting structured data from web pages using a concise domain-specific language (DSL). It is designed to simplify the process of defining how information should be collected from HTML documents without requiring large amounts of scraping boilerplate code. Developers can declare the data fields they want and specify selectors or rules for retrieving them, allowing Wombat to parse and return structured results. The DSL approach helps make scraping definitions more readable and maintainable, especially when dealing with multiple fields or nested data structures. ...
    Downloads: 0 This Week
  • 22
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. WaterCrawl supports customizable extraction rules so users can focus only on relevant elements while ignoring unnecessary page components. WaterCrawl also offers real-time monitoring capabilities, allowing users to track crawling progress, performance metrics, and errors during large data collection jobs. ...
    Downloads: 0 This Week
  • 23
    DotnetSpider

    Lightweight .NET framework for fast web crawling and data scraping

    ...It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. DotnetSpider is modular, allowing different components such as request schedulers, downloaders, and storage systems to work together in a flexible workflow. ...
    Downloads: 0 This Week
  • 24
    Expat XML Parser

    Fast XML parser library in C

    PLEASE NOTE that we are in the process of moving to GitHub: https://github.com/libexpat/libexpat This is James Clark's Expat XML parser library in C. It is a stream-oriented parser that requires setting handlers to deal with the structure that the parser discovers in the document.
    Downloads: 481 This Week
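The handler-based, stream-oriented style described for Expat can be tried from Python, whose standard library ships a binding to Expat as `xml.parsers.expat`. The sketch below registers three handlers and records the events in the order the parser reports them; the helper name `collect_events` and the sample document are illustrative assumptions.

```python
import xml.parsers.expat


def collect_events(xml_text):
    """Parse xml_text with Expat and return handler events in document order."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Deliver contiguous character data in a single callback.
    parser.buffer_text = True

    def on_start(name, attrs):
        events.append(("start", name, dict(attrs)))

    def on_end(name):
        events.append(("end", name))

    def on_text(data):
        if data.strip():
            events.append(("text", data.strip()))

    # The parser calls these handlers as it discovers structure in the stream.
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return events


events = collect_events('<feed><item id="1">hello</item></feed>')
```

Because the parser never builds a tree, memory use stays flat regardless of document size; the trade-off is that the handlers must track any state they need themselves.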
  • 25
    Interleave is a business process management application. It enables you to model your business process and make it available online. It is meant to replace processes that currently rely on paper or spreadsheets, and it has a good workflow engine.
    Downloads: 2 This Week