Showing 41 open source projects for "page"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 1
    single-file-cli

    single-file-cli

    CLI tool to save complete web pages as single self-contained HTML file

    ...This approach helps ensure that the saved page closely matches the original appearance and functionality. SingleFile CLI can be used for automated archiving, research, documentation, or offline reading workflows where preserving a page exactly as displayed is important.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    QueryList

    QueryList

    Progressive PHP web crawler framework with jQuery-like DOM parsing

    ...It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks. QueryList supports common data extraction scenarios such as retrieving lists of titles, links, images, and other page elements from structured or semi-structured content. It also includes a powerful HTTP request system that enables complex operations such as simulated logins, proxy usage, and customized request headers. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    ScrapeGraphAI

    ScrapeGraphAI

    Python scraper based on AI

    Extracting content from websites and local documents using LLM. ScrapeGraphAI is a web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Spider

    Spider

    High-performance Rust web crawler and scraper for large-scale data

    ...It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large datasets in a short period of time. Spider also provides mechanisms for subscribing to crawl events so developers can process page data such as URLs, status codes, or HTML content as it is discovered. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Compliant and Reliable File Transfers Backed by Top Security Certifications Icon
    Compliant and Reliable File Transfers Backed by Top Security Certifications

    Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

    Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.
    Start Free Trial
  • 5
    UI.Vision RPA

    UI.Vision RPA

    Open-Source RPA Software (formerly Kantu)

    ...UI.Vision RPA's computer-vision visual UI testing commands allow you to write automated visual tests with UI.Vision RPA - this makes UI.Vision RPA the first and only Chrome and Firefox extension (and Selenium IDE) that has "👁👁 eyes". A huge benefit of doing visual tests is that you are not just checking one element or two elements at a time, you’re checking a whole section or page in one visual assertion. The visual UI testing and browser automation commands of UI.Vision RPA help web designers and developers to verify and validate the layout of websites and canvas elements.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6
    katana

    katana

    Fast CLI web crawler for discovering endpoints in modern web apps

    ...Katana supports both standard HTTP crawling and headless browser crawling, allowing it to navigate modern web applications that rely heavily on JavaScript. Through headless browsing, it can analyze dynamic content and single-page applications built with modern frameworks, improving its ability to uncover hidden paths and assets. Katana offers flexible configuration options such as depth control, concurrency limits, and filtering mechanisms to refine results and manage scanning scope.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    rvest

    rvest

    Simple web scraping for R

    rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like beautiful soup and RoboBrowser. If you’re scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you’re respecting the robots.txt and not hammering the site with too many requests.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    gain

    gain

    Asyncio-based Python framework for building fast web crawling spiders

    ...Developers define crawlers using components such as spiders, parsers, and items, allowing them to organize crawling logic and data extraction rules clearly. Gain supports CSS selectors and XPath expressions for parsing page content and extracting specific elements. Gain also allows developers to configure headers, concurrency levels, and proxy settings to control how crawlers interact with target websites. Because it uses asynchronous programming, Gain can handle multiple requests efficiently while minimizing blocking operations.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    Toapi

    Toapi

    Convert websites into structured APIs automatically with Python tool

    ...Developers define items and routes that determine how web pages are parsed and how the resulting data is exposed through the API interface. It also includes mechanisms for caching both page content and API requests, helping reduce repeated network calls and improving performance. Because the generated service is built on top of a Flask application, it can be deployed like any other Flask-based project and integrated into existing Python workflows.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Stop vibe-debugging. Icon
    Stop vibe-debugging.

    Plug Claude into your app's actual errors.

    AppSignal's MCP server hands Claude, Cursor, or Zed your real errors, traces, and the deploy that shipped them. AI writes the fix; you review the diff.
    Free 30 days.
  • 10
    Ferret

    Ferret

    Declarative web scraping

    A web scraping system aiming to simplify data extraction from the web. ferret has a declarative query language that makes it easy to focus on the data that you need to get. ferret has the ability to scrape JS rendered pages, handle all page events, and emulate user interactions. the ferret was designed as a library from the ground up. it can be easily embedded into any Go application. ferret helps you to focus on the data you need using an easy-to-learn declarative language. ferret uses Chrome/Chromium via Chrome Devtools Protocol to handle dynamically rendered web pages. ferret is extremely extensible, and creating custom functions and types is super easy. ferret allows users to focus on the data. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    watercrawl

    watercrawl

    AI-ready web crawler that extracts and structures website content

    ...It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. WaterCrawl supports customizable extraction rules so users can focus only on relevant elements while ignoring unnecessary page components. WaterCrawl also offers real-time monitoring capabilities, allowing users to track crawling progress, performance metrics, and errors during large data collection jobs. Developers can integrate the tool into applications through a REST API and multiple client SDKs, enabling automated data pipelines and AI data preparation workflows.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Scrapling

    Scrapling

    An adaptive Web Scraping framework

    Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    wombat

    wombat

    Lightweight Ruby DSL for scraping structured data from web pages

    Wombat is a lightweight web crawling and scraping library written in Ruby that focuses on extracting structured data from web pages using a concise domain-specific language (DSL). It is designed to simplify the process of defining how information should be collected from HTML documents without requiring large amounts of scraping boilerplate code. Developers can declare the data fields they want and specify selectors or rules for retrieving them, allowing Wombat to parse and return structured...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Lux

    Lux

    Fast Go CLI tool for downloading videos from many streaming sites

    ...Written in the Go programming language, the project focuses on providing a fast and lightweight downloader that can retrieve media content directly from supported websites. Lux works by extracting video information from a given page and downloading the available streams to the user’s system. Lux supports downloading individual videos as well as playlists and can display multiple available quality options before the user selects which stream to download. It includes features for resuming interrupted downloads, allowing users to continue large downloads without starting over. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    NuzeBot

    NuzeBot

    Finds interesting news headlines.

    This is a bot to finds the news you want to see. It can be made to find the news that interests you and reject everything else. View on one page the most interesting headlines from many websites.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    PHPScraper

    PHPScraper

    A universal web-util for PHP

    ...In some cases there is an option to get a simple or detailed version. PHPScraper can assist in collecting feeds such as RSS feeds, sitemap.xml-entries and static search indexes. This can be useful when deciding on the next page to crawl or building up a list of pages on a website.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Gerapy

    Gerapy

    Distributed Crawler Management Framework Based on Scrapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Someone who has worked as a crawler with Python may use Scrapy. Scrapy is indeed a very powerful crawler framework. It has high crawling efficiency and good scalability. It is basically a necessary tool for developing crawlers using Python. If you use Scrapy as a crawler, then of course we can use our own host to crawl when crawling, but when the crawl is very large, we can’t...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    ScrapBot 1.40 64bits

    ScrapBot 1.40 64bits

    Task automation software for accessing and manipulating website data.

    ...The system can control the accessed webpage through JavaScript, and the entire navigation can be viewed in the program window. The main.js script runs in a separate frame from the navigation frame but can access all page content without any restrictions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    crawly

    crawly

    High-level web crawling and scraping framework for Elixir apps

    Crawly is a high-level application framework for crawling websites and extracting structured data using the Elixir programming language. It provides a complete environment for building web crawlers that systematically visit pages, collect information, and transform that data into structured formats for further processing. Crawly is designed for tasks such as data mining, information processing, and building historical archives of web content. Crawly follows the Elixir and OTP architecture...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    ACHE Focused Crawler

    ACHE Focused Crawler

    ACHE is a web crawler for domain-specific search

    ...It collects web pages that satisfy some specific criteria, e.g., pages that belong to a given domain or that contain a user-specified pattern. ACHE differs from generic crawlers in sense that it uses page classifiers to distinguish between relevant and irrelevant pages in a given domain. A page classifier can be defined as a simple regular expression (e.g., that matches every page that contains a specific word) or a machine-learning-based classification model. ACHE also automatically learns how to prioritize links in order to efficiently locate relevant content while avoiding the retrieval of irrelevant pages. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Tholian Stealth

    Tholian Stealth

    Secure, Peer-to-Peer, Private and Automateable Web Browser

    Tholian Stealth is an open-source privacy-focused web browser and automation platform designed to combine secure browsing, web scraping, and proxy functionality into a unified system. It aims to prioritize user privacy and autonomy by minimizing tracking, blocking unnecessary requests, and restricting potentially harmful web technologies such as JavaScript execution. The platform operates as both a browser and a network service, capable of acting as a proxy, scraper, and content filtering...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    crawlergo

    crawlergo

    Headless Chrome crawler for collecting URLs for vulnerability scans

    ...It uses a Chrome headless environment to render web pages and observe behavior during the DOM rendering stage in order to capture as many accessible endpoints as possible. By monitoring the page lifecycle and interacting with web elements, the crawler automatically triggers JavaScript events and navigational actions that would normally occur during real user interaction. It also automatically fills and submits forms, helping discover hidden routes or parameters that might otherwise be missed by traditional crawlers. crawlergo includes a built-in URL de-duplication system that removes repeated or pseudo-static links while maintaining fast crawling speeds for large websites. crawlergo also analyzes page content to extract links and resources from multiple sources, including JavaScript files, comments, and configuration files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    grab-site

    grab-site

    Web crawler for archiving and backing up sites into WARC archives

    ...Users can dynamically apply ignore patterns during an active crawl, allowing them to skip problematic or unnecessary URLs that could slow down or block the archiving process. grab-site also provides predefined ignore sets for common site structures such as forums and other complex web platforms. Additional mechanisms like duplicate page detection help avoid re-crawling identical content.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    AutoScraper

    AutoScraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    This project is made for automatic web scraping to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Abot

    Abot

    Fast and flexible C# framework for building customizable web crawlers

    ...Abot follows a modular architecture that allows developers to customize nearly every stage of the crawl process by implementing or replacing core interfaces. Abot exposes an event-driven model that enables applications to react to crawling events such as page completion or crawl restrictions. It also provides configuration options that control crawling behavior including concurrency limits, crawl delays, and request parameters. Designed to be lightweight and dependency-free, Abot runs without requiring external services or databases, making it easy to integrate.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
Auth0 Logo