page free download - SourceForge

Showing 41 open source projects for "page"

Filters: Web Scrapers, Linux

1

single-file-cli

CLI tool to save complete web pages as single self-contained HTML file

...This approach helps ensure that the saved page closely matches the original appearance and functionality. SingleFile CLI can be used for automated archiving, research, documentation, or offline reading workflows where preserving a page exactly as displayed is important.

Downloads: 1 This Week

Last Update: 2026-03-11
See Project
2

QueryList

Progressive PHP web crawler framework with jQuery-like DOM parsing

...It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks. QueryList supports common data extraction scenarios such as retrieving lists of titles, links, images, and other page elements from structured or semi-structured content. It also includes a powerful HTTP request system that enables complex operations such as simulated logins, proxy usage, and customized request headers. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
3

ScrapeGraphAI

Python scraper based on AI

Extracts content from websites and local documents using LLMs. ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.

Downloads: 0 This Week

Last Update: 3 days ago
See Project
4

Spider

High-performance Rust web crawler and scraper for large-scale data

...It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large datasets in a short period of time. Spider also provides mechanisms for subscribing to crawl events so developers can process page data such as URLs, status codes, or HTML content as it is discovered. ...

Downloads: 0 This Week

Last Update: 2026-03-31
See Project
5

UI.Vision RPA

Open-Source RPA Software (formerly Kantu)

...UI.Vision RPA's computer-vision visual UI testing commands allow you to write automated visual tests with UI.Vision RPA - this makes UI.Vision RPA the first and only Chrome and Firefox extension (and Selenium IDE) that has "👁👁 eyes". A huge benefit of doing visual tests is that you are not just checking one element or two elements at a time, you’re checking a whole section or page in one visual assertion. The visual UI testing and browser automation commands of UI.Vision RPA help web designers and developers to verify and validate the layout of websites and canvas elements.

Downloads: 6 This Week

Last Update: 2026-03-20
See Project
6

katana

Fast CLI web crawler for discovering endpoints in modern web apps

...Katana supports both standard HTTP crawling and headless browser crawling, allowing it to navigate modern web applications that rely heavily on JavaScript. Through headless browsing, it can analyze dynamic content and single-page applications built with modern frameworks, improving its ability to uncover hidden paths and assets. Katana offers flexible configuration options such as depth control, concurrency limits, and filtering mechanisms to refine results and manage scanning scope.

Downloads: 2 This Week

Last Update: 2026-05-05
See Project
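The depth control mentioned in the katana entry can be illustrated with a small breadth-first crawl. This is only a sketch in Python over a hypothetical in-memory link graph (`LINKS`); it does not use katana or reflect its implementation, and the function name `crawl` is an assumption made for illustration.

```python
from collections import deque

# Hypothetical site structure: each page maps to the links found on it.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],
    "/b": [],
}

def crawl(start, max_depth):
    """Visit pages breadth-first, never following links past max_depth."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        page, depth = queue.popleft()
        visited.append(page)
        if depth == max_depth:
            continue  # depth limit reached: record the page, skip its links
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

pages = crawl("/", max_depth=2)
```

With a limit of 2, the page `/a/1/x` sits three links from the start and is never visited.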
7

rvest

Simple web scraping for R

rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like Beautiful Soup and RoboBrowser. If you’re scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you’re respecting robots.txt and not hammering the site with too many requests.

Downloads: 0 This Week

Last Update: 2025-08-29
See Project
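The two courtesies the rvest entry attributes to polite, honoring robots.txt and spacing out requests, can be sketched with the Python standard library. This is not rvest or polite code (both are R packages); the robots.txt content, the `demo-bot` user agent, and the helper names are assumptions made for illustration, and the file is parsed in memory so that no network access is needed.

```python
import urllib.robotparser

# Hypothetical robots.txt content for the sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def build_parser(robots_text):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]

parser = build_parser(ROBOTS_TXT)
urls = [
    "https://example.com/index.html",
    "https://example.com/private/data.html",
]
permitted = allowed_urls(parser, "demo-bot", urls)

# Seconds to wait between requests; fall back to 1 if none is declared.
delay = parser.crawl_delay("demo-bot") or 1
```

A real scraper would call `time.sleep(delay)` between fetches of the permitted URLs.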
8

gain

Asyncio-based Python framework for building fast web crawling spiders

...Developers define crawlers using components such as spiders, parsers, and items, allowing them to organize crawling logic and data extraction rules clearly. Gain supports CSS selectors and XPath expressions for parsing page content and extracting specific elements. Gain also allows developers to configure headers, concurrency levels, and proxy settings to control how crawlers interact with target websites. Because it uses asynchronous programming, Gain can handle multiple requests efficiently while minimizing blocking operations.

Downloads: 1 This Week

Last Update: 5 days ago
See Project
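The concurrency limit described in the gain entry can be sketched with plain asyncio. This does not use gain's API; `fake_fetch` is a hypothetical stand-in for a real HTTP request, and the `crawl` helper and its `concurrency` parameter are assumptions made for illustration.

```python
import asyncio

async def fake_fetch(url):
    """Stand-in for an HTTP request; sleeps briefly instead of using the network."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"

async def crawl(urls, concurrency=2):
    """Fetch all URLs, allowing at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(url):
        async with semaphore:  # waits here while the limit is reached
            return await fake_fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))

urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
pages = asyncio.run(crawl(urls, concurrency=2))
```

Because each request yields control while it waits, the event loop can start the next permitted request instead of blocking.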
9

Toapi

Convert websites into structured APIs automatically with Python tool

...Developers define items and routes that determine how web pages are parsed and how the resulting data is exposed through the API interface. It also includes mechanisms for caching both page content and API requests, helping reduce repeated network calls and improving performance. Because the generated service is built on top of a Flask application, it can be deployed like any other Flask-based project and integrated into existing Python workflows.

Downloads: 1 This Week

Last Update: 3 days ago
See Project
10

Ferret

Declarative web scraping

A web scraping system aiming to simplify data extraction from the web. Ferret has a declarative query language that makes it easy to focus on the data that you need to get. Ferret can scrape JS-rendered pages, handle all page events, and emulate user interactions. Ferret was designed as a library from the ground up and can be easily embedded into any Go application. Ferret helps you to focus on the data you need using an easy-to-learn declarative language. Ferret uses Chrome/Chromium via the Chrome DevTools Protocol to handle dynamically rendered web pages. Ferret is extremely extensible, and creating custom functions and types is super easy. Ferret allows users to focus on the data. ...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
11

watercrawl

AI-ready web crawler that extracts and structures website content

...It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. WaterCrawl supports customizable extraction rules so users can focus only on relevant elements while ignoring unnecessary page components. WaterCrawl also offers real-time monitoring capabilities, allowing users to track crawling progress, performance metrics, and errors during large data collection jobs. Developers can integrate the tool into applications through a REST API and multiple client SDKs, enabling automated data pipelines and AI data preparation workflows.

Downloads: 1 This Week

Last Update: 2026-05-20
See Project
12

Scrapling

An adaptive Web Scraping framework

Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. ...

Downloads: 1 This Week

Last Update: 2026-06-07
See Project
13

wombat

Lightweight Ruby DSL for scraping structured data from web pages

Wombat is a lightweight web crawling and scraping library written in Ruby that focuses on extracting structured data from web pages using a concise domain-specific language (DSL). It is designed to simplify the process of defining how information should be collected from HTML documents without requiring large amounts of scraping boilerplate code. Developers can declare the data fields they want and specify selectors or rules for retrieving them, allowing Wombat to parse and return structured...

Downloads: 0 This Week

Last Update: 2026-04-07
See Project
14

Lux

Fast Go CLI tool for downloading videos from many streaming sites

...Written in the Go programming language, the project focuses on providing a fast and lightweight downloader that can retrieve media content directly from supported websites. Lux works by extracting video information from a given page and downloading the available streams to the user’s system. Lux supports downloading individual videos as well as playlists and can display multiple available quality options before the user selects which stream to download. It includes features for resuming interrupted downloads, allowing users to continue large downloads without starting over. ...

Downloads: 3 This Week

Last Update: 2026-03-10
See Project
15

NuzeBot

Finds interesting news headlines.

This is a bot that finds the news you want to see. It can be made to find the news that interests you and reject everything else. View the most interesting headlines from many websites on one page.

Downloads: 0 This Week

Last Update: 2024-10-31
See Project
16

PHPScraper

A universal web-util for PHP

...In some cases there is an option to get a simple or detailed version. PHPScraper can assist in collecting feeds such as RSS feeds, sitemap.xml entries and static search indexes. This can be useful when deciding on the next page to crawl or building up a list of pages on a website.

Downloads: 0 This Week

Last Update: 2024-04-09
See Project
17

Gerapy

Distributed Crawler Management Framework Based on Scrapy

Distributed crawler management framework based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Anyone who has written crawlers in Python may have used Scrapy. Scrapy is indeed a very powerful crawler framework, with high crawling efficiency and good scalability. It is basically a necessary tool for developing crawlers in Python. A Scrapy crawler can of course run on your own host, but when the crawl is very large, we can’t...

Downloads: 0 This Week

Last Update: 2023-07-19
See Project
18

ScrapBot 1.40 64bits

Task automation software for accessing and manipulating website data.

...The system can control the accessed webpage through JavaScript, and the entire navigation can be viewed in the program window. The main.js script runs in a separate frame from the navigation frame but can access all page content without any restrictions.

Downloads: 0 This Week

Last Update: 2023-08-01
See Project
19

crawly

High-level web crawling and scraping framework for Elixir apps

Crawly is a high-level application framework for crawling websites and extracting structured data using the Elixir programming language. It provides a complete environment for building web crawlers that systematically visit pages, collect information, and transform that data into structured formats for further processing. Crawly is designed for tasks such as data mining, information processing, and building historical archives of web content. Crawly follows the Elixir and OTP architecture...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
20

ACHE Focused Crawler

ACHE is a web crawler for domain-specific search

...It collects web pages that satisfy some specific criteria, e.g., pages that belong to a given domain or that contain a user-specified pattern. ACHE differs from generic crawlers in the sense that it uses page classifiers to distinguish between relevant and irrelevant pages in a given domain. A page classifier can be defined as a simple regular expression (e.g., one that matches every page that contains a specific word) or a machine-learning-based classification model. ACHE also automatically learns how to prioritize links in order to efficiently locate relevant content while avoiding the retrieval of irrelevant pages. ...

Downloads: 0 This Week

Last Update: 2023-04-12
See Project
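The regular-expression page classifier described in the ACHE entry can be sketched in a few lines of Python. This is not ACHE's configuration format or API; the function `make_regex_classifier`, the sample pages, and the keyword are hypothetical and serve only to show the idea of labeling a page relevant when it contains a given word.

```python
import re

def make_regex_classifier(pattern):
    """Return a function that marks a page relevant if the pattern matches."""
    compiled = re.compile(pattern, re.IGNORECASE)

    def is_relevant(page_text):
        return compiled.search(page_text) is not None

    return is_relevant

# Hypothetical crawled pages, keyed by URL.
pages = {
    "https://example.com/news/1": "New Ebola treatment enters trials.",
    "https://example.com/news/2": "Local team wins the championship.",
}

# \b restricts the match to the whole word, in any letter case.
is_relevant = make_regex_classifier(r"\bebola\b")
relevant = [url for url, text in pages.items() if is_relevant(text)]
```

A focused crawler would follow links only from pages in `relevant`, which is what separates it from a generic crawler that follows everything.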
21

Tholian Stealth

Secure, Peer-to-Peer, Private and Automateable Web Browser

Tholian Stealth is an open-source privacy-focused web browser and automation platform designed to combine secure browsing, web scraping, and proxy functionality into a unified system. It aims to prioritize user privacy and autonomy by minimizing tracking, blocking unnecessary requests, and restricting potentially harmful web technologies such as JavaScript execution. The platform operates as both a browser and a network service, capable of acting as a proxy, scraper, and content filtering...

Downloads: 0 This Week

Last Update: 2026-03-17
See Project
22

crawlergo

Headless Chrome crawler for collecting URLs for vulnerability scans

...It uses a Chrome headless environment to render web pages and observe behavior during the DOM rendering stage in order to capture as many accessible endpoints as possible. By monitoring the page lifecycle and interacting with web elements, the crawler automatically triggers JavaScript events and navigational actions that would normally occur during real user interaction. It also automatically fills and submits forms, helping discover hidden routes or parameters that might otherwise be missed by traditional crawlers. crawlergo includes a built-in URL de-duplication system that removes repeated or pseudo-static links while maintaining fast crawling speeds for large websites. crawlergo also analyzes page content to extract links and resources from multiple sources, including JavaScript files, comments, and configuration files.

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
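URL de-duplication of the kind the crawlergo entry mentions can be approximated by reducing each URL to a normalized key. The listing does not describe crawlergo's actual algorithm, so the heuristic below is an assumption: it lowercases the scheme and host, drops fragments, sorts query parameters, and treats numeric path segments as interchangeable so that pseudo-static links such as `/item/1` and `/item/2` collapse into one entry.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def normalize(url):
    """Build a comparison key for a URL (assumed heuristic, not crawlergo's)."""
    parts = urlsplit(url)
    # Replace purely numeric path segments with a placeholder.
    path = "/".join("{n}" if seg.isdigit() else seg for seg in parts.path.split("/"))
    # Sort query parameters so their order does not matter.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The empty last field drops the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))

def dedupe(urls):
    """Keep the first URL seen for each normalized key, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique

urls = [
    "https://Example.com/item/1?b=2&a=1",
    "https://example.com/item/2?a=1&b=2#top",
    "https://example.com/about",
]
unique = dedupe(urls)
```

The second URL is dropped because it differs from the first only in host case, numeric ID, parameter order, and fragment.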
23

grab-site

Web crawler for archiving and backing up sites into WARC archives

...Users can dynamically apply ignore patterns during an active crawl, allowing them to skip problematic or unnecessary URLs that could slow down or block the archiving process. grab-site also provides predefined ignore sets for common site structures such as forums and other complex web platforms. Additional mechanisms like duplicate page detection help avoid re-crawling identical content.

Downloads: 2 This Week

Last Update: 2 days ago
See Project
24

AutoScraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

This project is made for automatic web scraping to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages.

Downloads: 0 This Week

Last Update: 2023-04-12
See Project
25

Abot

Fast and flexible C# framework for building customizable web crawlers

...Abot follows a modular architecture that allows developers to customize nearly every stage of the crawl process by implementing or replacing core interfaces. Abot exposes an event-driven model that enables applications to react to crawling events such as page completion or crawl restrictions. It also provides configuration options that control crawling behavior including concurrency limits, crawl delays, and request parameters. Designed to be lightweight and dependency-free, Abot runs without requiring external services or databases, making it easy to integrate.

Downloads: 0 This Week

Last Update: 5 days ago
See Project