Showing 686 open source projects for "web pages"

  • 1
    Translate Web Pages

    Translate your page in real time using Google or Yandex

    ...You can choose to translate automatically. To change the translation engine, just touch the Google Translate icon. To translate a website, the extension must access and modify the text of its web pages, and it can only do that with that permission. The pages are translated using the Google or Yandex translation engine (you choose). We do not collect any information. However, to translate, the contents of the web pages will be sent to Google or Yandex servers. You can also install via a crx file; download the file using a download manager or Firefox. ...
    Downloads: 15 This Week
  • 2
    Web Archives

    Browser extension for viewing archived and cached versions of websites

    Web Archives is a browser extension for finding and viewing archived and cached versions of web pages, available for Chrome, Edge and Safari, with support for more than 10 search engines. Searches can be initiated from the context menu and the browser toolbar. A diverse set of archive and cache sources is supported, and they can be toggled and reordered from the extension's options. ...
    Downloads: 0 This Week
  • 3
    rvest

    Simple web scraping for R

    rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like Beautiful Soup and RoboBrowser. If you’re scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you’re respecting robots.txt and not hammering the site with too many requests.
    Downloads: 0 This Week
  • 4
    single-file-cli

    CLI tool to save complete web pages as a single self-contained HTML file

    SingleFile CLI is an open source command-line tool designed to save complete web pages as a single self-contained HTML file. It captures the rendered page in a headless browser and embeds all required resources directly into the output document, including stylesheets, scripts, images, and fonts. By consolidating every dependency into one file, it allows users to preserve a faithful copy of a web page that can be viewed offline without requiring external assets. ...
    Downloads: 6 This Week
  • 5
    Ferret

    Declarative web scraping

    Ferret is a web scraping system that aims to simplify data extraction from the web. Its declarative query language is easy to learn and lets you focus on the data you need. Ferret can scrape JS-rendered pages, handle all page events, and emulate user interactions, using Chrome/Chromium via the Chrome DevTools Protocol to handle dynamically rendered web pages. It was designed as a library from the ground up, so it can be easily embedded into any Go application. Ferret is also extensible: creating custom functions and types is easy. ...
    Downloads: 0 This Week
  • 6
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents.
    Downloads: 0 This Week
  • 7
    Playwright

    Node library to automate Chromium, Firefox & WebKit with a single API

    Playwright is a Node library for automating Chromium, Firefox and WebKit using a single API. It supports headless execution for all these browsers on Linux, macOS and Windows, providing automated web browser interactions that are fast, capable, reliable and evergreen. Playwright enables a broad spectrum of cross-browser web automation capabilities, which are used by Single Page Apps and Progressive Web Apps. These include scenarios that span multiple pages, domains and iframes; emulation of mobile devices, geolocation, and permissions; file uploads and downloads; and many more.
    Downloads: 142 This Week
  • 8
    wombat

    Lightweight Ruby DSL for scraping structured data from web pages

    Wombat is a lightweight web crawling and scraping library written in Ruby that focuses on extracting structured data from web pages using a concise domain-specific language (DSL). It is designed to simplify the process of defining how information should be collected from HTML documents without requiring large amounts of scraping boilerplate code. Developers can declare the data fields they want and specify selectors or rules for retrieving them, allowing Wombat to parse and return structured results. ...
    Downloads: 0 This Week
  • 9
    academicpages.github.io

    GitHub Pages template based upon HTML and Markdown for personal

    AcademicPages is a ready-made Jekyll theme for academics to build personal websites, blogs, and CV pages. It includes features like publication lists, project showcases, writing blogs, and optimized layouts for easier GitHub Pages deployment. With support for LaTeX rendering, RSS feeds, and responsive design, it's popular among students, researchers, and educators looking to create professional web presences without coding from scratch.
    Downloads: 0 This Week
  • 10
    goclone

    Fast CLI tool for cloning entire websites for local browsing offline

    goclone is a command-line utility designed to download and mirror complete websites to a local directory for offline access. It retrieves HTML pages, stylesheets, JavaScript files, images, and other assets from a target site and stores them on the user’s computer. It preserves the original site’s structure by maintaining relative links between pages, allowing the mirrored copy to function similarly to the live version when opened locally. Once a site has been cloned, users can browse the pages offline and navigate between them as if they were viewing the site online. goclone is written in Go and uses goroutines to perform downloads concurrently. It can also optionally start a local web server to serve the mirrored files for a more realistic browsing experience. ...
    Downloads: 2 This Week
  • 11
    kimuraframework

    AI-first Ruby framework for building fast, flexible web scraping spide

    Kimurai is an open source web scraping framework written in Ruby that simplifies the process of building automated data extraction tools. It provides a clean domain-specific language that allows developers to define scraping logic and data schemas with minimal boilerplate code. Kimurai can use AI-assisted extraction to identify where data resides in HTML pages, automatically generating selectors that are cached for future use so subsequent scraping runs operate with pure Ruby performance. ...
    Downloads: 0 This Week
  • 12
    Lightpanda Browser

    Lightpanda: the headless browser designed for AI and automation

    Lightpanda is an open-source headless browser designed specifically for automation, artificial intelligence workflows, and large-scale web interaction tasks. Unlike traditional browsers that include full graphical rendering engines meant for human users, Lightpanda is built from scratch to operate entirely in headless mode, focusing only on the components required for programmatic web interaction. This design allows it to execute JavaScript and interact with web pages while avoiding the overhead associated with rendering images, fonts, and layout elements intended for visual display. ...
    Downloads: 31 This Week
  • 13
    Geziyor

    Blazing fast Go framework for web crawling and data scraping tasks

    Geziyor is a high-performance web crawling and web scraping framework built for the Go programming language. It is designed to help developers crawl websites and extract structured information from web pages efficiently. It focuses on speed and scalability, allowing large numbers of requests to be processed concurrently. Geziyor supports use cases such as data mining, monitoring web content, and automated testing workflows.
    Downloads: 0 This Week
  • 14
    Lighthouse

    Automated auditing, performance metrics, & best practices for the web

    Lighthouse is an open-source, automated tool that analyzes and audits web apps and web pages in order to improve their quality. Lighthouse collects modern performance metrics and insights on developer best practices, auditing for performance, accessibility, SEO and more. After auditing, it produces a report in either JSON or HTML. Included in the report is a reference doc that explains the importance of each audit and how to fix the problem areas, which you can use to improve the web app or web page. ...
    Downloads: 8 This Week
  • 15
    Bili23 Downloader

    Cross platform GUI tool for downloading videos from Bilibili sites

    Bili23-Downloader is an open source desktop application designed for downloading video content from the Bilibili platform. It provides a graphical interface that allows users to download various types of media including user-uploaded videos, series episodes, movies, and other hosted content. It focuses on ease of use with a zero-configuration setup, making it accessible to both beginners and experienced users. It supports high-performance downloads through multi-threading and includes resume...
    Downloads: 8 This Week
  • 16
    Min

    A fast, minimal browser that protects your privacy

    Tabs in Min take up less space, giving you more room to browse the web. Pages you haven’t looked at in a while fade out, letting you see what’s important, and Focus Mode hides your other tabs to prevent you from getting distracted. See quick definitions and answers with information from DuckDuckGo, including Wikipedia entries and more. Jump to any site quickly with fuzzy search. Or search through the full text of every page you've visited, even if you don't remember the title. ...
    Downloads: 24 This Week
  • 17
    crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    This library provides a kind of framework and a lot of ready-to-use building blocks, called steps, that you can use to build your own crawlers and scrapers. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load those documents as well. A crawler...
    Downloads: 1 This Week
  • 18
    Adguard Browser Extension

    AdGuard browser extension

    AdGuard is a fast and lightweight ad-blocking browser extension that effectively blocks all types of ads and trackers on all web pages. We focus on advanced privacy protection features to not just block known trackers, but also prevent websites from building your shadow profile. Unlike its standalone counterparts (AG for Windows, Mac), the browser extension is completely free and open source. AdGuard does not collect any information about you, and does not participate in any acceptable ads program. ...
    Downloads: 43 This Week
  • 19
    Dev Browser

    A Claude Skill to give your agent the ability to use a web browser

    Dev Browser is a browser automation skill/plugin that enables an AI agent to control a real browser for verification and testing during development. Its purpose is to close the gap between “code was written” and “the UI actually works,” by letting the agent navigate, interact with pages, and validate behavior in a live environment. A key idea is persistence: the browser can keep pages open so the agent can navigate once and then perform multiple interactions across scripts without losing...
    Downloads: 3 This Week
  • 20
    Zenario

    Zenario is a web-based content management system (CMS)

    Zenario is a web-based content management system (CMS). It can be used for simple sites, with many WYSIWYG features for making regular web pages, news items, blogs, and so on. It has powerful features for running extranet sites, such as customer portals, and online databases (e.g. of products, documents or videos). It also has multilingual features built in from the core, so that a site can easily be set up to deliver content in multiple languages.
    Downloads: 0 This Week
  • 21
    QueryList

    Progressive PHP web crawler framework with jQuery-like DOM parsing

    QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks.
    Downloads: 0 This Week
  • 22
    skycaiji

    Open source web scraping system for automated data collection tasks

    SkyCaiji is an open source web scraping and data collection system designed to gather information from websites through configurable extraction rules. It focuses on simplifying the process of building crawlers by allowing users to visually define scraping rules rather than writing complex code. It can collect structured or unstructured data from many types of webpages and automate the extraction process for large datasets. SkyCaiji is designed to run on a variety of hosting environments...
    Downloads: 2 This Week
  • 23
    newspaper4k

    Python library for scraping and analyzing online news articles easily

    ...It is a continuation and active fork of the original newspaper3k library, which had stopped receiving updates, with the goal of keeping the ecosystem maintained while adding improvements and bug fixes. It provides developers with tools to automatically download web pages, extract the main article content, and collect associated metadata such as titles, authors, images, and publication dates. Newspaper4k also includes natural language processing capabilities that can generate summaries and identify keywords from extracted article text. Newspaper4k supports both single-article extraction and full news site processing, allowing users to build sources representing entire publications and iterate through their articles. ...
    Downloads: 0 This Week
  • 24
    Mink

    PHP web browser emulator abstraction

    ...Mink is commonly used in behavior-driven development workflows, particularly with frameworks like Behat, where it helps simulate real user behavior such as clicking links, filling forms, and navigating pages. The library supports session management, allowing multiple browser sessions to run simultaneously and interact with different pages or environments.
    Downloads: 0 This Week
  • 25
    Dillo

    Dillo, a multi-platform graphical web browser

    Dillo is a lightweight, minimal graphical web browser, designed for speed, low resource usage, and privacy. It is written in C and C++ using the FLTK (Fast Light Toolkit) GUI library. Its goals include enabling web access on old or constrained hardware, using slow or unreliable network connections, minimizing dependencies, and avoiding many of the complexities and overheads of modern full-featured browsers. It omits many modern features (notably JavaScript), instead focusing on rendering...
    Downloads: 26 This Week