Showing 1577 open source projects for "html source extractor"

  • 1
    html-metadata

    MetaData html scraper and parser for Node.js (supports Promises

    The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. It currently supports Schema.org microdata through a third-party library; native implementations of BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS; and some general metadata that does not belong to a particular standard (for instance, the content of the title tag or of meta description tags).
    Downloads: 1 This Week
  • 2
    Trafilatura

    Python & command-line tool to gather text on the Web

    ...Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims to stay handy and modular: no database is required, and the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first by avoiding the noise caused by recurring elements (headers, footers, links/blogroll etc.) and second by including information such as author and date in order to make sense of the data. The extractor tries to strike a balance between limiting noise (precision) and including all valid parts (recall). ...
    Downloads: 0 This Week
  • 3
    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make...
    Downloads: 0 This Week
  • 4
    Maxun

    Small event-delegation library for decoupling event binding and handli

    Maxun, named JsAction by Google, serves as a lightweight event-delegation library built in JavaScript. It allows developers to separate the logic of binding events from the code that handles those events, helping to keep DOM event wiring cleaner and more maintainable. It is archived and marked as read-only, indicating that the project is no longer actively maintained or intended for production use. The README states that ongoing development has migrated into a larger framework under the...
    Downloads: 86 This Week
  • 5
    Happy DOM

    Happy DOM is a JavaScript implementation of a web browser

    Happy DOM is a JavaScript implementation of a web browser without its graphical user interface. It includes many web standards from WHATWG DOM and HTML. The goal of Happy DOM is to emulate enough of a web browser to be useful for testing, scraping web sites, and server-side rendering. Happy DOM focuses heavily on performance and can be used as an alternative to JSDOM. Happy DOM now supports Declarative Shadow DOM which can be used for server-side rendering of web components. This package...
    Downloads: 8 This Week
  • 6
    single-file-cli

    CLI tool to save complete web pages as single self-contained HTML file

    SingleFile CLI is an open source command-line tool designed to save complete web pages as a single self-contained HTML file. It captures the rendered page in a headless browser and embeds all required resources directly into the output document, including stylesheets, scripts, images, and fonts. By consolidating every dependency into one file, it allows users to preserve a faithful copy of a web page that can be viewed offline without requiring external assets.
    Downloads: 15 This Week
  • 7
    openvpn-monitor

    openvpn-monitor is a web based OpenVPN monitor

    openvpn-monitor is a simple Python program that generates HTML displaying the status of an OpenVPN server, including all current connections. It uses the OpenVPN management console. It typically runs on the same host as the OpenVPN server, although it does not need to. It shows current connection information such as users, location, and data transferred.
    Downloads: 4 This Week
  • 8
    geckodriver

    WebDriver for Firefox

    geckodriver is an implementation of WebDriver, and WebDriver can be used for widely different purposes. How you invoke geckodriver largely depends on your use case. If you are using geckodriver through Selenium, you must ensure that you have version 3.11 or greater. Because geckodriver implements the W3C WebDriver standard and not the same Selenium wire protocol older drivers are using, you may experience incompatibilities and migration problems when making the switch from FirefoxDriver to...
    Downloads: 68 This Week
  • 9
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...
    Downloads: 12 This Week
  • 10
    Beacon

    Open-source Content Management System (CMS)

    Beacon is a modern open-source CMS built with Phoenix LiveView, offering fast server-rendered HTML for content-heavy pages with LiveView interactivity layered on top. It includes runtime content reloading, SEO-optimized rendering, and an admin interface (Beacon LiveAdmin) for managing pages, layouts, and components in a cluster-friendly setup. Developed by DockYard, Beacon aims to deliver high performance content sites fully within the Elixir ecosystem.
    Downloads: 10 This Week
  • 11
    goclone

    Fast CLI tool for cloning entire websites for local browsing offline

    goclone is a command-line utility designed to download and mirror complete websites to a local directory for offline access. It retrieves HTML pages, stylesheets, JavaScript files, images, and other assets from a target site and stores them on the user’s computer. It preserves the original site’s structure by maintaining relative links between pages, allowing the mirrored copy to function similarly to the live version when opened locally. Once a site has been cloned, users can browse the...
    Downloads: 32 This Week
  • 12
    miniblink49

    Lighter, faster browser kernel of blink to integrate HTML UI in apps

    miniblink is an open source, single-file browser control based on Chromium, and currently the smallest known Chromium-based browser control. Through its exported pure C interface, a browser control can be created in a few lines of code. It can be called from C++, C#, Delphi, and other languages. It embeds Node.js and supports Electron (with Node.js, can run...
    Downloads: 12 This Week
  • 13
    Lighthouse

    Automated auditing, performance metrics, & best practices for the web

    Lighthouse is an open-source, automated tool that analyzes and audits web apps and web pages in order to improve their quality. Lighthouse collects modern performance metrics and insights on developer best practices; auditing for performance, accessibility, SEO and more. After auditing it produces a report either in JSON or HTML. Included in the report is a reference doc that explains the importance of the audit and how to fix the problem areas, which you can use to improve the web app or web page. ...
    Downloads: 13 This Week
  • 14
    Eruda

    Console for mobile browsers

    With Eruda you can display JavaScript logs, check DOM state, show request status, show localStorage and cookie information, show the URL and user agent info, include your most frequently used snippets, and view HTML, JS, and CSS source. The JavaScript file is quite large (about 100 KB gzipped) and therefore not suitable for inclusion in mobile pages. It is recommended to make sure Eruda is loaded only when eruda is set to true. At initialization, a configuration object can be passed in. If no container element is set, Eruda appends an element directly under the HTML root element. ...
    Downloads: 11 This Week
  • 15
    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a simple but scalable crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. This simplifies the development of a specific crawler. WebMagic has a simple core with high flexibility and a simple API for HTML extraction. It also provides POJO annotations for customizing a crawler, with no configuration needed. Some other...
    Downloads: 7 This Week
  • 16
    Winter

    Free, open-source, self-hosted CMS platform based on the Laravel PHP

    ...Build intricate websites with little more than HTML, CSS and JavaScript through a beautiful, user-friendly and easy backend.
    Downloads: 7 This Week
  • 17
    Browserless

    The headless Chrome/Chromium driver on top of Puppeteer

    Browserless is an open-source headless browser automation library and service built on top of Puppeteer that simplifies the process of running and scaling Chromium-based browser tasks in production environments. It provides a high-level API for interacting with headless Chrome, allowing developers to perform operations such as generating PDFs, capturing screenshots, extracting text or HTML, and automating web navigation.
    Downloads: 6 This Week
  • 18
    Dillo

    Dillo, a multi-platform graphical web browser

    ...Its goals include enabling web access on old or constrained hardware, using slow or unreliable network connections, minimizing dependencies, and avoiding many of the complexities and overheads of modern full-featured browsers. It omits many modern features (notably JavaScript), instead focusing on rendering HTML (mostly older/standardized subsets), images, and some CSS, while keeping the codebase small. It is free/open source under GPL-3.0.
    Downloads: 21 This Week
  • 19
    HeadlessX

    The undetected self-hosted browser automation platform

    HeadlessX is an open-source, self-hosted browser automation platform designed to run headless browsers for tasks such as web scraping, automation, and testing. The system provides a centralized service that allows developers to programmatically control browser sessions and extract data from websites through a structured API. It is built using modern technologies including Node.js, Next.js, TypeScript, and Playwright, and uses a specialized browser engine called Camoufox based on Firefox. ...
    Downloads: 7 This Week
  • 20
    TinyStatus

    Tiny status page generated by a Python script

    TinyStatus is a simple, customizable status page generator that allows you to monitor the status of various services and display them on a clean, responsive web page.
    Downloads: 8 This Week
  • 21
    Web-Maker

    A blazing fast & offline frontend playground

    Web-Maker is an offline playground for your web experiments. It is something like CodePen or JSFiddle, but much faster, and it works offline because it runs completely on your system. It supports preprocessors: HTML (Pug & Markdown), CSS (SCSS, LESS & Stylus, Atomic CSS) & JavaScript (ES6, TypeScript & CoffeeScript). Hi! I am Kushagra Gour. Web Maker is a free and open-source project. To keep me motivated to work on such free and open-source side projects, I have launched a Patreon campaign. Your pledge, no matter how small, will act as an appreciation of my work and keep me going forward making Web Maker more awesome. ...
    Downloads: 4 This Week
  • 22
    Jackett

    API Support for your favorite torrent trackers

    Jackett works as a proxy server: it translates queries from apps (Sonarr, Radarr, SickRage, CouchPotato, Mylar3, Lidarr, DuckieTV, qBittorrent, Nefarious, etc.) into tracker-site-specific HTTP queries, parses the HTML or JSON response, and then sends the results back to the requesting software. This allows for getting recent uploads (like RSS) and performing searches. Jackett is a single repository of maintained indexer scraping & translation logic, removing the burden from other apps. Trackers...
    Downloads: 182 This Week
  • 23
    Tesla

    The flexible HTTP client library for Elixir

    Tesla is a flexible HTTP client library for Elixir, with support for middleware and multiple adapters. It is loosely based on Faraday and embraces the concept of middleware when processing the request/response cycle. Define a module with use Tesla and choose from a variety of middleware. Tesla is built around the concept of composable middlewares, which is very similar to how Plug Router works. All HTTP functions, such as Tesla.get/3 and Tesla.post/4, can take a dynamic client as the...
    Downloads: 7 This Week
  • 24
    reveal.js

    The HTML Presentation Framework

    reveal.js is a framework for creating beautiful interactive presentations using HTML. It comes with a wide range of features, including nested slides, auto-sliding, touch navigation, Markdown support, PDF export, speaker notes, theming and more. It also comes with a JavaScript API that allows you to control various other options, and a list of plugins that can be used to extend reveal.js further. reveal.js currently offers full support for any recently released version of the following...
    Downloads: 7 This Week
  • 25
    webpack DevServer

    Serves a webpack app and updates the browser on changes

    webpack-dev-server can be used to quickly develop an application. Options that are compatible with webpack-dev-middleware have a key icon next to them. This set of options is picked up by webpack-dev-server and can be used to change its behavior in various ways. When the server is started, there will be a message prior to the list of resolved modules. If you're using dev-server through the Node.js API, the options in devServer will be ignored. Pass the options as a second parameter instead:...
    Downloads: 9 This Week