Showing 1578 open source projects for "html source extractor"

  • 1
    Web-Maker

    A blazing fast & offline frontend playground

    Web-Maker is an offline playground for your web experiments. Something like CodePen or JSFiddle, but much faster, and it works offline because it runs completely on your system. Supports preprocessors: HTML (Pug & Markdown), CSS (SCSS, LESS & Stylus, Atomic CSS) & JavaScript (ES6, TypeScript & CoffeeScript). Hi! I am Kushagra Gour. Web Maker is a free and open-source project. To keep me motivated to work on such open-source and free side projects, I have launched a Patreon campaign. Your pledge, no matter how small, will act as an appreciation of my work and keep me going forward making Web Maker more awesome. ...
    Downloads: 1 This Week
  • 2
    ScrapeGraphAI

    Python scraper based on AI

    Extracts content from websites and local documents using LLMs. ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 5 This Week
  • 3
    mitmproxy

    A free and open source interactive HTTPS proxy

    mitmproxy is an open source, interactive SSL/TLS-capable intercepting HTTP proxy, with a console interface fit for HTTP/1, HTTP/2, and WebSockets. It's the ideal tool for penetration testers and software developers, able to debug, test, and make privacy measurements. It can intercept, inspect, modify and replay web traffic, and can even prettify and decode a variety of message types. Its web-based interface mitmweb gives you an experience similar to Chrome's DevTools, with the addition of...
    Downloads: 15 This Week
  • 4
    Echo HTML Viewer

    Fast offline HTML viewer for opening local HTML files on Windows

    Echo HTML Viewer is a lightweight desktop app for viewing local HTML files without a browser or internet connection. Designed for simplicity and privacy, it lets you open saved web pages, documentation, and archived content in a clean, distraction-free interface.

    Key features:
      • Open HTML files instantly
      • Drag & drop support
      • Fast startup and low resource usage
      • Fully offline: no telemetry, no tracking
      • No background services

    Use cases:
      • View saved websites...
    Downloads: 75 This Week
  • 5
    HeadlessX

    The undetected self-hosted browser automation platform

    HeadlessX is an open-source, self-hosted browser automation platform designed to run headless browsers for tasks such as web scraping, automation, and testing. The system provides a centralized service that allows developers to programmatically control browser sessions and extract data from websites through a structured API. It is built using modern technologies including Node.js, Next.js, TypeScript, and Playwright, and uses a specialized browser engine called Camoufox based on Firefox. ...
    Downloads: 1 This Week
  • 6
    LinkifyJS

    JavaScript plugin for finding links in plain-text

    A JavaScript library for detecting and converting URLs, email addresses, and hashtags into clickable links in text content.
    Downloads: 1 This Week
  • 7
    Dev Browser

    A Claude Skill to give your agent the ability to use a web browser

    Dev Browser is a browser automation skill/plugin that enables an AI agent to control a real browser for verification and testing during development. Its purpose is to close the gap between “code was written” and “the UI actually works,” by letting the agent navigate, interact with pages, and validate behavior in a live environment. A key idea is persistence: the browser can keep pages open so the agent can navigate once and then perform multiple interactions across scripts without losing...
    Downloads: 18 This Week
  • 8
    eleventy

    A simpler site generator. Transforms a directory of templates

    A static site generator for modern web development, focusing on flexibility and customization.
    Downloads: 0 This Week
  • 9
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 19 This Week
  • 10
    webpack DevServer

    Serves a webpack app and updates the browser on changes

    webpack-dev-server can be used to quickly develop an application. Options that are compatible with webpack-dev-middleware have a key icon next to them. This set of options is picked up by webpack-dev-server and can be used to change its behavior in various ways. When the server is started, there will be a message prior to the list of resolved modules. If you're using dev-server through the Node.js API, the options in devServer will be ignored. Pass the options as a second parameter instead:...
    Downloads: 1 This Week
  • 11
    SingleFile

    Web Extension for saving a copy of a complete web page in a single file

    Web Extension for Firefox/Chrome/MS Edge and CLI tool to save a faithful copy of an entire web page in a single HTML file. SingleFile is a Web Extension (and a CLI tool) compatible with Chrome, Firefox (Desktop and Mobile), Microsoft Edge, Vivaldi, Brave, Waterfox, Yandex Browser, and Opera. It helps you to save a complete web page into a single HTML file. Wait until the page is fully loaded. Click on the SingleFile button in the extension toolbar to save the page. You can click again on the...
    Downloads: 12 This Week
  • 12
    LittleLink

    A lightweight DIY alternative to services like Linktree

    The DIY self-hosted LinkTree alternative. LittleLink has more than 60 branded button styles you can easily use, with more being added by our community all the time. You'll also find a light and dark theme ready to go. Not a fan of the default colors? Update skeleton-light.css or skeleton-dark.css to the HEX values of your choosing. You can also set your CSS to skeleton-auto.css, which...
    Downloads: 1 This Week
  • 13
    wombat

    Lightweight Ruby DSL for scraping structured data from web pages

    Wombat is a lightweight web crawling and scraping library written in Ruby that focuses on extracting structured data from web pages using a concise domain-specific language (DSL). It is designed to simplify the process of defining how information should be collected from HTML documents without requiring large amounts of scraping boilerplate code. Developers can declare the data fields they want and specify selectors or rules for retrieving them, allowing Wombat to parse and return structured...
    Downloads: 0 This Week
  • 14
    Tesla

    The flexible HTTP client library for Elixir

    The flexible HTTP client library for Elixir, with support for middleware and multiple adapters. Tesla is an HTTP client loosely based on Faraday. It embraces the concept of middleware when processing the request/response cycle. Define a module with use Tesla and choose from a variety of middleware. Tesla is built around the concept of composable middlewares. This is very similar to how Plug Router works. All HTTP functions, such as Tesla.get/3 and Tesla.post/4, can take a dynamic client as the...
    Downloads: 0 This Week
  • 15
    Twisted

    Event-driven networking engine written in Python

    Twisted is an event-based framework for internet applications, supporting Python 3.6+. It includes modules for many different purposes. Twisted supports all major system event loops, select (all platforms), poll (most POSIX platforms), epoll (Linux), kqueue (FreeBSD, macOS), IOCP (Windows), and various GUI event loops (GTK+2/3, Qt, wxWidgets). Third-party reactors can plug into Twisted, and provide support for additional event loops.
    Downloads: 3 This Week
  • 16
    TinyStatus

    Tiny status page generated by a Python script

    TinyStatus is a simple, customizable status page generator that allows you to monitor the status of various services and display them on a clean, responsive web page.
    Downloads: 0 This Week
  • 17
    Jekyll

    A simple, blog-aware static site generator written in Ruby

    Jekyll is a simple, blog-aware, static site generator that’s ideal for creating personal, project, or organization sites. Jekyll is incredibly simple: it just takes your content, renders Markdown and Liquid templates, and spits out a complete, static website ready for deployment. No configurations, databases, pesky updates, or other needless complexities. Jekyll lets you focus on what really matters: your content. Jekyll is easy to install and run. You can have your own website or blog...
    Downloads: 3 This Week
  • 18
    Pelican

    Static site generator that supports Markdown and reST syntax

    Pelican is a static site generator that requires no database or server-side logic. Features include: chronological content (e.g., articles, blog posts) as well as static pages; integration with external services; site themes (created using Jinja2 templates); publication of articles in multiple languages; generation of Atom and RSS feeds; code syntax highlighting via Pygments; import of existing content from WordPress, Dotclear, or RSS feeds; fast rebuild times due to content caching and selective output writing....
    Downloads: 0 This Week
  • 19
    CyberScraper 2077

    A powerful web scraper powered by LLMs | OpenAI, Gemini & Ollama

    CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini, and local LLM models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.
    Downloads: 2 This Week
  • 20
    Hurl

    Hurl, run and test HTTP requests with plain text

    Hurl is a command line tool that runs HTTP requests defined in a simple plain text format. It can chain requests, capture values and evaluate queries on headers and body responses. Hurl is very versatile: it can be used for both fetching data and testing HTTP sessions. Hurl makes it easy to work with HTML content, REST / SOAP / GraphQL APIs, or any other XML / JSON-based APIs. Hurl can run HTTP requests but can also be used to test HTTP responses. Different types of queries and predicates...
    Downloads: 3 This Week
  • 21
    changedetection.io

    The best free open source website change detection and restock service

    Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. It covers everything from simply monitoring web pages for changes (such as watching prices and restock notifications) to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. Using the...
    Downloads: 7 This Week
  • 22
    axe-core

    Accessibility engine for automated Web UI testing

    Axe is an accessibility testing engine for websites and other HTML-based user interfaces. It's fast, secure, lightweight, and was built to seamlessly integrate with any existing test environment so you can automate accessibility testing alongside your regular functional testing. Axe-core has different types of rules, for WCAG 2.0 and 2.1 on level A and AA, as well as a number of best practices that help you identify common accessibility practices like ensuring every page has an h1 heading,...
    Downloads: 7 This Week
  • 23
    Linkwarden

    Self-hosted collaborative bookmark manager

    Linkwarden is a self-hosted, open-source bookmark manager built to help individuals and teams collect, organize, and preserve important web content in a way that stays useful long after the original pages change or disappear. Instead of saving only a URL, it captures durable archived formats so your saved knowledge remains accessible even when link rot happens. The experience is designed to feel like a modern “read-it-later” tool, with a reader view that makes long articles easier to consume...
    Downloads: 4 This Week
  • 24
    kimuraframework

    AI-first Ruby framework for building fast, flexible web scraping spide

    Kimurai is an open source web scraping framework written in Ruby that simplifies the process of building automated data extraction tools. It provides a clean domain-specific language that allows developers to define scraping logic and data schemas with minimal boilerplate code. Kimurai can use AI-assisted extraction to identify where data resides in HTML pages, automatically generating selectors that are cached for future use so subsequent scraping runs operate with pure Ruby performance. ...
    Downloads: 3 This Week
  • 25
    Twill

    Twill is an open source CMS toolkit for Laravel

    Twill is an open source Laravel package that helps developers rapidly create a custom CMS that is beautiful, powerful, and flexible. By standardizing common functions without compromising developer control, Twill makes it easy to deliver a feature-rich admin console that focuses on modern publishing needs. Twill is an AREA 17 product. It was crafted with the belief that content management should be a creative, productive, and enjoyable experience for both publishers and developers. With a...
    Downloads: 1 This Week