Showing 64 open source projects for "html source extractor"

  • 1
    Lexbor

    Lexbor is an open source HTML renderer library in development

    Lexbor is a web browser engine under development, available as a software library; it ships with a free license and has no extra dependencies. For us, speed is an absolute must-have. In our development process, we focus on the fastest parsing techniques for HTML, CSS, and fonts, the fastest data processing methods, and the fastest ways to serve content to end users. Whether you are building a backend that handles millions of HTML documents or a UI-heavy user app, your software’s response rate always...
    Downloads: 1 This Week
  • 2
    Nokogiri

    Tool to work with XML and HTML from Ruby

    Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2 (C) and xerces (Java). It aims to be secure by default, treating all documents as untrusted, and to be a thin-as-reasonable layer on top of the underlying parsers, without attempting to fix behavioral differences between the parsers. "Native...
    Downloads: 0 This Week
  • 3
    MultiMarkdown-6

    Lightweight markup processor to produce HTML, LaTeX, and more

    MultiMarkdown is a superset of the Markdown lightweight markup syntax with support for additional output formats and features. Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).
    Downloads: 0 This Week
  • 4
    MyPoorWebServer

    Demonstrates fundamental HTTP server implementation principles

    MyPoorWebServer is a C/C++-based web server project that demonstrates fundamental HTTP server implementation principles drawn from classic network programming literature and high-performance server design books. The repository contains source code that implements a basic HTTP server, intended to be compiled and run from the command line, exposing introductory web server functionality such as serving static HTML files and handling simple POST requests. It was originally developed as a resume project to demonstrate understanding of TCP/IP network programming, socket handling, and server lifecycle management, and so it reflects a hands-on approach to building server software from scratch rather than relying on frameworks. ...
    Downloads: 0 This Week
  • 5

    Exlibris HTML renderer

    Exlibris is a console (text mode) HTML renderer.
    Downloads: 0 This Week
  • 6
    Tidy

    The granddaddy of HTML tools, with support for modern standards

    The granddaddy of HTML tools. Supports modern standards. Thanks to the efforts of HTACG and prominent contributors, HTML Tidy has a whole new heartbeat and a whole new life. Tidy tidies HTML and XML. It can tidy your documents by itself, and developers can easily integrate its features into even more powerful tools. Tidy is a console application for macOS, Linux, Windows, UNIX, and more. It corrects and cleans up HTML and XML documents by fixing markup errors and upgrading legacy code to...
    Downloads: 1 This Week
  • 7
    Flat file extractor (ffe) can be used to read and parse different flat file structures and print them in different formats. ffe is a command-line tool developed in a GNU/Linux environment and distributed under the GPL. The project has moved to https://github.com/igitur/ffe
    Downloads: 1 This Week
  • 8

    Obscure-Extractor-GTK

    Extract files from unusual archive formats

    Small GTK program to extract files from (mostly) game archive formats. Currently supports Neverwinter Nights, Homeworld 2, BloodRayne, WC IV, and a "generic" module to find RIFF, BMP, PNG, and other files. The old Delphi version supports a few others.
    Downloads: 0 This Week
  • 9
    xmlfy

    Convert to XML on the fly

    xmlfy converts text/UTF based output into XML formatted output using schema files and/or options to control its behaviour. By Arthur Gouros.
    Downloads: 0 This Week
  • 10
    A common markup language and a parser to generate documentation in any target format (HTML, LaTeX, Trac, MediaWiki...). The core command relies on a Tcl library, so it is easy to create new target formats. Doc files are parameterizable via a header.
    Downloads: 0 This Week
  • 11

    ftdetector

    File type detector library

    This project is a tool to detect file types by signatures and MIME types. It uses hash tables to make the detection of a file type as fast as possible. The signature and MIME type lists are stored in simple, user-friendly files. This file type detector supports many formats (image, archive, text, documents, audio, video, fonts, and others). It also covers Microsoft OLE compound file types. The detector's algorithm has special features to detect text file types like (HTML, XML, JSON,...
    Downloads: 0 This Week
  • 12

    libdropbox

    Small ANSI C lib for Dropbox/Windows Azure communication

    Small ANSI C lib for Dropbox and Windows Azure communication. Built for small platforms. Uses PolarSSL for HTTPS communication. Features a small self-contained HTTPS module and a modified version of the JSMN JSON parser. Originally based on the dropbox_uploader script. Able to do most Dropbox actions, e.g. upload file, download file, list, file info, account info, share link. Also contains a small CLI program that interfaces with the lib. Also capable of Windows Azure service bus...
    Downloads: 0 This Week
  • 13
    MyHTML

    Fast C/C++ HTML 5 Parser

    MyHTML is a fast HTML parser using threads, implemented as a pure C99 library with no outside dependencies. Features: asynchronous parsing, tree building, and indexation; full conformance to the HTML5 specification; two APIs, high-level and low-level; manipulation of elements (add, change, delete, and others); manipulation of element attributes (add, change, delete, and others); support for 39 character encodings; detection of character encodings; support for Single Mode...
    Downloads: 0 This Week
  • 14
    Rbmake is both a library of routines and a set of command-line utilities that enables a user to transform content into Rocket Ebook format (.rb) files and back again (unencrypted files only). Compatible with the Rocket Ebook and the REB 1100.
    Downloads: 1 This Week
  • 15

    Z Notation E-Mail Mark-up Tools

    Tools to convert Z mark-up to HTML or text.

    A small library and two command-line tools to parse and convert Z notation from the "e-mail" mark-up into HTML code, or into UTF-8 text with box-drawing graphics, or into the Z Standard text format. See the project's Wiki home page for details.
    Downloads: 0 This Week
  • 16
    Decodes CGI data from standard input, $QUERY_STRING, and $HTTP_COOKIE. Stores data in lookup table(s) for easy retrieval. Uploads files by copying directly to files created with mkstemp(). Has several handy string conversion functions.
    Downloads: 1 This Week
  • 17
    Phantom (Phantom Has A New Template Oriented Mask) is a PHP/CGI extension, written in PHP/C/C++, that allows HTML content for a PHP-based website to be designed efficiently. Phantom has been designed for speed and expandability.
    Downloads: 0 This Week
  • 18
    Gumbo

    An HTML5 parsing library in pure C99

    Gumbo implements the HTML5 parsing algorithm as a pure C99 library with no outside dependencies. It's designed to serve as a building block for other tools and libraries such as linters, validators, templating languages, and refactoring and analysis tools. Gumbo gains some execution speed by virtue of being written in C, but speed is not an important consideration for the intended use-case, and was not a major design factor. Gumbo is intentionally designed to turn an HTML...
    Downloads: 0 This Week
  • 19
    trafalgar.map

    Open Street Map (OSM) tools, OSM/XML parser, tag extractor

    A set of tools intended for use with huge OSM files, such as the planet files in XML format. The parser reads directly from packed *.gz files, so there is no need to unpack the OSM/XML data files to the local disk. Now in 0.3.0: osm_tags: tag analyzer (like tag watch); osm_split: splits an OSM file into single files for nodes, ways, and relations and collects some meta information (will be used as input for other tools); osm_cut: create rectangular...
    Downloads: 0 This Week
  • 20

    Lightweight C-HTTP & HTML Wrapper

    This is my first project, originally named "CoLiBro". It is a C-HTTP and HTML wrapper for processing web data automatically. Some examples: fetching a website to check whether it is up; downloading your personal data (e.g. online banking, bills, ...); processing data contained in websites (e.g. weather data); receiving masses of data, especially to crawl a website; ... Examples and API documentation are planned.
    Downloads: 0 This Week
  • 21
    DocFrac is a document converter that can convert between RTF, HTML, and ASCII text. This includes RTF to HTML and HTML to RTF. Supports text formatting (e.g. bold), tables, and most European languages. Available for Windows, Linux, ActiveX, and DLL.
    Downloads: 0 This Week
  • 22
    Multi-connection command line tool to download Internet sites. Similar to wget and cURL, but it manages up to 50 parallel links. Main features are: recursive fetching, Metalink retrieving, segmented download and image filtering by width and height.
    Downloads: 4 This Week
  • 23
    entdec

    A simple program for decoding files that contain HTML entities

    This program was written to decode files containing HTML entities into UTF-8 encoded files for easy editing. Its main application is decoding HTML files produced by the TeX-to-HTML converter htlatex, which is used to publish scientific articles and other works on the web. It can also be used by web programmers for writing gateway applications, much like the equivalent functions implemented in, e.g., Perl or PHP. Technical description: You have a file containing HTML...
    Downloads: 0 This Week
  • 24
    This is a JavaScriptStream generator. I consider it a professional tool that targets webmasters. It converts an HTML-formatted document to a series of JavaScript text output commands. It is a good tool to help write HTML.
    Downloads: 0 This Week
  • 25
    Read, parse, merge, and write RSS (and Atom) feeds. It has some other built-in functions, such as text, HTML, and property file output, or templates with custom tags to insert RSS feeds into pages that can be uploaded to a server that supports only static HTML.
    Downloads: 0 This Week