Showing 32 open source projects for "html parser c++"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 1
    jsoup

    jsoup

    Java library for working with real-world HTML

    ...The parser will make every attempt to create a clean parse from the HTML you provide, regardless of whether the HTML is well-formed or not. You have HTML in a Java String, and you want to parse that HTML to get at its contents, or to make sure it's well formed, or to modify it. The String may have come from user input, a file, or from the web.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    OpenDataLoader PDF

    OpenDataLoader PDF

    PDF Parser for AI-ready data. Automate PDF accessibility

    OpenDataLoader PDF is an open-source document processing system designed to convert complex PDF files into structured, AI-ready formats such as Markdown, JSON, and HTML while preserving layout, hierarchy, and semantic meaning. It focuses on enabling downstream use cases like retrieval-augmented generation (RAG), knowledge extraction, and document intelligence pipelines by maintaining accurate reading order and spatial metadata through bounding boxes. The tool combines deterministic parsing...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    HtmlCleaner is HTML parser written in Java. It transforms dirty HTML to well-formed XML following the same rules that the most web-browsers use.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    The Accelerator Markup Language (AML) / Universal Accelerator Parser (UAP) project will develop an XML based format for describing high energy particle accelerators along with associated software to convert lattice files to a standard internal struct
    Downloads: 0 This Week
    Last Update:
    See Project
  • Application Monitoring That Won't Slow Your App Down Icon
    Application Monitoring That Won't Slow Your App Down

    AppSignal's Rust-based agent is lightweight and stable. Already running in thousands of production apps.

    Full APM with errors, performance, logs, and uptime monitoring. 99.999% uptime SLA on the platform itself.
    Start Free
  • 5

    ConcatPDF

    PDF Concatenation Tool

    ConcatPDF is the tool to concatenate PDF files. It can concatenate, extract, encrypt, decrypt, configure PDF files, convert image files to PDF. GUI version and CUI version are both available. iText.NET is iText porting on .NET Framework by J#. This library allows you to generate PDF, (X)HTML, XML, RTF files on Microsoft.NET Framework including ASP.NET.
    Leader badge
    Downloads: 33 This Week
    Last Update:
    See Project
  • 6
    iText®, a JAVA PDF library

    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. ...
    Leader badge
    Downloads: 156 This Week
    Last Update:
    See Project
  • 7
    Cool Reader

    Cool Reader

    A cross-platform XML/CSS based eBook reader

    CoolReader is fast and small cross-platform XML/CSS based eBook reader for desktops and handheld devices. Supported formats: FB2, TXT, RTF, DOC, TCR, HTML, EPUB, CHM, PDB, MOBI. Platforms: Win32, Linux, Android. Ported on some eInk based devices.
    Leader badge
    Downloads: 513 This Week
    Last Update:
    See Project
  • 8

    StoryParser

    A set of tools and libraries to help with writing eBooks

    A set of tools and libraries (available for C# and Java) that help with writing fiction and non-fiction drafts and then produce ePUB and Kindle eBooks. With these tools/libraries, drafts, written in HTML, can be analyzed to help with writing. such as generating outlines and associating scenes with keywords. When done writing, the tools/libraries can be used to make publishable eBook, automatically producing additional material, such as Table of Contents and Title Pages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    VTD-XML is the next generation XML parser/indexer/editor/slicer/assembler/xpath-engine that goes beyond DOM, SAX and PULL in performance, memory usage, and ease of use.
    Downloads: 10 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 10

    JSONjuicer

    JSON parser and encoder

    A Java open-source library which makes encoding and decoding Java data-structures to and from JSON text easy and intuitive.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Jericho HTML Parser is a java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
    Downloads: 31 This Week
    Last Update:
    See Project
  • 13
    A java-based parser for parsing/grabbing web sites and other text or XML documents, based on a nondeterministic parser language, creating XML output. Also contains a few utility classes for HTML, CSV and text parsing, and additional character sets.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    sasgml

    SAX-like API for SGML (SGML parser for Java)

    SGML parser for Java, based on OpenSP.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15

    jStyleParser

    Java CSS parser and DOM style assignment library

    jStyleParser is a CSS parser written in Java. It has its own application interface that is designed to allow an efficient CSS processing in Java and mapping the values to the Java data types. It is also able to apply the parsed style sheets to a DOM that represents an HTML or XML document and to compute the resulting style of the individual document elements.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16

    knowceans

    Utility classes from maps to search engine to random samplers

    .... --- Highlights: --- org.knowceans.util: IndexQuickSort, TableList: apply order of one array/list to others +++ Vectors, ArrayUtils: array convenience +++ RandomSamplers, CokusRandom, ArmSampler, Densities: random sampling and distributions +++ Arguments: command line parser +++ StopWatch, Which, ExternalProcess: runtime stuff +++ ParallelFor: OpenMP workalike +++ PatternString, NamedGroupRegex: regex convenience --- org.knowceans.corpus: CorpusSearcher: full-text search engine +++ LabelNumCorpus: svmlight corpus storage and filtering +++ NIPS corpus with text, authors, labels and citations --- org.knowceans.map: InvertibleHashMultiMap, BijectiveHashMap: implement n:m and 1:1 relations. --- Other libs: knowceans-arms = port of the Adaptive Rejection Metropolis Sampler (ARMS) for arbitrary distributions +++ lda-j = port of lda-c, implementing Latent Dirichlet Allocation (LDA)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Kelvina is a platform independent Java HTML parser, which outputs Document(org.w3c.dom.Document) object from any html input, including invalid one.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Lightweight XML processor; XML-POJO mapping via Java5 annotations or DTD; Preprocessing of XML documents using expression language; Binary XML; RMI friendly XML; JSON format support; XML marshall/unmarshall; HTML as XML parser; Swing XML Viewer
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    ZK Light is renamed to ZKuery and moved to http://code.google.com/p/zkuery/. ZK Light is a client-only version of ZK; Support Java, C, PHP, Python...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    POESIA= Public Opensource Environment for a Safer Internet Access an opensource Internet content filter (multimodal, mulitlingual) aimed for protection of youth (in schools...); partly funded by the European Commission
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    MozillaParser is a Java Html parser based on mozilla's html parser. it acts as a bridge from java classes to Mozilla's classes and outputs a java Document object from a raw ( and dirty) HTML input
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Wow Log Parser is a combat log parser for the game World of Warcraft. The purpose of the program is to parse the files generated with the /combatlog command. The source code can be found on: http://www.gurre.eu/wowlogparser/forum
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Converting C/C++/Java Code to HTML format
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    NeatCleaner is a velocity template file parser written in Java. It transforms dirty HTML/VTL/JS to well-formed Node Tree and render the Node Tree to output pretty source.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    The tool putting HTML/XHTML documents into the Lotus Notes/Domino R5, R6 or R7 databases. The HTML code is aggregated in defined field of document, including files of the resources. Currently is only supported Polish language.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB