Search Results for "html extractor parser"

Showing 226 open source projects for "html extractor parser"

1

pdf-extractor

Node.js module for rendering PDF pages to images, SVGs, and HTML files

pdf-extractor is a wrapper around pdf.js that generates images, SVGs, HTML files, text files, and JSON files from a PDF on Node.js. A DOM canvas renders and exports the graphical layer of the PDF. The canvas exports *.png by default but can be extended to export other file types such as .jpg. PDF objects are converted to SVG using the SVGGraphics parser of pdf.js. PDF text is converted to HTML, which can be used as a (transparent) layer over the image to enable text selection. PDF text...

Downloads: 3 This Week

Last Update: 2023-03-23
2

html-react-parser

HTML to React parser

HTML to React parser that works on both the server (Node.js) and the client (browser). The parser converts an HTML string to one or more React elements. Available as part of the Tidelift Subscription. For TypeScript projects, you may need to check that domNode is an instance of domhandler's Element. Make sure to render parsed adjacent elements under a parent element.

Downloads: 0 This Week

Last Update: 2024-09-11
3

html-loader

HTML Loader

... and attributes. By default, the parser in html-loader interprets content inside noscript tags as #text, so content inside this tag is not processed. A very common scenario is exporting HTML into its own .html file and serving it directly instead of injecting it with JavaScript.

Downloads: 5 This Week

Last Update: 2024-07-25
4

html-to-markdown

Convert HTML to Markdown. Even works with entire websites

Convert HTML into Markdown with Go. It uses an HTML parser to avoid regular expressions as much as possible, which should prevent some odd edge cases and allows it to be used where the input is totally unknown.

Downloads: 0 This Week

Last Update: 13 minutes ago
5

html-metadata

Metadata HTML scraper and parser for Node.js (supports Promises...)

The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard...

Downloads: 0 This Week

Last Update: 2024-08-24
6

LOL HTML

Low output latency streaming HTML parser/rewriter with CSS API

Low Output Latency streaming HTML rewriter/parser with CSS-selector based API. It is designed to modify HTML on the fly with minimal buffering. It can quickly handle very large documents, and operate in environments with limited memory resources. The crate serves as a back-end for the HTML rewriting functionality of Cloudflare Workers, but can be used as a standalone library with a convenient API for a wide variety of HTML rewriting/analysis tasks. The parser switches back to the tag scanner...

Downloads: 2 This Week

Last Update: 5 hours ago
7

fast-xml-parser

Validate XML, Parse XML and Build XML rapidly

Validate XML, Parse XML to JS Object, or Build XML from JS Object without C/C++ based libraries and no callback.

Downloads: 0 This Week

Last Update: 2023-10-11
8

xq

Command-line XML and HTML beautifier and content extractor

Command-line XML and HTML beautifier and content extractor. Syntax highlighting, automatic indentation, and formatting. Automatic pagination and node content extraction.

Downloads: 75 This Week

Last Update: 2024-08-29
9

HtmlSanitizer

Cleans HTML to avoid XSS attacks

HtmlSanitizer is a .NET library for cleaning HTML fragments and documents from constructs that can lead to XSS attacks. It uses AngleSharp to parse, manipulate, and render HTML and CSS. Because HtmlSanitizer is based on a robust HTML parser it can also shield you from deliberate or accidental "tag poisoning" where invalid HTML in one fragment can corrupt the whole document leading to broken layout or style. In order to facilitate different use cases, HtmlSanitizer can be customized at several...

Downloads: 7 This Week

Last Update: 2024-07-26
10

jsoup

Java library for working with real-world HTML

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make every...

Downloads: 4 This Week

Last Update: 2024-07-10
11

Trafilatura

Python & command-line tool to gather text on the Web

Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first...

Downloads: 4 This Week

Last Update: 2024-09-10
12

DevHub Application

A feature-rich offline application

A feature-rich offline application, carefully crafted to support developers' daily tasks and ensure the highest security for their data. I am actively developing it with a bold goal in mind: to release updates weekly. I strive to maintain a lean footprint, aiming to curate an extensive collection comprising over 100 utilities, providing developers with a diverse array of tools. This initiative reflects my commitment to continuous improvement, offering rich tools to empower developers. DevHub...

Downloads: 1 This Week

Last Update: 5 days ago
13

Jupyter Notebook Tools for Sphinx

Sphinx source parser for Jupyter notebooks

nbsphinx is a Sphinx extension that provides a source parser for *.ipynb files. Custom Sphinx directives are used to show Jupyter Notebook code cells (and of course their results) in both HTML and LaTeX output. Un-evaluated notebooks – i.e. notebooks without stored output cells – will be automatically executed during the Sphinx build process.

Downloads: 1 This Week

Last Update: 2024-08-13
14

parse5

HTML parsing/serialization toolset for Node.js.

HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant. parse5 provides nearly everything you may need when dealing with HTML. It's the fastest spec-compliant HTML parser for Node to date. It parses HTML the way the latest version of your browser does. It has proven itself reliable in such projects as jsdom, Angular, Lit, Cheerio, rehype and many more.

Downloads: 0 This Week

Last Update: 2023-04-18
15

htmlparser2

The fast & forgiving HTML and XML parser

The fast & forgiving HTML and XML parser. htmlparser2 is the fastest HTML parser, and takes some shortcuts to get there. If you need strict HTML spec compliance, have a look at parse5. htmlparser2 itself provides a callback interface that allows the consumption of documents with minimal allocations. While the Parser interface closely resembles Node.js streams, it’s not a 100% match. Use the WritableStream interface to process a streaming input.

Downloads: 0 This Week

Last Update: 2024-01-05
16

DiDOM

Simple and fast HTML and XML parser

Simple and fast HTML and XML parser. DiDOM allows loading HTML in several ways.

Downloads: 0 This Week

Last Update: 2023-04-20
17

Floki

Floki is a simple HTML parser that enables search for nodes using CSS

Floki is a simple HTML parser that enables search for nodes using CSS selectors. Floki needs the :leex module in order to compile. Normally this module is installed with Erlang in a complete installation. By default, Floki uses a patched version of mochiweb_html for parsing fragments due to its ease of installation (it's written in Erlang and has no outside dependencies). fast_html is generally faster, according to the benchmarks conducted by its developers.

Downloads: 0 This Week

Last Update: 2024-04-26
18

AngleSharp

The ultimate angle brackets parser library parsing HTML5, MathML, SVG

AngleSharp follows the W3C specifications and gives you the same results as state-of-the-art browsers. Besides the official API, AngleSharp adds some useful extension methods on top, which make working with the DOM convenient. AngleSharp integrates everything you need to explore and mutate the DOM tree. Node retrieval is straightforward using powerful CSS query selectors. The CSS queries in AngleSharp are super fast and very simple to use. AngleSharp respects the relationship of HTML...

Downloads: 1 This Week

Last Update: 2024-03-07
19

mdBook

Create books from markdown files

... documentation and a fine example of what mdBook produces. mdBook includes built in support for both preprocessing your Markdown and alternative renderers for producing formats other than HTML. These facilities also enable other functionality such as validation. Searching Rust's crates.io is a great way to discover more extensions.

Downloads: 1 This Week

Last Update: 2024-05-17
20

Sanitize

Ruby HTML and CSS sanitizer

... that you don't explicitly allow will be removed. Sanitize is based on the Nokogiri HTML5 parser, which parses HTML the same way modern browsers do, and Crass, which parses CSS the same way modern browsers do. As long as your allowlist config only allows safe markup and CSS, even the most malformed or malicious input will be transformed into safe output.

Downloads: 0 This Week

Last Update: 2024-08-14
21

Nokogiri

Tool to work with XML and HTML from Ruby

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2 (C) and xerces (Java). It aims to be secure by default by treating all documents as untrusted, and to be a thin-as-reasonable layer on top of the underlying parsers without attempting to fix behavioral differences between them. "Native gems...

Downloads: 0 This Week

Last Update: 2024-07-27
22

goquery

A little like that j-thing, only in Go

goquery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/html package and the CSS selector library Cascadia. Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), and detach()) have been left off. Also, because the net/html parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8...

Downloads: 0 This Week

Last Update: 2024-09-06
23

markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions

markdown-rs is an open-source markdown parser written in Rust. It’s implemented as a state machine (#![no_std] + alloc) that emits concrete tokens, so that every byte is accounted for, with positional info. The API then exposes this information as an AST, which is easier to work with, or it compiles directly to HTML. While most markdown parsers work towards compliance with CommonMark (or GFM), this project goes further by following how the reference parsers (cmark, cmark-gfm) work, which...

Downloads: 0 This Week

Last Update: 2 days ago
24

Opal

Opal is a Ruby to JavaScript source-to-source compiler

Opal is a Ruby to JavaScript source-to-source compiler. It comes packed with the Ruby corelib you know and love. It is both fast as a runtime and small in its footprint. The lib directory holds the Opal parser/compiler used to compile Ruby into JavaScript. It is also built ready for the browser into opal-parser.js to allow compilation in any JavaScript environment. This directory holds the Opal runtime and corelib implemented in Ruby and JavaScript. opal-parser allows you to eval Ruby code...

Downloads: 0 This Week

Last Update: 2023-11-23
25

Redcarpet

The safe Markdown parser, reloaded

Redcarpet is written with sugar, spice and everything nice. Redcarpet is a Ruby library for Markdown processing that smells like butterflies and popcorn. Redcarpet would not be possible without the Sundown library and its authors (Natacha Porté, Vicent Martí, and its many awesome contributors). Redcarpet is readily available as a Ruby gem. It will build some native extensions, but the parser is standalone and requires no installed libraries. Starting with Redcarpet 3.0, the minimum required...

Downloads: 0 This Week

Last Update: 2023-01-29