html parser free download

Showing 64 open source projects for "html parser"


1

html-metadata

MetaData html scraper and parser for Node.js (supports Promises

The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard-of...

Downloads: 0 This Week

Last Update: 2025-04-30
See Project
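
To make the kind of extraction described for html-metadata concrete, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not the html-metadata API (that library targets Node.js); the `MetadataParser` class and `extract_metadata` function are hypothetical names introduced for illustration, and the sketch covers only the `<title>` tag and `<meta>` tags (general, Open Graph, and Twitter style), not the other standards listed above.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text and <meta> name/property pairs (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags use "property"; general and Twitter tags use "name".
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content is not None:
                self.metadata[key] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.metadata["title"] = "".join(self._title_parts).strip()

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect the pieces and join later.
        if self._in_title:
            self._title_parts.append(data)


def extract_metadata(html_text):
    """Return a dict of metadata found in the given HTML string."""
    parser = MetadataParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata
```

Calling `extract_metadata(page_source)` returns a plain dictionary keyed by tag name or property, such as `title`, `description`, or `og:title`.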
2

jsoup

Java library for working with real-world HTML

...The parser will make every attempt to create a clean parse from the HTML you provide, regardless of whether the HTML is well-formed or not. You have HTML in a Java String, and you want to parse that HTML to get at its contents, or to make sure it's well formed, or to modify it. The String may have come from user input, a file, or from the web.

Downloads: 2 This Week

Last Update: 2026-01-01
See Project
3

crawley

The unix-way web crawler

Crawls web pages and prints any link it can find. Features: a fast HTML SAX parser (powered by golang.org/x/net/html); a small (below 1500 SLOC), idiomatic, 100% test-covered codebase; grabs most useful resource URLs (pics, videos, audios, forms, etc.); found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted); scan depth (limited by starting host and path; 0 by default) can be configured.

Downloads: 1 This Week

Last Update: 2025-12-09
See Project
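
The behavior described for crawley (collect resource URLs, omit fragments, emit each URL once) can be sketched as follows. This is a Python illustration built on the standard library, not crawley's own Go code; `LinkParser`, `extract_links`, and the `URL_ATTRS` set are hypothetical names, and the attribute set is an assumption that is far from exhaustive.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Attributes that commonly carry resource URLs (an illustrative, incomplete set).
URL_ATTRS = {"href", "src", "action", "poster"}


class LinkParser(HTMLParser):
    """Collects unique absolute URLs, in document order, with fragments removed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                # Resolve relative URLs against the page URL, then drop "#fragment".
                absolute = urljoin(self.base_url, value)
                url, _fragment = urldefrag(absolute)
                if url not in self._seen:
                    self._seen.add(url)
                    self.links.append(url)


def extract_links(html_text, base_url):
    """Return the de-duplicated list of URLs referenced by the page."""
    parser = LinkParser(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links
```

A real crawler would additionally fetch each discovered page and apply the depth and host limits; this sketch covers only the extraction and de-duplication step for a single page.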
4

Lobo Evolution - Java Web Browser

Lobo Evolution is an extensible all-Java web browser and RIA platform

Lobo Evolution is a fork of Lobo Browser that continues the work of Lobo Browser (lobochief). Lobo Evolution is an extensible all-Java web browser and RIA platform. It supports HTML 4, HTML5, JavaScript, CSS 3 and Java (Swing) rendering. CobraEvolution is the web browser's renderer API and also a JavaScript-aware HTML parser. Lobo Evolution 5.0 released. CHANGELOG: https://github.com/LoboEvolution/LoboEvolution/releases Read wiki: https://loboevolution.github.io/LoboEvolution/project-info.html Javadoc site: https://oswetto.github.io/LoboEvolution Now you can fork the project and help me with code. ...

1 Review

Downloads: 6 This Week

Last Update: 2025-05-07
See Project
5

JSSoup

JavaScript + BeautifulSoup = JSSoup

I'm a fan of the Python library BeautifulSoup. It's feature-rich and very easy to use. But when I was working on a small react-native project and tried to find an HTML parser library like BeautifulSoup, I failed. So I wanted to write an HTML parser library in JavaScript that is as easy to use as BeautifulSoup. JSSoup uses tautologistics/node-htmlparser as its HTML DOM parser and builds a BeautifulSoup-like API on top of it. JSSoup supports both node and react-native. JSSoup tries to use the same interfaces as BeautifulSoup so that BeautifulSoup users can use JSSoup seamlessly. ...

Downloads: 0 This Week

Last Update: 2023-04-10
See Project
6

JDynamiTe, Dynamic Template in Java

Dynamically generate documents from templates

JDynamiTe is a tool which allows you to dynamically create documents in any format from "template" documents. And very few lines of code (or no line at all!) are needed to do that. Some typical usage domains of JDynamiTe are: - dynamic Web pages creation, - text document generation, - source code generation... In fact, it can be useful in any case where pre-defined documents (templates) have to be dynamically populated with data. The main benefit of JDynamiTe is to allow a true...

Downloads: 0 This Week

Last Update: 2022-01-04
See Project
7

the hotdog web browser

The hotdog web browser and browser engine

the hotdog web browser project is a hobbyist web browser and layout engine written entirely from scratch in Go to explore how browsers work under the hood, implementing core components like an HTML parser, CSS rendering, UI toolkit, networking, and layout logic without relying on heavy external dependencies. It’s far from being a complete or spec-compliant browser, but it’s designed to be a learning platform and experimental codebase for anyone curious about browser internals and rendering architecture. The repository includes custom named modules such as ketchup for HTML parsing, mayo for CSS rendering, and a minimal OpenGL/GLFW-based UI toolkit termed mustard, among others. ...

Downloads: 2 This Week

Last Update: 1 day ago
See Project
8

TemplateLite

A small fast Template Engine for PHP, without a huge framework.

Template Lite is a very fast, small HTML template engine written in PHP. The engine supports most of the Smarty2 template engine functions and filters. This template engine is no longer a Smarty replacement, but it is still similar to Smarty. The new TemplateLite3 is currently in the works and has a new parser and compiler structure along with a modified syntax. The new TemplateLite is not 100% backward compatible for templates, but its usage from PHP should be.

3 Reviews

Downloads: 0 This Week

Last Update: 2017-12-23
See Project
9

MangaStream Downloader

The MangaStream Downloader is an open source application written in Java for managing and downloading manga from the sites mangastream.com and mangafox.me. It is released under the GNU GPL license and uses an open source HTML parser, TagSoup. Follow the project page on Facebook for updates: https://www.facebook.com/MangastreamDownloader

3 Reviews

Downloads: 0 This Week

Last Update: 2017-12-08
See Project
10

Jericho HTML Parser

Jericho HTML Parser is a Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.

16 Reviews

Downloads: 4 This Week

Last Update: 2015-10-24
See Project
11

CyberNeko HTML Parser

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.

17 Reviews

Downloads: 2 This Week

Last Update: 2015-04-17
See Project
12

Torrtux

A terminal-program for downloading torrents from PirateBay

...It also allows you to get the details of your torrent (the author, the date, the type, the size, etc.), just like being on the TPB site! Moreover, it retrieves subs from www.opensubtitles.org. It retrieves information from the source code of the TPB page and parses it with regexps and the html-parser library. In the config file ~/.torrtuxrc, you can choose your display, subs, and comments preferences, your torrent manager, and a proxy if needed! Thanks for reporting all bugs you find!

Downloads: 0 This Week

Last Update: 2016-07-07
See Project
13

HTML XHTML Parser + XPath

Delphi HTML XHTML Parser + XPath

Delphi HTML Parser. This module lets you work with HTML documents as a DOM tree and use XPath to search for tags. It is a very simple way to parse HTML. Tested with Delphi XE5 and XE6.

Usage: add parser.pas to the uses clause.

    begin
      HtmlTxt := ''; // your HTML goes here
      NodeList := TNodeList.Create;
      ValueList := TStringList.Create;
      DomTree := TDomTree.Create;
      DomTreeNode := DomTree.RootNode;
      if DomTreeNode.RunParse(HtmlTxt) then
      begin
        { your code, for example:
          DomTreeNode.FindXPath('//*[@id="TopBox"]/div[1]/div[@class="draw default"]', NodeList, ValueList) }
      end;
    end;

XPath support:
- attributes: //*[@id="TopBox"]/div/@class
- comment: //*[@id="TopBox"]/div/comment()[3]
- text: //*[@id="TopBox"]/div/text()[2]
- previous level: /.....

Downloads: 0 This Week

Last Update: 2014-10-26
See Project
14

python-web_excavator

General Data Mining API: Only write HTML parsing code.

A general web scraper that uses the requests library to communicate with the website. Scraper() contains a parser object, to which you can add parsing handles. ParseHandle() is the code that mines your data from an HTML source. Repo: https://github.com/crispycret/web_excavator

Downloads: 0 This Week

Last Update: 2014-12-15
See Project
15

CppWeb - C++ Web developement framework

Cross-platform C++ library for developing CGI Web applications

CppWeb is a cross-platform C++ library for developing web applications with server push support. The library decodes CGI variables and cookies, supports file uploads, performs automatic cookie detection, provides URL and HTML entity encode/decode functions, supports server push (long polling via Ajax), and has a built-in HTML parser, an SQLite database wrapper, etc. CppWeb compiles on Windows, Linux and Mac OS X (tested with GNU C++, MinGW, MS Visual C++ and Borland C++ compilers) and can run with almost any web server (Apache, IIS, Boa, etc.). It can be used in embedded systems (tested with FriendlyARM Mini2440 and Raspberry Pi).

Downloads: 0 This Week

Last Update: 2016-04-15
See Project
16

PynDora

Python WebServer Log File Analyzer

This is a web log file analyzer we are making using Python. First the IIS parsing engine will be built, and then Apache and possibly other servers. It is going to support multiple log files from any date and output the statistics in HTML-formatted files, incorporating automatically built charts. It will be a pure Python solution that is self-contained, i.e. no installation will be required other than the standard Python modules.

Downloads: 0 This Week

Last Update: 2014-01-13
See Project
17

CPoll based C++ server pages

Server side scripting language similar to ASP and PHP, but using C++.

CPPSP (C++ Server Pages) is an open source web application framework similar to ASP.NET. It features a template parser that parses, compiles, and loads CPPSP pages automatically at runtime. CPPSP pages have a very similar syntax to ASP and ASP.NET, where all code is considered HTML by default, and server-side active code can be embedded using "<% ... %>". CPPSP is built upon the CPoll asynchronous I/O and utility library, which offers simple I/O abstraction, network abstraction, memory management, and container classes. ...

Downloads: 5 This Week

Last Update: 2016-11-23
See Project
18

DStyles

Simple and lightweight HTML templates parser

DStyles is an easy way to build your website with dynamically generated templates. It helps to separate logic from view in your project. Based on PHP, templates parsed by DStyles are generated quickly, and the code itself is lightweight.

Downloads: 0 This Week

Last Update: 2015-05-07
See Project
19

HTML DOM Parser

HTML parser which can be used for screen-scraping applications

htmldom parses the HTML file and provides methods for iterating over and searching the parse tree in a way similar to jQuery. To report bugs, please mail me at bhimsen.pes@gmail.com

1 Review

Downloads: 0 This Week

Last Update: 2012-08-29
See Project
20

Spondulas

Spondulas is browser emulator designed to retrieve web pages for hunti

Spondulas is a browser emulator and parser designed to retrieve web pages for hunting malware. It supports generation of browser user agents, GET/POST requests, and SOCKS5 proxies. It can be used to parse HTML files sent via e-mail. Monitor mode allows a website to be monitored at intervals to discover changes in DNS or content over time. Autolog mode creates an investigation file that documents redirection chains.

1 Review

Downloads: 0 This Week

Last Update: 2015-05-05
See Project
21

Parser Jazdy

An application that displays timetables from the jazdy.net data format

The application uses the timetable data format from the jazdy.net service. Because that service will soon cease to exist, I decided to create a PHP application whose job is to convert the text files in that data format to HTML. Example use of the script: http://rozklad_jazdy.p98-games.tk/

Downloads: 0 This Week

Last Update: 2014-12-17
See Project
22

HXPath

XPath HTML parser

HXPath is a command line tool useful to extract data from HTML documents. HXPath can select sub trees, like the standard xpath tool, but is also able to read contents and attributes and output them in a bash friendly format. HTML Tidy and HTTP/HTTPS get are built in too.

Downloads: 0 This Week

Last Update: 2016-05-26
See Project
23

SpringNETImageCrawler

ImageCrawler Application to extract Images from Websites. A Thumbnail view is provided. Based on Spring.NET and the HTML Agility Pack

2 Reviews

Downloads: 0 This Week

Last Update: 2014-04-28
See Project
24

System.Net.Pop3 .NET Client Library

C# .NET library implementing the Pop3 message retrieval protocol

Downloads: 0 This Week

Last Update: 2014-07-01
See Project
25

JTidy

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM parser for real-world HTML.

7 Reviews

Downloads: 15 This Week

Last Update: 2012-10-09
See Project