Showing 64 open source projects for "html parser"

  • 1
    html-metadata

    MetaData html scraper and parser for Node.js (supports Promises

    The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard-of...
    Downloads: 0 This Week
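To make the kind of extraction described in the html-metadata entry concrete, here is a minimal sketch in Python using only the standard library. It is not the html-metadata API (that library targets Node.js); the names `MetadataParser` and `extract_metadata` and the shape of the result are assumptions made for illustration. It covers only the title tag, Open Graph properties, and plain named meta tags.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text, Open Graph properties, and named meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.open_graph = {}
        self.general = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            content = attrs.get("content")
            if content is None:
                return
            prop = attrs.get("property") or ""
            name = attrs.get("name") or ""
            if prop.startswith("og:"):
                # Open Graph: <meta property="og:title" content="...">
                self.open_graph[prop[3:]] = content
            elif name:
                # General metadata such as <meta name="description" ...>
                self.general[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html_text):
    """Hypothetical helper: parse a document and return the collected metadata."""
    parser = MetadataParser()
    parser.feed(html_text)
    parser.close()
    return {
        "title": parser.title.strip(),
        "openGraph": parser.open_graph,
        "general": parser.general,
    }
```

Calling `extract_metadata(page_source)` on a fetched page returns a dictionary with one key per metadata family; other standards named in the description (Dublin Core, JSON-LD, and so on) would each need their own handling.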
  • 2
    jsoup

    Java library for working with real-world HTML

    ...The parser will make every attempt to create a clean parse from the HTML you provide, regardless of whether the HTML is well-formed or not. You have HTML in a Java String, and you want to parse that HTML to get at its contents, or to make sure it's well formed, or to modify it. The String may have come from user input, a file, or from the web.
    Downloads: 2 This Week
  • 3
    crawley

    The unix-way web crawler

    Crawls web pages and prints any link it can find. Fast HTML SAX parser (powered by golang.org/x/net/html). Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most useful resource URLs (pictures, videos, audio, forms, etc.). Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted). Scan depth (limited by starting host and path; 0 by default) can be configured.
    Downloads: 1 This Week
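The link-gathering behaviour described for crawley (event-driven parsing, resource URLs taken from several attributes, unique results with fragments omitted) can be sketched roughly as follows. This is a Python illustration using the standard library, not crawley's Go code; `LinkCollector`, `extract_links`, and the `URL_ATTRS` set are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Attributes that commonly carry resource URLs (an illustrative subset).
URL_ATTRS = {"href", "src", "action", "poster"}


class LinkCollector(HTMLParser):
    """Event-driven (SAX-style) collector of unique, fragment-free URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # URLs in first-seen order
        self._seen = set()   # used to guarantee uniqueness

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in URL_ATTRS or not value:
                continue
            # Resolve relative URLs, then drop the "#fragment" part.
            absolute, _fragment = urldefrag(urljoin(self.base_url, value))
            if absolute and absolute not in self._seen:
                self._seen.add(absolute)
                self.links.append(absolute)


def extract_links(html_text, base_url):
    """Hypothetical helper: return the unique links found in one page."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.links
```

A crawler built on this would fetch each returned URL in turn and stop at the configured scan depth; fetching and depth tracking are left out of the sketch.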
  • 4
    Lobo Evolution - Java Web Browser

    Lobo Evolution is an extensible all-Java web browser and RIA platform

    Lobo Evolution is a fork of Lobo Browser that continues the work of Lobo Browser (lobochief). Lobo Evolution is an extensible all-Java web browser and RIA platform. It supports HTML 4, HTML5, JavaScript, CSS 3 and Java (Swing) rendering. CobraEvolution is the web browser's renderer API and also a JavaScript-aware HTML parser. Lobo Evolution 5.0 has been released. Changelog: https://github.com/LoboEvolution/LoboEvolution/releases Wiki: https://loboevolution.github.io/LoboEvolution/project-info.html Javadoc site: https://oswetto.github.io/LoboEvolution Now you can fork the project and help me with code. ...
    Downloads: 6 This Week
  • 5
    JSSoup

    JavaScript + BeautifulSoup = JSSoup

    I'm a fan of the Python library BeautifulSoup. It's feature-rich and very easy to use. But when I was working on a small react-native project and tried to find an HTML parser library like BeautifulSoup, I failed. So I wanted to write an HTML parser library for JavaScript that is as easy to use as BeautifulSoup. JSSoup uses tautologistics/node-htmlparser as its HTML DOM parser and builds a BeautifulSoup-like API on top of it. JSSoup supports both Node and react-native. JSSoup tries to use the same interfaces as BeautifulSoup so that BeautifulSoup users can use JSSoup seamlessly. ...
    Downloads: 0 This Week
  • 6
    JDynamiTe, Dynamic Template in Java

    Dynamically generate documents from templates

    JDynamiTe is a tool that lets you dynamically create documents in any format from "template" documents, with very few lines of code (or none at all!). Typical uses of JDynamiTe include dynamic web page creation, text document generation, and source code generation. In fact, it can be useful in any case where predefined documents (templates) have to be dynamically populated with data. The main benefit of JDynamiTe is to allow a true...
    Downloads: 0 This Week
  • 7
    the hotdog web browser

    The hotdog web browser and browser engine

    the hotdog web browser project is a hobbyist web browser and layout engine written entirely from scratch in Go to explore how browsers work under the hood, implementing core components like an HTML parser, CSS rendering, UI toolkit, networking, and layout logic without relying on heavy external dependencies. It’s far from being a complete or spec-compliant browser, but it’s designed to be a learning platform and experimental codebase for anyone curious about browser internals and rendering architecture. The repository includes custom named modules such as ketchup for HTML parsing, mayo for CSS rendering, and a minimal OpenGL/GLFW-based UI toolkit termed mustard, among others. ...
    Downloads: 2 This Week
  • 8

    TemplateLite

    A small fast Template Engine for PHP, without a huge framework.

    Template Lite is a very fast, small HTML template engine written in PHP. The engine supports most of the Smarty2 template engine functions and filters. This template engine is no longer a Smarty replacement, but it is still similar to Smarty. The new TemplateLite3 is currently in the works and has a new parser and compiler structure along with a modified syntax. The new TemplateLite is not 100% backward compatible for templates, but its usage from PHP should be.
    Downloads: 0 This Week
  • 9
    The MangaStream Downloader is an open source application written in Java for managing and downloading manga from the sites mangastream.com and mangafox.me. It is released under the GNU GPL license and uses an open source HTML parser, TagSoup. Follow the project page on Facebook for updates: https://www.facebook.com/MangastreamDownloader
    Downloads: 0 This Week
  • 10
    Jericho HTML Parser is a Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
    Downloads: 4 This Week
  • 11
    NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
    Downloads: 2 This Week
  • 12
    Torrtux

    A terminal-program for downloading torrents from PirateBay

    ...It also lets you see the details of your torrent (the author, the date, the type, the size, etc.), just like on the TPB site! Moreover, it retrieves subtitles from www.opensubtitles.org. It retrieves information from the source code of the TPB page and parses it with regular expressions and the html-parser library. In the config file ~/.torrtuxrc, you can choose your display, subtitle, and comment preferences, your torrent manager, and a proxy if needed! Thanks for reporting any bugs you find!
    Downloads: 0 This Week
  • 13

    HTML XHTML Parser + XPath

    Delphi HTML XHTML Parser +XPath

    Delphi HTML Parser. This module lets you work with HTML documents as a DOM tree and use XPath to search for tags. It is a very simple way to parse HTML. Tested with Delphi XE5 and XE6.

    Usage: add parser.pas to the Uses clause, then:

        begin
          HtmlTxt := ''; // your HTML here
          NodeList := TNodeList.Create;
          ValueList := TStringList.Create;
          DomTree := TDomTree.Create;
          DomTreeNode := DomTree.RootNode;
          if DomTreeNode.RunParse(HtmlTxt) then
          begin
            { your code, for example:
              DomTreeNode.FindXPath('//*[@id="TopBox"]/div[1]/div[@class="draw default"]', NodeList, ValueList) }
          end;
        end;

    XPath support:
      - attributes: //*[@id="TopBox"]/div/@class
      - comment: //*[@id="TopBox"]/div/comment()[3]
      - text: //*[@id="TopBox"]/div/text()[2]
      - previous level: /.....
    Downloads: 0 This Week
  • 14

    python-web_excavator

    General data mining API: only write HTML parsing code.

    A general web scraper that uses the requests library to communicate with the website. Scraper() contains a parser object, to which you can add parsing handles. ParseHandle() is the code that mines your data from an HTML source. Repo: https://github.com/crispycret/web_excavator
    Downloads: 0 This Week
  • 15
    CppWeb - C++ Web developement framework

    Cross-platform C++ library for developing CGI Web applications

    CppWeb is a cross-platform C++ library for developing web applications with server push support. The library decodes CGI variables and cookies, supports file uploads, performs automatic cookie detection, provides URL and HTML entity encode/decode functions, supports server push (long polling via Ajax), and has a built-in HTML parser, an SQLite database wrapper, etc. CppWeb compiles on Windows, Linux and MacOSX (tested with GNU C++, MinGW, MS Visual C++ and Borland C++ compilers) and can run with almost any web server (Apache, IIS, Boa etc.). It can also be used in embedded systems (tested with FriendlyARM Mini2440 and Raspberry Pi).
    Downloads: 0 This Week
  • 16
    PynDora

    Python WebServer Log File Analyzer

    This is a web log file analyzer we are building in Python. The IIS parsing engine will be built first, followed by Apache and possibly other servers. It will support multiple log files from any date and output the statistics as HTML-formatted files with automatically built charts. It will be a pure Python, self-contained solution, i.e. no installation will be required beyond the standard Python modules.
    Downloads: 0 This Week
  • 17

    CPoll based C++ server pages

    Server side scripting language similar to ASP and PHP, but using C++.

    CPPSP (C++ Server Pages) is an open source web application framework similar to ASP.NET. It features a template parser that parses, compiles, and loads CPPSP pages automatically at runtime. CPPSP pages have a very similar syntax to ASP and ASP.NET, where all code is considered HTML by default, and server-side active code can be embedded using "<% ... %>". CPPSP is built upon the CPoll asynchronous I/O and utility library, which offers simple I/O abstraction, network abstraction, memory management, and container classes. ...
    Downloads: 5 This Week
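The "everything is HTML unless wrapped in <% ... %>" convention described for CPPSP pages can be illustrated with a small template splitter. This is a sketch in Python under simplifying assumptions, not CPPSP's actual parser: `split_template` and `CODE_BLOCK` are hypothetical names, and the sketch ignores cases such as `%>` appearing inside a string literal in the embedded code.

```python
import re

# Non-greedy match of one embedded code block, allowed to span lines.
CODE_BLOCK = re.compile(r"<%(.*?)%>", re.DOTALL)


def split_template(source):
    """Return ("html", text) and ("code", text) segments in document order."""
    segments = []
    position = 0
    for match in CODE_BLOCK.finditer(source):
        if match.start() > position:
            # Text before the code block is literal HTML.
            segments.append(("html", source[position:match.start()]))
        segments.append(("code", match.group(1).strip()))
        position = match.end()
    if position < len(source):
        # Trailing HTML after the last code block.
        segments.append(("html", source[position:]))
    return segments
```

A template compiler along these lines would then emit an output statement for each "html" segment and copy each "code" segment through unchanged before compiling the result.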
  • 18

    DStyles

    Simple and lightweight HTML templates parser

    DStyles is an easy way to build your website with dynamically generated templates. It helps to separate logic from view in your project. Based on PHP, templates parsed by DStyles are generated quickly and the code itself is lightweight.
    Downloads: 0 This Week
  • 19

    HTML DOM Parser

    HTML parser which can be used for screen-scraping applications

    htmldom parses the HTML file and provides methods for iterating over and searching the parse tree in a similar way to jQuery. To report bugs, please mail me at bhimsen.pes@gmail.com
    Downloads: 0 This Week
  • 20

    Spondulas

    Spondulas is browser emulator designed to retrieve web pages for hunti

    Spondulas is a browser emulator and parser designed to retrieve web pages for hunting malware. It supports generation of browser user agents, GET/POST requests, and SOCKS5 proxy. It can be used to parse HTML files sent via e-mail. Monitor mode allows a website to be monitored at intervals to discover changes in DNS or content over time. Autolog mode creates an investigation file that documents redirection chains.
    Downloads: 0 This Week
  • 21
    Parser Jazdy

    Application that displays timetables from the jazdy.net data format

    The application uses the timetable data format from the jazdy.net service. Because that service will soon cease to exist, I decided to create a PHP application whose job is to convert text files in this data format to HTML. Example use of the script: http://rozklad_jazdy.p98-games.tk/
    Downloads: 0 This Week
  • 22

    HXPath

    XPath HTML parser

    HXPath is a command line tool useful to extract data from HTML documents. HXPath can select sub trees, like the standard xpath tool, but is also able to read contents and attributes and output them in a bash friendly format. HTML Tidy and HTTP/HTTPS get are built in too.
    Downloads: 0 This Week
  • 23
    ImageCrawler: an application to extract images from websites. A thumbnail view is provided. Based on Spring.NET and the HTML Agility Pack.
    Downloads: 0 This Week
  • 24
    C# .NET library implementing the Pop3 message retrieval protocol
    Downloads: 0 This Week
  • 25
    JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM parser for real-world HTML.
    Downloads: 15 This Week