A Java suite for parsing arbitrary text data: not just HTML, XML, or Java, but all of them.
Use it when the JDK tokenizers are too limited, when JavaCC, JTB, etc. are too complicated, or when you need dynamic parser configuration.
In a nutshell, it takes as input the formal specification of any text protocol in ABNF and generates a C parser for that grammar using lex/yacc.
Web documents that look similar often use different HTML tags to achieve their layout effect. These tags often make it difficult for a machine to find text or images of interest. Our goal is to implement a parser to overcome this.
Piccolo is billed as the fastest SAX parser for Java and supports SAX1, SAX2, and JAXP (SAX only). Unlike most other parsers, it was developed using parser generators. It weighs about 160 KB including the XML APIs. See http://piccolo.sf.net for more info.
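Because Piccolo follows the SAX model, application code receives a stream of events rather than a document tree. The following is a minimal sketch of that style using only the standard JAXP API bundled with the JDK; it does not use any Piccolo-specific classes, and the class name `SaxElementCounter` is hypothetical. Which SAX implementation JAXP actually picks depends on how your environment is configured.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts elements with a SAX (event-driven) parser.
public class SaxElementCounter {

    // Receives SAX callbacks; only start-element events are of interest here.
    static final class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            elements++;
        }
    }

    // Parses the document in a single sequential pass without building a tree.
    public static int countElements(String xml) {
        try {
            // JAXP chooses the implementation: the JDK's built-in parser by default,
            // or another JAXP-compliant SAX parser if one is configured.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CountingHandler handler = new CountingHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.elements;
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML input", e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable SAX parser configured", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<root><item/><item>text</item></root>")); // prints 3
    }
}
```

Since the handler keeps only a counter, memory use stays constant regardless of document size, which is the usual reason to prefer SAX over a tree-building API.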
A shorthand alternative to XML: a set of software tools written in Java for dealing with text that is structured by indentation rather than by tags. The tools include a parser, an object representation, an XPath evaluator, a schema validator, and more.
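The project's own parser and object model are not shown here. As a rough illustration of the underlying idea only, the following hypothetical Java sketch (`IndentTree`, `parse`, and `describe` are invented names) turns indentation into nesting with a stack: each line becomes a child of the nearest earlier line with a smaller indent. It assumes space-only indentation and ignores blank lines.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: builds a tree from indentation instead of tags.
public class IndentTree {

    // One node per non-blank line; `indent` is the number of leading spaces.
    public static final class Node {
        final String text;
        final int indent;
        final List<Node> children = new ArrayList<>();

        Node(String text, int indent) {
            this.text = text;
            this.indent = indent;
        }
    }

    // Each line becomes a child of the closest preceding line that is indented
    // less than it. Only spaces count as indentation in this sketch.
    public static Node parse(String input) {
        Node root = new Node("", -1); // sentinel: never popped, since real indents are >= 0
        Deque<Node> open = new ArrayDeque<>();
        open.push(root);
        for (String line : input.split("\n")) {
            if (line.isBlank()) {
                continue;
            }
            int indent = 0;
            while (indent < line.length() && line.charAt(indent) == ' ') {
                indent++;
            }
            Node node = new Node(line.substring(indent).strip(), indent);
            // Close siblings and deeper nodes until the top of the stack can be the parent.
            while (open.peek().indent >= indent) {
                open.pop();
            }
            open.peek().children.add(node);
            open.push(node);
        }
        return root;
    }

    // Renders the tree as name(child,child,...) to make the nesting visible.
    public static String describe(Node node) {
        StringBuilder sb = new StringBuilder(node.text);
        if (!node.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < node.children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(describe(node.children.get(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc = "book\n  title\n  chapter\n    section\n";
        System.out.println(describe(parse(doc))); // prints (book(title,chapter(section)))
    }
}
```

A real indentation format would also need rules for tabs and inconsistent dedents, which this sketch deliberately leaves out.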
NunniMJAX is a minimal Java library for parsing XML. Its API and operation resemble SAX: parsing is sequential and event-driven. The parser attempts to verify that the XML is well-formed but performs no validation. NunniMJAX's finite-state machine (FSM) was generated using NunniFSMGen.
Chaperon is an LALR(1) parser that parses structured text documents and generates XML documents as output. It includes a yacc-like parser generator and a lex-like regex scanner. Chaperon takes as input a grammar written in XML.
XML C Parser Generator (xmlcpg) is an XML processor coupled with a flex/bison C parser generator. A DTD can be processed to build a specialized parser for the grammar.
Code to process human-readable input is often highly stylized and repetitive. This project extracts the common elements found in such code and makes them available in a concise form as C tables and subroutines.
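The project itself targets C. Purely as an illustration of what "input-processing code as tables" can look like, the following hypothetical Java sketch (`TableLexer` and its transition table are invented here, not taken from the project) keeps the scanning loop small and puts the token rules in a table indexed by state and character class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical table-driven tokenizer: rules live in NEXT, not in the loop.
public class TableLexer {

    // Character classes used as table columns.
    static final int LETTER = 0, DIGIT = 1, SPACE = 2, OTHER = 3;

    // States used as table rows; STOP means "the current token ends before this character".
    static final int START = 0, IDENT = 1, NUMBER = 2, STOP = -1;

    // NEXT[state][class] gives the following state. From START, SPACE loops back to
    // START (skipped) and OTHER yields STOP (a one-character symbol token).
    static final int[][] NEXT = {
        //  LETTER  DIGIT   SPACE  OTHER
        {   IDENT,  NUMBER, START, STOP },  // START
        {   IDENT,  IDENT,  STOP,  STOP },  // IDENT: a letter followed by letters or digits
        {   STOP,   NUMBER, STOP,  STOP },  // NUMBER: digits only
    };

    static int classOf(char c) {
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        if (Character.isWhitespace(c)) return SPACE;
        return OTHER;
    }

    // Returns tokens formatted as KIND:text.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int state = NEXT[START][classOf(input.charAt(i))];
            if (state == START) { // whitespace between tokens
                i++;
                continue;
            }
            int begin = i++;
            // Extend the token while the table allows another transition.
            while (state != STOP && i < input.length()) {
                int next = NEXT[state][classOf(input.charAt(i))];
                if (next == STOP) {
                    break;
                }
                state = next;
                i++;
            }
            String kind = state == IDENT ? "IDENT" : state == NUMBER ? "NUMBER" : "SYMBOL";
            tokens.add(kind + ":" + input.substring(begin, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x1 = 42;")); // prints [IDENT:x1, SYMBOL:=, NUMBER:42, SYMBOL:;]
    }
}
```

Adding a new token kind mostly means adding a row and a column to the table, which is the kind of repetition such table-based approaches aim to remove from hand-written code.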
xSiteable is a fully relational website compiler written entirely in XSLT. It uses topic maps (via XTM directly) as its backbone information technology and is bundled with the fast Sablotron XSLT processor, a GUI admin tool, and other features. Watch this sp
This is a parser that reads plain-text input files and generates HTML output files.
It combines the presentation features of HTML with the simplicity of plain-text notes.
It also generates HTML index files and hyperlinks for the words you choose to index.
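The project's actual markup rules are not described here, so the following is only a hypothetical Java sketch (`NotesToHtml`, `convert`, and the `index.html#word` anchor scheme are assumptions) of the two steps mentioned: turning plain-text paragraphs into escaped HTML, and hyperlinking a user-chosen set of index words.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: plain-text notes to HTML with links for chosen index words.
public class NotesToHtml {

    // Alternating runs of letters and non-letters; together they cover the whole input.
    private static final Pattern TOKEN = Pattern.compile("\\p{L}+|\\P{L}+");

    // Escapes the characters that are significant in HTML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Splits the note into paragraphs at blank lines, escapes the text, and links
    // every word in `indexWords` (compared in lower case) to an index anchor.
    public static String convert(String text, Set<String> indexWords) {
        StringBuilder html = new StringBuilder();
        for (String para : text.split("\\n\\s*\\n")) {
            String trimmed = para.strip();
            if (trimmed.isEmpty()) {
                continue;
            }
            html.append("<p>");
            Matcher m = TOKEN.matcher(trimmed);
            while (m.find()) {
                String tok = m.group();
                String key = tok.toLowerCase(Locale.ROOT);
                if (indexWords.contains(key)) {
                    html.append("<a href=\"index.html#").append(key).append("\">")
                        .append(escape(tok)).append("</a>");
                } else {
                    html.append(escape(tok));
                }
            }
            html.append("</p>\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.print(convert("Parsers & lexers\n\nSee lexers.", Set.of("lexers")));
    }
}
```

Generating the index page itself would mean collecting each matched key and the paragraph it appeared in, which this sketch omits.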
Java API to process or parse HTML documents.
If your Java application needs to process text in HTML format, you will probably find this API useful.
A command-line XML parser built on the expat library that lets you extract, add, modify, delete, split, format, unformat, or count tag values, names, and attributes on the fly. Useful for shell scripting on UNIX or Linux-based systems.
This is an RDF editor written in Java (Swing) that uses Xerces. The editor aims to make it easy to write RDF documents and then generate reports from them. It will support generating reports on the fly using ARP (Another RDF Parser), based on Jena. It wi
JXMLEditor is an XML editor written in Java and based on the Xerces Java parser. Its goal is to offer features such as a tree view, drag and drop, and syntax coloring that make creating XML documents easy. It is also available as an Eclipse plugin.
The CVS Log Stats Parser (cvsstats) is a suite of tools to gather information on a checked-out CVS source tree. Statistics are gathered globally and per user, and text and graphical reports are generated.
PM2HTML takes PageMaker files and builds a cohesive newspaper website. It comprises a PMScript that exports all stories to a directory of tagged text files, and a Python program that converts those tagged text files into HTML, a parser to guess