Introduction to SQLDOM

SQLDOM is a set of temporary tables and native T-SQL code for Microsoft SQL Server 2005 or later.

SQLDOM allows for easy and robust parsing of HTML into a table-based DOM (Document Object Model). It also provides routines for manipulating the DOM, and for rendering the DOM out as HTML.

This means that SQLDOM is useful for digesting existing HTML pages, modifying HTML, and creating new HTML programmatically, all inside SQL, with no external dependencies.

Additionally, the SQLDOM parser turns out to be fairly fault tolerant: it can often process malformed HTML.

In the SQLDOM data, one of the columns is a HUID (hierarchical unique identifier). This is an outline-like dotted numbering system for the elements of the DOM.

For example, given HTML like this:

<html>
<head>
<title>Example</title>
</head>
<body>
<div>Hello World</div>
</body>
</html>

SQLDOM would parse it into rows like this:

Tag      HUID
html     1
head     1.1
title    1.1.1
{text}   1.1.1.1
body     1.2
div      1.2.1
{text}   1.2.1.1
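
The numbering scheme can be illustrated outside SQL with a minimal Python sketch. This is not SQLDOM's implementation: the `HuidParser` class, its attribute names, and its choice to skip whitespace-only text are assumptions made for illustration, built on Python's standard `html.parser` module. It only shows how outline-style identifiers could be assigned to the example above.

```python
from html.parser import HTMLParser


class HuidParser(HTMLParser):
    """Toy parser (hypothetical, not SQLDOM) that assigns dotted HUIDs."""

    def __init__(self):
        super().__init__()
        self.rows = []            # (tag, huid) pairs in document order
        self.path = []            # HUID components of the innermost open element
        self.child_counts = [0]   # children seen so far at each open depth

    def _next_huid(self):
        # The next child of the current element gets the next sibling number.
        self.child_counts[-1] += 1
        return self.path + [self.child_counts[-1]]

    def _record(self, tag, huid):
        self.rows.append((tag, ".".join(str(n) for n in huid)))

    def handle_starttag(self, tag, attrs):
        huid = self._next_huid()
        self._record(tag, huid)
        self.path = huid              # descend into the new element
        self.child_counts.append(0)

    def handle_endtag(self, tag):
        # Assumption: every closing tag simply closes the innermost element.
        if self.path:
            self.path = self.path[:-1]
            self.child_counts.pop()

    def handle_data(self, data):
        # Assumption: whitespace-only text between tags is not numbered.
        if data.strip():
            self._record("{text}", self._next_huid())


EXAMPLE_HTML = """<html>
<head>
<title>Example</title>
</head>
<body>
<div>Hello World</div>
</body>
</html>"""

parser = HuidParser()
parser.feed(EXAMPLE_HTML)
parser.close()

for tag, huid in parser.rows:
    print(tag, huid)
```

The printed pairs match the Tag and HUID listing above. The sketch closes one level on every closing tag and does not treat singleton tags such as `<br>` specially, so it is far less forgiving than a real parser would need to be.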

This numbering also helps detect erroneous HTML: an unexpected jump in HUID numbering indicates bad HTML.

For example, failing to close a non-singleton HTML tag will cause the subsequent tag to be numbered as a root-level element (2, 3, 4, and so on). In this way, it is easy to detect mis-numbered elements.
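
The check described above can be sketched in a few lines of Python, assuming the parsed rows are available as (tag, HUID) pairs and that a well-formed document has a single root element numbered 1. The function name `unexpected_root_rows` and the sample rows are hypothetical; the rows are hand-written for illustration and are not actual SQLDOM output.

```python
def unexpected_root_rows(rows):
    """Return rows whose HUID is a root-level number other than 1.

    Assumption: a well-formed document has exactly one root element,
    numbered 1, so any other HUID without a dot is suspicious.
    """
    return [(tag, huid) for tag, huid in rows
            if "." not in huid and huid != "1"]


# Hand-written sample rows for illustration only; not actual SQLDOM output.
sample_rows = [
    ("html", "1"),
    ("body", "1.1"),
    ("div", "1.1.1"),
    ("p", "2"),          # root-level HUID other than 1: flagged
    ("{text}", "2.1"),   # nested under the suspect element, not flagged itself
]

suspects = unexpected_root_rows(sample_rows)
print(suspects)
```

Only the first element of each stray branch is reported, which points at the place where the numbering jumped rather than at every node beneath it.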

In informal testing, SQLDOM appears able to parse HTML that other lint tools cannot process, making it a simple and useful way to identify specific problems in such HTML.

Posted by David Rueter 2012-02-19 Labels: SQLDOM introduction parse HUID
