DBvolution Blog

Java library to remove Object/Relation Impedance

Brought to you by: gregorydgraham

Recursive Queries Generate Paths and Trees

Currently I'm implementing recursive queries in DBvolution.

I'm still not finished with the design but I'm creating a method within DBQuery to follow a self-referencing foreign key to the root or leaves of a tree structure.

The primary use case is showing the path or hierarchy above the current record, like the directory path to a folder, or the chapter, section, and sub-sections in a document.

It's an interesting area of relational database design and throws up intriguing problems for implementing recursion automatically.

Firstly the data must be stored in a recursive manner, that is it must contain a foreign key to its own table: A1.foreignKey → A2.primaryKey. In general almost any self-referencing relation would work: A1.fk → A2.pk

Secondly the query must use the recursive syntax to extend the query beyond a simple self-join: A1.fk → A2.pk,A2.fk → A3.pk,A3.fk → ...

The syntax is usually summarised as:
WITH RECURSIVE table AS(priming query UNION ALL recursive query)SELECT * FROM table;

However that doesn't tell you a lot. In particular what are the priming and recursive queries? As part of my research I discovered the de facto standard for recursive queries:
WITH RECURSIVE A1(c1, c2, ...)
AS(
SELECT A0.c1, A0.c2, ... FROM A0 ...
UNION ALL
SELECT A2.c1, A2.c2, ...
FROM A1 INNER JOIN A2 ON( A2.pk = A1.fk )
)
SELECT c1, c2, ... FROM A1;

As with all de facto standards, your database might use different syntax.

The "priming query" above is a completely normal query with the slight constraint that the SELECT clause on "A0" will define the columns of "A1". This is an important point to note: all of the queries interact with each other through the columns and rows. This means there is considerable scope for bad queries and infinite result sets.

The columns from the priming query limits the column of the entire query so include all the output columns you'll need. Also the priming query produces the first rows of the recursive query so include all the columns you need for the recursive part.

Recursive query will be fed back into itself so the recursive query columns will need to match up to the results of A0 as well.

Recursion occurs during the second query and is accomplished by referencing the recursion table's alias (A1 in the example) and joining it with the original table (A2 in the example). Behind the scenes the database will create a new A1 and feed it back into the recursive query to be joined again with A2. Eventually, if everything is correct, the recursion produces no new rows in "A1".

Pseudocode of this process might be: A1=A0; while (A1!=A1+A2) A1+=A2.

This simple process is easily turned into an accidental infinite loop generating an infinite result set. In programming we normally protect against this by specifying the terminating conditions first. In SQL you need code the query very carefully and hope that the data doesn't create loops. (I hope to cover loops and Graphs as well...)

After the recursion has stopped, the last statement creates the final return table. This is the point where you reduce the columns of "A1" down to just the output columns required. It is still a full query so you can join it with other tables or apply further conditions.

The query I demonstrated will ascend the hierarchy from the leaves to the root node/row. If you want to descend from a root node to the leaves flip the join: "A2.pk = A1.fk" becomes "A1.pk = A2.fk".

Alternatively you can use DBQuery.getAncestors() and DBQuery.getDescendants().

Posted by 2014-12-27

Anonymous