Menu

#37 "ERROR: WITH RECURSIVE currently not supported on distributed tables" even on the replicated ones.

9.2rc
open
nobody
None
None
None
nobody
2014-11-17
2014-11-16
knienart
No

'With recursive' is not allowed even for the replicated tables and tying to use it results in confusing error:

"ERROR: WITH RECURSIVE currently not supported on distributed tables."

Steps:

create table t_hierarchy(id int, parent_id int, name text);
insert into t_hierarchy select i,i+1, i || ' son of ' || i+1 from generate_series(1,5) i;
with recursive subc as
(
select 0 l,name,null::text path,id,parent_id from t_hierarchy where id=1
union all
select l+1,s.name,rpad('-',l+1) || s.name nident, s.id,s.parent_id from t_hierarchy s, subc sc
where s.id=sc.parent_id) select * from subc;

Should give:

l | name | path | id | parent_id
---+------------+----------------+----+-----------
0 | 1 son of 2 | [null] | 1 | 2
1 | 2 son of 3 | -2 son of 3 | 2 | 3
2 | 3 son of 4 | - 3 son of 4 | 3 | 4
3 | 4 son of 5 | - 4 son of 5 | 4 | 5
4 | 5 son of 6 | - 5 son of 6 | 5 | 6
(5 rows)

but gives before mentioned error.

This is on the head from git.

I understand recursive query could be cumbersome on the distributed tables and was wondering if you planned to implement it in any foreseeable future?

Best,
Krzysztof

Discussion

  • knienart

    knienart - 2014-11-16

    Same error happens on XC, should I report it there too?

     
  • mason_s

    mason_s - 2014-11-16

    This is known (also to XC). There are no immediate plans to add support for it soon with other work going on. This is one of those where either someone could submit a patch, or some organization could sponsor development work to have this implemented sooner.

    It may take some effort to implement, or require execution in a suboptimal manner that reduces parallelism, or introduces a lot of latency. Perhaps work could be done such that if the top most parent is always carried forward and it appears in all of the tuples and is also the distribution column, that we can recognize that to safely push down.

     
  • knienart

    knienart - 2014-11-17

    Thanks for the clarifications. I forgot to add 'distribute by replication' to the example ddl.
    I was surprised replicated tables pose this kind of problems as such query could be simply pushed down to any of the replicas? - or I miss something?

     
  • knienart

    knienart - 2014-11-17

    nb. even purely synthetic table is not supported:

    WITH RECURSIVE t(n) AS (
    VALUES (1)
    UNION ALL
    SELECT n+1 FROM t WHERE n < 100
    )
    SELECT sum(n) FROM t;

    from the documentation example gives the same error.

     
  • knienart

    knienart - 2014-11-17

    I tried go a bit farther and debugged a little more.
    Below is a very naive attempt to enable it on syntetic and replicated tables as I have never really analyzed how the planner and executor work (nor patched Postgres core). I managed to get synthetic table recursive scan working, while the replicated table still refuses to work and stops at the initial subquery returning always a single, first row. Probably due to disjoint first and a subsequent subqueries.
    Any hints what else would have to be changed would be appreciated.
    I am after 'with recursive' for replicated tables, not distributed for now, error for synthetic tables is confusing anyways.

    diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
    index 86a83cb..2b2d8ba 100644
    --- a/src/backend/optimizer/plan/planner.c
    +++ b/src/backend/optimizer/plan/planner.c
    @@ -704,6 +704,27 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
                            {
                                    continue;
                            }
    +                       else if (rte->rtekind == RTE_SUBQUERY) /* otherwise never executes for recursive */
    +                       {
    +                               RangeTblEntry *subrte;
    +                               ListCell *lc;
    +
    +                               foreach(lc, rte->subquery->rtable)
    +                               {
    +                                       RangeTblEntry *subrte = (RangeTblEntry *) lfirst(lc);
    +                                       char loc_type;
    +                                       /* Allow for synthetic tables */
    +                                       if (!subrte->relid) continue;
    +                                       /* Allow for replciated only */
    +                                       loc_type = GetRelationLocType(subrte->relid);
    +                                       if (!IsLocatorReplicated(loc_type))
    +                                       {
    +                                               recursiveOk = false;
    +                                               break;
    +                                       }
    +
    +                               }
    +                       }
                            else if (rte->rtekind == RTE_RELATION)
                            {
                                    char loc_type;
    

    This helps with he synthetic query:

     WITH RECURSIVE t(n) AS (
        VALUES (1)
      UNION ALL
        SELECT n+1 FROM t WHERE n < 100
    )
    SELECT sum(n) FROM t;
    
     sum  
    ------
     5050
    (1 row)
    
                                                        QUERY PLAN                                                      
    ---------------------------------------------------------------------------------------------------------------------
     Aggregate  (cost=3.65..3.66 rows=1 width=4) (actual time=1.511..1.512 rows=1 loops=1)
       CTE t
         ->  Recursive Union  (cost=0.00..2.96 rows=31 width=4) (actual time=0.017..0.977 rows=100 loops=1)
               ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=4) (actual time=0.012..0.014 rows=1 loops=1)
               ->  WorkTable Scan on t  (cost=0.00..0.23 rows=3 width=4) (actual time=0.002..0.004 rows=1 loops=100)
                     Filter: (n < 100)
                     Rows Removed by Filter: 0
       ->  CTE Scan on t  (cost=0.00..0.62 rows=31 width=4) (actual time=0.022..1.330 rows=100 loops=1)
     Total runtime: 1.545 ms
    

    But not so much with the replicated tables (on 3 nodes):

     with recursive subc as
    (
    select 0 l,name,null::text path,id,parent_id from t_hierarchy where id=1
    union all
    select l+1,s.name,rpad('-',l+1) || s.name nident, s.id,s.parent_id from t_hierarchy s, subc sc
    where s.id=sc.parent_id) select * from subc;
     l |    name    | path | id | parent_id 
    ---+------------+------+----+-----------
     0 | 1 son of 2 |      |  1 |         2
    
    
         QUERY PLAN                                                           
    
    ---------------------------------------------------------------------------------------------------------------
    
     CTE Scan on subc  (cost=1057.94..1337.78 rows=13992 width=76) (actual time=1.858..11.163 rows=1 loops=1)
       CTE subc
         ->  Recursive Union  (cost=0.00..1057.94 rows=13992 width=44) (actual time=1.853..11.154 rows=1 loops=1)
               ->  Remote Subquery Scan on all (datanode3)  (cost=0.00..39.12 rows=12 width=40) (actual time=1.841..1.843 rows=1 loops=1
    )
               ->  Remote Subquery Scan on all (datanode2)  (cost=3.90..73.90 rows=1398 width=44) (actual time=9.203..9.203 rows=0 loops
    =1)
     Total runtime: 14.720 ms
    (6 rows)
    

    Cheers,
    Krzysztof

     

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.