EXPLAIN ANALYZE outputs a different plan from ANALYZE
test2=# explain analyze select count(*) from t1;
QUERY PLAN
Aggregate (cost=2.05..2.06 rows=1 width=0) (actual time=20.303..20.303 rows=1 loops=1)
-> Remote Subquery Scan on all (datanode_1,datanode_2) (cost=0.00..2.04 rows=4 width=0) (actual t
ime=10.578..11.318 rows=2 loops=1)
Total runtime: 50.685 ms
(3 rows)
test2=# explain select count(*) from t1;
QUERY PLAN
Aggregate (cost=2.05..2.06 rows=1 width=0)
-> Remote Subquery Scan on all (datanode_1,datanode_2) (cost=0.00..2.04 rows=4 width=0)
-> Aggregate (cost=0.00..2.04 rows=1 width=0)
-> Seq Scan on t1 (cost=0.00..2.04 rows=4 width=0)
(4 rows)
Basically, we are missing the output below the Remote Subquery Scan with ANALYZE.
The issue here is that there are a lot of callbacks to pass back the stats, and in Postgres-XL's case, these subtrees are remote on the other nodes-- everything below "Remote Subquery Scan". To handle this properly long term will take a good amount of effort and message passing amongst the nodes.
A medium term solution would be to display statistics properly for top level nodes (everything not below Remote Subquery Scans), but add something to the output to indicate that remote statistics are unavailable. For example:
WARNING: EXPLAIN ANALYZE does not support collecting of remote statistics
QUERY PLAN
Aggregate (cost=2.05..2.06 rows=1 width=0) (actual time=20.303..20.303 rows=1 loops=1)
-> Remote Subquery Scan on all (datanode_1,datanode_2) (cost=0.00..2.04 rows=4 width=0) (actual time=10.578..11.318 rows=2 loops=1)
-> Aggregate (cost=0.00..2.04 rows=1 width=0) (actual stats unavailable)
-> Seq Scan on t1 (cost=0.00..2.04 rows=4 width=0) (actual stats unavailable)
Total runtime: 50.685 ms
(5 rows)