Re: [Postgres-xc-general] XC Performance with Subquery

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Matt,
If you have 8 nodes node1 to node8, the query is being fired to all those
nodes. It might happen that the data distribution across the datanodes is
skewed, which might cause different loads on different nodes. You may check
that by looking at count(*) output from each node using EXECUTE DIRECT.

On Mon, Jun 10, 2013 at 10:22 PM, Matt Warner <MW...@xi...> wrote:

> My apologies for the delay. Here’s the verbose output:****
>
> ** **
>
> ** **
>
> psql (PGXC 1.1devel, based on PG 9.2beta2)****
>
> Type "help" for help.****
>
> ** **
>
> postgres=# explain verbose select count(*) from accn a1 where exists
> (select null from accn_proc a2 where a2.fk_accn_id=a1.pk_accn_id and
> a2.fk_sta_id=52);****
>
>                                                  QUERY
> PLAN                                                 ****
>
>
> ------------------------------------------------------------------------------------------------------------
> ****
>
> Aggregate  (cost=0.05..0.06 rows=1 width=0)****
>
>    Output: count(*)****
>
>    ->  Hash Semi Join  (cost=0.01..0.05 rows=1 width=0)****
>
>          Hash Cond: ((a1.pk_accn_id)::text = (a2.fk_accn_id)::text)****
>
>          ->  Data Node Scan on accn "_REMOTE_TABLE_QUERY_"
> (cost=0.00..0.00 rows=1000 width=98)****
>
>                Output: a1.pk_accn_id****
>
>                Node/s: node1, node2, node3, node4, node5, node6, node7,
> node8****
>
>                Remote query: SELECT pk_accn_id FROM ONLY accn a1 WHERE true
> ****
>
>          ->  Hash  (cost=0.00..0.00 rows=1000 width=98)****
>
>                Output: a2.fk_accn_id****
>
>                ->  Data Node Scan on accn_proc "_REMOTE_TABLE_QUERY_"
> (cost=0.00..0.00 rows=1000 width=98)****
>
>                      Output: a2.fk_accn_id****
>
>                      Node/s: node1, node2, node3, node4, node5, node6,
> node7, node8****
>
>                      Remote query: SELECT fk_accn_id FROM ONLY accn_proc
> a2 WHERE (fk_sta_id = 52)****
>
> (14 rows)****
>
> ** **
>
> ** **
>
> *From:* Ashutosh Bapat [mailto:ash...@en...]
> *Sent:* Thursday, June 06, 2013 9:29 PM
>
> *To:* Matt Warner
> *Cc:* Pos...@li...
> *Subject:* Re: [Postgres-xc-general] XC Performance with Subquery****
>
> ** **
>
> It might help, if you can send us the EXPLAIN VERBOSE output of this query.
> ****
>
> ** **
>
> On Thu, Jun 6, 2013 at 2:38 AM, Matt Warner <MW...@xi...> wrote:****
>
> Just to follow up on this, the pre-1.2 version works much better—the query
> actually completes and I do see more CPU resources being used, but
> curiously, not all of them.****
>
>  ****
>
> Still trying to figure out why that would be, so if anyone has any
> suggestions, please let me know.****
>
>  ****
>
> Matt****
>
>  ****
>
>  ****
>
> *From:* Matt Warner ****
>
> *Sent:* Wednesday, June 05, 2013 8:52 AM
> *To:* 'Ashutosh Bapat'
> *Cc:* Pos...@li...****
>
> *Subject:* RE: [Postgres-xc-general] XC Performance with Subquery****
>
>  ****
>
> I’m using 1.0.3, but will try out pre-1.2.****
>
>  ****
>
> BTW, I had to make some minor changes to get 1.0.3 to compile correctly on
> Solaris. Is anyone interested in receiving these changes? They’re things
> such as illegal return statement from void functions (which doesn’t
> actually make sense, as far as I know) that the Solaris compiler flags as
> errors.****
>
>  ****
>
> -bash-4.1$ psql****
>
> psql (PGXC 1.0.3, based on PG 9.1.9)****
>
> Type "help" for help.****
>
>  ****
>
> postgres=# explain select count(*) from accn a1 where exists (select null
> from accn_proc a2 where a2.fk_accn_id=a1.pk_accn_id and a2.fk_sta_id=52);*
> ***
>
>                                   QUERY
> PLAN                                  ****
>
>
> ------------------------------------------------------------------------------
> ****
>
> Aggregate  (cost=0.02..0.03 rows=1 width=0)****
>
>    ->  Nested Loop Semi Join  (cost=0.00..0.01 rows=1 width=0)****
>
>          Join Filter: ((a1.pk_accn_id)::text = (a2.fk_accn_id)::text)****
>
>          ->  Data Node Scan on a1  (cost=0.00..0.00 rows=1000 width=98)***
> *
>
>                Node/s: node1, node2, node3, node4, node5, node6, node7,
> node8****
>
>          ->  Data Node Scan on a2  (cost=0.00..0.00 rows=1000 width=98)***
> *
>
>                Node/s: node1, node2, node3, node4, node5, node6, node7,
> node8****
>
> (7 rows)****
>
>  ****
>
> create table accn(pk_accn_id character varying(40),****
>
> <lots of other column definitions deleted for brevity>)****
>
> distribute by hash(pk_accn_id)****
>
> to node node1, node2, node3, node4, node5, node6, node7,node8;****
>
>  ****
>
> create table accn(pk_accn_id character varying(40),****
>
> <lots of other column definitions deleted for brevity>)****
>
> distribute by hash(fk_accn_id)****
>
> to node node1, node2, node3, node4, node5, node6, node7,node8;****
>
>  ****
>
>  ****
>
> *From:* Ashutosh Bapat [mailto:ash...@en...<ash...@en...>]
>
> *Sent:* Tuesday, June 04, 2013 9:15 PM
> *To:* Matt Warner
> *Cc:* Pos...@li...
> *Subject:* Re: [Postgres-xc-general] XC Performance with Subquery****
>
>  ****
>
> Hi Matt,****
>
> Which version of XC are you using? There has been a lot of change in the
> planner since last release. You may try the latest master HEAD (to be
> released as 1.2 in about a month).****
>
> It will help if you can provide all the table definitions and EXPLAIN
> outputs.****
>
>  ****
>
> On Wed, Jun 5, 2013 at 5:40 AM, Matt Warner <MW...@xi...> wrote:****
>
> I need to correct item 3, below. The coordinator and only one of the data
> nodes goes to work. One by one, each of the data nodes appears to spin up
> to process the request and then go back to sleep.****
>
>  ****
>
> ?****
>
>  ****
>
> *From:* Matt Warner
> *Sent:* Tuesday, June 04, 2013 5:00 PM
> *To:* 'Pos...@li...'
> *Subject:* XC Performance with Subquery****
>
>  ****
>
> I’ve been experimenting with XC and see interesting results. I’m hoping
> someone can help explain something I’m seeing.****
>
>  ****
>
> 1.       I created two distributed tables, one with a primary key, one
> with a foreign key, and hashed both tables by that key. I’m expecting this
> to mean that the data for a given key is localized to a single node.****
>
> 2.       When I perform a simple “select count(*) from table1” I see all
> 8 data nodes consuming CPU (plus the coordinator), which I take to be a
> good sign—all nodes are working in parallel.****
>
> 3.       When I perform a join on the distribution key, I see only the
> coordinator go to work instead of all 8 data nodes.****
>
> 4.       I notice that the explain plan appears similar to page 55 of
> this document (
> http://www.pgcon.org/2012/schedule/attachments/224_Postgres-XC_tutorial.pdf
> ).****
>
> 5.       I have indexes on the distribution keys, but that does not seem
> to make any difference.****
>
>  ****
>
> How do I get XC to perform the join on the data nodes? To be verbose, I am
> expecting to see more CPU resources consumed in this query:****
>
>  ****
>
> select count(*) from tablea a1 where exists (select null from tableb a2
> where a2.fk_accn_id=a1.pk_accn_id and a2.fk_sta_id=52);****
>
>  ****
>
> Rewriting this as a simple join does not seem to work any better.****
>
>  ****
>
> What am I missing?****
>
>  ****
>
> TIA,****
>
>  ****
>
> Matt****
>
>
>
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. A cloud service to automate IT design, transition and operations
> 2. Dashboards that offer high-level views of enterprise services
> 3. A single system of record for all IT processes
> http://p.sf.net/sfu/servicenow-d2d-j
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general****
>
>
>
>
> -- ****
>
> Best Wishes,
> Ashutosh Bapat
> EntepriseDB Corporation
> The Postgres Database Company****
>
>
>
>
> -- ****
>
> Best Wishes,
> Ashutosh Bapat
> EntepriseDB Corporation
> The Postgres Database Company****
>

-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company