Re: [Postgres-xc-general] Data Node Scan Performance


That was my thinking as well. I have already executed the queries independently, albeit indirectly: I used psql -h <datanode> -p <port> to time each data node on its own. Each query takes about 5-10 seconds on each of the 4 data nodes. So, worst case, 10 seconds x 4 nodes = 40 seconds of aggregate time for a serial request. But I'm seeing 65 seconds, which means there is some other overhead that I'm missing.
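
The arithmetic above can be made explicit: if the coordinator contacted the data nodes one after another, the total would be bounded by the sum of the per-node times; if it contacted them in parallel, it would be close to the slowest node. A minimal Python sketch, assuming the per-node scan times are the only cost (the function names and sample numbers are illustrative, not measurements beyond those quoted in this thread):

```python
def expected_bounds(per_node_seconds):
    """Return (parallel_estimate, serial_estimate) for a fan-out query.

    Assumes the coordinator's only cost is waiting on the data nodes:
    - parallel: wall time is roughly that of the slowest node
    - serial:   wall time is the sum over all nodes
    """
    if not per_node_seconds:
        raise ValueError("need at least one data node timing")
    return max(per_node_seconds), sum(per_node_seconds)


def unexplained_overhead(observed_seconds, per_node_seconds):
    """Seconds of observed time not covered even by a fully serial scan."""
    _, serial = expected_bounds(per_node_seconds)
    return max(0.0, observed_seconds - serial)


# Worst case from this thread: 4 data nodes at 10 seconds each, 65 seconds observed.
timings = [10.0, 10.0, 10.0, 10.0]
print(expected_bounds(timings))             # (10.0, 40.0)
print(unexplained_overhead(65.0, timings))  # 25.0
```

Even under the serial model, about 25 seconds remain unaccounted for.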

The 65-second total is also why I asked whether the requests were parallel or serial: it *feels* serial, though other factors could be involved.

I'll retest and update.

________________________________
From: Ashutosh Bapat [ash...@en...]
Sent: Tuesday, April 29, 2014 1:05 AM
To: Aaron Jackson
Cc: amul sul; pos...@li...
Subject: Re: [Postgres-xc-general] Data Node Scan Performance

Hi Aaron,
Can you please time the execution of "EXECUTE DIRECT <query to the datanode>" against some datanode? I suspect that the delay you are seeing is added by the sheer amount of communication between the coordinator and the datanodes. Some of it would be libpq overhead and some of it network overhead.
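
A small Python helper for this comparison: one function builds the EXECUTE DIRECT statement to run through the coordinator, the other subtracts the direct-to-datanode time from the through-coordinator time. The statement form `EXECUTE DIRECT ON (node) 'query'` is an assumption based on the Postgres-XC 1.x reference and may differ in other versions; the node name `dn1` and both function names are hypothetical.

```python
def execute_direct_sql(node_name, query):
    """Build an EXECUTE DIRECT statement for one data node.

    Assumes the Postgres-XC 1.x form EXECUTE DIRECT ON (node) 'query',
    where the remote query is a string literal, so single quotes are doubled.
    """
    escaped = query.replace("'", "''")
    return f"EXECUTE DIRECT ON ({node_name}) '{escaped}'"


def communication_overhead(direct_seconds, execute_direct_seconds):
    """Extra time added by routing one datanode query through the coordinator.

    direct_seconds:         timing from psql connected straight to the datanode
    execute_direct_seconds: timing of the same query via EXECUTE DIRECT
    """
    return execute_direct_seconds - direct_seconds


stmt = execute_direct_sql("dn1", "SELECT 500 as Id, Foo, Bar from MyTable WHERE Id = 186")
print(stmt)
```

Running the printed statement in psql with `\timing` on, and comparing it to the direct psql timing, isolates the coordinator-to-datanode cost for a single node.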

On Tue, Apr 29, 2014 at 10:58 AM, Aaron Jackson <aja...@re...> wrote:
Interesting,

So, I wonder why I am seeing query times that exceed the sum of the times required to run the query without the coordinator. For example, say the query is 'SELECT 500 as Id, Foo, Bar from MyTable WHERE Id = 186'. I can run this query on each of the 4 data nodes, and none of them takes more than 10 seconds individually. However, when run against the coordinator, the same query takes 65 seconds, which is more than the combined time of all the data nodes.

Any thoughts? Is this entirely attributable to the coordinator?

________________________________________
From: amul sul [sul...@ya...]
Sent: Tuesday, April 29, 2014 12:23 AM
To: Aaron Jackson; pos...@li...
Subject: Re: [Postgres-xc-general] Data Node Scan Performance

>On Tuesday, 29 April 2014 10:38 AM, Aaron Jackson <aja...@re...> wrote:
> my question is, does the coordinator execute the data node scan serially
>or in parallel - and if it's serially,
>is there any thought around how to make it parallel?

IMO, the scans on the data nodes happen independently, i.e. in parallel; the scan results are collected at the coordinator and returned to the client.
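
As a rough model of the behaviour described above (scans dispatched to every data node independently, results collected at the coordinator), this Python sketch uses threads to fan out and gather. The node names, `scan_datanode`, and the sleep-based timings are hypothetical stand-ins; this is not how Postgres-XC itself implements the scan.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scan_datanode(node_name, seconds):
    """Hypothetical stand-in for one data node scan: sleep, then return rows."""
    time.sleep(seconds)
    return [f"{node_name}-row"]


def coordinator_scan(node_timings):
    """Dispatch every data node scan at once, then gather results in node order.

    Returns (rows, wall_seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max(1, len(node_timings))) as pool:
        futures = [pool.submit(scan_datanode, name, secs)
                   for name, secs in node_timings.items()]
        rows = []
        for fut in futures:  # the "coordinator" collects each node's result
            rows.extend(fut.result())
    return rows, time.perf_counter() - start


timings = {"dn1": 0.2, "dn2": 0.1, "dn3": 0.2, "dn4": 0.15}
rows, wall = coordinator_scan(timings)
print(rows)
print(f"wall {wall:.2f}s vs serial sum {sum(timings.values()):.2f}s")
```

With parallel dispatch the wall time tracks the slowest node (about 0.2 s here) rather than the serial sum of 0.65 s.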

Referring to a distributed table using a column other than the distribution key (in your case Q instead of k) carries a small penalty.
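
The distribution-key point can be illustrated with a toy router for a hash-distributed table: an equality filter on the key (k) maps to exactly one data node, while a filter on another column (Q) has to be sent to every node. A sketch under assumptions: the function name, the node names, and the use of `zlib.crc32` as the hash are hypothetical; Postgres-XC's own hashing and planner logic differ.

```python
import zlib


def target_nodes(nodes, dist_key, filter_column, filter_value):
    """Return the data nodes a single-column equality filter must touch.

    Hypothetical model of a hash-distributed table:
    - filter on the distribution key -> exactly one node can hold the row
    - filter on any other column     -> every node has to be scanned
    """
    if filter_column == dist_key:
        # crc32 is only an illustrative hash, not the one Postgres-XC uses.
        h = zlib.crc32(str(filter_value).encode("utf-8"))
        return [nodes[h % len(nodes)]]
    return list(nodes)


nodes = ["dn1", "dn2", "dn3", "dn4"]
print(target_nodes(nodes, "k", "k", 186))  # a single node
print(target_nodes(nodes, "k", "Q", 186))  # all four nodes
```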

Regards,
Amul Sul

_______________________________________________
Postgres-xc-general mailing list
Pos...@li...
https://lists.sourceforge.net/lists/listinfo/postgres-xc-general

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company