|
From: Josh B. <jo...@ag...> - 2014-05-20 16:23:58
|
On 04/29/2014 12:38 PM, Aaron Jackson wrote: > When I load data into my table "detail" with COPY, the table loads at a rate of about 56k rows per second. The data is distributed on a key to achieve this rate of insert (width is 678). However, when I do the following: > > INSERT INTO DETAIL SELECT 123 as Id, ... FROM DETAIL WHERE Id = 500; > > I see the write performance drop to only 2.5K rows per second. The total data set loaded from Id = 500 is 200k rows and takes about 7s to load into the data coordinator. So, I can attribute almost all of the time (about 80 seconds) directly to the insert. > > Insert on detail (cost=0.00..10.00 rows=1000 width=678) (actual time=79438.038..79438.038 rows=0 loops=1) > Node/s: node_pgs01_1, node_pgs01_2, node_pgs02_1, node_pgs02_2 > Node expr: productid > -> Data Node Scan on detail "_REMOTE_TABLE_QUERY_" (cost=0.00..10.00 rows=1000 width=678) (actual time=3.917..2147.231 rows=200000 loops=1) > Node/s: node_pgs01_1, node_pgs01_2, node_pgs02_1, node_pgs02_2 > > IMO, it seems like an insert like this should approach the performance of a COPY. Am I missing something or can you recommend a different approach? Well, COPY is much faster on vanilla Postgres, for a variety of optimization reasons. I don't see why PostgresXC would be different. Admittedly, the 20X differential is higher than single-node Postgres, so that seems worth investigating. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com |