|
From: Ashutosh B. <ash...@en...> - 2014-05-21 05:00:38
|
Hi Aaron, >From the plan you have given we can see that INSERT is happening on the coordinator, inserting one row at a time. Although the INSERT statement is prepared on the datanode, each EXECUTE incurs the libpq and execution overheads on datanode. What should ideally happen is, all the rows to be inserted on a same datanode should be stored in some sort of file and bulk inserted (using COPY protocol). But this is not implemented yet, because 1. We do not have resources to implement it 2. We do not have global statistics at the coordinator to estimate, how many rows SELECT is going to returns, and hence can not decide whether to use single insert at a time (for small number of rows) or bulk insert (large number of rows). On Tue, Apr 29, 2014 at 10:08 PM, Aaron Jackson <aja...@re...>wrote: > When I load data into my table "detail" with COPY, the table loads at a > rate of about 56k rows per second. The data is distributed on a key to > achieve this rate of insert (width is 678). However, when I do the > following: > > INSERT INTO DETAIL SELECT 123 as Id, ... FROM DETAIL WHERE Id = 500; > > I see the write performance drop to only 2.5K rows per second. The > total data set loaded from Id = 500 is 200k rows and takes about 7s to load > into the data coordinator. So, I can attribute almost all of the time > (about 80 seconds) directly to the insert. > > Insert on detail (cost=0.00..10.00 rows=1000 width=678) (actual > time=79438.038..79438.038 rows=0 loops=1) > Node/s: node_pgs01_1, node_pgs01_2, node_pgs02_1, node_pgs02_2 > Node expr: productid > -> Data Node Scan on detail "_REMOTE_TABLE_QUERY_" (cost=0.00..10.00 > rows=1000 width=678) (actual time=3.917..2147.231 rows=200000 loops=1) > Node/s: node_pgs01_1, node_pgs01_2, node_pgs02_1, node_pgs02_2 > > IMO, it seems like an insert like this should approach the performance > of a COPY. Am I missing something or can you recommend a different > approach? > > > > ------------------------------------------------------------------------------ > "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE > Instantly run your Selenium tests across 300+ browser/OS combos. Get > unparalleled scalability from the best Selenium testing platform available. > Simple to use. Nothing to install. Get started now for free." > http://p.sf.net/sfu/SauceLabs > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company |