From: Matt W. <MW...@XI...> - 2013-06-11 23:52:54
For comparison, it takes only about 15 minutes to export just the reduced set of columns for the 56M rows to a file, and another 15 minutes to import that file into a new, distributed table.

Matt

From: Matt Warner
Sent: Tuesday, June 11, 2013 4:12 PM
To: Pos...@li...
Subject: Problem With Memory Handling When Creating Table Subset

I'm using the pre-release version of XC and noticed that with very large tables the coordinator consumes a huge amount of memory. Specifically:

* I have a table distributed by hash across 8 nodes. The initial load had no issues (using "COPY table FROM file" syntax).
* The table contains 56M records in total.

What fails is the statement below, which selects a subset of the columns in the original table and creates a new distributed table. Note that I'm using the same hash key that was used to distribute the data originally. That is, the data isn't really being redistributed, but (conceptually) just copied within each node into a different table with fewer columns:

create table accn_proc_reduced
  distribute by hash(fk_accn_id)
  to node (node1, node2, node3, node4, node5, node6, node7, node8)
  as select pk_accn_proc_seq_id, fk_proc_id, fk_fac_id, fk_accn_id,
            fk_subm_file_seq_id, fk_sta_id, exp_prc, bil_prc, gross_prc,
            due_amt, due_amt_with_bulk, exp_due_amt, exp_due_amt_without_bulk,
            fk_bill_fac_id
     from accn_proc;

Version of XC: psql (PGXC 1.1devel, based on PG 9.2beta2)

When I say the coordinator consumes a huge amount of RAM, I mean that a system running these 8 nodes, the GTM, and the coordinator has 64GB RAM, and the coordinator's memory consumption keeps growing until the system exhausts RAM plus swap, finally reports "ERROR: out of memory", and then dumps a core file. Running the system out of RAM while trying to create this new table takes about 6 hours, much longer than the initial load of the data.
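The 30-minute export/import path mentioned above could be sketched roughly as follows. This is a hypothetical reconstruction, not the exact commands used: the file path and COPY options are assumptions, and the empty-table trick (AS SELECT ... WHERE false) is just one way to clone the reduced schema.

```sql
-- 1. Export only the reduced column set to a flat file (~15 minutes for 56M rows).
--    '/tmp/accn_proc_reduced.dat' is an assumed scratch path.
COPY (SELECT pk_accn_proc_seq_id, fk_proc_id, fk_fac_id, fk_accn_id,
             fk_subm_file_seq_id, fk_sta_id, exp_prc, bil_prc, gross_prc,
             due_amt, due_amt_with_bulk, exp_due_amt, exp_due_amt_without_bulk,
             fk_bill_fac_id
        FROM accn_proc)
  TO '/tmp/accn_proc_reduced.dat';

-- 2. Create the empty target table with the same hash key and node set
--    (WHERE false copies the column definitions without copying any rows).
CREATE TABLE accn_proc_reduced
  DISTRIBUTE BY HASH(fk_accn_id)
  TO NODE (node1, node2, node3, node4, node5, node6, node7, node8)
  AS SELECT pk_accn_proc_seq_id, fk_proc_id, fk_fac_id, fk_accn_id,
            fk_subm_file_seq_id, fk_sta_id, exp_prc, bil_prc, gross_prc,
            due_amt, due_amt_with_bulk, exp_due_amt, exp_due_amt_without_bulk,
            fk_bill_fac_id
       FROM accn_proc
      WHERE false;

-- 3. Bulk-load the file into the new distributed table (~15 minutes).
COPY accn_proc_reduced FROM '/tmp/accn_proc_reduced.dat';
```

This avoids routing the full 56M-row result set through the coordinator's CTAS path, which appears to be where the memory growth occurs.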
The only changes to the default postgresql.conf are shared_buffers=1GB and effective_cache_size=1GB (8 nodes, 64GB RAM + swap in the system). Assuming this is not expected behavior, what debug data can I provide to help troubleshoot?

Matt