When defining a large number of tables, pgxc reports the following messages:
LOG: failed to connect to Datanode
WARNING: can not connect to node 16385
ERROR: Failed to get pooled connections
LOG: failed to acquire connections
WARNING: unexpected EOF on datanode connection
ERROR: sorry, too many clients already
I have attached a detailed explanation of the installation and configuration, and a tgz file containing the DDL, the logs from the coordinator, and the logs from the data nodes.
I'm also seeing this:
psql:cnaf_pgxc_ddl.sql:1687: PANIC: sorry, too many clients already
PANIC: sorry, too many clients already
psql:cnaf_pgxc_ddl.sql:1687: connection to server was lost
Also, on each of the guests I did the following:
sysctl -w kernel.shmmax=17179869184
sysctl -w kernel.shmall=4194304
The guest configuration is the same for all 4 VMs running on ESXi 5.
Ubuntu 14 server
4 vCPUs, 8GB RAM, and 128GB disk (each VM has a dedicated drive)
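As a sanity check on the two sysctl values above: kernel.shmmax is expressed in bytes and kernel.shmall in pages, so they can be compared once the page size is known. The sketch below assumes 4096-byte pages (typical for x86-64 guests, but not confirmed for these VMs); the helper names are only illustrative.

```python
# Values taken from the sysctl commands above.
SHMMAX_BYTES = 17179869184   # kernel.shmmax, in bytes
SHMALL_PAGES = 4194304       # kernel.shmall, in pages

# Assumption: 4 KiB pages. Check with `getconf PAGE_SIZE` on the guest.
PAGE_SIZE = 4096


def shmall_in_bytes(pages, page_size=PAGE_SIZE):
    """Convert a kernel.shmall value (pages) to bytes."""
    return pages * page_size


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    print("shmmax:", to_gib(SHMMAX_BYTES), "GiB")
    print("shmall:", to_gib(shmall_in_bytes(SHMALL_PAGES)), "GiB")
```

Under that page-size assumption both limits come out to 16 GiB, which is above the 8GB of RAM in each guest.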
I'm having the very same issue running 1.1-2ubuntu2 (installed from apt repo) and am using Ubuntu 14.04 Trusty.
Each VM has 4GB of RAM and 36GB of disk. I've set vm.overcommit_memory to 2 on my VMs, and have shared memory cranked up to around 3GB.
When I run our DDL script, exported from a vanilla postgres database, it hits the connection limit about 2/3 of the way through, then fails with "PANIC: sorry, too many clients already".
Issuing a select on pg_stat_activity shows the only connection is my psql connection. I've tried cranking up max_connections and max_pool_size with no luck. max_pool_size is set to the number of nodes * max_connections as recommended. I'm currently trying to dig in using strace and gdb, but haven't run into anything yet that gives me a clue as to what the problem might be.
I also tried loading a sample database from the postgres wiki (Booktown). It loads without running into this specific issue (although not error free). Since my cluster environment is prototype only at this point, and behind our fortress, I don't have pg_hba locked down.
Lastly, running ps and searching its output doesn't show anything abnormal. My config is set up using pgxc_ctl.
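To make the sizing rule I mentioned concrete (max_pool_size = number of nodes * max_connections), here is a minimal sketch. The node count and max_connections values in it are hypothetical placeholders, not my actual settings, and the function name is just for illustration.

```python
def recommended_max_pool_size(node_count, max_connections):
    """Apply the rule of thumb: max_pool_size = nodes * max_connections."""
    if node_count < 1 or max_connections < 1:
        raise ValueError("node_count and max_connections must be positive")
    return node_count * max_connections


if __name__ == "__main__":
    # Hypothetical example values; substitute the real cluster settings.
    nodes = 4
    max_connections = 100
    print("max_pool_size =", recommended_max_pool_size(nodes, max_connections))
```

With those placeholder numbers the rule gives a max_pool_size of 400.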
This one has me scratching my head...
Below is some stat output from strace (if that's helpful). I notice the top hits have to do with shared memory. Is it possible shared memory is getting corrupted? Or hitting its limit (it's set to a healthy amount based on available RAM)?
^CProcess 25970 detached
% time     seconds  usecs/call     calls    errors syscall
 40.58    0.008323        8323         1           shmctl
 32.25    0.006616         441        15           clone
 16.61    0.003407        3407         1           shmdt
  3.09    0.000633          25        25           wait4
  2.41    0.000495          29        17        14 select
  0.88    0.000180           3        64           rt_sigprocmask
  0.86    0.000176           7        27           write
  0.56    0.000114           1       123           semctl
  0.34    0.000069          69         1           fsync
  0.32    0.000065           5        12           kill
  0.31    0.000064           4        15        14 rt_sigreturn
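To put a number on "the top hits have to do with shared memory", here is a rough sketch that totals the time share and call count of the shm* and sem* syscalls from the summary above. The rows are copied from that output; grouping by name prefix is my own simplification.

```python
# Data rows copied from the strace summary above (header omitted).
STRACE_ROWS = """\
40.58 0.008323 8323 1 shmctl
32.25 0.006616 441 15 clone
16.61 0.003407 3407 1 shmdt
3.09 0.000633 25 25 wait4
2.41 0.000495 29 17 14 select
0.88 0.000180 3 64 rt_sigprocmask
0.86 0.000176 7 27 write
0.56 0.000114 1 123 semctl
0.34 0.000069 69 1 fsync
0.32 0.000065 5 12 kill
0.31 0.000064 4 15 14 rt_sigreturn
"""


def parse_rows(text):
    """Return (percent, calls, syscall) tuples; the errors column is optional."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) not in (5, 6):
            continue  # skip blank or unexpected lines
        rows.append((float(fields[0]), int(fields[3]), fields[-1]))
    return rows


def summarize(rows, prefix):
    """Total percent of time and number of calls for syscalls with this prefix."""
    matching = [r for r in rows if r[2].startswith(prefix)]
    return round(sum(r[0] for r in matching), 2), sum(r[1] for r in matching)


if __name__ == "__main__":
    rows = parse_rows(STRACE_ROWS)
    for prefix in ("shm", "sem"):
        percent, calls = summarize(rows, prefix)
        print(f"{prefix}*: {percent}% of traced time over {calls} call(s)")
```

Worth noting when reading the result: the shm* share comes from only two calls (one shmctl and one shmdt), while semctl accounts for 123 calls but well under 1% of the traced time.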