
#486 Postgres-XC 1.2 loses connection when defining many tables

Milestone: 1.2 Dev Q
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-07-10
Created: 2014-06-27
Creator: Mike
Private: No

When defining a large number of tables, pgxc reports the following messages:
LOG: failed to connect to Datanode
WARNING: can not connect to node 16385
ERROR: Failed to get pooled connections
LOG: failed to acquire connections
WARNING: unexpected EOF on datanode connection
ERROR: sorry, too many clients already

I have attached a detailed explanation of the installation and configuration, along with a tgz file containing the DDL, the logs from the coordinator, and the logs from the data nodes.

4 Attachments

Discussion

  • Mike

    Mike - 2014-06-27

    I'm also seeing this:
    psql:cnaf_pgxc_ddl.sql:1687: PANIC: sorry, too many clients already
    PANIC: sorry, too many clients already
    psql:cnaf_pgxc_ddl.sql:1687: connection to server was lost

     
  • Mike

    Mike - 2014-06-27

    Also, on each of the guests I did the following:

    sysctl -w kernel.shmmax=17179869184
    sysctl -w kernel.shmall=4194304
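
    For reference, the two values above use different units: kernel.shmmax is in bytes and kernel.shmall is in pages. A minimal sketch of the conversion, assuming 4 KiB pages (the usual x86-64 Linux default, not stated in this ticket), suggests both settings describe the same 16 GiB ceiling. These are upper limits rather than allocations, and the helper name to_gib is only illustrative.

```python
# Values copied from the sysctl commands above.
SHMMAX_BYTES = 17179869184   # kernel.shmmax is expressed in bytes
SHMALL_PAGES = 4194304       # kernel.shmall is expressed in pages

# Assumption: 4 KiB pages, the usual default on x86-64 Linux.
PAGE_SIZE_BYTES = 4096

GIB = 1024 ** 3


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


shmmax_gib = to_gib(SHMMAX_BYTES)
shmall_gib = to_gib(SHMALL_PAGES * PAGE_SIZE_BYTES)

print(f"kernel.shmmax = {shmmax_gib:.0f} GiB")
print(f"kernel.shmall = {shmall_gib:.0f} GiB")
```

    On a guest whose page size differs, the real value can be read with `getconf PAGE_SIZE` and substituted for the assumed constant.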

     
  • Mike

    Mike - 2014-06-27

    The guest configuration is the same for all 4 VMs running on ESXi 5.

    Ubuntu 14 server

    4 vCPUs, 8GB real memory, and a 128GB disk (each VM has a dedicated drive)

     
  • Daniel

    Daniel - 2014-07-10

    I'm having the very same issue running 1.1-2ubuntu2 (installed from apt repo) and am using Ubuntu 14.04 Trusty.

    Each VM has 4GB of RAM and 36GB of disk. I've set overcommit memory to 2 on my VMs, and have shared memory cranked up to around 3GB.

    Running our DDL script, exported from a vanilla postgres database, it hits the connection limit about 2/3 of the way through, then fails with "PANIC: sorry, too many clients already".

    Issuing a select on pg_stat_activity shows that the only connection is my psql connection. I've tried cranking up max_connections and max_pool_size with no luck. max_pool_size is set to the number of nodes * max_connections, as recommended. I'm currently digging with strace and gdb, but haven't run into anything yet that gives me a clue as to what the problem might be.
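
    The sizing rule mentioned above can be written out as a small sketch. The function name and the example numbers are hypothetical and do not come from this ticket; whether "nodes" counts only datanodes or coordinators as well is an assumption left to the reader.

```python
def recommended_max_pool_size(node_count: int, max_connections: int) -> int:
    """Rule of thumb quoted above: number of nodes * max_connections."""
    if node_count < 1 or max_connections < 1:
        raise ValueError("node_count and max_connections must be positive")
    return node_count * max_connections


# Hypothetical example values, not taken from the reporter's configuration.
example_nodes = 4
example_max_connections = 100

pool = recommended_max_pool_size(example_nodes, example_max_connections)
print(f"max_pool_size = {pool}")
```

    Since raising both settings did not help here, the sketch only documents the arithmetic behind the rule, not a fix.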

    I also tried loading a sample database from the postgres wiki (Booktown). It loads without running into this specific issue (although not error free). Since my cluster environment is a prototype at this point, and behind our firewall, I don't have pg_hba locked down.

    Lastly, issuing a ps command and searching the output doesn't show anything abnormal. My config is set up using pgxc_ctl.

    This one has me scratching my head...

    Below is some summary output from strace (if that's helpful). I notice the top hits have to do with shared memory. Is it possible that shared memory is getting corrupted, or hitting its limit (it's set to a healthy amount based on available RAM)?

    ^CProcess 25970 detached
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     40.58    0.008323        8323         1           shmctl
     32.25    0.006616         441        15           clone
     16.61    0.003407        3407         1           shmdt
      3.09    0.000633          25        25           wait4
      2.41    0.000495          29        17        14 select
      0.88    0.000180           3        64           rt_sigprocmask
      0.86    0.000176           7        27           write
      0.56    0.000114           1       123           semctl
      0.34    0.000069          69         1           fsync
      0.32    0.000065           5        12           kill
      0.31    0.000064           4        15        14 rt_sigreturn
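
    To put the shared-memory observation in numbers, the rows above can be totalled with a short sketch. The parsing helper is illustrative only; it treats the errors column as optional and counts just the shm* calls (semctl belongs to System V semaphores, not shared memory).

```python
# Rows copied from the strace summary above; the errors column is optional.
STRACE_ROWS = """\
40.58 0.008323 8323 1 shmctl
32.25 0.006616 441 15 clone
16.61 0.003407 3407 1 shmdt
3.09 0.000633 25 25 wait4
2.41 0.000495 29 17 14 select
0.88 0.000180 3 64 rt_sigprocmask
0.86 0.000176 7 27 write
0.56 0.000114 1 123 semctl
0.34 0.000069 69 1 fsync
0.32 0.000065 5 12 kill
0.31 0.000064 4 15 14 rt_sigreturn
"""


def parse_row(line: str) -> dict:
    """Parse one summary row; 5 fields means the errors column was empty."""
    parts = line.split()
    return {
        "pct": float(parts[0]),
        "seconds": float(parts[1]),
        "usecs_per_call": int(parts[2]),
        "calls": int(parts[3]),
        "errors": int(parts[4]) if len(parts) == 6 else 0,
        "syscall": parts[-1],
    }


rows = [parse_row(line) for line in STRACE_ROWS.splitlines() if line.strip()]

# Shared-memory syscalls only (shmctl, shmdt).
shm_rows = [r for r in rows if r["syscall"].startswith("shm")]
shm_pct = round(sum(r["pct"] for r in shm_rows), 2)
shm_seconds = round(sum(r["seconds"] for r in shm_rows), 6)

print(f"shm* share of traced syscall time: {shm_pct}%")
print(f"shm* absolute time: {shm_seconds * 1000:.2f} ms")
```

    The percentage is large, but the absolute time for the two calls is only a few milliseconds, and each was called once, so the summary alone probably does not show whether shared memory is the culprit.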

     
