#54 X can't start vbe because shmget fails

Status: closed
Owner: nobody
Category: IPC (12)
Priority: 5
Updated: 2004-05-27
Created: 2004-05-05
Creator: Anonymous
Private: No

A node crashed and was rebooted. After the boot it could not start X;
the X log showed that it was failing to load the "vbe" module (VESA
BIOS extensions). Using strace it was found that a shmget call was
failing:

shmget(IPC_PRIVATE, 655360, IPC_CREAT|0x180|0600) = -1 ENOSPC (No space left on device)

/proc/sysvipc/shm was empty on all nodes.

SHMMNI is 4096, SHMALL is 2097152.
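
A minimal reproduction sketch (not from the original report): it makes
the same 640 KB request the vbe module makes and reports why it fails.

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* Same request X's vbe module makes: a 655360-byte private segment */
    int id = shmget(IPC_PRIVATE, 655360, IPC_CREAT | 0600);
    if (id < 0) {
        /* ENOSPC here means the kernel thinks SHMALL/SHMMNI is exhausted */
        printf("shmget: %s (errno=%d)\n", strerror(errno), errno);
        return 1;
    }
    printf("got shmid %d\n", id);
    shmctl(id, IPC_RMID, NULL); /* remove the segment again */
    return 0;
}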

After rebooting the cluster everything is OK and
/proc/sysvipc/shm now shows:

key  shmid   perms  size    cpid    lpid    nattch  uid  gid  cuid  cgid  atime       dtime       ctime       view     node_num
0    131074  1644   106496  462255  462366  2       0    0    0     0     1083772776  0           1083772776  default  7
0    114688  1644   106496  462255  331742  2       0    0    0     0     1083772606  0           1083772606  default  5
1    65537   600    655360  462150  462258  2       0    0    0     0     1083772598  1083772598  1083772322  default  7

The kernel is rc3.

Discussion

  • John Hughes - 2004-05-05

    This was submitted by me but SourceForge seems not to have
    noticed.

     
  • Brian J. Watson - 2004-05-07
    • assigned_to: bjbrew --> nobody
     
  • John Hughes - 2004-05-25

    Looking at ssi_shmget I see a place that won't return ENOSPC
    when it should:

        id = newseg (....);
        if (id < 0)
                goto out_nolock;
        ....
    out_nolock:
        up(&shm_ids.sem);
        return retval;

    But nowhere does it return ENOSPC: retval is returned without
    ever being set from id.

    Drat.

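    A toy sketch of that error-path shape (not from the original
    comment; newseg is stubbed out and names are illustrative):

        #include <stdio.h>
        #include <errno.h>

        /* stand-in for newseg(): always fails with -ENOSPC */
        static int newseg(void) { return -ENOSPC; }

        /* the buggy shape: retval is returned but never set from id */
        static int shmget_buggy(void)
        {
                int retval = 0, id;

                id = newseg();
                if (id < 0)
                        goto out;       /* bug: forgets retval = id */
                retval = id;
        out:
                return retval;          /* the -ENOSPC is lost */
        }

        /* the fixed shape: propagate newseg()'s error code */
        static int shmget_fixed(void)
        {
                int retval, id;

                id = newseg();
                if (id < 0) {
                        retval = id;    /* keep the -ENOSPC */
                        goto out;
                }
                retval = id;
        out:
                return retval;
        }

        int main(void)
        {
                printf("buggy: %d, fixed: %d\n",
                       shmget_buggy(), shmget_fixed());
                return 0;
        }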

     
  • John Hughes - 2004-05-26

    Ok, here's what happens.

    When a shared memory segment is created on a node other than the
    nameserver node, newseg gets called on the nameserver node, which
    increases shm_tot. Presumably to avoid double-counting the
    segment, ipcname_shm_genid then reduces shm_tot by the size of
    the segment allocated by newseg. However, if the node that
    created the segment goes down, ripc_shm_rmid does some cleanup
    and reduces shm_tot by the size of the segment again. This double
    decrement makes shm_tot go negative, and since shm_ctlall is a
    size_t (i.e. unsigned), a negative shm_tot looks like we've
    allocated too much memory.
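
    A minimal userspace sketch of that unsigned-comparison pitfall
    (not from the original comment; variable names follow the
    analysis above, the exact figures are illustrative):

        #include <stdio.h>
        #include <stddef.h>

        static int shm_tot;                     /* pages accounted, signed */
        static size_t shm_ctlall = 2097152;     /* SHMALL: max pages allowed */

        int main(void)
        {
                /* A remote node creates two 160-page segments: newseg()
                 * increments shm_tot, ipcname_shm_genid backs it out. */
                shm_tot += 160; shm_tot -= 160;
                shm_tot += 160; shm_tot -= 160;

                /* The node crashes: ripc_shm_rmid decrements again for
                 * segments this node never accounted. */
                shm_tot -= 160;
                shm_tot -= 160;                 /* shm_tot is now -320 */

                /* vbe's shmget wants 655360 bytes = 160 pages.  shm_tot
                 * is promoted to size_t in the comparison, so the sum
                 * becomes enormous and the request fails with ENOSPC. */
                size_t numpages = 160;
                if (shm_tot + numpages > shm_ctlall)
                        printf("ENOSPC: %zu pages apparently in use\n",
                               shm_tot + numpages);
                return 0;
        }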

     
  • Nobody/Anonymous

    Here's a patch that "fixes" the problem for me.

     
  • John Hughes - 2004-05-26

    Drat, can't see how to attach the patch. Here it is (copy sent
    to the devel list):

    --- cluster/ssi/ipc/ipcshm_svr.c.orig  2004-02-19 09:43:02.000000000 +0100
    +++ cluster/ssi/ipc/ipcshm_svr.c       2004-05-26 11:38:52.000000000 +0200
    @@ -53,6 +53,7 @@
     extern int shm_get_segsize(struct shmid_kernel *);
     extern int shm_get_cpid(struct shmid_kernel *);
     static int do_ssi_shm_noclients(int, clusternode_t, int);
    +static int do_shm_rmid(clusternode_t, int*, int, int);
    
     int
     ripc_shm_get_shmid_kernel(
    @@ -335,8 +336,17 @@
        int id,
        int size)
     {
    +   return do_shm_rmid (node, rval, id, 0);
    +}
    +
    +int
    +do_shm_rmid(
    +   clusternode_t node,
    +   int *rval,
    +   int id,
    +   int client)
    +{
        struct shmid_kernel *shp;
    -   int sz;
    
        shp = (struct shmid_kernel *)shm_cli_get(id);
        if (!shp)
    @@ -345,8 +355,11 @@
        /* Reacquire the locks */
        ipc_get_locks(id, &shm_ids, 1);
    
    -   sz = shp->shm_file->f_dentry->d_inode->i_size;
    -   shm_tot -= (sz + PAGE_SIZE - 1) >> PAGE_SHIFT;
    +   if (!client) {
    +           int sz = shp->shm_file->f_dentry->d_inode->i_size;
    +           shm_tot -= (sz + PAGE_SIZE - 1) >> PAGE_SHIFT;
    +   }
    +
        shm_rmid(id);
    
        /* Drop the locks aqcuired above */
    @@ -449,7 +462,7 @@
        ret = RIPC_SHM_CLEANUP(svrnode, &rval, id, clinode);
        if (ret == -EREMOTE) {
                /* svr has gone down cleanup client structure only */
    -           ret = ripc_shm_rmid(this_node, &rval, id, 1);
    +           ret = do_shm_rmid(this_node, &rval, id, 1);
        }
        if (ret || rval)
                printk("Failed to cleanup IPC shm structures\n");

     
  • Laura Ramirez - 2004-05-27
    • status: open --> closed
     
  • Laura Ramirez - 2004-05-27

    Fix has been checked into the devel and stable branches:

    ipc/shm.c - new revision: 1.2.2.13; Stable: 1.2.2.12.2.1;
    - Fix error handling to return the error.
    cluster/ssi/ipc/ipcshm_svr.c - new revision: 1.8.2.3;
    Stable: 1.8.2.2.2.1;
    - Fix bug #948584 - where shm_tot was getting decremented
    twice on nodedown, causing the value to become negative.

     
