#14 no such local uid?

tom wible

i've got queued -D running on a solaris & an sgi, and
it seems to work, sorta...if i
queue -i -w -- hostname
from the sgi, i get an answer from the sgi qdemon

SENDMAIL: To 'tomw' from 'queued': Subject: batch
queue_b on opus.rva.trw.com: now/CFDIR/cfm779179935:
Job is starting now.
now/CFDIR/cfm779179935: Job is starting now.

but the qdemon on the sun says

SENDMAIL: To 'tomw' from 'tomw': Subject: queued error
on lisa.rva.trw.com: now/CFDIR/cfm779179935:
1476657152: no such local uid
now/CFDIR/cfm779179935: 1476657152: no such local uid

and vice versa (running queue -i -w -- hostname on the
sun gets an answer from the sun, but the sgi says no
such local uid)

so, does that mean i can only run locally???


  • tom wible

    tom wible - 2001-04-26

    seems that this is due to byte-swapping...if i force the
    solaris build little_endian, the solaris queued is happy
    with the uid from the client (on the sgi), but then it barfs
    on the cookie:

    QueueD: Received invalid cookie. In NO_ROOT, COOKIEFILE must
    the same on all machines! Received cookie: VERSION1

    and the client sees:

    Cookiefile authentication with server failed! Someone else
    is running Queue on this cluster or the other side has the
    wrong cookiefile!

    now what???
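
One way to test the byte-swapping theory is to reverse the bytes of the uid from the error message and see whether the result looks like a real account. This is a minimal sketch in C; `swap32` and `show_swapped` are illustrative names, not functions from Queue:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Reverse the byte order of a 32-bit value (hypothetical helper). */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Print a value next to its byte-swapped form so the two can be compared. */
void show_swapped(uint32_t v)
{
    uint32_t s = swap32(v);

    /* Sanity check: swapping twice must give back the original value. */
    assert(swap32(s) == v);

    printf("%" PRIu32 " (0x%08" PRIX32 ") -> %" PRIu32 " (0x%08" PRIX32 ")\n",
           v, v, s, s);
}
```

For the value in the report, `show_swapped(1476657152u)` prints `1476657152 (0x58040000) -> 1112 (0x00000458)`. 1112 looks like an ordinary uid, which is consistent with a byte-order mismatch between the two machines.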

  • Werner G Krebs

    Werner G Krebs - 2001-05-11
    • priority: 5 --> 1
  • Werner G Krebs

    Werner G Krebs - 2001-05-11

    Some people have gotten heterogeneous clusters to work (usually GNU/Linux & another system), but the code
    probably hasn't been fully debugged.

    Unless you're willing to debug the code and find out what goes wrong, you'll need to run the server and clients
    within the same arch system. I.e. sun<->sun and sgi<->sgi should work fine, but not sgi<->sun.

    What's probably going on is that the length of the structures still changes slightly between the different archs.
    GNU/Linux structure sizes were supposed to be the standard, with the other archs using that, but this
    probably still hasn't been fully implemented.
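
To illustrate the structure-size point: a struct written to the network as raw memory has whatever size and padding the local compiler chose, while a field-by-field encoding has the same length everywhere. The sketch below assumes a made-up `job_header` with two fields; the real Queue structures are not shown in this thread, and `pack_header`, `unpack_header` and `WIRE_HEADER_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job header, for illustration only; not Queue's real layout. */
struct job_header {
    long uid;             /* 4 bytes on some ABIs, 8 on others */
    unsigned short prio;  /* padding after this field is up to the compiler */
};

/* The wire form is always 4 bytes of uid followed by 2 bytes of prio. */
#define WIRE_HEADER_SIZE 6

/* Encode the header most-significant byte first; returns bytes written. */
size_t pack_header(const struct job_header *h, unsigned char *buf)
{
    uint32_t uid;

    assert(h != NULL && buf != NULL);
    uid = (uint32_t)h->uid;

    buf[0] = (unsigned char)((uid >> 24) & 0xFF);
    buf[1] = (unsigned char)((uid >> 16) & 0xFF);
    buf[2] = (unsigned char)((uid >> 8) & 0xFF);
    buf[3] = (unsigned char)(uid & 0xFF);
    buf[4] = (unsigned char)((h->prio >> 8) & 0xFF);
    buf[5] = (unsigned char)(h->prio & 0xFF);
    return WIRE_HEADER_SIZE;
}

/* Decode the header from its wire form; returns bytes consumed. */
size_t unpack_header(struct job_header *h, const unsigned char *buf)
{
    uint32_t uid;

    assert(h != NULL && buf != NULL);
    uid = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
          ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];

    h->uid = (long)uid;
    h->prio = (unsigned short)(((unsigned)buf[4] << 8) | buf[5]);
    return WIRE_HEADER_SIZE;
}
```

Because the encoding never depends on `sizeof(struct job_header)`, the sender and receiver can disagree about the in-memory layout and still agree on the six bytes that cross the network.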

  • Werner G Krebs

    Werner G Krebs - 2001-05-11
    • assigned_to: nobody --> ericdeal
    • status: open --> open-remind
  • Werner G Krebs

    Werner G Krebs - 2001-05-11

    I'm assigning this (politely) to Eric Deal (EJD) who is working on the portability code in hopes that this will
    bring it to his attention.

  • Eric Deal

    Eric Deal - 2001-05-11

    I no longer have time to contribute to Queue and haven't
    done anything with it over the past 9-10 months.

    I had gotten to the point that it looked like the last
    major obstacle remaining to handle cross-platform
    queueing is the passing of the terminfo structure
    to the machine accepting the job.

    This structure is passed in the format of the submitting
    machine, which causes problems when it is extracted on
    the execute machine since the structures are likely to
    be different sizes as well as differently formatted
    (and possibly endian-switched since I believe the structure
    was dumped without going through the endian-swapping).

    As Werner indicated, the solution is to standardize on
    a method of passing this structure. This probably involves
    a wrapper on each supported platform to format the
    data on write and read into the format used on Linux.

    Another more portable method (involving more work), might
    be to encode this in a way that is totally independent
    of the linux implementation using keys and values for
    each attribute supported on the submit machine.
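
The key/value idea could look roughly like the sketch below, which writes each attribute as a `key=value` text line and has the reader skip keys it does not recognise. The `term_attrs` struct, its three fields, and the function names are assumptions made for the example; the actual terminal structure Queue passes is not shown in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of terminal attributes, for illustration only. */
struct term_attrs {
    unsigned rows;
    unsigned cols;
    unsigned long baud;
};

/* Write the attributes as text lines; returns the length or -1 if buf is too small. */
int encode_attrs(const struct term_attrs *t, char *buf, size_t len)
{
    int n;

    assert(t != NULL && buf != NULL);
    n = snprintf(buf, len, "rows=%u\ncols=%u\nbaud=%lu\n",
                 t->rows, t->cols, t->baud);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Read key=value lines into *t, ignoring unknown keys.
   Returns the number of recognised keys that were applied. */
int decode_attrs(struct term_attrs *t, const char *text)
{
    const char *line = text;
    int known = 0;

    assert(t != NULL && text != NULL);
    while (*line != '\0') {
        char key[32];
        unsigned long val;
        const char *nl;

        if (sscanf(line, "%31[^=\n]=%lu", key, &val) == 2) {
            if (strcmp(key, "rows") == 0)      { t->rows = (unsigned)val; known++; }
            else if (strcmp(key, "cols") == 0) { t->cols = (unsigned)val; known++; }
            else if (strcmp(key, "baud") == 0) { t->baud = val; known++; }
            /* any other key is skipped: the receiver simply does not support it */
        }

        nl = strchr(line, '\n');   /* advance to the start of the next line */
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return known;
}
```

A text encoding like this sidesteps both byte order and structure size, at the cost of the extra work of mapping every attribute by hand on each platform.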


  • tom wible

    tom wible - 2001-05-12

    thanx, guys, but i implemented my own platform-independent
    queue manager, although not secure or load-sensing (i'm behind
    a firewall & just limit each server to 1 job at a time...rmi
    in java makes it so easy;-)

    i'll donate it to the java project when i get a chance to
    clean it up/extract dependencies...

  • Werner G Krebs

    Werner G Krebs - 2001-05-12

    Cookie verification comes before the UID is checked, so I don't know if it's happy with the new uid.

    The right fix is to make sure that the uid is transmitted and processed in the right byte order. You could try
    recompiling with big-endian and then changing the code of the client to do byte-swapping on the UID as a
    test to see if it now reads the UID correctly.

    The netfwrite routines are charged with sending things in a consistent byte-order. If this is not happening,
    then this code may need to be tweaked.
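
As a rough picture of what "a consistent byte-order" means for a value like the uid, the sketch below converts to network byte order with the POSIX `htonl`/`ntohl` calls before writing to and after reading from a stream. `write_u32` and `read_u32` are hypothetical stand-ins; this is not the code of Queue's netfwrite routines, whose signatures are not given here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value in network (big-endian) byte order.
   Returns 0 on success, -1 on a short write. */
int write_u32(FILE *fp, uint32_t v)
{
    uint32_t net = htonl(v);

    assert(fp != NULL);
    return fwrite(&net, sizeof net, 1, fp) == 1 ? 0 : -1;
}

/* Read a 32-bit value sent by write_u32 and convert it to host order.
   Returns 0 on success, -1 on a short read. */
int read_u32(FILE *fp, uint32_t *out)
{
    uint32_t net;

    assert(fp != NULL && out != NULL);
    if (fread(&net, sizeof net, 1, fp) != 1)
        return -1;
    *out = ntohl(net);
    return 0;
}
```

On a big-endian host both conversions are no-ops and on a little-endian host both swap, so the bytes on the wire are the same either way. If either side skips the conversion, a mixed-endian pair sees a scrambled value like the one in the original report.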

