no such local uid?
i've got queued -D running on a solaris & an sgi, and
it seems to work, sorta...if i
queue -i -w -- hostname
from the sgi, i get an answer from the sgi qdemon
SENDMAIL: To 'tomw' from 'queued': Subject: batch queue_b on opus.rva.trw.com: now/CFDIR/cfm779179935: Job is starting now.
now/CFDIR/cfm779179935: Job is starting now.
but the qdemon on the sun says
SENDMAIL: To 'tomw' from 'tomw': Subject: queued error on lisa.rva.trw.com: now/CFDIR/cfm779179935: 1476657152: no such local uid
now/CFDIR/cfm779179935: 1476657152: no such local uid
and vice versa (running queue -i -w -- hostname on the
sun gets an answer from the sun, but the sgi says no
such local uid)
so, does that mean i can only run locally???
Logged In: YES
user_id=204522
seems that this is due to byte-swapping...if i force the
solaris build little_endian, the solaris queued is happy
with the uid from the client (on the sgi), but then it barfs
on the cookie:
QueueD: Received invalid cookie. In NO_ROOT, COOKIEFILE must be the same on all machines! Received cookie: VERSION1
and the client sees:
Cookiefile authentication with server failed! Someone else
is running Queue on this cluster or the other side has the
wrong cookiefile!
now what???
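
For what it's worth, the numbers fit the byte-swapping theory: the reported uid 1476657152 is 0x58040000, which is 0x00000458 (uid 1112) with its bytes reversed. A tiny standalone check (not part of the Queue source; uid 1112 is only an inference from the reported value):

/* Illustration only: show that the uid in the error message looks like
 * a 32-bit value sent in one host's byte order and read on a machine
 * of the opposite endianness. */
#include <stdio.h>
#include <stdint.h>

static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

int main(void)
{
    uint32_t reported = 1476657152u;   /* value queued complained about */
    printf("reported uid = %u (0x%08X)\n", (unsigned)reported, (unsigned)reported);
    printf("byte-swapped = %u (0x%08X)\n", (unsigned)swap32(reported), (unsigned)swap32(reported));
    /* prints: byte-swapped = 1112 (0x00000458) */
    return 0;
}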
Logged In: YES
user_id=32209
Some people have gotten heterogeneous clusters to work (usually GNU/Linux & another system), but the code
probably hasn't been fully debugged.
Unless you're willing to debug the code and find out what goes wrong, you'll need to run the server and clients
within the same-arch system. I.e. sun<->sun and sgi<->sgi should work fine, but not sgi<->sun.
What's probably going on is that the length of the structures still changes slightly between the different archs.
GNU/Linux structure sizes were supposed to be the standard, with the other archs using those, but this
probably still hasn't been fully implemented.
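
To make the structure-size point concrete, here is a small standalone program (the struct is hypothetical, not taken from the Queue source) that prints the size and field offsets of a raw struct. Compiled on the Sun, the SGI, and a GNU/Linux box, the numbers will generally come out different because of type widths and padding, which is why dumping structs verbatim onto the wire only works between identical systems:

/* Sketch: the same C struct can have a different size and layout on
 * different architectures, so a raw struct dump is not a portable
 * wire format. */
#include <stdio.h>
#include <stddef.h>
#include <sys/types.h>

struct job_header {        /* hypothetical example, not from Queue */
    char  cookie[16];
    uid_t uid;             /* width of uid_t varies by platform    */
    long  flags;           /* 4 bytes on ILP32, 8 bytes on LP64    */
};

int main(void)
{
    printf("sizeof(struct job_header) = %lu\n",
           (unsigned long)sizeof(struct job_header));
    printf("offsetof(uid)             = %lu\n",
           (unsigned long)offsetof(struct job_header, uid));
    printf("offsetof(flags)           = %lu\n",
           (unsigned long)offsetof(struct job_header, flags));
    return 0;
}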
Logged In: YES
user_id=32209
I'm assigning this (politely) to Eric Deal (EJD) who is working on the portability code in hopes that this will
bring it to his attention.
Logged In: YES
user_id=60213
Tom/Werner,
I no longer have time to contribute to Queue and haven't
done anything with it over the past 9-10 months.
I had gotten to the point where it looked like the last
major obstacle remaining for cross-platform queueing
was passing the terminfo structure to the machine
accepting the job.
This structure is passed in the format of the submitting
machine, which causes problems when it is extracted on
the execute machine since the structures are likely to
be different sizes as well as differently formatted
(and possibly endian-switched since I believe the structure
was dumped without going through the endian-swapping
wrappers).
As Werner indicated, the solution is to standardize on
a method of passing this structure. This probably involves
a wrapper on each supported platform to format the
data on write and read into the format used on Linux.
Another, more portable, method (involving more work) might
be to encode this in a way that is totally independent
of the Linux implementation, using keys and values for
each attribute supported on the submit machine.
Eric
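
A rough sketch of the key/value approach Eric describes, assuming the structure in question is essentially the POSIX termios attributes (the function names and the handful of attributes chosen here are illustrative, not from the Queue source). Writing one named attribute per line sidesteps struct layout, flag-bit values, and endianness entirely, since the receiver maps each name back onto its own platform's definitions:

#include <stdio.h>
#include <termios.h>

/* Emit one "name=value" line per attribute. */
static void put_flag(FILE *out, const char *name, tcflag_t flags, tcflag_t bit)
{
    fprintf(out, "%s=%d\n", name, (flags & bit) ? 1 : 0);
}

static int write_term_attrs(FILE *out, int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) != 0)
        return -1;

    put_flag(out, "icanon", t.c_lflag, ICANON);
    put_flag(out, "echo",   t.c_lflag, ECHO);
    put_flag(out, "isig",   t.c_lflag, ISIG);
    put_flag(out, "ixon",   t.c_iflag, IXON);
    put_flag(out, "opost",  t.c_oflag, OPOST);
    fprintf(out, "vmin=%u\n",  (unsigned)t.c_cc[VMIN]);
    fprintf(out, "vtime=%u\n", (unsigned)t.c_cc[VTIME]);
    return 0;
}

int main(void)
{
    /* Dump the controlling terminal's attributes to stdout. */
    return write_term_attrs(stdout, 0) == 0 ? 0 : 1;
}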
Logged In: YES
user_id=204522
thanx, guys, but i implemented my own platform-independent
queue manager, although not secure or load-sensing (i'm behind
a firewall & just limit each server to 1 job at a time...rmi
in java makes it so easy ;-)
i'll donate it to the java project when i get a chance to
clean it up/extract dependencies...
Logged In: YES
user_id=32209
Cookie verification comes before the UID is checked, so I don't know if it's happy with the new uid.
The right fix is to make sure the uid is transmitted and processed in the right byte order. You could try
recompiling with big-endian and then changing the client code to byte-swap the UID, as a test to see whether
it then reads the UID correctly.
The netfwrite routines are charged with sending things in a consistent byte order. If that is not happening,
then this code may need to be tweaked.
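
For reference, a minimal sketch of the consistent-byte-order idea behind something like netfwrite (the actual routines in the Queue source may be structured quite differently): 32-bit values such as the uid get converted to network byte order with htonl() before they hit the wire and back to host order with ntohl() on arrival, so the value survives a Solaris<->IRIX hop regardless of endianness.

/* Sketch only: a write/read pair that keeps 32-bit values (e.g. a uid)
 * in network byte order on the wire. */
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl, ntohl */

static int net_write_u32(FILE *f, uint32_t v)
{
    uint32_t wire = htonl(v);                 /* host -> network order */
    return fwrite(&wire, sizeof wire, 1, f) == 1 ? 0 : -1;
}

static int net_read_u32(FILE *f, uint32_t *v)
{
    uint32_t wire;
    if (fread(&wire, sizeof wire, 1, f) != 1)
        return -1;
    *v = ntohl(wire);                         /* network -> host order */
    return 0;
}

int main(void)
{
    /* Round-trip a uid through a temporary file as a quick self-test. */
    FILE *f = tmpfile();
    uint32_t uid = 1112, got = 0;

    if (f == NULL || net_write_u32(f, uid) != 0)
        return 1;
    rewind(f);
    if (net_read_u32(f, &got) != 0)
        return 1;
    printf("sent uid %u, read back %u\n", (unsigned)uid, (unsigned)got);
    return 0;
}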