Tracker: Bugs

1 no such local uid? - ID: 418905
Last Update: Comment added (wkrebs)

I've got queued -D running on a Solaris box and an SGI box, and it seems to work, sort of. If I run

queue -i -w -- hostname

from the SGI, I get an answer from the SGI queued daemon:

SENDMAIL: To 'tomw' from 'queued': Subject: batch
queue_b on opus.rva.trw.com: now/CFDIR/cfm779179935:
Job is starting now.
now/CFDIR/cfm779179935: Job is starting now.

But the queued daemon on the Sun says:

SENDMAIL: To 'tomw' from 'tomw': Subject: queued error
on lisa.rva.trw.com: now/CFDIR/cfm779179935:
1476657152: no such local uid
now/CFDIR/cfm779179935: 1476657152: no such local uid

And vice versa: running queue -i -w -- hostname on the Sun gets an answer from the Sun, but the SGI says "no such local uid".

So does that mean I can only run jobs locally?


tom wible ( airdrummer ) - 2001-04-25 11:25:55 PDT

Priority: 1
Status: Open
Resolution: Remind
Assigned to: Eric Deal
Category: None
Group: None
Visibility: Public


Comments (6)

Date: 2001-05-12 12:13:51 PDT
Sender: wkrebs (Project Admin), user_id=32209


Cookie verification comes before the UID is checked, so I don't know whether it's happy with the new UID.

The right fix is to make sure the UID is transmitted and processed in the right byte order. As a test, you could try recompiling with big-endian and then changing the client code to byte-swap the UID, to see whether the server now reads the UID correctly.
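The numbers in the original report fit the byte-order theory: 1476657152 is 0x58040000, and reversing its four bytes gives 0x00000458, which is 1112, an ordinary-looking UID. A minimal C sketch of that swap test follows. `swap32` is a hypothetical helper, not part of Queue, and whether 1112 really is tomw's UID on the submitting host is an assumption worth checking with `id -u tomw` there.

```c
#include <stdint.h>

/* Hypothetical helper (not Queue code): reverse the byte order of a
 * 32-bit value, i.e. convert big-endian <-> little-endian. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Example: swap32(1476657152u), i.e. swap32(0x58040000u),
 * yields 0x00000458u, which is 1112. */
```

If the server-side error value, once swapped, matches the submitting user's UID, the UID is almost certainly crossing the wire in host byte order.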

The netfwrite routines are responsible for sending data in a consistent byte order. If that is not happening, this code may need to be tweaked.
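As a rough illustration of what "consistent byte order" means for a single 32-bit field such as the UID: the sender always emits the most significant byte first, and the receiver reassembles it the same way, so neither host's endianness matters. `put_u32_be` and `get_u32_be` are hypothetical names; this thread does not show netfwrite's actual interface. The POSIX `htonl`/`ntohl` functions achieve the same result.

```c
#include <stdint.h>

/* Hypothetical helpers, not Queue's netfwrite API.
 * Store a 32-bit value in network (big-endian) order on any host. */
static void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Reassemble the value from network order; shifts make this
 * independent of the receiving host's byte order. */
static uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because both functions work on bytes rather than on the in-memory integer, a big-endian Solaris or IRIX host and a little-endian Linux host produce and accept the same four bytes.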


Date: 2001-05-12 05:07:10 PDT
Sender: airdrummer, user_id=204522

Thanks, guys, but I implemented my own platform-independent queue manager. It's not secure or load-sensing (I'm behind a firewall and just limit each server to one job at a time), but RMI in Java makes it so easy ;-)

I'll donate it to the Java project when I get a chance to clean it up and extract its dependencies.



Date: 2001-05-11 15:40:31 PDT
Sender: ericdeal, user_id=60213

Tom/Werner,

I no longer have time to contribute to Queue and haven't
done anything with it over the past 9-10 months.

I had gotten to the point where it looked like the last
major obstacle to cross-platform queueing was passing
the terminfo structure to the machine accepting the job.

This structure is passed in the format of the submitting
machine, which causes problems when it is extracted on
the execute machine since the structures are likely to
be different sizes as well as differently formatted
(and possibly endian-switched since I believe the structure
was dumped without going through the endian-swapping
wrappers).

As Werner indicated, the solution is to standardize on
a method of passing this structure. This probably involves
a wrapper on each supported platform that converts the
data to the format used on Linux when writing and back
from it when reading.
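A sketch of the wrapper idea, under stated assumptions: instead of dumping the structure with its native padding and byte order, each field is copied into a fixed-size buffer in network byte order. This is a variant of the suggestion above (a fixed wire layout rather than Linux's in-memory layout), and `struct term_attrs` with its `rows`, `cols` and `flags` fields is hypothetical, since the thread does not describe the real terminfo structure's members.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of terminal attributes; not Queue's real structure. */
struct term_attrs {
    uint32_t rows;
    uint32_t cols;
    uint32_t flags;
};

/* Fixed wire size: 3 fields x 4 bytes, no padding, on every platform. */
#define TERM_ATTRS_WIRE_SIZE 12

/* Serialize field by field instead of fwrite(&attrs, sizeof attrs, ...),
 * so neither struct padding nor host byte order reaches the wire. */
static void term_attrs_encode(const struct term_attrs *a,
                              unsigned char out[TERM_ATTRS_WIRE_SIZE])
{
    uint32_t fields[3] = { htonl(a->rows), htonl(a->cols), htonl(a->flags) };
    memcpy(out, fields, sizeof fields);
}

/* Inverse of term_attrs_encode: read the fixed layout back into host order. */
static void term_attrs_decode(const unsigned char in[TERM_ATTRS_WIRE_SIZE],
                              struct term_attrs *a)
{
    uint32_t fields[3];
    memcpy(fields, in, sizeof fields);
    a->rows  = ntohl(fields[0]);
    a->cols  = ntohl(fields[1]);
    a->flags = ntohl(fields[2]);
}
```

Each platform's wrapper would then translate between its native terminal structure and `struct term_attrs`, so only the 12-byte encoding ever crosses the network.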

Another, more portable method (involving more work) might
be to encode this in a way that is totally independent
of the Linux implementation, using keys and values for
each attribute supported on the submit machine.
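The key/value alternative might look roughly like this: the submit host writes one `key=value` line per attribute it supports, and the execute host applies the keys it understands and ignores the rest. The attribute names `rows` and `cols` and both helper functions are hypothetical, chosen only to show the shape of such an encoding.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attributes; a real port would list whatever it supports. */
struct term_attrs {
    unsigned long rows;
    unsigned long cols;
};

/* Encode as "key=value\n" lines. Returns the number of characters written
 * (excluding the terminating NUL), or -1 if buf is too small. */
static int term_attrs_to_text(const struct term_attrs *a, char *buf, size_t len)
{
    int n = snprintf(buf, len, "rows=%lu\ncols=%lu\n", a->rows, a->cols);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Parse "key=value" lines. Unknown keys are skipped so either side can add
 * attributes later; keys that are absent leave the caller's defaults alone. */
static void term_attrs_from_text(const char *text, struct term_attrs *a)
{
    const char *line = text;
    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        const char *eq = strchr(line, '=');
        if (nl == NULL)
            nl = line + strlen(line);          /* last line without '\n' */
        if (eq != NULL && eq < nl) {
            size_t klen = (size_t)(eq - line);
            unsigned long v = strtoul(eq + 1, NULL, 10);
            if (klen == 4 && strncmp(line, "rows", 4) == 0)
                a->rows = v;
            else if (klen == 4 && strncmp(line, "cols", 4) == 0)
                a->cols = v;
        }
        line = (*nl == '\n') ? nl + 1 : nl;
    }
}
```

Text is slower and larger than a fixed binary layout, but it sidesteps structure sizes and byte order entirely, which is the portability gain described above.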

Eric


Date: 2001-05-11 15:15:04 PDT
Sender: wkrebs (Project Admin), user_id=32209


I'm assigning this (politely) to Eric Deal (EJD), who is working on the portability code, in the hope that this will bring it to his attention.


Date: 2001-05-11 15:10:02 PDT
Sender: wkrebs (Project Admin), user_id=32209


Some people have gotten heterogeneous clusters to work (usually GNU/Linux and another system), but the code probably hasn't been fully debugged.

Unless you're willing to debug the code and find out what goes wrong, you'll need to run the server and clients on the same architecture, i.e. sun<->sun and sgi<->sgi should work fine, but not sgi<->sun.

What's probably going on is that the structure lengths still differ slightly between architectures. GNU/Linux structure sizes were supposed to be the standard, with the other architectures conforming to them, but this probably still hasn't been fully implemented.


Date: 2001-04-26 10:30:28 PDT
Sender: airdrummer, user_id=204522

It seems this is due to byte-swapping. If I force the Solaris build to little-endian, the Solaris queued is happy with the UID from the client (on the SGI), but then it barfs on the cookie:

QueueD: Received invalid cookie. In NO_ROOT, COOKIEFILE must be
the same on all machines! Received cookie: VERSION1

And the client sees:

Cookiefile authentication with server failed! Someone else
is running Queue on this cluster or the other side has the
wrong cookiefile!

Now what?



Changes (3)

Field          Old Value  Date                     By
resolution_id  None       2001-05-11 15:15:04 PDT  wkrebs
assigned_to    nobody     2001-05-11 15:15:04 PDT  wkrebs
priority       5          2001-05-11 15:10:02 PDT  wkrebs