Jeremy,

We have 6 nodes, each with dual CPUs, so 12 processors. The other cluster has 10 nodes, also dual-CPU. Both clusters have very similar configurations. I am
currently debugging the 6-node cluster.

The c3.conf file is:

cluster oscar_cluster {
        tecws004
        dead remove_line_for_0-indexing
        ce101.cetsia
        ce102.cetsia
        ce103.cetsia
        ce104.cetsia
        ce105.cetsia
}
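As a sanity check against Jeremy's question, a small sketch (assuming the c3.conf layout quoted above: head node on the first line inside the braces, `dead` placeholder lines preserved for indexing) that counts the compute-node entries, which could then be compared with what PBS reports:

```shell
# Hypothetical check: count compute-node entries in a c3.conf, skipping
# the opening brace line, the head node, any "dead" placeholder lines,
# and the closing brace. The sample file mirrors the config quoted above.
cat > /tmp/c3.conf.sample <<'EOF'
cluster oscar_cluster {
        tecws004
        dead remove_line_for_0-indexing
        ce101.cetsia
        ce102.cetsia
        ce103.cetsia
        ce104.cetsia
        ce105.cetsia
}
EOF

# NR > 2 skips the "cluster ... {" line and the head node entry.
awk 'NR > 2 && $1 != "dead" && $1 != "}" { n++ } END { print n }' \
    /tmp/c3.conf.sample
# prints 5 for the sample above
```

On a live cluster the result could be compared against the node count PBS sees (e.g. the number of entries `pbsnodes -a` reports).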

Thanks again,

Carlos
 
 

Jeremy Enos wrote:

Odd... so you have 10 total nodes or 20?  Either way, how many show up in your /etc/c3.conf file?  I want to check this because the C3 configuration depends on the same thing that the PBS configuration depends on.
thx-

Jeremy

At 02:13 AM 3/29/2004, Carlos Vasco Ortiz wrote:

Jeremy, here is the PBS configuration of the cluster with problems. Another problem on the new production cluster (it is a clone of this 'trial' cluster) is that, when we added 4 more nodes (it had 6, like this one), the routing queue stopped working entirely, for any number of nodes requested.

I am keeping the oscar list informed...

Thanks,
Carlos

#
# Create queues and set their attributes.
#
#
# Create and define queue serie_medium
#
create queue serie_medium
set queue serie_medium queue_type = Execution
set queue serie_medium resources_max.ncpus = 20
set queue serie_medium resources_max.nodect = 1
set queue serie_medium resources_max.pcput = 72:00:00
set queue serie_medium resources_default.ncpus = 1
set queue serie_medium resources_default.nodect = 1
set queue serie_medium resources_default.pcput = 72:00:00
set queue serie_medium enabled = True
set queue serie_medium started = True
#
# Create and define queue feed
#
create queue feed
set queue feed queue_type = Route
set queue feed route_destinations = serie_medium
set queue feed route_destinations += serie_vlong
set queue feed route_destinations += parallel_medium
set queue feed route_destinations += parallel_vlong
set queue feed enabled = True
set queue feed started = True
#
# Create and define queue workq
#
create queue workq
set queue workq queue_type = Execution
set queue workq resources_max.cput = 10000:00:00
set queue workq resources_max.ncpus = 12
set queue workq resources_max.nodect = 6
set queue workq resources_max.walltime = 10000:00:00
set queue workq resources_min.cput = 00:00:01
set queue workq resources_min.ncpus = 1
set queue workq resources_min.nodect = 1
set queue workq resources_min.walltime = 00:00:01
set queue workq resources_default.cput = 10000:00:00
set queue workq resources_default.ncpus = 1
set queue workq resources_default.nodect = 1
set queue workq resources_default.walltime = 10000:00:00
set queue workq resources_available.nodect = 6
set queue workq enabled = True
set queue workq started = True
#
# Create and define queue parallel_medium
#
create queue parallel_medium
set queue parallel_medium queue_type = Execution
set queue parallel_medium resources_max.ncpus = 20
set queue parallel_medium resources_max.nodect = 20
set queue parallel_medium resources_max.pcput = 72:00:00
set queue parallel_medium resources_default.ncpus = 1
set queue parallel_medium resources_default.nodect = 1
set queue parallel_medium resources_default.pcput = 72:00:00
set queue parallel_medium enabled = True
set queue parallel_medium started = True
#
# Create and define queue serie_vlong
#
create queue serie_vlong
set queue serie_vlong queue_type = Execution
set queue serie_vlong resources_max.ncpus = 20
set queue serie_vlong resources_max.nodect = 1
set queue serie_vlong resources_max.pcput = 2000:00:00
set queue serie_vlong resources_min.pcput = 72:00:01
set queue serie_vlong resources_default.ncpus = 1
set queue serie_vlong resources_default.nodect = 1
set queue serie_vlong resources_default.pcput = 2000:00:00
set queue serie_vlong enabled = True
#
# Create and define queue parallel_vlong
#
create queue parallel_vlong
set queue parallel_vlong queue_type = Execution
set queue parallel_vlong resources_max.ncpus = 20
set queue parallel_vlong resources_max.nodect = 20
set queue parallel_vlong resources_max.pcput = 2000:00:00
set queue parallel_vlong resources_default.ncpus = 1
set queue parallel_vlong resources_default.nodect = 1
set queue parallel_vlong resources_default.pcput = 2000:00:00
set queue parallel_vlong enabled = True
set queue parallel_vlong started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False
set server default_queue = feed
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.ncpus = 12
set server resources_available.nodect = 6
set server resources_available.nodes = 6
set server resources_max.ncpus = 12
set server resources_max.nodes = 6
set server scheduler_iteration = 60
set server node_pack = False
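Two observations on the dump above, offered tentatively. First, feed's route_destinations list serie_medium, serie_vlong, parallel_medium and parallel_vlong, but not workq, even though the reported symptom involves routing to workq. Second, since the full-size request is only rejected when it goes through feed, one quick experiment (not a recommended permanent change, and assuming the same qmgr syntax shown in the dump) is to raise the server-wide maxima by one and resubmit, to see whether those are the limits the routing check is tripping over:

```shell
# Hypothetical experiment: bump the server-wide limits by one and retest.
qmgr -c "set server resources_max.nodes = 7"
qmgr -c "set server resources_max.ncpus = 14"
# ...resubmit the full-size job through the routing queue, then restore:
qmgr -c "set server resources_max.nodes = 6"
qmgr -c "set server resources_max.ncpus = 12"
```

If the nodes=6 job routes successfully with the raised limits, that would point at the server-level resources_max settings rather than the queue definitions.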
 
 

Jeremy Enos wrote:

It sounds like your routing queue might be more restrictive than the workq.  Could you send the output of the following command?
 qmgr -c "print server"

thx-

Jeremy

At 09:43 AM 3/26/2004, Carlos Vasco Ortiz wrote:

We have installed our second OSCAR cluster, this time with OSCAR 3.0 (our first OSCAR cluster ran 2.3), and we have found a problem
with PBS and the usual routing queue, despite having the same queue configuration on both clusters.
The problem is the following:

We have a feed queue that routes to the defined workq queue. If I submit a job asking for all the nodes in the cluster through the feed queue, the following message appears:
qsub: Job exceeds queue resource limits

If we ask for all the nodes minus one, it works. If I submit to the workq queue instead of feed, it also works.
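For concreteness, the three cases might look roughly like this (the job script name `job.sh` is a placeholder, and nodes=6 assumes the 6-node count mentioned elsewhere in the thread):

```shell
qsub -q feed  -l nodes=6 job.sh   # rejected: "qsub: Job exceeds queue resource limits"
qsub -q feed  -l nodes=5 job.sh   # accepted (all nodes minus one)
qsub -q workq -l nodes=6 job.sh   # accepted when submitted directly to workq
```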

Any idea what is happening?

Thank you,

Carlos

-- 
Carlos Vasco Ortiz (ITP Tecnología y Métodos)
Tel: 34 91 207 91 21   [ITP-only internal ext.: 91 21]
Fax: 34 91 207 94 11
mailto:carlos.vasco@itp.es
 