From: Kern S. <ke...@si...> - 2006-07-22 15:32:58
On Saturday 22 July 2006 17:27, Alan Brown wrote:
> On Sat, 22 Jul 2006, Kern Sibbald wrote:
> >> While things are waiting for the storage, the storage daemon becomes
> >> completely unresponsive, 'status storage' hangs forever and my running
> >> bacula-tray-monitor wedged itself.
> >>
> >> It _usually_ sorts itself out after the waiting job manages to get a
> >> connection and start spooling in parallel, but this means delays of
> >> anything from a few minutes to a few hours, depending on how long it
> >> takes the spool to fill (incrementals on multi-million file filesystems
> >> take forever to fill a spool file due to the sheer size of directory
> >> structures)
> >>
> >> Additionally if a job is waiting for a pool which isn't available,
> >> subsequent jobs for pools which are in the drives are also locked out.
> >>
> >> I hope that made sense. :)
> >
> > The blocking makes sense only if you have not set your Maximum Concurrent
> > Jobs sufficiently high in your Storage daemon's conf file.
>
> I have it set to 100 (impossibly high) and have it that way for quite a
> while - during this entire round of testing and at least since I updated
> to 1.38.11
>
> Storage {                             # definition of myself
>   Name = msslay-sd
>   SDPort = 9103                       # Director's port
>   WorkingDirectory = "/var/bacula/working"
>   Pid Directory = "/var/run"
>   Maximum Concurrent Jobs = 100
>   Heartbeat Interval = 61s
> }
>
> > I believe it is
> > documented somewhere, but it may not be so obvious that for each Director
> > Job that runs, you need at least a value of two for Maximum Concurrent
> > Jobs, because it is in fact Maximum Concurrent connections rather than
> > Jobs, and you will always have a connection from the Director and one
> > from the File daemon.
>
> I did read that. Part of the reason for setting it so high was to allow
> for tray monitors and/or bconsoles operating on administrative
> workstations to connect (not done yet).
>
> > If you then want to connect the console and the system tray, you
> > will need two more. Also, each job that the Director permits to start,
> > but cannot reserve a drive, will be constantly using one of the SD's
> > "Maximum Concurrent Jobs" slots.
> >
> > I suspect that your blocking is simply that you don't allow enough
> > concurrency in your SD. In general, you should set it *very* high and
> > control the concurrency within the Director, and in any case, the
> > reservation system will limit the concurrency depending on how you use
> > your drives.
>
> As you can see, the concurrency is set high in the SD.
>
> In the clients, Maximum Concurrent Jobs = 20
>
> In the Director, the settings are:
>
> Director {                            # define myself
>   Maximum Concurrent Jobs = 40
> }
>
> Storage {
>   .
>   Filestore = 20
>   Changer = 10
>   Changer drive 0 = 10
>   Changer drive 1 = 10
>   DVD writer = 1
>   .
> }
>
> Clients {
>   .
>   Almost all = 5
>   (6 clients defined at present)
>   .
> }
>
> JobDefs {
>   .
>   Maximum Concurrent Jobs = 1
>   .
> }
>
> No jobs are defined with higher concurrency.
>
> Does that help with this part?

Well, in that case, I have no idea why consoles and tray monitors block in
your SD. There are a few "global" type locks such as on accessing certain
resources, but they should never block very long. The complexity of this kind
of thing goes up something like exponentially with the number of threads
running, so I have no idea how to debug it, unless you can reproduce it with,
say, two jobs.
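
For anyone checking their own numbers, the connection arithmetic described in the quoted text can be sketched roughly as below. This is only an illustration of the rule of thumb (two SD connections per running job, one per job still waiting to reserve a drive, one per console or tray monitor). The function name and the constants are made up for the example and are not part of Bacula; the values 40 and 100 are the Director and SD settings quoted in this thread.

```python
def sd_slots_needed(running_jobs, waiting_jobs=0, consoles=0, tray_monitors=0):
    """Estimate how many SD "Maximum Concurrent Jobs" slots are in use.

    Assumptions taken from the thread, not from Bacula source code:
      - each running job holds two connections (Director + File daemon)
      - each job waiting to reserve a drive holds one slot
      - each console or tray monitor holds one slot
    """
    counts = (running_jobs, waiting_jobs, consoles, tray_monitors)
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    return 2 * running_jobs + waiting_jobs + consoles + tray_monitors


# Values quoted in this thread (Director = 40, SD = 100).
DIRECTOR_MAX_JOBS = 40
SD_MAX_CONCURRENT = 100

# Worst case under these assumptions: every Director job slot is running,
# plus one console and one tray monitor attached.
needed = sd_slots_needed(DIRECTOR_MAX_JOBS, consoles=1, tray_monitors=1)
print(needed, needed <= SD_MAX_CONCURRENT)
```

By this estimate the quoted configuration stays under the SD limit even with every Director job slot busy, which fits the conclusion above that the SD concurrency setting alone does not explain the blocking.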