From: Mantis B. T. <no...@bu...> - 2012-12-14 16:50:27
The following issue has been SUBMITTED.
======================================================================
http://bugs.bacula.org/view.php?id=1971
======================================================================
Reported By:                giorgi3
Assigned To:
======================================================================
Project:                    bacula
Issue ID:                   1971
Category:                   Director
Reproducibility:            random
Severity:                   major
Priority:                   normal
Status:                     new
======================================================================
Date Submitted:             2012-12-14 16:50 UTC
Last Modified:              2012-12-14 16:50 UTC
======================================================================
Summary:                    Bacula-dir hang between autoprune process and reload process
Description:
Customer has multiple hang conditions in bacula-dir on a bacula server when a
bconsole reload command is issued while backups are also running. They moved
to the latest version and still see the problem. It appears to be a deadlock in
rwl_writelock_p between the reload process and an autoprune process. We
advised the customer to avoid using the reload command while backups are
running.

Here is some information we captured with gdb showing the issue:

The reload process:

(gdb) bt
#0  0x0000003070e0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0
#1  0x0000003f74234d56 in rwl_writelock_p (rwl=0x7f86d85b3460, file=<value optimized out>, line=<value optimized out>) at rwlock.c:245
#2  0x0000003f75209756 in B_DB::_db_lock (this=<value optimized out>, file=0x3f7521e165 "sql_create.c", line=511) at cats.c:119
#3  0x0000003f7520c809 in db_create_client_record (jcr=0x0, mdb=0x7f86d85b3458, cr=0x7f846c9bf900) at sql_create.c:511
#4  0x000000000040caac in check_catalog (mode=UPDATE_CATALOG) at dird.c:1069
#5  0x000000000040e7a7 in reload_config (sig=<value optimized out>) at dird.c:548
#6  0x000000000043387e in reload_cmd (ua=<value optimized out>, cmd=<value optimized out>) at ua_cmds.c:1280
#7  0x0000000000436326 in do_a_command (ua=0x7f84f01ce468) at ua_cmds.c:240
#8  0x000000000044d6b2 in handle_UA_client_request (arg=0x7f86e800b038) at ua_server.c:146
#9  0x0000003f7424174d in workq_server (arg=0x67da40) at workq.c:344
#10 0x0000003f742477e2 in lmgr_thread_launcher (x=0x7f86e8003c58) at lockmgr.c:939
#11 0x0000003070e07851 in start_thread () from ./lib64/libpthread.so.0
#12 0x0000003070ae76dd in clone () from ./lib64/libc.so.6
(gdb)
(gdb) frame 1
#1  0x0000003f74234d56 in rwl_writelock_p (rwl=0x7f86d85b3460, file=<value optimized out>, line=<value optimized out>) at rwlock.c:245
245           if ((stat = pthread_cond_wait(&rwl->write, &rwl->mutex)) != 0) {

Here is the owner thread:

(gdb) p/x rwl->writer_id
$6 = 0x7f84f8dfa700
(gdb)

  65 Thread 0x7f84f8dfa700 (LWP 8649)  0x0000003070e0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0

Its stack:

(gdb) thread 65
[Switching to thread 65 (Thread 0x7f84f8dfa700 (LWP 8649))]#0  0x0000003070e0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0
(gdb) bt
#0  0x0000003070e0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0
#1  0x0000003f74234d56 in rwl_writelock_p (rwl=0x3f7480aaa0, file=<value optimized out>, line=<value optimized out>) at rwlock.c:245
#2  0x0000003f74607ce0 in b_LockRes (file=0x3f746088f0 "res.c", line=94) at res.c:64
#3  0x0000003f74607d65 in GetResWithName (rcode=<value optimized out>, name=0x7f84d8004038 "Backup_oo95sr04") at res.c:94
#4  0x000000000043f435 in job_select_handler (ctx=0x7f84d8002308, num_fields=<value optimized out>, row=0x7f84d8002798) at ua_prune.c:403
#5  0x0000003f75604162 in B_DB_POSTGRESQL::db_sql_query (this=0x7f86d85b3458, query=<value optimized out>, result_handler=0x43f3c0 <job_select_handler(void*, int, char**)>, ctx=0x7f84d8002308) at postgresql.c:565
#6  0x00000000004402e4 in prune_jobs (ua=0x7f84d8001d58, client=0x7f86d8298cb8, pool=<value optimized out>, JobType=<value optimized out>) at ua_prune.c:527
#7  0x0000000000410dc9 in do_autoprune (jcr=0x7f86a8009fb8) at autoprune.c:64
#8  0x00000000004253fd in job_thread (arg=0x7f86a8009fb8) at job.c:343
#9  0x0000000000426390 in jobq_server (arg=0x67d740) at jobq.c:450
#10 0x0000003f742477e2 in lmgr_thread_launcher (x=0x7f852c0055c8) at lockmgr.c:939
#11 0x0000003070e07851 in start_thread () from ./lib64/libpthread.so.0
#12 0x0000003070ae76dd in clone () from ./lib64/libc.so.6
(gdb)
(gdb) frame 1
#1  0x0000003f74234d56 in rwl_writelock_p (rwl=0x3f7480aaa0, file=<value optimized out>, line=<value optimized out>) at rwlock.c:245
245           if ((stat = pthread_cond_wait(&rwl->write, &rwl->mutex)) != 0) {

Owner thread:

(gdb) p/x rwl->writer_id
$7 = 0x7f846c9c0700

* 161 Thread 0x7f846c9c0700 (LWP 27403)  0x0000003070e0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0

So Thread 65 and 161 are deadlocked. Going by the writer_id values above,
thread 161 (reload) owns the resource lock taken in b_LockRes and waits for the
catalog lock taken in B_DB::_db_lock, while thread 65 (autoprune) owns the
catalog lock and waits for the resource lock. The customer has to do a
service bacula-dir restart to free it up.

Steps to Reproduce:
Unknown. The problem is intermittent. Perhaps a manual prune and simultaneous
reload?
======================================================================

Issue History
Date Modified    Username       Field                    Change
======================================================================
2012-12-14 16:50 giorgi3        New Issue
======================================================================
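
Illustration of the suspected lock-order inversion (not Bacula code and not a proposed patch):
The backtraces suggest two code paths that take the same pair of locks in opposite order. The sketch below models that with two plain mutexes and shows one common way to rule the cycle out, namely acquiring both locks through std::lock, which uses a deadlock-avoidance algorithm. All names here (DirectorLocks, reload_like, prune_like, run_both) are hypothetical stand-ins. The real locks are Bacula rwlocks, and the real prune path takes the second lock from inside a query callback, so this is only a minimal model of the pattern.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two locks seen in the backtraces.
struct DirectorLocks {
    std::mutex catalog;    // models the lock behind B_DB::_db_lock
    std::mutex resources;  // models the lock behind b_LockRes
};

// Reload-like path: names the resource lock first, then the catalog lock.
int reload_like(DirectorLocks& l, int& counter) {
    std::lock(l.resources, l.catalog);  // takes both without risking a cycle
    std::lock_guard<std::mutex> g1(l.resources, std::adopt_lock);
    std::lock_guard<std::mutex> g2(l.catalog, std::adopt_lock);
    return ++counter;
}

// Prune-like path: names the locks in the opposite order. With two separate
// lock() calls this ordering could deadlock against reload_like; std::lock
// makes the argument order irrelevant.
int prune_like(DirectorLocks& l, int& counter) {
    std::lock(l.catalog, l.resources);
    std::lock_guard<std::mutex> g1(l.catalog, std::adopt_lock);
    std::lock_guard<std::mutex> g2(l.resources, std::adopt_lock);
    return ++counter;
}

// Runs both paths concurrently and returns the number of completed
// critical sections. The counter is only touched while both locks are held.
int run_both(int iterations) {
    assert(iterations >= 0);
    DirectorLocks locks;
    int counter = 0;
    std::thread a([&] {
        for (int i = 0; i < iterations; ++i) reload_like(locks, counter);
    });
    std::thread b([&] {
        for (int i = 0; i < iterations; ++i) prune_like(locks, counter);
    });
    a.join();
    b.join();
    return counter;
}
```

Calling run_both(n) from a small main() should return 2*n without hanging. Replacing each std::lock call with two consecutive lock() calls in the order shown in the comments would reintroduce the kind of intermittent hang described above.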