From: <bac...@li...> - 2007-07-31 13:34:58
|
A NOTE has been added to this issue. ====================================================================== http://bugs.bacula.org/view.php?id=906 ====================================================================== Reported By: ebollengier Assigned To: ebollengier ====================================================================== Project: bacula Issue ID: 906 Category: Storage Daemon Reproducibility: always Severity: minor Priority: normal Status: feedback ====================================================================== Date Submitted: 07-27-2007 09:53 EDT Last Modified: 07-31-2007 09:34 EDT ====================================================================== Summary: After a bad restoration, device can stay in read mode (Reader=1) Description: I have 2 storage deamon with 2 different media type. When i use the wrong storage to do my job (because i'm dumb), it fail and keep my device in read mode. In Use Volume status: R30072 on device "S1_L180_LTO1" (/dev/lto1) Reader=1 writers=0 reserved=0 R30085 on device "S1_L180_LTO2" (/dev/lto2) Reader=1 writers=0 reserved=0 ==== Data spooling: 0 active jobs, 0 bytes; 51 total jobs, 50,032,356,656 max bytes/job. Attr spooling: 0 active jobs, 0 bytes; 51 total jobs, 158,664,603 max bytes. I have to restart the storage deamon to be able to use this device again. The problem seems to be in src/stored/acquire.c acquire_device_for_read() * Acquire device for reading. * The drive should have previously been reserved by calling * reserve_device_for_read(). We read the Volume label from the block and * leave the block pointers just after the label. reserve_device_for_read find a device, and use set_read() to "lock" it for reading. acquire_device_for_read() see that the drive type is uncompatible with the media type, and try to use an other drive and leave the first device in read mode if no other drive can be found. I think, we have to add something like detach_dcr_from_dev(); near src/stored/acquire.c:158 to release the drive after the error. /* * If the MediaType requested for this volume is not the * same as the current drive, we attempt to find the same * device that was used to write the orginal volume. If * found, we switch to using that device. */ ... } else { /* error */ Jmsg1(jcr, M_FATAL, 0, _("No suitable device found to read Volume \"%s\"\n"), vol->VolumeName); jcr->dcr = dcr_save; goto get_out; } ====================================================================== ---------------------------------------------------------------------- kern - 07-28-07 06:18 ---------------------------------------------------------------------- Unfortunately, I think it is probably quite a bit more complicated than just adding a detach_dcr_from_dev(); The problem is that this particular code was experimental, and when I rewrote the reservations code, the use of the dcr changed, so this code must be rewritten. It can probably be simplified since the new_dcr() routine now does most of the work that was done here allowing Bacula to switch drives more easily. ---------------------------------------------------------------------- kern - 07-29-07 08:52 ---------------------------------------------------------------------- Please test with the current SVN rev 5263, which hopefully will fix this problem. If it doesn't fix it, we will need to make some sort of regression test for it so that I can duplicate the problem here. ---------------------------------------------------------------------- ebollengier - 07-30-07 04:55 ---------------------------------------------------------------------- i have got a segfault Thread 3 (Thread 1082132832 (LWP 13224)): http://bugs.bacula.org/view.php?id=0 0x00002b35891c500f in waitpid () from /lib/libpthread.so.0 http://bugs.bacula.org/view.php?id=1 0x0000000000448de4 in signal_handler (sig=11) at signal.c:167 http://bugs.bacula.org/view.php?id=2 <signal handler called> http://bugs.bacula.org/view.php?id=3 0x0000000000420e9e in _unblock_device (file=0x453a9d "lock.c", line=151, dev=0x5b34c8) at lock.c:248 http://bugs.bacula.org/view.php?id=4 0x00000000004211b4 in DEVICE::dunblock (this=0x5b34c8, locked=<value optimized out>) at lock.c:151 http://bugs.bacula.org/view.php?id=5 0x0000000000409e5f in acquire_device_for_read (dcr=0x5b5f88) at acquire.c:283 http://bugs.bacula.org/view.php?id=6 0x00000000004257fe in do_read_data (jcr=0x5a9878) at read.c:78 http://bugs.bacula.org/view.php?id=7 0x000000000041cc08 in read_data_cmd (jcr=0x5a9878) at fd_cmds.c:280 http://bugs.bacula.org/view.php?id=8 0x000000000041cb0d in do_fd_commands (jcr=0x5a9878) at fd_cmds.c:165 http://bugs.bacula.org/view.php?id=9 0x000000000041cfe5 in run_job (jcr=0x5a9878) at fd_cmds.c:128 http://bugs.bacula.org/view.php?id=10 0x000000000041d405 in run_cmd (jcr=0x5a9878) at job.c:194 http://bugs.bacula.org/view.php?id=11 0x000000000041877a in handle_connection_request (arg=<value optimized out>) at dircmd.c:224 http://bugs.bacula.org/view.php?id=12 0x000000000044c8fd in workq_server (arg=<value optimized out>) at workq.c:357 http://bugs.bacula.org/view.php?id=13 0x00002b35891bef1a in start_thread () from /lib/libpthread.so.0 http://bugs.bacula.org/view.php?id=14 0x00002b3589ce5612 in clone () from /lib/libc.so.6 http://bugs.bacula.org/view.php?id=15 0x0000000000000000 in ?? () debug output with -d51 exlisvg3-sd: reserve.c:586 jid=12151 <dird: use storage= media_type=S1_LTO1 pool_name=Scratch pool_type=Backup append=0 copy=0 stripe=0 exlisvg3-sd: reserve.c:615 jid=12151 <dird device: use device=S1_L180 exlisvg3-sd: reserve.c:632 jid=12151 Storage= media_type=S1_LTO1 pool=Scratch pool_type=Backup append=0 exlisvg3-sd: reserve.c:634 jid=12151 Device=S1_L180 exlisvg3-sd: reserve.c:715 jid=12151 PrefMnt=1 exact=1 suitable=0 chgronly=0 any=0 exlisvg3-sd: reserve.c:828 jid=12151 PrefMnt=1 exact=1 suitable=0 chgronly=0 exlisvg3-sd: reserve.c:986 jid=12151 search res for S1_L180 exlisvg3-sd: reserve.c:989 jid=12151 Try match changer res=S1_L180 exlisvg3-sd: reserve.c:995 jid=12151 Try changer device S1_L180_LTO2 exlisvg3-sd: reserve.c:1058 jid=12151 chk MediaType device=S1_LTO1 request=S1_LTO1 exlisvg3-sd: reserve.c:1081 jid=12151 try reserve S1_L180_LTO2 exlisvg3-sd: reserve.c:1201 jid=12151 Inc reserve=1 dev="S1_L180_LTO2" (/dev/lto2) 5b34c8 exlisvg3-sd: reserve.c:1136 jid=12151 Read reserved=1 dev_name=S1_L180 mediatype=S1_LTO1 pool=Scratch ok=1 exlisvg3-sd: reserve.c:1148 jid=12151 >dird changer: 3000 OK use device device=S1_L180_LTO2 exlisvg3-sd: reserve.c:1008 jid=12151 Device S1_L180_LTO2 reserved=1 for read. exlisvg3-sd: reserve.c:959 jid=12151 available device found=S1_L180 exlisvg3-sd: cram-md5.c:71 send: auth cram-md5 <816703320.1185785542@exlisvg3-sd> ssl=0 exlisvg3-sd: fd_cmds.c:359 === Bootstrap file === exlisvg3-sd: fd_cmds.c:361 Volume="R40073" exlisvg3-sd: fd_cmds.c:361 MediaType="S2_LTO1" exlisvg3-sd: fd_cmds.c:361 VolSessionId=428 exlisvg3-sd: fd_cmds.c:361 VolSessionTime=1184338077 exlisvg3-sd: fd_cmds.c:361 VolFile=149 exlisvg3-sd: fd_cmds.c:361 VolBlock=1323-15499 exlisvg3-sd: fd_cmds.c:361 FileIndex=5783 exlisvg3-sd: fd_cmds.c:361 Count=1 exlisvg3-sd: fd_cmds.c:365 === end bootstrap file === Next : 0x0 Root bsr : 0x5b6cd8 VolumeName : R40073 MediaType : S2_LTO1 Device : Slot : 0 SessId : 428 SessTime : 1184338077 VolFile : 149-149 VolBlock : 1323-15499 FileIndex : 5783 count : 1 found : 0 done : non positioning : 1 fast_reject : 1 exlisvg3-sd: read.c:59 Start read data. exlisvg3-sd: acquire.c:65 jcr->dcr=5b5f88 exlisvg3-sd: acquire.c:101 MediaType dcr=S2_LTO1 dev=S1_LTO1 exlisvg3-sd: reserve.c:422 jid=12151 Dec reserve=0 dev="S1_L180_LTO2" (/dev/lto2) exlisvg3-sd: reserve.c:986 jid=12151 search res for exlisvg3-sd: reserve.c:989 jid=12151 Try match changer res=S1_L180 exlisvg3-sd: reserve.c:1018 jid=12151 Try match res=NULL exlisvg3-sd: reserve.c:1018 jid=12151 Try match res=S1_L180_LTO1 exlisvg3-sd: reserve.c:1018 jid=12151 Try match res=S1_L180_LTO2 exlisvg3-sd: reserve.c:1018 jid=12151 Try match res=FileStorage ---------------------------------------------------------------------- kern - 07-31-07 09:34 ---------------------------------------------------------------------- Can you try this again with the current SVN rev 5270 or later. It *should* fix this problem, but until I have a real test case here, I cannot be 100% sure. Issue History Date Modified Username Field Change ====================================================================== 07-27-07 09:53 ebollengier New Issue 07-27-07 09:53 ebollengier Status new => assigned 07-27-07 09:53 ebollengier Assigned To => ebollengier 07-27-07 09:53 ebollengier File Added: log.sd 07-28-07 06:18 kern Note Added: 0002628 07-28-07 06:18 kern Status assigned => feedback 07-29-07 08:52 kern Note Added: 0002630 07-30-07 04:55 ebollengier Note Added: 0002631 07-31-07 09:34 kern Note Added: 0002635 ====================================================================== |