From: James L. <jam...@ho...> - 2005-11-17 00:08:18
Attachments:
evms-engine.log.tgz
|
Thanks for the reply Steve. I've retried setting up a degraded RAID5 array as described below, and have hit a different error in EVMS ("*** glibc detected *** free(): invalid next size (fast): 0x0819d720 ***"). The procedure was:

1) Delete all md arrays and all segments from /dev/sdb (using EVMS). Worked fine.

2) Close and reopen EVMS to check all is still well. It is. Close it down again.

3) Using cfdisk, create two logical (not primary) partitions on /dev/sdb (which had just free space), 100MB each (these are /dev/sdb5 and /dev/sdb6). Mark them as type 0xfd (Linux RAID autodetect). Save changes and again reload and close EVMS to check it's still happy; which it is.

4) Use mdadm to create a degraded RAID5 array using the following command: "mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb5 /dev/sdb6 missing"

5) mdadm reports that this array gets created successfully (in a degraded state):

james@ubuntu-fileserver:~$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Wed Nov 16 23:05:41 2005
     Raid Level : raid5
     Array Size : 192512 (188.00 MiB 197.13 MB)
    Device Size : 96256 (94.00 MiB 98.57 MB)
   Raid Devices : 3
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Nov 16 23:05:41 2005
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 6f0fa83a:ae0b4a99:a199ce36:9e21815c
         Events : 0.4

    Number   Major   Minor   RaidDevice   State
       0       8      21        0         active sync   /dev/.static/dev/sdb5
       1       8      22        1         active sync   /dev/.static/dev/sdb6
       2       0       0        -         removed

6) Start the EVMS GUI. It gives me a warning about /md/md0 being in a degraded state (which is fine). It doesn't give me the option to create an EVMS volume from the /md/md0 array (which is not good). It sees the array, and reports it as both degraded and corrupted (it shouldn't be, right?).

7) Close down EVMS and restart it. Again, I get a warning about the array being degraded. Now I do have the option to create a volume from the array. On trying to do this however, I get the following error message:

*** glibc detected *** free(): invalid next size (fast): 0x0819d720 ***

and EVMS hangs completely.

I've attached a gzipped copy of the evms-engine.log file. Hope this helps...

What is the relationship between the EVMS MD plugin and mdadm, by the way? Do they use the same code, or are they entirely different programs that interface in their own way with the Linux RAID driver? Should arrays created by mdadm and arrays created by EVMS be identical once they have been created?

James

>From: Steve Dobbelstein <st...@us...> >To: "James Lee" <jam...@ho...> >CC: evm...@li... >Subject: Re: [Evms-devel] Possible to create a degraded RAID5 array with >EVMS? >Date: Wed, 16 Nov 2005 11:53:51 -0600 > >"James Lee" <jam...@ho...> wrote on 11/14/2005 06:04:26 PM: > > > Hi there, > >Hi, James. > > > I'm setting up a RAID5 fileserver on a LAN, and using EVMS to manage the > > RAID array. > > > > Due to having to shift data around from a current array to a new one, >I'll > > be needing to (temporarily) create a RAID5 array in a degraded state. In > > other words create an 3-drive RAID5 array, but with one drive missing >(i.e. > > using only 2 drives). > >EVMS does not currently support building a degraded array. One of the >design principles behind EVMS is that it should only build "safe" >configurations. A degraded array is subject to a single disk failure, >which defeats the purpose of building the RAID5. 
> > > Now this is possible with mdadm by passing in one of the drives in the >array > > as "missing" rather than the actual drive name. In EVMS though, there > > doesn't seem to be an option to do this and it's not possible to create >a > > > RAID5 array with fewer than three drives. > >Right. And our current recommendation for building a degraded array is to >use mdadm, which you did. > > > Now I tried to just use mdadm to create the RAID5 array and then use >EVMS > > > afterwards. This caused EVMS to start failing with the following error: > > > > *** glibc detected *** corrupted double-linked list: 0x08065a30 > > > > This rendered the system unbootable until I wiped the offending array. > >Yikes. That is not supposed to happen. Can you run with the debug level >set to debug (e.g., "evmsgui -d debug") and then send me the gzipped >/ver/log/evms-engine.log. Hopefully it will have some clues. > > > So is there any way to create degraded arrays from within EVMS? > >At the moment, no. However, we have been getting several requests for this >functionality. Due to customer demand we will revisit our "safe" design. >I think allowing the creation of a degraded array with a warning message >should be sufficient. > >Steve D. > |
From: James L. <jam...@ho...> - 2005-11-18 23:52:43
Attachments:
evms-engine2.log.tgz
|
Well, I've tried doing this again, this time using the standard Linux partition types. I started by wiping all the partitions on the drive (using cfdisk). I then created the new partitions with cfdisk, and loaded up EVMS to make sure it was OK (I hadn't tried to create any RAID volumes yet).

EVMS comes up with some warnings about not being able to assemble some RAID arrays. These arrays should no longer exist; I don't know where EVMS is picking them up from. Do I need to do something more than just wiping partitions to get rid of RAID information on a drive? Or does EVMS store configuration in a config file rather than redetecting it each time?

Anyway, EVMS then gets very confused about these RAID arrays, and on exiting I get the following error (and the process hangs):

*** glibc detected *** corrupted double-linked list: 0x081a88a0 ***

I've attached the debug information for this. Hopefully it might prove helpful.

I have to say I'm somewhat concerned about these errors... how stable is the RAID5 code in EVMS considered to be compared to, say, mdadm? Don't get me wrong though, I'm very impressed with EVMS overall :)

I might have a go at upgrading to v2.5.3... I'm putting in a request for it in Ubuntu Breezy backports too...

James

>From: "James Lee" <jam...@ho...> >To: st...@us... >CC: evm...@li... >Subject: Re: [Evms-devel] Possible to create a degraded RAID5 array with >EVMS? >Date: Thu, 17 Nov 2005 23:57:05 +0000 > >Well it looks like it's the fact that I'd set the partitions to raid >autodetect rather than the standard Linux patition type that's started >causing problems then... > >Also, as someone's pointed out in this thread, the version of EVMS (2.5.2) >which is shipped with Ubuntu is out of date. A quick look at CVS shows >that the raid5_mgr.c file has changed, with a fix to memory corruption when >running a degraded RAID5 array. So I guess it's possible this is in fact >now fixed. If not, then you'd hope that this would be dealt with a little >more gracefully by EVMS... I'll see if I can run with those extra diags if >I have time (am a bit swamped at work, sorting out this sort of thing!). > >It might be a good idea to document more clearly the fact that EVMS doesn't >like linux raid autodetect arrays: I certainly didn't spot it, and doing a >text search in the user guide for "autodetect", or for "0xfd" doesn't come >up with anything... > >Anyway thanks for the help, and I'll let you know how I get on. > >James > > >>From: Steve Dobbelstein <st...@us...> >>To: "James Lee" <jam...@ho...> >>CC: evm...@li... >>Subject: Re: [Evms-devel] Possible to create a degraded RAID5 array with >>EVMS? >>Date: Thu, 17 Nov 2005 10:16:06 -0600 >> >>"James Lee" <jam...@ho...> wrote on 11/16/2005 06:08:14 PM: >> >> > Thanks for the reply Steve. >> > >> > I've retried setting up a degraded RAID5 array as described below, and >>have >> > hit a different error in EVMS ("*** glibc detected *** free(): invalid >>next >> > size (fast): 0x0819d720 ***"). Procedure was: >> > >> > >> > 1) Delete all md arrays and all segments from /dev/sdb (using EVMS). >> > Worked fine. >> >>Yea. >> >> > 2) Close and reopen EVMS to check all is still well. It is. Close it >>down >> > again. >> >>Good. >> >> > 3) Using cfdisk, create two logical (not primary) partitions on >>/dev/sdb >> >> > (which had just free space), 100MB each (these are /dev/sdb5 and >>/dev/sdb6). >> > Mark them as type 0xfd (Linux RAID autodetect). Save changes and >>again >> >> > reload and close EVMS to check it's still happy; which it is. 
>> >>Don't set the partition type to RAID autodetect. Use the normal Linux >>partition type. The kernel MD autodetect conflicts with EVMS's discovery >>and activation. Somewhere in the documentation (I can't find it at the >>moment) it should say that you should disable RAID autodetect if you are >>going to use EVMS to manage your software-RAID arrays. >> >> > 4) Use mdadm to create a degraded RAID5 array using the following >>command: >> > "mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb5 /dev/sdb6 >> > missing" >> >>Looks good. >> >> > 5) mdadm report that this array gets created successfully (in a >>degraded >> >> > state): >> > james@ubuntu-fileserver:~$ sudo mdadm --detail /dev/md0 >> > /dev/md0: >> > Version : 00.90.01 >> > Creation Time : Wed Nov 16 23:05:41 2005 >> > Raid Level : raid5 >> > Array Size : 192512 (188.00 MiB 197.13 MB) >> > Device Size : 96256 (94.00 MiB 98.57 MB) >> > Raid Devices : 3 >> > Total Devices : 2 >> > Preferred Minor : 0 >> > Persistence : Superblock is persistent >> > >> > Update Time : Wed Nov 16 23:05:41 2005 >> > State : clean, degraded >> > Active Devices : 2 >> > Working Devices : 2 >> > Failed Devices : 0 >> > Spare Devices : 0 >> > >> > Layout : left-symmetric >> > Chunk Size : 64K >> > >> > UUID : 6f0fa83a:ae0b4a99:a199ce36:9e21815c >> > Events : 0.4 >> > >> > Number Major Minor RaidDevice State >> > 0 8 21 0 active sync >>/dev/.static/dev/sdb5 >> > 1 8 22 1 active sync >>/dev/.static/dev/sdb6 >> > 2 0 0 - removed >> > >> > 6) Start EVMS GUI. Gives me a warning about /md/md0 being in a >>degraded >> >> > state (which is fine). It doesn't give me the option to create an EVMS >> > volume from the /md/md0 array (which is not good). It sees the array, >>and >> > reports it as both degraded and corrupted (it shouldn't be, right?). >> >>Correct. EVMS should discover the array is degraded, but it should not be >>corrupt. >> >> > 7) Close down EVMS and restart it. Again, I get a warning about the >>array >> > being degraded. Now I do have the option to create a volume from the >>array. >> > On trying to do this however, I get the following error message: >> > *** glibc detected *** free(): invalid next size (fast): 0x0819d720 *** >> > and EVMS hangs completely. >> > >> > I've attached a gzipped copy of the evms-engine.log file. >> > >> > Hope this helps... >> >>It does. The last thing in the log is a call to raid5_free_private_data() >>in the MD plug-in. That function has a bunch of calls to free memory. I >>suspect one of those is going bad. It is not apparent from the code which >>one is bad. In fact, for each call to free memory, the code checks for a >>NULL pointer first, frees the non-NULL pointer, and then sets the pointer >>to NULL. Looks safe to me. My guess is there is some code elsewhere that >>has already freed some memory but has not set the pointer to NULL, >>resulting in a double free. >> >>EVMS has some memory debugging code, but it has to be built in. Can you >>do >>me the favor of another run? Rebuild EVMS with the debug support. >>./configure --with-debug <your-other-options> >>make clean install >> >>Then run evmsgui with the debug level set to "everything", e.g., "evmsgui >>-d everything". That will generate a rather large log file since it will >>log *everything*, which includes all calls to allocate and free memory. >>I'm hoping the memory allocation debugging code will catch a double free >>and that the log will contain the trace information to find out who did >>it. 
>> >> > What is the relationship between the EVMS MD plugin and mdadm by the >>way? >> >> > Do they use the same code, or are they entirely different programs that >> > interface in their own way with the Linux RAID driver? Should arrays >> > created by mdadm and array created by EVMS be identical once they have >>been >> > created? >> >>EVMS and mdadm are entirely different programs that interface in their own >>way with the Linux RAID driver. Naturally, they share common knowledge >>about the on-disk data structures and the behavior of the MD code in the >>kernel. Arrays created by mdadm should be understood by EVMS and vice >>versa. >> >>Steve D. >> >> >> >>------------------------------------------------------- >>This SF.Net email is sponsored by the JBoss Inc. Get Certified Today >>Register for a JBoss Training Course. Free Certification Exam >>for All Training Attendees Through End of 2005. For more info visit: >>http://ads.osdn.com/?ad_id=7628&alloc_id=16845&op=click >>_______________________________________________ >>Evms-devel mailing list >>Evm...@li... >>To subscribe/unsubscribe, please visit: >>https://lists.sourceforge.net/lists/listinfo/evms-devel > > > > >------------------------------------------------------- >This SF.Net email is sponsored by the JBoss Inc. Get Certified Today >Register for a JBoss Training Course. Free Certification Exam >for All Training Attendees Through End of 2005. For more info visit: >http://ads.osdn.com/?ad_id=7628&alloc_id=16845&op=click >_______________________________________________ >Evms-devel mailing list >Evm...@li... >To subscribe/unsubscribe, please visit: >https://lists.sourceforge.net/lists/listinfo/evms-devel |
From: Steve D. <st...@us...> - 2005-11-21 17:58:52
|
"James Lee" <jam...@ho...> wrote on 11/18/2005 05:52:35 PM: Hi, James. > Well I've tried doing this again, this time using the standard Linux > partition types. I started by wiping all the partitions on the drive (using > cfdisk). I then created the new partitions with cfdisk, and loaded up EVMS > to make sure it was OK (I hadn't tried to create any RAID volumes yet). > > EVMS comes up with some warnings about not being able to assemble some RAID > arrys. These arrays should not longer exists; I don't know where EVMS is > picking them up from? Do I need to do something more than just wiping > partitions to get rid of RAID information on a drive? Or does EVMS store > configuration in a config file rather than redetecting it each time? The metadata for MD devices is stored near the end of the device. (Round the end of the device down to a 64KB boundary, then put the metadata at the start of the 64KB block just before the previously calculated 64KB boundary.) My guess is that you recreate the partitions to be the same size. EVMS then found the leftover MD metadata at the end of the partitions. > Anyway, EVMS then gets very confused about these RAID arrays, and on exiting > I get the following error (and the process hangs): > > *** glibc detected *** corrupted double-linked list: 0x081a88a0 *** > > I've attached the debug information for this. Hopefully it might prove > helpful. Judging from the last lines in the log, it looks like it got tripped up on a free(). (Different place in the code that the previous log you sent, but on a free() nonetheless) However, the log is missing the debug information for the memory allocations. Did you rebuild EVMS after running "./configure --with-debug <your-other-options>"? The debug version of the memory allocation code should catch a double free. Did you run with the debug level set to "everything" (e.g., "evmsgui -d everything")? It would be helpful to have the history of memory allocations and frees in the log. > I have to say I'm somewhat concerned about these errors.... how > stable is the RAID5 code in EVMS considered compared to, say, mdadm? Don't > get me wrong though, I'm very impressed with EVMS overall :) The EVMS MD plug-in went through a significant overhaul in order to add support for the version 1 MD superblock. Naturally, some stability was lost in the rework. The bugs are being found and fixed. Steve D. |
From: James L. <jam...@ho...> - 2005-11-22 23:58:03
|
Just to follow this up by the way, I put up a request to have EVMS 2.5.3 backported to the latest stable Ubuntu (Breezy Badger), and it's been accepted. It's now available in the Breezy backports repository. I'll upgrade and let you know if these problems are fixed... James >From: Steve Dobbelstein <st...@us...> >To: "James Lee" <jam...@ho...> >CC: evm...@li... >Subject: Re: [Evms-devel] Possible to create a degraded RAID5 array with >EVMS? >Date: Mon, 21 Nov 2005 11:58:03 -0600 > >"James Lee" <jam...@ho...> wrote on 11/18/2005 05:52:35 PM: > >Hi, James. > > > Well I've tried doing this again, this time using the standard Linux > > partition types. I started by wiping all the partitions on the drive >(using > > cfdisk). I then created the new partitions with cfdisk, and loaded up >EVMS > > to make sure it was OK (I hadn't tried to create any RAID volumes yet). > > > > EVMS comes up with some warnings about not being able to assemble some >RAID > > arrys. These arrays should not longer exists; I don't know where EVMS >is > > > picking them up from? Do I need to do something more than just wiping > > partitions to get rid of RAID information on a drive? Or does EVMS >store > > > configuration in a config file rather than redetecting it each time? > >The metadata for MD devices is stored near the end of the device. (Round >the end of the device down to a 64KB boundary, then put the metadata at the >start of the 64KB block just before the previously calculated 64KB >boundary.) My guess is that you recreate the partitions to be the same >size. EVMS then found the leftover MD metadata at the end of the >partitions. > > > Anyway, EVMS then gets very confused about these RAID arrays, and on >exiting > > I get the following error (and the process hangs): > > > > *** glibc detected *** corrupted double-linked list: 0x081a88a0 *** > > > > I've attached the debug information for this. Hopefully it might prove > > helpful. > >Judging from the last lines in the log, it looks like it got tripped up on >a free(). (Different place in the code that the previous log you sent, but >on a free() nonetheless) However, the log is missing the debug information >for the memory allocations. Did you rebuild EVMS after running >"./configure --with-debug <your-other-options>"? The debug version of the >memory allocation code should catch a double free. Did you run with the >debug level set to "everything" (e.g., "evmsgui -d everything")? It would >be helpful to have the history of memory allocations and frees in the log. > > > I have to say I'm somewhat concerned about these errors.... how > > stable is the RAID5 code in EVMS considered compared to, say, mdadm? >Don't > > get me wrong though, I'm very impressed with EVMS overall :) > >The EVMS MD plug-in went through a significant overhaul in order to add >support for the version 1 MD superblock. Naturally, some stability was >lost in the rework. The bugs are being found and fixed. > >Steve D. > |
From: Bernd Z. <be...@bz...> - 2005-11-23 00:25:16
|
Hi, > Just to follow this up by the way, I put up a request to have EVMS 2.5.3 > backported to the latest stable Ubuntu (Breezy Badger), and it's been > accepted. It's now available in the Breezy backports repository. > > I'll upgrade and let you know if these problems are fixed... I'll upload a backport for Debian stable at the weekend. Building the testing/unstable version under sarge works fine. Bernd |
From: Steinar H. G. <sgu...@bi...> - 2005-11-23 00:39:34
|
On Wed, Nov 23, 2005 at 01:25:08AM +0100, Bernd Zeimetz wrote: > I'll upload a backport for Debian stable at the weekend. > Building the testing/unstable version under sarge works fine. I'm in the process of fixing this in stable itself, but due to various issues, it'll have to wait until the mips autobuilders get fixed. See http://bugs.debian.org/339891 for all the details. /* Steinar */ -- Homepage: http://www.sesse.net/ |
From: Andrew S. <an...@ne...> - 2005-11-23 08:21:55
|
Bernd Zeimetz said: > I'll upload a backport for Debian stable at the weekend. > Building the testing/unstable version under sarge works fine. Where will that backport package be uploaded to? Thanks, Andrew S. -- Andrew Shugg <an...@ne...> http://www.neep.com.au/ "Just remember, Mr Fawlty, there's always someone worse off than yourself." "Is there? Well I'd like to meet him. I could do with a good laugh." |
From: Bernd Z. <be...@bz...> - 2005-11-27 14:24:42
|
Hi, > Where will that backport package be uploaded to? the backport of evms-2.5.3-7 (from unstable) is in my brand new apt repository, together with a build of multipath-tools 0.4.6 for sarge. Apt source lines are here: http://debian.buzzzed.de/ If you don't want to use apt you can fetch them directly from the pool directory: http://debian.buzzzed.de/pool/main/ I'll keep updates posted on http://blog.buzzzed.de/category/linux/debian/packages/ I did _not_ test them yet, that'll be done tomorrow - I don't have a SAN at home. Best regards, Bernd |
From: James L. <jam...@ho...> - 2005-11-26 20:43:39
Attachments:
logs.tgz
|
Hi there,

I'm having some more trouble with getting EVMS and mdadm to play nicely together (even after upgrading EVMS)...

The steps I'm taking are:

1. Starting with an empty drive (wiped it by zero-filling the start and end of the drive to make sure there's no residual partition table information). Create two logical partitions (/dev/sdb5 and /dev/sdb6).

2. Use mdadm to create a degraded "3-drive" RAID5 array called /dev/md0: "mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb5 /dev/sdb6 missing".

3. Start EVMS. It correctly detects the degraded RAID5 array on /dev/md0.

4. In EVMS, create 4 partitions on /dev/sdb, each one just over half the size of the partitions in the RAID5 array. Create two 2-drive RAID0 stripes from these partitions.

5. Add one of these RAID0 arrays to the RAID5 array. Wait for it to resync. The RAID5 array is now active and non-degraded.

6. Create an EXT3 filesystem and bung some files on it.

7. Everything working fine so far. Now expand the RAID5 array with the other RAID0 array. This seems to work fine. No data lost on the partition and no errors.

8. Reboot. When next starting EVMS, I get the following errors:

MDRaid5RegMgr: RAID5 array md/md0 is missing the member with RAID index 2. The array is running in degrade mode.
MDRaid5RegMgr: Region md/md0 is currently in degraded mode. To bring it back to normal state, add 1 new spare device to replace the faulty or missing device.
Engine: Error code 5 (Input/output error) when reading the primary copy of feature on object md/md0
Engine: Error code 5 (Input/output error) when reading the secondary copy of feature on object md/md0

This is despite the fact that both RAID0 arrays (as well as all the other hard drive partitions) are reported as OK by EVMS. The EXT3 partition on the drive is no longer recognized.

Strangely, mdadm seems to be getting confused as well. When examining the md arrays, it's reporting /dev/md0 and /dev/md1 as RAID0 arrays, and /dev/md2 as the RAID5 array (which is the wrong way round).

I've attached the evms engine log (no memory tracing though, sorry), and the output showing what mdadm thinks is going on (which seems odd). I'm running the latest EVMS 2.5.3-7 on Ubuntu Breezy by the way.

Any ideas on this one? Just when I thought I'd finished seeing problems with EVMS and mdadm, this comes along... doh!

Cheers,
James |
From: Mike T. <mh...@us...> - 2005-11-28 22:59:01
|
On Sat, 2005-11-26 at 14:42, James Lee wrote: > Hi there, > > I'm having some more trouble with getting EVMS and mdadm to play nicely > together (even after upgrading EVMS)... > > The steps I'm taking are: > > 1. Starting with an empty drive (wiped it by zero-filling the start and end > of drive to make sure there's no residual partition table information). > Create two logical partitions (/dev/sdb5 and /dev/sdb6). > > 2. Use mdadm to create a degraded "3-drive" RAID5 array called /dev/md0: > "mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb5 /dev/sdb6 > missing". > > 3. Start EVMS. It correctly detects the degraded RAID5 array on /dev/md0. > > 4. In EVMS, create 4 partitions on /dev/sdb, each one just over half the > size of the partitions in the RAID5 array. Create two 2-drive RAID0 > stripes, from these partitions. > > 5. Add one of these RAID0 arrays to the RAID5 array. Wait for it to resync. > RAID5 array is now active and non-degraded. > > 6. Create an EXT3 filesystem and bung some files on it. > > 7. Everything working fine so far. Now expand the RAID5 array with the > other RAID0 array. This seems to work fine. No data lost on the partition > and no errors. > > 8. Reboot. When next starting EVMS, I get the following errors: > In theory, this should work!!! I will try your scenario to find out what went wrong. -- Mike T. |
From: James L. <jam...@ho...> - 2005-11-29 00:40:43
|
I should have mentioned: the version of mdadm I was using is the default version which ships with Ubuntu: 1.11.0. And I was using version 0.9 superblocks (both for the mdadm arrays, and for the arrays created by EVMS). Is there an advantage to using a version 1.0 superblock rather than the older 0.9 versions? I'll give this another go after wiping the hard drive completely.... I'm really hoping to get this sorted out soon, as I'm completely out of space; despite having some 320GB drives sitting spare.... By the way, I've put up a feature request for EVMS to allow creating degraded RAID5 arrays (which would get me past all these problems), which would really be nice. I might have a look and see if it seems hard to do (if it's a simple change I might just make it and recompile...). >From: Mike Tran <mh...@us...> >To: evm...@li... >Subject: Re: [Evms-devel] Possible to create a degraded RAID5 array >withEVMS? >Date: Mon, 28 Nov 2005 16:58:39 -0600 > >On Sat, 2005-11-26 at 14:42, James Lee wrote: > > Hi there, > > > > I'm having some more trouble with getting EVMS and mdadm to play nicely > > together (even after upgrading EVMS)... > > > > The steps I'm taking are: > > > > 1. Starting with an empty drive (wiped it by zero-filling the start and >end > > of drive to make sure there's no residual partition table information). > > Create two logical partitions (/dev/sdb5 and /dev/sdb6). > > > > 2. Use mdadm to create a degraded "3-drive" RAID5 array called /dev/md0: > > "mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb5 /dev/sdb6 > > missing". > > > > 3. Start EVMS. It correctly detects the degraded RAID5 array on >/dev/md0. > > > > 4. In EVMS, create 4 partitions on /dev/sdb, each one just over half the > > size of the partitions in the RAID5 array. Create two 2-drive RAID0 > > stripes, from these partitions. > > > > 5. Add one of these RAID0 arrays to the RAID5 array. Wait for it to >resync. > > RAID5 array is now active and non-degraded. > > > > 6. Create an EXT3 filesystem and bung some files on it. > > > > 7. Everything working fine so far. Now expand the RAID5 array with the > > other RAID0 array. This seems to work fine. No data lost on the >partition > > and no errors. > > > > 8. Reboot. When next starting EVMS, I get the following errors: > > > >In theory, this should work!!! I will try your scenario to find out what >went wrong. > >-- >Mike T. > > > > > >------------------------------------------------------- >This SF.net email is sponsored by: Splunk Inc. Do you grep through log >files >for problems? Stop! Download the new AJAX search engine that makes >searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! >http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click >_______________________________________________ >Evms-devel mailing list >Evm...@li... >To subscribe/unsubscribe, please visit: >https://lists.sourceforge.net/lists/listinfo/evms-devel |
From: Mike T. <mh...@us...> - 2005-11-29 23:29:23
|
James Lee wrote:

> I should have mentioned: the version of mdadm I was using is the > default version which ships with Ubuntu: 1.11.0. > > And I was using version 0.9 superblocks (both for the mdadm arrays, > and for the arrays created by EVMS). Is there an advantage to using a > version 1.0 superblock rather than the older 0.9 versions?

At the very least, I can think of 2:

- version 1 superblock is endian neutral
- With version 0.9, md device has a maximum of 27 disks/partitions. With a 1024-byte version 1 superblock, the limit is 384

--
Regards,
Mike T. |
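As a rough illustration of where those two limits come from (the section sizes below are approximations for illustration, not copied from kernel headers): the 0.90 superblock is a fixed 4KB block holding 128-byte member descriptors, while the 1KB version-1 superblock stores one 2-byte role entry per member after a roughly 256-byte header.

    #include <stdio.h>

    /* Back-of-the-envelope sketch of the member limits mentioned above. */
    int main(void)
    {
        /* v0.90: 4096-byte superblock; the generic, personality and
         * "this disk" sections take roughly 640 bytes, and each member
         * descriptor is 128 bytes -> about 27 member slots.            */
        int v090_slots = (4096 - 640) / 128;

        /* v1 (1KB variant): ~256-byte header followed by a table of
         * 2-byte per-member role entries -> 384 member slots.          */
        int v1_slots = (1024 - 256) / 2;

        printf("v0.90 superblock: %d member slots\n", v090_slots);  /* 27  */
        printf("v1    superblock: %d member slots\n", v1_slots);    /* 384 */
        return 0;
    }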
From: Mike T. <mh...@us...> - 2005-11-30 00:50:51
|
Mike Tran wrote: >On Sat, 2005-11-26 at 14:42, James Lee wrote: > > >>Hi there, >> >>I'm having some more trouble with getting EVMS and mdadm to play nicely >>together (even after upgrading EVMS)... >> >>The steps I'm taking are: >> >>1. Starting with an empty drive (wiped it by zero-filling the start and end >>of drive to make sure there's no residual partition table information). >>Create two logical partitions (/dev/sdb5 and /dev/sdb6). >> >>2. Use mdadm to create a degraded "3-drive" RAID5 array called /dev/md0: >>"mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb5 /dev/sdb6 >>missing". >> >>3. Start EVMS. It correctly detects the degraded RAID5 array on /dev/md0. >> >>4. In EVMS, create 4 partitions on /dev/sdb, each one just over half the >>size of the partitions in the RAID5 array. Create two 2-drive RAID0 >>stripes, from these partitions. >> >>5. Add one of these RAID0 arrays to the RAID5 array. Wait for it to resync. >> RAID5 array is now active and non-degraded. >> >>6. Create an EXT3 filesystem and bung some files on it. >> >>7. Everything working fine so far. Now expand the RAID5 array with the >>other RAID0 array. This seems to work fine. No data lost on the partition >>and no errors. >> >>8. Reboot. When next starting EVMS, I get the following errors: >> >> >> > >In theory, this should work!!! I will try your scenario to find out what >went wrong. > > > I could not reproduce this problem using evms 2.5.3. Did you wait for the raid5 expand to complete before rebooting the machine? -- Mike T. |
From: James L. <jam...@ho...> - 2005-11-30 01:06:34
|
>From: Mike Tran <mh...@us...> >To: evm...@li... >Subject: Re: [Evms-devel] Possible to create a degraded RAID5 array >with EVMS? >Date: Tue, 29 Nov 2005 18:50:34 -0600 > >Mike Tran wrote: > >>On Sat, 2005-11-26 at 14:42, James Lee wrote: >> >> >>>Hi there, >>> >>>I'm having some more trouble with getting EVMS and mdadm to play nicely >>>together (even after upgrading EVMS)... >>> >>>The steps I'm taking are: >>> >>>1. Starting with an empty drive (wiped it by zero-filling the start and >>>end of drive to make sure there's no residual partition table >>>information). Create two logical partitions (/dev/sdb5 and /dev/sdb6). >>> >>>2. Use mdadm to create a degraded "3-drive" RAID5 array called /dev/md0: >>>"mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb5 /dev/sdb6 >>>missing". >>> >>>3. Start EVMS. It correctly detects the degraded RAID5 array on >>>/dev/md0. >>> >>>4. In EVMS, create 4 partitions on /dev/sdb, each one just over half the >>>size of the partitions in the RAID5 array. Create two 2-drive RAID0 >>>stripes, from these partitions. >>> >>>5. Add one of these RAID0 arrays to the RAID5 array. Wait for it to >>>resync. RAID5 array is now active and non-degraded. >>> >>>6. Create an EXT3 filesystem and bung some files on it. >>> >>>7. Everything working fine so far. Now expand the RAID5 array with the >>>other RAID0 array. This seems to work fine. No data lost on the >>>partition and no errors. >>> >>>8. Reboot. When next starting EVMS, I get the following errors: >>> >>> >>> >> >>In theory, this should work!!! I will try your scenario to find out what >>went wrong. >> >> >> >I could not reproduce this problem using evms 2.5.3. Did you wait for the >raid5 expand to complete before rebooting the machine? > >-- >Mike T. > Thanks for looking into this Mike. Yes, the RAID5 expand had completed successfully (and the machine was idle for a few hours before rebooting, with the array working fine). Is it possible that these problems are caused by having some residual superblock left over from a previous array? To save time, I wiped the drive by doing a wipe of the first and last million sectors of the drive (rather than zeroing the entire 320GB, which takes several hours). Maybe I should try with a completely clean drive. The version of mdadm I'm using doesn't support version 1 superblocks AFAIK, which is why I've had to use the older version 0.9 superblocks. I can't see myself having more than 27 devices in this array, or moving it over to a byte-swapped (i.e. Sparc?) machine, so I should be OK. Presumably support for the older superblock will continue into the future? |
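For reference, the partial wipe James describes (zeroing the first and last million sectors) could be sketched roughly like this in C; this is a hypothetical helper for illustration, not part of EVMS or mdadm:

    #define _FILE_OFFSET_BITS 64
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define SECTOR 512

    /* Zero the first and last 'nsectors' sectors of a block device.
     * Minimal sketch only -- error handling kept to the basics.      */
    static int wipe_ends(const char *dev, long long nsectors)
    {
        int fd = open(dev, O_WRONLY);
        if (fd < 0) { perror("open"); return -1; }

        off_t size = lseek(fd, 0, SEEK_END);   /* device size in bytes */
        char *buf = calloc(1, SECTOR);

        for (long long i = 0; i < nsectors; i++) {
            pwrite(fd, buf, SECTOR, (off_t)i * SECTOR);               /* start of device */
            pwrite(fd, buf, SECTOR, size - (off_t)(i + 1) * SECTOR);  /* end of device   */
        }

        free(buf);
        fsync(fd);
        close(fd);
        return 0;
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
            return 1;
        }
        /* "a million sectors" at each end, as in the message above */
        return wipe_ends(argv[1], 1000000) ? 1 : 0;
    }

Note that such a wipe only clears MD 0.90 superblocks that happen to fall inside the zeroed regions; a superblock near the end of a partition in the middle of the disk can survive it, which matches the stale-metadata behaviour discussed earlier in the thread.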
From: Mike T. <mh...@us...> - 2005-11-30 16:51:54
|
James Lee wrote: >> From: Mike Tran <mh...@us...> >> To: evm...@li... >> Subject: Re: [Evms-devel] Possible to create a degraded RAID5 array >> with EVMS? >> Date: Tue, 29 Nov 2005 18:50:34 -0600 >> >> Mike Tran wrote: >> >>> On Sat, 2005-11-26 at 14:42, James Lee wrote: >>> >>> >>>> Hi there, >>>> >>>> I'm having some more trouble with getting EVMS and mdadm to play >>>> nicely together (even after upgrading EVMS)... >>>> >>>> The steps I'm taking are: >>>> >>>> 1. Starting with an empty drive (wiped it by zero-filling the start >>>> and end of drive to make sure there's no residual partition table >>>> information). Create two logical partitions (/dev/sdb5 and >>>> /dev/sdb6). >>>> >>>> 2. Use mdadm to create a degraded "3-drive" RAID5 array called >>>> /dev/md0: "mdadm --create /dev/md0 --level=5 --raid-devices=3 >>>> /dev/sdb5 /dev/sdb6 missing". >>>> >>>> 3. Start EVMS. It correctly detects the degraded RAID5 array on >>>> /dev/md0. >>>> >>>> 4. In EVMS, create 4 partitions on /dev/sdb, each one just over >>>> half the size of the partitions in the RAID5 array. Create two >>>> 2-drive RAID0 stripes, from these partitions. >>>> >>>> 5. Add one of these RAID0 arrays to the RAID5 array. Wait for it >>>> to resync. RAID5 array is now active and non-degraded. >>>> >>>> 6. Create an EXT3 filesystem and bung some files on it. >>>> >>>> 7. Everything working fine so far. Now expand the RAID5 array with >>>> the other RAID0 array. This seems to work fine. No data lost on >>>> the partition and no errors. >>>> >>>> 8. Reboot. When next starting EVMS, I get the following errors: >>>> >>>> >>>> >>> >>> In theory, this should work!!! I will try your scenario to find out >>> what >>> went wrong. >>> >>> >>> >> I could not reproduce this problem using evms 2.5.3. Did you wait >> for the raid5 expand to complete before rebooting the machine? >> >> -- >> Mike T. >> > > Thanks for looking into this Mike. Yes, the RAID5 expand had > completed successfully (and the machine was idle for a few hours > before rebooting, with the array working fine). > > Is it possible that these problems are caused by having some residual > superblock left over from a previous array? To save time, I wiped the > drive by doing a wipe of the first and last million sectors of the > drive (rather than zeroing the entire 320GB, which takes several > hours). Maybe I should try with a completely clean drive. I don't think zeroing the entire disk will make any difference. What you did was 100% valid scenario. You've seen the problem and I want to fix it. If you can reproduce the problem, please let me know. > > The version of mdadm I'm using doesn't support version 1 superblocks > AFAIK, which is why I've had to use the older version 0.9 > superblocks. I can't see myself having more than 27 devices in this > array, or moving it over to a byte-swapped (i.e. Sparc?) machine, so I > should be OK. Presumably support for the older superblock will > continue into the future? Not many people use 1.0 superblock format. The kernel md driver code to support the old superblock is relatively small. Moreover, data is important. Therefore, I believe that the old superblock will be supported for a very long time :) -- Thanks, Mike T. |
From: Steinar H. G. <sgu...@bi...> - 2005-11-17 11:19:22
|
On Thu, Nov 17, 2005 at 12:08:14AM +0000, James Lee wrote: > james@ubuntu-fileserver:~$ sudo mdadm --detail /dev/md0 Note that Ubuntu is using EVMS 2.5.2; I've nagged them about it for a while, and they synced the 2.5.3 packages from Debian (which contain at least two important RAID-5-related fixes over 2.5.2) yesterday into Dapper, but Breezy is still AFAIK on the old version. /* Steinar */ -- Homepage: http://www.sesse.net/ |
From: Matt Z. <md...@ub...> - 2005-11-17 16:17:09
|
On Thu, Nov 17, 2005 at 12:18:17PM +0100, Steinar H. Gunderson wrote: > On Thu, Nov 17, 2005 at 12:08:14AM +0000, James Lee wrote: > > james@ubuntu-fileserver:~$ sudo mdadm --detail /dev/md0 > > Note that Ubuntu is using EVMS 2.5.2; I've nagged them about it for a while, > and they synced the 2.5.3 packages from Debian (which contain at least two > important RAID-5-related fixes over 2.5.2) yesterday into Dapper, but Breezy > is still AFAIK on the old version. Like many distributions, Ubuntu doesn't upgrade to new upstream releases of its components in a stable release. Generally, it is more important to users that they have stability than newer software. For users who have a specific need for a newer version, we offer a backports repository. If you feel that EVMS 2.5.3 would be appropriate for this, contact the backports team. -- - mdz |
From: Steinar H. G. <sgu...@bi...> - 2005-11-18 01:34:27
|
On Thu, Nov 17, 2005 at 08:16:59AM -0800, Matt Zimmerman wrote: > Like many distributions, Ubuntu doesn't upgrade to new upstream releases of > its components in a stable release. Generally, it is more important to > users that they have stability than newer software. For users who have a > specific need for a newer version, we offer a backports repository. If you > feel that EVMS 2.5.3 would be appropriate for this, contact the backports > team. I definitely feel Ubuntu should have these patches, as the bugs can badly corrupt a RAID-5 volume under specific circumstances (I've had it happen to me; EVMS eating my production RAID-5 was part of how the bug was discovered in the first place :-) ). What's the e-mail address for the backport team? I should probably make an upload to Debian stable containing just the RAID-5 bugfix as well; sarge has 2.5.2, with the exact same bug. /* Steinar */ -- Homepage: http://www.sesse.net/ |
From: Matt Z. <md...@de...> - 2005-11-18 01:50:37
|
On Fri, Nov 18, 2005 at 02:33:35AM +0100, Steinar H. Gunderson wrote: > On Thu, Nov 17, 2005 at 08:16:59AM -0800, Matt Zimmerman wrote: > > Like many distributions, Ubuntu doesn't upgrade to new upstream releases of > > its components in a stable release. Generally, it is more important to > > users that they have stability than newer software. For users who have a > > specific need for a newer version, we offer a backports repository. If you > > feel that EVMS 2.5.3 would be appropriate for this, contact the backports > > team. > > I definitely feel Ubuntu should have these patches, as the bugs can badly > corrupt a RAID-5 volume under specific circumstances (I've had it happen to > me; EVMS eating my production RAID-5 was part of how the bug was discovered > in the first place :-) ). What's the e-mail address for the backport team? http://lists.ubuntu.com/mailman/listinfo/ubuntu-backports -- - mdz |
From: Steve D. <st...@us...> - 2005-11-17 16:16:54
|
"James Lee" <jam...@ho...> wrote on 11/16/2005 06:08:14 PM: > Thanks for the reply Steve. > > I've retried setting up a degraded RAID5 array as described below, and have > hit a different error in EVMS ("*** glibc detected *** free(): invalid next > size (fast): 0x0819d720 ***"). Procedure was: > > > 1) Delete all md arrays and all segments from /dev/sdb (using EVMS). > Worked fine. Yea. > 2) Close and reopen EVMS to check all is still well. It is. Close it down > again. Good. > 3) Using cfdisk, create two logical (not primary) partitions on /dev/sdb > (which had just free space), 100MB each (these are /dev/sdb5 and /dev/sdb6). > Mark them as type 0xfd (Linux RAID autodetect). Save changes and again > reload and close EVMS to check it's still happy; which it is. Don't set the partition type to RAID autodetect. Use the normal Linux partition type. The kernel MD autodetect conflicts with EVMS's discovery and activation. Somewhere in the documentation (I can't find it at the moment) it should say that you should disable RAID autodetect if you are going to use EVMS to manage your software-RAID arrays. > 4) Use mdadm to create a degraded RAID5 array using the following command: > "mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb5 /dev/sdb6 > missing" Looks good. > 5) mdadm report that this array gets created successfully (in a degraded > state): > james@ubuntu-fileserver:~$ sudo mdadm --detail /dev/md0 > /dev/md0: > Version : 00.90.01 > Creation Time : Wed Nov 16 23:05:41 2005 > Raid Level : raid5 > Array Size : 192512 (188.00 MiB 197.13 MB) > Device Size : 96256 (94.00 MiB 98.57 MB) > Raid Devices : 3 > Total Devices : 2 > Preferred Minor : 0 > Persistence : Superblock is persistent > > Update Time : Wed Nov 16 23:05:41 2005 > State : clean, degraded > Active Devices : 2 > Working Devices : 2 > Failed Devices : 0 > Spare Devices : 0 > > Layout : left-symmetric > Chunk Size : 64K > > UUID : 6f0fa83a:ae0b4a99:a199ce36:9e21815c > Events : 0.4 > > Number Major Minor RaidDevice State > 0 8 21 0 active sync /dev/.static/dev/sdb5 > 1 8 22 1 active sync /dev/.static/dev/sdb6 > 2 0 0 - removed > > 6) Start EVMS GUI. Gives me a warning about /md/md0 being in a degraded > state (which is fine). It doesn't give me the option to create an EVMS > volume from the /md/md0 array (which is not good). It sees the array, and > reports it as both degraded and corrupted (it shouldn't be, right?). Correct. EVMS should discover the array is degraded, but it should not be corrupt. > 7) Close down EVMS and restart it. Again, I get a warning about the array > being degraded. Now I do have the option to create a volume from the array. > On trying to do this however, I get the following error message: > *** glibc detected *** free(): invalid next size (fast): 0x0819d720 *** > and EVMS hangs completely. > > I've attached a gzipped copy of the evms-engine.log file. > > Hope this helps... It does. The last thing in the log is a call to raid5_free_private_data() in the MD plug-in. That function has a bunch of calls to free memory. I suspect one of those is going bad. It is not apparent from the code which one is bad. In fact, for each call to free memory, the code checks for a NULL pointer first, frees the non-NULL pointer, and then sets the pointer to NULL. Looks safe to me. My guess is there is some code elsewhere that has already freed some memory but has not set the pointer to NULL, resulting in a double free. EVMS has some memory debugging code, but it has to be built in. 
Can you do me the favor of another run? Rebuild EVMS with the debug support:

./configure --with-debug <your-other-options>
make clean install

Then run evmsgui with the debug level set to "everything", e.g., "evmsgui -d everything". That will generate a rather large log file since it will log *everything*, which includes all calls to allocate and free memory. I'm hoping the memory allocation debugging code will catch a double free and that the log will contain the trace information to find out who did it.

> What is the relationship between the EVMS MD plugin and mdadm by the way? > Do they use the same code, or are they entirely different programs that > interface in their own way with the Linux RAID driver? Should arrays > created by mdadm and array created by EVMS be identical once they have been > created?

EVMS and mdadm are entirely different programs that interface in their own way with the Linux RAID driver. Naturally, they share common knowledge about the on-disk data structures and the behavior of the MD code in the kernel. Arrays created by mdadm should be understood by EVMS and vice versa.

Steve D. |
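To illustrate the failure mode Steve suspects (a sketch only; the structure and function names below are invented and are not the actual EVMS code): the free routine itself can be perfectly defensive and still end up in a double free if some other path released the same buffer earlier without clearing the pointer it borrowed.

    #include <stdlib.h>

    /* Illustrative only -- field and function names are made up. */
    struct raid5_private {
        char *super_block;
        char *chunk_buffer;
    };

    /* Defensive free, like the pattern described above: check for NULL,
     * free, then clear the pointer so a second call is harmless.       */
    static void raid5_free_private(struct raid5_private *priv)
    {
        if (priv->super_block) {
            free(priv->super_block);
            priv->super_block = NULL;
        }
        if (priv->chunk_buffer) {
            free(priv->chunk_buffer);
            priv->chunk_buffer = NULL;
        }
    }

    /* The bug pattern: some other path frees a buffer it "borrowed" but
     * never clears the owner's pointer.                                 */
    static void some_error_path(struct raid5_private *priv)
    {
        free(priv->super_block);      /* freed here...          */
        /* priv->super_block = NULL;     ...but never cleared   */
    }

    int main(void)
    {
        struct raid5_private priv = {0};
        priv.super_block = malloc(64);
        priv.chunk_buffer = malloc(64);

        /* Correct teardown: */
        raid5_free_private(&priv);

        /* If some_error_path(&priv) had run first, the same teardown call
         * would be a double free, and glibc would abort with messages much
         * like the ones quoted in this thread.                            */
        return 0;
    }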
From: James L. <jam...@ho...> - 2005-11-17 23:57:16
|
Well it looks like it's the fact that I'd set the partitions to raid autodetect rather than the standard Linux patition type that's started causing problems then... Also, as someone's pointed out in this thread, the version of EVMS (2.5.2) which is shipped with Ubuntu is out of date. A quick look at CVS shows that the raid5_mgr.c file has changed, with a fix to memory corruption when running a degraded RAID5 array. So I guess it's possible this is in fact now fixed. If not, then you'd hope that this would be dealt with a little more gracefully by EVMS... I'll see if I can run with those extra diags if I have time (am a bit swamped at work, sorting out this sort of thing!). It might be a good idea to document more clearly the fact that EVMS doesn't like linux raid autodetect arrays: I certainly didn't spot it, and doing a text search in the user guide for "autodetect", or for "0xfd" doesn't come up with anything... Anyway thanks for the help, and I'll let you know how I get on. James >From: Steve Dobbelstein <st...@us...> >To: "James Lee" <jam...@ho...> >CC: evm...@li... >Subject: Re: [Evms-devel] Possible to create a degraded RAID5 array with >EVMS? >Date: Thu, 17 Nov 2005 10:16:06 -0600 > >"James Lee" <jam...@ho...> wrote on 11/16/2005 06:08:14 PM: > > > Thanks for the reply Steve. > > > > I've retried setting up a degraded RAID5 array as described below, and >have > > hit a different error in EVMS ("*** glibc detected *** free(): invalid >next > > size (fast): 0x0819d720 ***"). Procedure was: > > > > > > 1) Delete all md arrays and all segments from /dev/sdb (using EVMS). > > Worked fine. > >Yea. > > > 2) Close and reopen EVMS to check all is still well. It is. Close it >down > > again. > >Good. > > > 3) Using cfdisk, create two logical (not primary) partitions on >/dev/sdb > > > (which had just free space), 100MB each (these are /dev/sdb5 and >/dev/sdb6). > > Mark them as type 0xfd (Linux RAID autodetect). Save changes and >again > > > reload and close EVMS to check it's still happy; which it is. > >Don't set the partition type to RAID autodetect. Use the normal Linux >partition type. The kernel MD autodetect conflicts with EVMS's discovery >and activation. Somewhere in the documentation (I can't find it at the >moment) it should say that you should disable RAID autodetect if you are >going to use EVMS to manage your software-RAID arrays. > > > 4) Use mdadm to create a degraded RAID5 array using the following >command: > > "mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb5 /dev/sdb6 > > missing" > >Looks good. > > > 5) mdadm report that this array gets created successfully (in a >degraded > > > state): > > james@ubuntu-fileserver:~$ sudo mdadm --detail /dev/md0 > > /dev/md0: > > Version : 00.90.01 > > Creation Time : Wed Nov 16 23:05:41 2005 > > Raid Level : raid5 > > Array Size : 192512 (188.00 MiB 197.13 MB) > > Device Size : 96256 (94.00 MiB 98.57 MB) > > Raid Devices : 3 > > Total Devices : 2 > > Preferred Minor : 0 > > Persistence : Superblock is persistent > > > > Update Time : Wed Nov 16 23:05:41 2005 > > State : clean, degraded > > Active Devices : 2 > > Working Devices : 2 > > Failed Devices : 0 > > Spare Devices : 0 > > > > Layout : left-symmetric > > Chunk Size : 64K > > > > UUID : 6f0fa83a:ae0b4a99:a199ce36:9e21815c > > Events : 0.4 > > > > Number Major Minor RaidDevice State > > 0 8 21 0 active sync >/dev/.static/dev/sdb5 > > 1 8 22 1 active sync >/dev/.static/dev/sdb6 > > 2 0 0 - removed > > > > 6) Start EVMS GUI. 
Gives me a warning about /md/md0 being in a >degraded > > > state (which is fine). It doesn't give me the option to create an EVMS > > volume from the /md/md0 array (which is not good). It sees the array, >and > > reports it as both degraded and corrupted (it shouldn't be, right?). > >Correct. EVMS should discover the array is degraded, but it should not be >corrupt. > > > 7) Close down EVMS and restart it. Again, I get a warning about the >array > > being degraded. Now I do have the option to create a volume from the >array. > > On trying to do this however, I get the following error message: > > *** glibc detected *** free(): invalid next size (fast): 0x0819d720 *** > > and EVMS hangs completely. > > > > I've attached a gzipped copy of the evms-engine.log file. > > > > Hope this helps... > >It does. The last thing in the log is a call to raid5_free_private_data() >in the MD plug-in. That function has a bunch of calls to free memory. I >suspect one of those is going bad. It is not apparent from the code which >one is bad. In fact, for each call to free memory, the code checks for a >NULL pointer first, frees the non-NULL pointer, and then sets the pointer >to NULL. Looks safe to me. My guess is there is some code elsewhere that >has already freed some memory but has not set the pointer to NULL, >resulting in a double free. > >EVMS has some memory debugging code, but it has to be built in. Can you do >me the favor of another run? Rebuild EVMS with the debug support. >./configure --with-debug <your-other-options> >make clean install > >Then run evmsgui with the debug level set to "everything", e.g., "evmsgui >-d everything". That will generate a rather large log file since it will >log *everything*, which includes all calls to allocate and free memory. >I'm hoping the memory allocation debugging code will catch a double free >and that the log will contain the trace information to find out who did it. > > > What is the relationship between the EVMS MD plugin and mdadm by the >way? > > > Do they use the same code, or are they entirely different programs that > > interface in their own way with the Linux RAID driver? Should arrays > > created by mdadm and array created by EVMS be identical once they have >been > > created? > >EVMS and mdadm are entirely different programs that interface in their own >way with the Linux RAID driver. Naturally, they share common knowledge >about the on-disk data structures and the behavior of the MD code in the >kernel. Arrays created by mdadm should be understood by EVMS and vice >versa. > >Steve D. > > > >------------------------------------------------------- >This SF.Net email is sponsored by the JBoss Inc. Get Certified Today >Register for a JBoss Training Course. Free Certification Exam >for All Training Attendees Through End of 2005. For more info visit: >http://ads.osdn.com/?ad_id=7628&alloc_id=16845&op=click >_______________________________________________ >Evms-devel mailing list >Evm...@li... >To subscribe/unsubscribe, please visit: >https://lists.sourceforge.net/lists/listinfo/evms-devel |