tier-users Mailing List for btier
btier creates an automated tiered block device
Status: Beta
Brought to you by:
mruijter
From: Pavel G. <Pa...@ac...> - 2016-04-28 12:40:11
|
Hello Hal,

Yes, I meant the latest version 1.3.11. Unfortunately, switching to vfs didn't help; it has the same memory leak. I tried to reproduce it in a lab test. Manually migrating a single block back and forth doesn't reproduce the issue; it leaks in a real-life scenario.

On Thursday 28 April 2016 at 14:11, Hal Bouma <hb...@ne...> wrote (RE: Memory leak in btier-1.3.10):
> You mean 1.3.11 and the memory leak only happens when you're using the bio. The vfs method doesn't demonstrate this behavior. Would be nice if bio could be fixed. [...] |
From: Hal B. <hb...@ne...> - 2016-04-28 11:32:17
|
Hi Pavel,

You mean 1.3.11, and the memory leak only happens when you're using the bio. The vfs method doesn't demonstrate this behavior. Would be nice if bio could be fixed.

Hal

On Wednesday, April 27, 2016 5:39 AM, Pavel Gashev wrote (Subject: [Tier-users] Memory leak in btier-1.3.10):
> I know btier-1.3.10 is supposed to be a final release in the 1.3.x branch. Unfortunately it has a memory leak which makes it completely unusable. [...] It's growing during block migrations. |
From: Pavel G. <Pa...@ac...> - 2016-04-27 10:59:42
|
Hello,

I know btier-1.3.10 is supposed to be a final release in the 1.3.x branch. Unfortunately it has a memory leak which makes it completely unusable:

# grep ^kmalloc-256 /proc/slabinfo
kmalloc-256  44238112 44238112  256  32  2 : tunables 0 0 0 : slabdata 1382441 1382441 0

So there are more than 44M allocations in kmalloc-256. It's around 10GB of RAM, and it's growing during block migrations.

Kernel is 3.10.0-327.13.1.el7.x86_64 |
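As a sanity check on the report above, the slab numbers can be verified directly: roughly 44 million 256-byte objects is indeed on the order of 10 GiB.

```python
# Back-of-the-envelope check of the slabinfo line quoted above:
# 44238112 active kmalloc-256 objects at 256 bytes each.
objs, obj_size = 44238112, 256
gib = objs * obj_size / 2**30
print(round(gib, 1))  # 10.5 -- matching the "around 10GB" observation
```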
From: Markus K. <ro...@sw...> - 2016-03-20 22:36:02
|
It would seem that some non-trivial changes to block devices came about between 4.2 and 4.3, and btier 2 no longer builds. Does anyone have a patch? |
From: Chris B. <ch...@ce...> - 2016-02-14 11:32:00
|
Hi there,

I have a couple of tiers of storage that have been working very well for a couple of years. However, I've noticed some writes accumulating on tier 1 when there is free space on tier 0 (see below). I thought the logic was that writing a block will migrate it to the top tier. Any ideas how I can ensure all writes hit tier 0?

[root@i7 sdtiera]# while true; do date; cat /sys/block/sdtiera/tier/device_usage; sleep 2; done
Sun Feb 14 21:59:07 ACDT 2016
TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
0 md0 457818 283152 1477 2478 676215677 1134482947
1 dm-2 1468004 1461291 475 98 697674154 144466226
Sun Feb 14 21:59:09 ACDT 2016
TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
0 md0 457818 283152 1477 2478 676216039 1134482947
1 dm-2 1468004 1461291 475 98 697674157 144466226
Sun Feb 14 21:59:11 ACDT 2016
TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
0 md0 457818 283152 1477 2478 676217253 1134486329
1 dm-2 1468004 1461291 475 98 697678480 144466497
Sun Feb 14 21:59:13 ACDT 2016
TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
0 md0 457818 283152 1477 2478 676217803 1134486329
1 dm-2 1468004 1461291 475 98 697678513 144466497
Sun Feb 14 21:59:15 ACDT 2016
TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
0 md0 457818 283152 1477 2478 676218937 1134486361
1 dm-2 1468004 1461291 475 98 697678513 144466497

Thanks, Chris |
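One way to read the samples above is to difference the TOTAL_WRITES counters between the first and last snapshots; the numbers below are taken verbatim from that output.

```python
# TOTAL_WRITES from the first (21:59:07) and last (21:59:15) samples above.
tier0_first, tier0_last = 1134482947, 1134486361
tier1_first, tier1_last = 144466226, 144466497
print(tier0_last - tier0_first)  # 3414 writes went to tier 0 (md0)
print(tier1_last - tier1_first)  # 271 writes still landed on tier 1 (dm-2)
```

So over the eight-second window most writes did hit tier 0, but a steady trickle went to tier 1, which is what the poster observed.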
From: Hal B. <hb...@ne...> - 2015-12-14 14:20:51
|
Hi everyone,

It looks like my initial stability issues were with CloudLinux 7 (which is a branch of CE7). I've switched back to CloudLinux 6, which has been stable.

I've run into one other issue though. Running for a few days with the -B option has caused a basically idle system to use over 10GB of RAM. Slabtop shows the following info:

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
20303820 20303816 99% 0.25K 1353588 15 5414352K biovec-16
20303940 20303877 99% 0.19K 1015197 20 4060788K bio-0

Does this mean there is a possible memory leak in the new block io method for the 2.6.32 kernels? Or is using 10GB on block io buffers expected, and can it be purged if the system runs out of memory? Switching to the older vfs method has not caused any excessive memory usage, so I'm running in that mode for now.

Hal |
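The slabtop columns above can be cross-checked: CACHE SIZE is SLABS multiplied by the (4 KiB) slab page size, and the two bio caches together account for roughly 9 GiB, consistent with the reported memory use.

```python
# slabtop's CACHE SIZE column is SLABS times the 4 KiB page size.
PAGE_KIB = 4
biovec16_kib = 1353588 * PAGE_KIB
bio0_kib = 1015197 * PAGE_KIB
print(biovec16_kib, bio0_kib)  # 5414352 4060788, matching the slabtop output
print(round((biovec16_kib + bio0_kib) / 2**20, 1))  # 9.0 GiB in the two caches
```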
From: Chris B. <ch...@ce...> - 2015-12-03 04:13:42
|
Are you using -B (bio) or -V (vfs) (the default)?

I hit corruption on CentOS 7, 1.3.11, ZFS on top of btier, while using bio mode. My observations were posted to the list in March this year. I've since been using Fedora (20, 21, 22) with btier 1.3.11 reliably for the last 8 months. I'm using ZFS on top, so periodic scrubbing would tell me if btier was screwing things up, but it's been rock solid.

Regards, Chris |
From: Hal B. <hb...@ne...> - 2015-12-02 20:12:46
|
Hi everyone,

I've run into some stability issues on a server with btier 1.3.11, and I just wanted to ask a couple of questions to make sure I am using it properly and as intended. I'm running CentOS 7 & btier on a PowerEdge 2950 with a PERC 6 controller, 2x240GB Intel DC3500 drives (RAID1) and 4x600GB 15K SAS drives (RAID10). I spent some time configuring dracut to run btier_setup so it could be used as the main file system, i.e.

Filesystem     Size  Used Avail Use% Mounted on
/dev/sdtiera1  1.3T   65G  1.2T   6% /

I set this up about a month ago and everything was stable until last week, when I started moving over some content. Then the server started randomly rebooting. So my first question is: is btier safe to use with the root file system this way? Or should I have it set up on a different partition that is used by, say, virtual machine images or other applications?

I have noticed after the server reboots that the tiered storage usage on the SSD jumps way up, i.e.

TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
0 sdb3 174569 89582 10 5 1909112 921183
1 sda3 1134326 35478 2 4 2540301 5253152

After a day, most of the allocations will again get moved back to the SAS drives (sda3) based on the default migration settings, until the server again auto-reboots. Today it started to fail again: it was reporting file system errors and corrupted memory links, and then the server didn't come back up. The partition table was no longer valid, and the device was thus unable to be mounted. While I was able to repair the partition table, many of the OS files that hadn't been touched were damaged as well. So it feels like the 1MB block mapping table is getting out of sync with the position of the blocks on the storage devices, which is what leads to the crashes. I've tried to be conservative with the settings by using writethrough and ext4 so that barriers are supported, yet data is still getting corrupted.

I'm not really taxing the server either - usually the server reboots when it's over 80% idle. Any advice given would be greatly appreciated. Thanks!

Hal |
From: Marc S. <mar...@mc...> - 2015-10-29 12:59:15
|
Hi,

I created a quick patch to handle kernel versions >= 3.19 -- I used #ifdef's inline, so I'm not sure if this is the optimal way to handle this, but it seems to work fine. I tested with kernel version 4.1.11, but haven't tested with anything lower than 3.19 yet... I imagine it will work fine. I've also attached the patch file to this email. I created the patch against tip, but it also applied cleanly to BTIER 1.3.11.

--Marc

--snip--
diff -Naur a/kernel/btier/btier_main.c b/kernel/btier/btier_main.c
--- a/kernel/btier/btier_main.c	2015-10-26 16:30:46.155707719 -0400
+++ b/kernel/btier/btier_main.c	2015-10-28 11:16:31.382317117 -0400
@@ -402,7 +402,11 @@
 	if (likely(bw == len))
 		return 0;
 	pr_err("Write error on device %s at offset %llu, length %li\n",
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0))
+	       backdev->fds->f_path.dentry->d_name.name,
+#else
 	       backdev->fds->f_dentry->d_name.name,
+#endif
 	       (unsigned long long)pos, (unsigned long)len);
 	if (bw >= 0)
 		bw = -EIO;
@@ -2161,7 +2165,11 @@
 static void tier_check(struct tier_device *dev, int devicenr)
 {
 	pr_info("device %s is not clean, check forced\n",
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0))
+		dev->backdev[devicenr]->fds->f_path.dentry->d_name.name);
+#else
 		dev->backdev[devicenr]->fds->f_dentry->d_name.name);
+#endif
 	recover_journal(dev, devicenr);
 }
@@ -2323,13 +2331,21 @@
 	if (!xbuf)
 		return NULL;
 	for (i = 0; i < dev->attached_devices; i++) {
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0))
+		name = dev->backdev[i]->fds->f_path.dentry->d_name.name;
+#else
 		name = dev->backdev[i]->fds->f_dentry->d_name.name;
+#endif
 		thash = tiger_hash((char *)name, strlen(name));
 		if (!thash) {
 			/* When tiger is not supported, use a simple UUID construction */
 			thash = kzalloc(TIGER_HASH_LEN, GFP_KERNEL);
 			memcpy(thash,
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0))
+			       dev->backdev[i]->fds->f_path.dentry->d_name.name,
+#else
 			       dev->backdev[i]->fds->f_dentry->d_name.name,
+#endif
 			       hashlen);
 		}
 		for (n = 0; n < hashlen; n++) {
@@ -2414,7 +2430,11 @@
 		dev->backdev[i]->devmagic->clean = DIRTY;
 		write_device_magic(dev, i);
 		dtapolicy = &dev->backdev[i]->devmagic->dtapolicy;
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0))
+		devicename = dev->backdev[i]->fds->f_path.dentry->d_name.name;
+#else
 		devicename = dev->backdev[i]->fds->f_dentry->d_name.name;
+#endif
 		pr_info("device %s registered as tier %u\n", devicename, i);
 		if (0 == dtapolicy->max_age)
 			dtapolicy->max_age = TIERMAXAGE;
@@ -2955,7 +2975,11 @@
 	u64 startofnewbitlist;
 	pr_info("resize device %s devicenr %u from %llu to %llu\n",
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0))
+		dev->backdev[devicenr]->fds->f_path.dentry->d_name.name,
+#else
 		dev->backdev[devicenr]->fds->f_dentry->d_name.name,
+#endif
 		devicenr, dev->backdev[devicenr]->devicesize, curdevsize);
 	startofnewbitlist = newdevsize - newbitlistsize;
 	res =
diff -Naur a/kernel/btier/btier_sysfs.c b/kernel/btier/btier_sysfs.c
--- a/kernel/btier/btier_sysfs.c	2015-10-26 16:30:46.156707705 -0400
+++ b/kernel/btier/btier_sysfs.c	2015-10-28 11:17:07.308023714 -0400
@@ -420,7 +420,11 @@
 	memcpy(devicename, a, p - a);
 	if (0 != strcmp(devicename,
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0))
+			dev->backdev[devicenr]->fds->f_path.dentry->d_name.name)) {
+#else
 			dev->backdev[devicenr]->fds->f_dentry->d_name.name)) {
+#endif
 		kfree(devicename);
 		goto end_error;
 	}
@@ -653,7 +657,11 @@
 	    as_sprintf
 	    ("%7s %20s %15s %15s\n%7u %20s %15u %15u\n", "tier",
 	     "device", "max_age", "hit_collecttime", i,
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0))
+	     dev->backdev[i]->fds->f_path.dentry->d_name.name,
+#else
 	     dev->backdev[i]->fds->f_dentry->d_name.name,
+#endif
 	     dev->backdev[i]->devmagic->dtapolicy.max_age,
 	     dev->backdev[i]->devmagic->dtapolicy.hit_collecttime);
@@ -661,7 +669,11 @@
 	    msg2 = as_sprintf("%s%7u %20s %15u %15u\n", msg, i,
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0))
+			      dev->backdev[i]->fds->f_path.dentry->
+#else
 			      dev->backdev[i]->fds->f_dentry->
+#endif
 			      d_name.name,
 			      dev->backdev[i]->devmagic->dtapolicy.max_age,
@@ -734,7 +746,11 @@
 	    line = as_sprintf
 	    ("%7u %20s %15lu %15llu %15u %15u %15llu %15llu\n", i,
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0))
+	     dev->backdev[i]->fds->f_path.dentry->d_name.name, devblocks,
+#else
 	     dev->backdev[i]->fds->f_dentry->d_name.name, devblocks,
+#endif
 	     allocated, dev->backdev[i]->devmagic->average_reads,
 	     dev->backdev[i]->devmagic->average_writes,
 	     dev->backdev[i]->devmagic->total_reads,
--snip-- |
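The patch above gates each branch on LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0). KERNEL_VERSION packs (major, minor, patch) into a single integer so that kernel versions compare numerically; a small sketch of that encoding (mirroring the macro in the kernel's <linux/version.h>):

```python
# KERNEL_VERSION(a, b, c) in the kernel is ((a) << 16) + ((b) << 8) + (c).
def kernel_version(major, minor, patch):
    return (major << 16) + (minor << 8) + patch

THRESHOLD = kernel_version(3, 19, 0)  # where f_dentry was removed
print(kernel_version(4, 1, 11) >= THRESHOLD)   # True  (Marc's test kernel)
print(kernel_version(3, 19, 8) >= THRESHOLD)   # True  (Marcin's 3.19.8-gentoo)
print(kernel_version(3, 18, 0) >= THRESHOLD)   # False (keeps the f_dentry branch)
```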
From: Marcin M. <ma...@me...> - 2015-09-21 09:36:31
|
On 15.09.2015 at 20:39, Mark Ruijter wrote:
>
> Something like this should fix it.
>
> find . -name '*.c' -type f -exec sed -i s/f_dentry/f_path.dentry/g {} +

Thank you, it works.

Marcin |
From: Marcin M. <ma...@me...> - 2015-09-15 07:49:19
|
Hello!

I'm trying to compile btier-2.0.1 and I'm getting errors:

# LC_ALL=en_US.utf.8 make
make -Wall -C /lib/modules/3.19.8-gentoo/build M=/var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier modules
make[1]: Entering directory '/usr/src/linux-3.19.8-gentoo'
  CC [M]  /var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier/btier_sysfs.o
/var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier/btier_sysfs.c: In function 'tier_attr_migration_policy_store':
/var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier/btier_sysfs.c:386:33: error: 'struct file' has no member named 'f_dentry'
   dev->backdev[devicenr]->fds->f_dentry->d_name.name)) {
/var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier/btier_sysfs.c: In function 'tier_attr_migration_policy_show':
/var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier/btier_sysfs.c:611:29: error: 'struct file' has no member named 'f_dentry'
   dev->backdev[i]->fds->f_dentry->d_name.name,
/var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier/btier_sysfs.c:619:32: error: 'struct file' has no member named 'f_dentry'
   dev->backdev[i]->fds->f_dentry->
/var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier/btier_sysfs.c: In function 'tier_attr_device_usage_show':
/var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier/btier_sysfs.c:706:28: error: 'struct file' has no member named 'f_dentry'
   dev->backdev[i]->fds->f_dentry->d_name.name, devblocks,
scripts/Makefile.build:257: recipe for target '/var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier/btier_sysfs.o' failed
make[2]: *** [/var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier/btier_sysfs.o] Error 1
Makefile:1382: recipe for target '_module_/var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier' failed
make[1]: *** [_module_/var/tmp/portage/sys-block/btier-2.0.1/work/btier-2.0.1/kernel/btier] Error 2
make[1]: Leaving directory '/usr/src/linux-3.19.8-gentoo'
Makefile:6: recipe for target 'default' failed
make: *** [default] Error 2

It looks like kernel 3.19 introduced some changes that trigger this error: https://github.com/zfsonlinux/zfs/issues/2959

Is there a chance for a btier-2.0.2 with this bug fixed? :)

Thanks, Marcin |
From: T.C. F. <tc...@gm...> - 2015-07-07 18:44:03
|
Hey Mark,

Appreciate the response. I'm definitely very familiar with RAID and the associated overhead. I was using the sequential bypass feature and was only seeing 30 MB/s or so, and was seeing 8k IO to the backing MD device. With btier removed, that same backing device was doing 300 MB/s with 1MB write chunks and 800+ MB/s with 10MB write chunks.

I think it might have to do with the get_chunksize function? I would think we would want to lay out stripe-width chunks for a backing MD device to avoid the read-modify-write penalty? With 2.x I saw 50 MB/s and 600k IO to the MD device (but still not stripe width). Still off from the 300 MB/s and 800 MB/s I'm seeing without btier. I'm no kernel module developer though.

On Jul 7, 2015 7:45 AM, "Mark Ruijter" <mru...@gm...> wrote:
> Btier divides the virtual block device that it presents into segments of 1MB.
> These segments are used for data migration purposes only. [...] |
From: Mark R. <mru...@gm...> - 2015-07-07 14:44:53
|
Btier divides the virtual block device that it presents into segments of 1MB. These segments are used for data migration purposes only.

Writes to the underlying devices are done as usual after redirecting them to the underlying device that is desired for that particular IO type. For example, sequential writes may go directly to RAID6, while random writes are handled by the RAID1 SSDs.

When it comes to RAID6, are you aware of the 1:6 worst-case write penalty that comes with it? This is one of the countless articles about RAID penalty that are out there:

https://sudrsn.wordpress.com/2010/12/25/iops-raid-penalty-and-workload-characterization/

12 drives in RAID6 (SATA) can handle 12 * 150 / 6 = 300 random 4k IOPS.

A typical btier setup would use something like a RAID1 with SSDs + RAID6 SATA. The two SSDs would be able to handle 40~140K IOPS, and the RAID6 would still give good sequential write performance.

Mark.

On 7/6/15 7:43 PM, T.C. Ferguson wrote:
> I see that BTIER is using a 1MB chunk on disk, but I don't necessarily
> see any specific configs / optimizations for optimal stripe width, etc. [...] |
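Mark's arithmetic is an instance of the general RAID rule of thumb cited in the linked article: worst-case random-write IOPS ≈ disks × per-disk IOPS / write penalty. A small sketch using his numbers:

```python
# Worst-case random-write IOPS for an array, per the RAID write-penalty
# rule of thumb from the article linked above.
def raid_write_iops(disks, iops_per_disk, penalty):
    return disks * iops_per_disk // penalty

# 12 SATA drives at ~150 IOPS each, RAID6 write penalty of 6.
print(raid_write_iops(12, 150, 6))  # 300
```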
From: T.C. F. <tc...@gm...> - 2015-07-06 17:43:52
|
I see that BTIER is using a 1MB chunk on disk, but I don't necessarily see any specific configs / optimizations for optimal stripe width, etc.

It would be great if we could get full stripe writes down to a parity set. Just for fun I wanted to see what it would look like against an mdadm RAID6 set with 12 disks. The read-modify-write penalty was very, very heavy (lots of reads to accomplish a pure write workload of any size).

I couldn't find anything about this topic on the list prior. Thoughts? Suggestions?

BTIER does some really cool things. We are big fans. Just trying to understand the legacy before we dive in to any specific direction.

--
T.C. Ferguson
(916) 258-2568 | Mobile
tc...@gm... |
From: T.C. F. <tc...@gm...> - 2015-07-03 06:03:45
|
With BTIER announcing an optimal filesystem size of 1MB, what are the optimal filesystem mkfs parameters? (I.e. for ext4, what are the stride and stripe-width settings? xfs, etc.) Thanks!

--
T.C. Ferguson
(916) 258-2568 | Mobile
tc...@gm... |
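For ext4, stride and stripe-width are conventionally derived from the chunk size and the number of data disks (stride = chunk size / filesystem block size; stripe-width = stride × data disks) and passed to mkfs.ext4 via its -E extended options (see mke2fs(8) for the exact spelling). A sketch of the arithmetic, assuming a 1 MiB chunk to match btier's allocation unit, ext4's usual 4 KiB blocks, and a hypothetical 12-disk RAID6 (10 data disks) — the disk counts are illustrative, not from the thread:

```python
# Derive ext4 stride/stripe-width from a RAID-style layout.
def ext4_raid_params(chunk_kib, data_disks, block_kib=4):
    stride = chunk_kib // block_kib      # filesystem blocks per chunk
    stripe_width = stride * data_disks   # filesystem blocks per full stripe
    return stride, stripe_width

# 1 MiB chunk, 10 data disks (e.g. 12-disk RAID6).
print(ext4_raid_params(1024, 10))  # (256, 2560)
```

The resulting values would go into something like `mkfs.ext4 -E stride=256,stripe-width=2560 /dev/sdtiera1` (hypothetical device name).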
From: T.C. F. <tc...@gm...> - 2015-07-02 22:24:08
|
Hello,

I have a use case where I would like to have more than 26 tiered devices. Any chance this is something that is being worked on? The naming seems very static. Any chance we could take a page out of mdadm's book and look to metadata and udev to insert the device into the system /dev directory? Also allow an arbitrary name to be set on creation that would be used in naming the device when it is brought online?

--
T.C. Ferguson
(916) 258-2568 | Mobile
tc...@gm... |
From: Geoff B. <geo...@gm...> - 2015-05-25 16:31:36
|
I'm just starting to experiment with btier, and 2.0.1 felt like the right version. I got wrapped around the axle just trying to build it, at least partly because it's been a few years since I spent much time with Debian, and just as many years since I built a kernel module that wasn't available from my package manager. I'd like to share what I did in case it's helpful for anyone else, and also to see if anyone recognizes any red flags in my setup.

Trying to just build it per the README, I installed the linux-headers and linux-headers-amd64 packages (the latter should give me the part of the kernel source tree that's necessary) and ran make. I got this output instead of a kernel module: http://pastebin.com/TmiRw8vt

One of the good folks on #bcache pointed me to dkms. So I copied the files from kernel/btier to /usr/src/btier-2.0.1 and created a dkms.conf file with http://paste.debian.net/183394/

Then I was able to run

dkms -m btier -v 2.0.1 build
dkms -m btier -v 2.0.1 install

to compile and install the ko. Based on reading up/asking around, it looks like this has the nice benefit of making sure that kernel updates from apt will recompile and reload the btier module automagically. Then I edited the makefile to skip building and installing the kernel module: http://paste.debian.net/183404/

Now back to experimenting with btier :D I hope this either helps someone else who tries this or causes someone to loudly call me an idiot and tell me what I just did incorrectly.

Thanks
Geoff |
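The dkms.conf behind the paste link is no longer retrievable, so for anyone retracing these steps, a minimal hypothetical dkms.conf for an out-of-tree module like this usually looks something like the following. This is a sketch using standard dkms.conf keys, not the poster's actual file:

```shell
# Hypothetical /usr/src/btier-2.0.1/dkms.conf -- a minimal sketch, not the
# file from the (dead) paste link above.
PACKAGE_NAME="btier"
PACKAGE_VERSION="2.0.1"
BUILT_MODULE_NAME[0]="btier"
DEST_MODULE_LOCATION[0]="/updates/dkms"
AUTOINSTALL="yes"
```

With AUTOINSTALL="yes", dkms rebuilds the module when a new kernel is installed, which matches the "automagic" behavior described above.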
From: Joris D. <Jo...@fa...> - 2015-05-25 13:18:33
|
>-----Original Message-----
>From: Joris Dobbelsteen
>Sent: Sunday, May 24, 2015 1:27
>To: 'tie...@li...'
>Subject: btier resize fails
>
>Hi,
>
>I've been testing btier 1.3.10 on a Debian Wheezy virtual machine (host
>is CentOS 6.6 KVM). Unfortunately, there is a crash when a resize is
>attempted on a tier with 3 storage devices. Tier2 (sdb1) has been
>resized from 800GB to 1.2TB. Details are attached below.

I have tried to resize btier again after waiting for some time. This time the first and second tier data had mostly migrated. This seems to solve the first issue, but shows something interesting in the dmesg, which looks like a datatype issue:

btier: Device sdtiera is resized from 1084435202048 to 1513920397312
sdtiera: detected capacity change from 1084435202048 to -685102858240

This looks to be a signed/unsigned mismatch when assigning the size, in (512 byte) blocks, to a 32-bit signed int.

>Can you help me find out what the problem is, or avoid it?

Would this added information help find the issue?

- Joris Dobbelsteen

[snip]
>
>Tree: sdd1 + sdc1 + sdb1
> --> sdtiera
> --> LVM2
> --> 1 logical volume with ext4
>
[snip]

[    2.425688] btier: version : 1.3.10
[    2.428942] btier: device sdd1 is not clean, check forced
[    2.428944] btier: recover_journal : journal is clean, no need to recover
[    2.430731] btier: device sdd1 registered as tier 0
[    2.430734] btier: device sdd1 is a real device
[    2.430735] btier: device sdc1 is not clean, check forced
[    2.430736] btier: recover_journal : journal is clean, no need to recover
[    2.430742] btier: device sdc1 registered as tier 1
[    2.430743] btier: device sdc1 is a real device
[    2.430744] btier: device sdb1 is not clean, check forced
[    2.430745] btier: recover_journal : journal is clean, no need to recover
[    2.430750] btier: device sdb1 registered as tier 2
[    2.430751] btier: device sdb1 is a real device
[    2.430752] btier: repair_bitlists : clearing and rebuilding bitlists
[    2.430753] btier: dev->blocklistsize : 0x1c00000 (29360128)
[    2.430754] btier: backdev->devicesize : 0x27ff00000 (10736369664)
[    2.430755] btier: backdev->startofdata : 0x100000
[    2.430760] btier: backdev->bitlistsize : 0x100000
[    2.430761] btier: backdev->startofbitlist : 0x27fe00000
[    2.430762] btier: backdev->endofdata : 0x27e1fffff
[    2.430762] btier: backdev->devicesize : 0x31fff00000 (214747316224)
[    2.430763] btier: backdev->startofdata : 0x100000
[    2.430764] btier: backdev->bitlistsize : 0x100000
[    2.430764] btier: backdev->startofbitlist : 0x31ffe00000
[    2.430765] btier: backdev->endofdata : 0x31ffdfffff
[    2.430765] btier: backdev->devicesize : 0xc7fff00000 (858992410624)
[    2.430766] btier: backdev->startofdata : 0x100000
[    2.430767] btier: backdev->bitlistsize : 0x100000
[    2.430767] btier: backdev->startofbitlist : 0xc7ffe00000
[    2.430768] btier: backdev->endofdata : 0xc7ffdfffff
[    2.430768] btier: dev->backdev[0]->startofblocklist: 0x27e200000
[    2.683382] sdtiera: unknown partition table
[    2.683853] btier: write mode = vfs to devices and files
[131314.151214] btier: curdevsize = 10736369664 old = 10736369664
[131314.151423] btier: curdevsize = 214747316224 old = 214747316224
[131314.151617] btier: curdevsize = 1288489140224 old = 858992410624
[131314.151811] btier: newblocklistsize=40894464
[131314.151954] btier: resize device sdb1 devicenr 2 from 858992410624 to 1288489140224
[131314.152215] btier: migrate_bitlist : device 2
[131314.155778] btier: copylist device 2, ostart 0xc7ffe00000 (858991362048) osize 0x100000 (1048576), nstart 0x12bffd00000 (1288487043072) end 0x12bffe00000 (1288488091648)
[131314.332520] btier: migrate_data_if_needed
[131314.349525] btier: migrate_data_if_needed return 0
[131314.349681] btier: copylist device 0, ostart 0x27e200000 (10705960960) osize 0x1c00000 (29360128), nstart 0x27d700000 (10694426624) end 0x27f300000 (10723786752)
[131314.949143] btier: Device sdtiera is resized from 1084435202048 to 1513920397312
[131314.949397] sdtiera: detected capacity change from 1084435202048 to -685102858240
[131315.230410] BUG: unable to handle kernel paging request at ffffc90002c2e000
[131315.230633] IP: [<ffffffffa022af9b>] load_blocklist+0x80/0xea [btier]
[131315.230854] PGD 3e139067 PUD 3e13a067 PMD 10031067 PTE 0
[131315.231303] Oops: 0002 [#1] SMP
[131315.231613] CPU 0
[131315.231673] Modules linked in: ext4 crc16 jbd2 xfs tgr192 btier(O) processor crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 button thermal_sys snd_pcm snd_page_alloc i2c_piix4 snd_timer psmouse i2c_core snd virtio_balloon aes_generic evdev serio_raw cryptd soundcore pcspkr virtio_console ext3 mbcache jbd dm_mod sr_mod cdrom sg sd_mod crc_t10dif ata_generic floppy virtio_net virtio_scsi ata_piix uhci_hcd ehci_hcd libata scsi_mod usbcore usb_common virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan]
[131315.232005]
[131315.232005] Pid: 15638, comm: bash Tainted: G O 3.2.0-4-amd64 #1 Debian 3.2.68-1+deb7u1 oVirt oVirt Node
[131315.232005] RIP: 0010:[<ffffffffa022af9b>]  [<ffffffffa022af9b>] load_blocklist+0x80/0xea [btier]
[131315.232005] RSP: 0018:ffff8800256b3e58  EFLAGS: 00010292
[131315.232005] RAX: ffff88000631f2c0 RBX: ffff88003776c000 RCX: 0000000000000020
[131315.232005] RDX: 000000000000001c RSI: ffff88000631f2c0 RDI: ffffffffa022af9b
[131315.232005] RBP: 0000000000164a00 R08: 00000000000080d0 R09: 0000000000000000
[131315.232005] R10: ffff8800256b3d98 R11: ffff8800256b3d98 R12: ffff880036ea6740
[131315.232005] R13: 00000ffffff607cb R14: ffff88003776c21c R15: ffffc90002c2e000
[131315.232005] FS:  00007f6e77f33700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[131315.232005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[131315.232005] CR2: ffffc90002c2e000 CR3: 000000000a9f5000 CR4: 00000000001406f0
[131315.232005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[131315.232005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[131315.232005] Process bash (pid: 15638, threadinfo ffff8800256b2000, task ffff880036cee0c0)
[131315.232005] Stack:
[131315.232005]  0000000300000001 ffff88003776c21c 0000000000000002 ffff88003776c000
[131315.232005]  ffff88003776c120 0000000000000002 ffffffff8143d530 ffff880037169878
[131315.232005]  ffff88000dae9a40 ffffffffa0229787 ffff8800342ddb50 ffff88000dae9a60
[131315.232005] Call Trace:
[131315.232005]  [<ffffffffa0229787>] ? tier_attr_resize_store+0x2e/0x46 [btier]
[131315.232005]  [<ffffffff811500b7>] ? sysfs_write_file+0xe0/0x11c
[131315.232005]  [<ffffffff810fb559>] ? vfs_write+0xa2/0xe9
[131315.232005]  [<ffffffff810fb736>] ? sys_write+0x45/0x6b
[131315.232005]  [<ffffffff81356172>] ? system_call_fastpath+0x16/0x1b
[131315.232005] Code: 00 00 31 c0 31 ed 4c 8d b3 1c 02 00 00 eb 63 4c 8d 3c ed 00 00 00 00 4d 03 7c 24 68 be d0 00 00 00 bf 1c 00 00 00 e8 fb f7 ff ff <49> 89 07 49 8b 44 24 68 48 8b 14 e8 48 85 d2 74 3d 4c 6b c5 1c
[131315.232005] RIP  [<ffffffffa022af9b>] load_blocklist+0x80/0xea [btier]
[131315.232005] RSP <ffff8800256b3e58>
[131315.232005] CR2: ffffc90002c2e000
[131315.232005] ---[ end trace 82a76019b5a74bd1 ]--- |
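The negative capacity in the dmesg above is consistent with Joris's datatype theory: the resized device's 512-byte sector count no longer fits in a signed 32-bit integer, and truncating it reproduces the kernel's printed value exactly.

```python
import ctypes

# Numbers taken from the dmesg above.
new_size = 1513920397312                 # bytes after the resize
sectors = new_size // 512                # 2956875776, which exceeds 2**31 - 1
wrapped = ctypes.c_int32(sectors).value  # what a signed 32-bit field sees
print(wrapped * 512)                     # -685102858240, as printed by the kernel
```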
From: Joris D. <Jo...@fa...> - 2015-05-23 23:43:30
|
Hi,

I've been testing btier 1.3.10 on a Debian Wheezy virtual machine (host is CentOS 6.6 KVM). Unfortunately, there is a crash when a resize is attempted on a tier with 3 storage devices. Tier2 (sdb1) has been resized from 800GB to 1.2TB. Details are attached below.

Can you help me find out what the problem is, or avoid it? I did not manage to dive deep enough into the btier code to diagnose the issue though. The GCC optimizer does not make code analysis easier, as some intermediate function entry points are missing. Is there some documentation to get started with the btier code?

Best regards,

- Joris Dobbelsteen

The tier consists of these drives (after reboot):

TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
0 sdd1 10208 10208 0 0 631 1
1 sdc1 204796 204796 0 0 68 0
2 sdb1 819196 715473 0 0 12 0

Tree: sdd1 + sdc1 + sdb1
 --> sdtiera
 --> LVM2
 --> 1 logical volume with ext4

Dmesg shows:

[  153.845601] btier: migrate_data_if_needed : blocknr 366623 from device 0
[  153.846056] btier: migrate_data_if_needed : blocknr 366624 from device 0
[  153.846475] btier: migrate_data_if_needed : blocknr 366625 from device 0
[  153.846940] btier: migrate_data_if_needed : blocknr 368641 from device 0
[  153.848470] btier: migrate_data_if_needed : blocknr 368642 from device 0
[  153.848954] btier: migrate_data_if_needed : blocknr 368643 from device 0
[  153.849363] btier: migrate_data_if_needed : blocknr 368644 from device 0
[  153.849789] btier: migrate_data_if_needed : blocknr 368645 from device 0
[  153.850215] btier: migrate_data_if_needed : blocknr 368646 from device 0
[  153.850630] btier: migrate_data_if_needed : blocknr 368647 from device 0
[  153.851124] btier: migrate_data_if_needed : blocknr 368648 from device 0
[  153.851549] btier: migrate_data_if_needed : blocknr 368649 from device 0
[  153.851989] btier: migrate_data_if_needed : blocknr 368650 from device 0
[  153.852421] btier: migrate_data_if_needed : blocknr 368651 from device 0
[  153.852914] btier: migrate_data_if_needed : blocknr 368652 from device 0
[  153.853339] btier: Call copyblock blocknr 368652 from device 0 to device 2
[  153.853903] BUG: unable to handle kernel NULL pointer dereference at (null)
[  153.854613] IP: [<ffffffffa020a20b>] allocate_dev.isra.16+0x54/0x10d [btier]
[  153.855065] PGD 3c7c2067 PUD 33c09067 PMD 0
[  153.855516] Oops: 0000 [#1] SMP
[  153.855914] CPU 0
[  153.855998] Modules linked in: ext4 crc16 jbd2 xfs tgr192 btier(O) snd_pcm snd_page_alloc snd_timer snd i2c_piix4 crc32c_intel soundcore psmouse processor i2c_core ghash_clmulni_intel button pcspkr serio_raw evdev virtio_balloon thermal_sys aesni_intel aes_x86_64 aes_generic cryptd virtio_console ext3 mbcache jbd dm_mod sr_mod sg cdrom sd_mod crc_t10dif ata_generic virtio_scsi virtio_net floppy ata_piix uhci_hcd ehci_hcd libata scsi_mod usbcore usb_common virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan]
[  153.856417]
[  153.856417] Pid: 2699, comm: bash Tainted: G O 3.2.0-4-amd64 #1 Debian 3.2.68-1+deb7u1 oVirt oVirt Node
[  153.856417] RIP: 0010:[<ffffffffa020a20b>]  [<ffffffffa020a20b>] allocate_dev.isra.16+0x54/0x10d [btier]
[  153.856417] RSP: 0018:ffff88003c729d68  EFLAGS: 00010246
[  153.856417] RAX: 0000000000000000 RBX: ffff88003a4c7d40 RCX: ffff88003b9a6000
[  153.856417] RDX: 0000000000000002 RSI: ffff880036d12f40 RDI: ffff88003b9a6000
[  153.856417] RBP: ffff880036d12f40 R08: 0000000000000000 R09: 0000000000000003
[  153.856417] R10: 0000000000000000 R11: ffffffff8160001a R12: 0000000000000000
[  153.856417] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000002
[  153.856417] FS:  00007fbf6cb95700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  153.856417] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  153.856417] CR2: 0000000000000000 CR3: 00000000371f5000 CR4: 00000000001406f0
[  153.856417] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  153.856417] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[
153.856417] Process bash (pid: 2699, threadinfo ffff88003c728000, task ffff88 0036fdc740) [ 153.856417] Stack: [ 153.856417] 0000000000000000 ffff88003b9a6000 0000000000000002 0000000000000 000 [ 153.856417] 0000000002700000 ffff880036d12f40 ffff88003b9a6000 ffff880038b64 ae0 [ 153.856417] 0000000000000003 0000000000000000 000000000005a00c ffffffffa020a 767 [ 153.856417] Call Trace: [ 153.856417] [<ffffffffa020a767>] ? copyblock+0x67/0x22d [btier] [ 153.856417] [<ffffffffa020dc51>] ? resize_tier+0x423/0x70e [btier] [ 153.856417] [<ffffffffa0208787>] ? tier_attr_resize_store+0x2e/0x46 [btier] [ 153.856417] [<ffffffff811500b7>] ? sysfs_write_file+0xe0/0x11c [ 153.856417] [<ffffffff810fb559>] ? vfs_write+0xa2/0xe9 [ 153.856417] [<ffffffff810fb736>] ? sys_write+0x45/0x6b [ 153.856417] [<ffffffff81356172>] ? system_call_fastpath+0x16/0x1b [ 153.856417] Code: 40 49 c1 e8 0c 4d 89 c5 49 c1 e0 20 49 c1 e5 0c e9 ae 00 00 00 48 8b 43 70 45 31 e4 4c 01 e8 48 89 44 24 18 eb 7f 48 8b 44 24 18 <42> 80 3c 20 ff 74 67 45 89 e6 41 c1 e6 14 4d 01 c6 4c 89 75 04 [ 153.856417] RIP [<ffffffffa020a20b>] allocate_dev.isra.16+0x54/0x10d [btier] [ 153.856417] RSP <ffff88003c729d68> [ 153.856417] CR2: 0000000000000000 [ 153.879857] ---[ end trace 1722e7e7ab7546a5 ]--- |
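The `device_usage` table in the report above is plain whitespace-separated text read from sysfs. A small parser makes it easy to watch allocation per tier during a resize or migration; this is only a sketch based on the column layout shown in these posts (not on btier documentation), and it assumes the multi-word headers such as "SIZE MB" occupy one column of data each:

```python
def parse_device_usage(text):
    """Parse btier's device_usage output into per-tier dicts.

    Assumes the column order seen above: TIER, DEVICE, SIZE MB,
    ALLOCATED MB, AVERAGE READS, AVERAGE WRITES, TOTAL_READS, TOTAL_WRITES.
    """
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 8:
            continue
        rows.append({
            "tier": int(fields[0]), "device": fields[1],
            "size_mb": int(fields[2]), "allocated_mb": int(fields[3]),
            "total_reads": int(fields[6]), "total_writes": int(fields[7]),
        })
    return rows

# Sample taken from the crash report above.
usage = """\
TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
0 sdd1 10208 10208 0 0 631 1
1 sdc1 204796 204796 0 0 68 0
2 sdb1 819196 715473 0 0 12 0"""

for r in parse_device_usage(usage):
    pct = 100.0 * r["allocated_mb"] / r["size_mb"]
    print(f"tier {r['tier']} ({r['device']}): {pct:.1f}% allocated")
```

On the sample above this shows tiers 0 and 1 fully allocated and tier 2 at roughly 87%, which is consistent with migration pushing blocks down to the resized device when the crash occurred.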
From: Chris B. <ch...@ce...> - 2015-04-06 10:50:43
|
I use my own userspace migration script and had seen this before, but not for a while. Today I managed to trigger it again :)

[root@i7 tier]# cat device_usage
TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
0 md0 457818 175977 138 78 63509917 35931320
1 dm-4 1468004 605481 3088377346 3088377346 18446744073709320901 18446744073709551005

Any idea why this might be happening? Thanks, Chris |
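Those enormous TOTAL_READS/TOTAL_WRITES values sit just below 2^64, which is what a small negative counter looks like when printed as unsigned 64-bit. A quick check (this only reinterprets the printed numbers; it does not show where btier underflows, just that the values decode to small negatives):

```python
def as_signed64(u):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    assert 0 <= u < 2**64
    return u - 2**64 if u >= 2**63 else u

# The two TOTAL_* values from the dm-4 row above.
print(as_signed64(18446744073709320901))  # -230715
print(as_signed64(18446744073709551005))  # -611
```

So the counters appear to have been decremented below zero, i.e. an accounting bug rather than genuinely huge I/O totals.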
From: Chris B. <ch...@ce...> - 2015-03-22 00:17:54
|
Hi Mark, Sorry, I was away for the week for work.

> Kernel 3.14 introduced a significant amount of changes for the BIO code.

I've observed the same corruption with kernel 3.10 + btier 1.3.11 when ZFS is in use in BIO mode. Note this is with CentOS 7, whose upstream kernel frequently backports features. Just something to consider.

[root@localhost t1]# modinfo btier
filename: /lib/modules/3.10.0-123.20.1.el7.x86_64/kernel/drivers/block/btier.ko
author: Mark Ruijter
license: GPL
srcversion: C0D0950FCD6C0FD7137A74F
depends:
vermagic: 3.10.0-123.20.1.el7.x86_64 SMP mod_unload modversions
[root@localhost t1]# strings /lib/modules/3.10.0-123.20.1.el7.x86_64/kernel/drivers/block/btier.ko | grep ^1.3
1.3.11

[root@localhost ~]# btier_setup -f /dev/sdb -c -B
Device size (raw) : 0x500000000 (21474836480)
Device size (rnd) : 0x500000000 (21474836480)
Clearing bitlist of device : /dev/sdb
offset : 0x4fff00000 (21473787904)
device size : 0x500000000 (21474836480)
bitlist size : 0x100000 (1048576)
Total device size : 0x4ffc00000 (21470642176)
Clearing blocklist of device : /dev/sdb
list size : 0x100000 (1048576)
starting from offset : 0x4ffe00000 (21472739328)
write_device_magic device : 0
size : 0x4ffc00000 (21470642176)

[root@localhost ~]# ls -la /dev/sdtiera
brw-rw----. 1 root disk 252, 0 Mar 22 10:32 /dev/sdtiera
[root@localhost ~]# zpool create p1 /dev/sdtiera -f
[root@localhost ~]# zfs create p1/t1
[root@localhost ~]# cd /p1/t1/
[root@localhost t1]# dd_rescue /dev/urandom test.random
^Cdd_rescue: (fatal): Caught signal 2 "Interrupt". Flush and exit after current block!
dd_rescue: (info): Summary for /dev/urandom -> test.random
dd_rescue: (info): ipos: 107648.0k, opos: 107648.0k, xferd: 107648.0k
errs: 0, errxfer: 0.0k, succxfer: 107648.0k
+curr.rate: 11165kB/s, avg.rate: 14095kB/s, avg.load: 88.1%

[root@localhost t1]# ls -la
total 107731
drwxr-xr-x. 2 root root 3 Mar 22 10:34 .
drwxr-xr-x. 3 root root 3 Mar 22 10:34 ..
-rw-r-----. 1 root root 110231552 Mar 22 10:34 test.random
[root@localhost t1]# du -sh test.random
106M test.random

[root@localhost t1]# cat /sys/devices/virtual/block/sdtiera/tier/device_usage
TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
0 sdb 20476 112 0 1 526 27462

[root@localhost t1]# zpool status
pool: p1
state: ONLINE
scan: none requested
config:

NAME STATE READ WRITE CKSUM
p1 ONLINE 0 0 0
sdtiera ONLINE 0 0 0

errors: No known data errors

[root@localhost t1]# zpool scrub p1
[root@localhost t1]# zpool status -v
pool: p1
state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 2K in 0h0m with 105 errors on Sun Mar 22 10:36:35 2015
config:

NAME STATE READ WRITE CKSUM
p1 ONLINE 0 0 105
sdtiera ONLINE 0 0 214

errors: Permanent errors have been detected in the following files:
/p1/t1/test.random |
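A filesystem-agnostic way to look for this kind of silent corruption, without involving ZFS checksumming, is to write pseudo-random data through the tiered device's filesystem and compare checksums on re-read. The sketch below assumes the target path sits on the filesystem under test; note that on a live system the re-read may be served from the page cache, so caches should be dropped (e.g. via `/proc/sys/vm/drop_caches`) between the write and the read for the comparison to actually exercise the device:

```python
import hashlib
import os
import tempfile

def checksum_roundtrip(path, size_mb=10, chunk=1 << 20):
    """Write pseudo-random data to `path`, re-read it, compare SHA-256 digests.

    Returns True when the data read back matches what was written.
    """
    h_write = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            block = os.urandom(chunk)
            h_write.update(block)
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force the data through to the block layer
    h_read = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h_read.update(block)
    return h_write.hexdigest() == h_read.hexdigest()

if __name__ == "__main__":
    # Demo against a temp file; in practice point this at a file on the
    # filesystem that lives on the sdtier device.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        target = tmp.name
    try:
        print("OK" if checksum_roundtrip(target) else "CORRUPTION DETECTED")
    finally:
        os.unlink(target)
```

Unlike a scrub, this only checks one round trip, but it removes ZFS from the equation and so helps distinguish a btier BIO-path bug from a ZFS-specific interaction.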
From: Mark R. <mru...@gm...> - 2015-03-16 19:51:34
|
I have one last question. Is it possible to reproduce the problem with btier 1.3.11 on a kernel <3.14? Kernel 3.14 introduced a significant amount of changes for the BIO code.

Thanks in advance, Mark

On 15 Mar. 2015 at 00:33, Chris Bennett <ch...@ce...> wrote:
>> Does btier-2.0.1 appear to suffer from the same corruption?
>
> Just performed the same set of steps and no signs of corruption.
>
> Regards,
>
> Chris |
From: Chris B. <ch...@ce...> - 2015-03-14 23:33:32
|
> Does btier-2.0.1 appear to suffer from the same corruption?

Just performed the same set of steps and no signs of corruption. Regards, Chris |
From: Mark R. <mru...@gm...> - 2015-03-14 20:22:16
|
Does btier-2.0.1 appear to suffer from the same corruption? That would be interesting to know, since it is very different compared to 1.3.11. Also strange that this has only been observed with zfsonlinux. Thanks, Mark

Sent from my iPhone

> On 14 Mar. 2015 at 20:35, Chris Bennett <ch...@ce...> wrote:
>
> Hi Mark,
>
> I have just tried to use ZFS on a sdtier block device, observed errors
> while performing ZFS scrub and found this thread in recent months with
> similar symptoms. I am not seeing a crash (yet) but can reliably
> reproduce corruption (see below). With a small amount of testing, I
> am not seeing the same corruption with VFS.
>
> Happy to perform any further testing if required.
>
> Regards,
>
> Chris
>
> [root@i7 /]# btier_setup -f /dev/vg_ssd0/btier02_ssd:/dev/WD4003FZEX-00Z4SA0_WD-WMC5H0D6DW0N/btier02_hdd_wd4tb -c -B | ts
> Mar 15 05:55:13
> Mar 15 05:55:13 Device size (raw) : 0x672400000 (27686600704)
> Mar 15 05:55:13 Device size (rnd) : 0x672400000 (27686600704)
> Mar 15 05:55:13 Clearing bitlist of device : /dev/vg_ssd0/btier02_ssd
> Mar 15 05:55:13 offset : 0x672300000 (27685552128)
> Mar 15 05:55:13 device size : 0x672400000 (27686600704)
> Mar 15 05:55:13 bitlist size : 0x100000 (1048576)
> Mar 15 05:55:13
> Mar 15 05:55:13 Device size (raw) : 0x3200000000 (214748364800)
> Mar 15 05:55:13 Device size (rnd) : 0x3200000000 (214748364800)
> Mar 15 05:55:13 Clearing bitlist of device : /dev/WD4003FZEX-00Z4SA0_WD-WMC5H0D6DW0N/btier02_hdd_wd4tb
> Mar 15 05:55:13 offset : 0x31fff00000 (214747316224)
> Mar 15 05:55:13 device size : 0x3200000000 (214748364800)
> Mar 15 05:55:13 bitlist size : 0x100000 (1048576)
> Mar 15 05:55:13
> Mar 15 05:55:13 Total device size : 0x3871600000 (242420285440)
> Mar 15 05:55:13 Clearing blocklist of device : /dev/vg_ssd0/btier02_ssd
> Mar 15 05:55:13 list size : 0x700000 (7340032)
> Mar 15 05:55:13 starting from offset : 0x671c00000 (27678212096)
> Mar 15 05:55:13
> Mar 15 05:55:13 write_device_magic device : 0
> Mar 15 05:55:13 size : 0x3871600000 (242420285440)
> Mar 15 05:55:13 write_device_magic device : 1
> Mar 15 05:55:13 size : 0x3871600000 (242420285440)
>
> [root@i7 /]# zpool create p2 /dev/sdtierb -f | ts
> [root@i7 /]# zfs create p2/t1 | ts
>
> [root@i7 t1]# zpool status | ts
> Mar 15 05:55:47 pool: p2
> Mar 15 05:55:47 state: ONLINE
> Mar 15 05:55:47 scan: none requested
> Mar 15 05:55:47 config:
> Mar 15 05:55:47
> Mar 15 05:55:47 NAME STATE READ WRITE CKSUM
> Mar 15 05:55:47 p2 ONLINE 0 0 0
> Mar 15 05:55:47 sdtierb ONLINE 0 0 0
> Mar 15 05:55:47
> Mar 15 05:55:47 errors: No known data errors
>
> [root@i7 t1]# cd /p2/t1
> [root@i7 t1]# ls -al | ts
> Mar 15 05:56:03 total 3
> Mar 15 05:56:03 drwxr-xr-x 2 root root 2 Mar 15 05:55 .
> Mar 15 05:56:03 drwxr-xr-x 3 root root 3 Mar 15 05:55 ..
>
> [root@i7 t1]# cat /sys/devices/virtual/block/sdtierb/tier/device_usage | ts
> Mar 15 05:56:42 TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
> Mar 15 05:56:42 0 dm-6 26394 7 0 0 302 692
> Mar 15 05:56:42 1 dm-10 204797 0 0 0 0 0
> Mar 15 05:56:42
>
> [root@i7 t1]# dd if=/dev/urandom of=/p2/t1/test.urandom bs=1M count=10
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 0.742522 s, 14.1 MB/s
>
> [root@i7 t1]# cat /sys/devices/virtual/block/sdtierb/tier/device_usage | ts
> Mar 15 05:57:37 TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
> Mar 15 05:57:37 0 dm-6 26394 17 0 0 304 3451
> Mar 15 05:57:37 1 dm-10 204797 0 0 0 0 0
> Mar 15 05:57:37
>
> [root@i7 t1]# zpool status -v | ts
> Mar 15 05:59:21 pool: p2
> Mar 15 05:59:21 state: ONLINE
> Mar 15 05:59:21 status: One or more devices has experienced an error resulting in data
> Mar 15 05:59:21 corruption. Applications may be affected.
> Mar 15 05:59:21 action: Restore the file in question if possible. Otherwise restore the
> Mar 15 05:59:21 entire pool from backup.
> Mar 15 05:59:21 see: http://zfsonlinux.org/msg/ZFS-8000-8A
> Mar 15 05:59:21 scan: scrub repaired 3K in 0h0m with 10 errors on Sun Mar 15 05:59:00 2015
> Mar 15 05:59:21 config:
> Mar 15 05:59:21
> Mar 15 05:59:21 NAME STATE READ WRITE CKSUM
> Mar 15 05:59:21 p2 ONLINE 0 0 10
> Mar 15 05:59:21 sdtierb ONLINE 0 0 25
> Mar 15 05:59:21
> Mar 15 05:59:21 errors: Permanent errors have been detected in the following files:
> Mar 15 05:59:21
> Mar 15 05:59:21 /p2/t1/test.urandom
>
> [root@i7 t1]# modinfo btier
> filename: /lib/modules/3.14.27-100.fc19.x86_64/kernel/drivers/block/btier.ko
> author: Mark Ruijter
> license: GPL
> depends:
> vermagic: 3.14.27-100.fc19.x86_64 SMP mod_unload
> [root@i7 t1]# strings /lib/modules/3.14.27-100.fc19.x86_64/kernel/drivers/block/btier.ko | less
> [root@i7 t1]# strings /lib/modules/3.14.27-100.fc19.x86_64/kernel/drivers/block/btier.ko | grep 1.3
> 1.3.11
> /home/chris/btier-1.3.11/kernel/btier/btier_main.c |
From: Chris B. <ch...@ce...> - 2015-03-14 19:35:23
|
Hi Mark,

I have just tried to use ZFS on a sdtier block device, observed errors while performing ZFS scrub and found this thread in recent months with similar symptoms. I am not seeing a crash (yet) but can reliably reproduce corruption (see below). With a small amount of testing, I am not seeing the same corruption with VFS.

Happy to perform any further testing if required.

Regards, Chris

[root@i7 /]# btier_setup -f /dev/vg_ssd0/btier02_ssd:/dev/WD4003FZEX-00Z4SA0_WD-WMC5H0D6DW0N/btier02_hdd_wd4tb -c -B | ts
Mar 15 05:55:13
Mar 15 05:55:13 Device size (raw) : 0x672400000 (27686600704)
Mar 15 05:55:13 Device size (rnd) : 0x672400000 (27686600704)
Mar 15 05:55:13 Clearing bitlist of device : /dev/vg_ssd0/btier02_ssd
Mar 15 05:55:13 offset : 0x672300000 (27685552128)
Mar 15 05:55:13 device size : 0x672400000 (27686600704)
Mar 15 05:55:13 bitlist size : 0x100000 (1048576)
Mar 15 05:55:13
Mar 15 05:55:13 Device size (raw) : 0x3200000000 (214748364800)
Mar 15 05:55:13 Device size (rnd) : 0x3200000000 (214748364800)
Mar 15 05:55:13 Clearing bitlist of device : /dev/WD4003FZEX-00Z4SA0_WD-WMC5H0D6DW0N/btier02_hdd_wd4tb
Mar 15 05:55:13 offset : 0x31fff00000 (214747316224)
Mar 15 05:55:13 device size : 0x3200000000 (214748364800)
Mar 15 05:55:13 bitlist size : 0x100000 (1048576)
Mar 15 05:55:13
Mar 15 05:55:13 Total device size : 0x3871600000 (242420285440)
Mar 15 05:55:13 Clearing blocklist of device : /dev/vg_ssd0/btier02_ssd
Mar 15 05:55:13 list size : 0x700000 (7340032)
Mar 15 05:55:13 starting from offset : 0x671c00000 (27678212096)
Mar 15 05:55:13
Mar 15 05:55:13 write_device_magic device : 0
Mar 15 05:55:13 size : 0x3871600000 (242420285440)
Mar 15 05:55:13 write_device_magic device : 1
Mar 15 05:55:13 size : 0x3871600000 (242420285440)

[root@i7 /]# zpool create p2 /dev/sdtierb -f | ts
[root@i7 /]# zfs create p2/t1 | ts

[root@i7 t1]# zpool status | ts
Mar 15 05:55:47 pool: p2
Mar 15 05:55:47 state: ONLINE
Mar 15 05:55:47 scan: none requested
Mar 15 05:55:47 config:
Mar 15 05:55:47
Mar 15 05:55:47 NAME STATE READ WRITE CKSUM
Mar 15 05:55:47 p2 ONLINE 0 0 0
Mar 15 05:55:47 sdtierb ONLINE 0 0 0
Mar 15 05:55:47
Mar 15 05:55:47 errors: No known data errors

[root@i7 t1]# cd /p2/t1
[root@i7 t1]# ls -al | ts
Mar 15 05:56:03 total 3
Mar 15 05:56:03 drwxr-xr-x 2 root root 2 Mar 15 05:55 .
Mar 15 05:56:03 drwxr-xr-x 3 root root 3 Mar 15 05:55 ..

[root@i7 t1]# cat /sys/devices/virtual/block/sdtierb/tier/device_usage | ts
Mar 15 05:56:42 TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
Mar 15 05:56:42 0 dm-6 26394 7 0 0 302 692
Mar 15 05:56:42 1 dm-10 204797 0 0 0 0 0
Mar 15 05:56:42

[root@i7 t1]# dd if=/dev/urandom of=/p2/t1/test.urandom bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.742522 s, 14.1 MB/s

[root@i7 t1]# cat /sys/devices/virtual/block/sdtierb/tier/device_usage | ts
Mar 15 05:57:37 TIER DEVICE SIZE MB ALLOCATED MB AVERAGE READS AVERAGE WRITES TOTAL_READS TOTAL_WRITES
Mar 15 05:57:37 0 dm-6 26394 17 0 0 304 3451
Mar 15 05:57:37 1 dm-10 204797 0 0 0 0 0
Mar 15 05:57:37

[root@i7 t1]# zpool status -v | ts
Mar 15 05:59:21 pool: p2
Mar 15 05:59:21 state: ONLINE
Mar 15 05:59:21 status: One or more devices has experienced an error resulting in data
Mar 15 05:59:21 corruption. Applications may be affected.
Mar 15 05:59:21 action: Restore the file in question if possible. Otherwise restore the
Mar 15 05:59:21 entire pool from backup.
Mar 15 05:59:21 see: http://zfsonlinux.org/msg/ZFS-8000-8A
Mar 15 05:59:21 scan: scrub repaired 3K in 0h0m with 10 errors on Sun Mar 15 05:59:00 2015
Mar 15 05:59:21 config:
Mar 15 05:59:21
Mar 15 05:59:21 NAME STATE READ WRITE CKSUM
Mar 15 05:59:21 p2 ONLINE 0 0 10
Mar 15 05:59:21 sdtierb ONLINE 0 0 25
Mar 15 05:59:21
Mar 15 05:59:21 errors: Permanent errors have been detected in the following files:
Mar 15 05:59:21
Mar 15 05:59:21 /p2/t1/test.urandom

[root@i7 t1]# modinfo btier
filename: /lib/modules/3.14.27-100.fc19.x86_64/kernel/drivers/block/btier.ko
author: Mark Ruijter
license: GPL
depends:
vermagic: 3.14.27-100.fc19.x86_64 SMP mod_unload
[root@i7 t1]# strings /lib/modules/3.14.27-100.fc19.x86_64/kernel/drivers/block/btier.ko | less
[root@i7 t1]# strings /lib/modules/3.14.27-100.fc19.x86_64/kernel/drivers/block/btier.ko | grep 1.3
1.3.11
/home/chris/btier-1.3.11/kernel/btier/btier_main.c |
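The two device_usage snapshots in the transcript above (before and after the 10 MB dd) can be diffed to see what the write actually cost each tier. A small helper for that comparison, with the figures hand-copied from the snapshots (the `usage_delta` function is just an illustration, not part of btier's tooling):

```python
def usage_delta(before, after):
    """Per-device deltas between two device_usage snapshots.

    Each snapshot maps device name -> (allocated_mb, total_reads, total_writes).
    """
    return {
        dev: tuple(a - b for b, a in zip(before[dev], after[dev]))
        for dev in before
    }

# Figures from the two snapshots in the transcript above.
before = {"dm-6": (7, 302, 692), "dm-10": (0, 0, 0)}
after = {"dm-6": (17, 304, 3451), "dm-10": (0, 0, 0)}

for dev, (d_alloc, d_reads, d_writes) in usage_delta(before, after).items():
    print(f"{dev}: +{d_alloc} MB allocated, +{d_reads} reads, +{d_writes} writes")
```

For dm-6 this gives +10 MB allocated for the 10 MB written, with all I/O landing on tier 0 and nothing touching dm-10, so the corruption occurs before migration is even involved.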