Re: [Gptfdisk-general] strange sgdisk failure on ppc64el
From: Rod S. <rod...@ro...> - 2014-02-27 02:37:22
On 02/26/2014 10:55 AM, Scott Moser wrote:
> Hi,
> We use gptfdisk (sgdisk) from growpart to grow gpt disks on boot and grow the
> mounted root filesystem. It may sound crazy, but it works really well.
>
> We're seeing an issue on ppc64el systems with gpt partition tables. This
> reproduces only occasionally.
>
> I've opened an ubuntu bug at
> https://bugs.launchpad.net/ubuntu/+source/gdisk/+bug/1285197

I've subscribed to that bug report. (Incidentally, I don't know if
you're aware, but I was hired by Canonical in January. Maintaining gdisk
isn't part of my Canonical job description, though, so it remains a
hobby project.)

> The failure we saw was this (we only have logs), and subsequent run of
> identical 'growpart' finished fine:
> command: growpart /dev/sda 1
> exit code: 2
> stdout:
> FAILED: disk=/dev/sda partition=1: failed to repartition
> stderr:
> failed [sgdisk_mod:4] sgdisk --move-second-header --delete=1
> --new=1:18432:20971486
> --typecode=1:0FC63DAF-8483-4772-8E79-3D69D8477DE4
> --partition-guid=1:53FFEF70-1623-46CC-AFB7-EBC1EB5340F2
> --change-name=1:Linux filesystem /dev/sda
> Could not create partition 1 from 40532396646334464 to 0
> Could not change partition 1's type code to 0FC63DAF-8483-4772-8E79-3D69D8477DE4!
> Unable to set partition 1's name to 'Linux filesystem'!
> Error encountered; not saving changes.
> ***** WARNING: Resize failed, attempting to revert ******
> ***** Appears to have gone OK ****
>
> Note there the odd '1 from 40532396646334464 to 0'.

That caught my eye, too. If my calculator skills are intact,
40,532,396,646,334,464 is 9 x 2^52, while the value you specified as the
start point (18432) is 9 x 2^11. This is surely not a coincidence --
it's a bit shift of 41. The sporadic nature of the problem suggests some
sort of issue with the machine state.
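As a quick sanity check of that arithmetic, here is a minimal shell
sketch. The function name shift_between and the variable names are made
up for illustration; only the two sector values come from the log. It
assumes a shell with 64-bit integer arithmetic, such as bash or dash on
a 64-bit Linux system.

```sh
#!/bin/sh
# Check how the requested start sector relates to the value sgdisk reported.

requested=18432                 # start sector passed via --new=1:18432:...
reported=40532396646334464      # start sector printed in the error message

# Count how many one-bit left shifts turn $1 into $2.
# Prints the count, or "none" if $2 is not $1 shifted left within 63 bits.
# (shift_between is an illustrative helper, not part of sgdisk or growpart.)
shift_between() {
    value=$1
    count=0
    while [ "$count" -lt 63 ] && [ "$value" -ne "$2" ] && [ "$value" -gt 0 ]; do
        value=$(( value << 1 ))
        count=$(( count + 1 ))
    done
    if [ "$value" -eq "$2" ]; then
        echo "$count"
    else
        echo "none"
    fi
}

echo "requested / 9 = $(( requested / 9 ))"   # 2048, i.e. 2^11
echo "reported  / 9 = $(( reported / 9 ))"    # 4503599627370496, i.e. 2^52
echo "shift: $(shift_between "$requested" "$reported")"
```

On such a shell the last line should report a shift of 41, which matches
2^52 / 2^11 = 2^41.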
(This isn't to say that I'm ruling out the possibility of an sgdisk bug;
but if there is an sgdisk bug, it's probably interacting with something
about the machine state that's unique for just some runs.) The
subsequent two failures (an inability to set the type code and name)
naturally follow from an inability to create the partition.

> I'm wondering if you've seen anything like this.

No, this is the first I've heard of a problem that looks even remotely
like this.

> I think it's only fair to point out:
> a.) this occurred on a ppc64el guest running on ppc64 under kvm.
> It's a new platform, and it's possible that there are issues in kvm or
> the underlying virtual hardware.
> b.) it doesn't occur that often
> c.) in ppc64el the 'el' is "little endian".

I did do development and testing of gdisk on both an ancient PowerPC
iMac and using KVM to emulate PowerPC hardware; however, that was
big-endian. I no longer have the iMac, so I can't test on that hardware
myself.

> All that said, most of the time it works fine.

Which of course makes it very hard to debug.

> To generally reproduce what is going on here via use of /dev/loop0, we can
> do the following. Obviously you're not stressing the whole stack that was
> in play (not running a ppc64el guest or kernel).
>
> imgurl="http://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-ppc64el-gpt1.img"
> imgdist="${imgurl##*/}"
> wget "$imgurl" -O "$imgdist"
> qemu-img convert -O raw "$imgdist" my.img
> qemu-img resize my.img 10G
> LODEV="/dev/loop0"
> sudo losetup $LODEV "$PWD/my.img"
> mkdir ./mp
> sudo mount ${LODEV}p1 ./mp
> sudo growpart "$LODEV" 1
>
> sudo umount ./mp
> sudo losetup -d "$LODEV"

I'll try to take a look at this over the weekend. In the meantime, if
you uncover any additional data, feel free to contact me with it.

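Since the failure shows up only occasionally, one way to hunt for it is
to repeat the operation many times and stop at the first failure. The
sketch below is only an illustration: repeat_until_failure and
one_attempt are hypothetical helper names, scratch.img is an assumed
scratch file, and the sector numbers are copied from the logged command,
so they may not match a given image. It also assumes sgdisk is run
directly against a copy of my.img rather than through growpart and a
loop device, which exercises less of the stack than the original
failure did.

```sh
#!/bin/sh
# Run a command repeatedly and stop at the first failure.
# Usage: repeat_until_failure MAX_RUNS COMMAND [ARGS...]
# Prints the failing run number and returns 1, or returns 0 if all runs pass.
repeat_until_failure() {
    max_runs=$1
    shift
    run=1
    while [ "$run" -le "$max_runs" ]; do
        if ! "$@"; then
            echo "run $run failed"
            return 1
        fi
        run=$(( run + 1 ))
    done
    echo "all $max_runs runs passed"
    return 0
}

# Hypothetical single attempt: start from a fresh copy each time so every
# run sees the same partition table. The sgdisk options are the ones from
# the logged command; scratch.img is an assumed file name.
one_attempt() {
    cp my.img scratch.img &&
    sgdisk --move-second-header --delete=1 --new=1:18432:20971486 \
        --typecode=1:0FC63DAF-8483-4772-8E79-3D69D8477DE4 \
        --partition-guid=1:53FFEF70-1623-46CC-AFB7-EBC1EB5340F2 \
        --change-name="1:Linux filesystem" scratch.img
}

# Example invocation (left commented out so nothing runs by accident):
# repeat_until_failure 1000 one_attempt
```

Keeping the failing scratch.img around after a stop would preserve the
state for later inspection.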
Incidentally, the version of gdisk in trusty is a little behind -- gdisk
is now up to 0.8.9; however, there's a new bug in 0.8.9, so you might
want to try experimenting with the version in the Sourceforge git
repository, which fixes that bug. (I meant to release a new version with
a fix last weekend, but didn't get around to it.) AFAIK, the 0.8.9 bug
only affects the process of creating hybrid MBRs, so it shouldn't affect
your issue. For that matter, the changes from 0.8.8 to 0.8.9 also don't
seem like things that would affect your bug, so I doubt if that's really
relevant. Still, it's best to test with the latest code, if at all
possible....

-- 
Rod Smith
rod...@ro...
http://www.rodsbooks.com