From: Carlo M. A. B. <ca...@sa...> - 2008-01-04 07:02:19
|
revert a merge conflict from 075da586c92f09bd9a7401f1e80d72fde27c173 that redefined sector as an array of pointers to char, instead of a statically allocated buffer of chars, that was triggering the following warnings : block.c: In function `bdrv_commit': block.c:480: warning: passing arg 3 of `bdrv_read' from incompatible pointer type block.c:484: warning: passing arg 3 of `bdrv_write' from incompatible pointer type Signed-off-by: Carlo Marcelo Arenas Belon <ca...@sa...> --- qemu/block.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/qemu/block.c b/qemu/block.c index 519be24..492a4d3 100644 --- a/qemu/block.c +++ b/qemu/block.c @@ -460,7 +460,7 @@ int bdrv_commit(BlockDriverState *bs) BlockDriver *drv = bs->drv; int64_t i, total_sectors; int n, j; - unsigned char *sector[512]; + unsigned char sector[512]; if (!drv) return -ENOMEDIUM; -- 1.5.3.7 |
From: Avi K. <av...@qu...> - 2008-01-07 09:27:40
|
Carlo Marcelo Arenas Belon wrote: > revert a merge conflict from 075da586c92f09bd9a7401f1e80d72fde27c173 that > redefined sector as an array of pointers to char, instead of a statically > allocated buffer of chars, that was triggering the following warnings : > > block.c: In function `bdrv_commit': > block.c:480: warning: passing arg 3 of `bdrv_read' from incompatible pointer type > block.c:484: warning: passing arg 3 of `bdrv_write' from incompatible pointer type > > Signed-off-by: Carlo Marcelo Arenas Belon <ca...@sa...> > Doesn't the cache=off option warrant an allocation here to ensure alignment (or perhaps a 1K stack buffer with runtime adjustment)? -- error compiling committee.c: too many arguments to function |
From: Laurent V. <Lau...@bu...> - 2008-01-07 10:22:46
|
Le lundi 07 janvier 2008 =C3=A0 11:27 +0200, Avi Kivity a =C3=A9crit : > Carlo Marcelo Arenas Belon wrote: > > revert a merge conflict from 075da586c92f09bd9a7401f1e80d72fde27c173 th= at > > redefined sector as an array of pointers to char, instead of a statical= ly > > allocated buffer of chars, that was triggering the following warnings : > > > > block.c: In function `bdrv_commit': > > block.c:480: warning: passing arg 3 of `bdrv_read' from incompatible po= inter type > > block.c:484: warning: passing arg 3 of `bdrv_write' from incompatible p= ointer type > > > > Signed-off-by: Carlo Marcelo Arenas Belon <ca...@sa...> > > =20 >=20 > Doesn't the cache=3Doff option warrant an allocation here to ensure=20 > alignment (or perhaps a 1K stack buffer with runtime adjustment)? You're right, a good patch should be something like this (it is not tested or even compiled) : Signed-off-by: Laurent Vivier <Lau...@bu...> diff --git a/qemu/block.c b/qemu/block.c index 519be24..c198659 100644 --- a/qemu/block.c +++ b/qemu/block.c @@ -460,7 +460,7 @@ int bdrv_commit(BlockDriverState *bs) BlockDriver *drv =3D bs->drv; int64_t i, total_sectors; int n, j; - unsigned char *sector[512]; + unsigned char *sector; =20 if (!drv) return -ENOMEDIUM; @@ -473,15 +473,21 @@ int bdrv_commit(BlockDriverState *bs) return -ENOTSUP; } =20 + sector =3D qemu_memalign(512,512); + if (sector =3D=3D NULL) + return -ENOMEM; + total_sectors =3D bdrv_getlength(bs) >> SECTOR_BITS; for (i =3D 0; i < total_sectors;) { if (drv->bdrv_is_allocated(bs, i, 65536, &n)) { for(j =3D 0; j < n; j++) { if (bdrv_read(bs, i, sector, 1) !=3D 0) { + qemu_free(sector); return -EIO; } =20 if (bdrv_write(bs->backing_hd, i, sector, 1) !=3D 0) { + qemu_free(sector); return -EIO; } i++; @@ -491,6 +497,7 @@ int bdrv_commit(BlockDriverState *bs) } } =20 + qemu_free(sector); if (drv->bdrv_make_empty) return drv->bdrv_make_empty(bs); --=20 ----------------- Lau...@bu... ------------------ "La perfection est atteinte non quand il ne reste rien =C3=A0 ajouter mais quand il ne reste rien =C3=A0 enlever." Saint Exup=C3=A9ry |
From: Avi K. <av...@qu...> - 2008-01-07 10:47:37
|
Laurent Vivier wrote: > Le lundi 07 janvier 2008 à 11:27 +0200, Avi Kivity a écrit : > >> Carlo Marcelo Arenas Belon wrote: >> >>> revert a merge conflict from 075da586c92f09bd9a7401f1e80d72fde27c173 that >>> redefined sector as an array of pointers to char, instead of a statically >>> allocated buffer of chars, that was triggering the following warnings : >>> >>> block.c: In function `bdrv_commit': >>> block.c:480: warning: passing arg 3 of `bdrv_read' from incompatible pointer type >>> block.c:484: warning: passing arg 3 of `bdrv_write' from incompatible pointer type >>> >>> Signed-off-by: Carlo Marcelo Arenas Belon <ca...@sa...> >>> >>> >> Doesn't the cache=off option warrant an allocation here to ensure >> alignment (or perhaps a 1K stack buffer with runtime adjustment)? >> > > You're right, a good patch should be something like this (it is not > tested or even compiled) : > Looks good, but patch is corrupted by mail client. -- error compiling committee.c: too many arguments to function |
From: Laurent V. <Lau...@bu...> - 2008-01-07 12:08:20
|
Le lundi 07 janvier 2008 =C3=A0 12:47 +0200, Avi Kivity a =C3=A9crit : > Laurent Vivier wrote: > > Le lundi 07 janvier 2008 =C3=A0 11:27 +0200, Avi Kivity a =C3=A9crit : > > =20 > >> Carlo Marcelo Arenas Belon wrote: > >> =20 > >>> revert a merge conflict from 075da586c92f09bd9a7401f1e80d72fde27c173 = that > >>> redefined sector as an array of pointers to char, instead of a static= ally > >>> allocated buffer of chars, that was triggering the following warnings= : > >>> > >>> block.c: In function `bdrv_commit': > >>> block.c:480: warning: passing arg 3 of `bdrv_read' from incompatible = pointer type > >>> block.c:484: warning: passing arg 3 of `bdrv_write' from incompatible= pointer type > >>> > >>> Signed-off-by: Carlo Marcelo Arenas Belon <ca...@sa...> > >>> =20 > >>> =20 > >> Doesn't the cache=3Doff option warrant an allocation here to ensure=20 > >> alignment (or perhaps a 1K stack buffer with runtime adjustment)? > >> =20 > > > > You're right, a good patch should be something like this (it is not > > tested or even compiled) : > > =20 >=20 > Looks good, but patch is corrupted by mail client. OK, I'll resend it later (I'd like to compile it at least). Laurent --=20 ----------------- Lau...@bu... ------------------ "La perfection est atteinte non quand il ne reste rien =C3=A0 ajouter mais quand il ne reste rien =C3=A0 enlever." Saint Exup=C3=A9ry |
From: Laurent V. <Lau...@bu...> - 2008-01-07 15:46:33
|
Le lundi 07 janvier 2008 =C3=A0 10:34 -0500, Javier Guerra a =C3=A9crit : > On 1/7/08, Laurent Vivier <Lau...@bu...> wrote: > > What I'm wondering now is: is it really useful to have "cache=3Doff" an= d > > "snapshot=3Don" at the same time ? >=20 > does "cache=3Doff" means disk cache? if so, it might be useful to test > clustering filesystems. "cache=3Doff" means files is opened with "O_DIRECT" and thus there is no cache in the kernel memory on the host side. IMO, "cache=3Doff" and "snapshot=3Don" are incompatible because a snapshot can be seen like a cache. > so far, the only way is to setup a network block device (iSCSI, AoE, > nbd). i'd like to simply specify the same backing file for two > instances' hdb parameter. I'm sorry but I don't understand this part. > and snapshots help a lot to go back after blowing up the on-disk structur= es But I think if you use a snapshot there is no reason to use "cache=3Doff" Laurent --=20 ----------------- Lau...@bu... ------------------ "La perfection est atteinte non quand il ne reste rien =C3=A0 ajouter mais quand il ne reste rien =C3=A0 enlever." Saint Exup=C3=A9ry |
From: Laurent V. <Lau...@bu...> - 2008-01-07 16:29:51
|
Le lundi 07 janvier 2008 =C3=A0 11:03 -0500, Javier Guerra a =C3=A9crit : > On 1/7/08, Laurent Vivier <Lau...@bu...> wrote: > > "cache=3Doff" means files is opened with "O_DIRECT" and thus there is n= o > > cache in the kernel memory on the host side. > > IMO, "cache=3Doff" and "snapshot=3Don" are incompatible because a snaps= hot > > can be seen like a cache. > > > > > so far, the only way is to setup a network block device (iSCSI, AoE, > > > nbd). i'd like to simply specify the same backing file for two > > > instances' hdb parameter. > > > > I'm sorry but I don't understand this part. >=20 > to test a cluster filesystem, i need two (virtual) machines with some > shared storage. i tried long ago something like this (with (k)qemu): >=20 > - create a disk image, call it hda-1.img > - boot and install linux on it, shutdown > - copy to hda-2.img > - boot it (with new MAC) and change IP, hostname, little things, shutdown > - boot both with the same bridge, check that network works between them > - create a new disk image, call id hdb-shr.img > - boot both VMs, sharing hdb-shr.img: >=20 > qemu -hda=3Dhda-1.img -hdb-shr.img > qemu -hda=3Dhda-2.img -hdb-shr.img >=20 > - try to setup a cluster filesystem on hdb >=20 > it almost worked... but some writes didn't propagate to the other > until some extra writes to hdb; so i guessed that each qemu instance > had some caching on file I/O >=20 > hopefully, it would now work with "-cache=3Doff", don't you think? Well, I don't think the problem is at the host level but at the guest level, because both instances of qemu share the host cache and thus first instance should see changes made by the second instance (and vice-versa). There are also some caches at qemu level to emulate DMA, for instance in hw/ide.c it is MAX_MULT_SECTORS (16) which is 8 kB buffer, perhaps your problem is here but "cache=3Doff" doesn't remove this. Did you try to change MAX_MULT_SECTORS to 1 ? What do you call a "cluster filesystem" ? =20 > > > and snapshots help a lot to go back after blowing up the on-disk stru= ctures > > > > But I think if you use a snapshot there is no reason to use "cache=3Dof= f" >=20 > in the above case, if both KVM instances share the snapshot without > cacheing it, the cluster should still work, and have some rollback > capability at the same time >=20 > or is it too much wishful thinking? Unfortunately I guess... Laurent --=20 ----------------- Lau...@bu... ------------------ "La perfection est atteinte non quand il ne reste rien =C3=A0 ajouter mais quand il ne reste rien =C3=A0 enlever." Saint Exup=C3=A9ry |
From: Javier G. <ja...@gu...> - 2008-01-07 16:42:45
|
T24gMS83LzA4LCBMYXVyZW50IFZpdmllciA8TGF1cmVudC5WaXZpZXJAYnVsbC5uZXQ+IHdyb3Rl Ogo+IExlIGx1bmRpIDA3IGphbnZpZXIgMjAwOCDDoCAxMTowMyAtMDUwMCwgSmF2aWVyIEd1ZXJy YSBhIMOpY3JpdCA6Cj4gPiBob3BlZnVsbHksIGl0IHdvdWxkIG5vdyB3b3JrIHdpdGggIi1jYWNo ZT1vZmYiLCBkb24ndCB5b3UgdGhpbms/Cj4KPiBXZWxsLCBJIGRvbid0IHRoaW5rIHRoZSBwcm9i bGVtIGlzIGF0IHRoZSBob3N0IGxldmVsIGJ1dCBhdCB0aGUgZ3Vlc3QKPiBsZXZlbCwgYmVjYXVz ZSBib3RoIGluc3RhbmNlcyBvZiBxZW11IHNoYXJlIHRoZSBob3N0IGNhY2hlIGFuZCB0aHVzCj4g Zmlyc3QgaW5zdGFuY2Ugc2hvdWxkIHNlZSBjaGFuZ2VzIG1hZGUgYnkgdGhlIHNlY29uZCBpbnN0 YW5jZSAoYW5kCj4gdmljZS12ZXJzYSkuCgp0aGF0J3Mgd2hhdCBhIGNsdXN0ZXIgZmlsZXN5c3Rl bSBpcyBkZXNpZ25lZCB0byBjb3BlIHdpdGggKGFuZCBpbiBmYWN0IGV4cGVjdHMpCgo+IFRoZXJl IGFyZSBhbHNvIHNvbWUgY2FjaGVzIGF0IHFlbXUgbGV2ZWwgdG8gZW11bGF0ZSBETUEsIGZvciBp bnN0YW5jZSBpbgo+IGh3L2lkZS5jIGl0IGlzIE1BWF9NVUxUX1NFQ1RPUlMgKDE2KSB3aGljaCBp cyA4IGtCIGJ1ZmZlciwgcGVyaGFwcyB5b3VyCj4gcHJvYmxlbSBpcyBoZXJlIGJ1dCAiY2FjaGU9 b2ZmIiBkb2Vzbid0IHJlbW92ZSB0aGlzLgo+IERpZCB5b3UgdHJ5IHRvIGNoYW5nZSBNQVhfTVVM VF9TRUNUT1JTIHRvIDEgPwoKbm9wZSwgZG9uJ3Qga25vdyBlbm91Z2ggb2YgcWVtdSBpbnRlcm5h bHMuLi4KCmJ1dCBpZiB0aG9zZSBjYWNoZXMgY2FuIGJlIGZsdXNoZWQgZnJvbSB0aGUgZ3Vlc3Qs IHRoZSBmaWxlc3lzdGVtCnNob3VsZCBkbyB0aGF0IHdoZW4gd3JpdGluZyBpdHMgbWV0YWRhdGEK Cj4gV2hhdCBkbyB5b3UgY2FsbCBhICJjbHVzdGVyIGZpbGVzeXN0ZW0iID8KCkdGUywgT0NGUzIs IGV0Yy4gICB0aGF0IGlzLCBmaWxlc3lzdGVtcyB0aGF0IGFyZSBkZXNpZ25lZCB0byBydW4gb24K bW9yZSB0aGFuIG9uZSBob3N0IHdpdGggc2hhcmVkIGJsb2NrIHN0b3JhZ2UuICB1c3VhbGx5IHRo YXQgbWVhbnMKRmlicmVDaGFubmVsIG9yIGlTQ1NJLCBidXQgb24gVk1zLCBhIGNvbW1vbiBiYWNr aW5nIGNvdWxkIHdvcmsgdG9vIChhdApsZWFzdCBpbiB0aGVvcnkpLgoKWGVuIG1hbmFnZXMgaXQs IGF0IGxlYXN0IG9uIHBhcmF2aXJ0dWFsaXplZCBndWVzdHM7IGhhZG4ndCB0cmllZCBvbgpIVk0g Z3Vlc3RzLiBtYXliZSB0aGUgSURFLWxpa2UgZW11bGF0aW9uIGlzIHRvbyBwb29yIGFuIGludGVy ZmFjZSB0bwpoYW5kbGUgaXQuCgotLSAKSmF2aWVyCg== |
From: Laurent V. <Lau...@bu...> - 2008-01-07 17:05:35
|
Well, in fact, I think you can't use snapshot with cluster filesystem: as each qemu instance will write in its own snapshot and will not see modifications made by other, and I don't think there is currently a way to share snapshot between qemu instances. Laurent Le lundi 07 janvier 2008 =C3=A0 11:42 -0500, Javier Guerra a =C3=A9crit : > On 1/7/08, Laurent Vivier <Lau...@bu...> wrote: > > Le lundi 07 janvier 2008 =C3=A0 11:03 -0500, Javier Guerra a =C3=A9crit= : > > > hopefully, it would now work with "-cache=3Doff", don't you think? > > > > Well, I don't think the problem is at the host level but at the guest > > level, because both instances of qemu share the host cache and thus > > first instance should see changes made by the second instance (and > > vice-versa). >=20 > that's what a cluster filesystem is designed to cope with (and in fact ex= pects) >=20 > > There are also some caches at qemu level to emulate DMA, for instance i= n > > hw/ide.c it is MAX_MULT_SECTORS (16) which is 8 kB buffer, perhaps your > > problem is here but "cache=3Doff" doesn't remove this. > > Did you try to change MAX_MULT_SECTORS to 1 ? >=20 > nope, don't know enough of qemu internals... >=20 > but if those caches can be flushed from the guest, the filesystem > should do that when writing its metadata >=20 > > What do you call a "cluster filesystem" ? >=20 > GFS, OCFS2, etc. that is, filesystems that are designed to run on > more than one host with shared block storage. usually that means > FibreChannel or iSCSI, but on VMs, a common backing could work too (at > least in theory). >=20 > Xen manages it, at least on paravirtualized guests; hadn't tried on > HVM guests. maybe the IDE-like emulation is too poor an interface to > handle it. >=20 --=20 ----------------- Lau...@bu... ------------------ "La perfection est atteinte non quand il ne reste rien =C3=A0 ajouter mais quand il ne reste rien =C3=A0 enlever." Saint Exup=C3=A9ry |
From: Laurent V. <Lau...@bu...> - 2008-01-07 15:16:19
|
Le lundi 07 janvier 2008 =C3=A0 12:47 +0200, Avi Kivity a =C3=A9crit : > Laurent Vivier wrote: > > Le lundi 07 janvier 2008 =C3=A0 11:27 +0200, Avi Kivity a =C3=A9crit : > > =20 > >> Carlo Marcelo Arenas Belon wrote: > >> =20 > >>> revert a merge conflict from 075da586c92f09bd9a7401f1e80d72fde27c173 = that > >>> redefined sector as an array of pointers to char, instead of a static= ally > >>> allocated buffer of chars, that was triggering the following warnings= : > >>> > >>> block.c: In function `bdrv_commit': > >>> block.c:480: warning: passing arg 3 of `bdrv_read' from incompatible = pointer type > >>> block.c:484: warning: passing arg 3 of `bdrv_write' from incompatible= pointer type > >>> > >>> Signed-off-by: Carlo Marcelo Arenas Belon <ca...@sa...> > >>> =20 > >>> =20 > >> Doesn't the cache=3Doff option warrant an allocation here to ensure=20 > >> alignment (or perhaps a 1K stack buffer with runtime adjustment)? > >> =20 > > > > You're right, a good patch should be something like this (it is not > > tested or even compiled) : > > =20 >=20 > Looks good, but patch is corrupted by mail client. What I'm wondering now is: is it really useful to have "cache=3Doff" and "snapshot=3Don" at the same time ? If not, the patch of Carlo is good, otherwise there is more modifications to do (in other parts of qemu). Regards, Laurent --=20 ----------------- Lau...@bu... ------------------ "La perfection est atteinte non quand il ne reste rien =C3=A0 ajouter mais quand il ne reste rien =C3=A0 enlever." Saint Exup=C3=A9ry |
From: Avi K. <av...@qu...> - 2008-01-07 18:27:32
|
Laurent Vivier wrote: > What I'm wondering now is: is it really useful to have "cache=off" and > "snapshot=on" at the same time ? > No idea, but wouldn't want to rule it out. > If not, the patch of Carlo is good, otherwise there is more > modifications to do (in other parts of qemu). > If other modifications are needed, perhaps do them in the I/O submission path (if unaligned and cache=off, force alignment). Since most of the data will be aligned by the guest, the overhead will only be incurred by the rare metadata accesses. -- error compiling committee.c: too many arguments to function |
From: Javier G. <ja...@gu...> - 2008-01-07 15:35:45
|
On 1/7/08, Laurent Vivier <Lau...@bu...> wrote: > What I'm wondering now is: is it really useful to have "cache=off" and > "snapshot=on" at the same time ? does "cache=off" means disk cache? if so, it might be useful to test clustering filesystems. so far, the only way is to setup a network block device (iSCSI, AoE, nbd). i'd like to simply specify the same backing file for two instances' hdb parameter. and snapshots help a lot to go back after blowing up the on-disk structures -- Javier |
From: Avi K. <av...@qu...> - 2008-01-07 18:29:17
|
Javier Guerra wrote: > On 1/7/08, Laurent Vivier <Lau...@bu...> wrote: > >> What I'm wondering now is: is it really useful to have "cache=off" and >> "snapshot=on" at the same time ? >> > > does "cache=off" means disk cache? if so, it might be useful to test > clustering filesystems. > > so far, the only way is to setup a network block device (iSCSI, AoE, > nbd). i'd like to simply specify the same backing file for two > instances' hdb parameter. > For that to work you need to use the raw format (and then, cache=off doesn't matter, as a clustering filesystem ought to have a coherent cache). -- error compiling committee.c: too many arguments to function |
From: Javier G. <ja...@gu...> - 2008-01-07 16:03:26
|
On 1/7/08, Laurent Vivier <Lau...@bu...> wrote: > "cache=off" means files is opened with "O_DIRECT" and thus there is no > cache in the kernel memory on the host side. > IMO, "cache=off" and "snapshot=on" are incompatible because a snapshot > can be seen like a cache. > > > so far, the only way is to setup a network block device (iSCSI, AoE, > > nbd). i'd like to simply specify the same backing file for two > > instances' hdb parameter. > > I'm sorry but I don't understand this part. to test a cluster filesystem, i need two (virtual) machines with some shared storage. i tried long ago something like this (with (k)qemu): - create a disk image, call it hda-1.img - boot and install linux on it, shutdown - copy to hda-2.img - boot it (with new MAC) and change IP, hostname, little things, shutdown - boot both with the same bridge, check that network works between them - create a new disk image, call id hdb-shr.img - boot both VMs, sharing hdb-shr.img: qemu -hda=hda-1.img -hdb-shr.img qemu -hda=hda-2.img -hdb-shr.img - try to setup a cluster filesystem on hdb it almost worked... but some writes didn't propagate to the other until some extra writes to hdb; so i guessed that each qemu instance had some caching on file I/O hopefully, it would now work with "-cache=off", don't you think? > > and snapshots help a lot to go back after blowing up the on-disk structures > > But I think if you use a snapshot there is no reason to use "cache=off" in the above case, if both KVM instances share the snapshot without cacheing it, the cluster should still work, and have some rollback capability at the same time or is it too much wishful thinking? -- Javier |
From: Avi K. <av...@qu...> - 2008-01-07 18:32:07
|
Javier Guerra wrote: > to test a cluster filesystem, i need two (virtual) machines with some > shared storage. i tried long ago something like this (with (k)qemu): > > - create a disk image, call it hda-1.img > - boot and install linux on it, shutdown > - copy to hda-2.img > - boot it (with new MAC) and change IP, hostname, little things, shutdown > - boot both with the same bridge, check that network works between them > - create a new disk image, call id hdb-shr.img > - boot both VMs, sharing hdb-shr.img: > > qemu -hda=hda-1.img -hdb-shr.img > qemu -hda=hda-2.img -hdb-shr.img > > - try to setup a cluster filesystem on hdb > > it almost worked... but some writes didn't propagate to the other > until some extra writes to hdb; so i guessed that each qemu instance > had some caching on file I/O > > hopefully, it would now work with "-cache=off", don't you think? > > cache=off is not necessary. However you need raw format images (not qcow/vmdk/etc). -- error compiling committee.c: too many arguments to function |