From: Andrea A. <an...@qu...> - 2008-05-08 05:20:22
On Wed, May 07, 2008 at 09:14:45PM -0700, Linus Torvalds wrote:
> IOW, you didn't even look at it, did you?

Actually I looked both at the struct and at the slab alignment, just in
case it was changed recently. Now after reading your mail I also
compiled it, just in case.

2.6.26-rc1:

# name  <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
anon_vma         260    576     24  144    1 : tunables  120   60    8 : slabdata      4      4      0
                                ^^  ^^^

2.6.26-rc1 + below patch:

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -27,6 +27,7 @@ struct anon_vma {
 struct anon_vma {
 	spinlock_t lock;	/* Serialize access to vma list */
 	struct list_head head;	/* List of private "related" vmas */
+	int flag:1;
 };
 
 #ifdef CONFIG_MMU

# name  <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
anon_vma         250    560     32  112    1 : tunables  120   60    8 : slabdata      5      5      0
                                ^^  ^^^

Sure, it's not a big deal to grow it 33%, it's so small anyway, but I
don't see the point in growing it. sort() can't currently be
interrupted by signals, and until it can, we can cap it to 512 vmas,
making the worst case take a dozen usecs. I fail to see what you have
against sort().

Again: if a vma bitflag + global lock could have avoided sort and run
O(N) instead of the current O(N*log(N)), I would have done that
immediately; in fact I was in the process of doing it when you posted
the followup.

Nothing personal here, just staying technical. I hope you are too.
From: Linus T. <tor...@li...> - 2008-05-08 04:15:05
On Thu, 8 May 2008, Andrea Arcangeli wrote:
>
> But removing sort isn't worth it if it takes away ram from the VM even
> when global_mm_lock will never be called.

Andrea, you really are a piece of work. Your arguments have been bogus
crap that didn't even understand what was going on from the beginning,
and now you continue to do that.

What exactly "takes away ram" from the VM? The notion of adding a flag
to "struct anon_vma"? The one that already has a 4-byte padding thing on
x86-64 just after the spinlock? And that on 32-bit x86 (with less than
256 CPUs) would have two bytes of padding if we didn't just make the
spinlock type unconditionally 32 bits rather than the 16 bits we
actually _use_?

IOW, you didn't even look at it, did you?

But whatever. I clearly don't want a patch from you anyway, so ..

		Linus
From: Andrea A. <an...@qu...> - 2008-05-08 03:41:30
On Wed, May 07, 2008 at 08:10:33PM -0700, Christoph Lameter wrote:
> On Thu, 8 May 2008, Andrea Arcangeli wrote:
>
> > to the sort function to break the loop. After that we remove the 512
> > vma cap and mm_lock is free to run as long as it wants like
> > /dev/urandom, nobody can care less how long it will run before
> > returning as long as it reacts to signals.
>
> Look Linus has told you what to do. Why not simply do it?

When it looked like we could use vm_flags to remove sort, that looked
like an ok optimization; no problem with optimizations, I'm all for
optimizations if they cost nothing to the VM fast paths and VM
footprint. But removing sort isn't worth it if it takes away ram from
the VM even when global_mm_lock will never be called.

sort is like /dev/urandom, so after sort is fixed to handle signals
(and I expect Matt will help with this) we'll remove the 512-vma cap.
In the meantime we can live with the 512-vma cap, which guarantees
sort won't take more than a few dozen usec.

Removing sort() is the only thing the anon-vma bitflag can achieve,
and it's clearly not worth it; it would go in the wrong direction
(fixing sort to handle signals is clearly the right direction, if sort
is a concern at all).

Adding the global lock around global_mm_lock to avoid one
global_mm_lock colliding against another global_mm_lock is sure ok
with me; if that's still wanted now that it's clear removing sort
isn't worth it, I'm neutral on this. Christoph, please go ahead and
add the bitflag to anon-vma yourself if you want. If something isn't
technically right I don't do it, no matter who asks.
From: Anthony L. <ali...@us...> - 2008-05-08 03:17:54
I discovered this while testing virtio save/restore this evening. After
sleeping, cpu_single_env can change, so we have to make sure to restore
it. This applies on top of my IO thread series.

Signed-off-by: Anthony Liguori <ali...@us...>

diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index cc8f292..0990af6 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -246,8 +246,11 @@ static void pause_all_threads(void)
         vcpu_info[i].stop = 1;
         pthread_kill(vcpu_info[i].thread, SIG_IPI);
     }
-    while (!all_threads_paused())
+    while (!all_threads_paused()) {
+        CPUState *env = cpu_single_env;
         pthread_cond_wait(&qemu_pause_cond, &qemu_mutex);
+        cpu_single_env = env;
+    }
 }
 
 static void resume_all_threads(void)
From: Christoph L. <cla...@sg...> - 2008-05-08 03:10:28
On Thu, 8 May 2008, Andrea Arcangeli wrote:

> to the sort function to break the loop. After that we remove the 512
> vma cap and mm_lock is free to run as long as it wants like
> /dev/urandom, nobody can care less how long it will run before
> returning as long as it reacts to signals.

Look, Linus has told you what to do. Why not simply do it?
From: Andrea A. <an...@qu...> - 2008-05-08 02:56:47
On Wed, May 07, 2008 at 06:12:32PM -0700, Christoph Lameter wrote:
> Andrea's mm_lock could have wider impact. It is the first effective
> way that I have seen of temporarily holding off reclaim from an address
> space. It sure is a brute force approach.

The only improvement I can imagine for mm_lock, after changing the name
to global_mm_lock(), is to reestablish the signal_pending check in the
loop that takes the spinlock, to back off, and to cap it at 512 vmas,
so the ram wasted on anon-vmas wouldn't save more than 10-100usec at
most (plus the vfree, which may be a bigger cost, but we're ok to pay
it and it surely isn't security related).

Then in the long term we need to talk to Matt about returning a
parameter from the sort function to break the loop. After that we
remove the 512-vma cap and mm_lock is free to run as long as it wants,
like /dev/urandom; nobody can care less how long it runs before
returning, as long as it reacts to signals.

This is the right way if we want to support XPMEM/GRU efficiently and
without introducing unnecessary regressions in the VM fastpaths and VM
footprint.
From: Zhang, X. <xia...@in...> - 2008-05-08 02:55:47
Avi, please drop the previous one due to a wrong attachment.
Xiantao

>From a9f479197f0a0efa45a930309fad03fd690cba60 Mon Sep 17 00:00:00 2001
From: Xiantao Zhang <xia...@in...>
Date: Thu, 8 May 2008 10:16:05 +0800
Subject: [PATCH] KVM: Qemu: IA-64 build fix.

Remove nonexistent header inclusion, and set correct phys_ram_size for
the ipf machine.

Signed-off-by: Xiantao Zhang <xia...@in...>
---
 qemu/hw/ipf.c              |    4 +++-
 qemu/target-ia64/machine.c |    1 -
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/qemu/hw/ipf.c b/qemu/hw/ipf.c
index a84e343..54eaca0 100644
--- a/qemu/hw/ipf.c
+++ b/qemu/hw/ipf.c
@@ -535,7 +535,8 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size,
     for(i = 0; i < MAX_SERIAL_PORTS; i++) {
         if (serial_hds[i]) {
-            serial_init(serial_io[i], i8259[serial_irq[i]], serial_hds[i]);
+            serial_init(serial_io[i], i8259[serial_irq[i]], 115200,
+                        serial_hds[i]);
         }
     }
@@ -669,4 +670,5 @@ QEMUMachine ipf_machine = {
     "itanium",
     "Itanium Platform",
     ipf_init_pci,
+    VGA_RAM_SIZE + GFW_SIZE,
 };

diff --git a/qemu/target-ia64/machine.c b/qemu/target-ia64/machine.c
index ba06d7b..4dc5d5e 100644
--- a/qemu/target-ia64/machine.c
+++ b/qemu/target-ia64/machine.c
@@ -1,6 +1,5 @@
 #include "hw/hw.h"
 #include "hw/boards.h"
-#include "hw/ipf.h"
 #include "exec-all.h"
 #include "qemu-kvm.h"
-- 
1.5.2
From: Zhang, X. <xia...@in...> - 2008-05-08 02:49:46
>From a9f479197f0a0efa45a930309fad03fd690cba60 Mon Sep 17 00:00:00 2001
From: Xiantao Zhang <xia...@in...>
Date: Thu, 8 May 2008 10:16:05 +0800
Subject: [PATCH] KVM: Qemu: IA-64 build fix.

Remove nonexistent header inclusion, and set correct phys_ram_size for
the ipf machine.

Signed-off-by: Xiantao Zhang <xia...@in...>
---
 qemu/hw/ipf.c              |    4 +++-
 qemu/target-ia64/machine.c |    1 -
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/qemu/hw/ipf.c b/qemu/hw/ipf.c
index a84e343..54eaca0 100644
--- a/qemu/hw/ipf.c
+++ b/qemu/hw/ipf.c
@@ -535,7 +535,8 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size,
     for(i = 0; i < MAX_SERIAL_PORTS; i++) {
         if (serial_hds[i]) {
-            serial_init(serial_io[i], i8259[serial_irq[i]], serial_hds[i]);
+            serial_init(serial_io[i], i8259[serial_irq[i]], 115200,
+                        serial_hds[i]);
         }
     }
@@ -669,4 +670,5 @@ QEMUMachine ipf_machine = {
     "itanium",
     "Itanium Platform",
     ipf_init_pci,
+    VGA_RAM_SIZE + VGA_RAM_SIZE,
 };

diff --git a/qemu/target-ia64/machine.c b/qemu/target-ia64/machine.c
index ba06d7b..4dc5d5e 100644
--- a/qemu/target-ia64/machine.c
+++ b/qemu/target-ia64/machine.c
@@ -1,6 +1,5 @@
 #include "hw/hw.h"
 #include "hw/boards.h"
-#include "hw/ipf.h"
 #include "exec-all.h"
 #include "qemu-kvm.h"
-- 
1.5.2
From: Linus T. <tor...@li...> - 2008-05-08 02:32:21
Andrea, I'm not interested. I've stated my standpoint: the code being
discussed is crap. We're not doing that. Not in the core VM.

I gave solutions that I think aren't crap, but I already also stated
that I have no problems not merging it _ever_ if no solution can be
found. The whole issue simply isn't even worth the pain, imnsho.

		Linus
From: Yunfeng Z. <yun...@in...> - 2008-05-08 02:30:58
Hi, all,

This is today's KVM test result against kvm.git
8ae6dc90ac84d9734e343210c8ec709f50cd9d89 and kvm-userspace.git
9803148510003a03531a0f9535604ec41a9a1251.

One New Issue:
================================================
1. Fails to restore guests in real mode
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1959950&group_id=180599

Four Old Issues:
================================================
2. XP crashes while rebooting
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1959452&group_id=180599
3. linux virtio drivers won't work after runing save/restore
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1959443&group_id=180599
4. Cannot boot guests with hugetlbfs
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1941302&group_id=180599
5. KVM Guest Drivers fail on Windows
https://sourceforge.net/tracker/index.php?func=detail&aid=1920897&group_id=180599&atid=893831
We met the issue too while testing Windows drivers on XP.

Test environment
================================================
Platform:     Woodcrest
CPU:          4
Memory size:  8G

Details
================================================
IA32-pae:
1. boot guest with 256M memory                        PASS
2. boot two windows xp guest                          PASS
3. boot 4 same guest in parallel                      PASS
4. boot linux and windows guest in parallel           PASS
5. boot guest with 1500M memory                       PASS
6. boot windows 2003 with ACPI enabled                PASS
7. boot Windows xp with ACPI enabled                  PASS
8. boot Windows 2000 without ACPI                     PASS
9. kernel build on SMP linux guest                    PASS
10. LTP on linux guest                                PASS
11. boot base kernel linux                            PASS
12. save/restore 32-bit HVM guests                    PASS
13. live migration 32-bit HVM guests                  PASS
14. boot SMP Windows xp with ACPI enabled             PASS
15. boot SMP Windows 2003 with ACPI enabled           PASS
16. boot SMP Windows 2000 with ACPI enabled           PASS
17. boot SMP Windows 2008 with ACPI enabled           PASS
================================================
IA32e:
1. boot four 32-bit guest in parallel                 PASS
2. boot four 64-bit guest in parallel                 PASS
3. boot 4G 64-bit guest                               PASS
4. boot 4G pae guest                                  PASS
5. boot 32-bit linux and 32 bit windows guest in parallel  PASS
6. boot 32-bit guest with 1500M memory                PASS
7. boot 64-bit guest with 1500M memory                PASS
8. boot 32-bit guest with 256M memory                 PASS
9. boot 64-bit guest with 256M memory                 PASS
10. boot two 32-bit windows xp in parallel            PASS
11. boot four 32-bit different guest in para          PASS
12. save/restore 64-bit linux guests                  PASS
13. save/restore 32-bit linux guests                  PASS
14. boot 32-bit SMP windows 2003 with ACPI enabled    PASS
15. boot 32-bit SMP windows 2008 with ACPI enabled    PASS
16. boot 32-bit SMP Windows 2000 with ACPI enabled    PASS
17. boot 32-bit SMP Windows xp with ACPI enabled      PASS
18. boot 32-bit Windows 2000 without ACPI             PASS
19. boot 64-bit Windows xp with ACPI enabled          PASS
20. boot 32-bit Windows xp without ACPI               PASS
21. boot 64-bit UP vista                              PASS
22. boot 64-bit SMP vista                             PASS
23. kernel build in 32-bit linux guest OS             PASS
24. kernel build in 64-bit linux guest OS             PASS
25. LTP on 32-bit linux guest OS                      PASS
26. LTP on 64-bit linux guest OS                      PASS
27. boot 64-bit guests with ACPI enabled              PASS
28. boot 32-bit x-server                              PASS
29. boot 64-bit SMP windows XP with ACPI enabled      PASS
30. boot 64-bit SMP windows 2003 with ACPI enabled    PASS
31. boot 64-bit SMP windows 2008 with ACPI enabled    PASS
32. live migration 64bit linux guests                 PASS
33. live migration 32bit linux guests                 PASS
34. reboot 32bit windows xp guest                     PASS
35. reboot 32bit windows xp guest                     PASS

Report Summary on IA32-pae
Summary Test Report of Last Session
=====================================================================
                           Total   Pass    Fail   NoResult   Crash
=====================================================================
control_panel                  7      7       0          0       0
Restart                        2      2       0          0       0
gtest                         17     17       0          0       0
=====================================================================
control_panel                  7      7       0          0       0
 :KVM_LM_PAE_gPAE              1      1       0          0       0
 :KVM_four_sguest_PAE_gPA      1      1       0          0       0
 :KVM_256M_guest_PAE_gPAE      1      1       0          0       0
 :KVM_linux_win_PAE_gPAE       1      1       0          0       0
 :KVM_1500M_guest_PAE_gPA      1      1       0          0       0
 :KVM_SR_PAE_gPAE              1      1       0          0       0
 :KVM_two_winxp_PAE_gPAE       1      1       0          0       0
Restart                        2      2       0          0       0
 :GuestPAE_PAE_g64             1      1       0          0       0
 :BootTo32pae_PAE_g64          1      1       0          0       0
gtest                         17     17       0          0       0
 :ltp_nightly_PAE_gPAE         1      1       0          0       0
 :boot_up_acpi_PAE_gPAE        1      1       0          0       0
 :reboot_xp_PAE_gPAE           1      1       0          0       0
 :boot_up_vista_PAE_gPAE       1      1       0          0       0
 :boot_up_acpi_xp_PAE_gPA      1      1       0          0       0
 :boot_up_acpi_win2k3_PAE      1      1       0          0       0
 :boot_smp_acpi_win2k3_PA      1      1       0          0       0
 :boot_smp_acpi_win2k_PAE      1      1       0          0       0
 :boot_up_acpi_win2k_PAE_      1      1       0          0       0
 :boot_smp_acpi_xp_PAE_gP      1      1       0          0       0
 :boot_up_noacpi_win2k_PA      1      1       0          0       0
 :boot_smp_vista_PAE_gPAE      1      1       0          0       0
 :boot_base_kernel_PAE_gP      1      1       0          0       0
 :boot_up_win2008_PAE_gPA      1      1       0          0       0
 :bootx_PAE_gPAE               1      1       0          0       0
 :boot_smp_win2008_PAE_gP      1      1       0          0       0
 :kb_nightly_PAE_gPAE          1      1       0          0       0
=====================================================================
Total                         26     26       0          0       0

Report Summary on IA32e
Summary Test Report of Last Session
=====================================================================
                           Total   Pass    Fail   NoResult   Crash
=====================================================================
control_panel                 15     15       0          0       0
Restart                        3      3       0          0       0
gtest                         29     28       1          0       0
=====================================================================
control_panel                 15     15       0          0       0
 :KVM_LM_64_g64                1      1       0          0       0
 :KVM_four_sguest_64_gPAE      1      1       0          0       0
 :KVM_4G_guest_64_g64          1      1       0          0       0
 :KVM_four_sguest_64_g64       1      1       0          0       0
 :KVM_linux_win_64_gPAE        1      1       0          0       0
 :KVM_1500M_guest_64_gPAE      1      1       0          0       0
 :KVM_SR_64_g64                1      1       0          0       0
 :KVM_LM_64_gPAE               1      1       0          0       0
 :KVM_256M_guest_64_g64        1      1       0          0       0
 :KVM_1500M_guest_64_g64       1      1       0          0       0
 :KVM_4G_guest_64_gPAE         1      1       0          0       0
 :KVM_SR_64_gPAE               1      1       0          0       0
 :KVM_256M_guest_64_gPAE       1      1       0          0       0
 :KVM_two_winxp_64_gPAE        1      1       0          0       0
 :KVM_four_dguest_64_gPAE      1      1       0          0       0
Restart                        3      3       0          0       0
 :BootTo64_64_g64              1      1       0          0       0
 :Guest64_64_gPAE              1      1       0          0       0
 :GuestPAE_64_g64              1      1       0          0       0
gtest                         29     28       1          0       0
 :boot_up_acpi_64_gPAE         1      1       0          0       0
 :boot_up_noacpi_xp_64_gP      1      1       0          0       0
 :boot_smp_acpi_xp_64_g64      1      1       0          0       0
 :boot_base_kernel_64_gPA      1      1       0          0       0
 :boot_smp_win2008_64_g64      1      1       0          0       0
 :boot_smp_acpi_win2k3_64      1      1       0          0       0
 :boot_smp_acpi_win2k_64_      1      1       0          0       0
 :boot_base_kernel_64_g64      1      1       0          0       0
 :bootx_64_gPAE                1      1       0          0       0
 :kb_nightly_64_gPAE           1      1       0          0       0
 :ltp_nightly_64_g64           1      1       0          0       0
 :boot_up_acpi_64_g64          1      1       0          0       0
 :boot_smp_win2008_64_gPA      1      1       0          0       0
 :boot_smp_acpi_xp_64_gPA      1      1       0          0       0
 :boot_up_noacpi_win2k_64      1      1       0          0       0
 :boot_smp_vista_64_gPAE       1      1       0          0       0
 :boot_up_acpi_win2k3_64_      1      1       0          0       0
 :reboot_xp_64_gPAE            1      1       0          0       0
 :boot_up_win2008_64_g64       1      1       0          0       0
 :bootx_64_g64                 1      1       0          0       0
 :reboot_xp_64_g64             1      0       1          0       0
 :boot_up_vista_64_g64         1      1       0          0       0
 :boot_up_win2008_64_gPAE      1      1       0          0       0
 :boot_up_acpi_xp_64_g64       1      1       0          0       0
 :boot_up_vista_64_gPAE        1      1       0          0       0
 :ltp_nightly_64_gPAE          1      1       0          0       0
 :boot_smp_acpi_win2k3_64      1      1       0          0       0
 :boot_up_noacpi_win2k3_6      1      1       0          0       0
 :kb_nightly_64_g64            1      1       0          0       0
=====================================================================
Total                         47     46       1          0       0

Best Regards,
Yunfeng
From: Andrea A. <an...@qu...> - 2008-05-08 02:24:20
On Wed, May 07, 2008 at 06:57:05PM -0700, Linus Torvalds wrote:
> Take five minutes. Take a deep breath. And *think* about actually
> reading what I wrote.
>
> The bitflag *can* prevent taking the same lock twice. It just needs to
> be in the right place.

It's not that I didn't read it, but to do it I'd have to grow every
anon_vma by 8 bytes. I thought it was implicit that the conclusion of
your email was that it couldn't possibly make sense to grow the size of
each anon_vma by 33% when nobody has loaded the kvm, gru or xpmem
kernel modules.

It surely isn't my preferred solution, while capping the number of vmas
at 1024 means sort() will take around 10240 steps; Matt can tell the
exact number. The big cost shouldn't be in sort. Even 512 vmas will be
more than enough for us, in fact.

Note that I've a cond_resched in the sort compare function and I can
re-add the signal_pending check. I had the signal_pending check in the
original version that didn't use sort() but did an inner loop; I
thought signal_pending wasn't necessary after speeding it up with
sort(). But I can add it again, so then we'll only fail to abort inside
sort() and we'll be able to break the loop while taking all the
spinlocks, but with such a small array that can't be an issue, and the
result will surely run faster than stop_machine, with zero ram and cpu
overhead for the VM (besides, stop_machine can't work, or we'd need to
disable preemption between invalidate_range_start/end, even after
removing the xpmem schedule-inside-mmu-notifier requirement).
From: SourceForge.net <no...@so...> - 2008-05-08 02:10:09
Bugs item #1959443, was opened at 2008-05-07 20:06
Message generated for change (Settings changed) made by yunfeng
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1959443&group_id=180599

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request, not
just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: yunfeng (yunfeng)
Assigned to: Nobody/Anonymous (nobody)
>Summary: linux virtio drivers won't work after runing save/restore

Initial Comment:
I tried to save/restore a rhel5u1 guest with virtio drivers. Saving the
guest is ok, but it failed when running restore. Here is the command I
am using:

[root@vt-nhm1 ~]# qemu-system-x86_64 -m 512 -smp 1 -net nic,macaddr=00:16:3e:0f:be:84,model=virtio -net tap,script=/etc/kvm/qemu-ifup -hda /root/ia32p_rhel5u1.img -drive file=/media/root/ia32p_suse10u2.img,if=virtio -incoming file://share/xvs/var/sr123
Migration failed rc=201

----------------------------------------------------------------------

>Comment By: yunfeng (yunfeng)
Date: 2008-05-08 10:10

Message:
Logged In: YES
user_id=1283543
Originator: YES

The command I pasted in the previous comment is wrong. I retested it
using the following command:

qemu-system-x86_64 -m 256 -net nic,macaddr=00:16:3e:10:4d:f1,model=virtio -net tap,script=/etc/kvm/qemu-ifup -hda /share/ia32p_rhel5u1.img -drive file=/mnt/sda1/ia32p_rhel4u1.img,if=virtio -incoming file:///tmp/ddd

The problem is that the linux virtio drivers won't work after running
save/restore. The reproduce steps are:
1. boot a linux guest with virtnet and virtblk.
2. enable virtnet and mount virtblk.
3. ping another host in the guest, and do some operations on the
   virtblk partition, like copying a big file to the disk.
4. save the guest.
5. restore the guest.

What I am now seeing is that either virtnet or virtblk will stop
working after running restore.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1959443&group_id=180599
From: Linus T. <tor...@li...> - 2008-05-08 01:57:22
On Thu, 8 May 2008, Andrea Arcangeli wrote:
>
> So because the bitflag can't prevent taking the same lock twice on two
> different vmas in the same mm, we still can't remove the sorting

Andrea. Take five minutes. Take a deep breath. And *think* about
actually reading what I wrote.

The bitflag *can* prevent taking the same lock twice. It just needs to
be in the right place. Let me quote it for you:

> So the flag wouldn't be one of the VM_xyzzy flags, and would require
> adding a new field to "struct anon_vma()"

IOW, just make it be in that anon_vma (and the address_space). No
sorting required.

> I think it's more interesting to put a cap on the number of vmas to
> min(1024,max_map_count). The sort time on an 8k array runs in constant
> time.

Shut up already. It's not constant time just because you can cap the
overhead. We're not in a university, and we care about performance, not
your made-up big-O notation.

		Linus
From: Andrea A. <an...@qu...> - 2008-05-08 01:52:45
On Wed, May 07, 2008 at 06:39:48PM -0700, Linus Torvalds wrote:
> On Wed, 7 May 2008, Christoph Lameter wrote:
> >
> > > (That said, we're not running out of vm flags yet, and if we were, we
> > > could just add another word. We're already wasting that space right now on
> > > 64-bit by calling it "unsigned long").
> >
> > We sure have enough flags.
>
> Oh, btw, I was wrong - we wouldn't want to mark the vma's (they are
> unique), we need to mark the address spaces/anonvma's. So the flag would
> need to be in the "struct anon_vma" (and struct address_space), not in the
> vma itself. My bad. So the flag wouldn't be one of the VM_xyzzy flags, and
> would require adding a new field to "struct anon_vma()"
>
> And related to that brain-fart of mine, that obviously also means that
> yes, the locking has to be stronger than "mm->mmap_sem" held for writing,
> so yeah, it would have to be a separate global spinlock (or perhaps a
> blocking lock if you have some reason to protect anything else with this

So because the bitflag can't prevent taking the same lock twice on two
different vmas in the same mm, we still can't remove the sorting, and
the global lock won't buy much other than reducing the collisions. I
can add that, though.

I think it's more interesting to put a cap on the number of vmas at
min(1024, max_map_count). The sort time on an 8k array runs in constant
time. kvm runs with 127 vmas allocated...
From: SourceForge.net <no...@so...> - 2008-05-08 01:45:27
Bugs item #1959950, was opened at 2008-05-08 09:45
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1959950&group_id=180599

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request, not
just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: yunfeng (yunfeng)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fails to restore guests in real mode.

Initial Comment:
If we save and restore a guest in real mode, for example a guest
running in grub or the memtest86 program, the guest will crash. The
error message:

[root@vt-dp8 results]# qemu-system-x86_64 -m 256 -net nic,macaddr=00:16:3e:10:4d:f1,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/tmp-img_gtp12_1210169969_1 -cdrom /root/memtest86-3.4a.iso -boot d -incoming file:///tmp/abc
unhandled vm exit: 0x80000021 vcpu_id 0
rax 00000000000000e4 rbx 000000000001a8c0 rcx 0000000000000000 rdx 0000000000000020
rsi 000000000001c8dc rdi 0000000000000022 rsp 000000000001b880 rbp 0000000000000001
r8  0000000000000000 r9  0000000000000000 r10 0000000000000000 r11 0000000000000000
r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
rip 0000000000006069 rflags 00010002
cs 0010 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 2 l 0 g 1 avl 0)
ds 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 2 l 0 g 1 avl 0)
es 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 2 l 0 g 1 avl 0)
ss 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 2 l 0 g 1 avl 0)
fs 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 2 l 0 g 1 avl 0)
gs 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 2 l 0 g 1 avl 0)
tr 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0)
gdt 25b4/2f
idt 250c/9f
cr0 11 cr2 0 cr3 0 cr4 0 cr8 0 efer 0
Aborted

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1959950&group_id=180599
From: Linus T. <tor...@li...> - 2008-05-08 01:40:02
On Wed, 7 May 2008, Christoph Lameter wrote:
>
> > (That said, we're not running out of vm flags yet, and if we were, we
> > could just add another word. We're already wasting that space right now on
> > 64-bit by calling it "unsigned long").
>
> We sure have enough flags.

Oh, btw, I was wrong - we wouldn't want to mark the vma's (they are
unique), we need to mark the address spaces/anon_vma's. So the flag
would need to be in the "struct anon_vma" (and struct address_space),
not in the vma itself. My bad. So the flag wouldn't be one of the
VM_xyzzy flags, and would require adding a new field to "struct
anon_vma".

And related to that brain-fart of mine, that obviously also means that
yes, the locking has to be stronger than "mm->mmap_sem" held for
writing, so yeah, it would have to be a separate global spinlock (or
perhaps a blocking lock if you have some reason to protect anything
else with this too).

		Linus
From: Andrea A. <an...@qu...> - 2008-05-08 01:34:54
Sorry for not having completely answered this. I initially thought
stop_machine could work when you mentioned it, but I don't think it
can, even after removing the xpmem block-inside-mmu-notifier-method
requirements.

For stop_machine to solve this (besides being slower and potentially no
safer, as running stop_machine in a loop isn't nice), we'd need to
prevent preemption in between invalidate_range_start/end.

I think there are two ways:

1) add a global lock around mm_lock to remove the sorting

2) remove invalidate_range_start/end, nuke mm_lock as a consequence,
   and replace all three with an invalidate_pages issued inside the PT
   lock, one invalidation for each 512 pte_t modified, so serialization
   against get_user_pages becomes trivial; but this will be not ok at
   all for SGI as it increases their invalidation frequency a lot

For KVM both ways are almost the same. I'll implement 1 now, then we'll
see...
From: Linus T. <tor...@li...> - 2008-05-08 01:32:56
On Wed, 7 May 2008, Christoph Lameter wrote:
> On Wed, 7 May 2008, Linus Torvalds wrote:
> > and you're now done. You have your "mm_lock()" (which still needs to be
> > renamed - it should be a "mmu_notifier_lock()" or something like that),
> > but you don't need the insane sorting. At most you apparently need a way
> > to recognize duplicates (so that you don't deadlock on yourself), which
> > looks like a simple bit-per-vma.
>
> Andrea's mm_lock could have wider impact. It is the first effective
> way that I have seen of temporarily holding off reclaim from an address
> space. It sure is a brute force approach.

Well, I don't think the naming necessarily has to be about notifiers,
but it should be at least a *bit* more scary than "mm_lock()", to make
it clear that it's pretty dang expensive.

Even without the vmalloc and sorting, if it were used by "normal"
things it would still be very expensive for some cases - running things
like ElectricFence, for example, will easily generate thousands and
thousands of vma's in a process.

		Linus
From: Andrea A. <an...@qu...> - 2008-05-08 01:26:52
On Wed, May 07, 2008 at 06:02:49PM -0700, Linus Torvalds wrote:
> You replace mm_lock() with the sequence that Andrew gave you (and I
> described):
>
>	spin_lock(&global_lock)
>	.. get all locks UNORDERED ..
>	spin_unlock(&global_lock)
>
> and you're now done. You have your "mm_lock()" (which still needs to be
> renamed - it should be a "mmu_notifier_lock()" or something like that),
> but you don't need the insane sorting. At most you apparently need a way
> to recognize duplicates (so that you don't deadlock on yourself), which
> looks like a simple bit-per-vma.
>
> The global lock doesn't protect any data structures itself - it just
> protects two of these mm_lock() functions from ABBA'ing on each other!

I thought the thing to remove was the "get all locks". I didn't realize
the major problem was only the sorting of the array. I'll add the
global lock; it's worth it, as it drops the worst-case number of steps
by a factor of log(65536). Furthermore, two concurrent
mmu_notifier_locks will surely run faster as it'll decrease the
cacheline collisions.

Since you ask to call it mmu_notifier_lock, I'll also move it to
mmu_notifier.[ch] as a consequence.
From: Christoph L. <cla...@sg...> - 2008-05-08 01:12:32
On Wed, 7 May 2008, Linus Torvalds wrote:

> and you're now done. You have your "mm_lock()" (which still needs to be
> renamed - it should be a "mmu_notifier_lock()" or something like that),
> but you don't need the insane sorting. At most you apparently need a way
> to recognize duplicates (so that you don't deadlock on yourself), which
> looks like a simple bit-per-vma.

Andrea's mm_lock could have wider impact. It is the first effective way
that I have seen of temporarily holding off reclaim from an address
space. It sure is a brute force approach.
From: Linus T. <tor...@li...> - 2008-05-08 01:07:51
On Wed, 7 May 2008, Christoph Lameter wrote:
>
> Set the vma flag when we locked it and then skip when we find it locked
> right? This would be in addition to the global lock?

Yes. And clear it before unlocking (and again, testing if it's already
clear - you mustn't unlock twice, so you must only unlock when the bit
was set).

You also (obviously) need to have something that guarantees that the
lists themselves are stable over the whole sequence, but I assume you
already have mmap_sem for reading (since you'd need it anyway just to
follow the list).

And if you have it for writing, it can obviously *act* as the global
lock, since it would already guarantee mutual exclusion on that
mm->mmap list.

		Linus
From: Linus T. <tor...@li...> - 2008-05-08 01:03:33
On Thu, 8 May 2008, Andrea Arcangeli wrote:
> Hi Andrew,
>
> On Wed, May 07, 2008 at 03:59:14PM -0700, Andrew Morton wrote:
> >          CPU0:                CPU1:
> >
> >                               spin_lock(global_lock)
> > spin_lock(a->lock);           spin_lock(b->lock);
> ================== mmu_notifier_register()

If mmu_notifier_register() takes the global lock, it cannot happen
here. It will be blocked (by CPU0), so there's no way it can then cause
an ABBA deadlock. It will be released when CPU0 has taken *all* the
locks it needed to take.

> What we can do is to replace the mm_lock with a
> spin_lock(&global_lock) only if all places that takes i_mmap_lock

NO!

You replace mm_lock() with the sequence that Andrew gave you (and I
described):

	spin_lock(&global_lock)
	.. get all locks UNORDERED ..
	spin_unlock(&global_lock)

and you're now done. You have your "mm_lock()" (which still needs to be
renamed - it should be a "mmu_notifier_lock()" or something like that),
but you don't need the insane sorting. At most you apparently need a
way to recognize duplicates (so that you don't deadlock on yourself),
which looks like a simple bit-per-vma.

The global lock doesn't protect any data structures itself - it just
protects two of these mm_lock() functions from ABBA'ing on each other!

		Linus
From: Linus T. <tor...@li...> - 2008-05-08 00:56:21
On Wed, 7 May 2008, Robin Holt wrote:
>
> In order to invalidate the remote page table entries, we need to message
> (uses XPC) to the remote side. The remote side needs to acquire the
> importing process's mmap_sem and call zap_page_range(). Between the
> messaging and the acquiring a sleeping lock, I would argue this will
> require sleeping locks in the path prior to the mmu_notifier invalidate_*
> callouts().

You simply will *have* to do it without locally holding all the MM
spinlocks. Because quite frankly, slowing down all the normal VM stuff
for some really esoteric hardware simply isn't acceptable. We just
don't do it.

So what is it that actually triggers one of these events?

The most obvious solution is to just queue the affected pages while
holding the spinlocks (perhaps locking them locally), and then handling
all the stuff that can block after releasing things. That's how we
normally do these things, and it works beautifully, without making
everything slower.

Sometimes we go to extremes, and actually break the locks and restart
(ugh), and it gets ugly, but even that tends to be preferable to using
the wrong locking.

The thing is, spinlocks really kick ass. Yes, they restrict what you
can do within them, but if 99.99% of all work is non-blocking, then the
really odd rare blocking case is the one that needs to accommodate, not
the rest.

		Linus
From: Christoph L. <cla...@sg...> - 2008-05-08 00:56:11
On Wed, 7 May 2008, Linus Torvalds wrote:
> On Wed, 7 May 2008, Christoph Lameter wrote:
> >
> > Multiple vmas may share the same mapping or refer to the same anonymous
> > vma. The above code will deadlock since we may take some locks multiple
> > times.
>
> Ok, so that actually _is_ a problem. It would be easy enough to also add
> just a flag to the vma (VM_MULTILOCKED), which is still cleaner than doing
> a vmalloc and a whole sort thing, but if this is really rare, maybe Ben's
> suggestion of just using stop-machine is actually the right one just
> because it's _so_ simple.

Set the vma flag when we locked it and then skip when we find it
locked, right? This would be in addition to the global lock?

stop-machine would work for KVM since it's a once-per-guest-OS kind of
thing. But GRU, KVM and eventually Infiniband need the ability to
attach in a reasonable timeframe without causing major hiccups for
other processes.

> (That said, we're not running out of vm flags yet, and if we were, we
> could just add another word. We're already wasting that space right now on
> 64-bit by calling it "unsigned long").

We sure have enough flags.
From: Robin H. <ho...@sg...> - 2008-05-08 00:52:51
On Wed, May 07, 2008 at 05:03:30PM -0700, Linus Torvalds wrote:
>
> On Wed, 7 May 2008, Christoph Lameter wrote:
> >
> > Multiple vmas may share the same mapping or refer to the same anonymous
> > vma. The above code will deadlock since we may take some locks multiple
> > times.
>
> Ok, so that actually _is_ a problem. It would be easy enough to also add
> just a flag to the vma (VM_MULTILOCKED), which is still cleaner than doing
> a vmalloc and a whole sort thing, but if this is really rare, maybe Ben's
> suggestion of just using stop-machine is actually the right one just
> because it's _so_ simple.

Also, stop-machine will not work if we come to the conclusion that
i_mmap_lock and anon_vma->lock need to be sleepable locks.

Thanks,
Robin Holt