From: Avi K. <av...@qu...> - 2008-04-21 10:30:09
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/s390/kvm/Kconfig |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index 2489b34..1761b74 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -36,6 +36,9 @@ config KVM
 
 	  If unsure, say N.
 
+config KVM_TRACE
+	bool
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/virtio/Kconfig
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:09
From: Marcelo Tosatti <mto...@re...>

Timers that fire between guest hlt and vcpu_block's add_wait_queue() are
ignored, possibly resulting in hangs.

Also make sure that atomic_inc and waitqueue_active tests happen in the
specified order, otherwise the following race is open:

CPU0                                    CPU1
                                        if (waitqueue_active(wq))
add_wait_queue()
if (!atomic_read(pit_timer->pending))
        schedule()
                                        atomic_inc(pit_timer->pending)

Signed-off-by: Marcelo Tosatti <mto...@re...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/ia64/kvm/kvm-ia64.c  |    5 +++++
 arch/s390/kvm/interrupt.c |    5 +++++
 arch/x86/kvm/i8254.c      |   11 +++++++++++
 arch/x86/kvm/irq.c        |   15 +++++++++++++++
 arch/x86/kvm/irq.h        |    3 +++
 arch/x86/kvm/lapic.c      |   10 ++++++++++
 include/linux/kvm_host.h  |    1 +
 virt/kvm/kvm_main.c       |    1 +
 8 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 9c56b64..ca1cfb1 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1778,6 +1778,11 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
 {
 	return gfn;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index f62588c..fcd1ed8 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -325,6 +325,11 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu)
 	return rc;
 }
 
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu)
 {
 	u64 now, sltime;
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 06a241a..abb4b16 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -199,6 +199,7 @@ int __pit_timer_fn(struct kvm_kpit_state *ps)
 	struct kvm_kpit_timer *pt = &ps->pit_timer;
 
 	atomic_inc(&pt->pending);
+	smp_mb__after_atomic_inc();
 	if (vcpu0 && waitqueue_active(&vcpu0->wq)) {
 		vcpu0->arch.mp_state = VCPU_MP_STATE_RUNNABLE;
 		wake_up_interruptible(&vcpu0->wq);
@@ -210,6 +211,16 @@ int __pit_timer_fn(struct kvm_kpit_state *ps)
 	return (pt->period == 0 ? 0 : 1);
 }
 
+int pit_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pit *pit = vcpu->kvm->arch.vpit;
+
+	if (pit && vcpu->vcpu_id == 0)
+		return atomic_read(&pit->pit_state.pit_timer.pending);
+
+	return 0;
+}
+
 static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
 {
 	struct kvm_kpit_state *ps;
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index dbfe21c..ce1f583 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -26,6 +26,21 @@
 #include "i8254.h"
 
 /*
+ * check if there are pending timer events
+ * to be processed.
+ */
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	ret = pit_has_pending_timer(vcpu);
+	ret |= apic_has_pending_timer(vcpu);
+
+	return ret;
+}
+EXPORT_SYMBOL(kvm_cpu_has_pending_timer);
+
+/*
  * check if there is pending interrupt without
  * intack.
  */
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index fa5ed5d..1802134 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -85,4 +85,7 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
 void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
 
+int pit_has_pending_timer(struct kvm_vcpu *vcpu);
+int apic_has_pending_timer(struct kvm_vcpu *vcpu);
+
 #endif
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 31280df..debf582 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -952,6 +952,16 @@ static int __apic_timer_fn(struct kvm_lapic *apic)
 	return result;
 }
 
+int apic_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *lapic = vcpu->arch.apic;
+
+	if (lapic)
+		return atomic_read(&lapic->timer.pending);
+
+	return 0;
+}
+
 static int __inject_apic_timer_irq(struct kvm_lapic *apic)
 {
 	int vector;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bd0c2d2..0bc4003 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -269,6 +269,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm);
 
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *v);
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 
 static inline void kvm_guest_enter(void)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d5911d9..47cbc6e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -765,6 +765,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	 * We will block until either an interrupt or a signal wakes us up
 	 */
 	while (!kvm_cpu_has_interrupt(vcpu)
+	       && !kvm_cpu_has_pending_timer(vcpu)
 	       && !signal_pending(current)
 	       && !kvm_arch_vcpu_runnable(vcpu)) {
 		set_current_state(TASK_INTERRUPTIBLE);
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:09
From: Marcelo Tosatti <mto...@re...>

So userspace can save/restore the mpstate during migration.

[avi: export the #define constants describing the value]
[christian: add s390 stubs]
[avi: ditto for ia64]

Signed-off-by: Marcelo Tosatti <mto...@re...>
Signed-off-by: Christian Borntraeger <bor...@de...>
Signed-off-by: Carsten Otte <co...@de...>
Signed-off-by: Avi Kivity <av...@qu...>

KVM: ia64: provide get/set_mp_state stubs to fix compile error

Since commit ded6fb24fb694bcc5f308a02ec504d45fbc8aaa6
Author: Marcelo Tosatti <mto...@re...>
Date:   Fri Apr 11 13:24:45 2008 -0300

    KVM: add ioctls to save/store mpstate

kvm does not compile on ia64. This patch provides ioctl stubs for ia64
to make kvm.git compile again.

Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/ia64/kvm/kvm-ia64.c   |   12 ++++++++++++
 arch/s390/kvm/kvm-s390.c   |   12 ++++++++++++
 arch/x86/kvm/x86.c         |   19 +++++++++++++++++++
 include/asm-x86/kvm_host.h |    5 -----
 include/linux/kvm.h        |   15 +++++++++++++++
 include/linux/kvm_host.h   |    4 ++++
 virt/kvm/kvm_main.c        |   24 ++++++++++++++++++++++++
 7 files changed, 86 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index f7589db..6df0732 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1792,3 +1792,15 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE;
 }
+
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d966137..98d1e73 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -414,6 +414,18 @@ int kvm_arch_vcpu_ioctl_debug_guest(struct kvm_vcpu *vcpu,
 	return -EINVAL; /* not implemented yet */
 }
 
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL; /* not implemented yet */
+}
+
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL; /* not implemented yet */
+}
+
 static void __vcpu_run(struct kvm_vcpu *vcpu)
 {
 	memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b364d19..5c3c9d3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -817,6 +817,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_CLOCKSOURCE:
 	case KVM_CAP_PIT:
 	case KVM_CAP_NOP_IO_DELAY:
+	case KVM_CAP_MP_STATE:
 		r = 1;
 		break;
 	case KVM_CAP_VAPIC:
@@ -3083,6 +3084,24 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	vcpu_load(vcpu);
+	mp_state->mp_state = vcpu->arch.mp_state;
+	vcpu_put(vcpu);
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	vcpu_load(vcpu);
+	vcpu->arch.mp_state = mp_state->mp_state;
+	vcpu_put(vcpu);
+	return 0;
+}
+
 static void set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var,
 			int seg)
 {
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index f35a6ad..9d963cd 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -227,11 +227,6 @@ struct kvm_vcpu_arch {
 	u64 shadow_efer;
 	u64 apic_base;
 	struct kvm_lapic *apic;    /* kernel irqchip context */
-#define KVM_MP_STATE_RUNNABLE 0
-#define KVM_MP_STATE_UNINITIALIZED 1
-#define KVM_MP_STATE_INIT_RECEIVED 2
-#define KVM_MP_STATE_SIPI_RECEIVED 3
-#define KVM_MP_STATE_HALTED 4
 	int mp_state;
 	int sipi_vector;
 	u64 ia32_misc_enable_msr;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index d302d63..f8e211d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -228,6 +228,18 @@ struct kvm_vapic_addr {
 	__u64 vapic_addr;
 };
 
+/* for KVM_SET_MPSTATE */
+
+#define KVM_MP_STATE_RUNNABLE          0
+#define KVM_MP_STATE_UNINITIALIZED    1
+#define KVM_MP_STATE_INIT_RECEIVED    2
+#define KVM_MP_STATE_HALTED           3
+#define KVM_MP_STATE_SIPI_RECEIVED    4
+
+struct kvm_mp_state {
+	__u32 mp_state;
+};
+
 struct kvm_s390_psw {
 	__u64 mask;
 	__u64 addr;
@@ -326,6 +338,7 @@ struct kvm_trace_rec {
 #define KVM_CAP_PIT 11
 #define KVM_CAP_NOP_IO_DELAY 12
 #define KVM_CAP_PV_MMU 13
+#define KVM_CAP_MP_STATE 14
 
 /*
  * ioctls for VM fds
@@ -387,5 +400,7 @@ struct kvm_trace_rec {
 #define KVM_S390_SET_INITIAL_PSW  _IOW(KVMIO, 0x96, struct kvm_s390_psw)
 /* initial reset for s390 */
 #define KVM_S390_INITIAL_RESET    _IO(KVMIO,  0x97)
+#define KVM_GET_MP_STATE          _IOR(KVMIO,  0x98, struct kvm_mp_state)
+#define KVM_SET_MP_STATE          _IOW(KVMIO,  0x99, struct kvm_mp_state)
 
 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0bc4003..81d4c33 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -237,6 +237,10 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
 				  struct kvm_sregs *sregs);
 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 				  struct kvm_sregs *sregs);
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state);
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state);
 int kvm_arch_vcpu_ioctl_debug_guest(struct kvm_vcpu *vcpu,
 				    struct kvm_debug_guest *dbg);
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 47cbc6e..0998455 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -979,6 +979,30 @@ out_free2:
 		r = 0;
 		break;
 	}
+	case KVM_GET_MP_STATE: {
+		struct kvm_mp_state mp_state;
+
+		r = kvm_arch_vcpu_ioctl_get_mpstate(vcpu, &mp_state);
+		if (r)
+			goto out;
+		r = -EFAULT;
+		if (copy_to_user(argp, &mp_state, sizeof mp_state))
+			goto out;
+		r = 0;
+		break;
+	}
+	case KVM_SET_MP_STATE: {
+		struct kvm_mp_state mp_state;
+
+		r = -EFAULT;
+		if (copy_from_user(&mp_state, argp, sizeof mp_state))
+			goto out;
+		r = kvm_arch_vcpu_ioctl_set_mpstate(vcpu, &mp_state);
+		if (r)
+			goto out;
+		r = 0;
+		break;
+	}
 	case KVM_TRANSLATE: {
 		struct kvm_translation tr;
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:09
From: Marcelo Tosatti <mto...@re...>

kvm_pv_mmu_op should not take mmap_sem. All gfn_to_page() callers down
in the MMU processing will take it if necessary, so as it is it can
deadlock. Apparently a leftover from the days before slots_lock.

Signed-off-by: Marcelo Tosatti <mto...@re...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/x86/kvm/mmu.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 078a7f1..2ad6f54 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2173,8 +2173,6 @@ int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
 	int r;
 	struct kvm_pv_mmu_op_buffer buffer;
 
-	down_read(&current->mm->mmap_sem);
-
 	buffer.ptr = buffer.buf;
 	buffer.len = min_t(unsigned long, bytes, sizeof buffer.buf);
 	buffer.processed = 0;
@@ -2194,7 +2192,6 @@ int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
 	r = 1;
 out:
 	*ret = buffer.processed;
-	up_read(&current->mm->mmap_sem);
 	return r;
 }
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:06
From: Marcelo Tosatti <mto...@re...>

There is a window open between testing of pending IRQ's and assignment
of guest_mode in __vcpu_run.

Injection of IRQ's can race with __vcpu_run as follows:

CPU0                                    CPU1
kvm_x86_ops->run()
vcpu->guest_mode = 0                    SET_IRQ_LINE ioctl
                                        ..
kvm_x86_ops->inject_pending_irq
kvm_cpu_has_interrupt()
                                        apic_test_and_set_irr()
                                        kvm_vcpu_kick
                                        if (vcpu->guest_mode)
                                                send_ipi()
vcpu->guest_mode = 1

So move guest_mode=1 assignment before ->inject_pending_irq, and make
sure that it won't reorder after it.

Signed-off-by: Marcelo Tosatti <mto...@re...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/x86/kvm/x86.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5c3c9d3..0ce5563 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2802,6 +2802,13 @@ again:
 		goto out;
 	}
 
+	vcpu->guest_mode = 1;
+	/*
+	 * Make sure that guest_mode assignment won't happen after
+	 * testing the pending IRQ vector bitmap.
+	 */
+	smp_wmb();
+
 	if (vcpu->arch.exception.pending)
 		__queue_exception(vcpu);
 	else if (irqchip_in_kernel(vcpu->kvm))
@@ -2813,7 +2820,6 @@ again:
 
 	up_read(&vcpu->kvm->slots_lock);
 
-	vcpu->guest_mode = 1;
 	kvm_guest_enter();
 
 	if (vcpu->requests)
@@ -3970,11 +3976,17 @@ static void vcpu_kick_intr(void *info)
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 {
 	int ipi_pcpu = vcpu->cpu;
+	int cpu = get_cpu();
 
 	if (waitqueue_active(&vcpu->wq)) {
 		wake_up_interruptible(&vcpu->wq);
 		++vcpu->stat.halt_wakeup;
 	}
-	if (vcpu->guest_mode)
+	/*
+	 * We may be called synchronously with irqs disabled in guest mode,
+	 * So need not to call smp_call_function_single() in that case.
+	 */
+	if (vcpu->guest_mode && vcpu->cpu != cpu)
 		smp_call_function_single(ipi_pcpu, vcpu_kick_intr, vcpu, 0, 0);
+	put_cpu();
 }
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:05
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/ia64/kvm/Kconfig |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
index d2e54b9..7914e48 100644
--- a/arch/ia64/kvm/Kconfig
+++ b/arch/ia64/kvm/Kconfig
@@ -43,4 +43,7 @@ config KVM_INTEL
 	  Provides support for KVM on Itanium 2 processors equipped with the VT
 	  extensions.
 
+config KVM_TRACE
+	bool
+
 endif # VIRTUALIZATION
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:05
From: Joerg Roedel <joe...@am...>

To properly forward a MCE that occurred while the guest is running to the
host, we have to intercept this exception and call the host handler by
hand. This is implemented by this patch.

Signed-off-by: Joerg Roedel <joe...@am...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/x86/kvm/svm.c         |   17 ++++++++++++++++-
 include/asm-x86/kvm_host.h |    1 +
 2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8af463b..da3ddef 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -507,7 +507,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 					INTERCEPT_DR7_MASK;
 
 	control->intercept_exceptions = (1 << PF_VECTOR) |
-					(1 << UD_VECTOR);
+					(1 << UD_VECTOR) |
+					(1 << MC_VECTOR);
 
 	control->intercept = 	(1ULL << INTERCEPT_INTR) |
@@ -1044,6 +1045,19 @@ static int nm_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }
 
+static int mc_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	/*
+	 * On an #MC intercept the MCE handler is not called automatically in
+	 * the host. So do it by hand here.
+	 */
+	asm volatile (
+		"int $0x12\n");
+	/* not sure if we ever come back to this point */
+
+	return 1;
+}
+
 static int shutdown_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 {
 	/*
@@ -1367,6 +1381,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_EXCP_BASE + UD_VECTOR]	= ud_interception,
 	[SVM_EXIT_EXCP_BASE + PF_VECTOR]	= pf_interception,
 	[SVM_EXIT_EXCP_BASE + NM_VECTOR]	= nm_interception,
+	[SVM_EXIT_EXCP_BASE + MC_VECTOR]	= mc_interception,
 	[SVM_EXIT_INTR]				= nop_on_interception,
 	[SVM_EXIT_NMI]				= nop_on_interception,
 	[SVM_EXIT_SMI]				= nop_on_interception,
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index de3eccf..2861178 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -62,6 +62,7 @@
 #define SS_VECTOR 12
 #define GP_VECTOR 13
 #define PF_VECTOR 14
+#define MC_VECTOR 18
 
 #define SELECTOR_TI_MASK (1 << 2)
 #define SELECTOR_RPL_MASK 0x03
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:29:59
From: Anthony Liguori <ali...@us...>

This patch introduces a gfn_to_pfn() function and corresponding functions
like kvm_release_pfn_dirty(). Using these new functions, we can modify the
x86 MMU to no longer assume that it can always get a struct page for any
given gfn.

We don't want to eliminate gfn_to_page() entirely because a number of places
assume they can do gfn_to_page() and then kmap() the results. When we
support IO memory, gfn_to_page() will fail for IO pages although
gfn_to_pfn() will succeed.

This does not implement support for avoiding reference counting for reserved
RAM or for IO memory. However, it should make those things pretty straight
forward.

Since we're only introducing new common symbols, I don't think it will break
the non-x86 architectures but I haven't tested those. I've tested Intel,
AMD, NPT, and hugetlbfs with Windows and Linux guests.

[avi: fix overflow when shifting left pfns by adding casts]

Signed-off-by: Anthony Liguori <ali...@us...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/x86/kvm/mmu.c         |   89 +++++++++++++++++++++----------------------
 arch/x86/kvm/paging_tmpl.h |   26 ++++++------
 include/asm-x86/kvm_host.h |    4 +-
 include/linux/kvm_host.h   |   12 ++++++
 include/linux/kvm_types.h  |    2 +
 virt/kvm/kvm_main.c        |   68 ++++++++++++++++++++++++++++++---
 6 files changed, 133 insertions(+), 68 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c89bf23..078a7f1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -240,11 +240,9 @@ static int is_rmap_pte(u64 pte)
 	return is_shadow_present_pte(pte);
 }
 
-static struct page *spte_to_page(u64 pte)
+static pfn_t spte_to_pfn(u64 pte)
 {
-	hfn_t hfn = (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
-
-	return pfn_to_page(hfn);
+	return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
 }
 
 static gfn_t pse36_gfn_delta(u32 gpte)
@@ -541,20 +539,20 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 	struct kvm_rmap_desc *desc;
 	struct kvm_rmap_desc *prev_desc;
 	struct kvm_mmu_page *sp;
-	struct page *page;
+	pfn_t pfn;
 	unsigned long *rmapp;
 	int i;
 
 	if (!is_rmap_pte(*spte))
 		return;
 	sp = page_header(__pa(spte));
-	page = spte_to_page(*spte);
+	pfn = spte_to_pfn(*spte);
 	if (*spte & PT_ACCESSED_MASK)
-		mark_page_accessed(page);
+		kvm_set_pfn_accessed(pfn);
 	if (is_writeble_pte(*spte))
-		kvm_release_page_dirty(page);
+		kvm_release_pfn_dirty(pfn);
 	else
-		kvm_release_page_clean(page);
+		kvm_release_pfn_clean(pfn);
 	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], is_large_pte(*spte));
 	if (!*rmapp) {
 		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
@@ -635,11 +633,11 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn)
 		spte = rmap_next(kvm, rmapp, spte);
 	}
 	if (write_protected) {
-		struct page *page;
+		pfn_t pfn;
 
 		spte = rmap_next(kvm, rmapp, NULL);
-		page = spte_to_page(*spte);
-		SetPageDirty(page);
+		pfn = spte_to_pfn(*spte);
+		kvm_set_pfn_dirty(pfn);
 	}
 
 	/* check for huge page mappings */
@@ -1036,7 +1034,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 			 unsigned pt_access, unsigned pte_access,
 			 int user_fault, int write_fault, int dirty,
 			 int *ptwrite, int largepage, gfn_t gfn,
-			 struct page *page, bool speculative)
+			 pfn_t pfn, bool speculative)
 {
 	u64 spte;
 	int was_rmapped = 0;
@@ -1058,10 +1056,9 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 
 			child = page_header(pte & PT64_BASE_ADDR_MASK);
 			mmu_page_remove_parent_pte(child, shadow_pte);
-		} else if (page != spte_to_page(*shadow_pte)) {
+		} else if (pfn != spte_to_pfn(*shadow_pte)) {
 			pgprintk("hfn old %lx new %lx\n",
-				 page_to_pfn(spte_to_page(*shadow_pte)),
-				 page_to_pfn(page));
+				 spte_to_pfn(*shadow_pte), pfn);
 			rmap_remove(vcpu->kvm, shadow_pte);
 		} else {
 			if (largepage)
@@ -1090,7 +1087,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 	if (largepage)
 		spte |= PT_PAGE_SIZE_MASK;
 
-	spte |= page_to_phys(page);
+	spte |= (u64)pfn << PAGE_SHIFT;
 
 	if ((pte_access & ACC_WRITE_MASK)
 	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
@@ -1135,12 +1132,12 @@ unshadowed:
 	if (!was_rmapped) {
 		rmap_add(vcpu, shadow_pte, gfn, largepage);
 		if (!is_rmap_pte(*shadow_pte))
-			kvm_release_page_clean(page);
+			kvm_release_pfn_clean(pfn);
 	} else {
 		if (was_writeble)
-			kvm_release_page_dirty(page);
+			kvm_release_pfn_dirty(pfn);
 		else
-			kvm_release_page_clean(page);
+			kvm_release_pfn_clean(pfn);
 	}
 	if (!ptwrite || !*ptwrite)
 		vcpu->arch.last_pte_updated = shadow_pte;
@@ -1151,7 +1148,7 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
 }
 
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-			int largepage, gfn_t gfn, struct page *page,
+			int largepage, gfn_t gfn, pfn_t pfn,
 			int level)
 {
 	hpa_t table_addr = vcpu->arch.mmu.root_hpa;
@@ -1166,13 +1163,13 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 
 		if (level == 1) {
 			mmu_set_spte(vcpu, &table[index], ACC_ALL, ACC_ALL,
-				     0, write, 1, &pt_write, 0, gfn, page, false);
+				     0, write, 1, &pt_write, 0, gfn, pfn, false);
 			return pt_write;
 		}
 
 		if (largepage && level == 2) {
 			mmu_set_spte(vcpu, &table[index], ACC_ALL, ACC_ALL,
-				     0, write, 1, &pt_write, 1, gfn, page, false);
+				     0, write, 1, &pt_write, 1, gfn, pfn, false);
 			return pt_write;
 		}
 
@@ -1187,7 +1184,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 						     1, ACC_ALL, &table[index]);
 			if (!new_table) {
 				pgprintk("nonpaging_map: ENOMEM\n");
-				kvm_release_page_clean(page);
+				kvm_release_pfn_clean(pfn);
 				return -ENOMEM;
 			}
 
@@ -1202,8 +1199,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
 {
 	int r;
 	int largepage = 0;
-
-	struct page *page;
+	pfn_t pfn;
 
 	down_read(&current->mm->mmap_sem);
 	if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) {
@@ -1211,18 +1207,18 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
 		largepage = 1;
 	}
 
-	page = gfn_to_page(vcpu->kvm, gfn);
+	pfn = gfn_to_pfn(vcpu->kvm, gfn);
 	up_read(&current->mm->mmap_sem);
 
 	/* mmio */
-	if (is_error_page(page)) {
-		kvm_release_page_clean(page);
+	if (is_error_pfn(pfn)) {
+		kvm_release_pfn_clean(pfn);
 		return 1;
 	}
 
 	spin_lock(&vcpu->kvm->mmu_lock);
 	kvm_mmu_free_some_pages(vcpu);
-	r = __direct_map(vcpu, v, write, largepage, gfn, page,
+	r = __direct_map(vcpu, v, write, largepage, gfn, pfn,
 			 PT32E_ROOT_LEVEL);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
@@ -1355,7 +1351,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 			  u32 error_code)
 {
-	struct page *page;
+	pfn_t pfn;
 	int r;
 	int largepage = 0;
 	gfn_t gfn = gpa >> PAGE_SHIFT;
@@ -1372,16 +1368,16 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
 		largepage = 1;
 	}
-	page = gfn_to_page(vcpu->kvm, gfn);
+	pfn = gfn_to_pfn(vcpu->kvm, gfn);
 	up_read(&current->mm->mmap_sem);
-	if (is_error_page(page)) {
-		kvm_release_page_clean(page);
+	if (is_error_pfn(pfn)) {
+		kvm_release_pfn_clean(pfn);
 		return 1;
 	}
 	spin_lock(&vcpu->kvm->mmu_lock);
 	kvm_mmu_free_some_pages(vcpu);
 	r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK,
-			 largepage, gfn, page, TDP_ROOT_LEVEL);
+			 largepage, gfn, pfn, TDP_ROOT_LEVEL);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
 	return r;
@@ -1525,6 +1521,8 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 
 static int init_kvm_mmu(struct kvm_vcpu *vcpu)
 {
+	vcpu->arch.update_pte.pfn = bad_pfn;
+
 	if (tdp_enabled)
 		return init_kvm_tdp_mmu(vcpu);
 	else
@@ -1644,7 +1642,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	gfn_t gfn;
 	int r;
 	u64 gpte = 0;
-	struct page *page;
+	pfn_t pfn;
 
 	vcpu->arch.update_pte.largepage = 0;
 
@@ -1680,15 +1678,15 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
 		vcpu->arch.update_pte.largepage = 1;
 	}
-	page = gfn_to_page(vcpu->kvm, gfn);
+	pfn = gfn_to_pfn(vcpu->kvm, gfn);
 	up_read(&current->mm->mmap_sem);
-	if (is_error_page(page)) {
-		kvm_release_page_clean(page);
+	if (is_error_pfn(pfn)) {
+		kvm_release_pfn_clean(pfn);
 		return;
 	}
 	vcpu->arch.update_pte.gfn = gfn;
-	vcpu->arch.update_pte.page = page;
+	vcpu->arch.update_pte.pfn = pfn;
 }
 
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -1793,9 +1791,9 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	}
 	kvm_mmu_audit(vcpu, "post pte write");
 	spin_unlock(&vcpu->kvm->mmu_lock);
-	if (vcpu->arch.update_pte.page) {
-		kvm_release_page_clean(vcpu->arch.update_pte.page);
-		vcpu->arch.update_pte.page = NULL;
+	if (!is_error_pfn(vcpu->arch.update_pte.pfn)) {
+		kvm_release_pfn_clean(vcpu->arch.update_pte.pfn);
+		vcpu->arch.update_pte.pfn = bad_pfn;
 	}
 }
 
@@ -2236,8 +2234,7 @@ static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
 			audit_mappings_page(vcpu, ent, va, level - 1);
 		} else {
 			gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, va);
-			struct page *page = gpa_to_page(vcpu, gpa);
-			hpa_t hpa = page_to_phys(page);
+			hpa_t hpa = (hpa_t)gpa_to_pfn(vcpu, gpa) << PAGE_SHIFT;
 
 			if (is_shadow_present_pte(ent)
 			    && (ent & PT64_BASE_ADDR_MASK) != hpa)
@@ -2250,7 +2247,7 @@ static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
 				 && !is_error_hpa(hpa))
 				printk(KERN_ERR "audit: (%s) notrap shadow,"
 				       " valid guest gva %lx\n", audit_msg, va);
-			kvm_release_page_clean(page);
+			kvm_release_pfn_clean(pfn);
 
 		}
 	}
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 57d872a..156fe10 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -247,7 +247,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 {
 	pt_element_t gpte;
 	unsigned pte_access;
-	struct page *npage;
+	pfn_t pfn;
 	int largepage = vcpu->arch.update_pte.largepage;
 
 	gpte = *(const pt_element_t *)pte;
@@ -260,13 +260,13 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 	pte_access = page->role.access & FNAME(gpte_access)(vcpu, gpte);
 	if (gpte_to_gfn(gpte) != vcpu->arch.update_pte.gfn)
 		return;
-	npage = vcpu->arch.update_pte.page;
-	if (!npage)
+	pfn = vcpu->arch.update_pte.pfn;
+	if (is_error_pfn(pfn))
 		return;
-	get_page(npage);
+	kvm_get_pfn(pfn);
 	mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
 		     gpte & PT_DIRTY_MASK, NULL, largepage, gpte_to_gfn(gpte),
-		     npage, true);
+		     pfn, true);
 }
 
 /*
@@ -275,7 +275,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			 struct guest_walker *walker,
 			 int user_fault, int write_fault, int largepage,
-			 int *ptwrite, struct page *page)
+			 int *ptwrite, pfn_t pfn)
 {
 	hpa_t shadow_addr;
 	int level;
@@ -336,7 +336,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 						  walker->pte_gpa[level - 2],
 						  &curr_pte, sizeof(curr_pte));
 			if (r || curr_pte != walker->ptes[level - 2]) {
-				kvm_release_page_clean(page);
+				kvm_release_pfn_clean(pfn);
 				return NULL;
 			}
 		}
@@ -349,7 +349,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	mmu_set_spte(vcpu, shadow_ent, access, walker->pte_access & access,
 		     user_fault, write_fault,
 		     walker->ptes[walker->level-1] & PT_DIRTY_MASK,
-		     ptwrite, largepage, walker->gfn, page, false);
+		     ptwrite, largepage, walker->gfn, pfn, false);
 
 	return shadow_ent;
 }
@@ -378,7 +378,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 	u64 *shadow_pte;
 	int write_pt = 0;
 	int r;
-	struct page *page;
+	pfn_t pfn;
 	int largepage = 0;
 
 	pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
@@ -413,20 +413,20 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 			largepage = 1;
 		}
 	}
-	page = gfn_to_page(vcpu->kvm, walker.gfn);
+	pfn = gfn_to_pfn(vcpu->kvm, walker.gfn);
 	up_read(&current->mm->mmap_sem);
 
 	/* mmio */
-	if (is_error_page(page)) {
+	if (is_error_pfn(pfn)) {
 		pgprintk("gfn %x is mmio\n", walker.gfn);
-		kvm_release_page_clean(page);
+		kvm_release_pfn_clean(pfn);
 		return 1;
 	}
 
 	spin_lock(&vcpu->kvm->mmu_lock);
 	kvm_mmu_free_some_pages(vcpu);
 	shadow_pte = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
-				  largepage, &write_pt, page);
+				  largepage, &write_pt, pfn);
 
 	pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__,
 		 shadow_pte, *shadow_pte, write_pt);
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index b923049..de3eccf 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -248,8 +248,8 @@ struct kvm_vcpu_arch {
 	u64 *last_pte_updated;
 
 	struct {
-		gfn_t gfn;          /* presumed gfn during guest pte update */
-		struct page *page;  /* page corresponding to that gfn */
+		gfn_t gfn;	/* presumed gfn during guest pte update */
+		pfn_t pfn;	/* pfn corresponding to that gfn */
 		int largepage;
 	} update_pte;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a2ceb51..578c363 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -150,8 +150,10 @@ static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; }
 struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t gva);
 
 extern struct page *bad_page;
+extern pfn_t bad_pfn;
 
 int is_error_page(struct page *page);
+int is_error_pfn(pfn_t pfn);
 int kvm_is_error_hva(unsigned long addr);
 int kvm_set_memory_region(struct kvm *kvm,
 			  struct kvm_userspace_memory_region *mem,
@@ -168,6 +170,16 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
+void kvm_set_page_dirty(struct page *page);
+void kvm_set_page_accessed(struct page *page);
+
+pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
+void kvm_release_pfn_dirty(pfn_t);
+void kvm_release_pfn_clean(pfn_t pfn);
+void kvm_set_pfn_dirty(pfn_t pfn);
+void kvm_set_pfn_accessed(pfn_t pfn);
+void kvm_get_pfn(pfn_t pfn);
+
 int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
 			int len);
 int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 1c4e46d..9b6f395 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -38,6 +38,8 @@ typedef unsigned long hva_t;
 typedef u64 hpa_t;
 typedef unsigned long hfn_t;
 
+typedef hfn_t pfn_t;
+
 struct kvm_pio_request {
 	unsigned long count;
 	int cur_count;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 93ed78b..6a52c08 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -40,6 +40,7 @@
 #include <linux/kvm_para.h>
 #include <linux/pagemap.h>
 #include <linux/mman.h>
+#include <linux/swap.h>
 
 #include <asm/processor.h>
 #include <asm/io.h>
@@ -458,6 +459,12 @@ int is_error_page(struct page *page)
 }
 EXPORT_SYMBOL_GPL(is_error_page);
 
+int is_error_pfn(pfn_t pfn)
+{
+	return pfn == bad_pfn;
+}
+EXPORT_SYMBOL_GPL(is_error_pfn);
+
 static inline unsigned long bad_hva(void)
 {
 	return PAGE_OFFSET;
@@ -519,7 +526,7 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 /*
  * Requires current->mm->mmap_sem to be held
  */
-struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
+pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 {
 	struct page *page[1];
 	unsigned long addr;
@@ -530,7 +537,7 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 	addr = gfn_to_hva(kvm, gfn);
 	if (kvm_is_error_hva(addr)) {
 		get_page(bad_page);
-		return bad_page;
+		return page_to_pfn(bad_page);
 	}
 
 	npages = get_user_pages(current, current->mm, addr, 1, 1, 1, page,
@@ -538,27 +545,71 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 	if (npages != 1) {
 		get_page(bad_page);
-		return bad_page;
+		return page_to_pfn(bad_page);
 	}
 
-	return page[0];
+	return page_to_pfn(page[0]);
+}
+
+EXPORT_SYMBOL_GPL(gfn_to_pfn);
+
+struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
+{
+	return pfn_to_page(gfn_to_pfn(kvm, gfn));
 }
 
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
 void kvm_release_page_clean(struct page *page)
 {
-	put_page(page);
+	kvm_release_pfn_clean(page_to_pfn(page));
 }
 EXPORT_SYMBOL_GPL(kvm_release_page_clean);
 
+void kvm_release_pfn_clean(pfn_t pfn)
+{
+	put_page(pfn_to_page(pfn));
+}
+EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
+
 void kvm_release_page_dirty(struct page *page)
 {
+	kvm_release_pfn_dirty(page_to_pfn(page));
+}
+EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
+
+void kvm_release_pfn_dirty(pfn_t pfn)
+{
+	kvm_set_pfn_dirty(pfn);
+	kvm_release_pfn_clean(pfn);
+}
+EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
+
+void kvm_set_page_dirty(struct page *page)
+{
+	kvm_set_pfn_dirty(page_to_pfn(page));
+}
+EXPORT_SYMBOL_GPL(kvm_set_page_dirty);
+
+void kvm_set_pfn_dirty(pfn_t pfn)
+{
+	struct page *page = pfn_to_page(pfn);
 	if (!PageReserved(page))
 		SetPageDirty(page);
-	put_page(page);
 }
-EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
+EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
+
+void kvm_set_pfn_accessed(pfn_t pfn)
+{
+	mark_page_accessed(pfn_to_page(pfn));
+}
+EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
+
+void kvm_get_pfn(pfn_t pfn)
+{
+	get_page(pfn_to_page(pfn));
+}
+EXPORT_SYMBOL_GPL(kvm_get_pfn);
 
 static int next_segment(unsigned long len, int offset)
 {
@@ -1351,6 +1402,7 @@ static struct sys_device kvm_sysdev = {
 };
 
 struct page *bad_page;
+pfn_t bad_pfn;
 
 static inline
 struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn)
@@ -1392,6 +1444,8 @@ int kvm_init(void *opaque, unsigned int vcpu_size,
 		goto out;
 	}
 
+	bad_pfn = page_to_pfn(bad_page);
+
 	r = kvm_arch_hardware_setup();
 	if (r < 0)
 		goto out_free_0;
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:29:59
|
Signed-off-by: Avi Kivity <av...@qu...> --- Documentation/ioctl-number.txt | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/Documentation/ioctl-number.txt b/Documentation/ioctl-number.txt index c18363b..240ce7a 100644 --- a/Documentation/ioctl-number.txt +++ b/Documentation/ioctl-number.txt @@ -183,6 +183,8 @@ Code Seq# Include File Comments 0xAC 00-1F linux/raw.h 0xAD 00 Netfilter device in development: <mailto:ru...@ru...> +0xAE all linux/kvm.h Kernel-based Virtual Machine + <mailto:kvm...@li...> 0xB0 all RATIO devices in development: <mailto:vg...@ra...> 0xB1 00-1F PPPoX <mailto:mos...@st...> -- 1.5.5 |
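The patch above reserves ioctl magic 0xAE for KVM. As a rough illustration of what that magic means in practice, here is a simplified re-derivation of the kernel's ioctl number encoding from asm-generic/ioctl.h (dir<<30 | size<<16 | type<<8 | nr); the `kvm_ioctl_nr()` helper is hypothetical, added only to show how KVM ioctls such as `_IO(KVMIO, 0x00)` encode:

```c
#include <stdint.h>

/* Simplified version of the kernel's ioctl number encoding
 * (asm-generic/ioctl.h): dir<<30 | size<<16 | type<<8 | nr. */
#define IOC_NONE 0u
#define IOC(dir, type, nr, size) \
    ((uint32_t)(((dir) << 30) | ((size) << 16) | ((type) << 8) | (nr)))
#define IO(type, nr) IOC(IOC_NONE, (type), (nr), 0u)

/* The magic the patch registers in ioctl-number.txt. */
#define KVMIO 0xAEu

/* Hypothetical helper: an ioctl defined as _IO(KVMIO, seq) -- e.g.
 * KVM_GET_API_VERSION is _IO(KVMIO, 0x00) in linux/kvm.h -- encodes
 * the magic into bits 8..15 and the sequence number into bits 0..7. */
static inline uint32_t kvm_ioctl_nr(uint32_t seq)
{
    return IO(KVMIO, seq);
}
```

Registering the magic in ioctl-number.txt is purely documentation: it keeps other subsystems from picking the same type byte and colliding with KVM's command space.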
From: Avi K. <av...@qu...> - 2008-04-21 10:29:59
|
From: Joerg Roedel <joe...@am...> This patch aligns the host version of the CR4.MCE bit with the CR4 active in the guest. This is necessary to get MCE exceptions when the guest is running. Signed-off-by: Joerg Roedel <joe...@am...> Signed-off-by: Avi Kivity <av...@qu...> --- arch/x86/kvm/svm.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index d7439ce..8af463b 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -878,9 +878,12 @@ set: static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { + unsigned long host_cr4_mce = read_cr4() & X86_CR4_MCE; + vcpu->arch.cr4 = cr4; if (!npt_enabled) cr4 |= X86_CR4_PAE; + cr4 |= host_cr4_mce; to_svm(vcpu)->vmcb->save.cr4 = cr4; } -- 1.5.5 |
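Restated as a pure function (a sketch for illustration, not the kernel code — `svm_effective_cr4()` is a hypothetical helper), the fix forces the host's MCE bit into whatever CR4 value is loaded into the VMCB, so machine check exceptions keep reaching the host while the guest runs:

```c
#include <stdint.h>

#define X86_CR4_PAE (1ul << 5)
#define X86_CR4_MCE (1ul << 6)

/* Hypothetical restatement of the patched svm_set_cr4() logic:
 * take the CR4 the guest asked for, force PAE when shadow paging
 * (no NPT) is in use, and always carry over the host's MCE bit. */
static unsigned long svm_effective_cr4(unsigned long guest_cr4,
                                       unsigned long host_cr4,
                                       int npt_enabled)
{
    unsigned long cr4 = guest_cr4;

    if (!npt_enabled)
        cr4 |= X86_CR4_PAE;             /* shadow paging needs PAE */
    cr4 |= host_cr4 & X86_CR4_MCE;      /* the actual fix */
    return cr4;
}
```

Without the masked-in MCE bit, a guest that clears CR4.MCE would suppress machine check delivery for the whole CPU while it runs.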
From: Avi K. <av...@qu...> - 2008-04-21 10:29:59
|
Fourth and final batch of the pending kvm updates. This one contains the ppc port in addition to x86 updates. Documentation/ioctl-number.txt | 2 + Documentation/powerpc/kvm_440.txt | 41 ++ MAINTAINERS | 7 + arch/ia64/kvm/Kconfig | 3 + arch/ia64/kvm/kvm-ia64.c | 43 ++- arch/powerpc/Kconfig | 1 + arch/powerpc/Kconfig.debug | 3 + arch/powerpc/Makefile | 1 + arch/powerpc/kernel/asm-offsets.c | 26 ++ arch/powerpc/kvm/44x_tlb.c | 224 ++++++++++ arch/powerpc/kvm/44x_tlb.h | 91 +++++ arch/powerpc/kvm/Kconfig | 43 ++ arch/powerpc/kvm/Makefile | 15 + arch/powerpc/kvm/booke_guest.c | 615 ++++++++++++++++++++++++++++ arch/powerpc/kvm/booke_host.c | 83 ++++ arch/powerpc/kvm/booke_interrupts.S | 436 ++++++++++++++++++++ arch/powerpc/kvm/emulate.c | 760 +++++++++++++++++++++++++++++++++++ arch/powerpc/kvm/powerpc.c | 436 ++++++++++++++++++++ arch/s390/kvm/Kconfig | 3 + arch/s390/kvm/interrupt.c | 5 + arch/s390/kvm/kvm-s390.c | 12 + arch/x86/kvm/Kconfig | 11 + arch/x86/kvm/Makefile | 3 + arch/x86/kvm/i8254.c | 13 +- arch/x86/kvm/irq.c | 15 + arch/x86/kvm/irq.h | 3 + arch/x86/kvm/lapic.c | 27 +- arch/x86/kvm/mmu.c | 92 ++--- arch/x86/kvm/paging_tmpl.h | 26 +- arch/x86/kvm/svm.c | 110 ++++-- arch/x86/kvm/vmx.c | 35 ++- arch/x86/kvm/x86.c | 79 ++++- arch/x86/kvm/x86_emulate.c | 33 +- include/asm-ia64/kvm_host.h | 8 +- include/asm-powerpc/kvm.h | 53 +++- include/asm-powerpc/kvm_asm.h | 55 +++ include/asm-powerpc/kvm_host.h | 152 +++++++ include/asm-powerpc/kvm_para.h | 38 ++ include/asm-powerpc/kvm_ppc.h | 88 ++++ include/asm-powerpc/mmu-44x.h | 2 + include/asm-x86/kvm.h | 20 + include/asm-x86/kvm_host.h | 29 +- include/linux/kvm.h | 71 ++++- include/linux/kvm_host.h | 31 ++ include/linux/kvm_types.h | 2 + virt/kvm/kvm_main.c | 107 +++++- virt/kvm/kvm_trace.c | 276 +++++++++++++ 47 files changed, 4065 insertions(+), 164 deletions(-) create mode 100644 Documentation/powerpc/kvm_440.txt create mode 100644 arch/powerpc/kvm/44x_tlb.c create mode 100644 arch/powerpc/kvm/44x_tlb.h create 
mode 100644 arch/powerpc/kvm/Kconfig create mode 100644 arch/powerpc/kvm/Makefile create mode 100644 arch/powerpc/kvm/booke_guest.c create mode 100644 arch/powerpc/kvm/booke_host.c create mode 100644 arch/powerpc/kvm/booke_interrupts.S create mode 100644 arch/powerpc/kvm/emulate.c create mode 100644 arch/powerpc/kvm/powerpc.c create mode 100644 include/asm-powerpc/kvm_asm.h create mode 100644 include/asm-powerpc/kvm_host.h create mode 100644 include/asm-powerpc/kvm_para.h create mode 100644 include/asm-powerpc/kvm_ppc.h create mode 100644 virt/kvm/kvm_trace.c |
From: Avi K. <av...@qu...> - 2008-04-21 10:29:58
|
From: Joerg Roedel <joe...@am...> The svm_set_cr4 function is indented with spaces. This patch replaces them with tabs. Signed-off-by: Joerg Roedel <joe...@am...> Signed-off-by: Avi Kivity <av...@qu...> --- arch/x86/kvm/svm.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index ad27346..d7439ce 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -878,10 +878,10 @@ set: static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { - vcpu->arch.cr4 = cr4; - if (!npt_enabled) - cr4 |= X86_CR4_PAE; - to_svm(vcpu)->vmcb->save.cr4 = cr4; + vcpu->arch.cr4 = cr4; + if (!npt_enabled) + cr4 |= X86_CR4_PAE; + to_svm(vcpu)->vmcb->save.cr4 = cr4; } static void svm_set_segment(struct kvm_vcpu *vcpu, -- 1.5.5 |
From: Jeremy F. <je...@go...> - 2008-04-21 09:57:21
|
Gerd Hoffmann wrote: > * Host: make kvm pv clock really compatible with xen pv clock. > * Guest/xen: factor out some xen clock code into a separate > source file (pvclock.[ch]), so kvm can reuse it. > * Guest/kvm: make kvm clock compatible with xen clock by using > the common code bits. > I guess saving on code duplication is good... > +cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src) > +{ > + struct pvclock_shadow_time *shadow = &get_cpu_var(shadow_time); > + cycle_t ret; > + > + pvclock_get_time_values(shadow, src); > + ret = shadow->system_timestamp + pvclock_get_nsec_offset(shadow); > You need to put this in a loop in case the system clock parameters change between the pvclock_get_time_values() and pvclock_get_nsec_offset(). How does kvm deal with suspend/resume with respect to time? Is the "system" timestamp guaranteed to remain monotonic? For Xen, I think we'll need to maintain an offset between the initial system timestamp and whatever it is after resuming. J |
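The retry pattern Jeremy is describing — snapshot the fields under a version number, and redo the whole read if the version was odd (update in progress) or changed mid-read — can be sketched like this. This is a simplified illustration, not the kernel implementation: field names loosely follow the pvclock layout, TSC deltas are treated as nanoseconds (the real code scales them), and the memory barriers the real code needs are only marked in comments:

```c
#include <stdint.h>

/* Simplified shapes of the structures under discussion. */
struct vcpu_time_info {
    uint32_t version;        /* odd while the hypervisor updates it */
    uint64_t tsc_timestamp;
    uint64_t system_time;    /* ns at tsc_timestamp */
};

struct shadow_time {
    uint32_t version;
    uint64_t tsc_timestamp;
    uint64_t system_timestamp;
};

/* Copy a consistent snapshot: retry while the version is odd or
 * changed during the copy.  (rmb() barriers elided.) */
static void get_time_values(struct shadow_time *dst,
                            const struct vcpu_time_info *src)
{
    do {
        dst->version = src->version;
        /* rmb() */
        dst->tsc_timestamp = src->tsc_timestamp;
        dst->system_timestamp = src->system_time;
        /* rmb() */
    } while ((src->version & 1) || dst->version != src->version);
}

/* Jeremy's point: the read path needs its own loop so the offset
 * computation stays atomic w.r.t. the snapshot parameters. */
static uint64_t clocksource_read(const struct vcpu_time_info *src,
                                 uint64_t now_tsc)
{
    struct shadow_time shadow;
    uint64_t ret;

    do {
        get_time_values(&shadow, src);
        ret = shadow.system_timestamp
            + (now_tsc - shadow.tsc_timestamp);   /* nsec offset */
    } while (shadow.version != src->version);

    return ret;
}
```

The outer loop is what closes the race: if the hypervisor republishes the parameters after the snapshot but before the offset is applied, the version comparison fails and the read is redone.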
From: Avi K. <av...@qu...> - 2008-04-21 09:20:21
|
David S. Ahern wrote: > I added the traces and captured data over another apparent lockup of the guest. > This seems to be representative of the sequence (pid/vcpu removed). > > (+4776) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c016127c ] > (+ 0) PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ] > (+3632) VMENTRY > (+4552) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c016104a ] > (+ 0) PAGE_FAULT [ errorcode = 0x0000000b, virt = 0x00000000 fffb61c8 ] > (+ 54928) VMENTRY > Can you oprofile the host to see where the 54K cycles are spent? > (+4568) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ] > (+ 0) PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ] > (+ 0) PTE_WRITE [ gpa = 0x00000000 00009db4 gpte = 0x00000000 41c5d363 ] > (+8432) VMENTRY > (+3936) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ] > (+ 0) PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ] > (+ 0) PTE_WRITE [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ] > (+ 13832) VMENTRY > > > (+5768) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c016127c ] > (+ 0) PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ] > (+3712) VMENTRY > (+4576) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c016104a ] > (+ 0) PAGE_FAULT [ errorcode = 0x0000000b, virt = 0x00000000 fffb61d0 ] > (+ 0) PTE_WRITE [ gpa = 0x00000000 3d5981d0 gpte = 0x00000000 3d55d047 ] > This indeed has the accessed bit clear. > (+ 65216) VMENTRY > (+4232) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ] > (+ 0) PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ] > (+ 0) PTE_WRITE [ gpa = 0x00000000 00009db4 gpte = 0x00000000 3d598363 ] > This has the accessed bit set and the user bit clear, and the pte pointing at the previous pte_write gpa. Looks like a kmap_atomic(). 
> (+8640) VMENTRY > (+3936) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ] > (+ 0) PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ] > (+ 0) PTE_WRITE [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ] > (+ 14160) VMENTRY > > I can forward a more complete time snippet if you'd like. vcpu0 + corresponding > vcpu1 files have 85000 total lines and compressed the files total ~500k. > > I did not see the FLOODED trace come out during this sample though I did bump > the count from 3 to 4 as you suggested. > > > Bumping the count was supposed to remove the flooding... > Correlating rip addresses to the 2.4 kernel: > > c0160d00-c0161290 = page_referenced > > It looks like the event is kscand running through the pages. I suspected this > some time ago, and tried tweaking the kscand_work_percent sysctl variable. It > appeared to lower the peak of the spikes, but maybe I imagined it. I believe > lowering that value makes kscand wake up more often but do less work (page > scanning) each time it is awakened. > > What does 'top' in the guest show (perhaps sorted by total cpu time rather than instantaneous usage)? What host kernel are you running? How many host cpus? -- error compiling committee.c: too many arguments to function |
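Avi's reading of the two gptes above can be checked mechanically by masking the standard low x86 page-table entry flag bits — a quick decoding helper for illustration, not kernel code:

```c
#include <stdint.h>

/* Standard low flag bits of an x86 page-table entry. */
#define PTE_PRESENT  (1u << 0)
#define PTE_RW       (1u << 1)
#define PTE_USER     (1u << 2)
#define PTE_ACCESSED (1u << 5)
#define PTE_DIRTY    (1u << 6)

static inline int pte_accessed(uint64_t pte)
{
    return !!(pte & PTE_ACCESSED);
}

static inline int pte_user(uint64_t pte)
{
    return !!(pte & PTE_USER);
}

/* gpte 0x3d55d047 (low bits 0x047): present, writable, user --
 * accessed bit clear, as noted above.
 * gpte 0x3d598363 (low bits 0x363): present, writable, accessed,
 * dirty -- user bit clear, which is what made it look like a
 * kernel-only kmap_atomic() mapping. */
```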
From: Tomas R. <li...@ko...> - 2008-04-21 09:10:03
|
Hello everybody After I update to KVM-66 (from 65), I have problem to boot guests with lilo installed. Boot sequence always stop with "LIL" output. With kvm-65 everythink works great. I have also windows XP guest, which boot without problem. With -no-kvm guests boot ok. Processor: AMD Opteron 2210 KVM: kvm-64 Host: gentoo-sources-2.6.25-r1 Arch: x86_64 Guests: gentoo, 2.6.25, x86_64 dmesg: Apr 21 09:53:37 kvm BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 Apr 21 09:53:37 kvm IP: [<ffffffff88029c85>] :kvm:x86_emulate_insn+0x3a47/0x468f Apr 21 09:53:37 kvm PGD 11e989067 PUD 11ec0a067 PMD 0 Apr 21 09:53:37 kvm Oops: 0002 [2] SMP Apr 21 09:53:37 kvm CPU 2 Apr 21 09:53:37 kvm Modules linked in: w83627hf hwmon_vid xt_tcpudp xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables tun kvm_amd kvm shpchp pci_hotplug k8temp i2c_nforce2 i2c_core Apr 21 09:53:37 kvm Pid: 7130, comm: kvm Tainted: G D 2.6.25-gentoo-r1 #1 Apr 21 09:53:37 kvm RIP: 0010:[<ffffffff88029c85>] [<ffffffff88029c85>] :kvm:x86_emulate_insn+0x3a47/0x468f Apr 21 09:53:37 kvm RSP: 0018:ffff81011e3e5738 EFLAGS: 00010246 Apr 21 09:53:37 kvm RAX: 0000000000000010 RBX: 0000000000000000 RCX: ffff81011e3e7378 Apr 21 09:53:37 kvm RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff81011e3e6000 Apr 21 09:53:37 kvm RBP: ffff81011e3e7378 R08: 0000000000000000 R09: 0000000000000000 Apr 21 09:53:37 kvm R10: ffffffff88041988 R11: ffff81011e3e7378 R12: ffff81011e3e7330 Apr 21 09:53:37 kvm R13: 0000000000000000 R14: ffffffff88033a20 R15: 0000000000000ce3 Apr 21 09:53:37 kvm FS: 0000000041e68950(0063) GS:ffff81011ff1b000(0000) knlGS:0000000000000000 Apr 21 09:53:37 kvm CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Apr 21 09:53:37 kvm CR2: 0000000000000000 CR3: 000000011e1af000 CR4: 00000000000006e0 Apr 21 09:53:37 kvm DR0: ffffffff805b34f8 DR1: 0000000000000000 DR2: 0000000000000000 Apr 21 09:53:37 kvm DR3: 0000000000000000 DR6: 
00000000ffff0ff1 DR7: 0000000000000701 Apr 21 09:53:37 kvm Process kvm (pid: 7130, threadinfo ffff81011e3e4000, task ffff81011e560000) Apr 21 09:53:37 kvm Stack: ffffffff8021a311 000000000000000f 00000000fffffff7 ffffffff8021a49b Apr 21 09:53:37 kvm 00000000ffffffff ffff81011ed41d00 ffffc20001926000 0000000000000000 Apr 21 09:53:37 kvm ffffffff8021a311 ffffffff802347a0 ffff81011ed41d00 ffffffff880419e0 Apr 21 09:53:37 kvm Call Trace: Apr 21 09:53:37 kvm [<ffffffff8021a311>] ? do_flush_tlb_all+0x0/0x2f Apr 21 09:53:37 kvm [<ffffffff8021a49b>] ? smp_call_function_mask+0x47/0x55 Apr 21 09:53:37 kvm [<ffffffff8021a311>] ? do_flush_tlb_all+0x0/0x2f Apr 21 09:53:37 kvm [<ffffffff802347a0>] ? on_each_cpu+0x19/0x25 Apr 21 09:53:37 kvm [<ffffffff880419e0>] Apr 21 09:53:37 kvm [<ffffffff88020501>] ? :kvm:kvm_get_cs_db_l_bits+0x9/0x2f Apr 21 09:53:37 kvm [<ffffffff8801f101>] ? :kvm:emulate_instruction+0x1ef/0x3a5 Apr 21 09:53:37 kvm [<ffffffff8801f101>] ? :kvm:emulate_instruction+0x1ef/0x3a5 Apr 21 09:53:37 kvm [<ffffffff88041fbc>] Apr 21 09:53:37 kvm [<ffffffff88020148>] ? :kvm:kvm_arch_vcpu_ioctl_run+0x44a/0x5b8 Apr 21 09:53:37 kvm [<ffffffff8801bf23>] ? :kvm:kvm_resched+0x1b4/0x9b7 Apr 21 09:53:37 kvm [<ffffffff8802ad63>] ? :kvm:kvm_pic_set_irq+0x21/0x6b Apr 21 09:53:37 kvm [<ffffffff8801e81b>] ? :kvm:kvm_arch_vm_ioctl+0x38e/0x5e6 Apr 21 09:53:37 kvm [<ffffffff8026217b>] ? zone_statistics+0x41/0x94 Apr 21 09:53:37 kvm [<ffffffff8025bc16>] ? get_page_from_freelist+0x457/0x5af Apr 21 09:53:37 kvm [<ffffffff8025bdc0>] ? __alloc_pages+0x52/0x2ee Apr 21 09:53:37 kvm [<ffffffff80225e50>] ? source_load+0x25/0x41 Apr 21 09:53:37 kvm [<ffffffff802286f1>] ? find_busiest_group+0x268/0x742 Apr 21 09:53:37 kvm [<ffffffff80225552>] ? hrtick_set+0x99/0x107 Apr 21 09:53:37 kvm [<ffffffff805b3aae>] ? thread_return+0x64/0xa5 Apr 21 09:53:37 kvm [<ffffffff80249099>] ? get_futex_key+0x76/0x14d Apr 21 09:53:37 kvm [<ffffffff80249816>] ? 
unqueue_me+0x6b/0x73 Apr 21 09:53:37 kvm [<ffffffff80249bc1>] ? futex_wait+0x290/0x327 Apr 21 09:53:37 kvm [<ffffffff80227c36>] ? try_to_wake_up+0xfa/0x10c Apr 21 09:53:37 kvm [<ffffffff80229752>] ? __wake_up_common+0x49/0x74 Apr 21 09:53:37 kvm [<ffffffff80268c29>] ? find_extend_vma+0x16/0x61 Apr 21 09:53:37 kvm [<ffffffff80249099>] ? get_futex_key+0x76/0x14d Apr 21 09:53:37 kvm [<ffffffff803c1439>] ? __up_read+0x10/0x8a Apr 21 09:53:37 kvm [<ffffffff8024955e>] ? futex_wake+0xfa/0x10c Apr 21 09:53:37 kvm [<ffffffff80242e5a>] ? ktime_get_ts+0x56/0x5d Apr 21 09:53:37 kvm [<ffffffff8801c3cb>] ? :kvm:kvm_resched+0x65c/0x9b7 Apr 21 09:53:37 kvm [<ffffffff80225552>] ? hrtick_set+0x99/0x107 Apr 21 09:53:37 kvm [<ffffffff8028a311>] ? vfs_ioctl+0x29/0x6f Apr 21 09:53:37 kvm [<ffffffff8028a5a4>] ? do_vfs_ioctl+0x24d/0x25c Apr 21 09:53:37 kvm [<ffffffff8028a5ef>] ? sys_ioctl+0x3c/0x61 Apr 21 09:53:37 kvm [<ffffffff8020b09b>] ? system_call_after_swapgs+0x7b/0x80 Apr 21 09:53:37 kvm Apr 21 09:53:37 kvm Apr 21 09:53:37 kvm Code: 02 74 20 77 06 ff c8 74 0e eb 78 83 f8 04 74 20 83 f8 08 74 27 eb 6c 48 8b 51 40 48 8b 41 30 88 02 eb 60 48 8b 51 40 48 8b 41 30 <66> 89 02 eb 53 48 8b 41 40 8b 49 30 48 89 08 eb 47 48 8b 51 40 Apr 21 09:53:37 kvm RIP [<ffffffff88029c85>] :kvm:x86_emulate_insn+0x3a47/0x468f Apr 21 09:53:37 kvm RSP <ffff81011e3e5738> Apr 21 09:53:37 kvm CR2: 0000000000000000 Apr 21 09:53:37 kvm ---[ end trace 8b01d2fbd0fdd57f ]--- -- Tomas Rusnak |
From: Gerd H. <kr...@re...> - 2008-04-21 08:34:07
|
Gerd Hoffmann wrote: > Marcelo Tosatti wrote: >> Haven't seen Gerd's guest patches ? > > I'm still busy cooking them up. I've mentioned them in a mail, but they > didn't run over the list (yet). Stay tuned ;) It compiles, ship it! This time as an all-in-one patch (both guest and host side). Almost untested and not (yet) split into pieces. Changes: * Host: make kvm pv clock really compatible with xen pv clock. * Guest/xen: factor out some xen clock code into a separate source file (pvclock.[ch]), so kvm can reuse it. * Guest/kvm: make kvm clock compatible with xen clock by using the common code bits. Tests, reviews and comments are welcome. cheers, Gerd -- http://kraxel.fedorapeople.org/xenner/
From: Yunfeng Z. <yun...@in...> - 2008-04-21 08:29:57
|
Hi All, This is today's KVM test result against kvm.git 6cf59734fc9bc89954d0157524eea156c2f9a5ab and kvm-userspace.git 43201923a67647913b67da255ca60f0269a3e34a. One Issue Fixed ================================================ 1.Can't boot smp guests on ia32e host https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1944629&group_id=180599 Three Old Issues: ================================================ 1. Booting four guests likely fails https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1919354&group_id=180599 2. booting smp windows guests has 30% chance of hang https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1910923&group_id=180599 3. Cannot boot guests with hugetlbfs https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1941302&group_id=180599 Test environment ================================================ Platform Woodcrest CPU 4 Memory size 8G' Details ================================================ IA32-pae: 1. boot guest with 256M memory PASS 2. boot two windows xp guest PASS 3. boot 4 same guest in parallel PASS 4. boot linux and windows guest in parallel PASS 5. boot guest with 1500M memory PASS 6. boot windows 2003 with ACPI enabled PASS 7. boot Windows xp with ACPI enabled PASS 8. boot Windows 2000 without ACPI PASS 9. kernel build on SMP linux guest PASS 10. LTP on linux guest PASS 11. boot base kernel linux PASS 12. save/restore 32-bit HVM guests PASS 13. live migration 32-bit HVM guests PASS 14. boot SMP Windows xp with ACPI enabled PASS 15. boot SMP Windows 2003 with ACPI enabled PASS 16. boot SMP Windows 2000 with ACPI enabled PASS ================================================ IA32e: 1. boot four 32-bit guest in parallel PASS 2. boot four 64-bit guest in parallel PASS 3. boot 4G 64-bit guest PASS 4. boot 4G pae guest PASS 5. boot 32-bit linux and 32 bit windows guest in parallel PASS 6. boot 32-bit guest with 1500M memory PASS 7. boot 64-bit guest with 1500M memory PASS 8. 
boot 32-bit guest with 256M memory PASS 9. boot 64-bit guest with 256M memory PASS 10. boot two 32-bit windows xp in parallel PASS 11. boot four 32-bit different guest in para PASS 12. save/restore 64-bit linux guests PASS 13. save/restore 32-bit linux guests PASS 14. boot 32-bit SMP windows 2003 with ACPI enabled PASS 15. boot 32-bit SMP Windows 2000 with ACPI enabled PASS 16. boot 32-bit SMP Windows xp with ACPI enabled PASS 17. boot 32-bit Windows 2000 without ACPI PASS 18. boot 64-bit Windows xp with ACPI enabled PASS 19. boot 32-bit Windows xp without ACPI PASS 20. boot 64-bit UP vista PASS 21. boot 64-bit SMP vista PASS 22. kernel build in 32-bit linux guest OS PASS 23. kernel build in 64-bit linux guest OS PASS 24. LTP on 32-bit linux guest OS PASS 25. LTP on 64-bit linux guest OS PASS 26. boot 64-bit guests with ACPI enabled PASS 27. boot 32-bit x-server PASS 28. boot 64-bit SMP windows XP with ACPI enabled PASS 29. boot 64-bit SMP windows 2003 with ACPI enabled PASS 30. live migration 64bit linux guests PASS 31. live migration 32bit linux guests PASS 32. reboot 32bit windows xp guest PASS 33. 
reboot 32bit windows xp guest PASS Report Summary on IA32-pae Summary Test Report of Last Session ===================================================================== Total Pass Fail NoResult Crash ===================================================================== control_panel 8 5 3 0 0 Restart 2 2 0 0 0 gtest 15 14 1 0 0 ===================================================================== control_panel 8 5 3 0 0 :KVM_LM_PAE_gPAE 1 0 1 0 0 :KVM_four_sguest_PAE_gPA 1 1 0 0 0 :KVM_256M_guest_PAE_gPAE 1 1 0 0 0 :KVM_linux_win_PAE_gPAE 1 1 0 0 0 :KVM_1500M_guest_PAE_gPA 1 1 0 0 0 :KVM_SR_PAE_gPAE 1 0 1 0 0 :KVM_two_winxp_PAE_gPAE 1 1 0 0 0 :KVM_4G_guest_PAE_gPAE 1 0 1 0 0 Restart 2 2 0 0 0 :GuestPAE_PAE_gPAE 1 1 0 0 0 :BootTo32pae_PAE_gPAE 1 1 0 0 0 gtest 15 14 1 0 0 :ltp_nightly_PAE_gPAE 1 1 0 0 0 :boot_up_acpi_PAE_gPAE 1 1 0 0 0 :reboot_xp_PAE_gPAE 1 1 0 0 0 :boot_up_vista_PAE_gPAE 1 0 1 0 0 :boot_up_acpi_xp_PAE_gPA 1 1 0 0 0 :boot_up_acpi_win2k3_PAE 1 1 0 0 0 :boot_base_kernel_PAE_gP 1 1 0 0 0 :boot_smp_acpi_win2k3_PA 1 1 0 0 0 :boot_smp_acpi_win2k_PAE 1 1 0 0 0 :boot_up_acpi_win2k_PAE_ 1 1 0 0 0 :boot_smp_acpi_xp_PAE_gP 1 1 0 0 0 :boot_up_noacpi_win2k_PA 1 1 0 0 0 :boot_smp_vista_PAE_gPAE 1 1 0 0 0 :bootx_PAE_gPAE 1 1 0 0 0 :kb_nightly_PAE_gPAE 1 1 0 0 0 ===================================================================== Total 25 21 4 0 0 Report Summary on IA32e Summary Test Report of Last Session ===================================================================== Total Pass Fail NoResult Crash ===================================================================== control_panel 15 15 0 0 0 Restart 3 3 0 0 0 gtest 25 25 0 0 0 ===================================================================== control_panel 15 15 0 0 0 :KVM_LM_64_g64 1 1 0 0 0 :KVM_four_sguest_64_gPAE 1 1 0 0 0 :KVM_4G_guest_64_g64 1 1 0 0 0 :KVM_four_sguest_64_g64 1 1 0 0 0 :KVM_linux_win_64_gPAE 1 1 0 0 0 :KVM_1500M_guest_64_gPAE 1 1 0 0 0 :KVM_SR_64_g64 1 1 0 0 0 :KVM_LM_64_gPAE 1 1 0 0 
0 :KVM_256M_guest_64_g64 1 1 0 0 0 :KVM_1500M_guest_64_g64 1 1 0 0 0 :KVM_4G_guest_64_gPAE 1 1 0 0 0 :KVM_SR_64_gPAE 1 1 0 0 0 :KVM_256M_guest_64_gPAE 1 1 0 0 0 :KVM_two_winxp_64_gPAE 1 1 0 0 0 :KVM_four_dguest_64_gPAE 1 1 0 0 0 Restart 3 3 0 0 0 :GuestPAE_64_gPAE 1 1 0 0 0 :BootTo64_64_gPAE 1 1 0 0 0 :Guest64_64_gPAE 1 1 0 0 0 gtest 25 25 0 0 0 :boot_up_acpi_64_gPAE 1 1 0 0 0 :boot_up_noacpi_xp_64_gP 1 1 0 0 0 :boot_smp_acpi_xp_64_g64 1 1 0 0 0 :boot_base_kernel_64_gPA 1 1 0 0 0 :boot_smp_acpi_win2k3_64 1 1 0 0 0 :boot_smp_acpi_win2k_64_ 1 1 0 0 0 :boot_base_kernel_64_g64 1 1 0 0 0 :bootx_64_gPAE 1 1 0 0 0 :kb_nightly_64_gPAE 1 1 0 0 0 :ltp_nightly_64_g64 1 1 0 0 0 :boot_up_acpi_64_g64 1 1 0 0 0 :boot_up_noacpi_win2k_64 1 1 0 0 0 :boot_smp_acpi_xp_64_gPA 1 1 0 0 0 :boot_smp_vista_64_gPAE 1 1 0 0 0 :boot_up_acpi_win2k3_64_ 1 1 0 0 0 :reboot_xp_64_gPAE 1 1 0 0 0 :bootx_64_g64 1 1 0 0 0 :boot_up_vista_64_g64 1 1 0 0 0 :boot_smp_vista_64_g64 1 1 0 0 0 :boot_up_acpi_xp_64_g64 1 1 0 0 0 :boot_up_vista_64_gPAE 1 1 0 0 0 :ltp_nightly_64_gPAE 1 1 0 0 0 :boot_smp_acpi_win2k3_64 1 1 0 0 0 :boot_up_noacpi_win2k3_6 1 1 0 0 0 :kb_nightly_64_g64 1 1 0 0 0 ===================================================================== Total 43 43 0 0 0 Best Regards, Yunfeng |
From: Gerd H. <kr...@re...> - 2008-04-21 07:31:51
|
Jeremy Fitzhardinge wrote: > Gerd Hoffmann wrote: >> I'm looking at the guest side of the issue right now, trying to identify >> common code, and while doing so noticed that xen does the >> version-check-loop in both get_time_values_from_xen(void) and >> xen_clocksource_read(void), and I can't see any obvious reason for that. >> The loop in xen_clocksource_read(void) is not needed IMHO. Can I >> drop it? > > No. The get_nsec_offset() needs to be atomic with respect to the > get_time_values() parameters. Hmm, I somehow fail to see a case where it could be non-atomic ... get_time_values() copies a consistent snapshot, thus xen_clocksource_read() doesn't race against xen updating the fields. The snapshot is in a per-cpu variable, thus it doesn't race against other guest vcpus running get_time_values() at the same time. > There could be a loopless > __get_time_values() for use in this case, but given that it almost never > loops, I don't think its worthwhile. "in this case" ??? I'm confused. There is only a single user of get_nsec_offset(), which is xen_clocksource_read() ... cheers, Gerd -- http://kraxel.fedorapeople.org/xenner/ |
From: Gerd H. <kr...@re...> - 2008-04-21 07:15:31
|
Marcelo Tosatti wrote: >> From what me and marcelo discussed, I think there's a possibility that >> it has marginally something to do with precision of clock calculation. Gerd's patches address that issue. Can somebody test this with those >> patches (both guest and host), while I'm off ? > Haven't seen Gerd's guest patches ? I'm still busy cooking them up. I've mentioned them in a mail, but they didn't run over the list (yet). Stay tuned ;) cheers, Gerd -- http://kraxel.fedorapeople.org/xenner/
From: Soren H. <so...@ub...> - 2008-04-21 07:08:15
|
Esteemed kvm developers! I've been trying to debug this bug https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/219165 It originally revealed itself by failing to run grub (which is a 32 bit binary) when installing Ubuntu from our live cd. It turned out to be a more general problem of 32 bit binaries failing to run. The server install worked like a charm. I eventually discovered that loading the vmmouse driver triggered it and narrowed it down to the call to kvm_load_registers in vmport_ioport_read. We're releasing on Thursday, and I needed a quick fix, so I reverted the calls to kvm_{save,load}_registers in vmport_ioport_read to the old code that simply saved the eax, ebx, ecx, edx, esi, and edi registers, but I'm supposing kvm_{load,save}_registers really should work here. I dug a bit further into the code and tried disabling various pieces of the kvm_load_registers until it finally worked again. The problem seems to only arise when the lstar msr is loaded. I've looked at the code, but seeing as three days ago I didn't know there was such a thing as an lstar msr, I'm finding myself getting stuck. :) Any pointers in the right direction would be lovely. -- Soren Hansen | Virtualisation specialist | Ubuntu Server Team Canonical Ltd. | http://www.ubuntu.com/ |
From: Avi K. <av...@qu...> - 2008-04-21 06:44:36
|
Javier Guerra Giraldez wrote: > On Sunday 20 April 2008, Avi Kivity wrote: > >> Also, I'd presume that those that need 10K IOPS and above will not place >> their high throughput images on a filesystem; rather on a separate SAN LUN. >> > > i think that too; but still that LUN would be accessed by the VM's via one of > these IO emulation layers, right? > > Yes. Hopefully Linux aio. > or maybe you're advocating using the SAN initiator in the VM instead of the > host? > That works too, especially for iSCSI, but that's not what I'm advocating. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. |
From: Avi K. <av...@qu...> - 2008-04-21 06:43:08
|
Jamie Lokier wrote:
> Avi Kivity wrote:
>>> Does that mean "for the majority of deployments, the slow version is
>>> sufficient. The few that care about performance can use Linux AIO?"
>>
>> In essence, yes. s/slow/slower/ and s/performance/ultimate block device
>> performance/.
>>
>> Many deployments don't care at all about block device performance; they
>> care mostly about networking performance.
>
> That's interesting. I'd have expected block device performance to be
> important for most things, for the same reason that disk performance
> is (well, reasonably) important for non-virtual machines.

Seek time is important. Bandwidth is somewhat important. But for one- and two-spindle workloads (the majority), the cpu utilization induced by getting requests to the disk is not important, and that's what we're optimizing here.

Disks work at around 300 Hz; processors at around 3 GHz. That's seven orders of magnitude of difference. Even if you spent 100 usec calculating the next best seek, and it saved you only 10% of seeks, it would be a win. And of course modern processors spend at most a few microseconds getting a request out. You really need 50+ disks, or a large write-back cache, before micro-optimizations in the submission path are felt.

> But as you say next:
>
>>> I'm under the impression that the entire and only point of Linux AIO
>>> is that it's faster than POSIX AIO on Linux.
>>
>> It is. I estimate posix aio adds a few microseconds above linux aio per
>> I/O request when using O_DIRECT. Assuming 10 microseconds, you would
>> need 10,000 I/O requests per second per vcpu to see a 10% performance
>> difference. That's definitely rare.
>
> Oh, I didn't realise the difference was so small.
>
> At such a tiny difference, I'm wondering why Linux AIO exists at all,
> as it complicates the kernel rather a lot. I can see the theoretical
> appeal, but if performance is so marginal, I'm surprised it's in
> there.
Linux aio exists, but that's all that can be said for it. It works mostly for raw disks, doesn't integrate with networking, and doesn't advance at the same pace as the rest of the kernel. I believe only databases use it (and a userspace filesystem I wrote some time ago).

> I'm also surprised the Glibc implementation of AIO using ordinary
> threads is so close to it.

Why are you surprised? Actually, the glibc implementation could be improved from what I've heard. My estimates are for a thread-pool implementation, and there is no reason why glibc couldn't achieve exactly the same performance.

> And then, I'm wondering why use AIO at all: it suggests QEMU would run
> about as fast doing synchronous I/O in a few dedicated I/O threads.

Posix aio is the unix API for this; why not use it?

>> Also, I'd presume that those that need 10K IOPS and above will not place
>> their high-throughput images on a filesystem; rather on a separate SAN LUN.
>
> Does the separate LUN make any difference? I thought O_DIRECT on a
> filesystem was meant to be pretty close to block device performance.

On a good extent-based filesystem like XFS you will get good performance (though with more cpu overhead, due to needing to go through additional mapping layers). Old clunkers like ext3 will require additional seeks or a ton of cache (1 GB per 1 TB).

> I base this on messages here and there which say swapping to a file is
> about as fast as swapping to a block device, nowadays.

Swapping to a file preloads the block mapping into memory, so the filesystem is not involved at all in the I/O path.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
|
From: Avi K. <av...@qu...> - 2008-04-21 06:13:55
|
Marcelo Tosatti wrote:
> On Sun, Apr 20, 2008 at 02:16:52PM +0300, Avi Kivity wrote:
>>> The iperf numbers are pretty good. Performance of UP guests increases
>>> slightly, but SMP is quite significant.
>>
>> I expect you're seeing contention induced by memcpy()s and inefficient
>> emulation. With the dma api, I expect the benefit will drop.
>
> You still have to memcpy() with the dma api. Even with vringfd the
> kernel->user copy has to be performed under the global mutex protection,
> the difference being that several packets can be copied per syscall
> instead of only one.

Block does the copy outside the mutex protection, so net can be adapted to do the same. It does mean we will need to block all I/O temporarily during memory hotplug.

>> For pure cpu emulation, there is a ton of work to be done: protecting
>> the translator as well as making the translated code smp safe.
>
> I now believe there is a lot of work (which was not clear before). I'm
> not particularly interested in getting real emulation to be
> multithreaded.
>
> Anyway, the lack of multithreading in qemu emulation should not be a
> blocker for these patches to get in, since these are infrastructural
> changes.

Getting this into qemu upstream is essential, as this is far more intrusive than anything else we've done. But again, I believe there are many other fruit hanging from lower branches.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
|