From: Avi K. <av...@qu...> - 2008-04-30 13:39:55
Hollis Blanchard wrote:
> Acked-by: Hollis Blanchard <ho...@us...>
>
> Avi, please apply for 2.6.26.
>

Sure thing.  Thanks.

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Jan K. <jan...@si...> - 2008-04-30 13:39:52
This looks bogus, but it is so far without practical impact (phys_start
is always 0 when we do the calculation).

Signed-off-by: Jan Kiszka <jan...@si...>

---
 libkvm/libkvm.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: b/libkvm/libkvm.c
===================================================================
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -550,7 +550,7 @@ int kvm_register_userspace_phys_mem(kvm_
 	int r;
 
 	if (!kvm->physical_memory)
-		kvm->physical_memory = userspace_addr - phys_start;
+		kvm->physical_memory = userspace_addr + phys_start;
 
 	memory.slot = get_free_slot(kvm);
 	r = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &memory);
From: Avi K. <av...@qu...> - 2008-04-30 13:29:51
|
Jan Kiszka wrote: >>> Clear the pending original exception when raising a triple fault. This >>> allows to re-use the vcpu instance, e.g. after a reset which is >>> typically issued as reaction on the triple fault. >>> >>> Signed-off-by: Jan Kiszka <jan...@si...> >>> >>> --- >>> arch/x86/kvm/x86.c | 4 +++- >>> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >>> Index: b/arch/x86/kvm/x86.c >>> =================================================================== >>> --- a/arch/x86/kvm/x86.c >>> +++ b/arch/x86/kvm/x86.c >>> @@ -149,8 +149,10 @@ static void handle_multiple_faults(struc >>> if (vcpu->arch.exception.nr != DF_VECTOR) { >>> vcpu->arch.exception.nr = DF_VECTOR; >>> vcpu->arch.exception.error_code = 0; >>> - } else >>> + } else { >>> set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); >>> + vcpu->arch.exception.pending = false; >>> + } >>> } >>> >>> void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr) >>> >>> >>> >> There's a bigger problem here. The exception queue is hidden state that >> qemu and load and save. >> > > Could you elaborate a bit on what the problematic scenario precisely is > (that pending triple faults would not be saved/restored while pending > exceptions are?), and if I/we can do anything to resolve it? > Two scenarios: savevm (no pending exception) guest runs... loadvm (with a pending exception in the current state) spurious exception injected savevm (pending exception, lost) new qemu instance (or live migration) loadvm (exception not delivered) The second scenario is not too bad, I guess: for fault-like exceptions, the first instruction would fault again and the exception would be regenerated. The first scenario is bad, but I guess very unlikely. One fix would be to expose the exception queue to userspace. I don't like it since this is not x86 architectural state but a kvm artifact. Maybe we should clear the exception queue on kvm_set_sregs() (that should fix the reset case as well). -- Any sufficiently difficult bug is indistinguishable from a feature. |
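
(To make the kvm_set_sregs() idea Avi floats above concrete: a minimal
sketch of what such a change could look like, assuming the x86
kvm_arch_vcpu_ioctl_set_sregs() entry point of that era.  This is an
illustration only, not a patch from this thread.)

int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
				  struct kvm_sregs *sregs)
{
	vcpu_load(vcpu);

	/* ... existing code copying segment and control registers
	 *     from *sregs into vcpu->arch ... */

	/*
	 * Userspace rewriting the special registers usually means a
	 * reset: drop whatever is still sitting in the hidden
	 * exception queue so no stale fault gets injected afterwards.
	 */
	vcpu->arch.exception.pending = false;

	vcpu_put(vcpu);
	return 0;
}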
From: Jan K. <jan...@si...> - 2008-04-30 13:12:45
|
Joerg Roedel wrote: > The current KVM x86 exception code handles double and triple faults only for > page fault exceptions. This patch extends this detection for every exception > that gets queued for the guest. > > Signed-off-by: Joerg Roedel <joe...@am...> > Cc: Jan Kiszka <jan...@si...> > --- > arch/x86/kvm/x86.c | 31 +++++++++++++++++-------------- > 1 files changed, 17 insertions(+), 14 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 578a0c1..c05aa32 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -144,9 +144,21 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data) > } > EXPORT_SYMBOL_GPL(kvm_set_apic_base); > > +static void handle_multiple_faults(struct kvm_vcpu *vcpu) > +{ > + if (vcpu->arch.exception.nr != DF_VECTOR) { > + vcpu->arch.exception.nr = DF_VECTOR; > + vcpu->arch.exception.error_code = 0; > + } else > + set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); > +} > + > void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr) > { > - WARN_ON(vcpu->arch.exception.pending); > + if (vcpu->arch.exception.pending) { > + handle_multiple_faults(vcpu); > + return; > + } > vcpu->arch.exception.pending = true; > vcpu->arch.exception.has_error_code = false; > vcpu->arch.exception.nr = nr; > @@ -157,25 +169,16 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr, > u32 error_code) > { > ++vcpu->stat.pf_guest; > - if (vcpu->arch.exception.pending) { > - if (vcpu->arch.exception.nr == PF_VECTOR) { > - printk(KERN_DEBUG "kvm: inject_page_fault:" > - " double fault 0x%lx\n", addr); > - vcpu->arch.exception.nr = DF_VECTOR; > - vcpu->arch.exception.error_code = 0; > - } else if (vcpu->arch.exception.nr == DF_VECTOR) { > - /* triple fault -> shutdown */ > - set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); > - } > - return; > - } > vcpu->arch.cr2 = addr; > kvm_queue_exception_e(vcpu, PF_VECTOR, error_code); > } > > void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code) > { > - WARN_ON(vcpu->arch.exception.pending); > + if (vcpu->arch.exception.pending) { > + handle_multiple_faults(vcpu); > + return; > + } > vcpu->arch.exception.pending = true; > vcpu->arch.exception.has_error_code = true; > vcpu->arch.exception.nr = nr; And here is an add-on patch to fix reset-on-triple-fault: Clear the pending original exception when raising a triple fault. This allows to re-use the vcpu instance, e.g. after a reset which is typically issued as reaction on the triple fault. Signed-off-by: Jan Kiszka <jan...@si...> --- arch/x86/kvm/x86.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) Index: b/arch/x86/kvm/x86.c =================================================================== --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -149,8 +149,10 @@ static void handle_multiple_faults(struc if (vcpu->arch.exception.nr != DF_VECTOR) { vcpu->arch.exception.nr = DF_VECTOR; vcpu->arch.exception.error_code = 0; - } else + } else { set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); + vcpu->arch.exception.pending = false; + } } void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr) |
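
(Context for readers following the KVM_REQ_TRIPLE_FAULT request used
above: roughly how the vcpu run loop of that era consumed it, simplified
and reproduced from memory rather than taken from this thread.  The
request becomes a KVM_EXIT_SHUTDOWN exit, so userspace (QEMU) can reset
or stop the guest.)

	/* in arch/x86/kvm/x86.c:__vcpu_run(), before entering the guest */
	if (vcpu->requests) {
		if (test_and_clear_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests)) {
			kvm_run->exit_reason = KVM_EXIT_SHUTDOWN;
			r = 0;
			goto out;	/* return to userspace */
		}
	}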
From: Jan K. <jan...@si...> - 2008-04-30 13:08:51
Minor cleanup I came across while reverting printf instrumentations.

Signed-off-by: Jan Kiszka <jan...@si...>

---
 libkvm/libkvm-x86.c |    4 ++--
 libkvm/libkvm.c     |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: b/libkvm/libkvm-x86.c
===================================================================
--- a/libkvm/libkvm-x86.c
+++ b/libkvm/libkvm-x86.c
@@ -117,7 +117,7 @@ static int kvm_init_tss(kvm_context_t kv
 	 */
 	r = kvm_set_tss_addr(kvm, 0xfffbd000);
 	if (r < 0) {
-		printf("kvm_init_tss: unable to set tss addr\n");
+		fprintf(stderr, "kvm_init_tss: unable to set tss addr\n");
 		return r;
 	}
 
@@ -157,7 +157,7 @@ int kvm_create_pit(kvm_context_t kvm)
 		if (r >= 0)
 			kvm->pit_in_kernel = 1;
 		else {
-			printf("Create kernel PIC irqchip failed\n");
+			fprintf(stderr, "Create kernel PIC irqchip failed\n");
 			return r;
 		}
 	}
Index: b/libkvm/libkvm.c
===================================================================
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -368,7 +368,7 @@ void kvm_create_irqchip(kvm_context_t kv
 		if (r >= 0)
 			kvm->irqchip_in_kernel = 1;
 		else
-			printf("Create kernel PIC irqchip failed\n");
+			fprintf(stderr, "Create kernel PIC irqchip failed\n");
 	}
 }
 #endif
@@ -877,7 +877,7 @@ again:
 	if (r == -1 && errno != EINTR && errno != EAGAIN) {
 		r = -errno;
 		post_kvm_run(kvm, vcpu);
-		printf("kvm_run: %s\n", strerror(-r));
+		fprintf(stderr, "kvm_run: %s\n", strerror(-r));
 		return r;
 	}
From: Fabian D. <fab...@gm...> - 2008-04-30 13:06:59
Avi Kivity wrote:
> Fabian Deutsch wrote:
> > Hey.
> >
> > I've been trying Microsoft Windows 2003 a couple of times. The wiki
> > tells me that "everything" should work okay. It does, when using -smp 1,
> > but gets ugly when using -smp 2 or so.
> >
> > SO might it be useful, to add the column "smp" to the "Guest Support
> > Status" Page in the wiki?
> >
>
> SMP Windows work best if you have FlexPriority on your hardware. What
> host cpu are you using?

In general I am not able to install Microsoft Windows guests when using
-smp > 1 on the following hardware (and kvm modules+userspace head):

Intel(R) Xeon(R) CPU X3210 @ 2.13GHz
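
(Aside: one rough way to check for FlexPriority — the "virtualize APIC
accesses" secondary VMX control — from userspace.  This assumes a Linux
host with the msr module loaded and root access; it is only an
illustration, not a tool referenced anywhere in this thread.)

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define IA32_VMX_PROCBASED_CTLS2 0x48b

int main(void)
{
	uint64_t val;
	int fd = open("/dev/cpu/0/msr", O_RDONLY);

	if (fd < 0 ||
	    pread(fd, &val, sizeof(val), IA32_VMX_PROCBASED_CTLS2) != sizeof(val)) {
		perror("rdmsr");  /* CPU may not expose secondary VMX controls */
		return 1;
	}
	/* Bits 63:32 are the allowed-1 settings; bit 0 of the secondary
	   controls is "virtualize APIC accesses" (FlexPriority). */
	printf("FlexPriority %savailable\n", (val >> 32) & 1 ? "" : "not ");
	return 0;
}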
From: Jan K. <jan...@si...> - 2008-04-30 13:03:40
|
Avi Kivity wrote: > Jan Kiszka wrote: >> Joerg Roedel wrote: >> >>> The current KVM x86 exception code handles double and triple faults >>> only for >>> page fault exceptions. This patch extends this detection for every >>> exception >>> that gets queued for the guest. >>> >>> Signed-off-by: Joerg Roedel <joe...@am...> >>> Cc: Jan Kiszka <jan...@si...> >>> --- >>> arch/x86/kvm/x86.c | 31 +++++++++++++++++-------------- >>> 1 files changed, 17 insertions(+), 14 deletions(-) >>> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>> index 578a0c1..c05aa32 100644 >>> --- a/arch/x86/kvm/x86.c >>> +++ b/arch/x86/kvm/x86.c >>> @@ -144,9 +144,21 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, >>> u64 data) >>> } >>> EXPORT_SYMBOL_GPL(kvm_set_apic_base); >>> >>> +static void handle_multiple_faults(struct kvm_vcpu *vcpu) >>> +{ >>> + if (vcpu->arch.exception.nr != DF_VECTOR) { >>> + vcpu->arch.exception.nr = DF_VECTOR; >>> + vcpu->arch.exception.error_code = 0; >>> + } else >>> + set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); >>> +} >>> + >>> void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr) >>> { >>> - WARN_ON(vcpu->arch.exception.pending); >>> + if (vcpu->arch.exception.pending) { >>> + handle_multiple_faults(vcpu); >>> + return; >>> + } >>> vcpu->arch.exception.pending = true; >>> vcpu->arch.exception.has_error_code = false; >>> vcpu->arch.exception.nr = nr; >>> @@ -157,25 +169,16 @@ void kvm_inject_page_fault(struct kvm_vcpu >>> *vcpu, unsigned long addr, >>> u32 error_code) >>> { >>> ++vcpu->stat.pf_guest; >>> - if (vcpu->arch.exception.pending) { >>> - if (vcpu->arch.exception.nr == PF_VECTOR) { >>> - printk(KERN_DEBUG "kvm: inject_page_fault:" >>> - " double fault 0x%lx\n", addr); >>> - vcpu->arch.exception.nr = DF_VECTOR; >>> - vcpu->arch.exception.error_code = 0; >>> - } else if (vcpu->arch.exception.nr == DF_VECTOR) { >>> - /* triple fault -> shutdown */ >>> - set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); >>> - } >>> - return; >>> - } >>> vcpu->arch.cr2 = addr; >>> kvm_queue_exception_e(vcpu, PF_VECTOR, error_code); >>> } >>> >>> void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 >>> error_code) >>> { >>> - WARN_ON(vcpu->arch.exception.pending); >>> + if (vcpu->arch.exception.pending) { >>> + handle_multiple_faults(vcpu); >>> + return; >>> + } >>> vcpu->arch.exception.pending = true; >>> vcpu->arch.exception.has_error_code = true; >>> vcpu->arch.exception.nr = nr; >>> >> >> And here is an add-on patch to fix reset-on-triple-fault: >> >> >> Clear the pending original exception when raising a triple fault. This >> allows to re-use the vcpu instance, e.g. after a reset which is >> typically issued as reaction on the triple fault. >> >> Signed-off-by: Jan Kiszka <jan...@si...> >> >> --- >> arch/x86/kvm/x86.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> Index: b/arch/x86/kvm/x86.c >> =================================================================== >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -149,8 +149,10 @@ static void handle_multiple_faults(struc >> if (vcpu->arch.exception.nr != DF_VECTOR) { >> vcpu->arch.exception.nr = DF_VECTOR; >> vcpu->arch.exception.error_code = 0; >> - } else >> + } else { >> set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); >> + vcpu->arch.exception.pending = false; >> + } >> } >> >> void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr) >> >> > > There's a bigger problem here. The exception queue is hidden state that > qemu and load and save. 
Could you elaborate a bit on what the problematic scenario precisely is
(that pending triple faults would not be saved/restored while pending
exceptions are?), and if I/we can do anything to resolve it?

Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux |
From: Avi K. <av...@qu...> - 2008-04-30 13:02:11
In about a week, the various kvm lists will move to vger.kernel.org.
This will improve responsiveness, and reduce spam and advertising.

Please subscribe to the lists you are interested in as soon as possible.
You can subscribe by sending an email to maj...@vg..., with the
following lines in the body:

  subscribe kvm
  subscribe kvm-commits
  subscribe kvm-ia64
  subscribe kvm-ppc

Of course, omit the lines for the lists you are not interested in.
Majordomo will then send further instructions.

Thanks to the vger admins for hosting the kvm lists.

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Avi K. <av...@qu...> - 2008-04-30 13:02:11
David S. Ahern wrote:
> Another tidbit for you guys as I make my way through various permutations:
> I installed the RHEL3 hugemem kernel and the guest behavior is *much* better.
> System time still has some regular hiccups that are higher than xen and esx
> (e.g., 1 minute samples out of 5 show system time between 10 and 15%), but
> overall guest behavior is good with the hugemem kernel.
>

Wait, the amount of info here is overwhelming.  Let's stick with the
current kernel (32-bit, HIGHMEM4G, right?)

Did you get any traces with bypass_guest_pf=0?  That may show more info.

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Jiang, Y. <yun...@in...> - 2008-04-30 12:58:05
I noticed there is a Windows PV driver based on virtio at
http://sourceforge.net/project/showfiles.php?group_id=180599
but when I enable the driver in the guest, the guest hangs.

I'm using a changeset from around April 18. Since the driver was created
in March, I assume the April changeset should be ok.

Are there any special actions needed to enable the PV driver in Windows?
Has anyone tried it recently?

--
Yunhong Jiang
From: Joerg R. <joe...@am...> - 2008-04-30 12:57:53
|
On Wed, Apr 30, 2008 at 10:45:12AM +0200, Jan Kiszka wrote: > Joerg Roedel wrote: > > The current KVM x86 exception code handles double and triple faults only for > > page fault exceptions. This patch extends this detection for every exception > > that gets queued for the guest. > > > > Signed-off-by: Joerg Roedel <joe...@am...> > > Cc: Jan Kiszka <jan...@si...> > > --- > > arch/x86/kvm/x86.c | 31 +++++++++++++++++-------------- > > 1 files changed, 17 insertions(+), 14 deletions(-) > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index 578a0c1..c05aa32 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -144,9 +144,21 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data) > > } > > EXPORT_SYMBOL_GPL(kvm_set_apic_base); > > > > +static void handle_multiple_faults(struct kvm_vcpu *vcpu) > > +{ > > + if (vcpu->arch.exception.nr != DF_VECTOR) { > > + vcpu->arch.exception.nr = DF_VECTOR; > > + vcpu->arch.exception.error_code = 0; > > + } else > > + set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); > > +} > > + > > void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr) > > { > > - WARN_ON(vcpu->arch.exception.pending); > > + if (vcpu->arch.exception.pending) { > > + handle_multiple_faults(vcpu); > > + return; > > + } > > vcpu->arch.exception.pending = true; > > vcpu->arch.exception.has_error_code = false; > > vcpu->arch.exception.nr = nr; > > @@ -157,25 +169,16 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr, > > u32 error_code) > > { > > ++vcpu->stat.pf_guest; > > - if (vcpu->arch.exception.pending) { > > - if (vcpu->arch.exception.nr == PF_VECTOR) { > > - printk(KERN_DEBUG "kvm: inject_page_fault:" > > - " double fault 0x%lx\n", addr); > > - vcpu->arch.exception.nr = DF_VECTOR; > > - vcpu->arch.exception.error_code = 0; > > - } else if (vcpu->arch.exception.nr == DF_VECTOR) { > > - /* triple fault -> shutdown */ > > - set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); > > - } > > - return; > > - } > > vcpu->arch.cr2 = addr; > > kvm_queue_exception_e(vcpu, PF_VECTOR, error_code); > > } > > > > void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code) > > { > > - WARN_ON(vcpu->arch.exception.pending); > > + if (vcpu->arch.exception.pending) { > > + handle_multiple_faults(vcpu); > > + return; > > + } > > vcpu->arch.exception.pending = true; > > vcpu->arch.exception.has_error_code = true; > > vcpu->arch.exception.nr = nr; > > And here is an add-on patch to fix reset-on-triple-fault: > > > Clear the pending original exception when raising a triple fault. This > allows to re-use the vcpu instance, e.g. after a reset which is > typically issued as reaction on the triple fault. > > Signed-off-by: Jan Kiszka <jan...@si...> > > --- > arch/x86/kvm/x86.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > Index: b/arch/x86/kvm/x86.c > =================================================================== > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -149,8 +149,10 @@ static void handle_multiple_faults(struc > if (vcpu->arch.exception.nr != DF_VECTOR) { > vcpu->arch.exception.nr = DF_VECTOR; > vcpu->arch.exception.error_code = 0; > - } else > + } else { > set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); > + vcpu->arch.exception.pending = false; > + } > } > > void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr) Ah, indeed. Thanks. -- | AMD Saxony Limited Liability Company & Co. KG Operating | Wilschdorfer Landstr. 
101, 01109 Dresden, Germany System | Register Court Dresden: HRA 4896 Research | General Partner authorized to represent: Center | AMD Saxony LLC (Wilmington, Delaware, US) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy |
From: Avi K. <av...@qu...> - 2008-04-30 12:57:10
|
Jan Kiszka wrote: > Joerg Roedel wrote: > >> The current KVM x86 exception code handles double and triple faults only for >> page fault exceptions. This patch extends this detection for every exception >> that gets queued for the guest. >> >> Signed-off-by: Joerg Roedel <joe...@am...> >> Cc: Jan Kiszka <jan...@si...> >> --- >> arch/x86/kvm/x86.c | 31 +++++++++++++++++-------------- >> 1 files changed, 17 insertions(+), 14 deletions(-) >> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> index 578a0c1..c05aa32 100644 >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -144,9 +144,21 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data) >> } >> EXPORT_SYMBOL_GPL(kvm_set_apic_base); >> >> +static void handle_multiple_faults(struct kvm_vcpu *vcpu) >> +{ >> + if (vcpu->arch.exception.nr != DF_VECTOR) { >> + vcpu->arch.exception.nr = DF_VECTOR; >> + vcpu->arch.exception.error_code = 0; >> + } else >> + set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); >> +} >> + >> void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr) >> { >> - WARN_ON(vcpu->arch.exception.pending); >> + if (vcpu->arch.exception.pending) { >> + handle_multiple_faults(vcpu); >> + return; >> + } >> vcpu->arch.exception.pending = true; >> vcpu->arch.exception.has_error_code = false; >> vcpu->arch.exception.nr = nr; >> @@ -157,25 +169,16 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr, >> u32 error_code) >> { >> ++vcpu->stat.pf_guest; >> - if (vcpu->arch.exception.pending) { >> - if (vcpu->arch.exception.nr == PF_VECTOR) { >> - printk(KERN_DEBUG "kvm: inject_page_fault:" >> - " double fault 0x%lx\n", addr); >> - vcpu->arch.exception.nr = DF_VECTOR; >> - vcpu->arch.exception.error_code = 0; >> - } else if (vcpu->arch.exception.nr == DF_VECTOR) { >> - /* triple fault -> shutdown */ >> - set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); >> - } >> - return; >> - } >> vcpu->arch.cr2 = addr; >> kvm_queue_exception_e(vcpu, PF_VECTOR, error_code); >> } >> >> void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code) >> { >> - WARN_ON(vcpu->arch.exception.pending); >> + if (vcpu->arch.exception.pending) { >> + handle_multiple_faults(vcpu); >> + return; >> + } >> vcpu->arch.exception.pending = true; >> vcpu->arch.exception.has_error_code = true; >> vcpu->arch.exception.nr = nr; >> > > And here is an add-on patch to fix reset-on-triple-fault: > > > Clear the pending original exception when raising a triple fault. This > allows to re-use the vcpu instance, e.g. after a reset which is > typically issued as reaction on the triple fault. > > Signed-off-by: Jan Kiszka <jan...@si...> > > --- > arch/x86/kvm/x86.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > Index: b/arch/x86/kvm/x86.c > =================================================================== > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -149,8 +149,10 @@ static void handle_multiple_faults(struc > if (vcpu->arch.exception.nr != DF_VECTOR) { > vcpu->arch.exception.nr = DF_VECTOR; > vcpu->arch.exception.error_code = 0; > - } else > + } else { > set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); > + vcpu->arch.exception.pending = false; > + } > } > > void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr) > > There's a bigger problem here. The exception queue is hidden state that qemu and load and save. -- Any sufficiently difficult bug is indistinguishable from a feature. |
From: Avi K. <av...@qu...> - 2008-04-30 12:57:07
David Miller wrote:
> I've created (and tested) all of these lists.
>

Thanks.  In about a week I'll make the sourceforge lists read-only.

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Avi K. <av...@qu...> - 2008-04-30 12:57:06
Muli Ben-Yehuda wrote:
>> @@ -544,19 +545,35 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
>>  	npages = get_user_pages(current, current->mm, addr, 1, 1, 1, page,
>>  				NULL);
>>
>> -	if (npages != 1) {
>> -		get_page(bad_page);
>> -		return page_to_pfn(bad_page);
>> -	}
>> +	if (unlikely(npages != 1)) {
>> +		struct vm_area_struct *vma;
>>
>> -	return page_to_pfn(page[0]);
>> +		vma = find_vma(current->mm, addr);
>> +		if (vma == NULL || addr >= vma->vm_start ||
>> +		    !(vma->vm_flags & VM_PFNMAP)) {
>>
>
> Isn't the check for addr backwards here? For the VMA we would like to
> to find, vma->vm_start <= addr < vma->vm_end.
>

The code is not trying to find a vma for the address, but a vma for the
address which also has VM_PFNMAP set.  The cases for vma not found, or
vma found, but not VM_PFNMAP, are folded together.

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Ingo M. <mi...@el...> - 2008-04-30 12:56:58
* Christian Borntraeger <bor...@de...> wrote:

> While it is not a typical case, is there a better way of specifying
> multiple authors to avoid future confusion?

i think the established rule is that there's one Author field per
commit. Multiple authors should either submit a tree with multiple
commits (which shows the exact lineage of work) - or, for nontrivial
joint work where the development tree would be way too messy, expose
proper credits in copyrights/credit info in the source code.

It's seldom that work is split exactly in half - better spell out who
did what both in the source code and in the commit log - without trying
to formalize the From/Author line. [which line will always be imprecise
for multiple authors.]

	Ingo
From: SourceForge.net <no...@so...> - 2008-04-30 12:53:28
Bugs item #1953353, was opened at 2008-04-28 13:50
Message generated for change (Comment added) made by ravpl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1953353&group_id=180599

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request, not
just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Rafal Wijata (ravpl)
Assigned to: Nobody/Anonymous (nobody)
Summary: could not load PC BIOS '/path/to/bios.bin' on "-m 4096"

Initial Comment:
The maximum amount of memory I can give to kvm is ~3560M.
I run custom compiled kvm-66 on an F8 box with
Linux mailhub 2.6.24.4-64.fc8 #1 SMP Sat Mar 29 09:15:49 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
The modules are loaded from the F8 kernel, rather than those shipped with kvm-66.

----------------------------------------------------------------------

>Comment By: Rafal Wijata (ravpl)
Date: 2008-04-30 14:53

Message:
Logged In: YES
user_id=996150
Originator: YES

Indeed, I grabbed kvm-67, recompiled, and loaded the modules that come
with kvm (kvm-intel). After that I could give the guest even 6G of RAM.
And BTW, after I loaded those modules I was able to assign more than 4
CPUs to the guest as well (I remember there's such a bug here).
Thanks for the prompt reply.

----------------------------------------------------------------------

Comment By: Marcelo Tosatti (mtosatti)
Date: 2008-04-30 03:34

Message:
Logged In: YES
user_id=2022487
Originator: NO

Can you reproduce the problem with the modules shipped with kvm-66?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1953353&group_id=180599
From: Avi K. <av...@qu...> - 2008-04-30 12:52:24
Christian Borntraeger wrote:
> Am Sonntag, 27. April 2008 schrieb Avi Kivity:
>> Carsten Otte (4):
>>       s390: KVM preparation: provide hook to enable pgstes in user pagetable
>>       KVM: s390: interrupt subsystem, cpu timer, waitpsw
>>       KVM: s390: API documentation
>>       s390: KVM guest: detect when running on kvm
>>
>> Christian Borntraeger (10):
>>       KVM: kvm.h: __user requires compiler.h
>>       s390: KVM preparation: host memory management changes for s390 kvm
>>       s390: KVM preparation: address of the 64bit extint parm in lowcore
>>       KVM: s390: sie intercept handling
>>       KVM: s390: intercepts for privileged instructions
>>       KVM: s390: interprocessor communication via sigp
>>       KVM: s390: intercepts for diagnose instructions
>>       KVM: s390: add kvm to kconfig on s390
>>       KVM: s390: update maintainers
>>       s390: KVM guest: virtio device support, and kvm hypercalls
>>
>
> Thats interesting, some of these patches should actually be credited to
> Carsten - and in fact on kvm.git master they are credited to Carsten.
>
> I think the problem is, that these patches contained multiple From lines. On
> kvm.git the first line (Carsten) was used. When you transferred these patches
> to the kvm.git-2.6.26-branch, git used the next From-line as the original one
> was already removed.
>
> While it is not a typical case, is there a better way of specifying multiple
> authors to avoid future confusion?

It's probably due to my heavy use of git cherry-pick, rebase, and rebase
-i.  I couldn't reproduce this with a test that mimics that workflow, so
either it has been fixed already, or it's a little more subtle.

I don't think you should change anything to avoid this.  I'll keep an
eye open for this, and if it happens again I'll fix it locally and send
a proper bug report to the git mailing list.

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Avi K. <av...@qu...> - 2008-04-30 12:52:13
Andrea Arcangeli wrote:
> On Wed, Apr 30, 2008 at 11:59:47AM +0300, Avi Kivity wrote:
>
>> The code is not trying to find a vma for the address, but a vma for the
>> address which also has VM_PFNMAP set. The cases for vma not found, or vma
>> found, but not VM_PFNMAP, are folded together.
>>
>
> Muli's saying the comparison is reversed, change >= to <.
>

Err, yes.

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Andrea A. <an...@qu...> - 2008-04-30 12:52:02
On Wed, Apr 30, 2008 at 11:59:47AM +0300, Avi Kivity wrote:
> The code is not trying to find a vma for the address, but a vma for the
> address which also has VM_PFNMAP set. The cases for vma not found, or vma
> found, but not VM_PFNMAP, are folded together.

Muli's saying the comparison is reversed, change >= to <.
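
(To make the fix being discussed concrete: a sketch of the corrected
test, per Muli's and Andrea's comments — the bad_page path should only
be taken when no vma covers the address or the vma is not a VM_PFNMAP
mapping.  Whether this matches the exact code eventually committed is
not shown in this thread.)

		vma = find_vma(current->mm, addr);
		/* find_vma() returns the first vma with vm_end > addr,
		   so addr is inside it only when vm_start <= addr */
		if (vma == NULL || addr < vma->vm_start ||
		    !(vma->vm_flags & VM_PFNMAP)) {
			get_page(bad_page);
			return page_to_pfn(bad_page);
		}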
From: Anthony L. <an...@co...> - 2008-04-30 12:50:38
|
Muli Ben-Yehuda wrote: > On Tue, Apr 29, 2008 at 02:09:20PM -0500, Anthony Liguori wrote: > >> This patch allows VMA's that contain no backing page to be used for guest >> memory. This is a drop-in replacement for Ben-Ami's first page in his direct >> mmio series. Here, we continue to allow mmio pages to be represented in the >> rmap. >> >> Since v1, I've taken into account Andrea's suggestions at using VM_PFNMAP >> instead of VM_IO and changed the BUG_ON to a return of bad_page. >> >> Signed-off-by: Anthony Liguori <ali...@us...> >> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >> index 1d7991a..64e5efe 100644 >> --- a/virt/kvm/kvm_main.c >> +++ b/virt/kvm/kvm_main.c >> @@ -532,6 +532,7 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) >> struct page *page[1]; >> unsigned long addr; >> int npages; >> + pfn_t pfn; >> >> might_sleep(); >> >> @@ -544,19 +545,35 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) >> npages = get_user_pages(current, current->mm, addr, 1, 1, 1, page, >> NULL); >> >> - if (npages != 1) { >> - get_page(bad_page); >> - return page_to_pfn(bad_page); >> - } >> + if (unlikely(npages != 1)) { >> + struct vm_area_struct *vma; >> >> - return page_to_pfn(page[0]); >> + vma = find_vma(current->mm, addr); >> + if (vma == NULL || addr >= vma->vm_start || >> + !(vma->vm_flags & VM_PFNMAP)) { >> > > Isn't the check for addr backwards here? For the VMA we would like to > to find, vma->vm_start <= addr < vma->vm_end. > Yes it is. Thanks for spotting that. Regards, Anthony Liguori > Cheers, > Muli > > ------------------------------------------------------------------------- > This SF.net email is sponsored by the 2008 JavaOne(SM) Conference > Don't miss this year's exciting event. There's still time to save $100. > Use priority code J8TL2D2. > http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone > _______________________________________________ > kvm-devel mailing list > kvm...@li... > https://lists.sourceforge.net/lists/listinfo/kvm-devel > |
From: Andrea A. <an...@qu...> - 2008-04-30 07:00:40
On Tue, Apr 29, 2008 at 06:12:51PM -0500, Anthony Liguori wrote:
> IIUC PPC correctly, all IO pages have corresponding struct pages. This
> means that get_user_pages() would succeed and you can reference count them?
> In this case, we would never take the VM_PFNMAP path.

get_user_pages only works on vmas where only pfns with a struct page can
be mapped, but even if a struct page exists that doesn't mean
get_user_pages will succeed.  All mmio regions should be marked VM_IO,
since reading them affects hardware somehow, and that prevents
get_user_pages from working on them regardless of whether a struct page
exists.

> That's independent of this patchset. For non-aware guests, we'll have to
> pin all of physical memory up front and then create an IOMMU table from the
> pinned physical memory. For aware guests with a PV DMA window API, we'll
> be able to build that mapping on the fly (enforcing mlock allocation
> limits).

BTW, as far as a Linux guest is concerned, if the PV DMA API's mlock
ulimit triggers, the guest will crash.  Nothing checks when
pci_map_single returns null (the fix would be to throttle the I/O until
some other DMA completes, to split the DMA into multiple operations if
it's an SG entry, and, if it repeatedly fails, to fall back to PIO or
return an I/O error if PIO isn't available).  It can fail if there's
lots of weird PCI hardware doing RDMA at the same time (for example see
the iommu_arena_alloc retval in arch/alpha/kernel/pci_iommu.c).  In
short we'll either need ulimit -l unlimited, or we'll have to define
practical limits depending on the guest driver code and the number of
devices using passthrough.

I'll make the reserved-ram patch incremental with those patches; then it
should pick the right pfn coming from /dev/mem without my page_count ==
0 check, and then I only have to fix up the page pinning (so it will
likely also be incremental with the kvm mmu notifier patch, and I can
hope to get something final and remove page pinning for good, not only
on mmio regions that don't have a struct page).

I currently have trouble with the blk-settings.c change done in 2.6.25
when booting the host; I thought I had fixed that already... (I did when
loading the host kernel in kvm, but on real hardware it still fails for
another reason).  And Andrew sent me a large email about mmu notifiers,
so before I return to the reserved-ram work I have to answer him and
upload an updated mmu-notifier patch with certain cleanups he requested.
So go ahead ignoring the reserved-ram and mmu notifier patches; I'll
pick up whatever is available in or outside kvm.git when I'm ready.

Thanks!
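
(To illustrate the missing check Andrea describes: a guest network
driver's transmit path that actually tests the mapping it gets back and
quiesces instead of handing the hardware a bad handle.  This is a
generic sketch, not code from any driver discussed here; my_dev and
my_start_xmit are made-up names, and the single-argument
pci_dma_mapping_error() is the form used by kernels of that era.)

static int my_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct my_dev *priv = netdev_priv(dev);
	dma_addr_t dma;

	dma = pci_map_single(priv->pdev, skb->data, skb->len,
			     PCI_DMA_TODEVICE);
	if (pci_dma_mapping_error(dma)) {
		/*
		 * Mapping space exhausted (e.g. the PV DMA window hit
		 * its mlock limit): stop the queue and retry once other
		 * DMA completes, rather than crashing or using a bogus
		 * address.
		 */
		netif_stop_queue(dev);
		return NETDEV_TX_BUSY;
	}

	/* ... hand dma/skb->len to the hardware ring ... */
	return NETDEV_TX_OK;
}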
From: Muli Ben-Y. <mu...@il...> - 2008-04-30 06:27:15
On Tue, Apr 29, 2008 at 01:37:29PM +0300, Amit Shah wrote:
> dma_alloc_coherent() doesn't call dma_ops->alloc_coherent in case no
> IOMMU translations are necessary.

I always thought this was a huge wart in the x86-64 DMA ops. Would there
be strong resistance to fixing it so that alloc_coherent matches the way
the other ops are used? This will eliminate the need for this patch and
will make other DMA ops implementations saner.

Cheers,
Muli
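
(A grossly simplified sketch of the behaviour being objected to — not
the real arch/x86 code: dma_alloc_coherent() satisfies the request
directly when it can and only falls back to the dma_ops hook, so an
IOMMU backend never sees those allocations.  try_direct_alloc() is a
made-up placeholder for the direct path.)

void *dma_alloc_coherent(struct device *dev, size_t size,
			 dma_addr_t *handle, gfp_t gfp)
{
	/* direct path: no IOMMU translation set up, dma_ops not consulted */
	void *cpu_addr = try_direct_alloc(dev, size, handle, gfp);

	if (cpu_addr)
		return cpu_addr;

	/* only now does an IOMMU implementation get a say */
	return dma_ops->alloc_coherent(dev, size, handle, gfp);
}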
From: Muli Ben-Y. <mu...@il...> - 2008-04-30 06:09:16
|
On Tue, Apr 29, 2008 at 02:09:20PM -0500, Anthony Liguori wrote: > This patch allows VMA's that contain no backing page to be used for guest > memory. This is a drop-in replacement for Ben-Ami's first page in his direct > mmio series. Here, we continue to allow mmio pages to be represented in the > rmap. > > Since v1, I've taken into account Andrea's suggestions at using VM_PFNMAP > instead of VM_IO and changed the BUG_ON to a return of bad_page. > > Signed-off-by: Anthony Liguori <ali...@us...> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 1d7991a..64e5efe 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -532,6 +532,7 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) > struct page *page[1]; > unsigned long addr; > int npages; > + pfn_t pfn; > > might_sleep(); > > @@ -544,19 +545,35 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) > npages = get_user_pages(current, current->mm, addr, 1, 1, 1, page, > NULL); > > - if (npages != 1) { > - get_page(bad_page); > - return page_to_pfn(bad_page); > - } > + if (unlikely(npages != 1)) { > + struct vm_area_struct *vma; > > - return page_to_pfn(page[0]); > + vma = find_vma(current->mm, addr); > + if (vma == NULL || addr >= vma->vm_start || > + !(vma->vm_flags & VM_PFNMAP)) { Isn't the check for addr backwards here? For the VMA we would like to to find, vma->vm_start <= addr < vma->vm_end. Cheers, Muli |
From: Muli Ben-Y. <mu...@il...> - 2008-04-30 06:02:50
On Wed, Apr 30, 2008 at 01:48:38AM +0300, Avi Kivity wrote:
> Amit Shah wrote:
>>
>>>> +	if (is_error_page(host_page)) {
>>>> +		printk(KERN_INFO "%s: gfn %p not valid\n",
>>>> +		       __func__, (void *)page_gfn);
>>>> +		r = -1;
>>>>
>>> r = -1 is not really informative. Better use some meaningful error.
>>>
>>
>> The error's going to the guest. The guest, as we know, has already
>> done a successful DMA allocation. Something went wrong in the
>> hypercall, and we don't know why (bad page). Any kind of error here
>> isn't going to be intelligible to the guest anyway. It's mostly a
>> host thing if we ever hit this.
>>
>
> If the guest is not able to handle it, why bother returning an
> error? Better to kill it.
>
> But in any case, -1 is not a good error number.

The guest should be able to deal with transient DMA mapping errors,
either by retrying, or quiescing the device. This is in line with how HW
IOMMUs work - they may run out of mappings for example and the driver
should be able to cope with it. Killing the guest is a last resort.

Cheers,
Muli
From: David S. A. <da...@ci...> - 2008-04-30 04:18:21
|
Another tidbit for you guys as I make my way through various permutations: I installed the RHEL3 hugemem kernel and the guest behavior is *much* better. System time still has some regular hiccups that are higher than xen and esx (e.g., 1 minute samples out of 5 show system time between 10 and 15%), but overall guest behavior is good with the hugemem kernel. One side effect I've noticed is that I cannot restart the RHEL3 guest running the hugemem kernel in successive attempts. The guest has 2 vcpus and qemu shows one thread at 100% cpu. If I recall correctly kvm_stat shows a large amount of tlb_flushes (like millions in a 5-second sample). The scenario is: 1. start guest running hugemem kernel, 2. shutdown, 3. restart guest. During 3. it hangs, but at random points. Removing kvm/kvm-intel has no effect - guest still hangs on the restart. Rebooting the host clears the problem. Alternatively, during the hang on a restart I can kill the guest, and then on restart choose the normal, 32-bit smp kernel and the guest boots just fine. At this point I can shutdown the guest and restart with the hugemem kernel and it boots just fine. david David S. Ahern wrote: > Hi Marcelo: > > mmu_recycled is always 0 for this guest -- even after almost 4 hours of uptime. > > Here is a kvm_stat sample where guest time was very high and qemu had 2 > processors at 100% on the host. I removed counters where both columns have 0 > value for brevity. > > exits 45937979 758051 > fpu_reload 1416831 87 > halt_exits 112911 0 > halt_wakeup 31771 0 > host_state_reload 2068602 263 > insn_emulation 21601480 365493 > io_exits 1827374 2705 > irq_exits 8934818 285196 > mmio_exits 421674 147 > mmu_cache_miss 4817689 93680 > mmu_flooded 4815273 93680 > mmu_pde_zapped 51344 0 > mmu_prefetch 4817625 93680 > mmu_pte_updated 14803298 270104 > mmu_pte_write 19859863 363785 > mmu_shadow_zapped 4832106 93679 > pf_fixed 32184355 468398 > pf_guest 264138 0 > remote_tlb_flush 10697762 280522 > tlb_flush 10301338 176424 > > (NOTE: This is for a *5* second sample interval instead of 1 to allow me to > capture the data). > > Here's a sample when the guest is "well-behaved" (system time <10%, though ): > exits 51502194 97453 > fpu_reload 1421736 227 > halt_exits 138361 1927 > halt_wakeup 33047 117 > host_state_reload 2110190 3740 > insn_emulation 24367441 47260 > io_exits 1874075 2576 > irq_exits 10224702 13333 > mmio_exits 435154 1726 > mmu_cache_miss 5414097 11258 > mmu_flooded 5411548 11243 > mmu_pde_zapped 52851 44 > mmu_prefetch 5414031 11258 > mmu_pte_updated 16854686 29901 > mmu_pte_write 22526765 42285 > mmu_shadow_zapped 5430025 11313 > pf_fixed 36144578 67666 > pf_guest 282794 430 > remote_tlb_flush 12126268 14619 > tlb_flush 11753162 21460 > > > There is definitely a strong correlation between the mmu counters and high > system times in the guest. I am still trying to find out what in the guest is > stimulating it when running on RHEL3; I do not see this same behavior for an > equivalent setup running on RHEL4. > > By the way I added an mmu_prefetch stat in prefetch_page() to count the number > of times the for() loop is hit with PTTYPE == 64; ie., number of times > paging64_prefetch_page() is invoked. (I wanted an explicit counter for this > loop, though the info seems to duplicate other entries.) That counter is listed > above. As I mentioned in a prior post when kscand kicks in the change in > mmu_prefetch counter is at 20,000+/sec, with each trip through that function > taking 45k+ cycles. 
> > kscand is an instigator shortly after boot, however, kscand is *not* the culprit > once the system has been up for 30-45 minutes. I have started instrumenting the > RHEL3U8 kernel and for the load I am running kscand does not walk the active > lists very often once the system is up. > > So, to dig deeper on what in the guest is stimulating the mmu I collected > kvmtrace data for about a 2 minute time interval which caught about a 30-second > period where guest system time was steady in the 25-30% range. Summarizing the > number of times a RIP appears in an VMEXIT shows the following high runners: > > count RIP RHEL3-symbol > 82549 0xc0140e42 follow_page [kernel] c0140d90 offset b2 > 42532 0xc0144760 handle_mm_fault [kernel] c01446d0 offset 90 > 36826 0xc013da4a futex_wait [kernel] c013d870 offset 1da > 29987 0xc0145cd0 zap_pte_range [kernel] c0145c10 offset c0 > 27451 0xc0144018 do_no_page [kernel] c0143e20 offset 1f8 > > (halt entry removed the list since that is the ideal scenario for an exit). > > So the RIP correlates to follow_page() for a large percentage of the VMEXITs. > > I wrote an awk script to summarize (histogram style) the TSC cycles between > VMEXIT and VMENTRY for an address. For the first rip, 0xc0140e42, 82,271 times > (ie., almost 100% of the time) the trace shows a delta between 50k and 100k > cycles between the VMEXIT and the subsequent VMENTRY. Similarly for the second > one, 0xc0144760, 42403 times (again almost 100% of the occurrences) the trace > shows a delta between 50k and 100k cycles between VMEXIT and VMENTRY. These > seems to correlate with the prefetch_page function in kvm, though I am not 100% > positive on that. > > I am now investigating the kernel paths leading to those functions. Any insights > would definitely be appreciated. > > david > > > Marcelo Tosatti wrote: >> On Fri, Apr 25, 2008 at 11:33:18AM -0600, David S. Ahern wrote: >>> Most of the cycles (~80% of that 54k+) are spent in paging64_prefetch_page(): >>> >>> for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { >>> gpa_t pte_gpa = gfn_to_gpa(sp->gfn); >>> pte_gpa += (i+offset) * sizeof(pt_element_t); >>> >>> r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &pt, >>> sizeof(pt_element_t)); >>> if (r || is_present_pte(pt)) >>> sp->spt[i] = shadow_trap_nonpresent_pte; >>> else >>> sp->spt[i] = shadow_notrap_nonpresent_pte; >>> } >>> >>> This loop is run 512 times and takes a total of ~45k cycles, or ~88 cycles per >>> loop. >>> >>> This function gets run >20,000/sec during some of the kscand loops. >> Hi David, >> >> Do you see the mmu_recycled counter increase? >> > |