From: David S. A. <da...@ci...> - 2008-04-21 04:31:12
I added the traces and captured data over another apparent lockup of the guest. This seems to be representative of the sequence (pid/vcpu removed):

(+4776)   VMEXIT     [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
(+   0)   PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
(+3632)   VMENTRY
(+4552)   VMEXIT     [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
(+   0)   PAGE_FAULT [ errorcode = 0x0000000b, virt = 0x00000000 fffb61c8 ]
(+54928)  VMENTRY
(+4568)   VMEXIT     [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
(+   0)   PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
(+   0)   PTE_WRITE  [ gpa = 0x00000000 00009db4 gpte = 0x00000000 41c5d363 ]
(+8432)   VMENTRY
(+3936)   VMEXIT     [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
(+   0)   PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
(+   0)   PTE_WRITE  [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
(+13832)  VMENTRY
(+5768)   VMEXIT     [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
(+   0)   PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
(+3712)   VMENTRY
(+4576)   VMEXIT     [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
(+   0)   PAGE_FAULT [ errorcode = 0x0000000b, virt = 0x00000000 fffb61d0 ]
(+   0)   PTE_WRITE  [ gpa = 0x00000000 3d5981d0 gpte = 0x00000000 3d55d047 ]
(+65216)  VMENTRY
(+4232)   VMEXIT     [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
(+   0)   PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
(+   0)   PTE_WRITE  [ gpa = 0x00000000 00009db4 gpte = 0x00000000 3d598363 ]
(+8640)   VMENTRY
(+3936)   VMEXIT     [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
(+   0)   PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
(+   0)   PTE_WRITE  [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
(+14160)  VMENTRY

I can forward a more complete time snippet if you'd like. The vcpu0 and corresponding vcpu1 files have 85,000 total lines; compressed, the files total ~500k. I did not see the FLOODED trace come out during this sample, though I did bump the count from 3 to 4 as you suggested.

Correlating rip addresses to the 2.4 kernel:

    c0160d00-c0161290 = page_referenced

It looks like the event is kscand running through the pages. I suspected this some time ago and tried tweaking the kscand_work_percent sysctl variable. It appeared to lower the peak of the spikes, but maybe I imagined it. I believe lowering that value makes kscand wake up more often but do less work (page scanning) each time it is awakened.

david

Avi Kivity wrote:
> Can you add a trace at mmu_guess_page_from_pte_write(), right before "if
> (is_present_pte(gpte))"? I'm interested in gpa and gpte. Also a trace
> at kvm_mmu_pte_write(), where it sets flooded = 1 (hmm, try to increase
> the 3 to 4 in the line right above that, maybe the fork detector is
> misfiring).
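For orientation, the "fork detector" Avi refers to is the pte-write flood heuristic in kvm_mmu_pte_write(). A minimal sketch of its shape, reconstructed from the mmu.c of that era (field names approximate, shown for context only):

	/* Sketch (approximate, not a verbatim excerpt): repeated emulated
	 * writes to the same guest page-table page, with the shadow pte
	 * never accessed in between, look like the guest rewriting its page
	 * tables (e.g. at fork), so the page is zapped from the shadow
	 * ("flooded") instead of being tracked write-by-write. */
	if (gfn == vcpu->arch.last_pt_write_gfn
	    && !last_updated_pte_accessed(vcpu)) {
		++vcpu->arch.last_pt_write_count;
		if (vcpu->arch.last_pt_write_count >= 3)	/* the "3 to 4" knob above */
			flooded = 1;
	} else {
		vcpu->arch.last_pt_write_gfn = gfn;
		vcpu->arch.last_pt_write_count = 1;
	}

Raising the threshold from 3 to 4, as suggested above, makes the detector tolerate one more back-to-back write before declaring a flood.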
From: Nguyen A. Q. <aq...@gm...> - 2008-04-21 03:36:42
Hmm, the last patch includes a binary. So please take this patch instead.

Thanks,
Q

# diffstat linuxboot1.diff
 Makefile             |   13 ++++-
 linuxboot/Makefile   |   40 +++++++++++++++
 linuxboot/boot.S     |   54 +++++++++++++++++++++
 linuxboot/farvar.h   |  130 +++++++++++++++++++++++++++++++++++++++++++++++++++
 linuxboot/rom.c      |  104 ++++++++++++++++++++++++++++++++++++++++
 linuxboot/signrom.c  |  128 ++++++++++++++++++++++++++++++++++++++++++++++++++
 linuxboot/util.h     |   69 +++++++++++++++++++++++++++
 qemu/Makefile        |    3 -
 qemu/Makefile.target |    2
 qemu/hw/linuxboot.c  |   39 +++++++++++++++
 qemu/hw/pc.c         |   22 +++++++-
 qemu/hw/pc.h         |    5 +
 12 files changed, 600 insertions(+), 9 deletions(-)

On Mon, Apr 21, 2008 at 12:33 PM, Nguyen Anh Quynh <aq...@gm...> wrote:
> Forget to say that this patch is against kvm-66.
>
> Thanks,
> Q
>
> On Mon, Apr 21, 2008 at 12:32 PM, Nguyen Anh Quynh <aq...@gm...> wrote:
> > Hi,
> >
> > This should be submitted upstream (but not to the kvm-devel list); this
> > is only test code that I want to quickly send out for comments. In case
> > it looks OK, I will send it upstream later.
> >
> > Inspired by extboot and conversations with Anthony and HPA, this
> > linuxboot option ROM is a simple option ROM that intercepts int19 in
> > order to execute the Linux setup code. This approach eliminates the need
> > to manipulate the boot sector for this purpose.
> >
> > To test it, just load a Linux kernel with your KVM/QEMU image using the
> > -kernel option in the normal way.
> >
> > I successfully compiled and tested it with kvm-66 on Ubuntu 7.10, with an
> > Ubuntu 8.04 guest.
> >
> > Thanks,
> > Quynh
> >
> > # diffstat linuxboot1.diff
> >  Makefile             |   13 ++++-
> >  linuxboot/Makefile   |   40 +++++++++++++++
> >  linuxboot/boot.S     |   54 +++++++++++++++++++++
> >  linuxboot/farvar.h   |  130 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  linuxboot/rom.c      |  104 ++++++++++++++++++++++++++++++++++++++++
> >  linuxboot/signrom    |binary
> >  linuxboot/signrom.c  |  128 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  linuxboot/util.h     |   69 +++++++++++++++++++++++++++
> >  qemu/Makefile        |    3 -
> >  qemu/Makefile.target |    2
> >  qemu/hw/linuxboot.c  |   39 +++++++++++++++
> >  qemu/hw/pc.c         |   22 +++++++-
> >  qemu/hw/pc.h         |    5 +
> >  13 files changed, 600 insertions(+), 9 deletions(-)
From: Nguyen A. Q. <aq...@gm...> - 2008-04-21 03:33:48
Forget to say that this patch is against kvm-66.

Thanks,
Q

On Mon, Apr 21, 2008 at 12:32 PM, Nguyen Anh Quynh <aq...@gm...> wrote:
> Hi,
>
> This should be submitted upstream (but not to the kvm-devel list); this
> is only test code that I want to quickly send out for comments. In case
> it looks OK, I will send it upstream later.
>
> Inspired by extboot and conversations with Anthony and HPA, this
> linuxboot option ROM is a simple option ROM that intercepts int19 in
> order to execute the Linux setup code. This approach eliminates the need
> to manipulate the boot sector for this purpose.
>
> To test it, just load a Linux kernel with your KVM/QEMU image using the
> -kernel option in the normal way.
>
> I successfully compiled and tested it with kvm-66 on Ubuntu 7.10, with an
> Ubuntu 8.04 guest.
>
> Thanks,
> Quynh
>
> # diffstat linuxboot1.diff
>  Makefile             |   13 ++++-
>  linuxboot/Makefile   |   40 +++++++++++++++
>  linuxboot/boot.S     |   54 +++++++++++++++++++++
>  linuxboot/farvar.h   |  130 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  linuxboot/rom.c      |  104 ++++++++++++++++++++++++++++++++++++++++
>  linuxboot/signrom    |binary
>  linuxboot/signrom.c  |  128 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  linuxboot/util.h     |   69 +++++++++++++++++++++++++++
>  qemu/Makefile        |    3 -
>  qemu/Makefile.target |    2
>  qemu/hw/linuxboot.c  |   39 +++++++++++++++
>  qemu/hw/pc.c         |   22 +++++++-
>  qemu/hw/pc.h         |    5 +
>  13 files changed, 600 insertions(+), 9 deletions(-)
From: Nguyen A. Q. <aq...@gm...> - 2008-04-21 03:32:43
Hi,

This should be submitted upstream (but not to the kvm-devel list); this is only test code that I want to quickly send out for comments. In case it looks OK, I will send it upstream later.

Inspired by extboot and conversations with Anthony and HPA, this linuxboot option ROM is a simple option ROM that intercepts int19 in order to execute the Linux setup code. This approach eliminates the need to manipulate the boot sector for this purpose.

To test it, just load a Linux kernel with your KVM/QEMU image using the -kernel option in the normal way.

I successfully compiled and tested it with kvm-66 on Ubuntu 7.10, with an Ubuntu 8.04 guest.

Thanks,
Quynh

# diffstat linuxboot1.diff
 Makefile             |   13 ++++-
 linuxboot/Makefile   |   40 +++++++++++++++
 linuxboot/boot.S     |   54 +++++++++++++++++++++
 linuxboot/farvar.h   |  130 +++++++++++++++++++++++++++++++++++++++++++++++++++
 linuxboot/rom.c      |  104 ++++++++++++++++++++++++++++++++++++++++
 linuxboot/signrom    |binary
 linuxboot/signrom.c  |  128 ++++++++++++++++++++++++++++++++++++++++++++++++++
 linuxboot/util.h     |   69 +++++++++++++++++++++++++++
 qemu/Makefile        |    3 -
 qemu/Makefile.target |    2
 qemu/hw/linuxboot.c  |   39 +++++++++++++++
 qemu/hw/pc.c         |   22 +++++++-
 qemu/hw/pc.h         |    5 +
 13 files changed, 600 insertions(+), 9 deletions(-)
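To make the int19 interception concrete: int 0x19 is the BIOS bootstrap-loader vector, so an option ROM that rewrites that entry in the real-mode interrupt vector table gets control when the BIOS would otherwise read a boot sector. A hedged sketch of the idea follows — this is not the submitted boot.S/rom.c code, and it assumes a freestanding real-mode build where the IVT is addressable at linear address 0:

	/* Illustrative sketch only, not the actual patch. */
	#include <stdint.h>

	struct ivt_entry {
		uint16_t offset;	/* handler offset within segment */
		uint16_t segment;	/* handler segment */
	};

	static void hook_int19(uint16_t rom_seg, uint16_t handler_off)
	{
		/* The real-mode IVT lives at 0000:0000. */
		struct ivt_entry *ivt = (struct ivt_entry *)0;

		/* After this, the BIOS "boot the OS" call (int 0x19) enters
		 * the ROM's handler, which can place the kernel's setup code
		 * in memory and jump to it instead of loading a boot sector. */
		ivt[0x19].segment = rom_seg;
		ivt[0x19].offset  = handler_off;
	}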
From: Javier G. G. <ja...@gu...> - 2008-04-21 00:31:50
On Sunday 20 April 2008, Avi Kivity wrote:
> Also, I'd presume that those that need 10K IOPS and above will not place
> their high throughput images on a filesystem; rather on a separate SAN LUN.

I think that too; but that LUN would still be accessed by the VMs via one of these IO emulation layers, right? Or maybe you're advocating using the SAN initiator in the VM instead of the host?

--
Javier
From: Marcelo T. <mto...@re...> - 2008-04-20 23:57:32
On Sun, Apr 20, 2008 at 02:16:52PM +0300, Avi Kivity wrote:
> > The iperf numbers are pretty good. Performance of UP guests increases
> > slightly, but SMP is quite significant.
>
> I expect you're seeing contention induced by memcpy()s and inefficient
> emulation. With the dma api, I expect the benefit will drop.

You still have to memcpy() with the dma api. Even with vringfd the kernel->user copy has to be performed under the global mutex protection, the difference being that several packets can be copied per syscall instead of only one.

> I think many parts are missing (or maybe, I missed them). You need to
> lock the qemu internals (there are many read-mostly qemu caches
> scattered around the code), lock against hotplug, etc.

Yes, there are some parts missing, such as the bh list and hotplug as you mention.

> For pure cpu emulation, there is a ton of work to be done: protecting
> the translator as well as making the translated code smp safe.

I now believe there is a lot of work (which was not clear before). I am not particularly interested in getting real emulation to be multithreaded. Anyway, the lack of multithreading in qemu emulation should not be a blocker for these patches to get in, since these are infrastructural changes.

> I think that QemuDevice makes sense, and that we want this long term,
> but that we first need to improve efficiency (which reduces cpu
> utilization _and_ improves scalability) rather than look at scalability
> alone (which is much harder in addition to the drawback of not reducing
> cpu utilization).

Will complete the QEMUDevice+splitlock patchset, keep it up to date, and test it under a wider variety of workloads.

Thanks.
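A minimal sketch of the per-device locking idea being discussed — the names here are illustrative, not the patchset's actual structures. The point is that ioport/iomem dispatch takes the owning device's lock rather than the single global mutex, so vcpu threads touching different devices stop serializing against each other:

	/* Illustrative sketch, assuming invented names. */
	#include <pthread.h>
	#include <stdint.h>

	typedef struct QEMUDevice {
		pthread_mutex_t lock;	/* serializes access to this device only */
		void *opaque;		/* device-private state */
	} QEMUDevice;

	typedef uint32_t (*ioport_read_fn)(void *opaque, uint32_t addr);

	static uint32_t qdev_ioport_read(QEMUDevice *dev, ioport_read_fn fn,
					 uint32_t addr)
	{
		uint32_t val;

		pthread_mutex_lock(&dev->lock);	/* per-device, not global */
		val = fn(dev->opaque, addr);
		pthread_mutex_unlock(&dev->lock);
		return val;
	}

As the thread notes, this only covers device state; qemu-internal caches, the bh list, hotplug, and the translator would all still need their own protection.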
From: Jamie L. <ja...@sh...> - 2008-04-20 23:39:11
Avi Kivity wrote:
> > Does that mean "for the majority of deployments, the slow version is
> > sufficient. The few that care about performance can use Linux AIO?"
>
> In essence, yes. s/slow/slower/ and s/performance/ultimate block device
> performance/.
>
> Many deployments don't care at all about block device performance; they
> care mostly about networking performance.

That's interesting. I'd have expected block device performance to be important for most things, for the same reason that disk performance is (well, reasonably) important for non-virtual machines. But as you say next:

> > I'm under the impression that the entire and only point of Linux AIO
> > is that it's faster than POSIX AIO on Linux.
>
> It is. I estimate posix aio adds a few microseconds above linux aio per
> I/O request, when using O_DIRECT. Assuming 10 microseconds, you will
> need 10,000 I/O requests per second per vcpu to have a 10% performance
> difference. That's definitely rare.

Oh, I didn't realise the difference was so small. At such a tiny difference, I'm wondering why Linux AIO exists at all, as it complicates the kernel rather a lot. I can see the theoretical appeal, but if the performance difference is so marginal, I'm surprised it's in there. I'm also surprised the glibc implementation of AIO using ordinary threads is so close to it. And then, I'm wondering why use AIO at all: it suggests QEMU would run about as fast doing synchronous I/O in a few dedicated I/O threads.

> > Does that mean "a managed environment can have some code which checks
> > the host kernel version + filesystem type holding the VM image, to
> > conditionally enable Linux AIO?" (Since if you care about
> > performance, which is the sole reason for using Linux AIO, you
> > wouldn't want to enable Linux AIO on any host in your cluster where it
> > will trash performance.)
>
> Either that, or mandate that all hosts use a filesystem and kernel which
> provide the necessary performance. Take ovirt for example, which
> provides the entire hypervisor environment, and so can guarantee this.
>
> Also, I'd presume that those that need 10K IOPS and above will not place
> their high throughput images on a filesystem; rather on a separate SAN LUN.

Does the separate LUN make any difference? I thought O_DIRECT on a filesystem was meant to be pretty close to block device performance. I base this on messages here and there which say swapping to a file is about as fast as swapping to a block device nowadays.

Thanks for your useful remarks, btw. There doesn't seem to be a lot of good info about Linux AIO around.

--
Jamie
From: Anthony L. <ali...@us...> - 2008-04-20 19:32:12
Blue Swirl wrote:
> On 4/19/08, Anthony Liguori <an...@co...> wrote:
>
> Well, the IOVector part and bdrv_readv look OK, except for the heavy
> mallocing involved.

I don't think that in practice malloc is going to have any sort of performance impact. If it does, it's easy enough to implement a small object allocator for common, small vector sizes.

> I'm not so sure about the DMA side and how everything fits together
> for zero-copy IO. For example, do we still need explicit translation
> at some point?

I'm thinking that zero copy will be implemented by setting the map and unmap functions to NULL by default (instead of to the PCI read/write functions). Then the bus can decide whether copy functions are needed. I'll send an updated patch series tomorrow that includes this functionality.

Regards,

Anthony Liguori
From: Avi K. <av...@qu...> - 2008-04-20 18:47:16
Jamie Lokier wrote:
> Avi Kivity wrote:
> > For the majority of deployments posix aio should be sufficient. The few
> > that need something else can use Linux aio.
>
> Does that mean "for the majority of deployments, the slow version is
> sufficient. The few that care about performance can use Linux AIO?"

In essence, yes. s/slow/slower/ and s/performance/ultimate block device performance/.

Many deployments don't care at all about block device performance; they care mostly about networking performance.

> I'm under the impression that the entire and only point of Linux AIO
> is that it's faster than POSIX AIO on Linux.

It is. I estimate posix aio adds a few microseconds above linux aio per I/O request, when using O_DIRECT. Assuming 10 microseconds, you will need 10,000 I/O requests per second per vcpu to have a 10% performance difference. That's definitely rare.

> Does that mean "a managed environment can have some code which checks
> the host kernel version + filesystem type holding the VM image, to
> conditionally enable Linux AIO?" (Since if you care about
> performance, which is the sole reason for using Linux AIO, you
> wouldn't want to enable Linux AIO on any host in your cluster where it
> will trash performance.)

Either that, or mandate that all hosts use a filesystem and kernel which provide the necessary performance. Take ovirt for example, which provides the entire hypervisor environment, and so can guarantee this.

Also, I'd presume that those that need 10K IOPS and above will not place their high-throughput images on a filesystem, but rather on a separate SAN LUN.

> Just wondering.

Hope this clarifies.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
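Spelling out the arithmetic behind that estimate (using Avi's own assumed figures): 10 µs of extra overhead per request × 10,000 requests/s per vcpu = 0.1 s of added CPU time per second of wall time, i.e. about a 10% hit. At a more typical 1,000 requests/s the same overhead would cost roughly 1%.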
From: Jamie L. <ja...@sh...> - 2008-04-20 15:49:46
Avi Kivity wrote:
> For the majority of deployments posix aio should be sufficient. The few
> that need something else can use Linux aio.

Does that mean "for the majority of deployments, the slow version is sufficient; the few that care about performance can use Linux AIO"?

I'm under the impression that the entire and only point of Linux AIO is that it's faster than POSIX AIO on Linux.

> Of course, a managed environment can use Linux aio unconditionally if it
> knows the kernel has all the needed goodies.

Does that mean "a managed environment can have some code which checks the host kernel version + filesystem type holding the VM image, to conditionally enable Linux AIO"? (Since if you care about performance, which is the sole reason for using Linux AIO, you wouldn't want to enable Linux AIO on any host in your cluster where it will trash performance.)

Just wondering.

Thanks,

--
Jamie
From: Yang, S. <she...@in...> - 2008-04-20 13:46:35
On Friday 18 April 2008 23:54:04 Anthony Liguori wrote:
> Yang, Sheng wrote:
> > On Friday 18 April 2008 21:30:14 Anthony Liguori wrote:
> >> Yang, Sheng wrote:
> >>> @@ -1048,17 +1071,18 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu,
> >>> u64 *shadow_pte,
> >>>  	 * whether the guest actually used the pte (in order to detect
> >>>  	 * demand paging).
> >>>  	 */
> >>> -	spte = PT_PRESENT_MASK | PT_DIRTY_MASK;
> >>> +	spte = shadow_base_present_pte | shadow_dirty_mask;
> >>>  	if (!speculative)
> >>>  		pte_access |= PT_ACCESSED_MASK;
> >>>  	if (!dirty)
> >>>  		pte_access &= ~ACC_WRITE_MASK;
> >>> -	if (!(pte_access & ACC_EXEC_MASK))
> >>> -		spte |= PT64_NX_MASK;
> >>> -
> >>> -	spte |= PT_PRESENT_MASK;
> >>> +	if (pte_access & ACC_EXEC_MASK) {
> >>> +		if (shadow_x_mask)
> >>> +			spte |= shadow_x_mask;
> >>> +	} else if (shadow_nx_mask)
> >>> +		spte |= shadow_nx_mask;
> >>
> >> This looks like it may be a bug. The old behavior sets NX if
> >> (pte_access & ACC_EXEC_MASK). The new behavior unconditionally sets NX
> >> and never sets PRESENT. Also, the if (shadow_x_mask) checks are
> >> unnecessary. spte |= 0 is a nop.
> >
> > Thanks for the comment! I realized the two checks of shadow_nx/x_mask
> > are unnecessary... In fact, the correct behavior is to set either
> > shadow_x_mask or shadow_nx_mask; maybe there is a better approach for
> > this. Logic assured by the program itself is always safer. But I will
> > remove the redundant code first.
> >
> > But I don't think it's a bug. The old behavior set NX if (!(pte_access &
> > ACC_EXEC_MASK)), the same as the new one.
>
> The new behavior sets NX regardless of whether (pte_access &
> ACC_EXEC_MASK). Is the desired change to unconditionally set NX?

Oh, I may see the point... shadow_x_mask != shadow_nx_mask.

The old behavior was:

	if (!(pte_access & ACC_EXEC_MASK))
		spte |= PT64_NX_MASK;

The new behavior is:

	if (pte_access & ACC_EXEC_MASK) {
		spte |= shadow_x_mask;
	} else
		spte |= shadow_nx_mask;

For the current (shadow paging) behavior, kvm_arch_init() has:

	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
			PT_DIRTY_MASK, PT64_NX_MASK, 0);

which means shadow_nx_mask = PT64_NX_MASK and shadow_x_mask = 0 (NX means not executable, and X means executable). In patch 5/6, EPT has:

	kvm_mmu_set_mask_ptes(0ull, VMX_EPT_FAKE_ACCESSED_MASK,
			VMX_EPT_FAKE_DIRTY_MASK, 0ull, VMX_EPT_EXECUTABLE_MASK);

which means shadow_nx_mask = 0 and shadow_x_mask = VMX_EPT_EXECUTABLE_MASK.

So, when shadow paging is enabled and (!(pte_access & ACC_EXEC_MASK)), spte |= shadow_nx_mask = PT64_NX_MASK (nothing changes when the condition is not satisfied). When EPT is enabled and (pte_access & ACC_EXEC_MASK), spte |= shadow_x_mask = VMX_EPT_EXECUTABLE_MASK (nothing changes when the condition is not satisfied). They are two different, mutually exclusive bits. Maybe there is a better way to make their meaning clearer...

> > And I am also curious about the PRESENT bit. You see, the PRESENT bit
> > was set at the beginning of the code, and I really don't know why the
> > duplicate one exists there...
>
> Looking at the code, you appear to be right. In the future, I think you
> should separate any cleanups (like removing the redundant setting of
> PRESENT) into a separate patch and stick to just programmatic changes of
> PT_USER_MASK => shadow_user_mask, etc. in this patch. That makes it a
> lot easier to review correctness.

Thanks for the advice; it's important to separate the cleanups. I will get it done more properly next time.

--
Thanks
Yang, Sheng

> Regards,
>
> Anthony Liguori
>
> >>>  	if (pte_access & ACC_USER_MASK)
> >>> -		spte |= PT_USER_MASK;
> >>> +		spte |= shadow_user_mask;
> >>>  	if (largepage)
> >>>  		spte |= PT_PAGE_SIZE_MASK;
From: Avi K. <av...@qu...> - 2008-04-20 11:20:29
Marcelo Tosatti wrote:
> Introduce QEMUDevice, making the ioport/iomem->device relationship visible.
>
> At the moment it only contains a lock, but could be extended.
>
> With it the following is possible:
> - vcpu's can read/write via ioports/iomem while the iothread is working on
>   some unrelated device, or just copying data from the kernel.
> - vcpu's can read/write via ioports/iomem to different devices simultaneously.
>
> This patchset is only a proof-of-concept kind of thing, so only serial + raw
> image are supported.
>
> Tried two benchmarks, iperf and tiobench. With tiobench the reported latency
> is significantly lower (20%+), but throughput with IDE is only slightly
> higher.
>
> Expect to see larger improvements with a higher-performing IO scheme (SCSI
> still buggy, looking at it).
>
> The iperf numbers are pretty good. Performance of UP guests increases
> slightly but SMP is quite significant.

I expect you're seeing contention induced by memcpy()s and inefficient emulation. With the dma api, I expect the benefit will drop.

> Note that workloads with multiple busy devices (such as databases, web
> servers) should be the real winners.
>
> What is the feeling on this? It's not _that_ intrusive and can be easily
> NOP'ed out for QEMU.

I think many parts are missing (or maybe, I missed them). You need to lock the qemu internals (there are many read-mostly qemu caches scattered around the code), lock against hotplug, etc.

For pure cpu emulation, there is a ton of work to be done: protecting the translator as well as making the translated code smp safe.

I think that QemuDevice makes sense, and that we want this long term, but that we first need to improve efficiency (which reduces cpu utilization _and_ improves scalability) rather than look at scalability alone (which is much harder, in addition to the drawback of not reducing cpu utilization).

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
From: Avi K. <av...@qu...> - 2008-04-20 10:37:00
David Abrahams wrote:
>> Versions of kvm producing this sort of output are common in
>> archaeological digs. Please try a more recent release.
>
> Well, I'll try Hardy Heron soon enough, I suppose. It's due out in 2
> weeks.
>
> I'm sure you understand that most people can't afford to rebuild all
> their important software so that it stays on the bleeding edge. Have
> you considered getting more recent versions of kvm into the updates or
> backports repositories of major distros? I'm not really sure how much
> influence you can have over such things; I'm just asking.

That's up to the distro maintainers, or concerned users (who may either volunteer work or apply pressure).

>>>> What HAL do you see in device manager?
>>>
>>> "Standard PC"
>>
>> This HAL does not support SMP. You need the "ACPI Multiprocessor PC"
>> HAL or some such.
>
> And how would I get that HAL set up?

Follow http://kvm.qumranet.com/kvmwiki/Windows_ACPI_Workaround, substituting your desired HAL for "Standard PC".

>> Unless you have a recent Intel processor, the combination of SMP and
>> Windows XP will give noticeably lower performance. I recommend sticking
>> with uniprocessor in such cases.
>
> I have a Core Duo; isn't that recent enough?

No, this feature is present only on some of the Core 2s, IIRC.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
From: Avi K. <av...@qu...> - 2008-04-20 10:33:16
Muli Ben-Yehuda wrote:
> > Why avoid rmap on mmio pages? Sure it's unnecessary work, but
> > having less cases improves overall reliability.
>
> The rmap functions already have a check to bail out if the pte is not
> an rmap pte, so in that sense, we aren't adding a new case for the
> code to handle, just adding direct MMIO ptes to the existing list of
> non-rmap ptes.

I'm worried about the huge chain of direct_mmio parameters passed to functions, the impact on the audit code (at the end of mmu.c), and the poor souls who debug the mmu.

> > You can use pfn_valid() in gfn_to_pfn() and kvm_release_pfn_*() to
> > conditionally update the page refcounts.
>
> Since rmap isn't useful for direct MMIO ptes, doesn't it make more
> sense to "bail out" early rather than in the bowels of the rmap code?

It does, from a purist point of view (which also favors explicit parameters a la direct_mmio rather than indirect parameters like pfn_valid()), but I'm looking from the practical point of view now.

With mmu notifiers, we don't need to hold the refcount at all. So presuming we drop the refcounting code completely, are any changes actually necessary here?

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
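A minimal sketch of the pfn_valid() approach being suggested here — illustrative only; the real kvm helpers of the period may differ in detail:

	/* Sketch: only adjust page refcounts for pfns backed by a struct
	 * page. Direct-MMIO pfns (device memory) have no struct page, so
	 * pfn_valid() is false for them and they are left untouched —
	 * no extra direct_mmio parameter needs to be threaded through. */
	void kvm_release_pfn_clean(pfn_t pfn)
	{
		if (pfn_valid(pfn))
			put_page(pfn_to_page(pfn));
	}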
From: Avi K. <av...@qu...> - 2008-04-20 08:34:27
Marcelo Tosatti wrote:
> kvm_pv_mmu_op should not take mmap_sem. All gfn_to_page() callers down
> in the MMU processing will take it if necessary, so as it is it can
> deadlock.
>
> Apparently a leftover from the days before slots_lock.

Applied, thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
From: Avi K. <av...@qu...> - 2008-04-20 07:55:46
Anthony Liguori wrote:
> I'd prefer you not do an emulate_instruction loop at all. Just
> emulate one instruction on vmentry failure and let VT tell you what
> instructions you need to emulate.
>
> It's only four instructions so I don't think the performance is going
> to matter. Take a look at the patch I posted previously.

Once we remove the other VT realmode hacks, we may need more instructions emulated. Consider for example changing to real mode without reloading fs and gs; this will cause all real mode code to be emulated.

However, there's no need to do everything at once; the loop can certainly be added later when we have a proven need for it.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
From: Avi K. <av...@qu...> - 2008-04-20 07:49:10
Marcelo Tosatti wrote:
> Now that threads are spun up before machine->init(), clearing
> of HF_HALTED_MASK for the irqchip-in-kernel case needs to be moved
> to actual vcpu startup.

Applied, thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
From: Avi K. <av...@qu...> - 2008-04-20 07:42:31
Hollis Blanchard wrote:

Applied, thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
From: Avi K. <av...@qu...> - 2008-04-20 07:39:26
Hollis Blanchard wrote:
> Don't allow building as a module (asm-offsets dependencies).
>
> Also, automatically select KVM_BOOKE_HOST until we better separate the guest
> and host layers.
>
>  arch/powerpc/kvm/Kconfig |   11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)

Applied, thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
From: Blue S. <bla...@gm...> - 2008-04-20 06:42:13
On 4/19/08, Anthony Liguori <an...@co...> wrote:
> Blue Swirl wrote:
> > On 4/17/08, Anthony Liguori <ali...@us...> wrote:
> > > Yes, the vector version of packet receive is tough. I'll take a look at
> > > your patch. Basically, you need to associate a set of RX vectors with each
> > > VLANClientState and then when it comes time to deliver a packet to the VLAN,
> > > before calling fd_read, see if there is an RX vector available for the
> > > client.
> > >
> > > In the case of tap, I want to optimize further and do the initial readv()
> > > to one of the clients' RX buffers and then copy that RX buffer to the rest of
> > > the clients if necessary.
> >
> > The vector versions should also help SLIRP to add IP and Ethernet
> > headers to the incoming packets.
>
> Yeah, I'm hoping that with my posted linux-aio interface, I can add vector
> support since linux-aio has a proper asynchronous vector function.
>
> Are we happy with the DMA API? If so, we should commit it now so we can
> start adding proper vector interfaces for net/block.

Well, the IOVector part and bdrv_readv look OK, except for the heavy mallocing involved.

I'm not so sure about the DMA side and how everything fits together for zero-copy IO. For example, do we still need explicit translation at some point?
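A hedged sketch of the RX-vector delivery scheme Anthony describes in the quote above — field and function names are illustrative, not qemu's actual API: each client may post a receive vector, and the VLAN delivery path consults it before falling back to the flat fd_read() copy:

	/* Illustrative only. */
	#include <stdint.h>
	#include <string.h>
	#include <sys/uio.h>

	typedef struct VLANClientState {
		struct iovec *rx_vec;	/* posted receive buffers, NULL if none */
		int rx_vec_len;
		void (*fd_read)(void *opaque, const uint8_t *buf, int size);
		void *opaque;
	} VLANClientState;

	static void vlan_deliver_packet(VLANClientState *vc,
					const uint8_t *buf, size_t size)
	{
		if (vc->rx_vec) {
			/* scatter the packet into the client's posted vector */
			for (int i = 0; i < vc->rx_vec_len && size > 0; i++) {
				size_t n = size < vc->rx_vec[i].iov_len
					 ? size : vc->rx_vec[i].iov_len;
				memcpy(vc->rx_vec[i].iov_base, buf, n);
				buf += n;
				size -= n;
			}
		} else {
			vc->fd_read(vc->opaque, buf, (int)size);
		}
	}

For tap, the further optimization mentioned above would do the initial readv() directly into one client's posted vector and copy to the other clients only when the VLAN has more than one member.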
From: Liu, E. E <eri...@in...> - 2008-04-20 05:38:35
Christian Ehrhardt wrote:
> Liu, Eric E wrote:
> > Hollis Blanchard wrote:
> > > On Wednesday 16 April 2008 01:45:34 Liu, Eric E wrote: [...]
> > > Actually... we could have kvmtrace itself insert the metadata, so
> > > there would be no chance of it being overwritten in the kernel
> > > buffers. The header could be written in tip_open_output(), and
> > > update fs_size accordingly.
> >
> > Yes, letting kvmtrace insert the metadata is more reasonable.
>
> I wanted to note that the kvmtrace tool should not need to know
> everything about the data format. I'm thinking of e.g. changing kernel
> implementations that change endianness, or even flags we don't yet know
> about but might need in the future.
>
> What about adding another debugfs entry the kernel can use to expose
> the "kvmtrace-metadata" defined by the kernel implementation?
> The kvmtrace tool could then use that to build up the record, using
> one entry for kernel-defined metadata and another to add any metadata
> defined by the kvmtrace tool itself.
>
> What about this one:
>
> struct metadata {
> 	u32 kmagic; /* stores kernel-defined metadata read from the debugfs entry */
> 	u32 umagic; /* stores userspace-tool-defined metadata */
> 	u32 extra;  /* redundant; only used to fit into a record. */
> }
>
> That should give us the flexibility to keep the format if we get more
> metadata requirements in the future.

Yes, maybe we will need metadata to indicate changing kernel implementations in the future, but adding a debugfs entry does not seem like a good approach. What about defining similar metadata in the kernel rather than in userland, and writing it into the rchan the first time we add trace data? Then we don't need the kvmtrace tool to insert the metadata again. Like this:

struct kvm_trace_metadata {
	u32 kmagic; /* stores kernel-defined metadata */
	u64 extra;  /* used to fit into a record. */
}
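A minimal sketch of the kernel-side idea Eric proposes — emitting the metadata record into the relay channel once, before any trace records, so userspace never has to synthesize it. The function and magic-constant names here are hypothetical, not the merged kvmtrace code:

	/* Hypothetical sketch. */
	#include <linux/relay.h>
	#include <linux/types.h>

	struct kvm_trace_metadata {
		u32 kmagic;	/* kernel-defined format magic */
		u64 extra;	/* padding so the header fills one record slot */
	};

	#define KVM_TRC_KMAGIC	0x12345678	/* hypothetical value */

	static void kvm_trace_write_metadata(struct rchan *chan)
	{
		struct kvm_trace_metadata meta = {
			.kmagic = KVM_TRC_KMAGIC,
			.extra	= 0,
		};

		/* relay_write() copies the record into the current sub-buffer,
		 * so readers see it before the first real trace record. */
		relay_write(chan, &meta, sizeof(meta));
	}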
From: Alex D. <ale...@ya...> - 2008-04-20 00:08:31
--- On Sat, 4/19/08, Marcelo Tosatti <mto...@re...> wrote:

> From: Marcelo Tosatti <mto...@re...>
> Subject: Re: [kvm-devel] Second KVM process hangs eating 80-100% CPU on host during startup
> To: "Alex Davis" <ale...@ya...>
> Cc: av...@qu..., kvm...@li...
> Date: Saturday, April 19, 2008, 7:11 PM
>
> On Sat, Apr 19, 2008 at 03:47:31PM -0700, Alex Davis wrote:
> > --- On Fri, 4/18/08, Avi Kivity <av...@qu...> wrote:
> > > From: Avi Kivity <av...@qu...>
> > > Subject: Re: [kvm-devel] Second KVM process hangs eating 80-100% CPU on host during startup
> > [snip]
> >
> > I tried booting the guest with 'lpj=10682525' to work around the
> > calibrate_delay issue, but that gave me:
> >
> > [    0.004100] ENABLING IO-APIC IRQs
> > [    0.004100] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
> > [    0.004100] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> > [    0.004100] ...trying to set up timer (IRQ0) through the 8259A ... failed.
> > [    0.004100] ...trying to set up timer as Virtual Wire IRQ ... failed.
> > [    0.004100] ...trying to set up timer as ExtINT IRQ ... failed :(.
> > [    0.004100] Kernel panic - not syncing: IO-APIC + timer doesn't work!
> > Boot with apic=debug and send a report. Then try booting with the
> > 'noapic' option.
> >
> > Booting with 'apic=debug' gives these additional lines:
> > [    0.004100] Getting VERSION: 50014
> > [    0.004100] Getting VERSION: 50014
> > [    0.004100] Getting ID: 0
> > [    0.004100] Getting LVT0: 700
> > [    0.004100] Getting LVT1: 10000
>
> Hi Alex,
>
> Can you please try the following.
>
> KVM: PIT: make last_injected_time per-guest
>
> Otherwise multiple guests use the same variable and boom.
>
> Also use kvm_vcpu_kick() to make sure that if a timer triggers on
> a different CPU the event won't be missed.
>
> Signed-off-by: Marcelo Tosatti <mto...@re...>
>
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index 2852dd1..5697ad2 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -200,10 +200,8 @@ int __pit_timer_fn(struct kvm_kpit_state *ps)
>
>  	atomic_inc(&pt->pending);
>  	smp_mb__after_atomic_inc();
> -	if (vcpu0 && waitqueue_active(&vcpu0->wq)) {
> -		vcpu0->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> -		wake_up_interruptible(&vcpu0->wq);
> -	}
> +	if (vcpu0)
> +		kvm_vcpu_kick(vcpu0);
>
>  	pt->timer.expires = ktime_add_ns(pt->timer.expires, pt->period);
>  	pt->scheduled = ktime_to_ns(pt->timer.expires);
> @@ -572,7 +570,6 @@ void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
>  	struct kvm_pit *pit = vcpu->kvm->arch.vpit;
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_kpit_state *ps;
> -	static unsigned long last_injected_time;
>
>  	if (vcpu && pit) {
>  		ps = &pit->pit_state;
> @@ -582,11 +579,11 @@ void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
>  		 * 2. Last interrupt was accepted or waited for too long time*/
>  		if (atomic_read(&ps->pit_timer.pending) &&
>  		    (ps->inject_pending ||
> -		    (jiffies - last_injected_time
> +		    (jiffies - ps->last_injected_time
>  			>= KVM_MAX_PIT_INTR_INTERVAL))) {
>  			ps->inject_pending = 0;
>  			__inject_pit_timer_intr(kvm);
> -			last_injected_time = jiffies;
> +			ps->last_injected_time = jiffies;
>  		}
>  	}
>  }
> diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
> index e63ef38..db25c2a 100644
> --- a/arch/x86/kvm/i8254.h
> +++ b/arch/x86/kvm/i8254.h
> @@ -35,6 +35,7 @@ struct kvm_kpit_state {
>  	struct mutex lock;
>  	struct kvm_pit *pit;
>  	bool inject_pending; /* if inject pending interrupts */
> +	unsigned long last_injected_time;
>  };
>
>  struct kvm_pit {

Problem(s) solved. Everything is working now. I can now boot both with and without 'lpj='. The BogoMIPS are also being calculated correctly in secondary guests without 'lpj='. I'll play with it some more just to make sure, then I'll close the original bug.

Thanks, Marcelo et al.
From: Marcelo T. <mto...@re...> - 2008-04-19 23:08:06
On Sat, Apr 19, 2008 at 03:47:31PM -0700, Alex Davis wrote:
> --- On Fri, 4/18/08, Avi Kivity <av...@qu...> wrote:
> > From: Avi Kivity <av...@qu...>
> > Subject: Re: [kvm-devel] Second KVM process hangs eating 80-100% CPU on host during startup
> [snip]
>
> I tried booting the guest with 'lpj=10682525' to work around the
> calibrate_delay issue, but that gave me:
>
> [    0.004100] ENABLING IO-APIC IRQs
> [    0.004100] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
> [    0.004100] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> [    0.004100] ...trying to set up timer (IRQ0) through the 8259A ... failed.
> [    0.004100] ...trying to set up timer as Virtual Wire IRQ ... failed.
> [    0.004100] ...trying to set up timer as ExtINT IRQ ... failed :(.
> [    0.004100] Kernel panic - not syncing: IO-APIC + timer doesn't work!
> Boot with apic=debug and send a report. Then try booting with the
> 'noapic' option.
>
> Booting with 'apic=debug' gives these additional lines:
> [    0.004100] Getting VERSION: 50014
> [    0.004100] Getting VERSION: 50014
> [    0.004100] Getting ID: 0
> [    0.004100] Getting LVT0: 700
> [    0.004100] Getting LVT1: 10000

Hi Alex,

Can you please try the following.

KVM: PIT: make last_injected_time per-guest

Otherwise multiple guests use the same variable and boom.

Also use kvm_vcpu_kick() to make sure that if a timer triggers on
a different CPU the event won't be missed.

Signed-off-by: Marcelo Tosatti <mto...@re...>

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 2852dd1..5697ad2 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -200,10 +200,8 @@ int __pit_timer_fn(struct kvm_kpit_state *ps)

 	atomic_inc(&pt->pending);
 	smp_mb__after_atomic_inc();
-	if (vcpu0 && waitqueue_active(&vcpu0->wq)) {
-		vcpu0->arch.mp_state = KVM_MP_STATE_RUNNABLE;
-		wake_up_interruptible(&vcpu0->wq);
-	}
+	if (vcpu0)
+		kvm_vcpu_kick(vcpu0);

 	pt->timer.expires = ktime_add_ns(pt->timer.expires, pt->period);
 	pt->scheduled = ktime_to_ns(pt->timer.expires);
@@ -572,7 +570,6 @@ void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
 	struct kvm_pit *pit = vcpu->kvm->arch.vpit;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_kpit_state *ps;
-	static unsigned long last_injected_time;

 	if (vcpu && pit) {
 		ps = &pit->pit_state;
@@ -582,11 +579,11 @@ void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
 		 * 2. Last interrupt was accepted or waited for too long time*/
 		if (atomic_read(&ps->pit_timer.pending) &&
 		    (ps->inject_pending ||
-		    (jiffies - last_injected_time
+		    (jiffies - ps->last_injected_time
 			>= KVM_MAX_PIT_INTR_INTERVAL))) {
 			ps->inject_pending = 0;
 			__inject_pit_timer_intr(kvm);
-			last_injected_time = jiffies;
+			ps->last_injected_time = jiffies;
 		}
 	}
 }
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index e63ef38..db25c2a 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -35,6 +35,7 @@ struct kvm_kpit_state {
 	struct mutex lock;
 	struct kvm_pit *pit;
 	bool inject_pending; /* if inject pending interrupts */
+	unsigned long last_injected_time;
 };

 struct kvm_pit {
From: Alex D. <ale...@ya...> - 2008-04-19 22:47:30
--- On Fri, 4/18/08, Avi Kivity <av...@qu...> wrote:

> From: Avi Kivity <av...@qu...>
> Subject: Re: [kvm-devel] Second KVM process hangs eating 80-100% CPU on host during startup

[snip]

I tried booting the guest with 'lpj=10682525' to work around the calibrate_delay issue, but that gave me:

[    0.004100] ENABLING IO-APIC IRQs
[    0.004100] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
[    0.004100] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[    0.004100] ...trying to set up timer (IRQ0) through the 8259A ... failed.
[    0.004100] ...trying to set up timer as Virtual Wire IRQ ... failed.
[    0.004100] ...trying to set up timer as ExtINT IRQ ... failed :(.
[    0.004100] Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.

Booting with 'apic=debug' gives these additional lines:

[    0.004100] Getting VERSION: 50014
[    0.004100] Getting VERSION: 50014
[    0.004100] Getting ID: 0
[    0.004100] Getting LVT0: 700
[    0.004100] Getting LVT1: 10000