From: Jerone Y. <jy...@us...> - 2008-04-15 18:57:05
|
These patches apparently fell through the cracks, or I didn't send the revised version to the list. They fix cpu initialization for PowerPC; without them a guest cannot be launched.

Signed-off-by: Jerone Young <jy...@us...>

 qemu/hw/ppc440_bamboo.c |    3 ---
 qemu/qemu-kvm-powerpc.c |    6 ++++++
 2 files changed, 6 insertions(+), 3 deletions(-)
From: Hollis B. <ho...@us...> - 2008-04-15 18:56:12
|
On Tuesday 15 April 2008 13:29:19 Jerone Young wrote: > On Tue, 2008-04-15 at 13:19 -0500, Hollis Blanchard wrote: > > On Tuesday 15 April 2008 11:14:52 Jerone Young wrote: > > > Actually there appears to be a real problem with preempt notify in 44x. > > > I had no gotten a chance to get back with you about it. But I did some > > > investigation into it last week. We are following the same code paths > > > (common code) as x86 for preempt initalization. But I ran some tests > > > using preempt_disable() & preempt_enable() around some places where it > > > would make since (places where we disable interrupts), but just using > > > these functions whould cause the kernel to dump sig #11. > > > > preempt_disable() and preempt_enable() are completely different, and I've > > called those in the past and had no problems. > > But it's not used anywhere in our code. And places where it can go .. > blows up big time. We don't explicitly use preempt_disable() now, because we use local_irq_disable() instead. However, as I said before, when we *did* use preempt_disable() explicitly, it didn't even blow up a little bit. preempt_disable() IS NOT THE SAME as preempt_notify_unregister(). Please read that again, because I think you're skimming over the function names. These are very different things. > > > The issue we have using function kvm_vcpu_block() that it is identical > > > to the code below, BUT it calls vcpu_put which then calls > > > preempt_notify_unregister() if it is called it will also sig #11. > > > > If preempt_notify_unregister() dereferences a bad pointer, that shouldn't > > be too difficult to track down. > > > > > I'm not sure if what is going on honestly. Based on what I found it > > > should "just work" as we are initializing everything like x86 (we are > > > calling preempt_notify_init() in the same place). But for 44x any > > > preempt notfication calls blow up. So it appears calling anything > > > preempt notify related just blows up. > > > > > > This is a much bigger issue. I'm not sure that we honest want to be > > > stuck on this for long periods of time just to have a function call in > > > a place where we honestly do not absolutely need to have it at this > > > time. Plus I'm no expert with these scheduling frameworks. But givin > > > what have read around the net what we have now "should" work. It just > > > doesn't. > > > > > > Something can come back to later. But for now we should just roll with > > > the working code. > > > > No, I don't want to commit the hack. Sorry if I didn't make that clear > > before. > > But this isn't a hack. You just want to use the common function. The > issue is there are a lot place we do not use the common function. We use common code wherever it makes sense, and it makes a lot of sense here. If you can point out other areas we could or should share code, I would be interested in taking a look. > Since > we really do need to get functional code up and going this is something > that can be postponed to be fixed. I will (and have been) take a look at > this to change this. But if we put a ton of emphasis on something that > isn't needed to get us going, then we will never move forward to solve > the real issues left to get this stuff usable. > > > Accessing a bad pointer doesn't seem like a "much bigger issue" to me, so > > if it's more complicated than that please elaborate. > > The problem is the pointer is assigned by the frame work. I'm not > passing in any pointer to preempt. So something more is going on. 
Obviously you're not passing a pointer in, yet sig11 means the code is dereferencing a bad pointer. The backtrace probably indicates exactly what pointer is causing the problem, and it's 100% reproducible, so it doesn't sound like anything too subtle. > I will > have to look into this more after I finish the next task. I don't > honestly know all that much about the preempt frame work. But I have > spent time trying to figure it out. Great, I will look forward to the corrected patch. -- Hollis Blanchard IBM Linux Technology Center |
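For readers following the exchange above, here is a minimal sketch contrasting the two mechanisms being confused. The preempt_notifier_* names are the mainline spellings (the thread writes preempt_notify_*), the notifiers are available when the kernel is built with CONFIG_PREEMPT_NOTIFIERS (which KVM selects), and the example_* wrappers are illustrative, not code from these patches.

/* 1) preempt_disable()/preempt_enable(): keep the current task on this CPU
 *    for a short critical section.  No registration, no callbacks. */
#include <linux/preempt.h>
#include <linux/kvm_host.h>

static void example_critical_section(void)
{
        preempt_disable();
        /* ... code that must not be preempted or migrated ... */
        preempt_enable();
}

/* 2) Preempt notifiers: per-task callbacks fired when the task is scheduled
 *    out and back in.  KVM registers one per vcpu around guest entry and
 *    unregisters it when the vcpu is put.  The notifier must have been set
 *    up with preempt_notifier_init() before the first register; running
 *    register/unregister on a notifier whose ops/link pointers were never
 *    initialized is one plausible source of the sig11 described here. */
static void example_vcpu_load(struct kvm_vcpu *vcpu)
{
        preempt_notifier_register(&vcpu->preempt_notifier);
        /* ... arch-specific register/state load ... */
}

static void example_vcpu_put(struct kvm_vcpu *vcpu)
{
        /* ... arch-specific state save ... */
        preempt_notifier_unregister(&vcpu->preempt_notifier);
}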
From: Jerone Y. <jy...@us...> - 2008-04-15 18:54:31
|
 qemu/hw/ppc440_bamboo.c |    3 ---
 1 file changed, 3 deletions(-)

This patch removes the call to kvm_load_registers while in board platform setup code. This must now be done later in vcpu initialization.

Signed-off-by: Jerone Young <jy...@us...>

diff --git a/qemu/hw/ppc440_bamboo.c b/qemu/hw/ppc440_bamboo.c
--- a/qemu/hw/ppc440_bamboo.c
+++ b/qemu/hw/ppc440_bamboo.c
@@ -174,9 +174,6 @@ void bamboo_init(ram_addr_t ram_size, in
         env->gpr[3] = dt_base;
 #endif
         env->nip = ep;
-
-        printf("%s: loading kvm registers\n", __func__);
-        kvm_load_registers(env);
     }
 
     for (i = 0; i < nb_nics; i++) {
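The companion change to qemu/qemu-kvm-powerpc.c (the 6 insertions mentioned in the cover note) is not shown in this archive. As a rough sketch of the idea, assuming a per-vcpu setup hook exists in that file (the hook name below is hypothetical), the registers the board code filled in are pushed into the kernel once the vcpu has been created:

/* Illustrative only; not the actual patch. */
static void example_powerpc_vcpu_setup(CPUState *env)
{
    /* board setup has already populated env->gpr[3], env->nip, etc. */
    kvm_load_registers(env);    /* existing qemu-kvm helper */
}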
From: Hollis B. <ho...@us...> - 2008-04-15 18:21:41
|
On Tuesday 15 April 2008 11:20:58 Jerone Young wrote: > > What happened to my suggestion of creating a per-arch HACK_FILES and > > UNIFDEF_FILES variables, and looping over those? > > These macros are only for x86. We don't want them or need them. So I > just left them be as not to accidentally miss or break anything. Right, they are only used for x86. So as I said before, create arch-specific HACK_FILES and UNIFDEF_FILES variables, and use those instead. -- Hollis Blanchard IBM Linux Technology Center |
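A minimal sketch of what Hollis is suggesting: per-arch file lists feeding the existing unifdef/hack macros. The x86 lists below are abbreviated to the calls visible in this thread, other arches would add their own (possibly empty) lists, and the loop assumes each unifdef/hack call expands to a single shell command.

# Per-arch lists instead of an x86-only block:
ifneq '$(filter $(ARCH), x86_64 i386)' ''
  UNIFDEF_FILES = include/linux/kvm.h include/linux/kvm_para.h \
                  include/asm-x86/kvm.h
  HACK_FILES    = svm.c x86.c irq.h
endif

# Inside the sync rule (recipe lines are tab-indented in the real Makefile):
sync:
	$(foreach f,$(UNIFDEF_FILES),$(call unifdef,$(f));)
	$(foreach f,$(HACK_FILES),$(call hack,$(f));)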
From: Hollis B. <ho...@us...> - 2008-04-15 18:19:45
|
On Tuesday 15 April 2008 11:14:52 Jerone Young wrote: > Actually there appears to be a real problem with preempt notify in 44x. > I had no gotten a chance to get back with you about it. But I did some > investigation into it last week. We are following the same code paths > (common code) as x86 for preempt initalization. But I ran some tests > using preempt_disable() & preempt_enable() around some places where it > would make since (places where we disable interrupts), but just using > these functions whould cause the kernel to dump sig #11. preempt_disable() and preempt_enable() are completely different, and I've called those in the past and had no problems. > The issue we have using function kvm_vcpu_block() that it is identical > to the code below, BUT it calls vcpu_put which then calls > preempt_notify_unregister() if it is called it will also sig #11. If preempt_notify_unregister() dereferences a bad pointer, that shouldn't be too difficult to track down. > I'm not sure if what is going on honestly. Based on what I found it > should "just work" as we are initializing everything like x86 (we are > calling preempt_notify_init() in the same place). But for 44x any > preempt notfication calls blow up. So it appears calling anything > preempt notify related just blows up. > > This is a much bigger issue. I'm not sure that we honest want to be > stuck on this for long periods of time just to have a function call in a > place where we honestly do not absolutely need to have it at this time. > Plus I'm no expert with these scheduling frameworks. But givin what have > read around the net what we have now "should" work. It just doesn't. > > Something can come back to later. But for now we should just roll with > the working code. No, I don't want to commit the hack. Sorry if I didn't make that clear before. Accessing a bad pointer doesn't seem like a "much bigger issue" to me, so if it's more complicated than that please elaborate. -- Hollis Blanchard IBM Linux Technology Center |
From: Jerone Y. <jy...@us...> - 2008-04-15 16:21:07
|
On Tue, 2008-04-15 at 09:08 -0500, Hollis Blanchard wrote: > On Monday 14 April 2008 21:46:43 Jerone Young wrote: > > 1 file changed, 13 insertions(+), 5 deletions(-) > > kernel/Makefile | 18 +++++++++++++----- > > > > > > This patch add the ability for make sync in the kernel directory to work > > for mulitiple architectures and not just x86. > > > > Signed-off-by: Jerone Young <jy...@us...> > > > > diff --git a/kernel/Makefile b/kernel/Makefile > > --- a/kernel/Makefile > > +++ b/kernel/Makefile > > @@ -1,5 +1,10 @@ include ../config.mak > > include ../config.mak > > > > +ASM_DIR=$(ARCH) > > +ifneq '$(filter $(ASM_DIR), x86_64 i386 ia64)' '' > > + ASM_DIR=x86 > > +endif > > Minor complaint: "ASM_DIR" really isn't. You use it as arch/$(ASM_DIR) and > also as include/asm-$(ASM_DIR). I think what you really meant is "ARCH_DIR" > (or similar). I can change it. Not that big of a deal. Oh left the ia64 on there by accident. > > > +ifneq '$(filter $(ASM_DIR), x86_64 i386 ia64)' '' > > $(call unifdef, include/linux/kvm.h) > > $(call unifdef, include/linux/kvm_para.h) > > $(call unifdef, include/asm-x86/kvm.h) > > @@ -54,6 +60,8 @@ sync: > > $(call hack, svm.c) > > $(call hack, x86.c) > > $(call hack, irq.h) > > +endif > > + > > Why are you keeping IA64 touching asm-x86? Accident. Cut and past error from the first mistake. > > What happened to my suggestion of creating a per-arch HACK_FILES and > UNIFDEF_FILES variables, and looping over those? These macros are only for x86. We don't want them or need them. So I just left them be as not to accidentally miss or break anything. > |
From: Jerone Y. <jy...@us...> - 2008-04-15 16:15:52
|
Actually there appears to be a real problem with preempt notify in 44x. I had no gotten a chance to get back with you about it. But I did some investigation into it last week. We are following the same code paths (common code) as x86 for preempt initalization. But I ran some tests using preempt_disable() & preempt_enable() around some places where it would make since (places where we disable interrupts), but just using these functions whould cause the kernel to dump sig #11. The issue we have using function kvm_vcpu_block() that it is identical to the code below, BUT it calls vcpu_put which then calls preempt_notify_unregister() if it is called it will also sig #11. I'm not sure if what is going on honestly. Based on what I found it should "just work" as we are initializing everything like x86 (we are calling preempt_notify_init() in the same place). But for 44x any preempt notfication calls blow up. So it appears calling anything preempt notify related just blows up. This is a much bigger issue. I'm not sure that we honest want to be stuck on this for long periods of time just to have a function call in a place where we honestly do not absolutely need to have it at this time. Plus I'm no expert with these scheduling frameworks. But givin what have read around the net what we have now "should" work. It just doesn't. Something can come back to later. But for now we should just roll with the working code. On Tue, 2008-04-15 at 10:28 -0500, Hollis Blanchard wrote: > On Thursday 03 April 2008 14:35:59 Jerone Young wrote: > > # HG changeset patch > > # User Jerone Young <jy...@us...> > > # Date 1207250241 18000 > > # Node ID 3e781009d2f28c4691ccbb999bf679716f66f349 > > # Parent b7794c1fa50b531c9b84382c3f3d9a5466d86c0d > > Add PowerPC KVM guest wait handling support > > > > This patch handles a guest that is in a wait state. This ensures that the > > guest is not allways eating up 100% cpu when it is idle. > > > > Signed-off-by: Jerone Young <jy...@us...> > > > > diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c > > --- a/arch/powerpc/kvm/emulate.c > > +++ b/arch/powerpc/kvm/emulate.c > > @@ -269,6 +269,25 @@ int kvmppc_emulate_instruction(struct kv > > case 146: /* mtmsr */ > > rs = get_rs(inst); > > kvmppc_set_msr(vcpu, vcpu->arch.gpr[rs]); > > + > > + /* handle guest vcpu that is in wait state */ > > + if (vcpu->arch.msr & MSR_WE) { > > + /* XXX eventually replace with kvm_vcpu_block() */ > > + DECLARE_WAITQUEUE(wait, current); > > + > > + add_wait_queue(&vcpu->wq, &wait); > > + > > + while (!kvm_cpu_has_interrupt(vcpu) > > + && !signal_pending(current) > > + && !kvm_arch_vcpu_runnable(vcpu)) { > > + set_current_state(TASK_INTERRUPTIBLE); > > + schedule(); > > + } > > + > > + __set_current_state(TASK_RUNNING); > > + remove_wait_queue(&vcpu->wq, &wait); > > + } > > + > > break; > > > > case 163: /* wrteei */ > > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > > --- a/arch/powerpc/kvm/powerpc.c > > +++ b/arch/powerpc/kvm/powerpc.c > > @@ -164,13 +164,12 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *v > > > > int kvm_cpu_has_interrupt(struct kvm_vcpu *v) > > { > > - /* XXX implement me */ > > - return 0; > > + return !!(v->arch.pending_exceptions); > > } > > > > int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) > > { > > - return 1; > > + return !(v->arch.msr & MSR_WE); > > } > > > > /* Check if we are ready to deliver the interrupt */ > > Hmm, did you ever revise this patch? I don't see the corrected version. > |
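For reference, the variant Hollis is asking for replaces the open-coded waitqueue loop with the generic helper. A minimal sketch follows; the wrapper name is illustrative, and MSR_WE and kvmppc_set_msr() come from the powerpc KVM headers as in the quoted patch. kvm_vcpu_block() contains essentially the same wait loop, but additionally calls vcpu_put()/vcpu_load() around schedule(), which is exactly where the preempt-notifier breakage discussed in this thread shows up.

#include <linux/kvm_host.h>

static void example_handle_mtmsr(struct kvm_vcpu *vcpu, u32 new_msr)
{
        kvmppc_set_msr(vcpu, new_msr);

        /* Guest set MSR[WE]: sleep until an interrupt is pending or a
         * signal arrives, just like the open-coded loop above. */
        if (vcpu->arch.msr & MSR_WE)
                kvm_vcpu_block(vcpu);
}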
From: Hollis B. <ho...@us...> - 2008-04-15 15:30:00
|
On Thursday 03 April 2008 14:35:59 Jerone Young wrote: > # HG changeset patch > # User Jerone Young <jy...@us...> > # Date 1207250241 18000 > # Node ID 3e781009d2f28c4691ccbb999bf679716f66f349 > # Parent b7794c1fa50b531c9b84382c3f3d9a5466d86c0d > Add PowerPC KVM guest wait handling support > > This patch handles a guest that is in a wait state. This ensures that the > guest is not allways eating up 100% cpu when it is idle. > > Signed-off-by: Jerone Young <jy...@us...> > > diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c > --- a/arch/powerpc/kvm/emulate.c > +++ b/arch/powerpc/kvm/emulate.c > @@ -269,6 +269,25 @@ int kvmppc_emulate_instruction(struct kv > case 146: /* mtmsr */ > rs = get_rs(inst); > kvmppc_set_msr(vcpu, vcpu->arch.gpr[rs]); > + > + /* handle guest vcpu that is in wait state */ > + if (vcpu->arch.msr & MSR_WE) { > + /* XXX eventually replace with kvm_vcpu_block() */ > + DECLARE_WAITQUEUE(wait, current); > + > + add_wait_queue(&vcpu->wq, &wait); > + > + while (!kvm_cpu_has_interrupt(vcpu) > + && !signal_pending(current) > + && !kvm_arch_vcpu_runnable(vcpu)) { > + set_current_state(TASK_INTERRUPTIBLE); > + schedule(); > + } > + > + __set_current_state(TASK_RUNNING); > + remove_wait_queue(&vcpu->wq, &wait); > + } > + > break; > > case 163: /* wrteei */ > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > --- a/arch/powerpc/kvm/powerpc.c > +++ b/arch/powerpc/kvm/powerpc.c > @@ -164,13 +164,12 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *v > > int kvm_cpu_has_interrupt(struct kvm_vcpu *v) > { > - /* XXX implement me */ > - return 0; > + return !!(v->arch.pending_exceptions); > } > > int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) > { > - return 1; > + return !(v->arch.msr & MSR_WE); > } > > /* Check if we are ready to deliver the interrupt */ Hmm, did you ever revise this patch? I don't see the corrected version. -- Hollis Blanchard IBM Linux Technology Center |
From: Hollis B. <ho...@us...> - 2008-04-15 14:41:55
|
Could you please use a blank line when you intend to start a new paragraph in your email? The lack of whitespace here hurts readability. On Tuesday 15 April 2008 02:34:47 Christian Ehrhardt wrote: > Hollis Blanchard wrote: > > On Monday 14 April 2008 07:23:56 ehr...@li... wrote: > >> From: Christian Ehrhardt <ehr...@li...> > >> > >> This extends the kvm_stat reports for kvm on embedded powerpc. Since > >> kvmppc is using emulation (no hardware support yet) this gives people > >> interested in performance a detailed split of the emul_instruction > >> counter already available. This statistic does not cover e.g. operants > >> of the commands, but that way it should have only a small perf impact > >> (never break what you want to measure). This feature is configurable in > >> .config under the kvmppc virtualization itself. > > > > This array-based approach seems to add a lot of lines of code, and it's > > also copy/paste stuff that is just begging for a typo (e.g. miscounting > > STHX in the STHUX bucket) or being forgotten entirely when adding new > > emulation code. > > > > A more general approach would be to record just the opcode/extended > > opcode in a variable-sized structure, allocating a new bucket as we > > encounter a previously unseen instruction. I'm thinking of something like > > this: > > I already thought about some bucket like counting, but I focused small > runtime overhead above everything else here by hijacking the already > existent switch/case and avoiding any new if's & function calls. I agree > that the array like solution is messy in the aspect of lines and is more > error prone by forgetting something. > > > log_instruction(inst) { > > bucket = hash_u32(inst, 5); > > list_for_each(l, vcpu->instlog[bucket]) { > > struct instlog = list_entry(l); > > if (instlog->inst == inst) { > > instlog->count++; > > break; > > } > > } > > } > > > > emulate(inst) { > > log = get_op(inst); > > switch (get_op(inst)) { > > ... > > case 31: > > log |= get_xop(inst); > > switch (inst) { > > ... > > } > > } > > log_instruction(log); > > } > > I agree that this looks better, but is has per instruction: > 1x hash function This is a shift. > a loop wich runs ~1-2 times A couple additions and memory accesses. > an if (do we actually have long pipelines that suffer from wrong branch > prediction?) I believe we have many "if" statements. :) > When we add more functions here like mapping sprn's and > whatever else comes in mind later this might get even more complex, while > this statistics should be the low overhead stats. I don't know what "mapping SPRNs" means. In general though, I'm not so worried about adding some memory references or branches in this path. First, there are much bigger problems to worry about (need I remind you about our TLB faults?). Second, it's the trace path: if we care that much, we would just disable tracing. I think the benefits of the more general approach are well worth it. -- Hollis Blanchard IBM Linux Technology Center |
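A slightly more concrete sketch of the hash-bucket logging proposed above, filled out with kernel hash/list primitives. Structure, field and function names are illustrative, and the four-argument hlist_for_each_entry() is the form kernels of that era use.

#include <linux/hash.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/types.h>

#define INSTLOG_HASH_BITS 5             /* 32 buckets, as suggested above */

struct instlog {
        struct hlist_node link;
        u32 inst;                       /* primary opcode | extended opcode */
        u64 count;
};

/* The vcpu (or a per-vcpu stats struct) would carry:
 *     struct hlist_head instlog[1 << INSTLOG_HASH_BITS];
 */
static void log_instruction(struct hlist_head *table, u32 inst)
{
        struct hlist_head *bucket = &table[hash_long(inst, INSTLOG_HASH_BITS)];
        struct instlog *e;
        struct hlist_node *n;

        hlist_for_each_entry(e, n, bucket, link) {
                if (e->inst == inst) {
                        e->count++;
                        return;
                }
        }

        /* First time this opcode is seen: allocate a new entry.  GFP_ATOMIC
         * because the emulation path may run with interrupts disabled. */
        e = kzalloc(sizeof(*e), GFP_ATOMIC);
        if (!e)
                return;
        e->inst = inst;
        e->count = 1;
        hlist_add_head(&e->link, bucket);
}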
From: Hollis B. <ho...@us...> - 2008-04-15 14:21:00
|
On Tuesday 15 April 2008 01:20:48 Christian Ehrhardt wrote: > > I'm not sure how I feel about logging *every* faulting instruction, > > rather than aggregating statistics about them. Actually, I know how I > > feel, and it's uncomfortable. :) > > > > It's a userspace process that must be extracting this data from the relay > > buffers, and that only happens when it's scheduled. How many instruction > > exits could we get in a few timeslices? I think we could easily overflow > > the relay buffers, especially if the host is under load. > > The patch series intentionally has two instruction statistics. > One for low overhead giving you just some counters, and additionally we can > extend these counters as we want e.g. with sprn numbers. The other is to > enable knowingly a high overhead profiling, but it gives you a full trace > which can be used e.g. to completely see "what exactly is the guest doing > there" by looking at which instructions are emulated in which order. So > both have there right to exist, because there are use scenarios for both of > them. By making them configurable it is developers choice what to use. I guess it can be useful to see what's happening when a guest is getting "stuck". However, there are other interesting events to account for in this case as well, such as the TLB tracing I posted earlier. I think it makes sense to include all of those events in the same relay channel, rather than having one channel for instructions, one for TLB, etc. > >> +struct instr_record { > >> + u32 time; > >> + u32 pc; > >> + u32 instr; > >> + u32 rsval; > >> + u32 raval; > >> + u32 rbval; > >> +}; > > > > In addition to the instruction itself, I can see how the PC would be > > useful, and maybe also the time when we have multiple cores. (Well, or > > maybe it would be better to have one channel per core anyways.) But I > > think RS/RA/RB are way overkill here. What would a userspace tool do with > > this data anyways? > > The decoding scripts shows you what values are used e.g. for a mtmsr you > see what value was moved to the register. But I agree that the value of > r*val is much smaller than time,pc,instr. Removing these three should speed > it up to allow us tracing all instructions. I will change the patch and > resend it. Does that really speed it up? I wouldn't expect a measurable difference. -- Hollis Blanchard IBM Linux Technology Center |
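A sketch of the reduced record and its producer side being discussed here; the function name, the vcpu->arch.pc field and the timebase helper are assumptions, not taken from the posted patch. At 12 bytes per record, one 4096-byte relay sub-buffer holds roughly 340 entries, matching the figure quoted elsewhere in the thread.

#include <linux/kvm_host.h>
#include <linux/relay.h>
#include <asm/reg.h>                    /* mftbl() on powerpc */

struct instr_record {
        u32 time;
        u32 pc;
        u32 instr;
};

static void trace_emulated_instr(struct rchan *chan, struct kvm_vcpu *vcpu,
                                 u32 inst)
{
        struct instr_record rec = {
                .time  = mftbl(),       /* lower timebase (assumed helper) */
                .pc    = vcpu->arch.pc, /* assuming the arch struct exposes the guest PC here */
                .instr = inst,
        };

        relay_write(chan, &rec, sizeof(rec));
}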
From: Hollis B. <ho...@us...> - 2008-04-15 14:10:47
|
On Monday 14 April 2008 21:46:43 Jerone Young wrote: > 1 file changed, 13 insertions(+), 5 deletions(-) > kernel/Makefile | 18 +++++++++++++----- > > > This patch add the ability for make sync in the kernel directory to work > for mulitiple architectures and not just x86. > > Signed-off-by: Jerone Young <jy...@us...> > > diff --git a/kernel/Makefile b/kernel/Makefile > --- a/kernel/Makefile > +++ b/kernel/Makefile > @@ -1,5 +1,10 @@ include ../config.mak > include ../config.mak > > +ASM_DIR=$(ARCH) > +ifneq '$(filter $(ASM_DIR), x86_64 i386 ia64)' '' > + ASM_DIR=x86 > +endif Minor complaint: "ASM_DIR" really isn't. You use it as arch/$(ASM_DIR) and also as include/asm-$(ASM_DIR). I think what you really meant is "ARCH_DIR" (or similar). > +ifneq '$(filter $(ASM_DIR), x86_64 i386 ia64)' '' > $(call unifdef, include/linux/kvm.h) > $(call unifdef, include/linux/kvm_para.h) > $(call unifdef, include/asm-x86/kvm.h) > @@ -54,6 +60,8 @@ sync: > $(call hack, svm.c) > $(call hack, x86.c) > $(call hack, irq.h) > +endif > + Why are you keeping IA64 touching asm-x86? What happened to my suggestion of creating a per-arch HACK_FILES and UNIFDEF_FILES variables, and looping over those? -- Hollis Blanchard IBM Linux Technology Center |
From: Avi K. <av...@qu...> - 2008-04-15 08:38:28
|
Jerone Young wrote: > 2 files changed, 2 insertions(+), 8 deletions(-) > qemu/Makefile.target | 3 +-- > qemu/configure | 7 +------ > > > Now that kvm headers are synced locally, qemu does not need a specific option to find the kernel headers as they can now be specified in the --extra-cflags option. > > I find it quite useful to compile qemu without running 'make sync' too often (using --with-patched-kernel). These days the headers don't change so often, so maybe this is less of an issue. -- error compiling committee.c: too many arguments to function |
From: Avi K. <av...@qu...> - 2008-04-15 08:35:40
|
Jerone Young wrote: > 2 files changed, 25 insertions(+), 4 deletions(-) > Makefile | 21 ++++++++++++++++++++- > configure | 8 +++++--- > > > This patch adds ability for kvm-userspace build system to sync needed kernel headers locally without the need of compiled kernel source. > > Signed-off-by: Jerone Young <jy...@us...> > > diff --git a/Makefile b/Makefile > --- a/Makefile > +++ b/Makefile > @@ -7,7 +7,7 @@ rpmrelease = devel > > .PHONY: kernel user libkvm qemu bios vgabios extboot clean libfdt > > -all: libkvm qemu > +all: sync libkvm qemu > 'all' shouldn't include 'sync', since it's run by end users that have just the tarball, not an entire kernel. > ifneq '$(filter $(ARCH), x86_64 i386 ia64)' '' > all: $(if $(WANT_MODULE), kernel) user > endif > @@ -69,6 +69,24 @@ install: > make -C libkvm DESTDIR="$(DESTDIR)" install > make -C qemu DESTDIR="$(DESTDIR)" install > > + > +ASM_DIR=$(ARCH) > +ifneq '$(filter $(ARCH), i386 x86_64)' '' > + ASM_DIR=x86 > +endif > + > +sync: > + mkdir -p $(INCLUDES_DIR) > + mkdir -p $(INCLUDES_DIR)/asm-$(ASM_DIR) > + mkdir -p $(INCLUDES_DIR)/linux > + cp -f $(KERNELDIR)/include/asm-$(ASM_DIR)/kvm*.h \ > + $(INCLUDES_DIR)/asm-$(ASM_DIR)/ > + cp -f $(KERNELDIR)/include/linux/kvm*.h \ > + $(KERNELDIR)/include/linux/compiler*.h \ > + $(INCLUDES_DIR)/linux > + ln -sf $(INCLUDES_DIR)/asm-$(ASM_DIR) $(INCLUDES_DIR)/asm > + > + > Please use the existing infrastructure in kernel/Makefile. -- error compiling committee.c: too many arguments to function |
From: Christian E. <ehr...@li...> - 2008-04-15 07:34:08
|
Hollis Blanchard wrote: > On Monday 14 April 2008 07:23:56 ehr...@li... wrote: >> From: Christian Ehrhardt <ehr...@li...> >> >> This extends the kvm_stat reports for kvm on embedded powerpc. Since kvmppc >> is using emulation (no hardware support yet) this gives people interested >> in performance a detailed split of the emul_instruction counter already >> available. This statistic does not cover e.g. operants of the commands, but >> that way it should have only a small perf impact (never break what you want >> to measure). This feature is configurable in .config under the kvmppc >> virtualization itself. > > This array-based approach seems to add a lot of lines of code, and it's also > copy/paste stuff that is just begging for a typo (e.g. miscounting STHX in > the STHUX bucket) or being forgotten entirely when adding new emulation code. > > A more general approach would be to record just the opcode/extended opcode in > a variable-sized structure, allocating a new bucket as we encounter a > previously unseen instruction. I'm thinking of something like this: I already thought about some bucket like counting, but I focused small runtime overhead above everything else here by hijacking the already existent switch/case and avoiding any new if's & function calls. I agree that the array like solution is messy in the aspect of lines and is more error prone by forgetting something. > log_instruction(inst) { > bucket = hash_u32(inst, 5); > list_for_each(l, vcpu->instlog[bucket]) { > struct instlog = list_entry(l); > if (instlog->inst == inst) { > instlog->count++; > break; > } > } > } > > emulate(inst) { > log = get_op(inst); > switch (get_op(inst)) { > ... > case 31: > log |= get_xop(inst); > switch (inst) { > ... > } > } > log_instruction(log); > } I agree that this looks better, but is has per instruction: 1x hash function a loop wich runs ~1-2 times an if (do we actually have long pipelines that suffer from wrong branch prediction?) When we add more functions here like mapping sprn's and whatever else comes in mind later this might get even more complex, while this statistics should be the low overhead stats. The thing I want to point out is that the tracing with the relay channel also only have (speaking of the reduced version discussed in the other thread with only 3 integers) 3 integers to move into a buffer and some overhead to select the right buffer. The userspace application reads 4096b => 341 records per scheduling of the reading app and we might even increase that size. I really like the "full trace" feeling of the relay based patch, so I don't like to increase the functionality&overhead of this patch which was intended to be low-overhead. As stated above I agree with you about lines of code and error proneness of that patch. What about that: - for integration into our upstream code only the relay based approach is take into account, giving a full trace while it has some overhead - we keep the slight overhead, but error prone kvm_stat based patch in our private queues and only use it for our own measurements until we either need or change it once we have more experience with the relay based tracing (e.g. know the perf. overhead of that better) - That way we would have two very similar solutions that share code and are configurable so developers can use them as they want - additionally I think again about Arnd's comments and if we might need/want a runtime switch e.g. 
proc/sys interface to enable/disable that tracing functions on the fly Comments welcome > It looks like we could build a hash table pretty easily with hash_long() and > list_entry stuff (see fs/mbcache.c for example). So far you've found 17 > different instructions types emulated in the "boot" workload, so 32 entries > would probably be a reasonable hash size. The same approach could be used for > SPR accesses, where you've hit 31 different registers. Really I think those > numbers won't vary much by workload, but rather by guest kernel... > -- Grüsse / regards, Christian Ehrhardt IBM Linux Technology Center, Open Virtualization |
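On the runtime switch Christian mentions at the end, a minimal sketch using a writable module parameter as the simplest knob; a debugfs or sysctl entry would work the same way. The parameter name and the channel variable are illustrative, and the trace helper is the one sketched earlier in this page.

#include <linux/kvm_host.h>
#include <linux/module.h>
#include <linux/relay.h>

static int trace_instructions;          /* 0 = off, 1 = on */
module_param(trace_instructions, int, 0644);
MODULE_PARM_DESC(trace_instructions, "Trace emulated guest instructions");

extern struct rchan *instr_chan;        /* relay channel opened by the tracing patch (name illustrative) */
void trace_emulated_instr(struct rchan *chan, struct kvm_vcpu *vcpu, u32 inst);

/* The emulation path pays only a test and a branch when tracing is off: */
static inline void maybe_trace(struct kvm_vcpu *vcpu, u32 inst)
{
        if (unlikely(trace_instructions))
                trace_emulated_instr(instr_chan, vcpu, inst);
}

The toggle then lives under /sys/module/<module>/parameters/trace_instructions, so tracing can be flipped on the fly without a rebuild or reload.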
From: Avi K. <av...@qu...> - 2008-04-15 07:25:49
|
Jerone Young wrote: > I just took the old description that was there before. A much better one > would be: > > Remove declarations of kvm_*_pit() on architectures that do not > have a PIT. > > That is what I was really intending. It removes a lot of compile > warnings when compiling anything with libkvm.h on platforms that do not > have a PIT. It's mainly because of the structures that are used as > arguments to these function declarations. > Thanks, applied. -- error compiling committee.c: too many arguments to function |
From: Christian E. <ehr...@li...> - 2008-04-15 06:21:07
|
Hollis Blanchard wrote: > On Monday 14 April 2008 07:23:57 ehr...@li... wrote: >> From: Christian Ehrhardt <ehr...@li...> >> >> This adds a relayfs based kernel interface that allows tracing of all >> emulated guest instructions. It is based on Hollis tracing of tlb >> activities. A suitable userspace program to read from that interface and a >> post processing script that give users a readable output follow. >> This feature is configurable in .config under the kvmppc virtualization >> itself. > > I'm not sure how I feel about logging *every* faulting instruction, rather > than aggregating statistics about them. Actually, I know how I feel, and it's > uncomfortable. :) > > It's a userspace process that must be extracting this data from the relay > buffers, and that only happens when it's scheduled. How many instruction > exits could we get in a few timeslices? I think we could easily overflow the > relay buffers, especially if the host is under load. The patch series intentionally has two instruction statistics. One for low overhead giving you just some counters, and additionally we can extend these counters as we want e.g. with sprn numbers. The other is to enable knowingly a high overhead profiling, but it gives you a full trace which can be used e.g. to completely see "what exactly is the guest doing there" by looking at which instructions are emulated in which order. So both have there right to exist, because there are use scenarios for both of them. By making them configurable it is developers choice what to use. >> +struct instr_record { >> + u32 time; >> + u32 pc; >> + u32 instr; >> + u32 rsval; >> + u32 raval; >> + u32 rbval; >> +}; > > In addition to the instruction itself, I can see how the PC would be useful, > and maybe also the time when we have multiple cores. (Well, or maybe it would > be better to have one channel per core anyways.) But I think RS/RA/RB are way > overkill here. What would a userspace tool do with this data anyways? > The decoding scripts shows you what values are used e.g. for a mtmsr you see what value was moved to the register. But I agree that the value of r*val is much smaller than time,pc,instr. Removing these three should speed it up to allow us tracing all instructions. I will change the patch and resend it. -- Grüsse / regards, Christian Ehrhardt IBM Linux Technology Center, Open Virtualization |
From: Anthony L. <an...@co...> - 2008-04-15 03:36:42
|
Jerone Young wrote: > 1 file changed, 13 insertions(+), 5 deletions(-) > kernel/Makefile | 18 +++++++++++++----- > > > This patch add the ability for make sync in the kernel directory to work for mulitiple architectures and not just x86. > > Signed-off-by: Jerone Young <jy...@us...> > > diff --git a/kernel/Makefile b/kernel/Makefile > --- a/kernel/Makefile > +++ b/kernel/Makefile > @@ -1,5 +1,10 @@ include ../config.mak > include ../config.mak > > +ASM_DIR=$(ARCH) > +ifneq '$(filter $(ASM_DIR), x86_64 i386 ia64)' '' > + ASM_DIR=x86 > +endif > That doesn't look correct for ia64... Regards, Anthony Liguori > KVERREL = $(patsubst /lib/modules/%/build,%,$(KERNELDIR)) > > DESTDIR= > @@ -34,15 +39,16 @@ sync: > sync: > rm -rf tmp include > rsync --exclude='*.mod.c' -R \ > - "$(LINUX)"/arch/x86/kvm/./*.[ch] \ > + "$(LINUX)"/arch/$(ASM_DIR)/kvm/./*.[ch] \ > "$(LINUX)"/virt/kvm/./*.[ch] \ > "$(LINUX)"/./include/linux/kvm*.h \ > - "$(LINUX)"/./include/asm-x86/kvm*.h \ > + "$(LINUX)"/./include/asm-$(ASM_DIR)/kvm*.h \ > tmp/ > - mkdir -p include/linux include/asm-x86 > - ln -s asm-x86 include/asm > - ln -sf asm-x86 include-compat/asm > + mkdir -p include/linux include/asm-$(ASM_DIR) > + ln -s asm-$(ASM_DIR) include/asm > + ln -sf asm-$(ASM_DIR) include-compat/asm > > +ifneq '$(filter $(ASM_DIR), x86_64 i386 ia64)' '' > $(call unifdef, include/linux/kvm.h) > $(call unifdef, include/linux/kvm_para.h) > $(call unifdef, include/asm-x86/kvm.h) > @@ -54,6 +60,8 @@ sync: > $(call hack, svm.c) > $(call hack, x86.c) > $(call hack, irq.h) > +endif > + > for i in $$(find tmp -type f -printf '%P '); \ > do cmp -s $$i tmp/$$i || cp tmp/$$i $$i; done > rm -rf tmp > > ------------------------------------------------------------------------- > This SF.net email is sponsored by the 2008 JavaOne(SM) Conference > Don't miss this year's exciting event. There's still time to save $100. > Use priority code J8TL2D2. > http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone > _______________________________________________ > kvm-devel mailing list > kvm...@li... > https://lists.sourceforge.net/lists/listinfo/kvm-devel > |
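The problem Anthony is pointing at is that the filter above folds ia64 into the x86 directory. A sketch of the intended mapping, where only the x86 variants collapse and every other arch keeps its own directory name:

ASM_DIR = $(ARCH)
ifneq '$(filter $(ARCH), x86_64 i386)' ''
  ASM_DIR = x86
endif
# ia64 and powerpc now fall through with ASM_DIR = $(ARCH).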
From: Hollis B. <ho...@us...> - 2008-04-14 21:45:38
|
On Monday 14 April 2008 07:23:57 ehr...@li... wrote: > From: Christian Ehrhardt <ehr...@li...> > > This adds a relayfs based kernel interface that allows tracing of all > emulated guest instructions. It is based on Hollis tracing of tlb > activities. A suitable userspace program to read from that interface and a > post processing script that give users a readable output follow. > This feature is configurable in .config under the kvmppc virtualization > itself. I'm not sure how I feel about logging *every* faulting instruction, rather than aggregating statistics about them. Actually, I know how I feel, and it's uncomfortable. :) It's a userspace process that must be extracting this data from the relay buffers, and that only happens when it's scheduled. How many instruction exits could we get in a few timeslices? I think we could easily overflow the relay buffers, especially if the host is under load. > +struct instr_record { > + u32 time; > + u32 pc; > + u32 instr; > + u32 rsval; > + u32 raval; > + u32 rbval; > +}; In addition to the instruction itself, I can see how the PC would be useful, and maybe also the time when we have multiple cores. (Well, or maybe it would be better to have one channel per core anyways.) But I think RS/RA/RB are way overkill here. What would a userspace tool do with this data anyways? -- Hollis Blanchard IBM Linux Technology Center |
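For context on the overflow concern, a sketch of the relay setup involved; sizes, names and the elided callbacks are illustrative rather than from the posted patch. relay_open() creates one buffer and one debugfs file per CPU, so per-core channels largely come for free; whether records get dropped under load comes down to subbuf_size * n_subbufs per CPU.

#include <linux/debugfs.h>
#include <linux/relay.h>

#define INSTR_SUBBUF_SIZE       4096
#define INSTR_N_SUBBUFS         64      /* 256 KiB of buffering per CPU */

static struct rchan_callbacks instr_relay_cb;   /* create_buf_file() etc. elided */

static struct rchan *example_open_trace_channel(struct dentry *debugfs_dir)
{
        return relay_open("kvm_instr", debugfs_dir,
                          INSTR_SUBBUF_SIZE, INSTR_N_SUBBUFS,
                          &instr_relay_cb, NULL);
}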
From: Hollis B. <ho...@us...> - 2008-04-14 21:34:34
|
On Monday 14 April 2008 07:23:56 ehr...@li... wrote: > From: Christian Ehrhardt <ehr...@li...> > > This extends the kvm_stat reports for kvm on embedded powerpc. Since kvmppc > is using emulation (no hardware support yet) this gives people interested > in performance a detailed split of the emul_instruction counter already > available. This statistic does not cover e.g. operants of the commands, but > that way it should have only a small perf impact (never break what you want > to measure). This feature is configurable in .config under the kvmppc > virtualization itself. This array-based approach seems to add a lot of lines of code, and it's also copy/paste stuff that is just begging for a typo (e.g. miscounting STHX in the STHUX bucket) or being forgotten entirely when adding new emulation code. A more general approach would be to record just the opcode/extended opcode in a variable-sized structure, allocating a new bucket as we encounter a previously unseen instruction. I'm thinking of something like this: log_instruction(inst) { bucket = hash_u32(inst, 5); list_for_each(l, vcpu->instlog[bucket]) { struct instlog = list_entry(l); if (instlog->inst == inst) { instlog->count++; break; } } } emulate(inst) { log = get_op(inst); switch (get_op(inst)) { ... case 31: log |= get_xop(inst); switch (inst) { ... } } log_instruction(log); } It looks like we could build a hash table pretty easily with hash_long() and list_entry stuff (see fs/mbcache.c for example). So far you've found 17 different instructions types emulated in the "boot" workload, so 32 entries would probably be a reasonable hash size. The same approach could be used for SPR accesses, where you've hit 31 different registers. Really I think those numbers won't vary much by workload, but rather by guest kernel... -- Hollis Blanchard IBM Linux Technology Center |
From: Hollis B. <ho...@us...> - 2008-04-14 20:26:30
|
On Monday 14 April 2008 08:00:47 Christian Ehrhardt wrote: > > A short check of the ratio between the different sprn's used gave that > listing: grep sprn boot_halt_v4.instr.log | awk '{ printf("%s > %s\n",$11,$12) }' | sort | uniq -c| sort -rn 2461 0x01b SRR1 > 2256 0x110 SPRG0 > 2245 0x111 SPRG1 > 2004 0x01a SRR0 > 1538 0x113 SPRG3 > 1127 0x114 SPRG4 > 1034 0x3b2 MMUCR > 979 0x016 DEC > 597 0x030 PID > 509 0x115 SPRG5 > 499 0x117 SPRG7 > 475 0x150 TSR > 358 0x03d DEAR > 106 0x03e ESR > 3 0x11f PVR > 1 0x19f IVOR15 > 1 0x19e IVOR14 > 1 0x19d IVOR13 > 1 0x19c IVOR12 > 1 0x19b IVOR11 > 1 0x19a IVOR10 > 1 0x199 IVOR9 > 1 0x198 IVOR8 > 1 0x197 IVOR7 > 1 0x196 IVOR6 > 1 0x195 IVOR5 > 1 0x194 IVOR4 > 1 0x193 IVOR3 > 1 0x192 IVOR2 > 1 0x191 IVOR1 > 1 0x190 IVOR0 > 1 0x03f IVPR Good stuff. > While the instruction statistics for all instructions in that log is: > $awk '{ printf("%s\n",$8) }' boot_halt_v4.instr.log | sort | uniq -c | sort > -rn 9157 mtspr What I find interesting is that the top 7 SPR traps are for registers that have no immediate effect on execution, and that means they could easily be replaced with memory references (in both mtspr and mfspr cases). > 7494 wrteei > 7051 mfspr > 6073 mfmsr Another good candidate for rewriting as a memory reference. > 4532 wrtee Hmm, wrtee(i) would be more tricky to replace with memory references, at least without nop padding in the guest... > 1693 rfi > 1647 tlbwe > 943 mtmsr > 89 lbz > 84 stb > 81 tlbsx > 22 mfdcr > 21 lbzx > 20 mtdcr > 7 stbx > 3 sthbrx > 3 iccci -- Hollis Blanchard IBM Linux Technology Center |
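A sketch of the "replace SPR traps with memory references" idea raised above. This is purely illustrative, no such interface existed at the time of this thread: guest and hypervisor would agree on a shared structure holding the passive registers, so mfspr/mtspr of SRR0/SRR1/SPRGn become plain loads and stores instead of emulation exits.

#include <linux/types.h>

struct kvmppc_shared_regs {             /* hypothetical layout */
        u32 srr0;
        u32 srr1;
        u32 sprg0;
        u32 sprg1;
        /* ... */
};

static struct kvmppc_shared_regs *shared_regs;  /* mapped at an agreed address */

/* Guest-side replacement for "mfspr rX, SRR1": */
static inline u32 guest_read_srr1(void)
{
        return shared_regs->srr1;       /* plain load, no trap into the host */
}

/* Guest-side replacement for "mtspr SPRG0, rX": */
static inline void guest_write_sprg0(u32 val)
{
        shared_regs->sprg0 = val;       /* the host reads it back lazily */
}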
From: Christian E. <ehr...@li...> - 2008-04-14 15:32:12
|
Arnd Bergmann wrote: > On Monday 14 April 2008, ehr...@li... wrote: >> @@ -58,6 +58,14 @@ config KVM_POWERPC_440_TRACE_INSTRUCTION >> channel. >> If unsure, say N. >> >> +config KVM_POWERPC_440_TRACE_TLB >> + bool "ppc440 guest tlb activity tracing" >> + depends on KVM && 44x && KVM_POWERPC_440 >> + select RELAY >> + ---help--- >> + Adds the complete tracing of the tlb activities via a relayfs channel. >> + If unsure, say N. >> + > > I think you're taking the configurability a little too far, do you really > want users to be able to select the three new tracing options separately? > Choosing between all stats and no stats at all sounds sufficient to me. > > Arnd <>< hmmm ... I intended it exaclty to be that selectable to save runtime overhead because e.g. the relay code add overhead all the time. We might think of enabling them at runtime, but since all are developer tools I would prefer the .config way. E.g. If I want to low overhead profile instructions so I would only enable KVM_POWERPC_440_INSTRUCTION_STAT and none of the others. Actually that's the only reason the kvm_stat version exists, because the instruction tracing has the same info + much more just with more overhead. btw - I already think of removing the kvm_stat variant, because the relay based tracing revealed to be not as overhead bound as I thought. -- Grüsse / regards, Christian Ehrhardt IBM Linux Technology Center, Open Virtualization |
From: Avi K. <av...@qu...> - 2008-04-14 14:44:05
|
Anthony Liguori wrote: > > I think as we split libkvm into it's own library and remove the > dependency of kernel headers from libkvm consumers, this will stop > being a problem in practice. > That will take a while, as there are many structures used for libkvm<->user communications which are provided by the kernel headers, and our TODO queue is not quite empty. -- error compiling committee.c: too many arguments to function |
From: Arnd B. <ar...@ar...> - 2008-04-14 14:39:44
|
On Monday 14 April 2008, ehr...@li... wrote: > @@ -58,6 +58,14 @@ config KVM_POWERPC_440_TRACE_INSTRUCTION > channel. > If unsure, say N. > > +config KVM_POWERPC_440_TRACE_TLB > + bool "ppc440 guest tlb activity tracing" > + depends on KVM && 44x && KVM_POWERPC_440 > + select RELAY > + ---help--- > + Adds the complete tracing of the tlb activities via a relayfs channel. > + If unsure, say N. > + I think you're taking the configurability a little too far, do you really want users to be able to select the three new tracing options separately? Choosing between all stats and no stats at all sounds sufficient to me. Arnd <>< |
From: Anthony L. <an...@co...> - 2008-04-14 14:23:48
|
Avi Kivity wrote: > Christoph Hellwig wrote: > >> It would be nice to just be able to build kvm from git without a kernel >> around. The lack of that is what in fact keeps from hacking kvm >> userspace currently. >> >> > > It would be nice, but committing all header changes twice is not so nice > and error prone as well. > > Anyone serious about hacking kvm-userspace would not be deterred by the > additional clone. > > [maybe we can provide an automatically-generated git tree that has only > the kernel headers?] > Then people will simply find something else to complain about :-) I think as we split libkvm into it's own library and remove the dependency of kernel headers from libkvm consumers, this will stop being a problem in practice. Regards, Anthony Liguori |
From: Anthony L. <an...@co...> - 2008-04-14 14:20:17
|
Christoph Hellwig wrote: > On Mon, Apr 14, 2008 at 03:37:04AM -0500, Jerone Young wrote: > >> 2 files changed, 25 insertions(+), 4 deletions(-) >> Makefile | 21 ++++++++++++++++++++- >> configure | 8 +++++--- >> >> >> This patch adds ability for kvm-userspace build system to sync needed kernel headers locally without the need of compiled kernel source. >> > > Please just keep a copy of the kernel headers in the userspace tree so > it can be built standalone. > This is what make sync in kernel/ does FWIW and what we distribute for releases. Regards, Anthony Liguori |