Re: [kvm-ppc-devel] [PATCH] [1/3] kvmppc: add detailed instruction emulation statistics

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Could you please use a blank line when you intend to start a new paragraph in 
your email? The lack of whitespace here hurts readability.

On Tuesday 15 April 2008 02:34:47 Christian Ehrhardt wrote:
> Hollis Blanchard wrote:
> > On Monday 14 April 2008 07:23:56 ehr...@li... wrote:
> >> From: Christian Ehrhardt <ehr...@li...>
> >>
> >> This extends the kvm_stat reports for kvm on embedded powerpc. Since
> >> kvmppc is using emulation (no hardware support yet) this gives people
> >> interested in performance a detailed split of the emul_instruction
> >> counter already available. This statistic does not cover e.g. operants
> >> of the commands, but that way it should have only a small perf impact
> >> (never break what you want to measure). This feature is configurable in
> >> .config under the kvmppc virtualization itself.
> >
> > This array-based approach seems to add a lot of lines of code, and it's
> > also copy/paste stuff that is just begging for a typo (e.g. miscounting
> > STHX in the STHUX bucket) or being forgotten entirely when adding new
> > emulation code.
> >
> > A more general approach would be to record just the opcode/extended
> > opcode in a variable-sized structure, allocating a new bucket as we
> > encounter a previously unseen instruction. I'm thinking of something like
> > this:
>
> I already thought about some bucket like counting, but I focused small
> runtime overhead above everything else here by hijacking the already
> existent switch/case and avoiding any new if's & function calls. I agree
> that the array like solution is messy in the aspect of lines and is more
> error prone by forgetting something.
>
> > log_instruction(inst) {
> > 	bucket = hash_u32(inst, 5);
> > 	list_for_each(l, vcpu->instlog[bucket]) {
> > 		struct instlog = list_entry(l);
> > 		if (instlog->inst == inst) {
> > 			instlog->count++;
> > 			break;
> > 		}
> > 	}
> > }
> >
> > emulate(inst) {
> > 	log = get_op(inst);
> > 	switch (get_op(inst)) {
> > 		...
> > 	case 31:
> > 		log |= get_xop(inst);
> > 		switch (inst) {
> > 			...
> > 		}
> > 	}
> > 	log_instruction(log);
> > }
>
> I agree that this looks better, but is has per instruction:
> 1x hash function

This is a shift.

> a loop wich runs ~1-2 times

A couple additions and memory accesses.

> an if (do we actually have long pipelines that suffer from wrong branch
> prediction?)

I believe we have many "if" statements. :)

> When we add more functions here like mapping sprn's and 
> whatever else comes in mind later this might get even more complex, while
> this statistics should be the low overhead stats.

I don't know what "mapping SPRNs" means.

In general though, I'm not so worried about adding some memory references or 
branches in this path. First, there are much bigger problems to worry about 
(need I remind you about our TLB faults?). Second, it's the trace path: if we 
care that much, we would just disable tracing. I think the benefits of the 
more general approach are well worth it.

-- 
Hollis Blanchard
IBM Linux Technology Center