Howdy ho,
Apply from the root dosbox directory with
"patch -p0 < optimize.diff".
Optimizations to normal_loop, some PIC routines, and a
little touch-up to the mixer handler, i.e. the speedup
isn't CPU-core specific.
Mainly rearrangement so the most common paths fall through,
and getting rid of jumps.
Timed with rdtsc, and I just used variables to track the
common paths.
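For anyone wanting to repeat this kind of measurement, here is a minimal sketch of counting how often each branch is taken while timing a loop. It is not taken from the patch: PathStats, hot_routine and time_routine are hypothetical names, the routine is a toy, and std::chrono::steady_clock is swapped in for rdtsc so the sketch stays portable.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical counters, one per branch of interest.
struct PathStats {
    std::uint64_t fast = 0;
    std::uint64_t slow = 0;

    // Share of calls that took the fast path, in percent.
    double fast_percent() const {
        std::uint64_t total = fast + slow;
        return total ? 100.0 * static_cast<double>(fast) / static_cast<double>(total) : 0.0;
    }
};

static PathStats g_stats;

// Toy stand-in for a hot routine: even inputs take the "fast" path.
static int hot_routine(int x) {
    if ((x & 1) == 0) {
        ++g_stats.fast;
        return x >> 1;
    }
    ++g_stats.slow;
    return x * 3 + 1;
}

// Calls the routine for 0..n-1 and returns the elapsed time in nanoseconds.
// steady_clock is an assumption made for portability; the post used rdtsc.
static long long time_routine(int n) {
    auto start = std::chrono::steady_clock::now();
    volatile int sink = 0;  // keeps the calls from being optimized away
    for (int i = 0; i < n; ++i) sink = hot_routine(i);
    (void)sink;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

After a run, the counters show which branch deserves to be the fall-through case. The counting itself costs time, so it should be compiled out before the final timing.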
Logged In: YES
user_id=1039189
There's something wrong with this patch: after applying it to
the current DOSBox CVS source, I'm getting no input from my
keyboard in DOSBox.
Logged In: YES
user_id=1045474
Shouldn't any modern compiler do that sort of optimization automatically? (Profile-guided optimization; at least the Intel compiler and gcc have that.)
For example, gcc's profiling optimization improves things 20-30% (measured by turning up cycles until problems appear). IMHO that is the better solution.
Logged In: YES
user_id=535630
Well, properly written code beats any optimization,
although this isn't always true for profile-guided optimization
(think of a switch).
I've included some of the things you proposed, but I didn't
touch the main loop.
Further, I changed paging.h to use a similar construct in the
memory handler (that is where the real time is spent!), so
memory access of words and double words might be a bit
faster.
I didn't have the time to test it, though.
Logged In: YES
user_id=1008467
Though I like my snazzy lookup table :), this is a faster
MIXER_CLIP:
static inline Bit16s MIXER_CLIP(Bits SAMP) {
    if (SAMP>MIN_AUDIO) {
        if (SAMP<MAX_AUDIO)
            return SAMP;
        else return MAX_AUDIO;
    } else return MIN_AUDIO;
}
The bulk of the data is in bounds, so the common path of only two
compares and a direct jump (maybe a mov in there too; I need
to dump the assembly to see what gcc is doing with it) trumps
the lookup overhead.
I suppose this could also be written as a define again,
though I believe static inlines are equivalent.
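A self-contained sketch of the same clamp, for trying it outside the DOSBox tree. The typedefs and the 16-bit limits below are assumed stand-ins, not copied from the DOSBox headers.

```cpp
#include <cstdint>

// Assumed stand-ins for the DOSBox types and limits.
typedef std::int16_t Bit16s;
typedef long Bits;
static const Bits MAX_AUDIO = 32767;
static const Bits MIN_AUDIO = -32768;

// In-range samples (the common case) cost two compares and return directly;
// only out-of-range samples reach the clamping returns.
static inline Bit16s MIXER_CLIP(Bits samp) {
    if (samp > MIN_AUDIO) {
        if (samp < MAX_AUDIO)
            return static_cast<Bit16s>(samp);
        return static_cast<Bit16s>(MAX_AUDIO);
    }
    return static_cast<Bit16s>(MIN_AUDIO);
}
```

A sample equal to either limit takes a clamping return, which still yields the same value, so the strict compares do not change the result.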
There was/is a bug in my patch for PIC_runIRQs: I
forgot to change an || to an && while editing, but your
changes to CVS cover it, making it moot (the PIC_Special_Mode
variable handles the condition now). In my testing the
special-mode path was never taken, so it's unlikely anyone
hit it anyway, but I thought I'd mention it.
Oh, I too had optimized the memory functions but hadn't put
out a patch yet since there was so little interest. Here's
some of my testing in the comments (the percentage is how often
each path is taken):
INLINE Bit16u mem_readw_inline(PhysPt address) {
    // mri called ~128 million x/sec w/ 8k cpu cycles
    if (!(address & 1)) {
        // get rid of conditional jump for common path.
        Bitu index=(address>>12);
        if (paging.tlb.read[index]) return host_readw(paging.tlb.read[index]+address); // 99.2%
        else return paging.tlb.handler[index]->readw(address); // .004
    } else return mem_unalignedreadw(address); // .076 %
}
INLINE void mem_writew_inline(PhysPt address,Bit16u val) {
    if (!(address & 1)) { // also called ~128 m/sec
        // gprof says... percentage of times x path taken:
        Bitu index=(address>>12);
        if (paging.tlb.write[index])
            {host_writew(paging.tlb.write[index]+address,val);return;} // 71.5%
        else
            {paging.tlb.handler[index]->writew(address,val);return;} // 28%
    } else {mem_unalignedwritew(address,val);return;} // .5%
}
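The shape of that fast path can be tried in isolation with a toy table. This is only a sketch under stated assumptions: ToyTlb, ConstHandler and toy_readw are hypothetical names, the table stores plain page base pointers (so the offset is masked with 0xFFF rather than adding the full address as the code above does), and callers are assumed to pass even addresses inside the table.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

typedef std::uint32_t PhysPt;
typedef std::uint16_t Bit16u;

// Fallback interface for pages that are not plain RAM.
struct PageHandler {
    virtual Bit16u readw(PhysPt address) = 0;
    virtual ~PageHandler() {}
};

// Hypothetical handler returning a fixed pattern, standing in for device memory.
struct ConstHandler : PageHandler {
    Bit16u readw(PhysPt) override { return 0xBEEF; }
};

const std::size_t kPages = 16;  // toy address space: 16 pages of 4 KiB

struct ToyTlb {
    std::uint8_t* read[kPages];    // non-null: page is plain RAM
    PageHandler*  handler[kPages]; // used when read[i] is null
};

// Word read with the common case (plain RAM) tested first.
// Assumes an even address with (address >> 12) < kPages.
static Bit16u toy_readw(ToyTlb& tlb, PhysPt address) {
    std::size_t index = address >> 12;
    std::uint8_t* page = tlb.read[index];
    if (page) {  // common path: direct host memory access
        Bit16u value;
        std::memcpy(&value, page + (address & 0xFFF), sizeof value);
        return value;
    }
    return tlb.handler[index]->readw(address);  // rare path: virtual call
}
```

The write side would follow the same pattern, with the handler path mattering more if the 28% figure above holds for a given workload.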
Logged In: YES
user_id=535630
As I'd like the mixing code to be as fast as possible, I will
commit this code as well.
The special mode of the PIC is used when you run Windows;
that one uses it.
I decided to create one variable for it, as it's set at only
one location and at one time, but tested many times.
Some testing by moe showed that our combined optimizations
gave a 0-5% speed increase.
Thank you for sharing the call information of the memory
functions with us.
Logged In: YES
user_id=1008467
I noticed you didn't include the optimization I made to the
write command. It gets called at the end of every interrupt
to signal EOI (ret command). Unoptimized, it has around
three more compares and three more jumps. FYI, all other
paths either weren't taken or had only a handful of entries,
which is why I left them alone.
Logged In: YES
user_id=1008467
s/ret/iret/
Logged In: YES
user_id=535630
Well, I didn't get to it yet.
I will change that part of the code as well, as some of the
E_Exits should be just warnings.
Logged In: YES
user_id=535630
Hmm, about that new idea for the mixer stuff:
could you post a full patch for that?
(As initclip isn't needed anymore?)
Logged In: YES
user_id=1008467
Correct; the init isn't needed by the new idea.
Patch attached (mixer.diff).
Logged In: YES
user_id=1008467
A slightly better version :) Max is checked first. Replaced
mixer.diff.
Logged In: YES
user_id=1008467
Oh, I noticed your comment in the changes; it isn't
reverting to the old way, as it (the original define)
works like this:
The most common path being the last path taken.