#71 10+% speed optimization.

open
None
5
2012-09-07
2005-02-22
Keith
No

Howdy ho,

Apply from root dosbox directory "patch -p0 <
optimize.diff".

Optimizations to normal_loop, some PIC routines, and a
small touch-up to the mixer handler, i.e. the speedup
isn't CPU-core specific.

Mainly rearrangement so that the most common paths fall
through, and removal of jumps.

Timed with rdtsc; simple counter variables were used to
find the common paths.
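A minimal sketch of the "variables to track common paths" idea, not taken from optimize.diff: one counter per branch, bumped at the top of each branch, so the hottest branch can be moved to the fall-through position. The names `Path`, `PathStats`, `record`, `hottest` and `percent` are hypothetical, and the rdtsc timing itself is left out to keep the sketch portable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical labels for the branches of one instrumented function.
enum Path : std::size_t { kCommon, kFallback, kRare, kPathCount };

struct PathStats {
    std::array<std::uint64_t, kPathCount> hits{};  // all counters start at zero

    // Call once at the top of each branch being measured.
    void record(Path p) {
        assert(p < kPathCount);
        ++hits[p];
    }

    // Branch taken most often; ties go to the lowest index.
    Path hottest() const {
        std::size_t best = 0;
        for (std::size_t i = 1; i < hits.size(); ++i)
            if (hits[i] > hits[best]) best = i;
        return static_cast<Path>(best);
    }

    // Share of all recorded hits in percent, or 0 if nothing was recorded.
    double percent(Path p) const {
        std::uint64_t total = 0;
        for (std::uint64_t h : hits) total += h;
        return total ? 100.0 * static_cast<double>(hits[p]) /
                           static_cast<double>(total)
                     : 0.0;
    }
};
```

After a representative run, `hottest()` names the branch that should come first in the rewritten `if` chain, and `percent()` gives figures of the kind quoted later in this thread.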

Discussion

1 2 > >> (Page 1 of 2)
  • Keith
    2005-02-22

     
  • Allustar
    2005-02-24

    Logged In: YES
    user_id=1039189

    Something is wrong with this patch: after applying it to
    the current DOSBox CVS source, I get no keyboard input
    in DOSBox.

     
  • `Moe`
    2005-03-13

    Logged In: YES
    user_id=1045474

    Shouldn't any modern compiler do that sort of optimization automatically? (Profile-guided optimization; at least the Intel compiler and gcc have it.)
    For example, gcc's profile-guided optimization improves things by 20-30% (measured by turning up cycles until problems appear). IMHO that is the better solution.

     
  • Peter Veenstra
    2005-03-26

    Logged In: YES
    user_id=535630

    Well, properly written code beats any optimization,
    although this isn't always true for profile-guided
    optimization (think of a switch).

    I've included some of the things you proposed, but I didn't
    touch the main loop.
    Further, I changed paging.h to use a similar construct in the
    memory handler (that is where the real time is spent!), so
    memory access of words and double words might be a bit
    faster.
    Didn't have the time to test it though.

     
  • Keith
    2005-03-31

    Logged In: YES
    user_id=1008467

    Though I like my snazzy lookup table :), this is a faster
    mixer_clip:

    static inline Bit16s MIXER_CLIP(Bits SAMP) {
        if (SAMP>MIN_AUDIO) {
            if (SAMP<MAX_AUDIO)
                return SAMP;
            else return MAX_AUDIO;
        } else return MIN_AUDIO;
    }

    The bulk of the data is in bounds, so the common path of
    only two compares and a direct jump (maybe a mov in there
    too; I need to dump to assembly to see what gcc is doing
    with it) trumps the lookup overhead.

    I suppose this could also be written as a define again,
    though I believe static inlines are equivalent.
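A self-contained version of the clip above, for anyone who wants to try it outside the DOSBox tree. The typedefs and the 16-bit limits below are assumptions made for the sketch, not copied from the DOSBox headers; the explicit casts are added only to keep the narrowing visible.

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-ins for the DOSBox types used by the snippet.
typedef std::int16_t  Bit16s;
typedef std::intptr_t Bits;

// Assumed limits: the signed 16-bit sample range.
static const Bits MAX_AUDIO = 32767;
static const Bits MIN_AUDIO = -32768;

// In-range samples take the first two compares and return immediately;
// the two saturating cases are the uncommon fall-back paths.
static inline Bit16s MIXER_CLIP(Bits samp) {
    if (samp > MIN_AUDIO) {
        if (samp < MAX_AUDIO) return static_cast<Bit16s>(samp);
        return static_cast<Bit16s>(MAX_AUDIO);
    }
    return static_cast<Bit16s>(MIN_AUDIO);
}

// Hypothetical usage: sum two channels in the wide type, then clip once.
static inline Bit16s mix_two(Bits a, Bits b) {
    Bit16s out = MIXER_CLIP(a + b);
    assert(out >= MIN_AUDIO && out <= MAX_AUDIO);
    return out;
}
```

Summing in the wide `Bits` type before clipping is what makes the saturation meaningful; clipping each input first would not prevent overflow of the sum.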

    There was/is a bug in my patch for PIC_runIRQs: I forgot
    to change an || to an && while editing. Your changes to CVS
    cover it, making it moot (the PIC_Special_Mode variable
    handles the condition now). In my testing the special-mode
    path was never taken, so it's unlikely anyone hit it anyway,
    but I thought I'd mention it.

    Oh, I too had optimized the memory functions but hadn't put
    out a patch yet since there was so little interest. Here's
    some of my testing in the comments (the percentage is the
    share of calls that took that path):

    INLINE Bit16u mem_readw_inline(PhysPt address) {
        // mri called ~128 million x/sec w/ 8k cpu cycles
        if (!(address & 1)) {
            // get rid of conditional jump for common path
            Bitu index=(address>>12);
            if (paging.tlb.read[index])
                return host_readw(paging.tlb.read[index]+address); // 99.2%
            else
                return paging.tlb.handler[index]->readw(address); // .004
        } else return mem_unalignedreadw(address); // .076%
    }

    INLINE void mem_writew_inline(PhysPt address,Bit16u val) {
        // also called ~128 m/sec
        // gprof says... percentage of times x path taken:
        if (!(address & 1)) {
            Bitu index=(address>>12);
            if (paging.tlb.write[index]) {
                host_writew(paging.tlb.write[index]+address,val); // 71.5%
            } else {
                paging.tlb.handler[index]->writew(address,val); // 28%
            }
        } else {
            mem_unalignedwritew(address,val); // .5%
        }
    }
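The same branch ordering can be tried in isolation with a toy model. This is only a sketch under stated assumptions: `ToyTlb`, `FixedHandler`, `toy_readb` and `toy_readw` are hypothetical names, the table has just 16 pages, and each entry holds a pointer to the start of its page (so the offset is masked), which is a simplification and not necessarily how the real paging code stores its pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uint32_t PhysPt;
typedef std::uint16_t Bit16u;
typedef std::uint8_t  Bit8u;

// Hypothetical slow-path interface for pages with no direct host mapping.
struct PageHandler {
    virtual ~PageHandler() {}
    virtual Bit8u  readb(PhysPt address) = 0;
    virtual Bit16u readw(PhysPt address) = 0;
};

// Toy handler that answers every read with one fixed word.
struct FixedHandler : PageHandler {
    Bit16u value;
    explicit FixedHandler(Bit16u v) : value(v) {}
    Bit8u  readb(PhysPt) { return static_cast<Bit8u>(value & 0xFF); }
    Bit16u readw(PhysPt) { return value; }
};

struct ToyTlb {
    static const std::size_t kPages = 16;
    const Bit8u* read[kPages];     // start of a 4 KiB host page, or null
    PageHandler* handler[kPages];  // used when read[] is null
};

// Little-endian 16-bit load from host memory.
inline Bit16u host_readw(const Bit8u* p) {
    return static_cast<Bit16u>(p[0] | (p[1] << 8));
}

inline Bit8u toy_readb(const ToyTlb& tlb, PhysPt address) {
    std::size_t index = address >> 12;
    assert(index < ToyTlb::kPages);
    if (tlb.read[index]) return tlb.read[index][address & 0xFFF];
    return tlb.handler[index]->readb(address);
}

inline Bit16u toy_readw(const ToyTlb& tlb, PhysPt address) {
    if (!(address & 1)) {                      // aligned: checked first
        std::size_t index = address >> 12;
        assert(index < ToyTlb::kPages);
        if (tlb.read[index])                   // direct mapping: common path
            return host_readw(tlb.read[index] + (address & 0xFFF));
        return tlb.handler[index]->readw(address);
    }
    // Unaligned: build the word from two byte reads (may cross a page).
    Bit8u lo = toy_readb(tlb, address);
    Bit8u hi = toy_readb(tlb, address + 1);
    return static_cast<Bit16u>(lo | (hi << 8));
}
```

The point of the ordering is that an aligned access to a directly mapped page returns after two tests, with the handler call and the unaligned case kept off the straight-line path.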

     
  • Peter Veenstra
    2005-04-03

    Logged In: YES
    user_id=535630

    As I'd like the mixing code to be as fast as possible, I will
    commit this code as well.

    The special mode of the PIC is used when you run Windows;
    that one uses it.
    I decided to create one variable for it, as it's set at only
    one location and only once, but tested many times.

    Some testing by Moe showed that our combined optimizations
    gave a 0-5% speed increase.

    Thank you for sharing the call information of the memory
    functions with us.

     
  • Keith
    2005-04-07

    Logged In: YES
    user_id=1008467

    I noticed you didn't include the optimization I made to the
    write command. It gets called at the end of every interrupt
    to signal EOI (ret command). Unoptimized, it has around
    three more compares and three more jumps. FYI, all other
    paths either weren't taken or had only a handful of entries,
    which is why I left them alone.

     
  • Keith
    2005-04-07

    Logged In: YES
    user_id=1008467

    s/ret/iret/

     
  • Peter Veenstra
    2005-04-15

    Logged In: YES
    user_id=535630

    Well, I didn't get to it yet.
    I will change that part of the code as well, since some of
    the E_Exits should be just warnings.

     
  • Peter Veenstra
    2005-04-22

    Logged In: YES
    user_id=535630

    Hmm, that new idea for the mixer stuff:
    could you post a full patch for that?

    (As initclip isn't needed anymore?)

     