#54 Hq2x patch


This is a patch to add a Hq2x scaler to dosbox. See
http://www.hiend3d.com/hq2x.html for a generic description.

I've finished a first, experimental version of hq2x for dosbox. It
still has a fair number of limitations, and I need to do some
serious profiling to get the optimal cache-vs-cpu usage.
Moreover, it has only been tested with GCC. The MMX code is disabled
because the MMX-less variant is (for now) faster.


To use it: patch, recompile (with maximum optimizations!), and set
scaler=hq2x. Use Ctrl-F3/Ctrl-F4 to turn the hq2x trigger value
up and down.

Limitation: Only works in 16 or 32 bit BGR mode (this includes
opengl, but not overlay - support for "everything" is on the todo).

Speed is roughly like advmame2x, but there are still some
sound-related issues (due to cache thrashing, it seems).

I'm still profiling and experimenting, testing new variants and
everything, but it is 100% stable and looks so cool.

The patch is against dosbox CVS as of today (2004-05-19).
I will update the patch (mainly render_hq2x.*) as improvements
are made.


  • Peter Veenstra

    Logged In: YES

    You need to submit the diff again;
    uploading files while submitting is broken.

  • Sjoerd

    Logged In: YES

    Needs an update for the current cvs.

  • `Moe`

    Logged In: YES

    I have updated the patch for today's CVS. The recent changes in the
    scaler code made life a lot easier for me.
    The hq2x code can now be considered true beta quality: it is supposed
    to work stably and correctly in all cases; it just needs testing, as I can
    only do surface rendering with 16 or 32 bpp.
    (I still left some GCC-isms inside, sorry. Non-GCC users need two lines:)

    #define __attribute__(x)

    #define __builtin_expect(x,y) x
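The two non-GCC lines above can also be guarded so they only take effect when the compiler is not GCC-compatible. The following is a minimal sketch, not code from the patch: `HQ2X_UNLIKELY` and `count_edges` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// On compilers that do not define __GNUC__, make the GCC extensions vanish.
#ifndef __GNUC__
#define __attribute__(x)
#define __builtin_expect(x, y) (x)
#endif

// Hypothetical helper macro (not from the patch): marks a condition as rarely true.
#define HQ2X_UNLIKELY(cond) __builtin_expect(!!(cond), 0)

// Hypothetical example: counts positions where a pixel differs from its left
// neighbour. The hint tells GCC that the "differs" branch is the rare case.
static std::size_t count_edges(const unsigned char* row, std::size_t width) {
    assert(row != nullptr || width == 0);
    std::size_t edges = 0;
    for (std::size_t x = 1; x < width; ++x) {
        if (HQ2X_UNLIKELY(row[x] != row[x - 1])) ++edges;
    }
    return edges;
}
```

With the guard in place the same source builds unchanged on GCC, where the hint is honoured, and on other compilers, where it expands to the plain condition.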


    There is also a small proposed patch to ymf262.c inside. According to
    oprofile, it reduces CPU load of the opl2/3 code by about 50%. Stuff
    commented out with "#ifdef SMALL_CACHE" might or might not
    improve things on some CPUs, but I'm lacking definite numbers for

  • Sjoerd

    Logged In: YES

    Hmm, is there still any need to keep the MMX distance
    calculation code in? Since we'll probably only ever need 8bpp
    input sources, you can probably always do it faster with a
    distance lookup table.
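A rough sketch of what such a lookup table could look like for 8bpp (palettized) input. Everything here is an assumption for illustration: the names `Rgb`, `color_distance`, `build_diff_table` and `differs` are hypothetical, and the metric is a simple sum of channel differences rather than whatever comparison the patch actually uses.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Hypothetical metric: sum of absolute channel differences.
static int color_distance(const Rgb& a, const Rgb& b) {
    return std::abs(a.r - b.r) + std::abs(a.g - b.g) + std::abs(a.b - b.b);
}

// Builds a 256x256 table; entry [i * 256 + j] is 1 when palette colours
// i and j are further apart than `threshold`.
static std::vector<std::uint8_t> build_diff_table(const std::array<Rgb, 256>& palette,
                                                  int threshold) {
    assert(threshold >= 0);
    std::vector<std::uint8_t> table(256 * 256);
    for (int i = 0; i < 256; ++i)
        for (int j = 0; j < 256; ++j)
            table[i * 256 + j] =
                color_distance(palette[i], palette[j]) > threshold ? 1 : 0;
    return table;
}

// Per-pixel test: a single table read instead of a distance calculation.
static bool differs(const std::vector<std::uint8_t>& table,
                    std::uint8_t a, std::uint8_t b) {
    return table[a * 256 + b] != 0;
}
```

At one byte per entry the table takes 64 KiB and has to be rebuilt whenever the palette or the threshold changes, which is the trade-off against computing the distance on the fly.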

  • `Moe`

    Logged In: YES

    Oops, made a few mistakes. The same patch again, minus one
    segfault, plus one "it actually compiles" and with added non-gcc

    I am keeping the MMX code, as the distance table is quite big. Since I
    blew up my p3 laptop shortly after getting it, I haven't yet found time to
    see if SSE's mmx can do better than the current code. The MMX code is
    close enough in runtime, so some of the newer unpacking instructions
    may pay off. The reduced cache thrashing may well pay off (see my
    small ymf262.cpp patch, same principle).
    The mmx code is all #defined away at the moment, but I'd like it to stay
    in there until I get a better devel box and can decide on facts.

  • `Moe`

    Logged In: YES

    Oh, and I am pondering an adaptive distance calculation to catch even
    more edges. I have ideas how to solve that with the table, but it's quite
    possible MMX will win even more in that case. I fear my devel box is too
    slow for that, however, so don't hold your breath.

  • `Moe`

    Logged In: YES

    Another version, hopefully fixing the MS VS Net problem reported in
    the forums.

  • `Moe`

    Logged In: YES

    Another version. Now supports aspect ratio correction. Looks ugly, but
    some people seem to like that.

  • Sjoerd

    Logged In: YES

    Hmm, the adlib patches might be nice too. It might also be
    possible to split each waveform into a table of 4 pointers and
    use the highest 2 bits from the frequency as an index there,
    with the pointers pointing to small pieces of the sine wave.
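One way to read this suggestion, sketched under assumptions: each waveform becomes four pointers to 256-sample pieces, the top two bits of a 10-bit phase counter (what the comment calls the frequency) select the piece, and the low bits select the sample. The piece size, the amplitude of 4095 and all names are hypothetical and not taken from ymf262.c.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kPieceBits = 8;               // assumed: 256 samples per piece
constexpr int kPieceLen = 1 << kPieceBits;
using Piece = std::array<std::int16_t, kPieceLen>;
using Waveform = std::array<const std::int16_t*, 4>;

struct SinePieces {
    Piece q[4];  // the four quarters of one full sine period
    Piece zero;  // silence, shared by every waveform that needs it
};

static SinePieces make_pieces() {
    SinePieces p{};  // value-initialised, so `zero` stays all zeros
    const double kTwoPi = 6.283185307179586;
    for (int quarter = 0; quarter < 4; ++quarter) {
        for (int i = 0; i < kPieceLen; ++i) {
            const double phase = kTwoPi * (quarter * kPieceLen + i) / (4.0 * kPieceLen);
            p.q[quarter][i] =
                static_cast<std::int16_t>(std::lround(4095.0 * std::sin(phase)));
        }
    }
    return p;
}

// Full sine: all four quarters in order.
static Waveform full_sine(const SinePieces& p) {
    return {p.q[0].data(), p.q[1].data(), p.q[2].data(), p.q[3].data()};
}

// Half sine: the negative half is replaced by the shared silent piece.
static Waveform half_sine(const SinePieces& p) {
    return {p.q[0].data(), p.q[1].data(), p.zero.data(), p.zero.data()};
}

// Top 2 bits of the 10-bit phase pick the piece, low 8 bits pick the sample.
static std::int16_t sample(const Waveform& w, unsigned phase) {
    assert(phase < 4u * kPieceLen);
    return w[phase >> kPieceBits][phase & (kPieceLen - 1)];
}
```

Because several waveforms point at the same pieces, the total table data is smaller than one full table per waveform, which is the cache argument made earlier in this thread. The `SinePieces` object must outlive every `Waveform` built from it.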

  • `Moe`

    Logged In: YES

    Here's the next version, for current CVS.

    New features: adaptive threshold calculation. It now scans the
    surroundings of the current pixel to find the maximum and minimum
    difference, and then sets the actual edge detect threshold to some
    average value (the exact averaging ratio is configurable). The old
    (static) threshold variant is also present, as both together give
    noticeably better results.
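The description above can be sketched roughly as follows. This is not the patch's code: the 8-neighbour window, the ratio expressed in 1/256 steps, and in particular the way the static and adaptive thresholds are combined (taking the larger of the two) are assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Differences between the centre pixel and its 8 neighbours (any metric).
using Diffs = std::array<int, 8>;

// ratio256 is the configurable averaging ratio in 1/256 steps:
// 0 selects the minimum difference, 256 the maximum.
static int adaptive_threshold(const Diffs& d, int ratio256) {
    assert(ratio256 >= 0 && ratio256 <= 256);
    const auto mm = std::minmax_element(d.begin(), d.end());
    const int lo = *mm.first;
    const int hi = *mm.second;
    return lo + (hi - lo) * ratio256 / 256;
}

// ASSUMPTION: a neighbour counts as "different" only if it exceeds both the
// static and the adaptive threshold. Bit i of the result is set for neighbour i.
static unsigned edge_mask(const Diffs& d, int static_threshold, int ratio256) {
    const int t = std::max(static_threshold, adaptive_threshold(d, ratio256));
    unsigned mask = 0;
    for (int i = 0; i < 8; ++i) {
        if (d[i] > t) mask |= 1u << i;
    }
    return mask;
}
```

The resulting 8-bit mask is the kind of pattern index that hq2x-style scalers use to select an interpolation rule for the current pixel.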

    See forums for user manual.

    It can be turned off via a #define, but the CPU cost is fairly low
    compared to the static algorithm. The mmx code is still inactive, but
    the user reports in the forum indicate it could have a big effect on
    newer CPUs. I hope to get at a bigger devel machine soon.

  • `Moe`

    Logged In: YES

    Just a quick update: Looks like I am finally getting a decent box, so
    expect some performance improvements for newer CPUs (mine will
    probably be an athlon64) in the near future.

    Also, the longer I use the current patch (and compare it to stock hq2x
    as found in, e.g., scummvm), the more I like it. I'm quite confident that
    it won't change anymore (feature-wise). Hq3x/4x may appear, if they
    are noticeably better than just using 640x400 fullscreen - I'll give that a

  • `Moe`

    Logged In: YES

    So here it is, the latest variant.

    This time, the patch includes 3 things at once: the already well-known
    software Hq2x implementation, the hardware OpenGL-HQ scaler and
    16-bit VESA SVGA support.

    I didn't break them up as they depend on each other.

    Changes in Hq2x:
    - it now follows the template-style of the other scalers, though it's still
    in its own file (two actually) in order to get the important
    320-pixel-source-width optimization
    - it has become a little bit slower due to using the more generic pixel
    depth conversion; previous code has been better optimized but less
    - the 32-bit interpolation optimization has been added to all render
    templates, see comment in render_templates.h
    - the GCC way of marking conditionals as quite unlikely has been
    added for all scalers, correctly #ifdef'ed
    - threshold values are shared with the OpenGL-HQ code
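The list above does not spell out the 32-bit interpolation optimization, and the comment in render_templates.h is not reproduced here. For orientation only, this is a sketch of one widely used trick of that kind for 16-bit RGB565 pixels, which may or may not match what the patch does; `blend565` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Blends two RGB565 pixels: result = (a * (32 - w) + b * w) / 32 per channel,
// with w in [0, 32]. All three channels are handled by one 32-bit multiply-add.
static std::uint16_t blend565(std::uint16_t a, std::uint16_t b, unsigned w) {
    assert(w <= 32);
    // Spread the pixel over 32 bits: green moves to the high half, red and
    // blue stay in the low half, leaving spare bits above every channel.
    const std::uint32_t mask = 0x07E0F81Fu;
    const std::uint32_t pa = (a | (static_cast<std::uint32_t>(a) << 16)) & mask;
    const std::uint32_t pb = (b | (static_cast<std::uint32_t>(b) << 16)) & mask;
    // The weights sum to 32, so each channel grows by at most 5 bits and
    // cannot spill into its neighbour before the shift brings it back.
    const std::uint32_t r = ((pa * (32 - w) + pb * w) >> 5) & mask;
    // Fold the green channel back into the low 16 bits.
    return static_cast<std::uint16_t>(r | (r >> 16));
}
```

The point of the spread layout is that no per-channel unpacking, clamping or repacking is needed inside the scaler's inner loop.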

    About OpenGL-HQ:
    - all OpenGL code has been broken out of sdlmain.cpp and placed
    into sdl_opengl.cpp
    - opengl rendering is threaded as some OpenGL calls seem to block
    the process until the hardware is done, which defeats the purpose of
    hardware acceleration
    - see the comment in sdl_opengl.cpp for some general information
    - IMPORTANT: the code NEEDS SDL 1.3 (which is the current CVS),
    as it uses the new platform-independent render targets; if you don't
    have SDL 1.3 installed, the OpenGL-HQ code is automatically
    disabled; the traditional OpenGL modes continue to work, of course
    - rendering is done in 3 passes; I've tried hard to reduce that to just 2
    passes, but joining pass 1+2 is impossible on my hardware, and
    joining 2+3 is slower than doing 3 passes, as pass 2 uses the source
    resolution and pass 3 runs at the destination resolution.
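The threading idea from the list above (keep blocking draw calls off the emulation thread) can be illustrated with a generic single-slot mailbox. This is only a sketch under assumptions: it contains no OpenGL or SDL calls, and `RenderThread`, `Frame` and the `draw` callback are hypothetical names, not the structure of sdl_opengl.cpp.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

using Frame = std::vector<unsigned char>;

class RenderThread {
public:
    explicit RenderThread(std::function<void(const Frame&)> draw)
        : draw_(std::move(draw)) {
        assert(draw_);
        worker_ = std::thread([this] { run(); });
    }

    // Asks the worker to finish; a frame that is still pending is drawn first.
    ~RenderThread() {
        {
            std::lock_guard<std::mutex> lock(m_);
            quit_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Called by the emulation thread. It never waits for the draw call;
    // a frame that has not been picked up yet is simply overwritten.
    void submit(Frame f) {
        {
            std::lock_guard<std::mutex> lock(m_);
            pending_ = std::move(f);
            has_frame_ = true;
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            Frame f;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return has_frame_ || quit_; });
                if (!has_frame_) return;  // quit requested, nothing left to draw
                f = std::move(pending_);
                has_frame_ = false;
            }
            draw_(f);  // the potentially blocking call runs outside the lock
        }
    }

    std::function<void(const Frame&)> draw_;
    std::mutex m_;
    std::condition_variable cv_;
    Frame pending_;
    bool has_frame_ = false;
    bool quit_ = false;
    std::thread worker_;
};
```

Dropping stale frames rather than queueing them keeps latency bounded when the draw call is slower than the emulation, at the cost of skipped frames.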

    About 16-bit VESA SVGA:
    - I have absolutely no idea what I did. It's really just cut'n'paste with
    some educated guessing.
    - It works fine. I can finally play Schleichfahrt, even speed is
    - It is untested with anything else.
    - OpenGL output will try to exploit hardware 16 bit support.
    - adding SBPP=16 required small changes to the scaler templates
    - I haven't added anything but normal scaler yet, I see no point in
    others, as programs using 16bpp probably need a lot of performance
    - if you own something >4GHz and think you could spare some cycles
    for scaling, OpenGL-HQ is probably your best option

    The code can be considered stable. I've been using it for quite a while
    and haven't found any need for modifications.

    If anything breaks, send patches.

    Have fun!

  • `Moe`

    Logged In: YES

    Just an additional comment: you can apply the patch without SDL 1.3.
    You will still get the updated (better-looking) Hq2x and the 16-bit VESA
    support; OpenGL-HQ will be silently disabled, and multithreaded
    rendering won't happen either.

  • Allustar

    Logged In: YES

    hey moe,

    When I tried to build with the standard libSDL 1.2.8 that I've
    been using, the build got stuck because certain OpenGL
    functions are missing from 1.2.8 or differ in the 1.3.x
    branch. So in order to build with OpenGL at all you must
    have the libSDL 1.3.x branch; otherwise the build fails.


  • `Moe`

    Logged In: YES

    Thanks for the report. I'll go and debug it on windows soon, resolving the crash reported in the forums and the 1.2 problem.

  • `Moe`

    Logged In: YES

    Another month goes by, another version is out.


    • applies to latest CVS
    • compiles and runs on windows
    • some minor bugs fixed
    • moved all GUI calls into render thread for openglhq (windows is less forgiving than X11)
    • compiles with SDL-1.2 (no openglhq, of course)
    • VESA fix and extension by wd
    • still untested on nvidia (the whole world around me seems to have ATI...)

    I've also attached my win32 build including SDL-1.3 dll. Creating your own in mingw is trivial: Fetch the sources from SDL CVS (you have to use the branch option to get the 1.3 branch), optionally search for the directx fix in the forums, ./configure; make; make install.

  • `Moe`

    Logged In: YES

    Uploads seem to work again, so here is the patch, in two different variants:

    If you are a user who wants to compile his own full-featured version, use dosbox-fullhq.diff. It includes hq2x software-scaling, opengl-hq hardware-scaling and VESA 16bit support.

    If you are a dosbox maintainer (hello qbix ;) ), you can use the three separate patches and integrate the ones that you already like.

    As I've written before, the three patches overlap, so you can't apply all three without conflicts; that's why I provide the all-in-one patch. I will update the remaining patches if some of the code is included in CVS.

    Today's patches already contain some cosmetic fixes and a possible fix for the VESA issue reported in the forums. They no longer contain the adlib optimization; I will open a separate item for that.

  • `Moe`

    Logged In: YES

    Here are two new patches:

    • hq2x in a much simpler variant (the actual output has not changed), better suited for inclusion in CVS; for example, experience with the current trigger code has shown that key bindings or config file settings are not needed anymore;
    • VESA 16bit in a slightly updated variant from forum feedback

    The OpenGL-HQ patch has officially been discontinued. The SDL version is much better (simpler, faster, looks the same). It now includes a patch to have it enabled via dosbox.conf. Get it at http://garni.ch/Software/dosbox/

  • `Moe`

    hq2x simple version

  • `Moe`

    VESA VBE 16-bit support