#54 Hq2x patch

open
Sjoerd
None
5
2012-09-07
2004-05-19
`Moe`
No

This is a patch to add an hq2x scaler to DOSBox. See
http://www.hiend3d.com/hq2x.html for a generic description.

I've finished a first, experimental version of hq2x for DOSBox. It
still has a fair number of limitations, and I need to do some
serious profiling to find the optimal cache-vs-CPU trade-off.
Moreover, it has only been tested with GCC. The MMX code is disabled;
the MMX-less variant is (for now) faster.

Usage:

Patch, recompile (with maximum optimizations!), and set
scaler=hq2x. Use Ctrl-F3/Ctrl-F4 to turn the hq2x trigger value
up and down.

Limitation: only works in 16- or 32-bit BGR mode (this includes
opengl, but not overlay; support for "everything" is on the to-do list).

Speed is roughly like advmame2x, but there are still some
sound-related issues (due to cache thrashing, it seems).

I'm still profiling and experimenting, testing new variants and
everything, but it is 100% stable and looks so cool.

The patch is against dosbox CVS as of today (2004-05-19).
I will update the patch (mainly render_hq2x.*) as improvements
are made.

Discussion

1 2 3 > >> (Page 1 of 3)
  • Peter Veenstra
    2004-05-20

    Logged In: YES
    user_id=535630

    You need to submit the diff again;
    uploading files while submitting is broken.

     
  • Sjoerd
    2004-06-14

    Logged In: YES
    user_id=153968

    Needs an update for the current cvs.

     
  • `Moe`
    2004-07-05

    Logged In: YES
    user_id=1045474

    I have updated the patch for today's CVS. The recent changes in the
    scaler code made life a lot easier for me.
    The hq2x code can now be considered true beta quality: it is supposed
    to work stably and correctly in all cases; it just needs testing, as I can
    only do surface rendering with 16 or 32 bpp.
    (I still left some GCC-isms inside, sorry. Non-GCC users need two lines:

    #define __attribute__(x)

    #define __builtin_expect(x,y) x

    )

    There is also a small proposed patch to ymf262.c inside. According to
    oprofile, it reduces CPU load of the opl2/3 code by about 50%. Stuff
    commented out with "#ifdef SMALL_CACHE" might or might not
    improve things on some CPUs, but I'm lacking definite numbers for
    now.
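A minimal sketch of how the two non-GCC defines mentioned above could be guarded so that GCC builds keep the real annotations. The `LIKELY`/`UNLIKELY` macros and `count_different` are hypothetical names for illustration only and are not taken from the patch.

```cpp
#include <cassert>

// Portability shim: on compilers that are not GCC-compatible, make the
// GCC-only annotations expand to nothing (or to the bare condition).
#ifndef __GNUC__
#define __attribute__(x)
#define __builtin_expect(x, y) (x)
#endif

// Hypothetical convenience macros built on the branch hint.
#define LIKELY(c)   __builtin_expect(!!(c), 1)
#define UNLIKELY(c) __builtin_expect(!!(c), 0)

// Hypothetical example: count pixels that differ from a reference value.
// The attribute silences "unused function" warnings on GCC and vanishes
// on other compilers thanks to the shim above.
__attribute__((unused))
static int count_different(const unsigned* pixels, int n, unsigned ref) {
    int count = 0;
    for (int i = 0; i < n; ++i) {
        // Hint that most pixels are expected to match the reference.
        if (UNLIKELY(pixels[i] != ref)) ++count;
    }
    return count;
}
```

With the guard in place the same source should build unchanged with or without GCC; only the optimizer hints are lost elsewhere.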

     
  • Sjoerd
    2004-07-05

    Logged In: YES
    user_id=153968

    Hmm, is there still any need to keep the MMX distance
    calculation code in? Since we'll probably only ever need 8bpp
    input sources, you can probably always do it faster with a
    distance lookup table.
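A rough sketch of what such a distance lookup table for 8bpp sources might look like: every pair of palette indices is compared once up front, so the scaler only does a table read per pixel pair. The luma/chroma metric, the thresholds, and the names `Rgb`, `colors_differ` and `DistanceTable` are assumptions for illustration, not the patch's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Assumed metric: compare rough luma/chroma values against per-channel
// thresholds. The real patch may use a different formula.
static bool colors_differ(Rgb a, Rgb b, int trY, int trU, int trV) {
    int ya = (a.r + a.g + a.b) / 4,     yb = (b.r + b.g + b.b) / 4;
    int ua = (a.r - a.b) / 4,           ub = (b.r - b.b) / 4;
    int va = (2 * a.g - a.r - a.b) / 8, vb = (2 * b.g - b.r - b.b) / 8;
    return std::abs(ya - yb) > trY ||
           std::abs(ua - ub) > trU ||
           std::abs(va - vb) > trV;
}

// Precomputed answer for every pair of palette indices:
// 256 * 256 one-byte entries, i.e. 64 KiB in this sketch.
class DistanceTable {
public:
    DistanceTable(const Rgb (&palette)[256], int trY, int trU, int trV)
        : table_(256 * 256) {
        for (int i = 0; i < 256; ++i)
            for (int j = 0; j < 256; ++j)
                table_[i * 256 + j] =
                    colors_differ(palette[i], palette[j], trY, trU, trV) ? 1 : 0;
    }
    // One memory read replaces the per-pixel distance arithmetic.
    bool differ(std::uint8_t a, std::uint8_t b) const {
        return table_[a * 256 + b] != 0;
    }
private:
    std::vector<std::uint8_t> table_;
};
```

The table would have to be rebuilt whenever the palette or the thresholds change, and its size is the trade-off against computing distances on the fly.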

     
  • `Moe`
    2004-07-05

    Logged In: YES
    user_id=1045474

    Oops, made a few mistakes. The same patch again, minus one
    segfault, plus one "it actually compiles" and with added non-gcc
    workiness.

    I am keeping the MMX code, as the distance table is quite big. Since I
    blew up my P3 laptop shortly after getting it, I haven't yet found time to
    see if SSE's MMX can do better than the current code. The MMX code is
    close enough in runtime that some of the newer unpacking instructions
    may pay off, and the reduced cache thrashing may help as well (see my
    small ymf262.cpp patch, same principle).
    The MMX code is all #defined away at the moment, but I'd like it to stay
    in there until I get a better devel box and can decide on facts.

     
  • `Moe`
    2004-07-05

    Logged In: YES
    user_id=1045474

    Oh, and I am pondering an adaptive distance calculation to catch even
    more edges. I have ideas how to solve that with the table, but it's quite
    possible MMX will win even more in that case. I fear my devel box is too
    slow for that, however, so don't hold your breath.

     
  • `Moe`
    2004-07-06

    Logged In: YES
    user_id=1045474

    Another version, hopefully fixing the MS VS Net problem reported in
    the forums.

     
  • `Moe`
    2004-07-07

    Logged In: YES
    user_id=1045474

    Another version. Now supports aspect ratio correction. Looks ugly, but
    some people seem to like that.

     
  • Sjoerd
    2004-07-10

    Logged In: YES
    user_id=153968

    Hmm, the adlib patches might be nice too. We might also be
    able to split each waveform into a table of 4 pointers and
    use the highest 2 bits from the frequency as an index there,
    with the pointers pointing to small pieces of the sine wave.
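One possible reading of this suggestion, sketched under assumptions: each waveform becomes four pointers to shared quarter-period tables, and the top 2 bits of a running phase counter (taken here as what "frequency" refers to) select the quarter. The table sizes, amplitude, and the names `Pieces`, `Waveform` and `sample` are hypothetical and not from ymf262.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kQuarterBits = 8;                  // assumed resolution
constexpr int kQuarterLen  = 1 << kQuarterBits;  // samples per quarter period
constexpr int kPhaseBits   = kQuarterBits + 2;   // top 2 bits pick the quarter

// Small shared pieces of the sine wave, reused by all waveforms.
struct Pieces {
    std::array<std::int16_t, kQuarterLen> rise, fall, neg_fall, neg_rise, zero;
    Pieces() {
        const double half_pi = std::acos(-1.0) / 2.0;
        for (int i = 0; i < kQuarterLen; ++i) {
            double x = half_pi * i / kQuarterLen;
            rise[i]     = static_cast<std::int16_t>(std::lround(std::sin(x) * 4095));
            fall[i]     = static_cast<std::int16_t>(std::lround(std::cos(x) * 4095));
            neg_fall[i] = static_cast<std::int16_t>(-rise[i]);  // 0 down to -peak
            neg_rise[i] = static_cast<std::int16_t>(-fall[i]);  // -peak back to 0
            zero[i]     = 0;
        }
    }
};

// A waveform is just four pointers, one per quarter of the period.
using Waveform = std::array<const std::int16_t*, 4>;

static std::int16_t sample(const Waveform& w, std::uint32_t phase) {
    phase &= (1u << kPhaseBits) - 1;             // wrap to one period
    return w[phase >> kQuarterBits][phase & (kQuarterLen - 1)];
}
```

A full sine would then be `{rise, fall, neg_fall, neg_rise}` and a half sine `{rise, fall, zero, zero}`, so several waveforms share the same few small tables instead of each carrying a full-period copy.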

     
  • `Moe`
    2004-08-04

    Logged In: YES
    user_id=1045474

    Here's the next version, for current CVS.

    New feature: adaptive threshold calculation. It now scans the
    surroundings of the current pixel to find the maximum and minimum
    difference, and then sets the actual edge detect threshold to some
    average value (the exact averaging ratio is configurable). The old
    (static) threshold variant is also present, as both together give
    noticeably better results.

    See forums for user manual.

    The adaptive threshold can be turned off via a #define, but its CPU cost
    is fairly low compared to the static algorithm. The MMX code is still
    inactive, but the user reports in the forum indicate it could have a big
    effect on newer CPUs. I hope to get access to a bigger devel machine soon.
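The adaptive threshold described above could be sketched roughly as follows: take the differences between the current pixel and its neighbours, find the minimum and maximum, and place the threshold between them according to a configurable ratio. The 8-neighbour window, the percent-based ratio, and the name `adaptive_threshold` are assumptions; the patch's actual window and averaging may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// diffs: assumed to hold the colour distance between the current pixel
//        and each of its 8 neighbours.
// ratio_percent: 0 selects the minimum difference, 100 the maximum,
//        values in between interpolate linearly.
static int adaptive_threshold(const int (&diffs)[8], int ratio_percent) {
    auto mm = std::minmax_element(std::begin(diffs), std::end(diffs));
    int lo = *mm.first;
    int hi = *mm.second;
    return lo + (hi - lo) * ratio_percent / 100;
}
```

A pixel pair would then count as an edge when its difference exceeds this local value; how that result is combined with the static threshold is not specified here and is left out of the sketch.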

     