This is a patch to add a Hq2x scaler to dosbox. See
http://www.hiend3d.com/hq2x.html for a generic description.
I've finished a first, experimental version of hq2x for dosbox. It
still has a fair number of limitations, and I need to do some
serious profiling to get the optimal cache-vs-cpu usage.
Moreover, it has only been tested with GCC. The MMX code is disabled
because the MMX-less variant is (for now) faster.
Usage:
patch, recompile (with maximum optimizations!), set
scaler=hq2x. Use Ctrl-F3/Ctrl-F4 to turn the hq2x trigger value
up and down.
Limitation: Only works in 16 or 32 bit BGR mode (this includes
opengl, but not overlay - support for "everything" is on the todo).
Speed is roughly like advmame2x, but there are still some
sound-related issues (due to cache thrashing, it seems).
I'm still profiling and experimenting, testing new variants and
everything, but it is 100% stable and looks so cool.
The patch is against dosbox CVS as of today (2004-05-19).
I will update the patch (mainly render_hq2x.*) as improvements
are made.
Logged In: YES
user_id=535630
You need to submit the diff again;
uploading files while submitting is broken.
Logged In: YES
user_id=153968
Needs an update for the current cvs.
Logged In: YES
user_id=1045474
I have updated the patch for today's CVS. The recent changes in the
scaler code made life a lot easier for me.
The hq2x code can now be considered true beta quality: It is supposed
to work stably and correctly in all cases, it just needs testing as I can
only do surface rendering with 16 or 32 bpp.
(I still left some gcc'isms inside, sorry. Non-GCC users need two lines:
#define __attribute__(x)
#define __builtin_expect(x,y) (x)
)
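For reference, a minimal sketch of how such a GCC-only branch hint could be wrapped so that other compilers see a plain expression. The macro names LIKELY/UNLIKELY and the helper count_changed are hypothetical and chosen for illustration only; they are not taken from the patch.

```cpp
#include <cassert>

// Hypothetical wrapper macros (not the patch's own names).
// With GCC the hint is passed to __builtin_expect; elsewhere it
// collapses to the bare condition, as the two #defines above do.
#if defined(__GNUC__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (!!(x))
#  define UNLIKELY(x) (!!(x))
#endif

// Illustrative use: count pixels that differ between two rows,
// assuming that differing pixels are the rare case.
inline int count_changed(const unsigned short* a,
                         const unsigned short* b, int n) {
    assert(n >= 0);
    int changed = 0;
    for (int i = 0; i < n; ++i) {
        if (UNLIKELY(a[i] != b[i])) ++changed;
    }
    return changed;
}
```

The hint only affects code layout and branch prediction; the result is the same with or without it.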
There is also a small proposed patch to ymf262.c inside. According to
oprofile, it reduces CPU load of the opl2/3 code by about 50%. Stuff
commented out with "#ifdef SMALL_CACHE" might or might not
improve things on some CPUs, but I'm lacking definite numbers for
now.
Logged In: YES
user_id=153968
Hmm, is there still any need to keep the MMX distance
calculation code in? Since we'll probably only ever need 8bpp
input sources, you can probably always do it faster with a
distance lookup table.
Logged In: YES
user_id=1045474
Oops, made a few mistakes. The same patch again, minus one
segfault, plus one "it actually compiles" and with added non-gcc
workiness.
I am keeping the MMX code, as the distance table is quite big. Since I
blew up my p3 laptop shortly after getting it, I haven't yet found time to
see if SSE's mmx can do better than the current code. The MMX code is
close enough in runtime, so some of the newer unpacking instructions
may pay off. The reduced cache thrashing may well pay off (see my
small ymf262.cpp patch, same principle).
The mmx code is all #defined away at the moment, but I'd like it to stay
in there until I get a better devel box and can decide on facts.
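To make the size trade-off concrete, here is a rough sketch of the lookup-table alternative for 8bpp sources: one byte per palette pair, so 256 x 256 = 64 KB. The names Rgb and DistanceTable are hypothetical, and the sum-of-absolute-differences metric with a single threshold is a deliberately simple stand-in, not the comparison the patch performs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };  // hypothetical palette entry

// Precomputes "do these two palette indices differ noticeably?"
// for every pair of 8-bit indices. Must be rebuilt whenever the
// palette or the threshold changes.
class DistanceTable {
public:
    DistanceTable(const Rgb (&palette)[256], int threshold)
        : differs_(256 * 256) {
        for (int i = 0; i < 256; ++i) {
            for (int j = 0; j < 256; ++j) {
                // Illustrative metric only: sum of absolute RGB differences.
                int d = std::abs(palette[i].r - palette[j].r)
                      + std::abs(palette[i].g - palette[j].g)
                      + std::abs(palette[i].b - palette[j].b);
                differs_[i * 256 + j] = d > threshold ? 1 : 0;
            }
        }
    }

    // One memory read per comparison instead of arithmetic.
    bool differs(std::uint8_t a, std::uint8_t b) const {
        return differs_[a * 256u + b] != 0;
    }

private:
    std::vector<std::uint8_t> differs_;  // 65536 entries
};
```

Packing the entries as bits would shrink this to 8 KB at the cost of a shift and mask per lookup; which variant is friendlier to the cache would need measuring.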
Logged In: YES
user_id=1045474
Oh, and I am pondering an adaptive distance calculation to catch even
more edges. I have ideas how to solve that with the table, but it's quite
possible MMX will win even more in that case. I fear my devel box is too
slow for that, however, so don't hold your breath.
Logged In: YES
user_id=1045474
Another version, hopefully fixing the MS VS Net problem reported in
the forums.
Logged In: YES
user_id=1045474
Another version. Now supports aspect ratio correction. Looks ugly, but
some people seem to like that.
Logged In: YES
user_id=153968
Hmm, the adlib patches might be nice too, though. You might also be
able to split each waveform into a table of 4 pointers and
use the highest 2 bits from the frequency as an index there,
with the pointers pointing to small pieces of the sine wave.
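A rough sketch of that idea, under several assumptions: the index is taken from the top two bits of the phase counter (the comment says "frequency"), each quarter holds 256 linear-amplitude samples, and the four waveforms are the usual OPL2 set (sine, half sine, absolute sine, pulse sine). The name WaveSet and all constants are hypothetical; the tables in ymf262.c use a different format.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

const int kQuarterBits = 8;                 // assumed: 256 samples per quarter
const int kQuarterLen  = 1 << kQuarterBits;

// Every waveform is described by 4 pointers into a small shared pool
// of quarter-wave pieces, so the hot data stays compact.
struct WaveSet {
    std::int16_t q[4][kQuarterLen];   // the four quarters of one sine period
    std::int16_t zero[kQuarterLen];   // silence
    const std::int16_t* wave[4][4];   // wave[waveform][quadrant]

    WaveSet() {
        const double kPi = 3.14159265358979323846;
        for (int quad = 0; quad < 4; ++quad) {
            for (int i = 0; i < kQuarterLen; ++i) {
                double phase = (quad * kQuarterLen + i) * (2.0 * kPi)
                             / (4 * kQuarterLen);
                q[quad][i] = static_cast<std::int16_t>(
                    std::lround(std::sin(phase) * 32767.0));
            }
        }
        for (int i = 0; i < kQuarterLen; ++i) zero[i] = 0;

        const std::int16_t* layout[4][4] = {
            { q[0], q[1], q[2], q[3] },   // 0: full sine
            { q[0], q[1], zero, zero },   // 1: half sine
            { q[0], q[1], q[0], q[1] },   // 2: absolute sine
            { q[0], zero, q[0], zero },   // 3: pulse sine
        };
        for (int w = 0; w < 4; ++w)
            for (int k = 0; k < 4; ++k) wave[w][k] = layout[w][k];
    }

    // The pointers refer to this object's own arrays, so copying is disabled.
    WaveSet(const WaveSet&) = delete;
    WaveSet& operator=(const WaveSet&) = delete;

    // phase is a 10-bit counter here: top 2 bits pick the quadrant,
    // the low 8 bits index into the selected piece.
    std::int16_t sample(int waveform, unsigned phase) const {
        assert(waveform >= 0 && waveform < 4);
        unsigned quadrant = (phase >> kQuarterBits) & 3u;
        unsigned offset   = phase & (kQuarterLen - 1);
        return wave[waveform][quadrant][offset];
    }
};
```

With this layout all four waveforms share five 512-byte pieces instead of needing four full-period tables.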
Logged In: YES
user_id=1045474
Here's the next version, for current CVS.
New features: adaptive threshold calculation. It now scans the
surroundings of the current pixel to find the maximum and minimum
difference, and then sets the actual edge detect threshold to some
average value (the exact averaging ratio is configurable). The old
(static) threshold variant is also present, as both together give
noticeably better results.
See forums for user manual.
It can be turned off via a #define, but the CPU cost is fairly low
compared to the static algorithm. The mmx code is still inactive, but
the user reports in the forum indicate it could have a big effect on
newer CPUs. I hope to get at a bigger devel machine soon.
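The adaptive threshold described above can be sketched roughly as follows. This is a simplified illustration, not the patch's code: it assumes a 3x3 neighbourhood, 8-bit grayscale pixels, and an averaging ratio given in 1/256 steps. The names pixel_diff, adaptive_threshold and blend256 are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Illustrative difference between two 8-bit gray values.
inline int pixel_diff(int a, int b) { return std::abs(a - b); }

// Scans the 8 neighbours of (x, y), finds the smallest and largest
// difference to the centre pixel, and returns a threshold between them.
// blend256 = 0 yields the minimum, 256 the maximum, 128 the midpoint.
// The caller must keep (x, y) off the image border.
int adaptive_threshold(const unsigned char* img, int width, int height,
                       int x, int y, int blend256) {
    assert(x > 0 && y > 0 && x < width - 1 && y < height - 1);
    assert(blend256 >= 0 && blend256 <= 256);
    (void)height;  // only used by the assertion above

    const int centre = img[y * width + x];
    int lo = 255;
    int hi = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;  // skip the centre itself
            int d = pixel_diff(centre, img[(y + dy) * width + (x + dx)]);
            lo = std::min(lo, d);
            hi = std::max(hi, d);
        }
    }
    return lo + ((hi - lo) * blend256) / 256;
}
```

A neighbour would then count as an edge when its difference exceeds the returned value; how the patch combines this with the static threshold is not shown here.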
Logged In: YES
user_id=1045474
Just a quick update: Looks like I am finally getting a decent box, so
expect some performance improvements for newer CPUs (mine will
probably be an athlon64) in the near future.
Also, the longer I use the current patch (and compare it to stock hq2x
as found in, e.g., scummvm), the more I like it. I'm quite confident that
it won't change anymore (feature-wise). Hq3x/4x may appear, if they
are noticeably better than just using 640x400 fullscreen - I'll give that a
shot.
Logged In: YES
user_id=1045474
So here it is, the latest variant.
This time, the patch includes 3 things at once: The already well-known
software Hq2x implementation, the hardware OpenGL-HQ scaler and
16-bit VESA SVGA support.
I didn't break them up as they depend on each other.
Changes in Hq2x:
- it now follows the template-style of the other scalers, though it's still
in its own file (two actually) in order to get the important
320-pixel-source-width optimization
- it has become a little bit slower due to using the more generic pixel
depth conversion; previous code has been better optimized but less
flexible
- the 32-bit interpolation optimization has been added to all render
templates, see comment in render_templates.h
- the GCC way of marking conditionals as quite unlikely has been
added for all scalers, correctly #ifdef'ed
- threshold values are shared with the OpenGL-HQ code
About OpenGL-HQ:
- all OpenGL code has been broken out of sdlmain.cpp and placed
into sdl_opengl.cpp
- opengl rendering is threaded as some OpenGL calls seem to block
the process until the hardware is done, which defies the purpose of
hardware acceleration
- see the comment in sdl_opengl.cpp for some general information
- IMPORTANT: the code NEEDS SDL 1.3 (which is the current CVS),
as it uses the new platform-independent render targets; if you don't
have SDL 1.3 installed, the OpenGL-HQ code is automatically
disabled; the traditional OpenGL modes continue to work, of course
- rendering is done in 3 passes; I've tried hard to reduce that to just 2
passes, but joining pass 1+2 is impossible on my hardware, and
joining 2+3 is slower than doing 3 passes, as pass 2 uses the source
resolution and pass 3 runs at the destination resolution.
About 16-bit VESA SVGA:
- I have absolutely no idea what I did. It's really just cut'n'paste with
some educated guessing.
- It works fine. I can finally play Schleichfahrt, even speed is
sort-of-acceptable.
- It is untested with anything else.
- OpenGL output will try to exploit hardware 16 bit support.
- adding SBPP=16 required small changes to the scaler templates
- I haven't added anything but normal scaler yet, I see no point in
others, as programs using 16bpp probably need a lot of performance
themselves
- if you own something >4GHz and think you could spare some cycles
for scaling, OpenGL-HQ is probably your best option
The code can be considered stable. I've been using it for quite a while and
haven't found any need for modifications.
If anything breaks, send patches.
Have fun!
Logged In: YES
user_id=1045474
Just an additional comment: You can apply the patch without SDL 1.3.
You will still get the updated (better-looking) Hq2x and the 16bit VESA
support, but OpenGL-HQ will silently be disabled and multithreaded
rendering won't happen either.
Logged In: YES
user_id=1039189
hey moe,
On trying to build with the standard libSDL 1.2.8 that I've
been using it would get stuck because certain OPENGL
functions were not in 1.2.8 or are different in the 1.3.x
branch. So in order to build with OPENGL at all you must
have libSDL 1.3.x branch otherwise it fails building.
/Ieremiou
Logged In: YES
user_id=1045474
Thanks for the report. I'll go and debug it on windows soon, resolving the crash reported in the forums and the 1.2 problem.
Logged In: YES
user_id=1045474
Another month goes by, another version is out.
Changes:
I've also attached my win32 build including SDL-1.3 dll. Creating your own in mingw is trivial: Fetch the sources from SDL CVS (you have to use the branch option to get the 1.3 branch), optionally search for the directx fix in the forums, ./configure; make; make install.
Logged In: YES
user_id=1045474
sourceforge seems to have problems with file uploads. Get the files here:
http://garni.ch/~jwalt/dosbox-openglhq.diff [179k]
http://garni.ch/~jwalt/dosbox-openglhq-win32.zip [4.3M]
(sorry, seems like I forgot to strip the binary ;)
Logged In: YES
user_id=1045474
Uploads seem to work again, so here is the patch, in two different variants:
If you are a user who wants to compile his own full-featured version, use dosbox-fullhq.diff. It includes hq2x software-scaling, opengl-hq hardware-scaling and VESA 16bit support.
If you are a dosbox maintainer (hello qbix ;) ), you can use the three separate patches and integrate the ones that you already like.
As I've written before, the three patches overlap, so you can't apply all three without conflicts, that's why I provide the all-in-one patch. I will update the remaining patches if some of the code is included in CVS.
Today's patches already contain some cosmetic fixes and a possible fix for the VESA issue reported in the forums. They no longer contain the adlib optimization; I will open a separate item for that.
Logged In: YES
user_id=1045474
Here are two new patches:
The OpenGL-HQ patch has officially been discontinued. The SDL version is much better (simpler, faster, looks the same). It now includes a patch to have it enabled via dosbox.conf. Get it at http://garni.ch/Software/dosbox/
hq2x simple version
VESA VBE 16-bit support