#127 xscpu64 - horrible framerate in Metal Dust after 2.4.1

general
closed
nobody
None
2013-06-16
2013-01-01
No

After upgrading to 2.4.1, I get a bad framerate in-game in Metal Dust. I suspect this is because of the change-over to the "sc" VIC-II; yesterday's build works perfectly. Perhaps the only way to solve this is to make a separate non-sc version of xscpu64.

Discussion

  • Soci/Singular

    Soci/Singular - 2013-01-01

    I have to admit I'm not a great fan of x64sc, mostly because of its performance problems and because the difference is very minor for the C64 emulation.

    However the badline handling hacks and the fact that the complete VICII emulation is partly 1 cycle off just because the 6502 CPU core implementation was 1 cycle off gave me real headaches. To make this work the rest of the system like CIA and SID emulation is hacked around to compensate for this 1 cycle difference in the x64 cpu core... Took some days to find it out ;)

    To cut it short, there was no other reasonably complex way to get an accurate emulation without going with VICIIsc. Previous builds were good enough to run some non-timing-intensive stuff, but were very inaccurate unless the SCPU was switched back to 1 MHz. The handling of "badlines" while running from SRAM at 20 MHz is one example of where it was wrong.

    Running at 20 MHz makes it possible to use every single 1 MHz cycle to do something with the VICII, which made me realise that fixing all the bugs coming up now (like this 1 cycle 985/1.02 MHz shift, which is not trivial to handle with 20 MHz cycles) is almost the equivalent of a complete rewrite.

    The design of the single cycle VICII emulation is not bad, and I like its simplicity. But the major problem with it is that it's called once every single 1 MHz cycle to draw 8 pixels. To make this worse, as far as I know it does not even have a video cache, so it draws everything every time.

    There's no practical option to go with the old VICII core and have an accurate 20 MHz emulation. Because I also hate the current performance something must be done of course.

    Sometime later I'll probably have to hack around in VICIIsc too, to make it at least similar to the drive emulation. By that I mean collecting VICII cycles together whenever possible and not executing them so heavily intermixed with the main CPU emulation. This could give some performance improvement. And having a simple cache would help as well.
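
A rough sketch of the "collect cycles together" idea, written in Python for brevity rather than in VICE's C. The chip is only advanced when the CPU actually touches it, so the pending cycles are emulated in one batch. All names here (`LazyVideoChip`, `run_until`, `run_cpu`) are hypothetical and not taken from the VICE sources; only the figure of 63 cycles per PAL raster line is a real C64 value. The sketch deliberately ignores interrupts raised by the chip itself.

```python
CYCLES_PER_LINE = 63  # PAL C64 raster line length in 1 MHz cycles


class LazyVideoChip:
    """Hypothetical stand-in for a video chip that is advanced on demand."""

    def __init__(self):
        self.clk = 0        # last cycle the chip has been emulated up to
        self.raster_x = 0   # toy internal state: cycle within the current line
        self.batches = 0    # how often the chip emulation had to be entered

    def run_until(self, target_clk):
        """Emulate all pending cycles in a single batch."""
        pending = target_clk - self.clk
        if pending <= 0:
            return
        self.raster_x = (self.raster_x + pending) % CYCLES_PER_LINE
        self.clk = target_clk
        self.batches += 1

    def read_register(self, cpu_clk):
        """A CPU access forces the chip to catch up first."""
        self.run_until(cpu_clk)
        return self.raster_x


def run_cpu(chip, total_cycles, access_every):
    """Toy CPU loop that touches the chip only every `access_every` cycles."""
    for clk in range(1, total_cycles + 1):
        if clk % access_every == 0:
            chip.read_register(clk)
    chip.run_until(total_cycles)  # final catch-up at the end of the run
```

With `run_cpu(LazyVideoChip(), 630, 100)` the chip emulation is entered 7 times instead of 630, while its state ends up the same as if it had been stepped every cycle.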

    Unfortunately my holidays are over now and free time is short again ;-( I'll still have to fix some very minor timing problems in the SCPU emulation and finish some other projects before I can look at VICIIsc at all. I hope someone else could have a look earlier, but so far I haven't seen much optimization in that area ;-( I would not mind at all if someone implements the ideas above ;-)

    All in all there's no easy fix for this performance problem. Maybe sometime later.

     
  • Soci/Singular

    Soci/Singular - 2013-01-01
    • status: open --> open-accepted
     
  • derrick inksley

    derrick inksley - 2013-01-01

    soci: thanks for the explanation. I _very much_ appreciate everything you have done so far, and, of course, anything you can do to help with the optimization of viciisc. I wish there were some way I could help out, but my skills are rather limited :(

     
  • Anonymous - 2013-01-02

    > The design of the single cycle VICII emulation is not bad, and I like its
    > simplicity. But the major problem with it is that it's called every single
    > 1 MHz cycle once to draw 8 pixels.

    > Thereby I mean to collect VICII
    > cycles together whenever possible and not execute them that much intermixed
    > with the main cpu emulation.

    Before starting to hack on this, please consider how to emulate sprite collision interrupts in a (1 MHz) cycle exact manner without cycle-by-cycle drawing. Then add the corner cases of the sprite bug area, Krestage 3 9th sprite, etc...

    I doubt much of the current simplicity would survive such optimization, but I'd be happy to be proven wrong. I'll claim the plans are rather premature while emulation bugs are still open [1][2].

    > I have to admit I'm not a great fan of x64sc, mostly because of its
    > performance problems and that the difference is very minor for the C64
    > emulation.

    "Those who sacrifice accuracy for performance deserve neither" ;)

    [1] https://sourceforge.net/tracker/?func=detail&aid=3325466&group_id=223021&atid=1057617
    [2] https://sourceforge.net/tracker/?func=detail&aid=3325426&group_id=223021&atid=1057617

     
  • Daniel Kahlin

    Daniel Kahlin - 2013-01-02

    I agree with nojoopa. The whole point with x64sc/viciisc is that the simplicity of the implementation gives accuracy at the cost of performance. The simplicity in this case equals modelling the hardware piece by piece in the way it actually behaves. This also eases maintenance and implementation of new discoveries.

    In x64 there are rather complex special cases to handle many implementation details. You get much less inherent accuracy but gain performance. Maintenance and integration of new features gets much harder.

     
  • gpz

    gpz - 2013-01-02

    somehow that reminds me of why i wasnt really happy about the changes in the cartridge system...

    you have already said it yourself though, there is no reasonable way to get a really accurate emulation other than doing what x64sc does. personally my vote even goes for moving all emulators to the new architecture and completely killing the old ones, just to get rid of the mess and maintenance nightmare. just looking into the CIA and the magic required to make x64 work makes my head explode =P

     
  • Soci/Singular

    Soci/Singular - 2013-01-04

    Hmm. I think you've imagined that I'll go and plant in heaps of special cases which turns VICIIsc into an unmaintainable forest of ifs/elses which can only be fixed by a clearcut. No, actually not.

    What I planned is to have some heuristics in scpu64 to determine when it's safe to run a bunch of VICIIsc cycles together, independent of the main CPU. Or better, when it's safe to skip a few cycles completely and just update the state to perform the time warp before calling it again.

    After all not all screens are built up by using special effects. And those which are would be still run cycle-by-cycle inline in the main CPU to have as much accuracy as possible.

    Speed and accuracy are not always either/or, only when we talk about the special cases, but then I choose the latter one of course.

     
  • Anonymous - 2013-01-05

    > Hmm. I think you've imagined that I'll go and plant in heaps of
    > special cases which turns VICIIsc into an unmaintainable forest
    > of ifs/elses which can only be fixed by a clearcut. No, actually not.

    Correctly guessed. Good to hear it's not the case. Please consider these rants as cautionary pointers on some of the pitfalls in such optimization plans.

    > What I planned is to have some heuristics in scpu64 to determine when
    > it's safe to run a bunch of VICIIsc cycles together independent of the main CPU.

    The short answer (for regular C64): only when the CPU is halted. In these cases the (x64sc) CPU core calls vicii_steal_cycles.

    For the non-halted case, there are multiple reasons why the non-cycle-based plan fails. The most obvious reason is that the VIC-II can cause interrupts at practically unpredictable times (the sprite collisions mentioned before, but also lightpen interrupt retriggering at frame boundary). A less obvious reason is the fetches performed by the VIC-II chip (see VICII/gfxfetch/ in the SVN testprog repo).
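
The interrupt objection can be illustrated with a toy model, again in Python and with purely hypothetical names (`cpu_sees_irq_at` and its parameters are not VICE functions). The chip raises its IRQ line at `collision_cycle`, which the CPU cannot know in advance; if the chip is only advanced in batches, the CPU notices the IRQ late.

```python
def cpu_sees_irq_at(collision_cycle, batch_size, total_cycles=1000):
    """Return the CPU cycle at which the IRQ line is first observed as set.

    batch_size == 1 models cycle-by-cycle emulation; larger values model
    running the chip in batches of that many cycles.
    """
    chip_clk = 0       # last cycle the chip has been emulated up to
    irq_line = False
    for cpu_clk in range(1, total_cycles + 1):
        # The chip is only advanced once a full batch of cycles is pending.
        if cpu_clk - chip_clk >= batch_size:
            for c in range(chip_clk + 1, cpu_clk + 1):
                if c == collision_cycle:
                    irq_line = True  # e.g. a sprite collision happened here
            chip_clk = cpu_clk
        if irq_line:
            return cpu_clk
    return None
```

For a collision at cycle 13, `cpu_sees_irq_at(13, 1)` returns 13, while `cpu_sees_irq_at(13, 8)` returns 16: the batched variant delivers the interrupt three cycles late, which is enough to break cycle-exact code.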

    x64 is a fine example of the "run a bunch of cycles at once" approach. It's fast and surprisingly good, but some special cases break it. Clearly there's some compromise in between, but I doubt cycle-by-cycle can be relaxed without degrading accuracy.

    (Note that xvic, which shares the CPU core with x64sc, does cycle-by-cycle calls but does the drawing only at the end of each line. The lack of interrupts and sprites makes this possible without accuracy penalties. I assume the performance is good enough due to lack of complaints.)
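
The "cycle-by-cycle calls, drawing only at the end of each line" arrangement can be sketched roughly as follows. This is a toy in Python, not modelled on the real xvic code: the class `LineBufferedChip`, the one-byte-per-cycle fetch, and the 8 pixels per byte are all assumptions made for illustration.

```python
class LineBufferedChip:
    """Toy chip: state is updated every cycle, pixels are drawn per line."""

    def __init__(self, cycles_per_line):
        self.cycles_per_line = cycles_per_line
        self.fetched = []    # bytes fetched so far on the current line
        self.frame = []      # rendered lines, each a list of pixel bits
        self.draw_calls = 0  # how often the (expensive) drawing code ran

    def cycle(self, fetch_byte):
        """Called once per cycle: record the fetch, but do not draw yet."""
        self.fetched.append(fetch_byte)
        if len(self.fetched) == self.cycles_per_line:
            self._draw_line()

    def _draw_line(self):
        """Convert the whole line's fetches to pixels in one go."""
        pixels = []
        for byte in self.fetched:
            # 8 pixels per fetched byte, most significant bit first
            pixels.extend((byte >> bit) & 1 for bit in range(7, -1, -1))
        self.frame.append(pixels)
        self.fetched = []
        self.draw_calls += 1
```

The per-cycle call stays cheap because it only appends to a buffer. This works in the sketch only because nothing in it depends on the pixels before the line is complete, which is the same condition the post gives for xvic (no interrupts, no sprites).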

    I can't imagine how to construct such heuristics without hacking src/viciisc/ or inspecting a nontrivial amount of its internal state elsewhere.

    > Or better when it's safe to skip a few cycles completely and just
    > update the state to perform the time warp before calling it again.

    IMHO the removal of clock rewinding was one of the better x64sc accomplishments.

    > After all not all screens are built up by using special effects.
    > And those which are would be still run cycle-by-cycle inline in the
    > main CPU to have as much accuracy as possible.

    I claim that the heuristics to make the distinction properly are complicated enough that they end up looking like src/vicii/.

    > Speed and accuracy are not always either/or, only when we talk
    > about the special cases, but then I choose the latter one of course.

    Yes, but maintainability should be considered. A video cache may seem like low-hanging fruit, but compare how it's used in x64 & xvic and consider the sprite special cases mentioned before and the way *-draw.c operate.
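
In its simplest form, a video cache boils down to something like the sketch below: a line is redrawn only if its inputs differ from what was drawn last frame. This is a hypothetical Python illustration (`LineCache` and `submit` are invented names, not the cache used in x64 or xvic). The catch is that `line_state` has to capture everything that can affect the output of that line, which is exactly where the sprite special cases make things complicated.

```python
class LineCache:
    """Toy per-line cache keyed on the inputs that produced each line."""

    def __init__(self, lines):
        self.cached = [None] * lines  # last drawn inputs for every line
        self.redraws = 0

    def submit(self, line_no, line_state):
        """Return True if the line must be redrawn, False on a cache hit."""
        key = tuple(line_state)
        if self.cached[line_no] == key:
            return False              # unchanged since last frame: skip
        self.cached[line_no] = key
        self.redraws += 1
        return True
```

A static screen then costs one comparison per line per frame instead of a full redraw, but any input missing from `line_state` turns into a stale-picture bug.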

    As for the "bug" itself: keep the .exe that was fast enough and rename it to "xmetaldust.exe" and 95% of the users are happy ;)

     
  • gpz

    gpz - 2013-01-13

    i am moving this to feature requests, since the described problem is pretty much the logical consequence of a design decision and not an actual bug

     
  • gpz

    gpz - 2013-06-16
    • status: open-accepted --> closed
    • Group: --> general
     
  • gpz

    gpz - 2013-06-16

    .... that said, i am closing it now. "please make it faster" isnt even really a feature request - its implied anyway =)

     
