From: Aja H. <aja...@gm...> - 2014-02-12 11:26:44

Hi,

We are still working on our machine learning algorithms using Stella. Currently, in my project on Montezuma's Revenge, I need to know which bits of RAM correspond to taking the key, passing the door, etc. Is it possible to interpret the 1024 RAM bits in Montezuma's Revenge?

Thanks,
Aja Huang
From: Aja H. <aja...@gm...> - 2013-06-04 17:22:13

Sorry, I was too quick to report it. The caching works with pong and breakout but breaks some other games.

This might be my last effort to speed up the emulator, since it is getting quite difficult. Adding up all our improvements, we got a 46% speed-up overall. We will certainly let you know if we successfully update to a new version of Stella.

Thank you for the helpful responses.

Kind regards,
Aja Huang

2013/6/4 Aja Huang <aja...@gm...>

> I got an interesting improvement: Caching memory access for TIA::peek and TIA::poke would work. I did this
>
>     class TIA
>     {
>       ..
>       private:
>         uInt8 memo[65536];  // Initialized to {0}.
>     };
>
>     uInt8 TIA::peek(uInt16 addr)
>     {
>       // Update frame to current color clock before we look at anything!
>       updateFrame(mySystem->cycles() * 3);
>
>       // If the value has been cached, return it immediately.
>       if (memo[addr] != 0)
>         return memo[addr];
>       ..
>     }
>
>     void TIA::poke(uInt16 addr, uInt8 value)
>     {
>       // Caching the value in 'addr'.
>       memo[addr] = value;
>
>       addr = addr & 0x003f;
>       ..
>     }
>
> I let it run with pong and breakout, the values in the cache were always consistent. It is a 2% gain in speed, around 1 rollout / sec faster.
>
> Kind regards,
> Aja Huang
From: Aja H. <aja...@gm...> - 2013-06-04 16:26:04

I got an interesting improvement: Caching memory access for TIA::peek and TIA::poke would work. I did this

    class TIA
    {
      ..
      private:
        uInt8 memo[65536];  // Initialized to {0}.
    };

    uInt8 TIA::peek(uInt16 addr)
    {
      // Update frame to current color clock before we look at anything!
      updateFrame(mySystem->cycles() * 3);

      // If the value has been cached, return it immediately.
      if (memo[addr] != 0)
        return memo[addr];
      ..
    }

    void TIA::poke(uInt16 addr, uInt8 value)
    {
      // Caching the value in 'addr'.
      memo[addr] = value;

      addr = addr & 0x003f;
      ..
    }

I let it run with pong and breakout, the values in the cache were always consistent. It is a 2% gain in speed, around 1 rollout / sec faster.

Kind regards,
Aja Huang
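One possible reason the memo shortcut in the message above is fragile, sketched under assumptions rather than taken from Stella's source: on the TIA, a write and a read at the same address reach different registers (address 0x00 is VSYNC when written but the CXM0P collision latch when read), and 0 is a legitimate register value, so it cannot double as an "empty" marker. `ToyTIA` below is a hypothetical toy model, and the mirroring masks in it are assumed for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical toy device, NOT Stella's TIA class. It models only one fact:
// a write to address 0x00 targets VSYNC, while a read from address 0x00
// returns the CXM0P collision latch.
struct ToyTIA {
    std::uint8_t vsync = 0;                  // write-side register at 0x00
    std::uint8_t cxm0p = 0;                  // read-side collision latch at 0x00
    std::array<std::uint8_t, 65536> memo{};  // cache as in the message, 0 = "empty"

    void poke(std::uint16_t addr, std::uint8_t value) {
        memo[addr] = value;                  // caches the WRITTEN value
        if ((addr & 0x3f) == 0x00) vsync = value;   // assumed write mirroring
    }

    // Correct read: consults the read-side register only.
    std::uint8_t peekUncached(std::uint16_t addr) const {
        return ((addr & 0x0f) == 0x00) ? cxm0p : 0; // assumed read mirroring
    }

    // Read with the memo shortcut from the message.
    std::uint8_t peekCached(std::uint16_t addr) const {
        if (memo[addr] != 0) return memo[addr];
        return peekUncached(addr);
    }
};

// Returns true when the cached read disagrees with the real read.
inline bool cacheDisagrees() {
    ToyTIA tia;
    tia.poke(0x00, 0x02);   // the game starts vertical sync (VSYNC bit D1)
    tia.cxm0p = 0x80;       // afterwards the hardware latches a collision
    return tia.peekCached(0x00) != tia.peekUncached(0x00);
}
```

In this toy, `cacheDisagrees()` reports a mismatch because the cached read returns the value written to VSYNC instead of the collision latch. Games that rarely read the collision or input registers would not expose the difference, which may be why some titles appeared consistent.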
From: Aja H. <aja...@gm...> - 2013-06-04 13:13:17

2013/6/4 Stephen Anthony <sa6...@gm...>

> This code has been completely changed in the newer versions of Stella, so I can't offer any opinion on how changes will affect it. I can say that the reason it was removed was (a) it made changes more complex, and (b) it didn't work in all cases. The new code is probably a little slower than the old, but is actually correct and easier to follow.
>
> Also, wrt your last email on changing the sizes of various TIA lookup tables, that code is also due to be replaced over the next few releases. Basically, the TIA class has gone through (and continues to go through) a massive overhaul.

Thanks. I will take a look at the current code, and we will consider upgrading to a newer version of Stella at some point.

Kind regards,
Aja Huang
From: Stephen A. <sa6...@gm...> - 2013-06-04 12:29:38

On 04/06/13 08:46 AM, Aja Huang wrote:

> Hi Stephen,
>
> I have one idea and wonder what's your opinion. In TIA::updateFrameScanline this code updates the background
>
>     // Background
>     case 0x00:
>     case 0x00 | ScoreBit:
>     case 0x00 | PriorityBit:
>     case 0x00 | PriorityBit | ScoreBit:
>     {
>       memset(myFramePointer, myCOLUBK, clocksToUpdate);
>       break;
>     }
>
> I found it basically does memset in a chunk of memory at different locations, i.e.
>
>     memset from address 0 to 10, size 10
>     memset from address 11 to 15, size 5
>     memset from address 16 to 25, size 10
>
> Is it possible to copy the whole memory for just one time? For example,
>
>     memset from address 0 to 25, size 25

This code has been completely changed in the newer versions of Stella, so I can't offer any opinion on how changes will affect it. I can say that the reason it was removed was (a) it made changes more complex, and (b) it didn't work in all cases. The new code is probably a little slower than the old, but is actually correct and easier to follow.

Also, wrt your last email on changing the sizes of various TIA lookup tables, that code is also due to be replaced over the next few releases. Basically, the TIA class has gone through (and continues to go through) a massive overhaul.

Thanks,
Steve A.
Stella maintainer
From: Aja H. <aja...@gm...> - 2013-06-04 11:16:53

Hi Stephen,

I have one idea and wonder what's your opinion. In TIA::updateFrameScanline this code updates the background

    // Background
    case 0x00:
    case 0x00 | ScoreBit:
    case 0x00 | PriorityBit:
    case 0x00 | PriorityBit | ScoreBit:
    {
      memset(myFramePointer, myCOLUBK, clocksToUpdate);
      break;
    }

I found it basically does memset in a chunk of memory at different locations, i.e.

    memset from address 0 to 10, size 10
    memset from address 11 to 15, size 5
    memset from address 16 to 25, size 10

Is it possible to copy the whole memory for just one time? For example,

    memset from address 0 to 25, size 25

Kind regards,
Aja Huang
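The merging idea in this message could be sketched roughly as follows. This is only an illustration under assumptions: `RunFiller` is a hypothetical helper, not part of Stella, and merging is valid only when consecutive fills are adjacent and use the same colour value. On the TIA the background colour register can be rewritten between fills, so a real implementation would also have to flush whenever the colour changes, which the sketch does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: buffers adjacent same-colour fills and issues a single
// memset per merged run instead of one memset per request.
class RunFiller {
public:
    explicit RunFiller(std::uint8_t* buffer) : myBuffer(buffer) {}

    // Requests a fill of `count` bytes starting at offset `start`.
    void fill(std::size_t start, std::size_t count, std::uint8_t colour) {
        if (count == 0) return;
        const bool extends = myCount > 0 && colour == myColour &&
                             start == myStart + myCount;
        if (extends) {            // adjacent and same colour: grow the pending run
            myCount += count;
            return;
        }
        flush();                  // otherwise write out what is pending first
        myStart = start;
        myCount = count;
        myColour = colour;
    }

    // Writes the pending run, if any, with one memset call.
    void flush() {
        if (myCount == 0) return;
        std::memset(myBuffer + myStart, myColour, myCount);
        ++myMemsetCalls;
        myCount = 0;
    }

    std::size_t memsetCalls() const { return myMemsetCalls; }

private:
    std::uint8_t* myBuffer;
    std::size_t myStart = 0;
    std::size_t myCount = 0;
    std::uint8_t myColour = 0;
    std::size_t myMemsetCalls = 0;
};

// Replays the three fills from the message (as half-open ranges) and returns
// how many memset calls were actually issued.
inline std::size_t demoMergedCalls(std::uint8_t* frame) {
    RunFiller filler(frame);
    filler.fill(0, 10, 0x1e);
    filler.fill(10, 5, 0x1e);
    filler.fill(15, 10, 0x1e);
    filler.flush();
    return filler.memsetCalls();
}
```

Whether this is faster in practice is not obvious: the bookkeeping per request may cost as much as the small memset calls it avoids, so it would need to be measured.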
From: Aja H. <aja...@gm...> - 2013-05-31 16:20:23

Hi Stephen,

I got another 4% speed-up (2 simulations / sec faster) from a simple change: make all the table sizes in TIA powers of 2, i.e.

    ourBallMaskTable[4][4][320] => ourBallMaskTable[4][4][512]
    ourDisabledMaskTable[640] => ourDisabledMaskTable[1024]
    ourPlayerPositionResetWhenTable[8][160][160] => ourPlayerPositionResetWhenTable[8][256][256]
    ....

Maybe you would be interested in doing this in the current code.

Kind regards,
Aja Huang
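A plausible (but unverified) explanation for the gain is that power-of-two inner dimensions let the compiler replace index multiplications with shifts. The sketch below shows the index arithmetic and the memory cost of the padding, assuming one-byte table elements; the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Flat offset of element [i][j][k] in a table declared as T[d0][d1][d2].
constexpr std::size_t flatIndex(std::size_t i, std::size_t j, std::size_t k,
                                std::size_t d1, std::size_t d2) {
    return (i * d1 + j) * d2 + k;
}

// Same offset when d1 == d2 == 256: the multiplications become shifts.
// Requires j < 256 and k < 256.
constexpr std::size_t flatIndex256(std::size_t i, std::size_t j, std::size_t k) {
    return (((i << 8) | j) << 8) | k;
}

// Size of ourPlayerPositionResetWhenTable before and after padding,
// assuming one byte per element.
constexpr std::size_t kOriginalBytes = 8 * 160 * 160;  // 204,800
constexpr std::size_t kPaddedBytes   = 8 * 256 * 256;  // 524,288

static_assert(flatIndex(3, 100, 42, 256, 256) == flatIndex256(3, 100, 42),
              "shift form must match the multiply form");
```

The padded table is about 2.5 times larger, so any gain from cheaper indexing has to outweigh the extra cache pressure; the result is likely to depend on the compiler and CPU.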
From: Aja H. <aja...@gm...> - 2013-05-31 14:21:33

> You can remove the entire CartridgeAR class if you won't be using it (and all other parts of the code that refer to it). So you'd either need to edit Cart.cxx and remove references, or more easily, simply stub out the methods in CartridgeAR. This is needed for Supercharger ROMs, but if you won't be running them, then it's not a big deal to remove it.

Thanks for the advice. I got a few simulations / sec speed-up from removing them.

Today I tried another idea and measured 1.5% speed up (0.7 simulation / sec faster): use a look-up table in M6502::PS. The code

    inline void M6502::PS(uInt8 ps)
    {
      N = ps & 0x80;
      V = ps & 0x40;
      B = true;        // B = ps & 0x10;  The 6507's B flag always true
      D = ps & 0x08;
      I = ps & 0x04;
      notZ = !(ps & 0x02);
      C = ps & 0x01;
    }

was changed to

    inline void M6502::PS(uInt8 ps)
    {
      *PSPointer = PSLockupTable[ps];
    }

Where PSPointer is of type uint64_t* pointing to N, PSLockupTable[] precomputes the values of the flags N to C according to ps. It works by using the fact that member data in a class are contiguous in memory. To make it 8 bytes starting from N, I added a variable 'Unused' for padding.

    bool N;       // N flag for processor status register
    bool V;       // V flag for processor status register
    bool B;       // B flag for processor status register
    bool D;       // D flag for processor status register
    bool I;       // I flag for processor status register
    bool notZ;    // Z flag complement for processor status register
    bool C;       // C flag for processor status register
    bool Unused;  // Make it 8-bytes(uInt64_t) starting from N.

Kind regards,
Aja Huang
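The pointer cast described above depends on `bool` being one byte, on the members being laid out without gaps, and on writing `bool` objects through a `uint64_t*`, none of which the C++ standard guarantees. A more conservative variant of the same table idea, sketched here with hypothetical names (`StatusFlags`, `flagTable`, `flagsFor`), copies a small struct instead of a punned integer. It is an illustration only, not the code used in the thread, and its speed has not been measured.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// The seven flags from the message, grouped so they can be copied as a unit.
struct StatusFlags {
    bool N, V, B, D, I, notZ, C;
};

// Decodes one status byte exactly as the original M6502::PS body does.
inline StatusFlags decodePS(std::uint8_t ps) {
    StatusFlags f;
    f.N    = (ps & 0x80) != 0;
    f.V    = (ps & 0x40) != 0;
    f.B    = true;               // as in the message: B is always treated as set
    f.D    = (ps & 0x08) != 0;
    f.I    = (ps & 0x04) != 0;
    f.notZ = (ps & 0x02) == 0;
    f.C    = (ps & 0x01) != 0;
    return f;
}

// Builds the 256-entry table once, on first use.
inline const std::array<StatusFlags, 256>& flagTable() {
    static const std::array<StatusFlags, 256> table = [] {
        std::array<StatusFlags, 256> t{};
        for (std::size_t ps = 0; ps < t.size(); ++ps)
            t[ps] = decodePS(static_cast<std::uint8_t>(ps));
        return t;
    }();
    return table;
}

// Table-driven replacement for the bit tests: one indexed struct copy.
inline StatusFlags flagsFor(std::uint8_t ps) {
    return flagTable()[ps];
}
```

A compiler is free to implement the struct copy as a single wide load and store, so this form may recover most of the benefit without relying on layout assumptions.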
From: Stephen A. <sa6...@gm...> - 2013-05-30 11:13:37

On May 30, 2013 8:22:17 AM Aja Huang wrote:

> They maintain two variables: myNumberOfDistinctAccesses and myLastAddress. These two variables are required only for the class CartridgeAR, but in pong CartridgeAR is never called. Removing myNumberOfDistinctAccesses and myLastAddress in M6502High::peek and M6502High::poke is a solid gain of 5 simulations / sec. Now I wonder can I remove myNumberOfDistinctAccesses and myLastAddress, even CartridgeAR?

You can remove the entire CartridgeAR class if you won't be using it (and all other parts of the code that refer to it). So you'd either need to edit Cart.cxx and remove references, or more easily, simply stub out the methods in CartridgeAR. This is needed for Supercharger ROMs, but if you won't be running them, then it's not a big deal to remove it.

All that being said, the code should really be refactored a little so that stuff that only CartridgeAR needs is placed in that class. There's no reason it should slow down emulation for other ROM types. I'll look into this for a future release.

> 2. The class Event is not inherited by any other class, so I removed 'virtual' and inlined Event::get and Event::set. It is a very small gain since their run time is less than 0.1%.

The current code already does this, since Event is now Event.hxx only; the cxx file has been removed.

> I'm thinking another possibility: caching memory access for System::peek and System::poke.
>
> System::peek can call TIA::peek or M6532::peek.
> System::poke can call TIA::poke or M6532::poke.
>
> I tried to use a data structure memo[65536] to cache the memory access inside System::peek and System::poke but the values are not consistent. I'm wondering what's your opinion.

This is probably more complicated than it first looks. System peek and poke can query *any* of the underlying devices, not just TIA and RIOT. So that means also the Cart address space.

Also, when bankswitching comes into effect, the address space can change dynamically (i.e., parts that you thought were read-only are now read-write, etc). Tracking all this, and marking things as dirty at the right time, will probably get very complicated.

Steve A.
Stella maintainer
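The bankswitching concern can be illustrated with a toy model. Everything here is hypothetical (`ToyBankedCart` and `CachedReader` are not Stella classes): two 4K banks share one address window, and a read cache keyed only by address returns a stale byte after the bank changes unless it is invalidated at exactly that moment.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical two-bank cartridge; both banks appear in the same 4K window.
struct ToyBankedCart {
    std::array<std::array<std::uint8_t, 4096>, 2> banks{};
    int current = 0;  // index of the bank currently mapped in

    std::uint8_t peek(std::uint16_t addr) const {
        return banks[current][addr & 0x0fff];
    }
};

// Naive read cache keyed only by address, in the spirit of memo[65536].
struct CachedReader {
    explicit CachedReader(ToyBankedCart& c) : cart(c) {}

    std::uint8_t peek(std::uint16_t addr) {
        const std::uint16_t a = addr & 0x0fff;
        if (!valid[a]) {            // first access: fetch from the cartridge
            value[a] = cart.peek(addr);
            valid[a] = true;
        }
        return value[a];
    }

    // Drops every cached byte; must be called on each bank switch.
    void invalidate() { valid.fill(false); }

    ToyBankedCart& cart;
    std::array<std::uint8_t, 4096> value{};
    std::array<bool, 4096> valid{};
};

// Returns true if the cache went stale after a bank switch and became
// correct again once it was invalidated.
inline bool staleAfterBankSwitch() {
    ToyBankedCart cart;
    cart.banks[0][0x123] = 0xAA;
    cart.banks[1][0x123] = 0xBB;

    CachedReader reader(cart);
    reader.peek(0x1123);            // caches 0xAA from bank 0
    cart.current = 1;               // bankswitch: same address, different bank

    const bool stale = reader.peek(0x1123) != cart.peek(0x1123);
    reader.invalidate();
    return stale && reader.peek(0x1123) == cart.peek(0x1123);
}
```

In a real emulator the cache would need an invalidation hook in every bankswitching scheme, plus separate handling for device registers whose values change without any write, which is the complexity the reply warns about.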
From: Aja H. <aja...@gm...> - 2013-05-30 10:52:24

Hi Stephen,

I found a possible improvement and I'm wondering what's your opinion. M6502High::peek and M6502High::poke are the most frequently executed (inline) functions in the whole emulator according to the profiler. They maintain two variables: myNumberOfDistinctAccesses and myLastAddress. These two variables are required only for the class CartridgeAR, but in pong CartridgeAR is never called. Removing myNumberOfDistinctAccesses and myLastAddress in M6502High::peek and M6502High::poke is a solid gain of 5 simulations / sec. Now I wonder can I remove myNumberOfDistinctAccesses and myLastAddress, even CartridgeAR?

Here are some other results and a question:

1. Removing 'inline' in front of M6502High::poke and M6502High::peek made it about 1-2 simulations / sec slower.

2. The class Event is not inherited by any other class, so I removed 'virtual' and inlined Event::get and Event::set. It is a very small gain since their run time is less than 0.1%.

3. Both M6502::PS functions were inlined.

With 2. and 3. plus disabling the audio instructions, it is now at 76-77 simulations / sec, around 2.5% speed up.

I'm thinking another possibility: caching memory access for System::peek and System::poke.

System::peek can call TIA::peek or M6532::peek.
System::poke can call TIA::poke or M6532::poke.

I tried to use a data structure memo[65536] to cache the memory access inside System::peek and System::poke but the values are not consistent. I'm wondering what's your opinion.

Kind regards,
Aja Huang
From: Aja H. <aja...@gm...> - 2013-05-29 14:56:45

2013/5/29 Stephen Anthony <sa6...@gm...>

> One thing I forgot is, do you need the game to still function correctly wrt collision detection? I assume yes, since otherwise the game won't be playable. The problem is, the collision detection is based on whether colours differ in the LUT, and as such, we can't skip generating the LUT. And if that's the case, then I can't get any more performance. So we need the drawing (for collisions), and we need the timing (for synchronization with the CPU). So there may not be anything I can remove.

Yes, we need collision detection. Unfortunately, that is bad news for us. If you have any ideas for making the TIA faster, please let us know.

> True, but can you test placing inline for the System:: methods in the source file and confirm whether or not it speeds things up?

I just tried it. As expected, inlining System:: methods in the source file won't link, because they are called from other source files as well. Inlining M6502::peek and M6502::poke in the source file would link because they are called within M6502Hi.cxx only.

> TBH, I don't think this was ever really an option. Stella has been optimized over the years, and while there are small %'s here and there to gain, getting 10-20x speedups probably won't be possible.

Thanks. That's a clear message to us.

> I think the speedups you found are good. 25% here and there is very good, considering the project is 15+ years old and has been optimized a lot over the years.

Glad that we could help. We will certainly let you know about our improvements.

Kind regards,
Aja Huang
From: Stephen A. <sa6...@gm...> - 2013-05-29 13:48:38

On May 29, 2013 10:55:47 AM Aja Huang wrote:

> We really appreciate your help. Please find our TIA.cxx in the attachment. I have disabled the audio instructions 0x15 to 0x1A by the macro SOUND_SUPPORT.

OK, so this is a much older version of TIA, and I will need to look at it a little to remember how things were done. I will get back to you later this evening with it. But one major issue below ...

One thing I forgot is, do you need the game to still function correctly wrt collision detection? I assume yes, since otherwise the game won't be playable. The problem is, the collision detection is based on whether colours differ in the LUT, and as such, we can't skip generating the LUT. And if that's the case, then I can't get any more performance. So we need the drawing (for collisions), and we need the timing (for synchronization with the CPU). So there may not be anything I can remove.

> Inlining functions in header files is a common C and C++ practice. Inlining in cxx files might have a different meaning even no effect depending on the compiler. In my compiler (Apple LLVM 4.1) placing 'inline' in cxx has no effect at all.

True, but can you test placing inline for the System:: methods in the source file and confirm whether or not it speeds things up?

> I have tried to inline M6502::peek and M6502::poke to header but didn't get any speed up.

Hmm, this seems to contradict the previous observation, and that inlining does indeed work in the source file.

> Yes that is what I meant. So looks like it's not possible to re-organize instructions. Now I think speeding up the emulator in a manner like 1000% is really not possible.

TBH, I don't think this was ever really an option. Stella has been optimized over the years, and while there are small %'s here and there to gain, getting 10-20x speedups probably won't be possible.

> We will try to make small improvements such as replacing 3 and 4-dimensional arrays and avoiding memory copy.

I'd be interested in feedback on this, both the code you change and the effects. I try to avoid memory copies whenever possible, and use pointers (and more recently, references) whenever I can.

> We have tried to use M6502Low already. It was about 20-25% speed up, not huge but definitely help. Thanks for the advice.

I think the speedups you found are good. 25% here and there is very good, considering the project is 15+ years old and has been optimized a lot over the years. I suspect that any further huge improvements would come from a faster 6502 emulation, perhaps by moving to assembly language. But this is not something that I will have time to work on.

Steve A.
Stella maintainer
From: Stephen A. <sa6...@gm...> - 2013-05-29 12:06:56

On May 29, 2013 8:59:15 AM Aja Huang wrote:

> Inlining System::peek() and System::poke() is clearly a 25% gain in speed. In pong I got 75 simulations / sec after inlining them by moving the code into System's class definition.

I wonder, would simply placing 'inline' in front of the current method definition work, while leaving the actual code in the cxx file? There is a similar optimization in M6502::peek and M6502::poke already.

> I understand that in atari the information of frames may be required for the game logic, but is it possible to ignore all the frame-drawing stuff without breaking a game's progress?

The current rendering is actually completely abstracted. The TIA runs the logic and creates a 160x300 colour lookup table. The various FrameBuffer classes then take that LUT and render it to the screen, using either software or OpenGL mode. My previous suggestion is to simply ignore the second step, and not output anything to the screen.

At the TIA level, no optimization is done to check whether something has changed or not, and whether instructions can be skipped because of it. The 2600 is very weird (compared to other systems) in that the 6507 and TIA are tightly integrated. The TIA can actually interrupt the CPU, and timing is crucial. So I'm not sure how you'd modify the TIA class to 'skip' certain things. In fact, this is also why there's no frameskip ability in Stella; skipping a frame can actually cause the emulation to fail.

> Let me use an example to describe my point. Suppose we have a frame F of pong. The next frame is F'. Suppose the difference between F' and F is only that the ball moves exactly one pixel from the original position. Is it possible to re-draw only the two pixels, i.e. clean the ball's pixel at F and draw the new pixel at F'?

I understand what you're saying, but this isn't the way the TIA works and hence it wasn't modelled in this fashion. The TIA has no knowledge of what came before it; it is entirely scanline based. The 'drawing' you speak of is simply an update at the appropriate place in the LUT. And not doing that update wouldn't save any measurable time.

I suspect what you really mean is that once you (somehow) know that all scanlines past a certain point will produce no graphical changes, then we simply skip them entirely. However, this isn't possible, since (a) we'd need to run the code to know that, and (b) it would completely screw up the timing.

> We did some experiments and found that over 95% of the emulator's run time is spent on the big loop of M6502High::execute() where on average 7000 instructions are executed for each frame.

This makes complete sense, but I'm not sure how you can reduce it any further in the current implementation. I guess you could try editing TIA::updateFrame and completely comment out some of the non-timing logic. You'd need to send me your current TIA.cxx file so I can point out the areas to remove.

Also, in older versions of Stella, there was a M6502Low class that was faster than the current one, but less compatible. Perhaps using this will save more time?? It's not in the current code, but should still be in the repository. Now, how much work will be required to re-integrate it will depend on what version of Stella you're currently using.

Good luck,
Steve A.
Stella maintainer
From: Aja H. <aja...@gm...> - 2013-05-29 11:29:21

Hi Stephen,

Thank you for the reply.

2013/5/28 Stephen Anthony <sa6...@gm...>

> First of all, I'd like to hear about your testing, and how you determine that whether there have been any improvements in speed. After all, if Stella can be made faster, it would benefit everyone to have the required changes in the main codebase.

In our machine learning algorithm, we do simulations starting from the current frame. Each simulation continues for a total of L frames (L = 100, for example) and then stops. An agent plays the game during the simulations. The simulations can be different because the agent's behavior is not deterministic. The speed is measured in simulations / sec. On my laptop, I got 60 simulations / sec for pong.

> > 1. The 25% speed up was achieved by simply moving System::peek() and System::poke() to the header file and inlining them.
>
> I'll consider doing this if the increase in speed is that great. Again, how do you measure it? On what systems??

Inlining System::peek() and System::poke() is clearly a 25% gain in speed. In pong I got 75 simulations / sec after inlining them by moving the code into System's class definition.

> The timing of the CPU and the drawing through TIA is intimately related, and simply turning off the generation of TIA serial output is likely to break things (as you've seen). Perhaps turn off the actual rendering/blitting to the screen itself. This can be found in the FrameBufferXXX::drawTIA methods.

Thanks for the explanation. In fact, we don't need to *draw* the screen in the simulations. Suppose the *real* game is currently in frame F. N simulations are run starting from F without being drawn on the screen. The agent then learns the best next action for F by playing in these N simulations. After these N simulations are finished, the agent picks the best action for F, then we draw the screen to progress the real game.

I understand that on the Atari the frame information may be required for the game logic, but is it possible to ignore all the frame-drawing stuff without breaking a game's progress?

> Software mode has rudimentary 'dirty-rectangles' support, but in many cases it's faster to just do the blit (tracking if something has changed is slower than just assuming it has and dealing with the results). For OpenGL rendering, no change tracking is done, as it's faster to just send a 160x200 texture as-is.

Let me use an example to describe my point. Suppose we have a frame F of pong. The next frame is F'. Suppose the difference between F' and F is only that the ball moves exactly one pixel from the original position. Is it possible to re-draw only the two pixels, i.e. clean the ball's pixel at F and draw the new pixel at F'?

We did some experiments and found that over 95% of the emulator's run time is spent on the big loop of M6502High::execute() where on average 7000 instructions are executed for each frame. If we can draw the difference only, probably we can reduce the number of instructions to 70 on average.

> BTW, related to this, Stella 4.0 will be moving to SDL 2.0, which will use Direct3D/OpenGL behind the scenes. So the entire software rendering infrastructure currently in Stella will be removed.

Good to hear that. In ALE we are using an old version of Stella, but we might update to the new version at some point.

> I would turn off all compile-time options that you don't need. The biggest would be debugger support, since when it is compiled in, there is work done on every instruction emulated/executed. You obviously don't need this if you don't use the debugger.
>
> Similarly, the joystick and cheatcode systems can be compiled out, as they have a per-frame runtime penalty.

Yes, we had turned off the debugger, joystick and cheatcode support already. Thanks for the advice.

Kind regards,
Aja Huang
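For reference, the "simulations / sec" metric described in this message could be measured along the following lines. This is a sketch, not the harness used in the thread: `simulationsPerSecond` and `speedupPercent` are hypothetical helpers, and the `simulate` callback stands in for one rollout of L frames.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `simulate` (one rollout of L frames) `count` times and returns the
// achieved rate in rollouts per second of wall-clock time.
inline double simulationsPerSecond(const std::function<void()>& simulate,
                                   int count) {
    assert(count > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i) simulate();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return count / elapsed.count();
}

// Relative speed-up in percent between two measured rates.
constexpr double speedupPercent(double before, double after) {
    return (after - before) / before * 100.0;
}
```

With the figures quoted in the thread, `speedupPercent(60.0, 75.0)` gives 25, matching the reported gain from inlining. Because the agent is non-deterministic, averaging over many rollouts, and repeating the measurement, would be needed before trusting differences of one or two percent.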
From: Stephen A. <sa6...@gm...> - 2013-05-28 15:35:32

On May 28, 2013 7:45:04 AM Aja Huang wrote:

> Dear developers of Stella,
>
> We are currently using Stella in our research with the arcade learning environment (ALE) (http://www.arcadelearningenvironment.org) on machine learning. We need to speed up the emulator as much as possible (like 10 or even 20 times faster) in order to run more efficient experiments. After trying for a week, so far we got around 25% speed up. We are wondering what would be your opinions about how to further massively speed up the emulator in Stella.

First of all, I'd like to hear about your testing, and how you determine that whether there have been any improvements in speed. After all, if Stella can be made faster, it would benefit everyone to have the required changes in the main codebase.

> 1. The 25% speed up was achieved by simply moving System::peek() and System::poke() to the header file and inlining them.

I'll consider doing this if the increase in speed is that great. Again, how do you measure it? On what systems??

> 3. We tried to turn off screen drawing but let the game keep running by some modifications in TIA::peek() and TIA::poke(). This is around 20% speed up, but it broke some games. We are still trying to figure out the reason.

The timing of the CPU and the drawing through TIA is intimately related, and simply turning off the generation of TIA serial output is likely to break things (as you've seen). Perhaps turn off the actual rendering/blitting to the screen itself. This can be found in the FrameBufferXXX::drawTIA methods.

> 4. Is it possible to avoid re-drawing the same pixels (even avoid running the same instructions) between two contiguous frames? We did some experiments and found that in some games, for example pong, most of the contiguous frames are highly similar with over 99% same pixels. Looks like it will be a big gain if we can draw the different pixels only.

Software mode has rudimentary 'dirty-rectangles' support, but in many cases it's faster to just do the blit (tracking if something has changed is slower than just assuming it has and dealing with the results). For OpenGL rendering, no change tracking is done, as it's faster to just send a 160x200 texture as-is.

BTW, related to this, Stella 4.0 will be moving to SDL 2.0, which will use Direct3D/OpenGL behind the scenes. So the entire software rendering infrastructure currently in Stella will be removed.

> 5. Is it possible to fit all the registers into a single 64-bit register?

I don't think so, but I'm not really sure it would give a speedup anyway. If it would, then I'd consider it.

> We will really appreciate if you could give us some hints.

I would turn off all compile-time options that you don't need. The biggest would be debugger support, since when it is compiled in, there is work done on every instruction emulated/executed. You obviously don't need this if you don't use the debugger.

Similarly, the joystick and cheatcode systems can be compiled out, as they have a per-frame runtime penalty.

Finally, you could also delete or prune the src/emucore/DefProps.hxx file, which contains the built-in ROM database. While this doesn't have a runtime penalty per se, it would result in a smaller executable that could give some performance improvements.

That's about all I can think of right now. Again, I'm very interested to hear how code re-arrangements can make things faster, as this would benefit everyone using Stella.

Thanks,
Steve A.
Stella maintainer
From: Aja H. <aja...@gm...> - 2013-05-28 10:15:12

Dear developers of Stella,

We are currently using Stella in our research with the arcade learning environment (ALE) (http://www.arcadelearningenvironment.org) on machine learning. We need to speed up the emulator as much as possible (like 10 or even 20 times faster) in order to run more efficient experiments. After trying for a week, so far we got around 25% speed up. We are wondering what would be your opinions about how to further massively speed up the emulator in Stella.

Here are our thoughts and questions:

1. The 25% speed up was achieved by simply moving System::peek() and System::poke() to the header file and inlining them.

2. We noticed that using low-resolution frames is about 20% speed up.

3. We tried to turn off screen drawing but let the game keep running by some modifications in TIA::peek() and TIA::poke(). This is around 20% speed up, but it broke some games. We are still trying to figure out the reason.

4. Is it possible to avoid re-drawing the same pixels (even avoid running the same instructions) between two contiguous frames? We did some experiments and found that in some games, for example pong, most of the contiguous frames are highly similar with over 99% same pixels. Looks like it will be a big gain if we can draw the different pixels only.

5. Is it possible to fit all the registers into a single 64-bit register?

We will really appreciate if you could give us some hints.

Kind regards,
Aja Huang
From: Stephen A. <sa6...@gm...> - 2010-05-23 13:45:36

On May 16, 2010 12:47:58 pm Justin Chevrier wrote:

> Hey Guys,
>
> I've noticed a loud static sound in all games (only tested in 3.0.0 and 3.1.2). Changing various audio settings in the sound dialog changes the sound of the static but never eliminates it. It's so loud you can barely hear the effects from the games.
>
> I did some searching and found that there have been scattered reports of the same thing from time to time as well.
>
> I decided to mess around with the code a bit and found that making the change below (attached) completely eliminates the static and sound seems to work fine.
>
> Obviously switching from unsigned 8bit to unsigned 16bit fixes the issue on my system, but as I'm not familiar with SDL I'm really not sure why that is, or why it would be required on my system.
>
> Here are the details on my system:
>
> Gentoo Linux 64bit
> 2.6.33 SMP kernel
> Phenom II X2 550
> Creative SBLive (03:07.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 04))
> GeForce 8800 GTS (01:00.0 VGA compatible controller: nVidia Corporation G80 [GeForce 8800 GTS] (rev a2))
>
> gcc-4.3.4
> libSDL-1.2.13
> alsa-1.0.20
>
> Hope this helps. If there's any more information I can provide or patches I can test I'd be more than happy to help.
>
> Thanks for the great emulator!
>
> Justin

Thanks for the patch, and sorry for the long delay in responding.

I tried your patch, and at least on my system it results in sound being generated with a higher 'pitch'. That's the only word I can think to describe it, as the sounds are much 'higher' than they should be. So while you're getting some output, it doesn't sound as it should.

Now, this got me thinking. Perhaps the real reason you didn't get correct sound in the first place is that the sound card/operating system doesn't know how to deal with 8-bit sound, and only really works with 16-bit sound (which is what your patch activates). So I may try to convert Stella to use 16-bit sound by default, which might fix both these problems.

I'll get back to you over the next month or so, once I find some time to work on this part of Stella.

Thanks,
Steve A.
Stella maintainer
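One possible explanation for the higher pitch, offered as a guess that the thread does not confirm: if the audio device is opened in a 16-bit format while the buffer is still filled with one byte per sample, every pair of bytes is consumed as a single sample, so the same data plays back in half the time. Converting to 16-bit output would then also require widening each sample. A minimal sketch of that widening step, using a hypothetical helper `widenU8ToS16` that is independent of SDL:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Widens unsigned 8-bit PCM (silence = 128) to signed 16-bit PCM
// (silence = 0), producing one output sample per input sample.
inline std::vector<std::int16_t> widenU8ToS16(
        const std::vector<std::uint8_t>& in) {
    std::vector<std::int16_t> out;
    out.reserve(in.size());
    for (std::uint8_t s : in) {
        // Re-centre around zero, then scale to the 16-bit range.
        const int centred = static_cast<int>(s) - 128;
        out.push_back(static_cast<std::int16_t>(centred * 256));
    }
    return out;
}
```

For example, the inputs 0, 128 and 255 map to -32768, 0 and 32512. The output buffer is twice as large in bytes as the input, which is the detail that a format-only change would miss.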
From: Justin C. <jch...@gm...> - 2010-05-16 15:18:06

Hey Guys,

I've noticed a loud static sound in all games (only tested in 3.0.0 and 3.1.2). Changing various audio settings in the sound dialog changes the sound of the static but never eliminates it. It's so loud you can barely hear the effects from the games.

I did some searching and found that there have been scattered reports of the same thing from time to time as well.

I decided to mess around with the code a bit and found that making the change below (attached) completely eliminates the static and sound seems to work fine.

Obviously switching from unsigned 8bit to unsigned 16bit fixes the issue on my system, but as I'm not familiar with SDL I'm really not sure why that is, or why it would be required on my system.

Here are the details on my system:

Gentoo Linux 64bit
2.6.33 SMP kernel
Phenom II X2 550
Creative SBLive (03:07.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 04))
GeForce 8800 GTS (01:00.0 VGA compatible controller: nVidia Corporation G80 [GeForce 8800 GTS] (rev a2))

gcc-4.3.4
libSDL-1.2.13
alsa-1.0.20

Hope this helps. If there's any more information I can provide or patches I can test I'd be more than happy to help.

Thanks for the great emulator!

Justin
From: Stephen A. <sa6...@gm...> - 2010-04-27 16:43:14
|
On December 26, 2009 05:27:24 pm Vincent Chapman wrote:
> I have run Stella for years under Ubuntu but recently switched to a 64-bit threading core. Stella is grabbing 100% CPU usage :( I have downloaded Stella and its dependencies and built it from scratch on my machine, but the problem remains the same. Sound also goes out. Since building from source on my machine, the emulator might run correctly for a game or two, but then it fails: the sound goes out and the CPU becomes overloaded, with the CPU fan spinning up and everything. Has anyone heard of this problem or posted any fixes?
>
> Vincent Chapman

Stella 3.1.1 has just been released, which potentially fixes these problems. There are two things going on. One has definitely been fixed, but the other is related to PulseAudio in Ubuntu. If the new release works for you, then fine, problem solved. Otherwise, you'll have to disable PulseAudio (my recommendation) or disable sound in Stella until a workaround can be found.

Thanks,
Steve A.
Stella maintainer |
From: Tom M. <deb...@go...> - 2010-02-07 22:44:32
|
Hi,

I'm currently porting Stella to Palm Pre webOS (an arm7-based, Linux-based mobile phone). I'm using version 2.5 because 3.0 crashed under all circumstances.

The problem is that the device has nothing like a joystick, so I tried to remap the joystick0 events up, down, left and right to the keyboard keys w, s, a and d. I also mapped several other functions to keys (e.g. c to the command menu, space to select, delete to the game menu and so forth) and disabled joystick support (--disable-joystick). I had to do the mapping in the source code (EventHandler.cxx) because the key-mapping menu appears but is unusable: it cycles through the tabs extremely fast by itself (each tab appears once in roughly 0.01 seconds or so). However, at one point I managed to get into the Input preferences tab and scroll down to the up, down, etc. keys, and it showed the correct mapping (w, s, a, d).

The problem now is that all key mappings work except the up, down, left and right mappings. Nothing happens if I press them.

The Makefile used for building, the modified parts of EventHandler.cxx, the run output and the config file can be found here: http://webos.pastebin.com/m40d6ca4f (I have changed the --disable-joystick parameter in the meantime).

Thanks in advance!

- Tom |
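As an illustration of the remapping Tom describes, here is a minimal, self-contained sketch of a key-to-direction lookup table. It is not Stella's EventHandler code; the `JoyEvent` enum and `KeyMap` class are hypothetical names invented for this example, and the real event types in EventHandler.cxx may look quite different.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for emulator joystick events (not Stella's types).
enum class JoyEvent { None, Up, Down, Left, Right };

// Hypothetical lookup table from keyboard characters to joystick events.
class KeyMap
{
public:
  // Default bindings follow the w/s/a/d layout described in the message.
  KeyMap()
    : table_{{'w', JoyEvent::Up},
             {'s', JoyEvent::Down},
             {'a', JoyEvent::Left},
             {'d', JoyEvent::Right}}
  {
  }

  // Bind (or rebind) a key to a direction.
  void bind(char key, JoyEvent ev)
  {
    assert(ev != JoyEvent::None);
    table_[key] = ev;
  }

  // Return the event bound to a key, or JoyEvent::None if it is unbound.
  JoyEvent lookup(char key) const
  {
    const auto it = table_.find(key);
    return it == table_.end() ? JoyEvent::None : it->second;
  }

private:
  std::map<char, JoyEvent> table_;
};
```

A table like this only covers the lookup step. Whether the looked-up event actually reaches the emulated joystick when joystick support is compiled out is exactly the open question in the message above.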
From: Stephen A. <sa6...@gm...> - 2009-12-27 16:07:07
|
On December 26, 2009 05:27:24 pm Vincent Chapman wrote:
> I have run Stella for years under Ubuntu but recently switched to a 64-bit threading core. Stella is grabbing 100% CPU usage :( I have downloaded Stella and its dependencies and built it from scratch on my machine, but the problem remains the same. Sound also goes out. Since building from source on my machine, the emulator might run correctly for a game or two, but then it fails: the sound goes out and the CPU becomes overloaded, with the CPU fan spinning up and everything. Has anyone heard of this problem or posted any fixes?
>
> Vincent Chapman

I personally develop Stella in Kubuntu 64-bit (currently Karmic), and I haven't experienced any problems like this so far. Without further info, I suggest the following:

1) Download the very latest version of Stella, either to run or to recompile. This means version 3.0. The one included with Debian and Ubuntu is very old (something like 2.2, I think). The webpage is stella.sf.net.

2) Are you using software or OpenGL video mode? If the latter, what type of video card do you have (Nvidia, Intel, ATI)? In general, Nvidia is best supported, followed by Intel and then ATI (at least in my experience).

3) If you're running graphical effects (aka Compiz), have you tried turning them off? These effects have been known to cause problems in various games and emulators. Whether your video card is fully supported also determines how well Compiz will work.

That's all I can suggest for now. Let me know if this fixes anything.

Steve A.
Stella maintainer |
From: Vincent C. <n1...@co...> - 2009-12-26 20:57:24
|
I have run Stella for years under Ubuntu but recently switched to a 64-bit threading core. Stella is grabbing 100% CPU usage :( I have downloaded Stella and its dependencies and built it from scratch on my machine, but the problem remains the same. Sound also goes out. Since building from source on my machine, the emulator might run correctly for a game or two, but then it fails: the sound goes out and the CPU becomes overloaded, with the CPU fan spinning up and everything. Has anyone heard of this problem or posted any fixes?

Vincent Chapman |
From: Thomas J. <tje...@we...> - 2008-08-11 18:33:22
|
Hi

Marc wrote:
> That's the line I commented out and it didn't seem to make any difference. Obviously I must be doing something wrong? I only commented out the SDL_Delay part, so I'm assuming the if statement isn't the cause :)

Maybe your SDL is set up to sync to your monitor's vertical sync? I know that you can do this in DirectX; I'm not sure whether it has an effect in SDL too.

Have fun!
Thomas
_______________________________________________________
Thomas Jentzsch | *** Every bit is sacred ! ***
tjentzsch at web dot de | |
From: Marc L. <la...@ua...> - 2008-08-11 17:32:38
|
Stephen Anthony wrote:
> On August 10, 2008 03:02:37 pm Marc Lanctot wrote:
>> Hello,
>>
>> I'm a researcher at University of Alberta using a modified Stella that hooks up the joystick inputs to an AI player. We're planning to eventually have a player automatically learn how to play Atari games.
>
> An interesting topic. Do you have any links where I could take a look at your findings?

Well I've just started. All we have done right now is added an RL-Glue [1] module that can hook up to any RL-Glue agent, but right now we only have one that takes random actions. We're still conducting some initial tests on the games and simulation to see what's feasible before we delve into the learning algorithms. I can elaborate on the ideas we have if you really want me to :)

I'm working with a team from Rutgers (Andre Cohen and Carlos Diuk) and we're re-using the code that they have implemented. They published a paper about their method at ICML [2]. They presented it in Finland recently. Andre has put up a wiki [3] for our project but already it is out of date, so at the moment the best way to find out what we are up to is to contact us.

Part of what we might do is use our developments for a challenge problem in next year's reinforcement learning competition [4]. The idea is we'd give the agents several games to learn from... and then we'd test their agent on an entirely new game to see how well they can transfer the knowledge they learned.

>> Anyway, I have a technical question about the code. I would like to remove all the artificial delays added by the emulator. We want our algorithms to be able to play an entire game of something in as little time as possible (eg. a few seconds).
>
> Comment out the following code in OSystem::mainLoop():
>
>   if(myTimingInfo.current < myTimingInfo.virt)
>     SDL_Delay((myTimingInfo.virt - myTimingInfo.current) / 1000);
>
> This runs the emulation without delay between frames. On my system, the framerate went from 60 fps to 1250+ fps!

That's the line I commented out and it didn't seem to make any difference. Obviously I must be doing something wrong? I only commented out the SDL_Delay part, so I'm assuming the if statement isn't the cause :)

How do you measure the FPS? Maybe FPS is not what we want because we don't really care what actually gets displayed... what we'd like is for the number of mainloop iterations per second to increase drastically.

>> In addition: is there any way I could implement "turning off" the SDL graphics and still run the emulation? Maybe this is what consumes a lot of time?
>
> In FrameBuffer::update(), comment out drawMediaSource(). Or alternatively, place a return at the very beginning of FrameBufferSoft::drawMediaSource() and/or FrameBufferGL::drawMediaSource() (depending on which rendering mode you use).
>
> Doing this in addition to the above resulted in 2700+ fps!

I will try this. I'll take some measurements and get back to you. I should mention that what I will be measuring is the number of iterations of the main loop per second. So far I've only been able to get somewhere around 50-60.

> Thanks for the interest,

Thanks for such a well-written emulator. I've never seen code this clean in open-source projects before. It's been pleasant :)

Marc

[1] http://rlai.cs.ualberta.ca/RLBB/top.html
[2] http://paul.rutgers.edu/~cdiuk/papers/OORL.pdf
[3] http://www.contentfull.com/atari
[4] http://www.rl-competition.org/

--
Two things are infinite: the universe and human stupidity; and I'm not sure about the universe. -- Albert Einstein |
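Since the message above is about measuring main-loop iterations per second rather than displayed frames, here is a minimal sketch of such a counter. It is not Stella code: `LoopRateCounter` and `iterationsPerSecond` are hypothetical names, and the sketch assumes `tick()` would be called once per pass through the emulator's main loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Pure helper: average rate for a given iteration count and elapsed time.
// Returns 0 when no time has elapsed, to avoid dividing by zero.
double iterationsPerSecond(std::uint64_t iterations,
                           std::chrono::duration<double> elapsed)
{
  assert(elapsed.count() >= 0.0);
  if (elapsed.count() <= 0.0)
    return 0.0;
  return static_cast<double>(iterations) / elapsed.count();
}

// Hypothetical counter: call tick() once per main-loop iteration.
class LoopRateCounter
{
public:
  using Clock = std::chrono::steady_clock;

  LoopRateCounter() : start_(Clock::now()), count_(0) {}

  void tick() { ++count_; }

  std::uint64_t iterations() const { return count_; }

  // Average iterations per second since the counter was constructed.
  double ratePerSecond() const
  {
    return iterationsPerSecond(count_, Clock::now() - start_);
  }

private:
  Clock::time_point start_;
  std::uint64_t count_;
};
```

Counting loop iterations this way is independent of rendering, so it should keep working even with the drawing calls disabled.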
From: Stephen A. <sa6...@gm...> - 2008-08-11 15:14:18
|
On August 10, 2008 03:02:37 pm Marc Lanctot wrote:
> Hello,
>
> I'm a researcher at University of Alberta using a modified Stella that hooks up the joystick inputs to an AI player. We're planning to eventually have a player automatically learn how to play Atari games.

An interesting topic. Do you have any links where I could take a look at your findings?

> Anyway, I have a technical question about the code. I would like to remove all the artificial delays added by the emulator. We want our algorithms to be able to play an entire game of something in as little time as possible (eg. a few seconds).

Comment out the following code in OSystem::mainLoop():

  if(myTimingInfo.current < myTimingInfo.virt)
    SDL_Delay((myTimingInfo.virt - myTimingInfo.current) / 1000);

This runs the emulation without delay between frames. On my system, the framerate went from 60 fps to 1250+ fps!

> In addition: is there any way I could implement "turning off" the SDL graphics and still run the emulation? Maybe this is what consumes a lot of time?

In FrameBuffer::update(), comment out drawMediaSource(). Or alternatively, place a return at the very beginning of FrameBufferSoft::drawMediaSource() and/or FrameBufferGL::drawMediaSource() (depending on which rendering mode you use).

Doing this in addition to the above resulted in 2700+ fps!

Thanks for the interest,
Steve A. |
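As an alternative to commenting the delay out, the throttle can be made switchable. The following is a minimal sketch of that idea, not Stella's actual main loop: `TimingInfo` here is a hypothetical stand-in for the fields named above, the times are assumed to be in microseconds (inferred from the division by 1000 before the millisecond-based `SDL_Delay`), and `std::this_thread::sleep_for` is swapped in for `SDL_Delay` so the example has no SDL dependency.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the timing fields; both values in microseconds.
struct TimingInfo
{
  std::uint64_t current;  // elapsed time so far
  std::uint64_t virt;     // time at which the next frame is due
};

// Milliseconds to sleep before the next frame; 0 when the throttle is
// off or the emulation is already running behind schedule.
std::uint32_t delayMs(const TimingInfo& t, bool throttle)
{
  if (!throttle || t.current >= t.virt)
    return 0;
  return static_cast<std::uint32_t>((t.virt - t.current) / 1000);
}

// Run a fixed number of frames and return how many were completed.
// The emulation work itself is left out of this sketch.
std::uint64_t runFrames(std::uint64_t frames, bool throttle,
                        std::uint64_t framePeriodUs)
{
  assert(framePeriodUs > 0);
  using Clock = std::chrono::steady_clock;
  const Clock::time_point start = Clock::now();
  TimingInfo t{0, 0};
  std::uint64_t done = 0;
  for (; done < frames; ++done)
  {
    // (one frame of emulation would run here)
    t.virt += framePeriodUs;
    t.current = static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::microseconds>(
            Clock::now() - start).count());
    const std::uint32_t ms = delayMs(t, throttle);
    if (ms > 0)
      std::this_thread::sleep_for(std::chrono::milliseconds(ms));
  }
  return done;
}
```

With `throttle` set to false the loop never sleeps, which corresponds to the commented-out `SDL_Delay` case; with it set to true and a period of about 16667 microseconds, the loop paces itself at roughly 60 iterations per second.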