Thread: Re: [Tuxnes-devel] more help w/ asm
From: Rigel <ri...@an...> - 2001-04-11 20:42:04
|
-- On Wed, 11 Apr 2001 12:20:56 Jim Ursetto wrote:
>At 10:45am on 2001 April 11, Rigel did write:
>> Hi again,
>> All right, I have another one. What's the difference between, say:
>> leal 0(%edi),%ebx
>> and,
>> movl 0(%edi),%ebx.
>>
>> This is my guess. leal loads value held in mem loc edi into ebx, and movl loads value of mem loc held in edi to ebx. If you understand that, am I right?
>Yes, although for "leal" I think you mean "register edi". The leal is
>equivalent to movl %edi,%ebx. [At least, I'm 99.9% sure
>there isn't any difference.]
>
>Note also that leal 3(%esi),%esi and addl 3,%esi are
>used interchangeably in table.x86, which can be a little
>confusing.
>
>--

I was wrong, then, because I meant load into ebx the value at the mem location held in edi.

So if I say,
    leal 4(%ebx),%edi
then I mean (in C),
    edi = ebx + 4
?
From: Rigel <ri...@an...> - 2001-04-11 22:17:57
|
-- On Wed, 11 Apr 2001 16:51:44 Jim Ursetto wrote:
>At 03:40pm on 2001 April 11, Rigel did write:
>> I was wrong, then, because I meant load into ebx the value at the mem location held in edi.
>
>> So if I say,
>> leal 4(%ebx),%edi
>> then I mean (in C),
>> edi = ebx + 4
>
>Exactly. With lea you can do a mul, add, AND a shl (by
>2,4,or 8)

correction here. I think you mean a mult by 2, 4, or 8 (which is really shl by 1, 2, 3). I only mention this because I confuse the two all the time...

>in only one instruction, but it can also be
>used to just do a single mov, add, or shl. lea is a
>special instruction that performs all address
>computations, then moves this address result into a
>register.
>
>For example,
> leal 4(%ebp,%ebx,2),%edi
>does (in C)
> edi = ebp + ebx*2 + 4;
>
>What you want is movl (%edi), %ebx. This is like saying ebx = *edi.
>This is also equivalent to movl 0(%edi), %ebx -- the 0 is superfluous.
>
>In intel syntax,
> lea eax, [ebx] == mov eax, ebx.
>
>You may just be confused by the AT&T syntax, but here are some resources:
>

all right, I think I've got it (all) figured out now (haha). I understood mov all along (promise), just not lea. I do like Intel syntax much better though, as it's a little more like other *real* (RISC) assembly languages.

This whole gaffe of mine just illustrates another reason I don't like x86 asm, which is that the instructions are misnamed. I would have called lea, cea, for calculate effective addr. or maybe just wpos for weird piece of sh**.

I'll definitely look up that Art of Assembly stuff. It should be a good reference.

Okay, with the zero and sign flag (edi) you just give it the latest result from the ALU, right? But what about the carry flag (ebp)? Do I just set it to 1 if there was a carry involved in the most recent instruction?

Thanks much,
Rigel
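The mov/lea distinction worked out in this message can be sketched in C pointer terms. This is only an illustration: `mov_like` and `lea_like` are hypothetical names, not anything from TuxNES, and a byte pointer stands in for the register that holds an address.

```c
#include <stdint.h>
#include <string.h>

/* Roughly movl 4(%ebx),%edi: read the 32-bit value stored at address ebx+4. */
static uint32_t mov_like(const uint8_t *ebx)
{
    uint32_t value;
    memcpy(&value, ebx + 4, sizeof value);  /* memory is actually accessed */
    return value;
}

/* Roughly leal 4(%ebx),%edi: compute the address ebx+4 and stop there. */
static const uint8_t *lea_like(const uint8_t *ebx)
{
    return &*(ebx + 4);  /* the & cancels the *, so this is just ebx + 4 */
}
```

Calling both with the same base pointer shows the difference: `mov_like` returns whatever is stored four bytes past the base, while `lea_like` returns the pointer itself and would be safe even if nothing readable lived at that address.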
From: Eric J. <ea...@ri...> - 2001-04-11 23:41:52
|
"Rigel" <ri...@an...> tapped some keys and produced:

> all right, I think I've got it (all) figured out now (haha). I
> understood mov all along (promise), just not lea. I do like Intel
> syntax much better though, as it's a little more like other *real*
> (RISC) assembly languages.
>
> This whole gaffe of mine just illustrates another reason I don't like
> x86 asm, which is that the instructions are misnamed. I would have
> called lea, cea, for calculate effective addr. or maybe just wpos for
> weird piece of sh**.

The way I think about it is that lea is like taking a C "&" operator of the corresponding mov:

    movl 4(%ebx),%edi    is like    edi = *(ebx+4)
    leal 4(%ebx),%edi    is like    edi = &(*(ebx+4))
                              or    edi = ebx+4

(Or, perhaps a more exact analogy is that the target register is a reference variable, C++ style, if you understand that sort of thing.)

BTW, lea exists in Motorola 68K, and is used in much the same way.

> I'll definitely look up that Art of Assembly stuff. It should be a good
> reference.
>
> Okay, with the zero and sign flag (edi) you just give it the latest
> result from the ALU, right? But what about the carry flag (ebp) ? do I
> just set it to 1 if there was a carry involved in the most recent
> instruction?

You set it the way that a 6502 would set it, as you should do with Z, N and V as well. You don't have to set up the register mapping the same way as it is for the x86; you should use whatever mapping is the most efficient for your architecture. The 6502 instructions that modify the carry flag are ADC, SBC, ASL, LSR, ROL, ROR, CMP, CPX, CPY, PLP, SEC, CLC. (Did I miss any?)

Eric
From: Jim U. <ji...@3e...> - 2001-04-11 23:42:15
|
At 05:17pm on 2001 April 11, Rigel did write:

> >Exactly. With lea you can do a mul, add, AND a shl (by
> >2,4,or 8)

> correction here. I think you mean a mult by 2, 4, or
> 8 (which is really shl by 1, 2, 3). I only mention
> this because I confuse the two all the time...

What the...? That's what I get for re-editing my paragraph too many times. I meant, you can do a MOV, add, and a MUL (by 2, 4, or 8).

> This whole gaffe of mine just illustrates another
> reason I don't like x86 asm, which is that the
> instructions are misnamed.

LEA confused me for a long time, probably up until I looked it up in the Art of Assembly.

> Okay, with the zero and sign flag (edi) you just give
> it the latest result from the ALU, right? But what
> about the carry flag (ebp) ? do I just set it to 1
> if there was a carry involved in the most recent
> instruction?

No, ebp is -1 on carry set, and 0 on carry clear. See SEC, CLC, and ROR - Immediate in table.x86 for examples.

I am curious---what exactly are you doing? Are you trying to port the dynamic recompiler? I don't mean to discourage you, but surely it would be easier to use a C or asm interpreter, because the dynamic recompiler, although really cool, is also really hard to understand. For the SH-4 arch, I ripped out the recompiler and replaced it with a GPL'ed C core taken from nosefart, which was substantially easier, although with some speed cost. This could be reimplemented in assembly for some speedup (though it'd still be interpreted). You could always port the recompiler later.

Jim

--
"There is a very hollow echo of a gaur in the birth of that animal to a cow in Iowa. To say that is a gaur is to disrespect all gaurs in all the places where gaurs live. That animal will never live its life in true gaurdom, to wander in the forests of India and frolic with other gaurs and die and let teak trees grow out of it. That's the gaur I'm working to save." -K. Redford
ji...@3e... / 0x43340710 / 517B C658 D2CB 260D 3E1F 5ED1 6DB3 FBB9 4334 0710
From: Mike M. <mel...@pc...> - 2001-04-12 00:48:32
|
On Wed, 11 Apr 2001, Jim Ursetto wrote:

> I am curious---what exactly are you doing? Are you
> trying to port the dynamic recompiler? I don't mean to
> discourage you, but surely it would be easier to use
> a C or asm interpreter, because the dynamic recompiler,
> although really cool, is also really hard to
> understand. For the SH-4 arch, I ripped out the

I already tried to discourage him and he still expressed interest in "porting" the dyn-rec engine to the MIPS ISA. Thus, I admire his motivation...:)

Actually, it's not really porting. The primary issue involved is writing a new opcode translation table as well as a new ASM linkage file (such as x86.S).

> recompiler and replaced it with a GPL'ed C core taken
> from nosefart, which was substantially easier, although
> with some speed cost. This could be reimplemented in
> assembly for some speedup (though it'd still be interpreted).
> You could always port the recompiler later.

But people really, really like the dyn-rec approach because, as you noted, it's cool. It's difficult, sure, but if someone wants to go for it, great.

After the next release, I want to take TuxNES in a direction that utilizes the dyn-rec approach, if available, and can fall back on a portable core. Also, having a portable core will be the default option for games that use certain mappers (such as MMC5) that currently can't be emulated in TuxNES due to limitations in the dyn-rec approach.

--
-Mike Melanson
From: Jim U. <ji...@3e...> - 2001-04-12 03:54:05
|
At 07:50pm on 2001 April 11, Mike Melanson did write:
> On Wed, 11 Apr 2001, Jim Ursetto wrote:

> Actually, it's not really porting. The primary issue
> involved is writing a new opcode translation table as well as a new
> ASM linkage file (such as x86.S).

Hmmm... if translating assembly language between architectures is not porting, then I don't know what is ;)

> But people really, really like the dyn-rec approach
> because, as you noted, it's cool. It's difficult,
> sure, but if someone wants to go for it, great.

More power to Rigel and I hope he succeeds. While he's in there, it'd be great if he could look at some issues I noticed with dynrec:

1) Arbitrary writes/reads to IO ports don't work, only special cases. I'm sure the common cases are covered, but the occasional ROM may do something unexpected. Probably difficult or impossible to get 100% right without sacrificing speed.

2) In the same vein, some opcodes (ASL, ROL) don't use the mapper.

3) Some opcodes consider the range $2000-$5FFF as I/O space, others use a different range. This may be an issue with the area around $3F00 as well. Unfortunately, I can't point to any examples of this offhand :(

4) Some x86 instruction use/comments inconsistent, such as addl/leal used interchangeably. Probably just a function of being written over a long period, also a very minor point.

The main point is that these issues confused me when I was trying to understand the dynrec code. There could be compatibility implications as well, probably extremely uncommon. Maybe we should consider contributing any interesting tidbits about the dynrec code to HACKING, or comment the code in especially difficult places.

All in all, the dynrec is incredibly cool, and it must have taken a huge effort to get it working and find all those special translation cases... and it's very fast, nearly 10 times faster than the Nofrendo core, using the none renderer.

> After the next release, I want to take TuxNES in a direction that
> utilizes the dyn-rec approach, if available, and can fall back on a
> portable core.

I implemented it this way in my copy of the tree, except dynrec/interpreted mode was a compile-time option. It should be possible to do it on the fly, though.

--
"... [WM97/Melissa-V] triggers immediately and attempts to delete data on your M:, N:, O:, P:, Q:, S:, F:, I:, X:, Z:, H:, and L: network drives."
ji...@3e... / 0x43340710 / 517B C658 D2CB 260D 3E1F 5ED1 6DB3 FBB9 4334 0710
From: Mike M. <mel...@pc...> - 2001-04-12 13:08:24
|
On Wed, 11 Apr 2001, Jim Ursetto wrote:

> Hmmm... if translating assembly language between architectures is
> not porting, then I don't know what is ;)

Okay, you've got me there...:)

> More power to Rigel and I hope he succeeds. While he's
> in there, it'd be great if he could look at some
> issues I noticed with dynrec: [issues]

Right. These are all well-known issues with the dyn-rec approach. Basically, it doesn't cover all the bases with respect to reading from and writing to the memory space. Naturally, this isn't a big issue for 90% of the games out there, only the coolest 10%, such as Castlevania III which uses MMC5 which maps its registers in the lower half of memory.

Also, don't forget another big problem with the dyn-rec approach: It absolutely cannot handle self-modifying code. That's why it also contains a translator that handles the most common instructions empirically discovered in existing self-modifying routines.

> extremely uncommon. Maybe we should consider
> contributing any interesting tidbits about the dynrec
> code to HACKING, or comment the code in especially
> difficult places.

Since I do understand the dyn-rec engine to a decent extent (as opposed to, say, the sound engine), I plan to document that well in the HACKING file.

> All in all, the dynrec is incredibly cool, and it must
> have taken a huge effort to get it working and find all
> those special translation cases... and it's very fast,
> nearly 10 times faster than the Nofrendo core, using
> the none renderer.

10 times faster? Really? Finally, some real speed comparisons. However, how much time does the program spend simulating the 6502 in comparison to, say, rendering the graphics 60 times per second?

> I implemented it this way in my copy of the tree,
> except dynrec/interpreted mode was a compile-time
> option. It should be possible to do it on the fly,
> though.

This is good. I'm thinking that we'll have to add another field to the mapper table in order to indicate whether that mapper has to use the portable core in order to work because it accesses lower memory. Whatever the program chooses to use could be overridden by a run-time switch.

--
-Mike Melanson
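The mapper-table idea above might look something like the following C sketch. Everything here is hypothetical: the struct layout, the field name `needs_portable_core`, the enum names, and `choose_core` are invented for illustration and are not taken from the TuxNES source. Only the iNES mapper numbers (0 = NROM, 1 = MMC1, 4 = MMC3, 5 = MMC5) are standard.

```c
#include <stddef.h>

enum cpu_core { CORE_DYNREC, CORE_PORTABLE };
enum core_choice { CHOICE_AUTO, CHOICE_DYNREC, CHOICE_PORTABLE };

struct mapper_entry {
    int number;               /* iNES mapper number */
    const char *name;
    int needs_portable_core;  /* the proposed extra field (assumed name) */
};

/* Illustrative table; a real one would list every supported mapper. */
static const struct mapper_entry mapper_table[] = {
    { 0, "NROM", 0 },
    { 1, "MMC1", 0 },
    { 4, "MMC3", 0 },
    { 5, "MMC5", 1 },  /* maps registers into lower memory */
};

static const struct mapper_entry *find_mapper(int number)
{
    size_t i;
    for (i = 0; i < sizeof mapper_table / sizeof mapper_table[0]; i++)
        if (mapper_table[i].number == number)
            return &mapper_table[i];
    return NULL;
}

/* Pick a core: the run-time switch wins, then the table's default applies. */
static enum cpu_core choose_core(int mapper_number,
                                 enum core_choice user_choice,
                                 int dynrec_available)
{
    const struct mapper_entry *m = find_mapper(mapper_number);

    if (!dynrec_available || user_choice == CHOICE_PORTABLE)
        return CORE_PORTABLE;
    if (user_choice == CHOICE_DYNREC)
        return CORE_DYNREC;
    if (m == NULL || m->needs_portable_core)
        return CORE_PORTABLE;  /* unknown mappers fall back conservatively */
    return CORE_DYNREC;
}
```

Falling back to the portable core for unknown mappers is a design choice of this sketch, not something stated in the thread; a build without the recompiler is modeled by passing `dynrec_available` as 0.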
From: Jim U. <ji...@3e...> - 2001-04-12 14:52:13
|
At 08:10am on 2001 April 12, Mike Melanson did write:
> On Wed, 11 Apr 2001, Jim Ursetto wrote:

> Right. These are all well-known issues with the dyn-rec
> approach.

Well, don't I feel silly now. I thought the dyn-rec engine was poorly understood except by the original author. Glad to hear that.

> > and it's very fast, nearly 10 times faster than the Nofrendo core,
> > using the none renderer.
>
> 10 times faster? Really? Finally, some real speed comparisons.
> However, how much time does the program spend simulating the 6502 in
> comparison to, say, rendering the graphics 60 times per second?

I think it'd be difficult to get a pure measure of 6502 emulation speed, unless someone has a long sequence of varied instructions that don't depend on input or output. Barring this, I simply used the none renderer, decoupling it from 60fps vertical refresh by commenting out the sync code. Whenever UpdateDisplayNone() is called, it increments a global frame count. Input and output routines are still called normally, because otherwise the cart wouldn't progress. The I/O routines call drawimage, but this is a no-op.

So the approximate overhead is:

- input and output routines
    - CLOCK, CTNI, VBL, etc. variables updated
    - fiddling based on register
    - drawimage no-op
- interrupt routine
    - CLOCK etc. variables updated
    - UpdateDisplayNone no-op

No rendering is done at all. The remaining overhead, I suspect is concentrated in the I/O routines (the "fiddling" part).

Speed-wise, on a Celeron 366 I get:

    dyn-rec engine: 4250 fps
    nes6502 core:    600 fps
    M6502 core:      440 fps

So let's say 7 times faster. This was timed several times with Super Mario Brothers, using the eyeball-and-watch method, over a 1 minute period, dividing total frames by 60. Admittedly not the most accurate method, but gives you a general idea. At one time I had implemented an accurate fps counter, but not in this copy of the code.

When rendering is turned on, and synced to refresh, there is no appreciable difference in rendering speed on my system. I.e., it runs at a full 59 or 60 fps at 400x300 (unscaled), when run with full-screen DGA or GGI. However, I haven't done any speed comparisons with rendering on but without vertical sync. That might be interesting.

It's too bad I still have that slowdown problem with my Maestro2 soundcard. It's not as fun without sound...

--
"... [WM97/Melissa-V] triggers immediately and attempts to delete data on your M:, N:, O:, P:, Q:, S:, F:, I:, X:, Z:, H:, and L: network drives."
ji...@3e... / 0x43340710 / 517B C658 D2CB 260D 3E1F 5ED1 6DB3 FBB9 4334 0710
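The arithmetic behind the figures above is simple enough to write down. The sketch below just restates the method described (total frames over a one-minute run, divided by 60) and the resulting ratios; the function names are made up for illustration.

```c
/* Frames counted over a timed run, converted to frames per second. */
static double frames_per_second(unsigned long frames, double seconds)
{
    return (double)frames / seconds;
}

/* How many times faster one core is than another. */
static double speedup(double fast_fps, double slow_fps)
{
    return fast_fps / slow_fps;
}
```

With the reported numbers, `speedup(4250, 600)` comes out a little above 7 for the nes6502 core and `speedup(4250, 440)` a little under 10 for the M6502 core, which matches both the "7 times" and the earlier "nearly 10 times" estimates depending on which interpreter is the baseline.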
From: Jeroen Ruigrok/A. <as...@wx...> - 2001-04-13 09:25:14
|
-On [20010412 03:00], Mike Melanson (mel...@pc...) wrote:
> I already tried to discourage him and he still expressed interest
>in "porting" the dyn-rec engine to the MIPS ISA.

MIPS ISA is pretty sweet if it is the one used in the v4x00 and subsequent family members. I think that was MIPS language III or IV.

--
Jeroen Ruigrok van der Werven/Asmodai .oUo. asmodai@[wxs.nl|freebsd.org]
Documentation nutter/C-rated Coder BSD: Technical excellence at its best
D78D D0AD 244D 1D12 C9CA 7152 035C 1138 546A B867
Once all struggle is grasped, miracles are possible...
From: Rigel <ri...@an...> - 2001-04-12 15:28:13
|
-- On Wed, 11 Apr 2001 19:40:40 Eric Jacobs wrote:
>"Rigel" <ri...@an...> tapped some keys and produced:
>
>> all right, I think I've got it (all) figured out now (haha). I
>> understood mov all along (promise), just not lea. I do like Intel
>> syntax much better though, as it's a little more like other *real*
>> (RISC) assembly languages.
>>
>> This whole gaffe of mine just illustrates another reason I don't like
>> x86 asm, which is that the instructions are misnamed. I would have
>> called lea, cea, for calculate effective addr. or maybe just wpos for
>> weird piece of sh**.
>
>The way I think about it is that lea is like taking a C "&" operator
>of the corresponding mov
>
>    movl 4(%ebx),%edi    is like    edi = *(ebx+4)
>    leal 4(%ebx),%edi    is like    edi = &(*(ebx+4))
>                              or    edi = ebx+4
>
>(Or, perhaps a more exact analogy is that the target register is a
>reference variable, C++ style, if you understand that sort of thing.)
>
>BTW, lea exists in Motorola 68K, and is used in much the same way.
>
>> I'll definitely look up that Art of Assembly stuff. It should be a good
>> reference.
>>
>> Okay, with the zero and sign flag (edi) you just give it the latest
>> result from the ALU, right? But what about the carry flag (ebp) ? do I
>> just set it to 1 if there was a carry involved in the most recent
>> instruction?
>
>You set it the way that a 6502 would set it, as you should do with Z,
>N and V as well. You don't have to set up the register mapping the
>same way as it is for the x86; you should use whatever mapping is the
>most efficient for your architecture. The 6502 instructions that modify
>the carry flag are ADC, SBC, ASL, LSR, ROL, ROR, CMP, CPX, CPY, PLP,
>SEC, CLC. (Did I miss any?)

You're right, I don't *have* to keep similar reg mapping, but it will be much easier to code, as I will have to rewrite less of dynrec.c (which, as mentioned before, is quite complex).

heads up here. I don't understand the implementation in table.x86. So on all these listed instructions I set carry flag if there was a carry (and clear it otherwise)?

Yes I am trying to port the sucker to mips. Wouldn't it be best though, to write the linkage (x86.S) in C? Has anyone noticed that most of the jumps are given as offsets, not labels? Also that it's almost completely undocumented? I'm not relishing the thought of translating that to mips. I'm beginning to see a conspiracy. Who does this Quor guy work for? I may have to mail him, eh?

Whoever it was who said they could get me a cheesy rom (I forget who) that would be quite nice, though I need to get this all running on the SGI first.
From: Jim U. <ji...@3e...> - 2001-04-12 20:05:27
|
At 10:28am on 2001 April 12, Rigel did write:

> heads up here. I don't understand the implementation in table.x86. So
> on all these listed instructions I set carry flag if there was a carry
> (and clear it otherwise)?

Yes. What trouble are you having with the implementation? When carry is set, ebp is -1. When carry is clear, ebp is 0. For instructions which may carry, the sequence is:

    addl -1, %ebp                   # set i386 carry if %ebp was set
    <instruction which may carry>   # set i386 carry according to 6502 instruction
    sbbl %ebp, %ebp                 # set %ebp to -1 on carry, else clear

Like I said, check out SEC, CLC and ROR - Accumulator, these are the simplest examples.

> Yes I am trying to port the sucker to mips. Wouldn't it be best though,
> to write the linkage (x86.S) in C?

There are a few things x86.S does:

1) acts as a call gate between asm and C -- e.g. most of the memory mappers. This needs to stay in assembly by definition.

2) Brings the global CLOCK, CPF, CTNI, etc. variables back in sync with the in-register versions -- e.g. in the INPUT:, OUTPUT:, NMI: routines, and in a couple mappers (mmc1, mmc3, aorom to be specific).

3) Distinguishes between IRQ and NMI in routine NMI:. Then it sets up the 6502 for interrupt (push return address, php, setup flags).

4) Handles self-modifying code. Also must stay in assembly.

You can rewrite 2) and 3) in high-level code.

** SPOILERS AHEAD ;) **

For example, the OUTPUT: routine looks like this in C:

    void output_shim(int addr, int val)
    {
        int cesi = ESI;
        cesi = cesi - CTNI + CLOCK;
        CTNI = ESI;
        if (cesi - CPF >= 0) {
            cesi -= CPF;
        }
        CLOCK = cesi;
        output(addr,val);
        ESI = CTNI;
    }

NMI: has extra logic to handle 3)--specifically, the distinguishing between IRQ and NMI. Interpreter cores should implement the 6502 interrupt setup themselves, so this can be ignored.

This is required for a C-based core. However, there's no reason to rewrite these in C if you're using an assembly language core, since the linkage needs to stay anyway.

--
"But if food is so good for you, how come the body keeps trying to get rid of it?" -- breatharian.com
ji...@3e... / 0x43340710 / 517B C658 D2CB 260D 3E1F 5ED1 6DB3 FBB9 4334 0710
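The carry convention described above (a register that holds -1 when the 6502 carry is set and 0 when it is clear, restored with an add of -1 and saved again with sbb) can be modeled in C. This is a sketch under stated assumptions, not code from table.x86: `add32_carry` and `adc_with_mask` are hypothetical names, and the ADC shown is plain binary addition with decimal mode ignored.

```c
#include <stdint.h>

/* Carry-out of a 32-bit add, i.e. what the x86 carry flag holds after addl. */
static int add32_carry(uint32_t a, uint32_t b)
{
    return (int)(((uint64_t)a + b) >> 32);
}

/* 6502-style ADC where *carry_mask is 0 (carry clear) or -1 (carry set). */
static uint8_t adc_with_mask(uint8_t a, uint8_t m, int32_t *carry_mask)
{
    /* addl -1,%ebp: adding 0xFFFFFFFF carries out exactly when the mask is -1 */
    int cf = add32_carry((uint32_t)*carry_mask, 0xFFFFFFFFu);

    /* the instruction which may carry: 8-bit add with carry-in */
    unsigned sum = (unsigned)a + m + (unsigned)cf;
    cf = sum > 0xFF;

    /* sbbl %ebp,%ebp: ebp - ebp - CF, which is -CF */
    *carry_mask = -cf;
    return (uint8_t)sum;
}
```

A target without a hardware carry flag, which is the usual situation on MIPS, could keep the same 0/-1 mask idea or store a plain 0/1 instead; the thread leaves that choice to whoever does the port.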
From: Mike M. <mel...@pc...> - 2001-04-13 03:41:23
Attachments:
mirror.nes.gz
sourceC000.asm
|
On Thu, 12 Apr 2001, Rigel wrote:

> Yes I am trying to port the sucker to mips. Wouldn't it be best though, to write the linkage (x86.S) in C? Has anyone noticed that most of the jumps are given as offsets, not labels? Also that it's almost completely undocumented? I'm not relishing the thought of translating that to mips. I'm beginning to see a conspiracy. Who does this Quor guy work for? I may have to mail him, eh?

Actually, Quor is on this list, or at least the address from which he used to post is still on the list. It's been awhile since we heard from him.

Regarding your question about why there's all that C-ASM linkage written in ASM...umm, well good question...:) The best answer I have right now is, "That's the way Quor did it originally." Fortunately, I see that Jim has posted a more technical answer.

> Whoever it was who said they could get me a cheesy rom (I forget who) that would be quite nice, though I need to get this all running on the SGI first.

I've attached said cheesy ROM as well as the main source code for it. The ROM simply tiles the screen with a simple pattern (like all one letter) and allows you to scroll around. There are also some simple sprites in the upper left corner. Use the '-m' option in TuxNES to play with the mirroring. I wrote this ROM to help me understand mirroring. The patterns should change depending on which mirroring you choose.

--
-Mike Melanson
From: Jim U. <ji...@3e...> - 2001-04-11 21:17:13
|
At 03:40pm on 2001 April 11, Rigel did write:

> I was wrong, then, because I meant load into ebx the value at the mem location held in edi.

> So if I say,
> leal 4(%ebx),%edi
> then I mean (in C),
> edi = ebx + 4

Exactly. With lea you can do a mul, add, AND a shl (by 2,4,or 8) in only one instruction, but it can also be used to just do a single mov, add, or shl. lea is a special instruction that performs all address computations, then moves this address result into a register.

For example,
    leal 4(%ebp,%ebx,2),%edi
does (in C)
    edi = ebp + ebx*2 + 4;

What you want is movl (%edi), %ebx. This is like saying ebx = *edi. This is also equivalent to movl 0(%edi), %ebx -- the 0 is superfluous.

In intel syntax,
    lea eax, [ebx] == mov eax, ebx.

You may just be confused by the AT&T syntax, but here are some resources:

LEA documentation in intel syntax at:
http://webster.cs.ucr.edu/Page_asm/ArtofAssembly/CH06/CH06-1.html#HEADING1-136

Scaled indexed addressing modes at:
http://webster.cs.ucr.edu/Page_asm/ArtofAssembly/CH04/CH04-3.html#HEADING3-49

Register indirect and indexed addressing modes at:
http://webster.cs.ucr.edu/Page_asm/ArtofAssembly/CH04/CH04-2.html#HEADING2-35

--
'I came to the 3-day "breatharian" seminar in Hawaii, but without the $300 fee to attend. Wiley asked me: "If you can't find $300, then how do you expect to find God?"' -- breatharian.com
ji...@3e... / 0x43340710 / 517B C658 D2CB 260D 3E1F 5ED1 6DB3 FBB9 4334 0710
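As a rough illustration of the address arithmetic in this message, the general AT&T operand disp(base,index,scale) can be modeled in C. The helper name `effective_address` is invented for this sketch. The only facts assumed are that lea computes base + index*scale + disp without touching memory, and that x86 allows scale factors of 1, 2, 4, or 8, which is the same as shifting the index left by 0, 1, 2, or 3.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of disp(base,index,scale), wrapping at 32 bits. */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  unsigned scale, int32_t disp)
{
    /* x86 encodes only these four scale factors */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}
```

For instance, `effective_address(ebp, ebx, 2, 4)` mirrors `leal 4(%ebp,%ebx,2),%edi`, and with index 0, scale 1, and displacement 0 the result is just the base, which is why `lea eax, [ebx]` behaves like a plain register move.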