From: <gb...@di...> - 2000-09-22 17:44:17
|
Hi,

As for real addressing mode, I stepped down, so direct addressing is now the default addressing mode. To enable real addressing, please use the --enable-addressing=real configure option.

Real addressing doesn't work on Solaris; anyway, its manpage states that using MAP_FIXED in mmap(2) implies undefined behavior for further sbrk() and malloc() calls...

In rom_patches.cpp and rsrc_patches.cpp, I mainly made the following two changes for real addressing:

- Changed the type of the ScratchMem area from uint32 to uint8 * and used Host2MacAddr() to get the corresponding Mac address. Since I don't have an Amiga, please check whether my changes in main_amiga.cpp and native_cpu/cpu_emulation.h broke something.

- The code for moving the System Zone now patches the ROM in order to insert:
    RAMBaseMac+0x2000 and RAMBaseMac+0x3800
  instead of:
    RAMBaseMac and RAMBaseMac+0x1800

Thanks.

-- Gwenolé Beauchesne |
From: <gb...@di...> - 2000-09-21 09:58:45
|
Hi,

So far, I have had only two crashes with B2 in real addressing mode under Linux/i386:
- the one I told you about yesterday
- another one related to AppleShare DRVR 41

1) The "Lo3Bytes" bug

I determined that the unfriendly code starts at ROMBase + 0xdb9c. Bit 0 of the byte located at 0x1efc (what is it?) is always tested. Then two branches are possible: one with the address stripped down to 24 bits and the other keeping all the bits of the address. In normal operation, the latter branch is always taken but, for some reason, at the end of the Speedometer Graphics test, the branch that strips the address by AND'ing it with Lo3Bytes is taken!

My fix, which has to be improved: changing the BNE to a BRA... A little barbarous, isn't it? ;-)

2) The AppleShare bug

This morning I tried to boot with extensions on. It failed at the AppleShare extension from MacOS 8.1 (vers 3.7.4) because it tried to read some data from 0x3fff. I finally found that the unfriendly code is located in resource DRVR 41 (Driver: .AFPTranslator) at address 2372.

The problem: a word at 0x28e (ROM85) is sign-extended into an address register. Then, a word at that address is fetched. As that address turns out to be 0x3fff on my system, B2 crashes.

My fix: replacing six of its instructions with:

    movea.w ROM85,%a0
    adda.l  #RAMBaseMac,%a0
    movea.l %a0,%a2
    nop

In the original code:

    0x48e7, 0x1c20, // movem.l d3-d5/a2,-(a7)
    0x382e, 0x0008, // move.w $0008(a6),d4
    0x3a2e, 0x000a, // move.w $000A(a6),d5
    0x554f,         // subq #$2,a7
    0x3eb8, 0x028e, // move.w ROM85,(a7)
    0x301f,         // move.w (a7)+,d0
    0x48c0,         // ext.l d0
    0x2440,         // movea.l d0,a2
    0x2040,         // movea.l d0,a0
    0x3010,         // move.w (a0),d0
    0x0c40, 0x3fff, // cmpi.w #$3fff,d0
    0x6316          // bls.s #$00000016

The stack seems to be used just as a temporary, since the push is immediately followed by a pop. I am therefore assuming that the value stored there is not used afterwards, and I move the address directly into the required address registers. Do you have other ideas?

I will probably commit the changes after I have tried real addressing under Solaris.

Bye.

-- Gwenolé Beauchesne |
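The AppleShare failure described above comes down to a 16-bit word being sign-extended and then used as a pointer. A minimal sketch of that arithmetic follows; the function names and the RAM base value used in the usage note are hypothetical and are not taken from Basilisk II.

```cpp
#include <cstdint>

// Sign-extend a 16-bit word to 32 bits, as EXT.L (or MOVEA.W) does on the 68k.
inline uint32_t ext_l(uint16_t w) {
    return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int16_t>(w)));
}

// Address dereferenced by the original driver code: the ROM85 word itself,
// sign-extended and used as a pointer.
inline uint32_t original_address(uint16_t rom85) {
    return ext_l(rom85);
}

// Address after the proposed patch: the same value rebased onto the Mac RAM
// base (ram_base_mac is a hypothetical stand-in for RAMBaseMac).
inline uint32_t patched_address(uint16_t rom85, uint32_t ram_base_mac) {
    return ext_l(rom85) + ram_base_mac;
}
```

With a ROM85 word of 0x3fff, original_address() yields 0x3fff, which lies outside the 0x2000 bytes mapped at address 0 in real addressing mode; patched_address() moves the access into emulated RAM instead.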
From: <gb...@di...> - 2000-09-20 14:40:52
|
Hi, > [*] Looks specific to Speedometer. [live debugging ;-)] Actually, it > further seems to be the result of a TST.B (A0). Not very informative... That piece of code is at ROMBase + 0xdbae on my Quadra 630 ROM. -- Gwenolé Beauchesne |
From: <gb...@di...> - 2000-09-20 14:00:43
|
Hi,

[Real Addressing rules]
> Ability for unaligned accesses?

Actually, this is the job of the do_get_mem/do_put_mem functions. If the host provides unaligned access, that's fine.

> And does real addressing work on little-endian machines?

Unfortunately, while I was running a Speedometer Graphics performance test (16-bit mode), Basilisk exited at the end of that test, just before displaying the result (or beeping). It seems that it tried to access some function located at 0x00ED466A [*]. How come the previous tests ran well?

The only thing I haven't enabled with regard to Real Addressing on AmigaOS is the patch to disable overwriting of SysBase. What is it? I tried to apply this patch, but to no avail :-/

[*] Looks specific to Speedometer. [live debugging ;-)] Actually, it further seems to be the result of a TST.B (A0). Not very informative...

Real addressing will work on platforms that provide VOSF, especially the ability to retrieve the faulting address when a SEGV signal is caught. This is implemented through the si_addr field of siginfo_t on platforms that support extended signal handlers, or through a sigcontext hack on Linux/i386. Once the faulting address is known, write permission is re-enabled on the corresponding page. Later, on screen update, I retrieve the dirty pages and update the host frame buffer.

I will also experiment with a hybrid between the update in DGA mode and Windowed mode. In DGA mode, there are two temporary frame buffers: one to which data is actually written, the other a copy. Then, I use a variant of the static_update method to update the screen. This approach is actually faster than having just one temporary frame buffer and blitting the complete dirty pages.

> > I will probably include config.guess and config.sub

BTW, I took those from the SDL library, not from the latest devel autoconf package.

Bye.

-- Gwenolé Beauchesne |
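The VOSF mechanism described above (write-protect the buffer, catch SIGSEGV, read si_addr, mark the page dirty, re-enable writes) can be sketched roughly as follows. This is a simplified illustration for Linux-like POSIX hosts, not the code in video_vosf.cpp: all names are hypothetical, the page count is capped at 64, and it relies on mprotect() being usable from a signal handler, which works in practice on Linux but is not guaranteed by POSIX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical state for this sketch (not Basilisk II's actual variables).
static uint8_t *fb_base = nullptr;        // start of the protected buffer
static std::size_t fb_size = 0;           // size in bytes (whole pages)
static std::size_t page_size = 0;
static volatile sig_atomic_t dirty[64];   // one flag per page (sketch limit)

// SIGSEGV handler: mark the faulting page dirty and make it writable again,
// so that the interrupted store is restarted and succeeds.
static void fault_handler(int, siginfo_t *info, void *) {
    uint8_t *addr = static_cast<uint8_t *>(info->si_addr);
    if (addr < fb_base || addr >= fb_base + fb_size)
        _exit(1);  // fault outside our buffer: a genuine crash
    std::size_t page = static_cast<std::size_t>(addr - fb_base) / page_size;
    dirty[page] = 1;
    mprotect(fb_base + page * page_size, page_size, PROT_READ | PROT_WRITE);
}

// Allocate a read-only buffer of `pages` pages and install the handler.
static bool vosf_init(std::size_t pages) {
    assert(pages > 0 && pages <= 64);
    page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    fb_size = pages * page_size;
    void *p = mmap(nullptr, fb_size, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    fb_base = static_cast<uint8_t *>(p);
    struct sigaction sa = {};
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}

// On screen update: report whether a page was written since the last call,
// then write-protect it again so the next store faults once more.
static bool vosf_take_dirty(std::size_t page) {
    bool was_dirty = dirty[page] != 0;
    if (was_dirty) {
        dirty[page] = 0;
        mprotect(fb_base + page * page_size, page_size, PROT_READ);
    }
    return was_dirty;
}
```

A screen refresh would loop over the pages, call vosf_take_dirty() on each, and blit only those that report true. Hosts that deliver SIGBUS for protection faults, or that do not fill in si_addr, would need the kind of fallback mentioned in the message (for example the sigcontext hack).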
From: Christian B. <cb...@st...> - 2000-09-19 18:10:48
|
Hi!

On Tue, Sep 19, 2000 at 08:11:03PM +0200, Gwenole Beauchesne wrote:
> It does work! ;-)

Cool! :-)

> * real addressing:
> - ability to mmap 0x2000 bytes from 0x0000
> - ability for Video on SEGV signals (VOSF) on little endian machines or
> big endian ones in a video mode other than 16 bits per pixel

Ability for unaligned accesses? And does real addressing work on little-endian machines?

> I will probably include config.guess and config.sub

Ok. Maybe some of the CPU/OS detection in the configure.in can be cleaned up this way.

Bye,
Christian

-- / Coding on PowerPC and proud of it \/ http://www.uni-mainz.de/~bauec002/ |
From: <gb...@di...> - 2000-09-19 18:02:02
|
Hi,

> I will try that new scheme under Solaris this afternoon.

It does work! ;-) I will commit the diffs soon. In fact, it might also not be necessary to move the RAM/ROM base address determination from uae_cpu/basilisk_glue.cpp to main.cpp.

The different addressing modes will be configured through --enable-addressing=mode, where mode is one of: fastest, real, direct, banks. fastest is the default mode and will try to choose the best addressing mode available for the target system. The tests are:

* real addressing:
  - ability to mmap 0x2000 bytes from 0x0000
  - ability for Video on SEGV signals (VOSF) on little endian machines or big endian ones in a video mode other than 16 bits per pixel
* direct addressing:
  - VOSF
* banks:
  - no test, default mode
* VOSF:
  - siginfo_t and si_addr set correctly
  - sigcontext hack

I will probably include config.guess and config.sub in order to retrieve canonical information about the host or the target (cpu, os, vendor) and set macros such as OS_linux and CPU_i386 instead of relying on compiler-specific macros. What do you think about it?

Bye.

-- Gwenolé Beauchesne |
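The selection performed by the "fastest" mode can be pictured as a simple cascade over the test results listed above. The sketch below is only an illustration of that decision order: the type and function names are hypothetical, and the endianness and video-depth condition attached to VOSF is collapsed into a single flag.

```cpp
// Addressing modes named in the message (other than the "fastest" alias).
enum class AddressingMode { Real, Direct, Banks };

// Results of the configure-time tests; the field names are hypothetical.
struct HostCapabilities {
    bool can_mmap_low_memory;  // 0x2000 bytes can be mapped at address 0x0000
    bool has_vosf;             // faulting address is recoverable on SIGSEGV
};

// Pick the best available mode, falling back to memory banks.
AddressingMode pick_fastest(const HostCapabilities &host) {
    if (host.can_mmap_low_memory && host.has_vosf)
        return AddressingMode::Real;
    if (host.has_vosf)
        return AddressingMode::Direct;
    return AddressingMode::Banks;
}
```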
From: <gb...@di...> - 2000-09-19 11:44:36
|
Hi,

> windowed mode. Despite all my efforts, I could only get direct
> addressing + DGA + temporary frame buffer 3% slower than memory banks +
> DGA with no temporary frame buffer (the original B2 without my patches)

With the new direct addressing scheme, DGA is as fast as Windowed mode on raw CPU performance. Graphics performance in DGA mode is now as fast as without direct addressing. The reason is probably better locality of memory accesses: the Mac address space is now allocated at once (RAM + ROM + VID).

I will try that new scheme under Solaris this afternoon.

Bye.

-- Gwenolé Beauchesne |
From: <gb...@di...> - 2000-09-19 10:18:44
|
Hi, > A solution for Solaris and other systems that don't allow such a mapping > is to use a table of offsets. That table is only 16 KB long if I take > 1 MB pages and a 32-bit addressing host system. Hmm, a much simpler solution would be to wipe out the "triple allocation" process and its heuristics. Indeed, I have just realized that the framebuffer is relocatable too... ;-) Bye. -- Gwenolé Beauchesne |
From: <gb...@di...> - 2000-09-18 22:38:54
|
Hi,

[Direct Addressing]
> > Implemented. And indeed, what a nice piece of improvement: near twice
> > the speed!

In the event that a temporary frame buffer is needed in DGA mode, the user will notice that this mode is up to 50% slower than windowed mode. Despite all my efforts, I could only get direct addressing + DGA + temporary frame buffer 3% slower than memory banks + DGA with no temporary frame buffer (the original B2 without my patches), whereas windowed mode with direct addressing is 50-60% faster than without direct addressing.

I told you earlier that I would try direct addressing on Solaris. I tried and it failed :-/ Actually, under Solaris, the only reasonable mappable region is above 0xE0000000. That's not enough to "triple allocate" the Mac address space. By "triple allocation" I mean allocating RAM, ROM and VID so that the gaps between them in the host address space are the same as those in the Mac address space. In fact, triple allocation could come down to "double allocation", since the ROM is relocatable and can be placed, for example, next to the RAM region.

A solution for Solaris and other systems that don't allow such a mapping is to use a table of offsets. That table is only 16 KB long if I take 1 MB pages and a host system with 32-bit addressing.

I would like to commit the patches, but be advised that:

- I altered main.cpp so that the RAM and ROM base addresses are known before the invocation of VideoInit(). This is enabled only in emulated_68k mode. That code comes from uae_cpu/basilisk_glue.cpp, as you suggested.

- I added a few lines in rom_patches and rsrc_patches for the ScratchMem hook in direct addressing. This works great, thanks Christian for that trick!

- I have an extra file (video_vosf.cpp) for Video on SEGV signals that I will probably split into two files: video_vosf.h and blitters.h. Those files are only meant to be included from video_x.cpp.

- I have specialised VideoRefresh functions to handle different sorts of display updates.

I also had a genvideo module that would generate specialised C handlers according to depth, xshm presence, static/dynamic updates of the screen, etc. I removed the genvideo thing because the benefits were not so impressive.

There are more and more #defines :-/
- enable_vosf for Video on SEGV signals
- direct_addressing, as its name indicates
- maybe direct_addressing_lt (lite) for the hack with a table of offsets
- have_siginfo_t
- have_sigcontext_subterfuge

The last two macros are defined in acconfig.h and set at configure time for the config.h file. Besides, those tests are quite huge: ~45 lines each in configure.in...

> > 4. Reversed Address Space
> > [...]
> > Note however that I didn't check all the patches for "correctness"
> > yet, especially audio that makes strange sounds.
>
> It may not be obvious, but part of the intention behind the Mac2Host_memcpy()
> etc. functions in cpu_emulation.h was to allow for a reversed address space.
> It's not used consistently in all modules, however.

The reason why I chose the prepare, reverse, reverse-back approach is that the reversal can take place without any extra memory. I reckon that the Mac2Host_* functions are cleaner. Maybe the next step would be to use them in all places that make use of Mac2HostAddr() and assume a forward address space.

The point of direct addressing under Linux/i386 is that, in some conditions, it could turn into real addressing. Then, the JIT compiler will only have to generate loads/stores without any other offset... I am pretty sure that this is where the major speed increase will come from. Even more so if no byteswapping at all is needed.

Bye.

-- Gwenolé Beauchesne |
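The 16 KB figure for the table of offsets follows from simple arithmetic: a 32-bit address space split into 1 MB pages has 2^32 / 2^20 = 4096 pages, and one 32-bit offset per page takes 4096 x 4 bytes = 16 KB. A rough sketch of such a table is shown below; the class and its method names are hypothetical, and host addresses are assumed to fit in 32 bits, as in the message.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned kPageShift = 20;  // 1 MB pages
constexpr std::size_t kNumPages = std::size_t(1) << (32 - kPageShift);  // 4096
constexpr std::size_t kTableBytes = kNumPages * sizeof(uint32_t);       // 16 KB

// One offset per 1 MB page of the Mac address space (hypothetical sketch).
class OffsetTable {
public:
    OffsetTable() : offsets_(kNumPages, 0) {}

    // Map the Mac range [mac_start, mac_start + size) onto host_start.
    // size must be non-zero.
    void map(uint32_t mac_start, uint32_t size, uint32_t host_start) {
        uint32_t delta = host_start - mac_start;  // wraps modulo 2^32
        uint32_t first = mac_start >> kPageShift;
        uint32_t last = (mac_start + size - 1) >> kPageShift;
        for (uint32_t p = first; p <= last; ++p)
            offsets_[p] = delta;
    }

    // Translate a Mac address with one table lookup and one addition.
    uint32_t mac_to_host(uint32_t mac_addr) const {
        return mac_addr + offsets_[mac_addr >> kPageShift];
    }

private:
    std::vector<uint32_t> offsets_;
};
```

Because each region gets its own offset, RAM, ROM and the frame buffer no longer have to keep the same gaps in host memory as in Mac memory, at the cost of one extra lookup per access compared with a single constant offset.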
From: <gb...@di...> - 2000-09-18 22:38:52
|
Hi,

> On Thu, Sep 14, 2000 at 02:19:45PM +0200, Gwenole Beauchesne wrote:
> > It currently achieves only +30% more speed according to Speedometer.
>
> Not too hot...

I am confident that with longer compiled blocks and faster memory accesses (direct addressing), the JIT compiler will generate much faster code. Another point would be not to systematically compile the block that was just executed.

> > - In 68040 mode, the translation cache tends to be flushed up to three
> > times more than in 68020/68030 mode.
>
> That's probably because MacOS issues more cache flushes on the '040.

Yes, that is why I wondered about finer detection of cache flushes, in order to flush only the parts that need it. Currently, when, say, a CINVP occurs, the whole cache is flushed...

-- Gwenolé Beauchesne |
From: Christian B. <cb...@st...> - 2000-09-18 21:19:52
|
Hi! This is just to let you know that anonymous access to the Basilisk II CVS is now available, too. Set your CVSROOT to :pserver:an...@do...:/cvs The password is "anoncvs". Bye, Christian -- / Coding on PowerPC and proud of it \/ http://www.uni-mainz.de/~bauec002/ |
From: Christian B. <cb...@st...> - 2000-09-15 12:54:07
|
Hi! On Thu, Sep 14, 2000 at 02:19:45PM +0200, Gwenole Beauchesne wrote: > It currently achieves only +30% more speed according to Speedometer. Not too hot... > - 68040 support is broken. Have you ever seen a green scrollbar at > startup ? ;-) "B2 Appearence Manager" :-) > - In 68040 mode, the translation cache tends to be flushed up to three > times more than in 68020/68030 mode. That's probably because MacOS issues more cache flushes on the '040. Bye, Christian -- / Coding on PowerPC and proud of it \/ http://www.uni-mainz.de/~bauec002/ |
From: Christian B. <cb...@st...> - 2000-09-15 12:51:23
|
Hi! On Wed, Sep 13, 2000 at 07:38:02PM +0200, Gwenole Beauchesne wrote: > I am using "cachesize" but if you prefer another name (e.g. > "jit_cachesize") just tell me so. "jitcachesize" :-) Bye, Christian -- / Coding on PowerPC and proud of it \/ http://www.uni-mainz.de/~bauec002/ |
From: Christian B. <cb...@st...> - 2000-09-15 12:49:41
|
Hi! On Thu, Sep 14, 2000 at 02:19:46PM +0200, Gwenole Beauchesne wrote: > In fact, I will now cut that set into two sets: > - one for compiled code that can be flushed > - one for compiled code from ROM Maybe you can use the "illegal MOVEQ" opcodes, some of which are already used for the EMUL_OPs: 71xx, 73xx, 75xx etc. > Then, it's not impossible that we get a big portion of ROM compiled > afterwards... Unfortunately, ROM A-Traps are probably not the most > useful (except parts of QuickDraw), are they ? It depends. Many ROM A-Traps get replaced by newer versions of the system software. > If I remember well, ROM is patched only once whereas rsrc patches could > be rerun at anytime. Yes. Bye, Christian -- / Coding on PowerPC and proud of it \/ http://www.uni-mainz.de/~bauec002/ |
From: <gb...@di...> - 2000-09-14 12:10:55
|
Hi,

> On Wed, Sep 06, 2000 at 10:18:55AM +0200, Gwenole Beauchesne wrote:
> > get written so we could just use MAE's trick and therefore reserve a set
> > of illegal opcodes to patch in the ROM.
> Which opcodes?

Those that don't fall into the following sets:
- the real illegal opcode
- {EmulOp,A,F}-line traps
- whose handler is not op_illg_1

As a result, there are around 9500 opcodes remaining. In fact, I will now cut that set into two sets:
- one for compiled code that can be flushed
- one for compiled code from ROM

Then, it's not impossible that we get a big portion of the ROM compiled afterwards... Unfortunately, ROM A-Traps are probably not the most useful (except parts of QuickDraw), are they?

If I remember well, the ROM is patched only once, whereas rsrc patches could be rerun at any time. So that approach could work.

Bye.

-- Gwenolé Beauchesne |
From: <gb...@di...> - 2000-09-14 12:10:55
|
Hi,

> the optlev thing I don't know what it is used for...)

Optimization Level, but as I don't and probably won't use the optimizer, I don't need it.

The JIT still requires have_get_mem_word disabled. It currently achieves only +30% more speed according to Speedometer.

Some drawbacks:
- The JIT-enabled BasiliskII does not look as responsive as before.
- 68040 support is broken. Have you ever seen a green scrollbar at startup? ;-)
- In 68040 mode, the translation cache tends to be flushed up to three times more than in 68020/68030 mode: 320 flushes per second as opposed to 125 flushes per second. Probably not that costly after all...
- Memory accesses are slow (read: the generated code for them is big), but this will change when I integrate Direct Addressing.

Separately, I have also been experimenting with what I will call "pessimistic dead flag calculation elimination". This could provide a one-pass compilation process, since live analysis of flags won't be needed anymore.

My todo list:
- fix 68040 support
- fix support for have_get_mem_word_unswapped
- integrate direct addressing
- integrate, as an option, pessimistic dead flag calculation elimination
- integrate MAE-style cpufunctbl patches
- enable flag calculation through the old pushfl/pop method, because the current one (sahf/seto) is slower on my AMD K6-2. It is activated through the SAHF_SETO_PROFITABLE flag.
- merge m68k_execute() and m68k_compile_execute() [m68k_go() replacements] into one function that takes a boolean telling whether or not the JIT compiler can do its job.

Bye.

-- Gwenolé Beauchesne |
From: <gb...@di...> - 2000-09-13 17:29:12
|
Hi,

I managed to make the JIT compiler work. The problem is that it is still unstable and slow. In fact, by disassembling the generated code, I noticed that it just generates calls to the instruction handlers. I know where this may come from (the optlev thing I don't know what it is used for...) and I am trying to solve that problem.

It currently requires have_get_word_unswapped *disabled*. When enabled, the compile-execute process seems to hang when I start Speedometer tests. This may come from the fact that I have overlooked MOVEM handlers that explicitly bswap memory. I fixed support for unswapped_get_word in the main compile loop, however.

I got rid of the pissoff thing but use an extra SPCFLAG in order to make the jitted code return when necessary (if spcflags are tested, of course).

Another rule is that the compiler is disabled when processing an emul_op. The reason is that an emul_op might flush the translation cache while the execution of that emul_op was requested from compiled code. Though an emul_op marks the end of a block, I will leave it as is for now, i.e. not compiling blocks while emul_op'ing.

BTW, the Makefile.in grew a lot; maybe we should find a more clever rule to compile all parts at once...

I also prepared a new table68k with better flag usage and control flow information. This should make for longer compiled blocks, as flag usage information for instructions such as BFEXT was unknown, thus making BFEXT an end-of-block marker.

I would like a new prefs item, at least for the translation cache size. I am using "cachesize", but if you prefer another name (e.g. "jit_cachesize"), just tell me so.

One more thing: the resulting BasiliskII file is quite big (~3 MB) and the time to compile it is not negligible on my poor computer (AMD K6-2/300). I just hope this won't hurt the host caches...

Bye.

-- Gwenolé Beauchesne |
From: Christian B. <cb...@st...> - 2000-09-13 17:16:25
|
Hi! On Wed, Sep 06, 2000 at 10:18:55AM +0200, Gwenole Beauchesne wrote: > get written so we could just use MAE's trick and therefore reserve a set > of illegal opcodes to patch in the ROM. Which opcodes? > A problem arises then: is there some very weird programs that rely on > really illegal opcodes ? I don't know any. > Suggested name: FindROMTrap if I understand well Christian's coding > standard. Very well. :-) Bye, Christian -- / Coding on PowerPC and proud of it \/ http://www.uni-mainz.de/~bauec002/ |
From: Christian B. <cb...@st...> - 2000-09-12 18:34:42
|
Hi! On Tue, Sep 05, 2000 at 01:08:01PM +0200, Gwenole Beauchesne wrote: > MacOS seems to try to write to ROM then read back for some testing > purposes, right ? There are two places I know of where the MacOS writes to the ROM: 1. The resource manager does it when using ROM resources. But it doesn't seem to do any harm when the writes are allowed to take place. 2. The longword at address 0 is set to point into the ROM (probably to make broken programs that write to a dereferenced NULL pointer not destroy any RAM contents). Unfortunately, some versions of MacOS itself are also "broken" in that sense and that is the main reason for the "ScratchMem" handling on the Amiga (otherwise, MacOS would crash on boot). This approach seems to be safe. > Yes, Bernie's compiler does that but I was just wondering about > self-modifying code and other ways to detect it and avoid complete > checksuming of basic blocks. MacOS will flush the cache in all instances where self-modifying code is used (or code is loaded) when the CPU is a 68040. > > > Should I make it the default one when an i386 cpu is detected ? > > > > If it's an improvement, then yes. > > Is "it works (enable scrollbars)" a right answer ? ;-) It is. :-) Bye, Christian -- / Coding on PowerPC and proud of it \/ http://www.uni-mainz.de/~bauec002/ |
From: Christian B. <cb...@st...> - 2000-09-12 18:34:37
|
Hi! On Tue, Sep 05, 2000 at 09:46:07PM +0300, Lauri Pesonen wrote: > we should probably do: > if(srcreg != dstreg) m68k_areg(regs, srcreg) += 16;\r\n"); > m68k_areg(regs, dstreg) += 16;\r\n"); Ok. Bye, Christian -- / Coding on PowerPC and proud of it \/ http://www.uni-mainz.de/~bauec002/ |
From: <gb...@di...> - 2000-09-06 23:01:58
|
Hi,

I tried to run in direct addressing mode with DGA, to no avail :-/

Basically, all I needed to do was retrieve the host's real frame buffer into buffer_copy, then allocate the_buffer later, as I did for windowed mode. Unfortunately, for some reason, I don't even reach that part of the code. In video_x.cpp/init_xf86_dga, the third XSync() seems to completely lock BasiliskII. When I ran B2 under strace, the last system call that got traced was indeed a select(). Commenting out the line would just make Basilisk hang a few X calls later. That problem was encountered with an i740 X server.

Using a traditional S3 Virge (I have two cards in the same box :), the problem is different: everything goes well up to "Starting the emulation...", then Basilisk simply quits. There was no SEGV signal, otherwise the Screen_fault_handler would have caught it and reported it.

Removing direct addressing automagically enables Basilisk to run in fullscreen DGA mode again. Do you happen to have any idea about that?

-- Gwenolé Beauchesne |
From: <gb...@di...> - 2000-09-06 08:10:26
|
Hi,

When the compiler comes, it would be interesting to precompile some A-Trap handlers located in ROM. The "some" are yet to be defined, but this could be the 20-30 most frequently used traps. Those handlers can be assigned permanent translations because the ROM won't get written, so we could just use MAE's trick and therefore reserve a set of illegal opcodes to patch in the ROM.

A problem then arises: are there some very weird programs that rely on really illegal opcodes?

Suggested name: FindROMTrap, if I understand Christian's coding standard well.

-- Gwenolé Beauchesne |
From: Lauri P. <lpe...@ni...> - 2000-09-05 18:45:55
|
Hello,

One of the Motorola errata files ("040de.txt" in my Motorola archive, "DOCUMENTATION CLARIFICATION FOR MC68040, MC68EC040 & MC68LC040") states:

"MOVE16 (Ax)+,(Ay)+ where Ax=Ay is functionally the same as MOVE16 (Ax),(Ay)+. The address register only gets incremented once and the line is copied over itself instead of copied into the next line."

So in op op_f620_0(), instead of:

    m68k_areg(regs, srcreg) += 16;\r\n");
    m68k_areg(regs, dstreg) += 16;\r\n");

we should probably do:

    if(srcreg != dstreg) m68k_areg(regs, srcreg) += 16;\r\n");
    m68k_areg(regs, dstreg) += 16;\r\n");

I know that this is a degenerate case, but Murphy's Law states that some weird program relies on this particular behavior :)

Lauri |
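The behaviour quoted from the errata can be illustrated with a small stand-alone model of MOVE16 (Ax)+,(Ay)+. This is only a sketch under assumptions: the Regs structure, the flat mem array and the function name are hypothetical stand-ins for the emulator's own state, and the masking to a 16-byte line reflects the usual description of MOVE16 rather than anything stated in the message.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical register file: only the eight address registers are modelled.
struct Regs {
    uint32_t a[8];
};

// MOVE16 (Ax)+,(Ay)+ over a flat byte array standing in for Mac memory.
void move16_postinc(Regs &regs, uint8_t *mem, int srcreg, int dstreg) {
    uint32_t src = regs.a[srcreg] & ~15u;  // operate on whole 16-byte lines
    uint32_t dst = regs.a[dstreg] & ~15u;
    std::memmove(mem + dst, mem + src, 16);
    // When Ax == Ay the register must be incremented only once.
    if (srcreg != dstreg)
        regs.a[srcreg] += 16;
    regs.a[dstreg] += 16;
}
```

With distinct registers both advance by 16; with the same register on both sides the line is copied over itself and the register advances by 16 rather than 32, which is the case the proposed change addresses.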
From: <gb...@di...> - 2000-09-05 11:17:28
|
> "Optimizing direct threaded code by selective inlining" > Ian Piumarta and Fabio Ricardi > PLDI'98. (ACM) > > If you don't have access to the ACM, I could try to recover from my > memory the link from which I got the article. Google was faster: <http://www-sor.inria.fr/publi/ODCSI_pldi98.html> Bye, -- Gwenolé Beauchesne |
From: <gb...@di...> - 2000-09-05 10:59:32
|
Hi,

> I think the complete part in uae_cpu/basilisk_glue.cpp, lines 59..82
> (#if REAL_ADDRESSING .. #endif) could be moved into main.cpp/InitAll(),
> after the call to CheckROM() has been made.

I think so as well, especially given that the switch/cases are already in place. Even the memory_init() could be moved, because in DIRECT_ADDRESSING mode I made it do nothing.

> > (b) In order to get a chance to mmap() the address space as
> > above-mentioned, MacRAM would not get allocated before VideoInit() is
> > executed.
>
> This is harder. I placed VideoInit() at the latest possible point in the
> initialization order because it may switch to a screen mode where it's
> no longer possible to put up dialog boxes for error messages from modules
> that are initialized earlier.

Still in direct addressing mode, the trick was to call other buffer initialization routines after InitAll() is completed in main_unix.cpp. The function (VideoInitBuffer) takes only one parameter: the new memory area allocated for the temporary frame buffer.

> > (d) Due to the different frame layout that could be used, I implemented
> > video handling on SIGSEGV (see below). Drawback: DGA mode will be slower
> > since a temp frame buffer will have to be used too.
>
> But the whole point of DGA mode is to avoid a temporary frame buffer...

The rationale is currently as follows, according to DGA screen depths:
- 8 : no temporary frame buffer is required
- 15 : a temp buffer is needed because long get/put have to be word swapped
- 16 : temp buffer required because color conversion is necessary
- 24 : byteswap

All the cases mentioned above apply only if the host is little-endian. Otherwise, no temp buffer at all is required.

Note: for windowed mode, the reasoning is a bit different, since I should take into account the underlying bitmap bit order instead of the host's. See the tests you make when calling video_x.cpp/set_video_monitor.

Another thought about direct addressing: MacOS seems to try to write to ROM and then read back for some testing purposes, right? I used Lauri's method to handle that, i.e. protect the ROM area from writes. When an access violation occurs, the Screen_fault_handler() will just advance the host (x86) instruction pointer to the next instruction. But since I don't want to have an advance() function for every target processor, I wonder whether the ScratchMem method used for real addressing is safe all the time? If so, the correct #if should also be added in main.cpp and rsrc_patches.cpp.

> > TC flush will occur when:
> > - Code is created and executed from Execute68k(), Execute68kTrap()
> > - FlushCodeCache() is called
> > - A-Traps: FlushCodeCacheRange, FlushInstructionCache
> > - BlockMove()
>
> Why is this not simply done every time a MOVEC *,CACR or CPUSH is executed
> to clear the emulated 680x0's caches?

Yes, Bernie's compiler does that, but I was just wondering about self-modifying code and other ways to detect it and avoid complete checksumming of basic blocks.

BTW, I was also wondering how hard it would be to patch the Segment Loader and compile, possibly in a more aggressive way, blocks that got loaded.

I was also thinking about using MAE's dispatching method instead of going through the TLB every time to find and check for a compiled block. In other words, take one unused opcode, put it in place of the first opcode of the block and patch the jump table accordingly. This would make it easy to port the JIT compiler to a system that doesn't have GCC and its "Label as Value" extension, say Windows and VC++ ;-) The TLB is still needed, to recover the original opcode and to push back the special compile_opcode. Overlapping of compiled blocks should be taken care of as well, since the original m68k opcode got replaced with one of the special compile_opcodes.

I have not started to work on compemu_*.c yet.

For starters, I will probably create a similar framework that would just generate calls to the appropriate instruction handlers. Then, there would just be the need for a specific "call <target>" instruction generator per target processor.

I am also thinking about, experimenting with and working on another emulator that should enable retargeting of a JIT compiler in almost no time. In fact, that's not really a code generator, just a code "copier". GCC will be mandatory because of (at least) the following two features:
- "Label as Value" : to determine code ranges to copy
- "Explicit Reg Vars" : for static register allocation. If the host permits it, I intend to cache D0, D1, A0, A1, A7, in no particular order. Q&D profiling shows that those are the most frequently used registers.

Sure, this won't provide as much power as a customized JIT compiler with dynamic register allocation, but the point is that no code generator is needed. ;-) I have not seen this implemented before in a real emulator. I got the idea by reading again:

"Optimizing direct threaded code by selective inlining"
Ian Piumarta and Fabio Ricardi
PLDI'98. (ACM)

If you don't have access to the ACM, I could try to recover from my memory the link from which I got the article.

> If you are convinced that it will still work on non-x86 machines and other
> operating systems (including NetBSD/m68k, where it runs without CPU
> emulation), you can check it in.

I won't commit the direct addressing diffs before I see it working, at least on Solaris/SPARC, which supports siginfo_t.

> > Actually, I never used CVS
>
> There's a nice tutorial at
> http://www-classic.be.com/aboutbe/benewsletter/volume_III/Issue40.html#Insight

Thanks, I will check it out.

> > Should I make it the default one when an i386 cpu is detected ?
>
> If it's an improvement, then yes.

Is "it works (enable scrollbars)" a right answer? ;-)

PS: sorry, I did not notice I wrote so much text.

Bye,

-- Gwenolé Beauchesne |
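The DGA depth rationale given in the message above can be condensed into a small predicate. This is just a restatement of the listed cases as code; the function name is hypothetical, and treating depths that are not listed as needing a temporary buffer is an assumption made here, not something the message states.

```cpp
// Whether a DGA mode of the given depth needs a temporary frame buffer,
// following the cases listed in the message. Hypothetical helper.
bool dga_needs_temp_buffer(int depth, bool host_little_endian) {
    if (!host_little_endian)
        return false;      // big-endian host: no temp buffer at all
    switch (depth) {
        case 8:
            return false;  // bytes need no swapping
        case 15:           // long get/put have to be word swapped
        case 16:           // color conversion is necessary
        case 24:           // byteswap
            return true;
        default:
            return true;   // assumption: be conservative for other depths
    }
}
```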