From: Vincent R. <vin...@fr...> - 2025-01-03 02:30:11
|
Hi, and happy new year to all of you.

Here is my present:
A (partial) disassembly of TOS 1.00 using Ghidra

Everything is here, including explanations:
https://github.com/disastos/tos100fr

Here is some background.

Recently, someone told me about a strange GEMDOS bug happening only on TOS 1.00. I quickly managed to reproduce it with a program of mine. As that program was simple enough to be bug-free, the problem was obviously inside TOS 1.00. I'm used to tracing the ROM with MonST2, no problem there. But as GEMDOS is written in C and uses rather complex structures, it quickly becomes hard to follow. Then I looked at Thorsten's (excellent) reconstructed sources. They helped a lot, but unfortunately the bug was in memory management, and that part of the code changed between TOS 1.00 (my target) and TOS 1.04 (Thorsten's sources). So it wasn't enough. So I decided to disassemble TOS 1.00 using Ghidra. Not entirely, of course, but at least the offending GEMDOS functions. I finally found the bug, then a workaround, and everyone was happy. I still need to clarify a few things, then I will tell more. Anyway, that's another topic.

During that investigation, I practiced Ghidra a lot. And definitely, it's a really powerful tool. And free. Not completely polished, but definitely very usable once you have understood the basics. My conclusion is that Ghidra is *the* tool to learn for anyone interested in reverse engineering (just like Git for source control).

Ghidra is a bit bloated (not so much, by today's standards). It requires a Java JDK as runtime; then it can be downloaded as a simple zip archive and run from a .bat file. After an austere project management window, the main tool opens. It is a nice workspace-like window, see the screenshot on GitHub.

Basically, you start by creating a new project and specifying the 68000 CPU. Import the ROM binary (I used tos100fr.img). Then it is automatically analyzed, and disassembled to both assembly language and C.
The automatic disassembly is as good as it can be, but there are no miracles. You do get labels and cross-references, but the names are meaningless. Fortunately, just press L to rename a label and it will change everywhere. Same in the C decompilation: you can rename functions, parameters, variables, etc. Code using structs (such as the basepage) is quite unreadable at first. But as soon as you define the actual struct and correctly specify the type of the variables, the decompilation becomes completely usable, and you're able to understand the algorithms.

And this works really well, thanks to Thorsten's hard work on the reconstruction of the TOS 1.4 sources. Even if there are some important differences, most of the code base is similar, when not identical. Of course, the addresses of the internal functions are different. But after renaming some labels, it's just like a puzzle: highly addictive. With more labels, you are able to identify more functions, and so on. So after my initial bugfix, I continued to add labels. And finally, I found most major functions of BIOS/XBIOS/BDOS/VDI. Bad luck with AES/Desktop: because of the infamous Line-F obfuscation (to reduce code size), Ghidra is lost. Maybe something could be done with a proper script. Again, another topic.

Anyway, as I disassembled most of TOS 1.00, and added a lot of labels and structs from Thorsten's sources, I considered it worth sharing this work. It can help a lot to find bugs, and to understand things from the origins. As this 40-year-old TOS has been considered abandonware for a long time, it shouldn't be a problem to put that disassembly and related documentation on GitHub.

I don't plan to work much more on that project. Or maybe, occasionally. But other people may have additional needs, to disassemble other parts, etc. So this needs to be a collaborative project.

This is why I've created a new GitHub Organization to hold that project.
I named it DisasTOS, meaning "Disassembly of TOS, and more". Currently, it only holds this single TOS 1.0 project. But this could work equally well for TOS 4.04, for example. Or even some user programs such as FOLDRXXX.PRG. My idea is to easily give write access to the DisasTOS organization, so many developers can contribute. Between disassembly and documentation, there is much to do.

So if people are interested, here is a new toy. Feel free to contribute. Really, it is worth learning Ghidra. I won't do much more myself on that topic.

Just one more thing:
Unfortunately, Ghidra projects can't be easily shared. They actually can, but that requires a Ghidra Server, and we don't have one. If this project becomes popular, we could consider setting up a Ghidra Server somewhere. It just needs to be able to run Java and a Ghidra instance. But for now, let's start like this. What I have set up is really not convenient, but it's a starting point. Basically, Ghidra projects can be archived into a GAR archive. I've put such an archive in the GitHub project, so any developer must unpack/work/repack/push the archive. As it is a binary file, it doesn't support multiple concurrent modifications. So only a single user can work on the project at the same time. That's a severe restriction. The only solution seems to be using a Ghidra Server. We will see if we finally need that.

Enjoy!

-- Vincent Rivière |
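To illustrate the point about structs: without a type, a decompiler shows basepage accesses as raw offsets (something like `*(long *)(bp + 0x2c)`), while with the struct defined the same access reads as `bp->p_env`. A minimal Python sketch of that offset-to-field mapping, assuming the commonly documented layout of the first twelve longwords of the GEMDOS basepage (the field names follow the usual TOS documentation and are not necessarily the names used in the DisasTOS project; the helper names are illustrative):

```python
import struct
from typing import NamedTuple


class BasepageHead(NamedTuple):
    """First 0x30 bytes of a GEMDOS basepage (assumed standard layout)."""
    p_lowtpa: int    # 0x00 start of the TPA
    p_hitpa: int     # 0x04 first byte after the TPA
    p_tbase: int     # 0x08 start of the TEXT segment
    p_tlen: int      # 0x0C length of TEXT
    p_dbase: int     # 0x10 start of DATA
    p_dlen: int      # 0x14 length of DATA
    p_bbase: int     # 0x18 start of BSS
    p_blen: int      # 0x1C length of BSS
    p_dta: int       # 0x20 current DTA
    p_parent: int    # 0x24 parent's basepage
    p_reserved: int  # 0x28 reserved
    p_env: int       # 0x2C environment string


# Twelve big-endian unsigned 32-bit longwords, matching 68000 byte order.
BASEPAGE_HEAD = struct.Struct(">12L")


def parse_basepage_head(raw: bytes) -> BasepageHead:
    """Decode the fixed-size head of a basepage from raw memory bytes."""
    return BasepageHead._make(BASEPAGE_HEAD.unpack_from(raw, 0))


def field_at(offset: int) -> str:
    """Map a raw offset (as seen in an untyped decompilation) to a field name."""
    if offset % 4 or not 0 <= offset < BASEPAGE_HEAD.size:
        raise ValueError(f"no longword field at offset {offset:#x}")
    return BasepageHead._fields[offset // 4]
```

For example, `field_at(0x2C)` returns `"p_env"`: this is the same translation Ghidra performs for you once the struct is defined and the variable is typed.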
From: Roger B. <an...@xp...> - 2025-01-03 02:42:17
|
Hi Vincent, On 3 Jan 2025 at 3:29, Vincent Rivière wrote: > Hi, and happy new year to all of you. > > Here is my present: > A (partial) disassembly of TOS 1.00 using Ghidra > > Everything is here, including explanations: > https://github.com/disastos/tos100fr > Nice! Happy New Year! Roger |
From: Thorsten O. <ad...@th...> - 2025-01-03 06:14:25
|
On Friday, 3 January 2025 03:29:59 CET Vincent Rivière wrote:
> Here is my present:
> A (partial) disassembly of TOS 1.00 using Ghidra

Nice work! Unfortunately, Ghidra has another quirk: I'm currently not able to open the archive, because it was "created using an unknown version of ghidra" (I'm using a somewhat older version, 9.1.2). I have to check whether I can install a newer version of Ghidra alongside, and try again.

>Bad luck with AES/Desktop,
>because due to the infamous Line-F obfuscation (to reduce code size),

Yes, that is certainly a problem. Note that the Line-F dispatcher is used for 2 purposes:

- replace function calls by a single trap instruction. The file https://github.com/th-otto/tos1x/blob/master/bin/linux/lineftab.txt currently defines the mapping of these for TOS 1.04. If they are different in TOS 1.0, that can just be adjusted. Then you'll have to find a way to tell Ghidra to understand those calls. Should be doable with a script (but maybe not that easy; you have to dig deep into the Ghidra documentation).
- replace the function epilogue (movem; unlk; rts). I would expect that TOS 1.0 uses the same logic here as TOS 1.4. But for Ghidra, this is still a problem, as it probably does not find the end of the function.

Ghidra has some other quirks (you already mention most of them in your readme). One of them is that it often tends to generate "equivalent" C code, but not necessarily what was originally written. E.g. it often changes code like "x > 0" to "x >= 1" or similar, and also reorders the bodies of if/else.

But of course, Ghidra is only an intermediate tool to spot the differences more easily. The ultimate goal would be to do the same as for TOS 1.4, and get sources that can be compiled to binary-identical images. A long way to go, though.

>Or even some user programs such as FOLDRXXX.PRG for example.

That is already available in the other repo: https://github.com/th-otto/tos3x/blob/master/system/foldr100.S |
From: Thorsten O. <ad...@th...> - 2025-01-03 07:18:22
|
Ok, installing a newer version of Ghidra (and also a newer version of the JDK) worked. I "hacked" the launch.sh script (in <ghidra>/support) a bit to get it working. At the top, I added:

export JAVA_HOME=$HOME/ghidra_11/jdk-21.0.5+11
export PATH=$JAVA_HOME/bin:$PATH

where $HOME/ghidra_11 is the directory where I installed it. Maybe you want to do the same when working on Linux, or put those statements in some custom script.

What I didn't know until now: even the TOS header with the date seems to be different for different TOS languages:
for us/uk: 11/20/1985
for de: 02/06/1986
for fr: 04/24/1986 |
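Those dates come from the OS header at the start of each ROM. A minimal sketch for reading them, assuming the commonly documented OSHEADER layout (os_version word at offset 0x02, os_date longword at offset 0x18 stored in BCD as $MMDDYYYY) and a plain, non-byte-swapped image; the function name is illustrative:

```python
import struct
from typing import Tuple


def tos_header_info(rom: bytes) -> Tuple[int, str]:
    """Return (os_version, build date as MM/DD/YYYY) from a TOS ROM image.

    Assumes the image starts with the OSHEADER: os_version is the word at
    offset 0x02 and os_date is the longword at 0x18, stored as BCD $MMDDYYYY.
    """
    if len(rom) < 0x1C:
        raise ValueError("image too short to contain a TOS header")
    (version,) = struct.unpack_from(">H", rom, 0x02)
    (date_bcd,) = struct.unpack_from(">I", rom, 0x18)
    digits = f"{date_bcd:08x}"  # in BCD, each hex nibble is one decimal digit
    if not digits.isdigit():
        raise ValueError(f"os_date {digits} is not valid BCD")
    return version, f"{digits[0:2]}/{digits[2:4]}/{digits[4:8]}"
```

Reading tos100fr.img with `with open("tos100fr.img", "rb") as f: tos_header_info(f.read())` should then report version 0x0100 and, per the list above, 04/24/1986.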
From: Peter S. <ps...@sc...> - 2025-01-03 07:35:20
|
Happy New Year everyone. I had a quick look at Ghidra a few months back and last night I was using Sweet16 midi sequencer to try to get my head around Ghidra. I tried Reko decompiler a few years ago but didn't make any progress. Peter On 3 Jan 2025, 02:29, at 02:29, "Vincent Rivière" <vin...@fr...> wrote: >Hi, and happy new year to all of you. > >Here is my present: >A (partial) disassembly of TOS 1.00 using Ghidra > >Everything is here, including explanations: >https://github.com/disastos/tos100fr |
From: Miro K. <mir...@gm...> - 2025-01-03 09:31:37
|
On Fri, 3 Jan 2025 at 03:30, Vincent Rivière <vin...@fr...> wrote: > Here is my present: > A (partial) disassembly of TOS 1.00 using Ghidra > > Everything is here, including explanations: > https://github.com/disastos/tos100fr Very nice, interesting read. TOS 4.04 most likely doesn't need this treatment as its leaked source code is pretty much complete. Btw not sure if it's just me but if I click on any of the pictures in README.md, I get some weird one-page http error and not a zoomed image. -- http://mikro.atari.org |
From: Jean-François L. <jfl...@po...> - 2025-01-03 14:25:55
|
On Friday 3 January 2025 10:31:19 Central European Standard Time Miro Kropáček wrote: > On Fri, 3 Jan 2025 at 03:30, Vincent Rivière <vin...@fr...> > wrote: > > Here is my present: > > A (partial) disassembly of TOS 1.00 using Ghidra > > Everything is here, including explanations: > > https://github.com/disastos/tos100fr > > Btw not sure if it's just me but if I click on any of the pictures in > README.md, I get some weird one-page http error and not a zoomed image. No such problem here with Firefox. Cheers, JFL -- Jean-François Lemaire |
From: Thorsten O. <ad...@th...> - 2025-01-04 10:55:26
|
> for us/uk: 11/20/1985
> for de: 02/06/1986
> for fr: 04/24/1986

While trying to identify some more functions from AES/desktop, I also noticed that there are some strange differences in the location of functions, for example gem_main:

- in fr: gem_main fd9362
- in de: gem_main fd902a
- in us: gem_main fd9340

In TOS 1.04 and later, such addresses typically only differ by a few bytes, caused by different handling of Alt keys in the BIOS. But in this case, they differ by more than 800 bytes. So, given the different dates of the ROMs, I wonder whether language versions like de/fr were maybe already compiled from slightly newer versions of the code?

Also, it seems that TOS 1.00 was closer to the original DRI sources. E.g. the first thing in gem_main is a function call, which seems to be "ini_dlongs" from the DRI sources. In 1.04, that function was "inlined" into gem_main.

PS: even worse, the addresses of some variables seem to be different:

fr:
gem_main:
[00fd9362] 4e56 fff8          link      a6,#-8
[00fd9366] 48e7 0304          movem.l   d6-d7/a5,-(a7)
[00fd936a] 2a7c 0000 73e0     movea.l   #$000073E0,a5
[00fd9370] f7c8               dc.w      $F7C8 ; ini_dlongs
[00fd9372] f7cc               dc.w      $F7CC ; hcli
[00fd9374] f7d0               dc.w      $F7D0 ; takecpm
[00fd9376] 4279 0000 9f1c     clr.w     $00009F1C
[00fd937c] 4279 0000 6ed2     clr.w     $00006ED2
[00fd9382] 42b9 0000 6e98     clr.l     $00006E98

de:
gem_main:
[00fd902a] 4e56 fff8          link      a6,#-8
[00fd902e] 48e7 0304          movem.l   d6-d7/a5,-(a7)
[00fd9032] 2a7c 0000 73e4     movea.l   #$000073E4,a5
[00fd9038] f7c8               dc.w      $F7C8 ; ini_dlongs
[00fd903a] f7cc               dc.w      $F7CC ; hcli
[00fd903c] f7d0               dc.w      $F7D0 ; takecpm
[00fd903e] 4279 0000 9f20     clr.w     $00009F20
[00fd9044] 4279 0000 6ed6     clr.w     $00006ED6
[00fd904a] 42b9 0000 6e9c     clr.l     $00006E9C

So maybe it would have been better to start with the US version? |
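The differences above can be checked with a few lines of arithmetic. A small sketch using only the addresses from the gem_main list and the two listings (the helper names are illustrative):

```python
from typing import Dict, List, Optional

# Addresses of gem_main in each language ROM, as quoted above.
GEM_MAIN: Dict[str, int] = {"fr": 0xFD9362, "de": 0xFD902A, "us": 0xFD9340}

# Absolute RAM operands of the first instructions of gem_main, in listing order
# (movea.l #...,a5 / clr.w / clr.w / clr.l), from the fr and de listings.
RAM_FR: List[int] = [0x73E0, 0x9F1C, 0x6ED2, 0x6E98]
RAM_DE: List[int] = [0x73E4, 0x9F20, 0x6ED6, 0x6E9C]


def code_shift(table: Dict[str, int], src: str, dst: str) -> int:
    """Signed byte distance of the same function between two ROM builds."""
    return table[dst] - table[src]


def uniform_delta(src: List[int], dst: List[int]) -> Optional[int]:
    """Return the common offset if every pair differs by the same amount, else None."""
    if len(src) != len(dst) or not src:
        raise ValueError("need two non-empty lists of equal length")
    deltas = {d - s for s, d in zip(src, dst)}
    return deltas.pop() if len(deltas) == 1 else None
```

With these values, gem_main sits 824 bytes earlier in de than in fr and 34 bytes earlier in us than in fr, while the four RAM operands all move by the same 4 bytes. At least in this excerpt, code and variables shift by different amounts, which suggests labels cannot simply be copied between language ROMs with a single offset.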
From: Thorsten O. <ad...@th...> - 2025-01-07 18:00:37
|
On Friday, 3 January 2025 07:14:05 CET Thorsten Otto via Freemint-discuss wrote:
> Nice work!

That was a great understatement. I recently took a closer look, and realized that you already managed to enter 1700!! symbols. Even with the 1.04 source available, that is simply awesome. You even managed to find some AES variables although the functions cannot be correctly disassembled by Ghidra (because of the Line-F opcodes). How long did it take to get that far?

Anyway, I've spent some time trying to identify the AES functions. I'm about halfway through (most of the AES is done, but most of the desktop is still missing). I've pushed my current work to a new branch of the sources: https://github.com/th-otto/tos1x/tree/TOS_100

Most of it is just done on a large assembler listing (not compilable, just used as a reference). I've also already changed some of the sources where I saw differences, but without being able to verify them yet. There are also extracted resource files. Handling of the resource file is a bit strange: the format dialog is in a separate resource, and there are 2 additional blocks of data for which I have no idea yet what they are good for (you already noticed that too, given that you already assigned labels to them). The routine that copies them to RAM (ram_rom) also does some strange juggling with the AES global array.

BTW, it seems there have only been official releases for us, de and fr. All other images I found on the net seem to be just patched versions of the German version. |
From: Eero T. <oa...@he...> - 2025-01-07 20:53:14
|
Hi, On 7.1.2025 20.00, Thorsten Otto via Freemint-discuss wrote: > That was a great understatement. I recently took a closer look, and realized > that you managed to enter already 1700!! symbols. Even with the 1.04 source > available, that is simply awesome. You even managed to find some aes variables > although the functions cannot correctly be disassembled by ghidra (because of > the LineF opcodes). How long did it take to get that far? Btw. Hatari Git version will automatically load symbols for TOS image, like it already did for programs started from GEMDOS HD. Only thing needed for that is symbols being in <image>.sym file in same dir as corresponding TOS <image>.img. (This already works fine for EmuTOS releases providing the corresponding *.sym files.) => Could those TOS address & symbol names be provided as Hatari (= nm format) symbol files, along with checksums for matching original TOS images? - Eero |
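A minimal sketch of such an export, assuming the usual `nm` line shape of hexadecimal address, one-letter type and name (the thread only says Hatari reads nm format; which type letters it honours is an assumption here). The SHA-1 digest is one possible choice for the "checksums for matching original TOS images", not a format Hatari is known to require; function names are illustrative:

```python
import hashlib
from typing import Iterable, Tuple

Symbol = Tuple[int, str, str]  # (address, nm type letter, name)


def to_nm(symbols: Iterable[Symbol]) -> str:
    """Render symbols as nm-style lines sorted by address, e.g. '00fd9362 T gem_main'."""
    lines = []
    for addr, kind, name in sorted(symbols):
        if len(kind) != 1 or not name or " " in name:
            raise ValueError(f"bad symbol entry: {addr:#x} {kind!r} {name!r}")
        lines.append(f"{addr:08x} {kind} {name}")
    return "\n".join(lines) + "\n"


def image_digest(rom: bytes) -> str:
    """SHA-1 of the ROM image, recording which exact image a .sym file matches."""
    return hashlib.sha1(rom).hexdigest()
```

Writing the output of `to_nm(...)` to tos100fr.sym next to tos100fr.img would follow the `<image>.sym` convention described above, with the digest kept alongside to detect a mismatched image.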
From: Vincent R. <vin...@fr...> - 2025-01-08 01:34:35
|
On 07/01/2025 at 21:52, Eero Tamminen wrote:
> Btw. Hatari Git version will automatically load symbols for TOS image, like
> it already did for programs started from GEMDOS HD.

Nice.

> Only thing needed for that is symbols being in <image>.sym file in same dir
> as corresponding TOS <image>.img.
>
> (This already works fine for EmuTOS releases providing the corresponding
> *.sym files.)

Good!

> => Could those TOS address & symbol names be provided as Hatari (= nm
> format) symbol files, along with checksums for matching original TOS images?

Yes! This was one of the goals of such a disassembly: to help people when debugging. And getting the symbol names automatically in Hatari is definitely convenient. I think such labels could easily be exported with some Ghidra scripts. And for TOS 1.04+, Thorsten already has the address/label pairs.

So I think your proposal is completely good. Hatari could be distributed with the symbols (as external files) for all known TOS versions, and when it detects a known TOS version, load those symbols automatically. This couldn't be simpler for the users.

-- Vincent Rivière |
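One way the detection step could work, sketched under assumptions: a user-provided `<image>.sym` next to the image wins (the convention Eero describes), otherwise the image's SHA-1 is looked up in a table of bundled symbol files. The digest table, the bundle directory and all function names are hypothetical; this is not how Hatari is actually implemented.

```python
import hashlib
from pathlib import Path
from typing import Mapping, Optional


def sibling_sym(image: Path) -> Path:
    """Convention described above: <image>.sym next to <image>.img."""
    return image.with_suffix(".sym")


def bundled_sym_name(rom: bytes, known: Mapping[str, str]) -> Optional[str]:
    """Look up a shipped symbol file by the SHA-1 of the image (hypothetical scheme)."""
    return known.get(hashlib.sha1(rom).hexdigest())


def choose_symbols(image: Path, known: Mapping[str, str],
                   bundle_dir: Path) -> Optional[Path]:
    """Prefer a user-provided sibling .sym; otherwise fall back to a bundled one."""
    candidate = sibling_sym(image)
    if candidate.exists():
        return candidate
    name = bundled_sym_name(image.read_bytes(), known)
    if name is None:
        return None  # unknown image: no symbols
    bundled = bundle_dir / name
    return bundled if bundled.exists() else None
```

Keying on a digest of the whole image means a patched ROM (such as the patched German images mentioned elsewhere in this thread) is treated as unknown rather than being given symbols that may not match.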
From: Vincent R. <vin...@fr...> - 2025-01-08 01:21:29
|
On 04/01/2025 at 11:55, Thorsten Otto via Freemint-discuss wrote:
> While trying to identify some more functions from aes/desktop, i also
> noticed that there are some strange differences if the location of
> functions, for example gem_main:
>
> - in fr: gem_main fd9362
> - in de: gem_main fd902a
> - in us: gem_main fd9340

Oh.

> In TOS 1.04 and later, such addresses typically only differ by a few bytes,
> caused by different handling of alt-keys in the bios. But in this case, they
> differ by more than 800 bytes. So, given the different dates of the ROMs, i
> wonder whether language versions like de/fr were maybe already compiled from
> slightly newer versions of the code?

Possible.

> Also, it seems that TOS 1.00 was more close to the original DRI sources. Eg.
> the first thing in gem_main is a function call, which seems to be
> "ini_dlong" from the DRI sources. In 1.04, that function was "inlined" into
> gem_main.

Indeed, I tried to understand gem_main(), but it didn't match the 1.04 source. You found the reason.

> PS: even worse, addresses of some variables seem to be different:

Oh. This is interesting to know, in case someone wants to do something with those private variables. Starting with debugging.

> So maybe it would have been better to start with the US version?

Certainly. But as I started with the French version and had already done some amount of work, I continued.

Note that Ghidra supports applying the disassembly markup (labels, etc.) to newer or older versions of the target software. As I understand it, it should be able to automatically find code similarities. However, I haven't tested that feature. Without a relocation table, I'm not sure it could do that with different TOS ROMs. That feature is called "version tracking", and it can be enabled from the Ghidra Project Manager window (NOT the CodeBrowser).

-- Vincent Rivière |
From: Vincent R. <vin...@fr...> - 2025-01-09 22:14:47
|
On 08/01/2025 at 02:21, Vincent Rivière wrote:
> Note that Ghidra supports applying the disassembly markup (labels, etc...)
> to newer or older version of the target software. As I understood, it should
> be able automatically find code similarities. However, I haven't tested that
> feature. Without relocation table, I'm not sure if it could do that with
> different TOS ROMs. That feature is called "version tracking", and it can be
> enabled from the Ghidra Project Manager window (NOT the CodeBrowser).

BTW, I've quickly tested that "Version Tracking" feature. It works! That's crazy. Concretely, it was able to automatically apply more than 500 TOS 1.00 labels to the TOS 1.02 ROM, by finding code similarities!

- Some code hasn't been automatically disassembled. This could explain why no more labels have been found (or not).
- Labels have been found in both ROM and RAM. Most sysvars haven't been found; I guess they could easily be copied manually.
- No AES/Desktop labels have been found. Certainly due to the crippled Line-F code.
- Equates haven't been copied.

Quick reminder HOWTO:
- Don't create a new project. Just import tos102fr.img (or another image) into the current project.
- Double-click on tos102fr.img to run the CodeBrowser, and set up the memory map there (rom, ram, etc.). Then close the CodeBrowser.
- In the Ghidra Project Manager, select the Version Tracking tool ("steps" icon).
- In the toolbar, click again on the steps icon "Version Tracking Wizard".
- Select a session name, source program and destination program.
- Continue running the wizard. It will open 2 CodeBrowser windows (source and destination).
- Go back to the Version Tracking window, and in the toolbar, click the magic wand icon "Run several correlators".
- See the result in the top window "Version Tracking Matches". Click on each line to see a diff tool on the matched sections.
- In the toolbar, on the right side, click on the green check mark to apply the match.
This isn't a precise procedure, but it should be enough to start. For sure, it would be worth reading the documentation of this tool.

Note: I don't plan to do any work on TOS 1.02, or push anything. This was just a quick test. Quite successful for a first try.

-- Vincent Rivière |
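Ghidra's correlators are far more elaborate, but the core idea of transferring a label when a function's bytes occur exactly once in the other ROM can be sketched as below. This is a simplified illustration, not Ghidra's algorithm. It assumes function boundaries are known in the source ROM and that both images are plain big-endian dumps; the function names are hypothetical. It also shows a limitation: a function that embeds an absolute RAM address which moved between builds (like the 4-byte shift visible in the fr/de gem_main listings) will not match byte for byte.

```python
from typing import Dict, List, Tuple


def find_all_even(haystack: bytes, needle: bytes) -> List[int]:
    """All even offsets of needle in haystack (68000 code is word-aligned)."""
    hits, pos = [], haystack.find(needle)
    while pos != -1:
        if pos % 2 == 0:
            hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits


def transfer_labels(
    src_rom: bytes, src_base: int, src_funcs: Dict[str, Tuple[int, int]],
    dst_rom: bytes, dst_base: int, min_len: int = 16,
) -> Dict[str, int]:
    """Map label -> destination address for functions whose bytes match uniquely.

    src_funcs maps a label to (address, length) in the source ROM. Short
    functions are skipped because their bytes are likely to match by accident.
    """
    matches = {}
    for name, (addr, length) in src_funcs.items():
        if length < min_len:
            continue
        start = addr - src_base
        body = src_rom[start:start + length]
        hits = find_all_even(dst_rom, body)
        if len(hits) == 1:  # ambiguous or missing matches are left for manual work
            matches[name] = dst_base + hits[0]
    return matches
```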
From: Vincent R. <vin...@fr...> - 2025-01-08 02:50:01
|
On 07/01/2025 at 19:00, Thorsten Otto via Freemint-discuss wrote:
> That was a great understatement. I recently took a closer look, and realized
> that you managed to enter already 1700!! symbols. Even with the 1.04 source
> available, that is simply awesome. You even managed to find some aes
> variables although the functions cannot correctly be disassembled by ghidra
> (because of the LineF opcodes). How long did it take to get that far?

Thanks 😄

Actually, I spent a week hunting that damn GEMDOS bug. So I learned Ghidra, and I added the relevant labels and function signatures. Then, as that archeology was highly addictive, I continued to add the labels for most TOS entry points. I started with the BIOS/XBIOS/BDOS function tables, as they could easily be identified. Same for VDI and Line-A, but unfortunately not Line-F. Then when finding a function, I looked at the function calls inside, and by looking at the TOS 1.04 source I was able to find those function names. And so on. Same for the global variables.

For AES/Desktop, it was tricky because of the ugly Line-F calls. I only found some functions. I started with the easy ones such as cli/sti, then the dos_* wrappers.

A Ghidra trick that helped me later: when pressing 'D', Ghidra stops at the first 0xF opcode. But by looking 2 bytes after, we can see if there is another Line-F call afterwards. If not, we can press 'D' again and disassemble a few more lines. I did that for a few functions like sh_main, gemstart, etc.

But the best method to find functions is to decode the Line-F opcode. For example, take the opcode 0xf124. It is an even value, so it's a function call. Keep only the last 3 digits; that gives 124. Then in Ghidra, type 'G' (go to), then enter the expression "lineftab+124". This reveals the address of the called routine. Unfortunately, not its name. See more at the end of this message (*).

In each case, I entered the names manually with 'L'. Nothing automatic. I did that kind of stuff for 2 more weeks.
Then, as it wasn't possible to easily go further, I stopped.

For the AES variables, I don't remember well. But I guess I found some list of variables in your TOS 1.04 sources. A key tool was Ctrl+Shift+F to find the references to a variable. Seeing how it was used and initialized gave a clue. I especially did that for the VDI.

> Anyway, i've spend some time trying to identify the aes functions. About
> halfway through, (most of aes is done, but most of desktop is still
> missing). I've pushed my current work to a new branch of the sources:
> https://github.com/th-otto/tos1x/tree/TOS_100

Fine. If, for your work, you add some stuff to the Ghidra disassembly, please consider contributing your findings. I can give you write access to the DisasTOS repositories, no problem with that. The only issue is the inconvenient way to share Ghidra projects. But I don't plan to work actively on it in the near future. So if you want to have your hands free to add more labels to the disassembly, then go ahead.

On the other hand, if you have your new labels in a flat text file, we should find a way to import them into Ghidra. Certainly easy using a script.

> I've also changed already some of the sources where i saw differences, but
> without being able to verify them yet. There are also extracted resource
> files. Handling of the resource file is a bit strange: the format dialog is
> in a seperate resource, and there are 2 additional blocks of data for which
> i have idea yet what they are good for (you already noticed that too, given
> that you already assigned lables to them). The routine that copies them to
> ram (ram_rom) also does some strange juggling with the aes global array.

Indeed, it is different from TOS 1.04. I saw the separate resource files, DESKTOP.INF, and other data that I didn't understand.
Some other information:

I didn't know, but someone else already wrote some interesting Ghidra scripts for Atari binaries 😃 I haven't tried them, but they are certainly worth a look: https://github.com/czietz/ghidraScripts_for_Atari/

(*) This week I also looked at Ghidra scripts, and after some initial effort to understand the object model, it seems to be rather easy. At least for basic stuff. The key point is that I found that it was possible:
1) To replace a Line-F instruction with "dw" (a.k.a. dc.w) by pressing 'T' and assigning the type "word".
2) To add a reference on that pseudo-opcode to the actual Line-F function by pressing 'R' and entering the target address.

There are 3 benefits:
- Double-click on the opcode reference to jump to the function.
- On a Line-F function, use Ctrl-Shift-F to find references.
- Ability to rename the function by simply using 'L' on any reference.

Yes, I'm really speaking about those obfuscated Line-F calls! This doesn't fix the C decompilation. But at least, it eases browsing from the listing window, finding cross-references, and renaming functions.

Then I went further and wrote a script for that:
https://github.com/disastos/tos100fr/blob/main/ghidra_scripts/AddLineFReference.java

I will write more complete documentation later. But in a few words:
- Get the script with "git pull -r" as usual.
- In Ghidra, click on Window > Bundle Manager.
- Click on the + icon on the right of the toolbar, and add the ghidra_scripts directory.
- Click on Window > Script Manager.
- On the left, go to the Atari section.
- On the AddLineFReference.java line, on the left, check the "In Tool" checkbox to enable the script. I assigned it to the '$' key by default.

Then to use the script:
- Go to the Ghidra listing (disassembly) window.
- Type 'G' then gem_main, for example, to go to that function.
- A few lines down, put the cursor on the "?? F7h" line.
- And simply press '$'. The first time, the script is compiled. After that, it's immediate.
Result: you will see the 2 lines transforming into "dw FUN_00fd92fc", for example. You can double-click on that FUN_00fd92fc to go to its definition, press Alt+Left to go back, and even rename it with 'L' if you managed to determine its real name.

This way, it's quite easy to disassemble AES/Desktop:
- 'D' for normal disassembly
- '$' to disassemble a Line-F call
- If needed, 'C' to revert the disassembly to the undefined state.

NB: I pushed the script, but as I didn't do significant work on the disassembly itself, I'm not going to push a new version of the GAR file.

-- Vincent Rivière |
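The manual decoding rule described in this message (even $Fxxx opcode means a call; the low three hex digits are a byte offset into lineftab; the longword stored there is the routine address) can be sketched in plain Python, for example to list candidate Line-F calls in an address range before visiting them with '$' in Ghidra. This does not use the Ghidra API. The ROM base (0xFC0000 for TOS 1.00), the lineftab address and the function names are parameters or assumptions of the sketch, and it reads a plain big-endian image.

```python
import struct
from typing import Iterator, Optional, Tuple


def decode_linef(word: int) -> Optional[Tuple[str, int]]:
    """Classify a 16-bit opcode word using the rule described above.

    None: not a Line-F opcode ($Fxxx). ("call", offset) for even values, where
    offset is the low 12 bits, a byte offset into lineftab. ("other", word) for
    odd values, which are not plain calls (presumably the shared epilogue form).
    """
    if (word & 0xF000) != 0xF000:
        return None
    if word & 1:
        return ("other", word)
    return ("call", word & 0x0FFF)


def linef_target(rom: bytes, rom_base: int, lineftab: int, word: int) -> Optional[int]:
    """Resolve a Line-F call word by reading the longword at lineftab + offset."""
    decoded = decode_linef(word)
    if decoded is None or decoded[0] != "call":
        return None
    pos = lineftab + decoded[1] - rom_base
    if pos < 0 or pos + 4 > len(rom):
        return None  # table entry would lie outside the image: not a real call
    (target,) = struct.unpack_from(">I", rom, pos)
    return target


def scan_linef(rom: bytes, rom_base: int, start: int, end: int,
               lineftab: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (address, opcode, target) for each word in [start, end) that decodes as a call.

    Naive: it does not know instruction boundaries, so operand words such as
    the $FFF8 of 'link a6,#-8' are also candidates and must be checked by hand.
    """
    for addr in range(start, end, 2):
        (word,) = struct.unpack_from(">H", rom, addr - rom_base)
        target = linef_target(rom, rom_base, lineftab, word)
        if target is not None:
            yield addr, word, target
```

Under this rule, the $F7C8, $F7CC and $F7D0 words at the start of gem_main decode to offsets 0x7C8, 0x7CC and 0x7D0, i.e. consecutive 4-byte entries of lineftab.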
From: Thorsten O. <ad...@th...> - 2025-01-08 06:38:31
Attachments:
tos100frsyms.tar.bz2
|
On Wednesday, 8 January 2025 03:49:49 CET Vincent Rivière wrote:
> I started with the BIOS/XBIOS/BDOS function
> tables, as they could easily be identified.

Yes, I did something similar for the AES. There is no direct function pointer table, but once the crysbind function was identified, you can look at its disassembly and compare it to the source. Then you know the Line-F opcodes of all functions that are called there, and by looking at the lineftab, you will find their addresses. Then you can go to their definitions, and locate the other functions from the same source file (the AES is still similar enough).

>If, for your work, you add some stuff to the Ghidra disassembly, please
>consider contributing your finds.

Yes, of course. But I have to do some checks first. I am using a variant of the 68000 processor definition that is not part of the Ghidra source; I don't know how well that works if someone else imports the GAR file.

>On the other hand, if you have your new labels in a flat text file, we
>should find a way to import them into Ghidra. Certainly easy using a script.

You can open the symbol table window, select all labels, then right-click and export them to a CSV file. But there seems to be no (automatic) way to import them again (maybe using the ImportSymbolsScript). My current version is attached if you want to have a look.

>I didn't know, but someone else already wrote some interesting Ghidra
>scripts for Atari binaries

Yes, I know them. But they are for importing TOS binaries; that does not help for the ROMs.

>To replace a Line-F instruction with "dw" (a.k.a dc.w) by pressing 'T'
>and assigning the type "word".

Yes, but there is currently one major problem: if you do that, the decompiler will no longer stop at that instruction. But it does not find the function return statement either, so it will continue to the end of the ROM (or until it runs into some other data that cannot be disassembled). That takes ages. I've also tried to match the Line-F calls, but without much success yet.
See https://github.com/NationalSecurityAgency/ghidra/pull/ 7126#issuecomment-2572492219 Maybe you want to try that if your are more familiar with ghidra. |
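For readers new to the Line-F trick, here is a minimal Python sketch of the lookup described above: take a Line-F opcode, extract the table offset, and read the target address from lineftab. It assumes (this is not confirmed in the thread) that the low 12 bits of the opcode are a byte offset into lineftab, a table of big-endian 32-bit pointers, and that the ROM is mapped at 0xFC0000. `linef_offset`, `resolve_linef` and `lineftab_addr` are hypothetical names, and the real TOS 1.00 handler may encode extra information in the opcode.

```python
import struct

def linef_offset(opcode: int) -> int:
    """Table offset carried by a Line-F opcode (assumption: the low 12 bits)."""
    if opcode & 0xF000 != 0xF000:
        raise ValueError(f"not a Line-F opcode: {opcode:#06x}")
    return opcode & 0x0FFF

def resolve_linef(rom: bytes, rom_base: int, lineftab_addr: int, opcode: int) -> int:
    """Return the address stored in lineftab for this Line-F opcode.

    rom:           raw ROM image (e.g. the bytes of tos100fr.img)
    rom_base:      address at which the ROM is mapped (0xFC0000 for TOS 1.00)
    lineftab_addr: address of lineftab inside the ROM (hypothetical, located by hand)
    """
    pos = lineftab_addr - rom_base + linef_offset(opcode)
    # The 68000 is big-endian: read an unsigned 32-bit big-endian pointer.
    (target,) = struct.unpack_from(">I", rom, pos)
    return target

# Synthetic example: lineftab at 0xFC0100, one pointer stored at offset 0x708.
rom = bytearray(0x1000)
struct.pack_into(">I", rom, 0x100 + 0x708, 0xFE1234)
print(hex(resolve_linef(bytes(rom), 0xFC0000, 0xFC0100, 0xF708)))  # 0xfe1234
```

Keeping the offset extraction separate from the table read makes it easy to adjust if the real encoding turns out to use some opcode bits for something else.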
From: Vincent R. <vin...@fr...> - 2025-01-08 08:49:30
|
On 08/01/2025 at 07:38, Thorsten Otto via Freemint-discuss wrote:
> Yes, i did similar for AES. There is no direct function ptr table, but once
> the crysbind function was identified, you can look at its disassembly, and
> compare that to the source.

Ah, indeed. That makes sense. But crysbind() was one of the last functions I found before stopping. I even found it after I bundled the current GAR file, so its label is not present in the disassembly. I only put it in internals.md. So currently, I need to press 'G' and enter fe54d6 to find it.

Of course, crysbind() is riddled with Line-F calls, so it's a pain to disassemble. But by using 'D' (standard disassembly) and '$' (AddLineFReference script) it becomes rather easy. Then functions can be renamed directly with 'L' from the crysbind() disassembly, and you're done. Unfortunately this can't be done from the decompiled source, due to the Line-F issue.

> Yes, of course. But i have to do some checks first. I am using a variant of
> the 68000 processor definition that is not part of the ghidra source, don't
> know how well that works if someone else imports the GAR file.

Ah. Yes, this has to be checked. But it seems that processor definitions are only used in read-only mode, to display a nice disassembly and of course to produce the dynamic decompilation. So I'm quite optimistic that custom processor definitions shouldn't cause trouble. But this has to be tried.

> You can open the symbol table window, select all labels, then right-click
> and export them to a CSV file. But there seems to be no (automatic) way to
> import them again (maybe using the ImportSymbolsScript). My current version
> is attached if you want to have a look.

Ah, it's nice. I will have a look.

> Yes, i know them. But they are for importing TOS binaries, that does not
> help for the ROMs.

There is also this script:
https://github.com/czietz/ghidraScripts_for_Atari/blob/master/ImportAtariTOSROM.py
It does the initial memory setup, then it *does* load symbols from an external file. I haven't tried it, though.

>> To replace a Line-F instruction with "dw" (a.k.a dc.w) by pressing 'T'
>> and assigning the type "word".
>
> Yes but there is currently one major problem: if you do that, the decompiler
> will no longer stop at that instruction.

No, this doesn't happen for me. In the decompilation window (the C one), I get:

/* WARNING: Bad instruction - Truncating control flow here */
halt_baddata();

It seems that the disassembly and decompilation windows are completely separate things. My '$' macro only acts on the disassembly window; it has no effect on the decompilation (alas). You may get different results due to your custom 68000 definition.

Note that most "F" opcodes are invalid, so this stops the decompilation. But if you managed to convince Ghidra that they are instructions (not data), then sure, the result would be different. Another case would be Line-F opcodes which are valid FPU instructions. I haven't checked that case.

> But it does not find the function return statement either, so it will
> continue to the end of the ROM (or if it runs into some other data that
> cannot be disassembled). That takes ages.

Again, I don't encounter this behaviour on standard Ghidra, because "halt_baddata()" happens as soon as a Line-F opcode is found.

> I've also tried to match the linefcalls, but without much success yet. See
> https://github.com/NationalSecurityAgency/ghidra/pull/7126#issuecomment-2572492219
> Maybe you want to try that if your are more familiar with ghidra.

Fine. I will have a look.

Note that my '$' script is just a workaround to ease browsing of the disassembly. Of course, the right solution would be what you tried in that pull request: properly implement Line-F opcodes as instructions. Theoretically, they should just act like "jsr" and "rts". But convincing Ghidra to do that is another story.

--
Vincent Rivière
|
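Since the symbol table can be exported to CSV but not re-imported directly, one workaround is to convert the CSV into the plain-text input of Ghidra's bundled ImportSymbolsScript.py. This Python sketch assumes the export has "Name", "Location" and "Type" columns, and that ImportSymbolsScript.py reads one `name address [f|l]` entry per line with a hexadecimal address ('f' for function, 'l' for label). Both formats are recalled from memory, so check the column headers of your export and the header comment of the script before relying on it. `csv_to_import_lines` is a hypothetical helper.

```python
import csv
import io

def csv_to_import_lines(csv_text: str) -> list[str]:
    """Convert a Ghidra symbol-table CSV export into ImportSymbolsScript lines.

    Assumed columns: "Name", "Location" (hex address) and "Type".
    Functions are tagged 'f', everything else 'l' (label).
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["Name"].strip()
        # Drop an address-space prefix such as "ram:" if the export includes one.
        location = row["Location"].strip().split(":")[-1]
        addr = int(location, 16)
        kind = "f" if row.get("Type", "").strip() == "Function" else "l"
        lines.append(f"{name} {addr:08x} {kind}")
    return lines

sample = (
    '"Name","Location","Type"\n'
    '"crysbind","00fe54d6","Function"\n'
    '"some_label","ram:00001234","Label"\n'  # hypothetical entry
)
print("\n".join(csv_to_import_lines(sample)))
```

Writing the result to a text file and pointing ImportSymbolsScript.py at it should then recreate the labels, provided the assumed formats match.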
From: Thorsten O. <ad...@th...> - 2025-01-08 09:08:05
|
On Wednesday, 8 January 2025 09:49:17 CET Vincent Rivière wrote:
> Note that most "F" opcodes are invalid,

Yes, luckily. The more problematic ones are those which happen to be valid FPU instructions, because in that case the disassembler may consume one or more extra words to decode the <ea>, instead of treating them as a new instruction. But that happens only in a few cases. It can be avoided by using a custom variant which has all the FPU instructions disabled.

> Theoretically, they
> should just act like "jsr" and "rts". But convincing Ghidra to do that is
> another story.

Yes, that was the idea. Translating the offsets to function names could be done later by a script, but for that we have to know them first ;)
|
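To see which Line-F words are at risk, one can use the 68881/68882 coprocessor encoding: bits 11-9 of an F-line word hold the coprocessor ID, and the FPU conventionally uses ID 1, i.e. the words $F200-$F3FF. This Python sketch flags those words. It assumes that Ghidra's 68000 definition decodes coprocessor ID 1 as FPU instructions (consistent with the behaviour described above, but not verified here), and a flagged word is only a candidate, since not every bit pattern in that range is a legal FPU instruction. The function names are hypothetical.

```python
FPU_CPID = 1  # conventional coprocessor ID of the 68881/68882

def coprocessor_id(opcode: int) -> int:
    """Coprocessor ID field (bits 11-9) of an F-line word."""
    return (opcode >> 9) & 0x7

def may_decode_as_fpu(opcode: int) -> bool:
    """True if this Line-F word lies in the FPU's coprocessor-ID range ($F200-$F3FF)."""
    return (opcode & 0xF000) == 0xF000 and coprocessor_id(opcode) == FPU_CPID

def risky_linef_words(words):
    """Distinct Line-F words that an FPU-aware disassembler may misread, sorted."""
    return sorted({w for w in words if may_decode_as_fpu(w)})

print([hex(w) for w in risky_linef_words([0xF708, 0xF234, 0xF3FC, 0x4E75])])
# ['0xf234', '0xf3fc']
```

If the low 12 bits of a Line-F word are the table offset (an assumption), these are exactly the calls whose offsets lie in 0x200-0x3FF.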
From: Vincent R. <vin...@fr...> - 2025-01-08 10:00:00
|
On 08/01/2025 at 10:07, Thorsten Otto via Freemint-discuss wrote:
>> Theoretically, they
>> should just act like "jsr" and "rts". But convincing Ghidra to do that is
>> another story.
>
> Yes, that was the idea. Translating the offsets to function names could be
> done later by a script, but for that we have do know them first ;)

The key point is to add a reference from the opcode to the target address. That's what I did with the '$' script. Then as soon as a label is added or renamed at the target address, it is displayed next to all the referring opcodes.

I guess that proper decoding of the opcode as an instruction should do the same:
- First, automatically add a reference to the target address, using the same method as the '$' script.
- Then properly implement the P-code operation (in the InstructionPrototype instance?) to materialize the jsr/jmp behaviour for decompilation.

I'm not familiar with that stuff, though.

--
Vincent Rivière
|
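Why a reference beats baking the name into the instruction text can be shown without Ghidra: store, for each Line-F call site, the target address, and look up the name only when displaying, so a single rename at the target shows up at every call site. In this Python sketch the call sites and targets are supplied by hand (in Ghidra they would come from the listing and lineftab); `XrefMap` and its methods are hypothetical names, not Ghidra API, and the addresses are made up. The `FUN_xxxxxxxx` fallback mimics Ghidra's default function naming.

```python
from collections import defaultdict

class XrefMap:
    """Call-site -> target references, with names resolved only at display time."""

    def __init__(self):
        self.target_of = {}               # call-site address -> target address
        self.callers = defaultdict(list)  # target address -> call-site addresses
        self.labels = {}                  # address -> label name

    def add_reference(self, site: int, target: int) -> None:
        self.target_of[site] = target
        self.callers[target].append(site)

    def rename(self, addr: int, name: str) -> None:
        # Only the target's label changes; every call site picks it up on display.
        self.labels[addr] = name

    def describe(self, site: int) -> str:
        target = self.target_of[site]
        name = self.labels.get(target, f"FUN_{target:08x}")
        return f"{site:08x}: linefcall -> {name}"

xrefs = XrefMap()
xrefs.add_reference(0xFE5500, 0xFE1234)  # hypothetical addresses
xrefs.add_reference(0xFE5600, 0xFE1234)
xrefs.rename(0xFE1234, "get_par")
for site in xrefs.callers[0xFE1234]:
    print(xrefs.describe(site))
# 00fe5500: linefcall -> get_par
# 00fe5600: linefcall -> get_par
```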
From: Thorsten O. <ad...@th...> - 2025-01-09 12:59:39
|
I've now pushed the Ghidra archive to my TOS_100 branch, in the ghidra directory. Please let me know if you can import it.

AES functions should now be complete, as well as most BSS symbols from the AES. There are also text files for the symbols and the structure definitions, just in case.

What I recently noticed is that the jump table of the switch statement in osif() is somewhere in the middle of the ROM. That worries me a bit; normally such tables go into the data segment and are linked in at the end. I currently have no idea how to achieve that when compiling/linking from source.
|
From: Vincent R. <vin...@fr...> - 2025-01-09 21:38:22
|
On 09/01/2025 at 13:59, Thorsten Otto via Freemint-discuss wrote:
>
> I've pushed the ghidra archive now to my TOS_100 branch, in the ghidra
> directory. Please let me know if you can import it.
>

Unfortunately, it doesn't work with standard Ghidra. You can easily test it yourself, with a standard Ghidra in another directory:

This is a fatal error; the project doesn't open.

> AES functions should now be complete, as well as most BSS symbols from AES.
>
> There are also text files for the symbols, and the structure definitions,
> just in case.
>

Fine!

Vincent
|
From: Thorsten O. <ad...@th...> - 2025-01-10 05:38:52
|
On Thursday, 9 January 2025 22:38:12 CET Vincent Rivière wrote:
> Unfortunately, it doesn't work with standard Ghidra. You can easily test it
> yourself, with a standard Ghidra in another directory:

I feared that. Could you try with the patch from https://github.com/disastos/tos100fr/issues/2? I think that is needed anyway; otherwise the decompiler may do strange things with function parameters.
|
From: Vincent R. <vin...@fr...> - 2025-01-09 23:12:28
|
On 07/01/2025 at 21:52, Eero Tamminen wrote:
> => Could those TOS address & symbol names be provided as Hatari (= nm
> format) symbol files, along with checksums for matching original TOS images?

I've started writing such an export script. That's trivial. It will have to be tweaked to produce the expected output format, but basically it's there.
https://github.com/disastos/tos100fr/blob/main/ghidra_scripts/ExportLabels.java

--
Vincent Rivière
|
From: Thorsten O. <ad...@th...> - 2025-01-10 18:19:07
|
On Friday, 10 January 2025 00:12:14 CET Vincent Rivière wrote:
> It will have to
> be tweaked to produce the expected output format, but basically it's there.

I think that should already almost work, except that you need to add the symbol type. For the ROMs, that is easy: you can just treat everything >= 0xfc0000 as text ('T'), and everything else as bss ('B'). Maybe the system variables < 0x800 should also be omitted (IIRC Hatari has them built in).
|
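The typing rule above translates almost directly into code. This Python sketch writes nm-style lines (`address type name`) from (name, address) pairs: 'T' for addresses >= 0xFC0000, 'B' otherwise, and system variables below 0x800 are skipped. The exact layout (8 hex digits, single spaces, sorted by address) is an assumption based on common `nm` output, so check it against Hatari's symbol-loading documentation. `to_nm_lines` is a hypothetical helper and the sample symbols are only illustrative.

```python
ROM_START = 0xFC0000  # TOS 1.00 ROM base: everything from here on is text
SYSVAR_END = 0x800    # system variables below this are skipped (Hatari knows them)

def symbol_type(addr: int) -> str:
    """'T' (text) for ROM addresses, 'B' (bss) for everything else."""
    return "T" if addr >= ROM_START else "B"

def to_nm_lines(symbols):
    """symbols: iterable of (name, address). Returns nm-format lines sorted by address."""
    lines = []
    for name, addr in sorted(symbols, key=lambda s: s[1]):
        if addr < SYSVAR_END:
            continue
        lines.append(f"{addr:08x} {symbol_type(addr)} {name}")
    return lines

example = [("crysbind", 0xFE54D6), ("_hz_200", 0x4BA), ("some_bss_var", 0x2000)]
print("\n".join(to_nm_lines(example)))
# 00002000 B some_bss_var
# 00fe54d6 T crysbind
```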
From: Thorsten O. <ad...@th...> - 2025-01-11 08:22:07
Attachments:
Screenshot_20250111_091758.png
|
I found an interesting bug in the meantime. In TOS 1.00, the THEDSK structure (which holds most variables of the desktop) is larger than 32k. The g_screen array (which holds the icons placed on the desktop background) starts at offset 31900 and has 133 items. That makes routines like obj_init (at 0xfdff56) and obj_ialloc (at 0xfe0134) produce code that writes into other parts of the structure. Ghidra is also confused by this (see the attached screenshot).
|
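A quick calculation shows why crossing 32k matters. On the 68000, an address-register-relative access such as `d16(An)` uses a signed 16-bit displacement, so only offsets up to 32767 can be reached directly. This Python sketch assumes each g_screen item is a GEM OBJECT of 24 bytes and that the compiler encodes the element offset as a 16-bit displacement; neither is confirmed above. `first_unreachable_index` and `as_signed16` are hypothetical helpers.

```python
OBJECT_SIZE = 24         # assumed sizeof(OBJECT): 6 WORDs + 1 LONG + 4 WORDs
G_SCREEN_OFFSET = 31900  # offset of g_screen inside THEDSK (from the message)
NUM_ITEMS = 133          # number of g_screen items (from the message)
MAX_DISP = 0x7FFF        # largest positive signed 16-bit displacement

def as_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed displacement."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def first_unreachable_index() -> int:
    """First g_screen index whose start offset no longer fits in d16(An)."""
    for i in range(NUM_ITEMS):
        if G_SCREEN_OFFSET + i * OBJECT_SIZE > MAX_DISP:
            return i
    return NUM_ITEMS

end = G_SCREEN_OFFSET + NUM_ITEMS * OBJECT_SIZE
i = first_unreachable_index()
off = G_SCREEN_OFFSET + i * OBJECT_SIZE
print(end)                       # 35092: the array ends well past 32767
print(i, off, as_signed16(off))  # 37 32788 -32748
```

Under these assumptions, element 37 onward can only be addressed with a displacement that wraps to a negative value, and even element 36 (starting at 32764) has fields past the limit.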
From: Thorsten O. <ad...@th...> - 2025-01-12 20:42:35
Attachments:
Screenshot_20250112_212919.png
|
I had partial success with my patched processor definition. The decompiler now at least no longer barfs on the Line-F call. A simple function now looks like this (see the attached screenshot):

linefcall(0x708) is the call to get_par(). The strange assignment to obj in line 9 is because Ghidra does not yet know that linefcall() is a function that returns a value in d0.

I've also pushed an updated archive. Functions from the desktop are almost complete, except for a few which seem to be related to an older implementation of the text file viewer, and the format dialog.
|
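Until the decompiler resolves these calls itself, the offsets can be translated into names by post-processing the decompiled text, as suggested earlier in the thread. This Python sketch rewrites `linefcall(0x...)` using an offset-to-name table. Only the 0x708 -> get_par entry comes from the message above; other entries would have to be filled in from lineftab. The regex assumes the call is printed exactly as `linefcall(0x708)` with no arguments, and `rename_linefcalls` is a hypothetical helper.

```python
import re

# Offset -> function name. Only 0x708 -> get_par is known from the thread.
LINEF_NAMES = {0x708: "get_par"}

_LINEFCALL = re.compile(r"\blinefcall\((0x[0-9a-fA-F]+)\)")

def rename_linefcalls(c_text: str, names=LINEF_NAMES) -> str:
    """Replace linefcall(0xNNN) with name() wherever the offset is known."""
    def repl(m):
        offset = int(m.group(1), 16)
        name = names.get(offset)
        return f"{name}()" if name else m.group(0)  # leave unknown offsets untouched
    return _LINEFCALL.sub(repl, c_text)

print(rename_linefcalls("obj = linefcall(0x708);"))  # obj = get_par();
```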