Debugging

xenic
2013-08-14
2013-08-23
  • xenic
    2013-08-14

    You mentioned 2 hardcore bugs in a forum at Amigans.net (1 68k & 1 PPC). I haven't been paying attention for a while so tell me what those 2 bugs are and I'll see if I can find the problem.

     
    Last edit: xenic 2013-08-14
  • kas1e
    2013-08-15

    @xenic

    The 68k one is the bug where we stopped two months ago: "fancy characters in the window titles of the 68k version of configopus.module".

    The PPC one: some 68k modules crash when used with the native binary/library. That one seems to be unsolvable, so we need to make everything native. I got pretty good explanations from Thore and SBA. In brief: ftp.module and filetype.module (the ones that crash in 68k form) both use "direct callbacks", which can't work in a 68k->PPC combination, and the only solution is to make them native. The other modules don't use direct callbacks, so they just work. Here is the relevant part from Thore:

    Ftp plugin gets a callback function passed in the variable "func_callback" of the function L_Module_Entry(). If a PPC native DOpus calls L_Module_Entry() of a 68k ftp.module then the passed in func_callback will point to PPC code, but the 68k ftp.module will treat it as 68k code which of course must crash. This also explains why everything works with both 68k DOpus and 68k ftp.module.
    
    The same applies to filetype.module.
    
    Other plugins are not affected by this cross call stuff, because they get something different passed in the same 68k register or don't use this parameter at all.
    
    Please note that cross calls work in one direction only. Only PPC code can call 68k code directly using the EmulateTags() function, but the opposite way cannot work. Library functions being called through the jump table are a different story. Here we are talking about jumping to callback functions directly.
    
    To put it short: don't mix different binaries if direct calls of callback functions are involved. It cannot work.
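A minimal sketch of the rule Thore describes, written as portable C rather than real AmigaOS code. Everything here (`Isa`, `TaggedCallback`, `call_direct`) is a hypothetical model: real function pointers carry no tag saying which CPU their code was built for, which is exactly why the 68k module cannot tell that `func_callback` points to PPC code.

```c
#include <stddef.h>

/* Hypothetical model: the instruction set a piece of code was built for. */
typedef enum { ISA_68K, ISA_PPC } Isa;

/* A callback pointer bundled with the ISA of the code it points to.
   Real AmigaOS function pointers have no such tag. */
typedef struct {
    Isa isa;
    int (*fn)(int arg);
} TaggedCallback;

typedef enum { CALL_OK, CALL_INVALID, CALL_NEEDS_GLUE } CallStatus;

/* A direct jump is only safe when caller and callee share an ISA.
   Anything else has to go through glue code instead of a plain call. */
CallStatus call_direct(Isa caller, const TaggedCallback *cb, int arg, int *result)
{
    if (cb == NULL || cb->fn == NULL || result == NULL)
        return CALL_INVALID;
    if (cb->isa != caller)
        return CALL_NEEDS_GLUE;
    *result = cb->fn(arg);
    return CALL_OK;
}

/* Example callback used to exercise the model. */
int double_it(int x)
{
    return x * 2;
}
```

In this model a PPC caller with a PPC callback gets `CALL_OK`, while a 68k caller handed the same callback gets `CALL_NEEDS_GLUE`, which corresponds to the crashing ftp.module case.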
    

    I also got a mail from SBA, and he says almost the same:

    Calling ppc hooks from 68k even bypassing the utility.library is supported on OS4 but then it must follow the ABI (three parameters put in the proper registers etc.). Vice versa (e.g., calling 68k hooks from ppc) is also supported though it is required that the ppc code only uses utility.library for calling the hook. For any other types of function pointers, special care has to be taken (special trampolines and the ppc code needs to be aware that the function is called only from 68k). Depending on how the pointer is propagated, it may be even impossible to find a solution that just works.
    
    It is impossible to resolve general function pointers calling at emulation level as 68k has two register types (or puts the args on the stack for certain compilers) and the ppc only one.
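To illustrate why hooks are the tractable case, here is a simplified, portable model of the three-parameter hook convention (hook, object, message). `MiniHook` and `call_hook` are stand-ins invented for this sketch; they only imitate the shape of `struct Hook` and `CallHookPkt()` from utility.library and are not the real definitions.

```c
#include <stddef.h>

struct MiniHook;

/* Every hook entry has the same fixed shape: hook, object, message.
   A fixed shape is what lets an emulator bridge the call generically. */
typedef unsigned long (*HookEntry)(struct MiniHook *hook, void *object, void *message);

struct MiniHook {
    HookEntry entry;  /* plays the role of h_Entry */
    void     *data;   /* plays the role of h_Data */
};

/* Single choke point for all hook calls, in the spirit of CallHookPkt(). */
unsigned long call_hook(struct MiniHook *hook, void *object, void *message)
{
    if (hook == NULL || hook->entry == NULL)
        return 0;
    return hook->entry(hook, object, message);
}

/* Example hook: adds the message value to a running total kept in data. */
unsigned long sum_hook(struct MiniHook *hook, void *object, void *message)
{
    unsigned long *total = hook->data;
    (void)object;
    *total += *(unsigned long *)message;
    return *total;
}
```

A general function pointer has no such fixed shape, so there is no single place where the arguments could be translated; that appears to be the gap SBA is pointing at.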
    

    I.e. it is simply impossible to find a general way to make every old 68k module work on PPC when hooks/callbacks are attached. What we have now will work with almost everything, but if a module uses those direct callbacks/hooks, it will not work.

    At least that's how I understand it.

    So we will need all modules native, with a release note saying: "on OS4/MOS, 68k modules will also work in most cases, except when a module uses direct callbacks; there are almost none of those, and of the 20 modules originally shipped with DOpus 5 only 2 are affected".

    So for now, IMHO, the 68k one is the one worth your attention, plus fixing as many warnings as your free time allows in all parts (library/program/ported_modules).

     
    • Hi!

      On 15.08.2013 11:10, kas1e wrote:

      It is impossible to resolve general function pointers calling at
      emulation level as 68k has two register types (or puts the args on the
      stack for certain compilers) and the ppc only one.

      I.e. it is simply impossible to find a general way to make every old
      68k module work on PPC when hooks/callbacks are attached. What we
      have now will work with almost everything, but if a module uses those
      direct callbacks/hooks, it will not work.

      Theoretically, you can work around the problem as the API is known to
      Dopus (it is just not known to the emulator). The solution would be to
      have 68k trampoline entries for the callback function that then
      translate the arguments of the call into ones that match the PPC code.
      Currently, I'm not sure if there is a trap mechanism implemented but as
      68k->ppc Hooks work, it should be possible to use them as a gate, if
      there is no direct way.

      Obviously, this must be done only if the module is 68k and works as long
      as the module doesn't pass this again to other modules that are PPC.

      Bye
      Sebastian

       
  • kas1e
    2013-08-15

    Btw, I also got a mail from Toni Wilen (the WinUAE author). He says he will have a look at that 68k problem if we can describe it to him very well.

     
  • kas1e
    2013-08-15

    @All

    Sebastian wasn't aware of this topic, so he sent his answer via mail again, but with his agreement I repost it here (so we can all continue the discussion here):

    Theoretically, you can work around the problem as the API is known to Dopus (it is just not known to the emulator). The solution would be to have 68k trampoline entries for the callback function that then translate the arguments of the call into ones that match the PPC code. Currently, I'm not sure if there is a trap mechanism implemented but as 68k->ppc Hooks work, it should be possible to use them as a gate, if there is no direct way.

    Obviously, this must be done only if the module is 68k and works as long as the module doesn't pass this again to other modules that are PPC.

    -->8--

    Basically, it has to work, as it is possible to call entries of PPC libraries from 68k as well. So all one has to write are some traps (they can be generated via idltool). As said, I will take a look at it.
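The trampoline idea can be sketched in portable C as follows. This is only a model: `M68kRegs`, `NativeCallback`, and the choice of A0/D0/D1 as argument registers are assumptions made for illustration, not the real DOpus callback signature or the OS4 trap mechanism.

```c
#include <stdint.h>

/* Hypothetical snapshot of the 68k registers at the moment of the call. */
typedef struct {
    uint32_t d[8];  /* data registers D0-D7 */
    uint32_t a[8];  /* address registers A0-A7 */
} M68kRegs;

/* Native-side callback with an ordinary C argument list. */
typedef uint32_t (*NativeCallback)(uint32_t handle, uint32_t command, uint32_t value);

/* Trampoline: because the caller's API is known, the arguments can be
   picked out of the 68k registers (assumed here: A0, D0, D1), passed to
   the native function, and the result handed back in D0. */
void callback_trampoline(M68kRegs *regs, NativeCallback target)
{
    regs->d[0] = target(regs->a[0], regs->d[0], regs->d[1]);
}

/* Example native callback used to exercise the trampoline. */
uint32_t sample_callback(uint32_t handle, uint32_t command, uint32_t value)
{
    return handle + command * value;
}
```

One such trampoline would be needed per callback signature, which fits Sebastian's remark that the entries could be generated by a tool rather than written by hand.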

     
    Last edit: kas1e 2013-08-15
  • xenic
    2013-08-16

    @kas1e
    The garbled text problem in the configopus.module is located in the L_Config_Menu() function in configopus_module/lister_menu.c and is yet another A4/A5 arguments problem. It can be fixed in one of 2 ways:

    1. Add an assembler hack in lister_menu.c that stuffs the A4/A5 arguments into temporary global variables before calling the actual Config_Menu() function. I've already done that and am attaching the file for you to compile (for OS3) and test.

    2. (Requires multiple changes):
      a) Change the L_Config_Menu() function in lister_menu.c to use a structure to hold the A3/A4/A5 arguments and pass them in the A3 register.
      b) Change the Config_Menu() function calls in Program/misc_proc.c to match the changed Config_Menu() function.
      c) Remove the Config_Menu() assembler function from Program/68k_asm_a4_a5.c.
      d) Add the required structure to configopus_module/configopus.h and Include/Libraries/configopus.h.
      e) Add or change the Config_Menu() definitions in all the Include files (clib, defines, fd, inline, inline4, interfaces & ppcinline).
      f) There may be other changes required that I haven't thought of.
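The second method boils down to packing the three register arguments into one structure and passing a single pointer. A minimal sketch, assuming placeholder names: `ConfigMenuArgs`, its fields, and `config_menu_packed` are hypothetical, since the thread does not say what Config_Menu() actually receives in A3, A4 and A5.

```c
#include <stddef.h>

/* Hypothetical argument block replacing the separate A3/A4/A5 arguments.
   The field names are placeholders, not the real Config_Menu() parameters. */
struct ConfigMenuArgs {
    void *arg_a3;
    void *arg_a4;
    void *arg_a5;
};

/* Stand-in for the reworked entry point: only one pointer crosses the
   call boundary, so A4 and A5 are no longer needed for argument passing.
   Returns the number of non-NULL arguments, or -1 for a NULL block. */
int config_menu_packed(const struct ConfigMenuArgs *args)
{
    if (args == NULL)
        return -1;
    return (args->arg_a3 != NULL) + (args->arg_a4 != NULL) + (args->arg_a5 != NULL);
}
```

The caller fills the structure on its own stack and passes its address, which is why steps b), d) and e) above all have to change together.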

    I'm reluctant to apply the second method because I can't test the result of those changes with MOS or AROS. The first method only affects 68k compile. Here is an archive of the changed configopus_module/lister_menu.c for you to compile and test. Let me know if it works before I commit it.

     
    Last edit: xenic 2013-08-16
    Attachments
  • kas1e
    2013-08-17

    @Xenic
    It works! At least, I tested it with the native build of library/binary on OS4 and the 68k version of config.module with your asm fix: when I go to "hotkeys", there is no more garbled text at the top of the window (without the asm fix, there is garbled text).

    Also no more garbled text in the "Script" window. So it seems your fix did the trick! Cool!

    Though, I found another bug in configopus.module :( I don't know what it is related to, but when you use the native version of the library and binary with the 68k version of configopus.module, go to "Settings/Environment" and close it via the close gadget, it crashes with this stack trace:

    Stack trace:
    (0x644ACC00) native kernel module kernel+0x000596f8
    (0x644ACD10) native kernel module kernel+0x00059838
    (0x644ACDB0) _ConfigOpus_Config_Environment()+0x1ec (section 1 @ 0x3BC)
    (0x644ACE80) [environment.c:812] environment_procPPC()+0x6cc (section 1 @ 0x82830)
    (0x644ACFE0) native kernel module kernel+0x0005de4c
    (0x644ACFF8) 0x0000FFF8 [cannot decode symbol]
    

    DAR: 00000000

    I.e. the DAR points to a null-pointer access. When I build the same configopus.module native and do the same (i.e. go to "Settings/Environment" and press the close gadget), there is no crash.

    Program/environment.c:812 is the call to Config_Environment(). Maybe it is the same problem with registers again?

    I also tested the original 68k SAS/C version of configopus.module with the native library/binary, and there is no crash when I close the config environment window. So it seems to be the same problem again, just in a different function? Maybe we can 'patch' it in the same way?

     
    Last edit: kas1e 2013-08-17
  • xenic
    2013-08-17

    @kas1e
    My last 2 updates from the repository were massive. bszili has been changing a lot of files, so I will need to diff my changes against the latest files from the repository before committing them. The advantage of the ASM changes is that they only affect the OS3 compile. Once everything is working on all platforms, someone can change the Config_Menu() function to use arguments that will work the same on all platforms. I added a warning above the current change so we'll be reminded that it needs a permanent fix.

    It looks like Config_Environment() is another A4/A5 register args function so I'll see if the same type of fix works for that bug.

     
  • xenic
    2013-08-18

    @kas1e
    I added the same ASM fix to the config_module Config_Environment() function. It seems to fix any crashes when opening the config window with OS3 program, library and modules. I compiled and tested OS4 program/library/modules and the environment window doesn't crash there either.

    However, when I use the OS3 configopus.module with OS4 program/library, opening the environment window crashes. That must be a different problem.

     
  • kas1e
    2013-08-19

    @xenic

    However, when I use the OS3 configopus.module with OS4 program/library, opening the environment window crashes. That must be a different problem.

    Mm.. Are you sure? I tried just now: OS4 native library, OS4 native program, but the latest 68k configopus.module: hard reboot, run dopus5, Settings/Environment, press the close gadget: no crash. Maybe you just forgot to hard-reboot and the old module was still in memory?

    I.e. to me it looks like you fixed all the problems in configopus.module. Maybe it is worth fixing all the other parts of configopus.module where A4/A5 functions are in use? There are 2 more: Config_EditFunction() and ShowPaletteBox(). If the previous 2 caused such errors, I assume those 2 will as well, and we need to add your asm fixes there too. What do you think?

     
    Last edit: kas1e 2013-08-19
    • xenic
      2013-08-19

      @kas1e

      Mm.. Are you sure? I tried just now: OS4 native library, OS4 native program, but the latest 68k configopus.module: hard reboot, run dopus5, Settings/Environment, press the close gadget: no crash. Maybe you just forgot to hard-reboot and the old module was still in memory?

      Actually, none of the 68k modules work with my OS4 program/library anymore. I don't know if it's because I switched from a SAM to an X1000 or what. I'm not worried about it since there is no reason to use 68k modules with OS4 anymore.

      I.e. to me it looks like you fixed all the problems in configopus.module. Maybe it is worth fixing all the other parts of configopus.module where A4/A5 functions are in use? There are 2 more: Config_EditFunction() and ShowPaletteBox(). If the previous 2 caused such errors, I assume those 2 will as well, and we need to add your asm fixes there too. What do you think?

      Sometimes the A4/A5 problem doesn't show up because the arguments in A4/A5 aren't being used. However, I'll go ahead and add the ASM fix to the Config_EditFunction() and ShowPaletteBox() functions in the next few days.

       
      • kas1e
        2013-08-19

        @xenic
        If OS3 modules stop working with the native library/binary, then my bet is 99%: you forgot to copy the PPC glue stubs (those m.main). They should all be in the libs directory, not in modules like the modules themselves )

         
        • xenic
          2013-08-20

          @kas1e

          If OS3 modules stop working with the native library/binary, then my bet is 99%: you forgot to copy the PPC glue stubs (those m.main). They should all be in the libs directory, not in modules like the modules themselves )

          You're right, yet another senior moment. Thanks for reminding me.

           
          Last edit: xenic 2013-08-20
  • kas1e
    2013-08-19

    @Xenic

    I also found another "Garbled text" error. To reproduce:
    -- run dopus5
    -- just select any executable file
    -- press RMB and hold until the menu pops up with "Open, Open with, Information" and so on.
    -- choose "Open With". You will see that the "Drawer" and "File" fields are filled with fancy characters.

    I don't know what it is related to.. maybe something with string functions, or with how things are passed to ASL..

    We talked with Biro about it, and he found that the "Open With" entry should be built only for files of type WBPROJECT. I.e. in Program/backdrop_popup.c we have:

        // Open with for projects
        if (object_type==WBPROJECT && (!object || object->type!=BDO_APP_ICON))
        {
            // Build OpenWith menu
            popup_build_openwith(menu);
        }
    

    I.e. it should be an option only for Project-type icons. But now it is there for all types (drawers, scripts, executables) and, together with it, brings garbled characters in the Drawer and File fields.

    Also, before bringing up the ASL requester, AOS4 shows a warning window saying:

    A process called a DOS function with a buffer that was too small.
    Supplied Size: 1002
    Requested Size: 1858
    Process: "dopus_open_with"
    Function: "ParsePatternNoCase()"
    You can receive this message because the application has accessed a system function in a way that is contrary to normal operating system guidelines. This application may or may not actually perform as intended.
    

    I don't know what it means, but it seems there is some bug that prevents the right object_type from being detected.

    On i386-AROS, for example, that part works as intended: the "Open With" menu appears only for "Project" types, and there is no garbled text in the Drawer and File fields.
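For context on the warning: as far as I recall, the dos.library autodoc for ParsePattern()/ParsePatternNoCase() asks for an output buffer of at least twice the pattern length plus 2 bytes. The helpers below are a hypothetical sketch of that sizing rule in plain C; they do not call the real DOS function, and they are not the fix that was eventually committed.

```c
#include <stddef.h>
#include <string.h>

/* Rule of thumb for the tokenised-pattern buffer (stated here as an
   assumption): at least 2 * strlen(pattern) + 2 bytes. */
size_t parse_pattern_buffer_size(const char *pattern)
{
    return strlen(pattern) * 2 + 2;
}

/* Returns 1 if a buffer of `supplied` bytes is large enough, else 0. */
int pattern_buffer_fits(const char *pattern, size_t supplied)
{
    return supplied >= parse_pattern_buffer_size(pattern);
}
```

Both sizes in the warning fit this formula (1002 = 2 * 500 + 2 and 1858 = 2 * 928 + 2), so one possible reading is that a 928-character pattern string, perhaps the same garbage seen in the requester fields, was handed to a buffer sized for 500 characters. That is a guess from the numbers only.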

     
    Last edit: kas1e 2013-08-19
  • xenic
    2013-08-20

    @kas1e

    I also found another "Garbled text" error.

    Fixed. Committed revision 485.

    We talked with Biro about it, and he found that the "Open With" entry should be built only for files of type WBPROJECT. I.e. in Program/backdrop_popup.c we have:

        // Open with for projects
        if (object_type==WBPROJECT && (!object || object->type!=BDO_APP_ICON))
        {
            // Build OpenWith menu
            popup_build_openwith(menu);
        }

    I.e. it should be an option only for Project-type icons. But now it is there for all types (drawers, scripts, executables)

    The problem isn't in that code. There is code elsewhere in the program that identifies OS3 executable files. It doesn't recognize OS4 executables so it treats them as project files. Try clicking on an OS3 program and see what the menu shows. It's possible that we can add an '#ifdef' somewhere so it will recognize OS4 executables instead of OS3 executables in the OS4 version of the program.
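As an illustration of what such a check might look at, here is a sketch that classifies a file by its first four bytes. Classic Amiga executables start with the HUNK_HEADER longword 0x000003F3, while OS4 executables are ELF files starting with 0x7F 'E' 'L' 'F'. The function and enum names are hypothetical, and this is not how DOpus itself identifies executables; the thread does not show that code.

```c
#include <stddef.h>

typedef enum { FILE_OTHER, FILE_HUNK_EXE, FILE_ELF } FileKind;

/* Classify a file from the first bytes of its contents. */
FileKind classify_header(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 4)
        return FILE_OTHER;

    /* HUNK_HEADER (0x000003F3), big-endian: classic 68k load file. */
    if (buf[0] == 0x00 && buf[1] == 0x00 && buf[2] == 0x03 && buf[3] == 0xF3)
        return FILE_HUNK_EXE;

    /* ELF magic: used by OS4 (and other) native binaries. */
    if (buf[0] == 0x7F && buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F')
        return FILE_ELF;

    return FILE_OTHER;
}
```

The ELF magic alone does not prove that a file is a PPC executable; a stricter check would also have to inspect the machine and type fields of the ELF header.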

     
  • kas1e
    2013-08-20

    @Xenic

    Fixed. Committed revision 485.

    Confirmed ! Nice elegant fix :)

    The problem isn't in that code. There is code elsewhere in the program that identifies OS3 executable files. It doesn't recognize OS4 executables so it treats them as project files. Try clicking on an OS3 program and see what the menu shows. It's possible that we can add an '#ifdef' somewhere so it will recognize OS4 executables instead of OS3 executables in the OS4 version of the program.

    Well, I also tried it on .info files, on images, on anything that is not a Project and not an OS3 binary: it still builds the Open With entry in the menu. But anyway, I think we can skip that part for now and put it on the TODO list for later releases, as it is not really problematic. It's even better when you can do "Open With" for anything. It just looks like more possibilities :)

    So.. for now we are left with the last module that does not fully work natively (on all 4 OSes): "ftp.module". Everything more or less works in it, and the whole network code works too; there are just some bugs: we can't copy/move from FTP listers to a local lister (neither via buttons nor via drag & drop), and it crashes when we try to connect to anything that returns "no hostname found" or similar. Please check this thread for more details:
    https://sourceforge.net/p/dopus5allamigas/discussion/dev/thread/bf9076c5/

    Maybe you will have some ideas about it as well..

     
    Last edit: kas1e 2013-08-20
  • xenic
    2013-08-20

    @kas1e
    One thing I have a problem with is your use of the word "native". Could you start specifying "native 68k", "native OS4", "native MOS" etc.? If bszili hasn't fixed the ftp.module by the time I return from Fall vacation, I'll have a look then. Meanwhile, I'll add the ASM fixes in configopus.module.

     
  • kas1e
    2013-08-20

    @Xenic

    Could you start specifying "native 68k", "native OS4", "native MOS" etc.?

    Yep, no problem.

    If bszili hasn't fixed the ftp.module by the time I return from Fall vacation, I'll have a look then. Meanwhile, I'll add the ASM fixes in configopus.module.

    Right, good. I will start preparing a release archive and testing all the builds on all platforms to see if we missed something.

     
  • xenic
    2013-08-21

    @kas1e
    I added the ASM fix to the ShowPaletteBox() function. Committed revision 498. However, Config_EditFunction() only uses the A4 register and when I disassembled the edit.o file it looks like it is handling the A4 register properly. I'm going to leave Config_EditFunction() alone for now.

    While I was testing OS3 & OS4 versions after changing the ShowPaletteBox() function I noticed that the palette box doesn't work in the OS4 version. I tested the previous OS4 version and my OS3 ShowPaletteBox() changes have nothing to do with the problem in the OS4 version. The problem is easy to reproduce:
    Select the "Settings/Environment..." menu item.
    Click on Palette in the left lister of the Environment window.
    In OS3 version the palette box displays colors.
    In OS4 version the palette box is disabled (ghosted).

    My test used all OS3 binaries for OS3 test and all OS4 binaries for OS4 test.

     
  • xenic
    2013-08-21

    @kas1e

    Right, good. I will start preparing a release archive and testing all the builds on all platforms to see if we missed something.

    I know you're anxious to make a release but I think it will take quite some time before all bugs are fixed and all versions (OS3, OS4, MOS, AROS) are ready. It wouldn't hurt to have a release archive organized and ready though. It would probably be a good idea to update all the version strings in the program, library and modules to avoid confusion (especially for the OS3 compile).

     
    Last edit: xenic 2013-08-21
  • kas1e
    2013-08-22

    @xenic

    The problem is easy to reproduce:
    Select the "Settings/Environment..." menu item.
    Click on Palette in the left lister of the Environment window.
    In OS3 version the palette box displays colors.
    In OS4 version the palette box is disabled (ghosted).

    Uhm, you are right :( It's as if nothing related to the palette is being read.. It could again be something like a signed/unsigned issue. There are a lot of warnings when we build configopus.module, so it could be anything, as usual. Especially when we compile the palette-related files, there are a lot of "cast from pointer to integer of different size" warnings. I also rechecked all the amigaos4 ifdefs in configopus.module in case I made a typo somewhere, but no, they look OK IMHO..

    Biro and I tested it on AROS (he already has a working configopus.module too), and it seems OK: http://s1064.photobucket.com/user/BSzili/media/AROS/dopus7.png.html

    But on MorphOS, it just crashes when I click the "Palette" entry! I assume the bug is the same for both OS4/MOS, but while on OS4 it is ghosted and somehow skipped, on MorphOS it crashes.
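For reference, the "cast from pointer to integer of different size" warning usually means a pointer is being converted to an integer type that is narrower or wider than the pointer itself. A minimal sketch of a round trip that avoids the warning, using the standard `uintptr_t`; the helper names are hypothetical and nothing here is taken from the configopus sources.

```c
#include <stdint.h>

/* Store a pointer in an integer that is guaranteed wide enough for it. */
uintptr_t pointer_to_tag(const void *p)
{
    return (uintptr_t)p;
}

/* Recover the pointer from the integer produced by pointer_to_tag(). */
void *tag_to_pointer(uintptr_t v)
{
    return (void *)v;
}
```

Whether any of these warnings is behind the palette problem is unknown; the sketch only shows the pattern the compiler is complaining about.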

    I know you're anxious to make a release but I think it will take quite some time before
    all bugs are fixed and all versions (OS3, OS4, MOS, AROS) are ready.

    Yep, there is always something new that needs fixing.. But at least now it is all really starting to work, with just some bits here and there.

    It wouldn't hurt to have a release archive organized and ready though.

    Yes, just to have something ready with only the bins/library/modules changed.

    It would probably be a good idea to update all the version strings in the program, library and modules to avoid confusion (especially for the OS3 compile).

    I thought the same before; we surely need to change all the versions everywhere to something like "5.90", maybe, for the first release..

     
    Last edit: kas1e 2013-08-22
  • xenic
    2013-08-22

    @kas1e

    But on MorphOS, it just crashes when I click the "Palette" entry! I assume the bug is the same for both OS4/MOS, but while on OS4 it is ghosted and somehow skipped, on MorphOS it crashes.

    When there is a crash, you can work back from the crash location to find the problem. Since the OS4 version is disabled (ghosted), it's hard to tell where to look. I still haven't found where the OS4 version is being disabled. It might help if I knew where MOS version crashes. Do you have a stack trace for the MOS crash?

     
  • kas1e
    2013-08-23

    @xenic

    It might help if I knew where MOS version crashes. Do you have a stack trace for the MOS crash?

    The crash points to the beginning of _palette_slider_callback() in configopus.module (and then it crashes in L_UpdateGadgetValue from the library, which crashes in L_AddObjectList from the library as well).

    I also made a full-blown debug log with everything enabled on MOS, so maybe some more info will be there (though I think it will all be irrelevant, as it is just that palette_slider_callback() should now be called via something like our REFCALL macros):
    http://kas1e.mikendezign.com/misc/dopus5/modules_bug/configopus_palette_morphos.txt

     
    Last edit: kas1e 2013-08-23
  • xenic
    2013-08-23

    @kas1e

    I also made a full-blown debug log with everything enabled on MOS, so maybe some more info will be there (though I think it will all be irrelevant, as it is just that palette_slider_callback() should now be called via something like our REFCALL macros):

    I think you may be right. All the gadgets in the OS4 palette window work on OS4 except the palette color display. I need to find where the palette colors are disabled and that could take a lot of time.