ftp.module: almost done, needs final bug-fixing

kas1e
2013-08-16
2013-10-11
  • kas1e

    kas1e - 2013-08-16

    @all

    OK, with a lot of help from BSzili we are almost done with the native ftp.module. For now it compiles for OS4, MorphOS and AROS (for OS3 you need to install the necessary network SDK, the one that provides proto/socket.h and so on).

    There were a lot of changes overall to make it work with gcc/ng:

    • rewrote everything to use SDI
    • made it ABI compatible
    • new macros for correctly calling 68k hook structures (now used in themes.module and ftp.module)
    • got rid of as255 completely
    • all the necessary bsdsocket.library changes
    • got rid of the old aligned attribute via the D_S macro
    • and a bunch of other stuff; everything is in Modules/ftp/ now.

    As a first test, to be sure that everything is OK with the native ftp.module itself, we are testing on OS4 until it works properly. So far we can log in to an FTP site (say, Aminet), browse files and double-click on them, and they are downloaded and shown. We can also double-click on, say, an .lha archive: it is downloaded too and opened in a new lister (if xadopus.module or the arc ARexx scripts are in use).

    Now to the problems which we need to fix.

    -1--

    Drag and drop of files from FTP listers to any local lister does not work. It does not crash, it just does nothing. On the serial console it prints this when I try to D&D something from an FTP lister to a local lister:

    ** trapped 'dropfrom'
    check alive
    check network
    check network this site
    LOG: --> NOOP
    
    lister_xfer()
    LOG: 200 NOOP command successful
    

    -2--

    With debug.kernel on OS4 (with "munge" and co. enabled), if we select a file on the FTP site and press the "copy" button on the top bar (or "move", whatever), it crashes hard with this stack trace:

    Stack trace:
    (0x6444EF30) [function_paths.c:177] function_path_end()+0x58 (section 1 @ 0x62188)
    (0x6444EF40) [function_run.c:250] function_run()+0x658 (section 1 @ 0x64650)
    (0x6444EFA0) [function_run.c:40] function_run_function()+0x94 (section 1 @ 0x646F8)
    (0x6444EFB0) [function_launch.c:393] function_launch_codePPC()+0x174 (section 1 @ 0x61E20)
    (0x6444EFE0) native kernel module kernel+0x0008d98c
    (0x6444EFF8) 0x0000FFF8 [cannot decode symbol]
    

    DAR shows 0xCCCCCCCC, which may again be the same kind of problem we had before with filetype.module (0xCCCCCCCC means we tried to free a Node a second time).

    I assume D&D does not work for the same reason that copy crashes, but I may be wrong.

    As I said, I can only catch that crash with debug.kernel and munge enabled (0xCCCCCCCC is only ever caught on debug kernels with munge enabled). With the user kernel, pressing the copy/move/copyas/moveas buttons in the lister also does nothing (same as with D&D).
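To illustrate why a poison pattern makes a double free visible, here is a minimal sketch in plain C. It is not the OS4 kernel's actual mechanism: `toy_node`, `toy_munge_free` and `toy_looks_freed` are hypothetical names, and the 0xCC fill value is simply taken from the DAR reported above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical node with 32-bit link fields, loosely modelled on a
 * doubly linked list node. Not the real exec structure. */
struct toy_node {
    uint32_t succ;  /* stand-in for a 32-bit successor pointer */
    uint32_t pred;  /* stand-in for a 32-bit predecessor pointer */
};

/* "Free" a node the way a munging allocator might: overwrite the
 * released memory with 0xCC bytes so stale contents cannot be trusted. */
static void toy_munge_free(struct toy_node *n)
{
    memset(n, 0xCC, sizeof(*n));
}

/* Returns 1 if the link field carries the poison pattern, i.e. the node
 * was already released once. Code that unlinks such a node a second time
 * would follow 0xCCCCCCCC as an address and fault there. */
static int toy_looks_freed(const struct toy_node *n)
{
    return n->succ == 0xCCCCCCCCu;
}
```

Without the poison fill the stale link fields may still hold plausible addresses, which would fit the observation that the user kernel shows no crash at all.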

    Also, when I test on the "user" kernel and it does not crash (as the user kernel can't catch those double node frees), it prints this to serial when I press the "copy" toolbar button:

    ** trapped 'Copy'
    check alive
    check network
    check network this site
    LOG: --> NOOP
    
    lister_xfer()
    LOG: 200 NOOP command successful
    

    For the "move" button:

    ** trapped 'Move'
    check alive
    check network
    check network this site
    LOG: --> NOOP
    
    lister_xfer()
    LOG: 200 NOOP command successful
    

    For the "copy as" button:

    ** trapped 'CopyAs'
    check alive
    check network
    check network this site
    LOG: --> NOOP
    
    lister_xfer()
    LOG: 200 NOOP command successful
    

    -3--

    The third bug is different: just press "ftp" in the buttons and it brings up an FTP buttons window, where one of the buttons is "localhost". Pressing it spawns a new lister, and then two crashes follow one after another.

    Stack trace of the first one (in dopus_requester_proc; it seems graphics related, more precisely something to do with rastports):

    Stack trace:
    (0x63ABAC00) [simplerequest.c:581] simple_build()+0xb8 (section 1 @ 0xFCC4)
    (0x63ABACC0) [simplerequest.c:572] simple_build()+0xb0 (section 1 @ 0xFCBC)
    (0x63ABAF40) [requesters.c:337] requester_proc()+0x4a4 (section 1 @ 0x41030)
    (0x63ABAF90) native kernel module dos.library.kmod+0x00023448
    (0x63ABAFC0) native kernel module kernel+0x0006aa5c
    (0x63ABAFD0) native kernel module kernel+0x0006aadc
    

    DAR: 00000000, i.e. in the first stack trace DAR points to a NULL pointer access.

    And then the stack trace of the second one (in dopus_ftp_lister):

    Stack trace:
    (0x644962D0) [requesters.c:149] L_AsyncRequest()+0x220 (section 1 @ 0x412BC)
    (0x64496350) [requesters.c:123] L_AsyncRequest()+0x1d4 (section 1 @ 0x41270)
    (0x64496360) [ftp_util.c:1336] ftpmod_request()+0x418 (section 1 @ 0x18C10)
    (0x644967E0) lister_request()+0x11c (section 1 @ 0x70C0)
    (0x64496800) [ftp_lister_connect.c:690] lister_connect_and_login()+0x924 (section 1 @ 0x1221C)
    (0x64496D90) [ftp_lister_connect.c:903] lister_new_connection()+0x438 (section 1 @ 0x12CA0)
    (0x64496F00) [ftp_lister.c:3357] lister()+0x54c (section 1 @ 0x7D14)
    (0x64496F90) native kernel module dos.library.kmod+0x00023448
    (0x64496FC0) native kernel module kernel+0x0006aa5c
    (0x64496FD0) native kernel module kernel+0x0006aadc
    

    DAR there points to 000064C0 (so not a NULL pointer), and it may just be a side-effect crash of the first, NULL-pointer crash.

    It may be an original bug, like not checking whether an FTP port is open at all, but it should definitely be fixed so that everything is robust and clean.

    It could also be something ABI related (like those ASM-related functions).

    To add: it does not have to be localhost. You can create any new entry in the FTP address book with a wrong hostname, and it will crash too.

    I assume it simply crashes when DOpus tries to build the error window via "simplerequest" and chokes. The same happened in some other parts where SimpleRequest is involved (like filetype.module, which Xenic fixed a while back). But I may be wrong; it needs the usual debugging and bug hunting.

     

    Last edit: kas1e 2013-08-20
  • kas1e

    kas1e - 2013-09-07

    @All

    Now I have checked rev533, by which point BSzili had done some cleanup in ftp.module as well. In the OS4 version I CAN now download files from FTP listers to local listers, and most of the problems have simply disappeared!

    For example, there are no more crashes when we try to log in to localhost (now there is a normal window saying "Cannot log in to localhost (could not connect)"); the same goes for all other "bad" hostnames.

    There is also no crash when we copy files from an FTP lister to a local lister via the top bar buttons or via drag and drop. But one problem remains. It may be caused by some of our earlier function replacements, by some remaining STDARGS-based functions, by that VA_END which was missing and is now added (maybe that was intended?), or by anything else. Anyway, the problem is:

    I can't copy two files at the same time now. I.e. I mark two files on Aminet and D&D them to RAM:, and while the first one copies fine, for the second it says "sorry, there is no such file". But if I then copy that file on its own, it copies.

    It seems that right after we download a file, something goes wrong with the buffers (like a forgotten null termination somewhere). For example, if we do a hard reboot, run DOpus5, go to aminet/dbase/, mark AA_30.lha and AA_30.readme there and D&D them, ftp.module downloads the first file and then brings up a window for the next one: "550 A_30.readme: No such file or directory" (try again/skip/abort). So the first character of the second file name visibly gets "eaten", as if the buffer was not null-terminated after the first one => fail.

    In the log it says:

    ** trapped 'dropfrom'
    check alive
    check network
    check network this site
    LOG: --> NOOP
    
    LOG: 200 NOOP command successful
    
    LOG: --> PORT 192,168,1,6,4,34
    
    LOG: 200 PORT command successful
    LOG: --> RETR AA_30.lha
    
    LOG: 150 Opening BINARY mode data connection for AA_30.lha (371418 bytes)
    LOG: 226 Transfer complete
    LOG: --> PORT 192,168,1,6,4,35
    
    LOG: 200 PORT command successful
    LOG: --> RETR A_30.readme
    
    LOG: 550 A_30.readme: No such file or directory
    ** get src err
    

    After that happens, all sorts of weird things can happen when we try to download/rename/move files from the FTP lister to a local one. Names get messed up with all sorts of odd characters, etc.

    Also, as far as I can see, pressing the "Aminet" button in the FTP button bank does nothing; it only prints this to serial:

    mod_connect()
    module connect done (0)
    

    while by default DOpus5 has this on that button: Command FTPConnect aminet.net DIR.

    It just does the same as if I press the FTPConnect button, where I can enter all my data, but without the window.

     

    Last edit: kas1e 2013-09-07
  • BSzili

    BSzili - 2013-09-07

    If you remember I ended up reverting my function replacements. Also VA_END is necessary, it might not do anything on some platforms, but you can end up with memory leaks on others.
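As a reminder of the pairing being discussed: the C standard requires every `va_start` to be matched by a `va_end` in the same function before it returns. A minimal sketch, where `format_message` is a hypothetical helper and not a function from the ftp.module sources:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: format a message into buf, printf-style.
 * Returns the number of characters that vsnprintf reports. */
static int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int written;

    va_start(args, fmt);
    written = vsnprintf(buf, size, fmt, args);
    va_end(args);   /* pairs with va_start; required before returning */

    return written;
}
```

On many ABIs `va_end` expands to nothing, which is why leaving it out often goes unnoticed until the code is built for another platform.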

     
  • kas1e

    kas1e - 2013-09-07

    @BSzili
    Right... maybe it's something in those lsprintf/rawdofmt changes? It really looks like some buffer is not null-terminated.

     
  • BSzili

    BSzili - 2013-09-08

    I doubt a function which opens an error requester has much to do with file transfer. If anything, I might have forgotten to change a strncpy back to stccpy.
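The difference matters because `strncpy` leaves the destination without a terminator whenever the source is at least as long as the limit. A sketch of an always-terminating copy is below; `bounded_copy` is a hypothetical stand-in, and its return convention (bytes written including the terminator) is an assumption about how stccpy-style functions behave, not something taken from the SAS/C manual.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an stccpy-style copy: copies at most
 * size-1 characters and always null-terminates the destination.
 * Returns the number of bytes written, including the terminator. */
static size_t bounded_copy(char *dst, const char *src, size_t size)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL && size > 0);

    while (i + 1 < size && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';   /* terminator is written even when src was truncated */

    return i + 1;
}
```

With a 4-byte buffer and the name AA_30.readme, this writes "AA_" plus a terminator, whereas `strncpy` with the same limit would write "AA_3" and no terminator.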

     
  • kas1e

    kas1e - 2013-09-08

    IMHO no, as you reverted them in rev522, but maybe it's strcmpi which wasn't reverted? I will check them all tomorrow.

     
  • BSzili

    BSzili - 2013-09-08

    Why on earth should we revert back to using strcmpi?

     
  • kas1e

    kas1e - 2013-09-08

    Shouldn't we try anything just to get the bugs fixed? Do you have any other ideas? I don't, except maybe that sculd or whatever it was called. I can't look at the code properly until tomorrow, so even if the strcmpi idea sounds stupid, my only plan for tomorrow is to revert the changed functions step by step and see where the differences start. If you have any other ideas, that would be great. We can go the kprintf way again tomorrow, but those skipped first characters of the second buffer look like a real missing null termination somewhere (and it appeared when we changed the functions).

     

    Last edit: kas1e 2013-09-09
  • BSzili

    BSzili - 2013-09-08

    I'm not sure how random guesswork is supposed to fix bugs. LSprintf is used in a single function for displaying an error requester, and strcmpi / stricmp is used to compare strings. How are these related to the null termination of any string? You are free to experiment, but don't expect me to agree with you.

     
  • kas1e

    kas1e - 2013-09-09

    @BSzili

    How are these related to the null termination of any string?

    Null termination is also just my random guesswork. It may not be null termination at all; it just looks like it. It could just as easily be one buffer overwriting another, an overflow somewhere, or anything else: a string compare followed by a wrong if/else, some long/ulong or char/uchar differences, or some warnings that are not harmless. It can be anything, and of course it is all random guesswork.

    Do you have any ideas other than random guesswork?

    Edit: more random guesswork: maybe something related to the "fib" stuff, as it is used for filenames with all those FILENAMELEN + 1, so it is quite possible we missed changing one of them somewhere?

     

    Last edit: kas1e 2013-09-09
  • BSzili

    BSzili - 2013-09-09

    I'd prefer to trace back to the root of the problem instead of trial and error. I'm not comfortable with the idea that computing is non-deterministic, and changing anything can solve the bug. If that makes me closed-minded, so be it, but as I said you are free to experiment and prove me wrong. That will be one less bug to take care of.

     
  • BSzili

    BSzili - 2013-09-09

    I won't really be able to go on jabber before 17:00 GMT+1 anymore, because the semester just started, and I'm busy with my studies and office routines (yuck!).
    Note that I was not having a go at you, but I'm literally swamped, and I have to get ftp.module working on AROS.

     
  • kas1e

    kas1e - 2013-09-16

    @all
    Ftp.module is fully working now on OS4! The last bug was caused by the stptok() function, which wasn't close enough to the SAS/C one: we found the right one, and the bug is gone.
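The thread does not say which detail of the replacement differed. As a sketch under assumptions only, here is a tokenizer in the commonly circulated stptok style, where the returned pointer stays on the break character instead of skipping it. `tok_next` is a hypothetical name, and the semantics shown are an assumption about the SAS/C behaviour, not a copy of the function that was committed.

```c
#include <stddef.h>
#include <string.h>

/* Copy the next token from s into tok (at most toklen-1 characters,
 * always null-terminated; toklen must be at least 1). Scanning stops at
 * the first character found in brk. The return value points AT that
 * break character (or at the string terminator), not past it. */
static const char *tok_next(const char *s, char *tok, size_t toklen,
                            const char *brk)
{
    size_t i = 0;

    while (*s != '\0' && strchr(brk, *s) == NULL && i + 1 < toklen) {
        tok[i++] = *s++;
    }
    tok[i] = '\0';

    return s;
}
```

A caller written for this convention steps over the delimiter itself. If a replacement function already skipped it, that same caller would drop one more character, which could produce a name like A_30.readme from AA_30.readme.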

     
  • kas1e

    kas1e - 2013-09-19

    @all
    Found one error in ftp.module (tested the OS4 version). To reproduce:

    1. Open the address book.
    2. Create any new entry.
    3. Press RMB and choose "save". It saves a file called "ftp_sites" in the dopus5:System directory. All OK.
    4. Change anything in the address book again (add a new entry, or edit the previous one), then press RMB and choose "save" again, and we get:

    Directory Opus Request
    Error Saving File !
    DOS error 205: object not found
    retry/cancel

    Looks like some code doesn't close the file properly on save, or something?

     
  • kas1e

    kas1e - 2013-10-09

    @BSZili

    Built today's SVN (rev733), and ftp.module on OS4 no longer shows any content. I.e. I go to the address book, double-click on any host, it connects, says "reading files" and then shows an empty lister without files. In the log I have:

    [ftp_main.c:660 handle_ipc_msg] LOG: 200 Switching to Binary mode.
    [ftp_main.c:660 handle_ipc_msg] LOG: --> CWD /pub/amiga/
    
    [ftp_lister_connect.c:590 lister_connect_and_login] connect_cwd -> 250
    [ftp_main.c:660 handle_ipc_msg] LOG: 250 Directory successfully changed.
    [ftp_arexx.c:558 rexx_lst_set_path] rexx_lst_set_path(1876364480)
    [ftp_arexx.c:559 rexx_lst_set_path] -> '/pub/amiga/'
    [ftp_lister.c:3544 lister_prog_clear] Clear progress
    [ftp_arexx.c:186 rexx_lst_empty] rexx_lst_empty(1876364480)
    [ftp_main.c:2852 handle_rexx] ** trapped 'inactive'
    [ftp_main.c:1512 opus_inactive] opus_inactive(0)
    [ftp_lister_list.c:276 lister_list] lister_list()
    [ftp_arexx.c:293 rexx_lst_lock] rexx_lst_lock()
    [ftp_arexx.c:202 rexx_lst_clear] rexx_lst_clear(1876364480)
    [ftp_arexx.c:558 rexx_lst_set_path] rexx_lst_set_path(1876364480)
    [ftp_arexx.c:559 rexx_lst_set_path] -> '/pub/amiga/'
    [ftp_main.c:660 handle_ipc_msg] LOG: --> PORT 192,168,1,6,1,4
    
    [ftp_lister_list.c:354 lister_list] list command result -1
    [ftp_arexx.c:147 rexx_lst_refresh] Refresh lister - start - [ftp_arexx.c:153 rexx_lst_refresh] end
    [ftp_main.c:660 handle_ipc_msg] LOG: 500 Illegal PORT command.
    [ftp_lister.c:3544 lister_prog_clear] Clear progress
    [ftp_lister_connect.c:698 lister_connect_and_login] lister_connect_and_login returns 1
    

    Also, if I just press the "up" button in this state, it shows me this:

    [ftp_main.c:2852 handle_rexx] ** trapped 'Parent'
    [ftp_lister.c:3192 lister_msg_switch] lister_msg_switch()
    [ftp_arexx.c:293 rexx_lst_lock] rexx_lst_lock()
    [ftp_main.c:660 handle_ipc_msg] LOG: --> CDUP
    
    [ftp_main.c:660 handle_ipc_msg] LOG: 250-==[ pub ]========================================
    [ftp_main.c:660 handle_ipc_msg] LOG: 250-
    [ftp_main.c:660 handle_ipc_msg] LOG: 250- amiga/ - Amiga demo scene archive
    [ftp_lister.c:3544 lister_prog_clear] Clear progress
    [ftp_main.c:660 handle_ipc_msg] LOG: 250-
    [ftp_main.c:660 handle_ipc_msg] LOG: 250-=================================================
    [ftp_main.c:660 handle_ipc_msg] LOG: 250 Directory successfully changed.
    [ftp_lister_list.c:276 lister_list] lister_list()
    [ftp_arexx.c:293 rexx_lst_lock] rexx_lst_lock()
    [ftp_arexx.c:186 rexx_lst_empty] rexx_lst_empty(1876364480)
    [ftp_main.c:2852 handle_rexx] ** trapped 'inactive'
    [ftp_main.c:1512 opus_inactive] opus_inactive(0)
    [ftp_arexx.c:558 rexx_lst_set_path] rexx_lst_set_path(1876364480)
    [ftp_arexx.c:559 rexx_lst_set_path] -> '/pub'
    [ftp_main.c:660 handle_ipc_msg] LOG: --> PORT 192,168,1,6,2,4
    
    [ftp_lister_list.c:354 lister_list] list command result -1
    [ftp_main.c:660 handle_ipc_msg] LOG: 500 Illegal PORT command.
    [ftp_arexx.c:147 rexx_lst_refresh] Refresh lister - start - [ftp_arexx.c:153 rexx_lst_refresh] end
    [ftp_lister.c:3544 lister_prog_clear] Clear progress
    

    And then it crashes with this stack trace:

    Stack trace:
    (0x63DE7F30) [function_paths.c:177] function_path_end()+0x58 (section 1 @ 0x62028)
    (0x63DE7F40) [function_run.c:250] function_run()+0x658 (section 1 @ 0x644F0)
    (0x63DE7FA0) [function_run.c:40] function_run_function()+0x94 (section 1 @ 0x64598)
    (0x63DE7FB0) [function_launch.c:393] function_launch_codePPC()+0x174 (section 1 @ 0x61CC0)
    

    With DAR 0xCCCCCCCC, which means something is wrong along the lines of "free a Node a second time".

    It all seems to be about those PORT changes, which IMHO came in rev759. I can't recheck that rev right now, but will do so tomorrow if needed.
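For reference, the PORT argument defined in RFC 959 is the four address bytes followed by the port as high byte, then low byte. A minimal sketch of that encoding follows; `format_port_arg` is a hypothetical helper, not the ftp.module code.

```c
#include <stdint.h>
#include <stdio.h>

/* Format the argument of an FTP PORT command from an IPv4 address
 * (given as a 32-bit value, most significant byte = first octet) and
 * a TCP port. Returns what snprintf returns. */
static int format_port_arg(char *buf, size_t size, uint32_t ip, uint16_t port)
{
    return snprintf(buf, size, "%u,%u,%u,%u,%u,%u",
                    (unsigned)((ip >> 24) & 0xFF),
                    (unsigned)((ip >> 16) & 0xFF),
                    (unsigned)((ip >> 8) & 0xFF),
                    (unsigned)(ip & 0xFF),
                    (unsigned)(port >> 8),      /* high byte first */
                    (unsigned)(port & 0xFF));   /* then low byte */
}
```

Read this way, the earlier working log's "4,34" is port 1058, while "1,4" from this log is port 260. Swapping the two bytes would give 1025, so a byte-order slip would fit the numbers, though the thread does not say what the actual fix was.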

     

    Last edit: kas1e 2013-10-09
  • BSzili

    BSzili - 2013-10-09

    Committed the fix for the PORT command, the crash is unrelated.

     
  • kas1e

    kas1e - 2013-10-11

    Yep, PORT is fixed, the crash is still here. I assume that on AROS, where you do your tests, you just can't catch those "free node" bugs. I can catch them only on a debug kernel with the "munge" option. I.e. the bug is definitely there, I just don't know how you can reproduce it. Maybe AROS hosted will segfault on it?

     
  • kas1e

    kas1e - 2013-10-11

    Right... In the meantime I will just make a ticket so we have everything in one place.

    EDIT: done, ticket #19

     

    Last edit: kas1e 2013-10-11
