

#257 autom. retry on error after x seconds before asking user

Status: closed
Owner: nobody
Labels: None
Updated: 2013-09-19
Created: 2012-01-30
Creator: André
Private: No

While syncing 200 GB(!) with a SAMBA share, some network errors occurred after a few hours (semaphore timeout, network path not found). FFS then stops and waits for the user to click "repeat". After that click, everything works fine again for some hours. I cannot locate or fix this network issue at the moment.

It would be nice if FFS handled such situations by itself: on detecting an error, it could retry multiple times, waiting a few seconds between retries, and log each occurrence. If all retries fail, FFS stops and reports the error to the user. This behavior could be switched on/off in the configuration.

Thank you

Discussion

  • Zenju
    2012-01-30

    For the record, FFS can already be set to "ignore" errors rather than show a popup; this is configured in the synchronization settings.

     
  • I believe what's being requested is a "number of retries allowed" feature, separate from "ignore errors."

    For example, the default would be 0 retries (i.e. try once and, if it fails, either pop up the message or ignore the error and continue). The user could specify a different number, say 5. It would be best if the user could also specify the delay between retries, but if that seems like too much work to you, use some reasonable hardcoded value such as 2 seconds or maybe 5.

    Now, when FreeFileSync encountered an error, it would go into a "delay-then-retry" loop for the specified number of iterations. If none of them succeeds, then it pops up the message, or ignores the error and continues with other things, depending on how it was configured.

    If it succeeds before all iterations are used, it continues with the rest of its actions. If errors are encountered on later items/actions, it again goes into the retry loop, for the same maximum number of iterations. (i.e. if the user selected "5 retries", and an earlier problem took 3 retries, then the new problem still gets up to 5 retries, not 2.)

    I too would really like to see this feature. Without the retry counter, you'd have to re-run the entire sync operation. That's nasty if the error is an intermittent one deep in the middle of the set of things to do.
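    A minimal C++ sketch of the retry wrapper described above (the names, the 5-retry default, and the 2-second delay are illustrative, not actual FreeFileSync code). Because every call gets its own retry budget, retries used on one item do not reduce those available for the next:

        #include <chrono>
        #include <functional>
        #include <thread>

        // Illustrative only: retry an operation up to "maxRetries" extra times,
        // sleeping "delay" between attempts. Returns true on the first success.
        bool runWithRetries(const std::function<bool()>& operation,
                            int maxRetries = 5,                        // 0 = current behavior: try once
                            std::chrono::seconds delay = std::chrono::seconds(2))
        {
            for (int attempt = 0; attempt <= maxRetries; ++attempt)
            {
                if (operation())            // first try plus up to maxRetries retries
                    return true;
                if (attempt < maxRetries)   // wait only if another attempt follows
                    std::this_thread::sleep_for(delay);
            }
            return false; // all attempts failed: caller pops up the message or ignores, per config
        }

    A caller would wrap each item, e.g. if (!runWithRetries([&]{ return copyOneFile(src, dst); })) showErrorOrIgnore(...); where copyOneFile and showErrorOrIgnore are placeholders for whatever FFS actually does.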

     
  • André
    2012-02-17

    Yes, johnb_atl is right here. Moreover, the global "ignore errors" setting doesn't work: the "network path was not found" error pops up after some hours regardless of this setting. After clicking repeat or ignore(!), FFS continues. I guess the system recovers during the timeout (memory, leases, ...). Anyway, it requires human intervention and stops the sync for X hours. Increased fault tolerance as described in my or johnb_atl's comment would be a nice feature of FFS.

     
  • André
    2012-02-17

    Sorry, it actually DOES ignore errors without popups; this was wrongly reported to me. However, everything else (timeout + retry) would still be nice.

     
  • Excellent article!

    I too have run into the exact same kind of problem on a project where I was involved.

    I disagree with the author's conclusions, though.
    I think it proves something different than "retries are bad".

    I think the proper conclusions are:
    1. Don't just BLINDLY use it as a "fix".
    2. Don't use a retry if another mechanism exists!
    3. Know WHAT you're doing and WHY.
    4. Before you proceed, understand what is happening, on a DEEP level.
    5. Don't go overboard, limit the number of retries and the delay.

    I think you've got the first 4 items covered -- quite well, probably!
    To handle the 5th item, don't let the user select any more than about 3 tries total.
    And limit the delay to the realm of 0.5 - 2.0 seconds; that's all SMB needs.

    =====

    The real problem is in "SMB" (the Windows file-sharing protocol), and the way Microsoft has implemented it in the client OS.
    Because it's a client problem and a protocol problem, SAMBA's hands are tied and it can't do much better than a Windows Server can.
    I call this the "whoops the drive is gone, no wait, it's there" problem.
    If MS had done the job right, hundreds of developers wouldn't have to resort to the same few kludges.

    I know of only 3 ways to deal with it:

    1. 1 or 2 retries with 0.5-2.0 second delay, when the drive appears to flake out.

    --or--

    2. Detect whether you're dealing with an SMB fileshare and "pre fleep" the share with a dummy access (ignoring any failure) before doing the actual work.

    --or--

    3. Accept the fact that a program that makes some reasonable assumptions about drive availability is going to fail sometimes, and sometimes very badly, because SMB violates those assumptions.

    Approach #2 handles only the most common case: SMB being a little slow to respond properly on the initial uptake. It doesn't handle the occasional odd flakiness that can happen between dealing with one directory and the next. Approach #1 can handle that, at the expense that it might go into "retry hell" if you're not very careful and don't limit things.
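    A rough sketch of approach #2, assuming a single dummy access (plus one short pause and a second poke) is enough to wake the share up; the function name, the UNC path, and the 1-second pause are illustrative, not code from any real project:

        #include <chrono>
        #include <filesystem>
        #include <system_error>
        #include <thread>

        // Illustrative pre-access of an SMB share before the real work: poke the
        // share root once, ignore any failure, give the redirector a moment to
        // reconnect, then poke again. All errors are swallowed on purpose.
        void wakeUpShare(const std::filesystem::path& shareRoot)
        {
            std::error_code ec;
            if (!std::filesystem::exists(shareRoot, ec) || ec)          // dummy access, failure ignored
            {
                std::this_thread::sleep_for(std::chrono::seconds(1));   // 0.5-2.0 s is usually enough for SMB
                std::filesystem::exists(shareRoot, ec);                 // second poke, result still ignored
            }
        }

    A caller would run something like wakeUpShare(R"(\\server\backup)") before starting the actual sync; the path is of course just an example.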

     
  • Zenju
    2012-05-08

    FFS's current implementation would be 3, then. 2 is out because it pessimistically gives away (a little) performance. FFS generally implements the opposite approach: assume everything works, but recover on failure. This allows for the most efficient implementation if all software layers play nice and report failures (https://sourceforge.net/p/freefilesync/discussion/help/thread/bec883f8).
    1 may still be an improvement, though.

    For v5.3 I just implemented "retry" for RealtimeSync by adding a countdown timer to the modal error dialog, which in effect clicks "retry" for the user after 15 seconds (the time left is shown as it ticks down, like "Retry (13 sec)").
    For RTS this seems the perfect match. The tool is designed to run as a background daemon, so there really needs to be some form of "retry". What RTS does is very "cheap", i.e. setting up a directory monitor. So there isn't even much need to ask the user whether he wants "retry" via new intrusive GUI options like a checkbox and two spin controls for the delay and retry count; I just hardcoded 15 sec and an endless retry in case a failure does not go away. For RTS this hard-coded behavior is unlikely to be a problem, since directory monitoring either fails directly (if it is not supported, or the configuration or command line is not correct) or at some later time due to network problems, which are often recoverable. And in the unlikely case they are not, failing to set up monitoring indefinitely every 15 sec is not going to hurt the system in any way.

    For FFS the scenario is less clear-cut. In the worst case, "retry" may copy a 2 GB file again if there was a failure to apply ACLs at the very end of the copy process. This is nothing one would ever want to run indefinitely, so at the very least there has to be a user-configurable limit on the number of retries. But this needn't be exposed on the GUI; it's sufficient to place it into GlobalSettings.xml. As a default, maybe 2 retries would be a good number, with a delay of 15 seconds (also in GlobalSettings.xml). Not exposing it on the GUI a priori is of course an optimization to avoid confusing mainstream users, at the expense of advanced/admin users, who may then fail to see that FFS has "retry" and can have it configured. But this reduced visibility is mitigated if I apply the same implementation strategy as in RTS, namely putting a countdown on the modal error dialog. There is just one downside to this "show countdown timer in the modal dialog" trick: in the majority of cases the error will not be recoverable, so users may get annoyed if they see the time ticking down while they are trying to get a clue about the error message.
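    A rough sketch of this countdown-and-retry idea (illustrative only: the function, the console countdown, and the retryCount/delaySec parameters are stand-ins, not actual FFS/RTS code or real GlobalSettings.xml keys). A negative retry count retries forever, as hardcoded in RTS; a small positive count matches the bounded behavior proposed for FFS:

        #include <chrono>
        #include <functional>
        #include <iostream>
        #include <thread>

        // Sketch only: "retryCount" and "delaySec" stand in for values that could live
        // in GlobalSettings.xml; retryCount < 0 means retry forever (RTS-style).
        // The std::cout line stands in for the countdown shown on the modal error dialog.
        bool retryWithCountdown(const std::function<bool()>& operation,
                                int retryCount = 2, int delaySec = 15)
        {
            for (int attempt = 0; ; ++attempt)
            {
                if (operation())
                    return true;

                if (retryCount >= 0 && attempt >= retryCount)
                    return false; // retry budget used up: show the normal modal error dialog

                for (int sec = delaySec; sec > 0; --sec) // "Retry (13 sec)"-style countdown
                {
                    std::cout << "\rRetry (" << sec << " sec) " << std::flush;
                    std::this_thread::sleep_for(std::chrono::seconds(1));
                }
                std::cout << '\n';
            }
        }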

    Generally, the problem with automatic "retry" is that it's a blunt weapon, suitable only for a minority of error situations. On the other hand, I do not know of a better solution to accommodate unreliable devices like network shares which fail spuriously (and I think there should be some support).

     
    Last edit: Zenju 2013-06-13
  • FireLink
    2013-04-13

    I vote for this, because it covers cases like a disconnected network cable, a router reset, etc. Retries spaced too closely together could tie Windows up in accumulating failed waits.
    Also, other processes may not yet have freed space on the destination drive.

     
  • I need it too.

     
  • Adelino Araujo
    2013-06-04

    As I sync over the network through a VPN, the connection often goes down, so along with retrying every x seconds, it would be awesome if I could also execute a program every y retries within the retry sequence (sketched below).
    I open the VPN with a rasdial command, and every 10 failed retries I would love to execute the rasdial command again, for example.
    Over a slow connection, syncing takes me over half an hour just to get the file list! And when it fails, it stops!
    So in the morning I'm very **** off... this is very important for me too.
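    A rough sketch of this idea: retry the failing operation on a fixed delay and shell out to an external recovery command after every y consecutive failures. The command, the "MyVpn" connection name, and all the numbers are purely illustrative:

        #include <chrono>
        #include <cstdlib>
        #include <functional>
        #include <thread>

        // Illustrative only: retry "operation" every "delay" seconds and run an external
        // recovery command (e.g. re-dialing a VPN) after every "cmdEvery" consecutive failures.
        bool retryWithRecoveryCommand(const std::function<bool()>& operation,
                                      const char* recoveryCmd = "rasdial MyVpn", // hypothetical connection name
                                      int maxRetries = 30, int cmdEvery = 10,
                                      std::chrono::seconds delay = std::chrono::seconds(15))
        {
            for (int attempt = 1; attempt <= maxRetries; ++attempt)
            {
                if (operation())
                    return true;

                if (attempt % cmdEvery == 0)  // every 10th failure: try to restore the VPN
                    std::system(recoveryCmd); // return value ignored in this sketch

                std::this_thread::sleep_for(delay);
            }
            return false; // give up and report the error as today
        }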

     