#3 Merge not making progress

Status: open
Owner: nobody
Labels: None
Priority: 6
Updated: 2012-12-31
Created: 2012-12-29
Creator: WilliamKF
Private: No

On rare occasions, the merge step gets stuck in an infinite loop inside rs_job_drive(): rs_job_iter() keeps returning RS_BLOCKED, and the other process that I think it is blocked on is defunct. Most of the time the code works perfectly, but in about one in ten thousand runs it ends up in this failure mode. This is on CentOS 4.

Discussion

  • WilliamKF

    WilliamKF - 2012-12-31
    • priority: 5 --> 6
     
  • WilliamKF

    WilliamKF - 2013-01-06

    I have more information on how the failure occurs. I call popen() to run the merge-of-deltas step, but need to abort when an exception is thrown while processing the deltas being passed to the merge. When the exception is thrown, the stack is unwound and pclose() is called. The merge then hangs in the situation described above, with RS_BLOCKED.

    It seems there is a bug in librsync whereby it does not respond correctly to the pipe being closed prematurely.
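
For readers who want to reproduce the shape of this scenario, here is a minimal sketch of the parent side, assuming a POSIX system. The helper name run_and_abort() and the shell commands in the usage note are hypothetical and are not taken from the reporter's program. It relies only on the documented behaviour of pclose(): it closes the stream and then waits for the child to terminate, so a child that never exits after seeing end-of-file would make this call block indefinitely.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: start `cmd` with a pipe to its stdin, send only
 * `len` bytes of `data`, then close the pipe early, as an aborting
 * caller would. Returns the child's exit code, or -1 on failure. */
int run_and_abort(const char *cmd, const char *data, size_t len)
{
    assert(cmd != NULL && data != NULL);

    FILE *pipe = popen(cmd, "w");
    if (pipe == NULL)
        return -1;

    /* Partial input only; a write error is ignored in this sketch. */
    fwrite(data, 1, len, pipe);

    /* pclose() flushes and closes the write end, so the child sees
     * end-of-file, and then waits for the child to terminate. */
    int status = pclose(pipe);
    if (status == -1 || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status);
}
```

With a well-behaved child such as "cat > /dev/null", the call returns 0 as soon as the child reads end-of-file and exits. The hang reported here would correspond to a child that keeps running after its input pipe is closed.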

     
  • WilliamKF

    WilliamKF - 2013-01-06

    I think the following code change resolves this issue: the last clause of the condition below, previously "(orig_in && orig_out)", is changed to "(orig_in || orig_out)", as shown:

    rs_result rs_job_iter(rs_job_t *job, rs_buffers_t *buffers)
    {
        rs_result result;
        rs_long_t orig_in, orig_out;

        orig_in = buffers->avail_in;
        orig_out = buffers->avail_out;

        result = rs_job_work(job, buffers);

        if (result == RS_BLOCKED || result == RS_DONE)
            if ((orig_in == buffers->avail_in) && (orig_out == buffers->avail_out)
                && (orig_in || orig_out)) {
                rs_log(RS_LOG_ERR, "internal error: job made no progress "
                       "[orig_in=" PRINTF_FORMAT_U64 ", orig_out=" PRINTF_FORMAT_U64
                       ", final_in=" PRINTF_FORMAT_U64 ", final_out=" PRINTF_FORMAT_U64 "]",
                       PRINTF_CAST_U64(orig_in), PRINTF_CAST_U64(orig_out),
                       PRINTF_CAST_U64(buffers->avail_in),
                       PRINTF_CAST_U64(buffers->avail_out));
                return RS_INTERNAL_ERROR;
            }

        return result;
    }
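
To see what the one-operator change does, the two guards can be compared in isolation. The sketch below is a stand-alone model, not librsync code: counts_t, stalled_and() and stalled_or() are hypothetical names, and counts_t only mirrors the avail_in and avail_out counters used above. It assumes the guard is evaluated only when the job returned RS_BLOCKED or RS_DONE, as in the function shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two counters read from the buffers. */
typedef struct {
    uint64_t avail_in;
    uint64_t avail_out;
} counts_t;

/* True when neither counter changed across the call. */
static bool unchanged(counts_t before, counts_t after)
{
    return before.avail_in == after.avail_in
        && before.avail_out == after.avail_out;
}

/* Guard with "&&": fires only if both counters started nonzero. */
bool stalled_and(counts_t before, counts_t after)
{
    return unchanged(before, after)
        && (before.avail_in && before.avail_out);
}

/* Guard with "||": fires if either counter started nonzero. */
bool stalled_or(counts_t before, counts_t after)
{
    return unchanged(before, after)
        && (before.avail_in || before.avail_out);
}
```

The two versions differ only when exactly one counter starts at zero and nothing changes during the call, for example no input available but output space free. The "&&" guard lets that case through, so a caller can keep seeing RS_BLOCKED, while the "||" guard reports it as no progress.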

     
