#8 Fileio: a hidden trouble for multi-thread

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2007-07-30
Created: 2007-07-30
Creator: XU Huaxing
Private: No

[root@lkp-tulsa01 work]# for i in 1 2 3 4 5 ; do echo "3" > /proc/sys/vm/drop_caches; \
sysbench --test=fileio --file-test-mode=rndrw --num-threads=32 \
--file-total-size=800M --max-requests=15000 --max-time=3000 run; done
sysbench v0.4.8: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 32

Extra file open flags: 0
128 files, 6.25Mb each
800Mb total file size
Block size 16Kb
Number of random requests for random IO: 15000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!
FATAL: Failed to fsync file! file: 0 errno = 22 ()
Done.

Operations performed: 9009 Read, 5991 Write, 19202 Other = 34202 Total
Read 140.77Mb Written 93.609Mb Total transferred 234.38Mb (7.2741Mb/sec)
465.54 Requests/sec executed

Test execution summary:
total time: 32.2206s
total number of events: 15000
total time taken by event execution: 401.1199
per-request statistics:
min: 0.0000s
avg: 0.0267s
max: 0.3581s
approx. 95 percentile: 0.1077s

Threads fairness:
events (avg/stddev): 468.7500/63.91
execution time (avg/stddev): 12.5350/1.09

Discussion

  • XU Huaxing
    2007-07-30

    a patch for this bug

     
    Attachments
  • XU Huaxing
    2007-07-30

    Logged In: YES
    user_id=1856321
    Originator: YES

    Running with --debug=on prints debug information. The relevant lines are:

    DEBUG: Executing request, operation: 3, file_id: 126, pos: 0, size: 0
    DEBUG: Executing request, operation: 3, file_id: 128, pos: 0, size: 0
    DEBUG: Executing request, operation: 3, file_id: 129, pos: 0, size: 0
    FATAL: Incorrect file discovered in request
    FATAL: Failed to fsync file! file: 0 errno = 22 ()

    Errno 22 is EINVAL (invalid argument): the argument passed to fsync(fd) is invalid. There are only 128 files (file_id 0 to 127), yet some threads tried to access files with file_id 128 and 129.
    So the get_request function can return a request containing an out-of-range file_id.
    We believe the error occurs because the file_get_rnd_request function in sb_fileio.c is not thread-safe.
    The unsafe part is:

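    /*
     * is_dirty is only set if writes are done and cleared after all
     * files are synced
     */
    if (file_fsync_freq != 0 && is_dirty)
    {
        if (req_performed % file_fsync_freq == 0)
        {
            /* fsynced_file and is_dirty are shared by all worker threads,
               but they are read and updated here without holding a lock. */
            file_req->operation = FILE_OP_TYPE_FSYNC;
            file_req->file_id = fsynced_file;
            file_req->pos = 0;
            file_req->size = 0;
            fsynced_file++;
            /* Another thread can read fsynced_file between the increment
               above and this check, and then receive file_id == num_files. */
            if (fsynced_file == num_files)
            {
                fsynced_file = 0;
                is_dirty = 0;
            }
            return sb_req;
        }
    }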

    We used a mutex to solve this problem, as is already done in another part of the same function.
    Because a mutex is a coarse lock that might hurt performance, we restructured the nested if statements so that the mutex is taken only inside the outer if (file_fsync_freq != 0) {} block.
    Details of our solution are in the attachment (sysbench.patch).

    *************************************************************************************
    ZHUO Yue
    School of Software, Shanghai Jiao Tong University, 200240 Shanghai, China
    lingyer@sjtu.edu.cn

    XU Huaxing
    School of Software, Shanghai Jiao Tong University, 200240 Shanghai, China
    zsdy@sjtu.edu.cn

     
  • XU Huaxing
    2007-07-30

    • summary: Fileio: an hidden trouble for multi-thread --> Fileio: a hidden trouble for multi-thread