From: Rashit A. <ras...@ya...> - 2015-05-14 12:06:00
Hello!

We're using version 2.0.60 of MooseFS and in some rare cases have a problem opening a file for writing on a MooseFS file system. This happens just after the file has been deleted, and I believe the error is somehow connected with the dircache.

We have the following code:

    void File::Create(const char *Name) throw (std::exception)
    {
        c_name = Name;
        /* delete the file */
        HSDelete(c_name);
        /* create the file */
        ff = fopen(c_name, "w");
        if (ff == NULL) {
            int errnum = errno;
            throw FR::Exception(errnum, "File::Create(%s)\ncannot CREATE file !!!", c_name.txt());
        }
    }

    int HSDelete(const char *name)
    {
        int ret;
        char * path = strdup(name);
        path = dirname(path);
        /* read the whole parent directory before unlinking */
        DIR *dir = opendir(path);
        if (dir) {
            Diags::Debug("HSDelete: scanning directory %s", path);
            struct dirent *de = 0;
            do {
                de = readdir(dir);
            } while(de);
            closedir(dir);
        }
        else
            Diags::Error(errno, "HSDelete: Cannot open directory %s", path);
        Diags::Debug("HSDelete: deleting file %s", name);
        ret = unlink(name);
        /* a file that is already missing is not an error */
        if (ret != 0 && errno != ENOENT)
            Diags::Error(errno, "HSDelete: Cannot delete file %s", name);
        return ret;
    }

This code is executed on different files in parallel, and in about 10% of cases fopen returns a 'No such file' error.

This is how it looks in the oplog (for the file data/aprelevskoe/Lines/bl7_1_I_217/DepthModels/TEST_FILE_SYS/depthMigration.mig):

    05.06 19:07:22.289334: uid:1002 gid:1000 pid:15476 cmd:unlink (3046656,depthMigration.mig): OK
    05.06 19:07:22.291324: uid:1002 gid:1000 pid:15476 cmd:lookup (1,data): OK (0.0,110,1.0,[drwxrwx---:0040770,69,1002,1000,1430927232,1430843190,1430843190,4041897])
    05.06 19:07:22.293070: uid:1002 gid:1000 pid:15476 cmd:lookup (110,aprelevskoe): OK (0.0,4173806,1.0,[drwxrwxr-x:0040775,12,1001,1000,1430927234,1429271510,1429271510,4000266])
    05.06 19:07:22.294813: uid:1002 gid:1000 pid:15476 cmd:lookup (4173806,Lines): OK (0.0,4173818,1.0,[drwxrwxr-x:0040775,1566,1001,1000,1430837413,1428610404,1428610404,3011673])
    05.06 19:07:22.296328: uid:1002 gid:1000 pid:15476 cmd:lookup (4173818,bl7_1_I_217): OK (0.0,4340334,1.0,[drwxrwxr-x:0040775,7,1001,1000,1429999627,1426953017,1426953017,2023511])
    05.06 19:07:22.297866: uid:1002 gid:1000 pid:15476 cmd:lookup (4340334,DepthModels): OK (0.0,4340341,1.0,[drwxrwxr-x:0040775,5,1001,1000,1429999627,1430928238,1430928238,2016702])
    05.06 19:07:22.299618: uid:1002 gid:1000 pid:15476 cmd:lookup (4340341,TEST_FILE_SYS): OK (0.0,3046656,1.0,[drwxrwxr-x:0040775,3,1002,1000,1430928442,1430928442,1430928442,2000210])
    05.06 19:07:22.299721: uid:1002 gid:1000 pid:15476 cmd:lookup (3046656,depthMigration.mig) (using open dir cache): OK (0.0,3088005,1.0,[-rw-rw-r--:0100664,1,1002,1000,1430928276,1430928303,1430928303,2207520])
    05.06 19:07:22.301303: uid:1002 gid:1000 pid:15476 cmd:open (3088005): ENOENT (No such file or directory)

It looks like unlink doesn't invalidate the dircache entry, so the following lookup (marked "using open dir cache") returns a cached result for inode 3088005, which no longer exists, and open then fails with ENOENT.

We recompiled the MooseFS client binary with the dircache disabled. That eliminates the problem, but at some performance cost.

We haven't yet tested the latest version, 2.0.67. Do you think the bug might be fixed in it?
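The sequence described above (scan the parent directory, unlink the file, then immediately reopen it for writing) can be reduced to a small standalone test. This is only a sketch under stated assumptions: the names `scan_directory`, `count_enoent_failures` and `dircache_test.tmp` are hypothetical and not part of our code base or of MooseFS, and the loop is single-threaded, whereas the failures reported here occur when the code runs in parallel on different files, so several instances may need to run at once to trigger the error.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <string>
#include <dirent.h>
#include <unistd.h>

// Hypothetical helper: read every entry of `dir`, mirroring the
// readdir loop that HSDelete runs before unlinking.
static void scan_directory(const std::string &dir)
{
    DIR *d = opendir(dir.c_str());
    if (d == nullptr)
        return;
    while (readdir(d) != nullptr) {
        // entries are only read, not used
    }
    closedir(d);
}

// Hypothetical reproducer: repeats scan -> unlink -> fopen("w") on one
// test file inside `dir` and returns how many times fopen failed with
// ENOENT. On a filesystem without the suspected stale-cache behaviour
// the result is expected to be 0.
int count_enoent_failures(const std::string &dir, int iterations)
{
    assert(!dir.empty());
    const std::string file = dir + "/dircache_test.tmp"; // hypothetical name
    int failures = 0;

    for (int i = 0; i < iterations; ++i) {
        // Make sure the file exists so that unlink has something to remove.
        FILE *f = fopen(file.c_str(), "w");
        if (f != nullptr)
            fclose(f);

        scan_directory(dir);      // as in HSDelete
        unlink(file.c_str());     // delete the file

        errno = 0;
        f = fopen(file.c_str(), "w"); // re-create it immediately
        if (f == nullptr) {
            if (errno == ENOENT)
                ++failures;
        } else {
            fclose(f);
        }
    }

    unlink(file.c_str()); // clean up the test file
    return failures;
}
```

Calling `count_enoent_failures` with a directory on the MooseFS mount and a few thousand iterations, from several processes at once, should show whether the failure rate changes between client versions or with the dircache disabled.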