From: SourceForge.net <no...@so...> - 2004-01-19 19:32:51
|
Patches item #871583, was opened at 2004-01-06 10:47 Message generated for change (Comment added) made by vincentdarley You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=310894&aid=871583&group_id=10894 Category: 36. File System Group: None Status: Open Resolution: None Priority: 5 Submitted By: Vince Darley (vincentdarley) Assigned to: Jeffrey Hobbs (hobbs) Summary: filesystem optimization Initial Comment: Here's a patch to cvs HEAD to speed up a lot of the filesystem. Please test (esp Unix), since I've only tested on Windows. ---------------------------------------------------------------------- >Comment By: Vince Darley (vincentdarley) Date: 2004-01-19 19:32 Message: Logged In: YES user_id=32170 Here's a quick attempt to extend the optimisation to 'file extension'. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-19 18:58 Message: Logged In: YES user_id=80530 New patch passes all tests. The slowdowns of [file dirname] and [file tail] are now big speedups! 020 FILE glob / dirname 1.00 0.25 031 FILE glob / tail 1.00 0.36 Next biggest slowdowns (still in this patch) are: 023 FILE glob / extension 1.00 1.30 029 FILE glob / rootname 1.00 1.33 ---------------------------------------------------------------------- Comment By: Vince Darley (vincentdarley) Date: 2004-01-19 17:45 Message: Logged In: YES user_id=32170 Attached new patch, addressing all the issues below, except that we need to rename variables to 'pathObj' in all files except tclPathObj.c (which has now been done). ---------------------------------------------------------------------- Comment By: Vince Darley (vincentdarley) Date: 2004-01-19 12:33 Message: Logged In: YES user_id=32170 I see, your analysis of Tcl_FSConvertToPathType showed a bug in there. In the 'special' FsPathptr case, there's no point in comparing the ->cwdPtr to the cwd, since it has nothing to do with cwds. So, that should fix the slowdown. Will attach patch when sf-cvs starts to work again. ---------------------------------------------------------------------- Comment By: Vince Darley (vincentdarley) Date: 2004-01-19 12:11 Message: Logged In: YES user_id=32170 Ok, a follow-up on all your good comments and questions below: (1) The documentation of FsPath is indeed behind the times (since the first filesystem optimisation). I've updated that in my sources and will attach a patch as soon as sf's cvs servers are up. Comments/advice on how best to name these multi-use object points would be appreciate. (2) Indeed objPtr, pathPtr, pathObjPtr, filenamePtr are all used all over the place and this is annoying and confusing. I think pathPtr is the best, so will begin to convert to that. (3) I've factored the file dir/tail into a common function in tclPathObj.c which should also allow it to be clever in what it returns. (4) For TclFSCwdPointerEquals and *objPtrPtr being set, yes I believe you are right that this would be a sensible step to take. Added this to the next patch. Thanks also for the detailed analysis of TclFSGetPathType and friends. I'll examine that more closely before proceeding. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-15 20:48 Message: Logged In: YES user_id=80530 TclFSGetPathType() especially looks like it could use some attention. It appears that TclGetPathType() is fairly expensive and TclFSGetPathType() is an attempt to avoid it. However, if pathObjPtr is not already a FSPathObj with cwdPtr NULL or up to date, then the Tcl_FSConvertToPathType() call will have to call TclGetPathType() to attempt to make it so. The last call to TclGetPathType is mysterious too. Don't we know when cwdPtr == NULL that we have an absolute path? Or is that another example of the comments being out of date? Anyhow, it appears that in the slow operations, TclGetPathType is getting called twice. The first time from within the patched Tcl_FSConvertToPathType() and the second time when the cwdPtr of the converted pathObjPtr turns out to be NULL. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-15 18:44 Message: Logged In: YES user_id=80530 In the duplicated stanza of code noted before, there is a call to Tcl_FSSplitPath() on the pathObj returned by [glob]. That pathObj was created by TclNewFSPathObj() and has the special structure you noted before. Tcl_FSSplitPath calls TclFSGetPathType() which calls Tcl_FSConvertToPathType(). which was modified by this patch. Prior to the patch, T_FSCTPT sees a typePtr of &tclFsPathType and quickly returns TCL_OK. After the patch, there's a check of fsPathPtr->cwdPtr and since it holds the value of the directory globbed and not the current working directory of the process, the pathObj is reconstructed by generating it's string rep, then re-parsing. Appears that's where the time difference comes from. Seems that if either Tcl_FSSplitPath() or TclFSGetPathType() were updated to recognize the optimized form and process it more efficiently without a string rep re-parsing, this slowdown might be avoided. It might be, though, that the re-parsing just has to happen at some point, so you just have to pick where. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-15 18:22 Message: Logged In: YES user_id=80530 Just another optimization suggestion. Near the end of TclFSCwdPointerEquals() when string comparison is done to verify that the two cwdPtr values are different objects, but equal values.... Couldn't you reset the value of (*objPtrPtr) to be tsdPtr->cwdPathPtr -- along with appropriate refCount housekeeping -- so that subsequent checks could use the fast pointer comparison and avoid the repeated string comparison? ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-15 17:58 Message: Logged In: YES user_id=80530 I'm a bit confused on the FsPath structure. The normPathPtr is documented to be "Normalized, absolute path" However, Tcl_NewFSPatchObj() seems to be initializing it to the relative name of a file? And then stashing the name of the containing directory in the cwdPtr field? Is this just a case of comments falling behind? ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-15 17:56 Message: Logged In: YES user_id=80530 still looking at this. Just a note while I'm thinking about it. One of the recommendations in the Tcl Engineering Manual is that the same thing gets the same name throughout all the code. For example, it is always Tcl_Interp *interp As I move from function to tunction in the Tcl_FS code, my sense is that there's not as much naming consistency as I would like. Make the code harder to follow than it needs to be. Maybe once we're through bug fixing, an additional pass for code cleanup ? ---------------------------------------------------------------------- Comment By: Vince Darley (vincentdarley) Date: 2004-01-14 20:13 Message: Logged In: YES user_id=32170 "yes" to your question below. Note that the "path" object results of 'glob' can be different to a normal path: glob -dir $dir * returns special path objects which know that they are '$dir' plus a little bit. It could be there is a slowdown just for these special paths. (We should add a separate 'file dirname' benchmark on an ordinary path). However: I really don't know what would be slower with this patch (unless it introduces some subtle weirdness I haven't thought about). Here's what this patch does: Tcl 8.4/8.5: spend lots of time with relative paths checking if the 'cwd' has changed. This involved calling the OS'es getcwd, converting the encoding, normalizing, creating a path object, comparing with the previous path object, and then usually throwing it all away because no-one had called chdir behind our backs to change the cwd anyway. Tcl 8.5+patch: cache the native cwd and perform an immediate comparison of the result of getcwd with that native cwd. If it's the same, return the cached value which then short-circuits everything else (no normalization, no encoding conversion, no object being created). There are a few other small changes beyond the above, which could be to blame (some code paths have changed in tclPathName, which helps to simplify everything, I think!). Thanks for doing all this, and for pointing out the code duplication -- I'll look into that, probably after we understand this issue, since fixing this might lead naturally do removing the duplication anyway... ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-14 18:51 Message: Logged In: YES user_id=80530 No data yet, but I notice that both slow operations share a stanza of code in common. Lines 2659-2674 of tclFileName.c and lines 1402-1417 of tclCmdAH.c I'd look at those operations first I think. BTW duplicate code suggest a factoring opportunity. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-14 18:31 Message: Logged In: YES user_id=80530 so looking at the benchmarks, I should be looking for the cause of slowdowns in the [file dirname] and [file tail] commands? ---------------------------------------------------------------------- Comment By: Vince Darley (vincentdarley) Date: 2004-01-14 16:23 Message: Logged In: YES user_id=32170 In that case I'd be very interested to see some sort of profiling (at least at the level of function calls) of those two slow benchmarks (glob /dirname glob /tail), since it's very hard to see how they could possibly get worse with this patch.... ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-14 15:58 Message: Logged In: YES user_id=80530 My benchmarks tested HEAD against HEAD+patch. ---------------------------------------------------------------------- Comment By: Vince Darley (vincentdarley) Date: 2004-01-14 10:01 Message: Logged In: YES user_id=32170 Thanks for fixing the patch! I'm wondering what version of Tcl you're benchmarking against? Is it 8.4.5 or 8.3.4? Could you perhaps upload full results to: http://mini.net/tcl/10582 I must say I'm not really sure what could cause the slowdown in those two glob functions, and I don't really have the experience (or access, given compilefarm never works) to do profiling on Windows. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-14 01:48 Message: Logged In: YES user_id=80530 testing your benchmarks, it does seem that most things are a bit faster. There appear to be especially big speedups for: 009 FILE exists! absolute tmpfile (str) 1.00 0.70 010 FILE exists! relative tmpfile (obj) 1.00 0.68 011 FILE exists! relative tmpfile (str) 1.00 0.27 ... 013 FILE exists! tmpfile (str) 1.00 0.75 ... 037 FILE recurse / cd 1.00 0.62 ... 039 FILE recurse+stat / cd 1.00 0.68 On the other hand, I see substantial slowdowns for: 020 FILE glob / dirname 1.00 1.41 ... 031 FILE glob / tail 1.00 1.56 Hope that helps... ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-14 01:32 Message: Logged In: YES user_id=80530 After that change, `make test` completes with no (new) failures. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-14 00:43 Message: Logged In: YES user_id=80530 uh, looks like I misread slightly. Just need to add some safety NULL-checking to line 606 of unix/tclUnixFile.c: if ((clientData != NULL) && ... After that change, the segfault is corrected. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-14 00:30 Message: Logged In: YES user_id=80530 The Unix implementation of TclpGetNativeCwd() is not happy receiving a NULL ClientData -- at least not if the system getwd() or getcwd() calls fail. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-14 00:19 Message: Logged In: YES user_id=80530 % source ./../tests/all.tcl Program received signal SIGSEGV, Segmentation fault. strcmp () at ../sysdeps/alpha/strcmp.S:41 41 ../sysdeps/alpha/strcmp.S: No such file or directory. Current language: auto; currently asm (gdb) bt #0 strcmp () at ../sysdeps/alpha/strcmp.S:41 #1 0x12010c080 in TclpGetNativeCwd (clientData=0x0) at ./../unix/tclUnixFile.c:606 #2 0x1200b5d60 in Tcl_FSGetCwd (interp=0x12025b818) at ./../generic/tclIOUtil.c:2360 #3 0x1200d8a84 in Tcl_FSGetNormalizedPath (interp=0x12025b818, pathObjPtr=0x1202e0578) at ./../generic/tclPathObj.c:1356 #4 0x1200b4898 in Tcl_FSEvalFileEx (interp=0x12025b818, pathPtr=0x1202e0578, encodingName=0x0) at ./../generic/tclIOUtil.c:1556 #5 0x120044ab8 in Tcl_SourceObjCmd (dummy=0x0, interp=0x12025b818, objc=2, objv=0x12025c800) at ./../generic/tclCmdMZ.c:1128 #6 0x12002a704 in TclEvalObjvInternal (interp=0x12025b818, objc=2, objv=0x12025c800, command=0x0, length=0, flags=0) at ./../generic/tclBasic.c:3143 #7 0x12007baf0 in TclExecuteByteCode (interp=0x12025b818, codePtr=0x1202e5828) at ./../generic/tclExecute.c:1548 #8 0x1200796e4 in TclCompEvalObj (interp=0x12025b818, objPtr=0x1202a4ef8) at ./../generic/tclExecute.c:1005 #9 0x12002bdd8 in Tcl_EvalObjEx (interp=0x12025b818, objPtr=0x1202a4ef8, flags=131072) at ./../generic/tclBasic.c:3884 #10 0x12012c83c in Tcl_RecordAndEvalObj (interp=0x12025b818, cmdPtr=0x1202a4ef8, flags=131072) at ./../generic/tclHistory.c:142 #11 0x1200bfb90 in Tcl_Main (argc=1, argv=0x11ffff4f8, appInitProc=0x1200109a0 <Tcl_AppInit>) at ./../generic/tclMain.c:478 #12 0x12001097c in main (argc=1, argv=0x11ffff4f8) at ./../unix/tclAppInit.c:90 ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-01-13 23:58 Message: Logged In: YES user_id=80530 first test report: this patch gives me an immediate segfault when attempting a `make test`. Will look further. ---------------------------------------------------------------------- Comment By: Vince Darley (vincentdarley) Date: 2004-01-06 10:48 Message: Logged In: YES user_id=32170 Added modified file.bench that I'm using. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=310894&aid=871583&group_id=10894 |