From: SourceForge.net <no...@so...> - 2004-06-10 14:08:59
|
Bugs item #932314, was opened at 2004-04-09 16:11 Message generated for change (Comment added) made by vasiljevic You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=932314&group_id=10894 Category: 36. File System Group: development: 8.5a2 Status: Open Resolution: None Priority: 8 Submitted By: Don Porter (dgp) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: epoch safety in threads? Initial Comment: The interal rep of a "path" Tcl_Obj can remember things, like which filesystem has claimed it. Each "path" Tcl_Obj remembers a filesystemEpoch value, to help determine whether the cached information still agrees with the authoritative data structures. The epoch value checking done within tclPathObj.c always compares the filesystemEpoch value in the "path" to the tsdPtr->filesystemEpoch value, which characterizes the local thread's copy of the master filesystem data structures. That will keep the "path" values consistent with the thread's view of the filesystem state, I agree. However, the Tcl_FSRegister(), Tcl_FSUnregister(), and Tcl_FSMountsChanged() routines all directly manipulate the master per-process filesystem data. The per-thread cache of that data will be out of date until some routine in the thread calls FsGetFirstFilesystem() in tclIOUtil.c. It appears to me then, that the epoch checking of "path" values isn't really checking validity unless somehow threads are prompted to call FsGetFirstFilesystem() when the master data changes. As one example, consider Tcl_FSGetFilesystemForPath(). It calls TclFSEnsureEpochOk() which will check that the epoch within the "path" is the same as that of the thread's copy of the filesystem. If so, then the cached filesystem that claimed this "path" before is returned. Only later in Tcl_FSGFFP() is FsGetFirstFilesystem() called, so that it might be noticed that the filesystem return is invalid (perhaps now unregistered! or perhaps just mounting changes have changed the filesystem that has a valid claim) That particular example can probably be fixed by moving the FsGetFirstFilesystem() call before the TclFSEnsureEpochOk() call, but I'm also curious about the general issue of keeping per-thread caches in sync with the master data. ---------------------------------------------------------------------- >Comment By: Zoran Vasiljevic (vasiljevic) Date: 2004-06-10 16:08 Message: Logged In: YES user_id=95086 Backport of this patch commited to core-8.4-branch. I see that Vince already fixed the relative directory problem which caused the stack overflow. Very good. Will keep this patch open until we see that all is ok or needs to be tightened in respect to registering/unregistering filesystems in master list. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2004-06-10 09:17 Message: Logged In: YES user_id=95086 Oh, this little patch is trivial to backport. The non-trivial one is actually the one handling the: set env(HOME) dir cd pwd thingy which under 8.4 causes endless loop. This has not been reported as bug in the bugtracker. It is solved in 8.5 branch. I dont know if it is worth the effort to backport this one in 8.4 tree (lots of work). ---------------------------------------------------------------------- Comment By: Jeffrey Hobbs (hobbs) Date: 2004-06-09 20:00 Message: Logged In: YES user_id=72656 IMO this seems like a good thing to backport to 8.4, but it may not be a simple drop-in code update (diverging code bases ... :/ ). ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-06-09 19:43 Message: Logged In: YES user_id=80530 good news. I have a workaround for this bug in my trofs extension. (I force a shimmer to purge invalid cached information). I removed that workaround, and my test suite raised 11 errors. Then I updated the core with your fix, and back to 0 errors. Fix looks good, and it seems to me simple enough to have confidence in it. Is it a good candidate for backporting to fix this issue in Tcl 8.4.6 as well? I'm going to hold back on that CREW stuff unless/until I can find another bug that this patch did not fix but CREW locks will. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2004-06-09 18:36 Message: Logged In: YES user_id=95086 Don, I just went (again) thru the code in a fast-scan mode. I think you will have *lots* of work with the reader-writer locks. This is because ot the fact that you need to properly lock *all* references to filesystem record pointer all over the place! Just look in tclPathObj.c and check at howmany places they are referenced! You need to lock *each and every* access to it, regardless of being read of write. The same applies of course to tclIOUtil.c and in fact every file wich does something with the pointer. This might have a huge impact on the speed, not mentioning the complexity of the introduced code and all possible side-effects this may have. I'm not trying to discourage you. I just want you to know what you will encounter down the road. Zoran ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2004-06-09 18:19 Message: Logged In: YES user_id=95086 Commited to head. FsGetFirstFilesystem() gets called prior the TclFSEnsureEpochOk() in Tcl_FSGetFilesystemForPath(). Bug status changed to "Pending". ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-06-09 17:55 Message: Logged In: YES user_id=80530 Heh. I'm not expecting trouble, but that may be foolish inexperience. :^) ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2004-06-09 17:51 Message: Logged In: YES user_id=95086 Agree. I will attend those open issues today/tommorow at latest. Lets see how you will get the alternate lock scheme working. I will also think a little bit about that in the meantime. If I come with some extraordinary idea (you never know :-) ) I will come back to you pronto! ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-06-09 17:33 Message: Logged In: YES user_id=80530 How about this approach. You have some fairly simple fixes in mind for the specific identified bugs. Why don't you commit those, and we can close these bugs. Then I can pursue the CREW lock idea as a separate matter in a Patch Tracker item. I'd also leave any consideration of changing the filesystem from a fundamentally per-process thing into a per-thread thing as a separate matter. Without a pretty thorough thinking through of what properties we need the filesystem to have, we should avoid anything so radical as getting rid of the master list. Per-thread, or even per-interp filesystems have some possibly attractive properties, but that approach is a big change from what's gone before. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2004-06-09 17:22 Message: Logged In: YES user_id=95086 OK. Just for your info: we already thought about reader-writer locks but abandned them because of the potential overall speed impact and eventual writer starvation issues. This was based purely on experience. We had no time to deeply check pros/cons of reader-writer locks at that place and in the given time we had to fix the main problems (memory corruptions). As you said, the reader-writer model might do given the very low writer frequency. Nevertheless, changes to accomodate to reader-writer lock model might not be that trivial as it seems on the first glance... I believe your main concern is the hidden reference of the filesystem record from within the Tcl path object and its potential mismatch with the actual authoritative static counterpart held thread-wide, right? Maybe the better way would be to avoid the master list of filesystems altogether and just leave the per-thread ones. Those will have to be locked just once for the duration of register/unregister FS. This would entirely eliminate any complex synchronization issues and make the (already complex FS) code even simpler. Just a side-thought... Zoran ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-06-09 16:32 Message: Logged In: YES user_id=80530 looks like our comments are crossing each other. As mentioned in Item 962519, what I want to do is introduce CREW (concurrent read, exclusive write) locks on the filesystemList resource. I'd very much like your review of that patch when it is ready, and, of course, we'd need to be ready to revert to what we have now (with a few corrections), if those CREW locks don't measure up in the hard test environments you are familiar with. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2004-06-09 16:23 Message: Logged In: YES user_id=95086 Hi Don ! I'm sorry for not attending to this before. I agree about Tcl_FSGetFileSystemForPath(). Indeed the TclFSEnsureEpochOk() should be called after the FsGetFirstFilesystem(). Now, for keeping the per-thread caches in sync... As long as there is a good *isolation* between the master-copy and per-thread-copy I do not see any problems. The master copy is actually never accessed directly. It is also timestamped with the per-thread private one so all path-gymnastics referencing any filesystem records must go thru the isolation layer. It is the responsibility of that layer to keep those two reps in sync. The principal idea why this per-thread copy is introduced was to aleviate mt-problems we found out in the AOLserver project which uses Tcl (threaded build) heavily. After the introduction of Tcl FS in 8.4 quite a few things broke in mt-respect so we were forced to do something. After the examination we decided that it would be the best to separate the master (thread-wide) and private (per-thread) views of registered filesystems. Otherwisem the problems of synchronization would be very difficult, if possible at all, given the current implementation of FS. Now, I understand that you see some problems with that. Can you please be more specific about which problems you are thinking of? I will be glad to review them with you and, if necessary, correct or improve the current code. If you don't mind I can reassign this bug to myself. Zoran ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2004-04-16 17:12 Message: Logged In: YES user_id=80530 The routine Tcl_FSGetInternalRep(path, fs) helps work around the worst of these issues. I was mostly worried that a path value would get passed in to a filesystem that claimed it earlier, but no longer should. In order to access the internal rep of that path, though, the filesyste, has to make that call, and an epoch check could, and IMHO should, go in there. In fact any routine that returns information from a cached internal rep of a Tcl_Obj ought to verify that the cache is still valid. Tcl_ConvertToPathType() takes care of the check against the thread-level filesystem epoch. All that's missing is a check that the thread-level value is up to date with the process-level value. ---------------------------------------------------------------------- Comment By: Vince Darley (vincentdarley) Date: 2004-04-13 12:56 Message: Logged In: YES user_id=32170 Assigning to Zoran who's responsible for (and understands more clearly) the thread-safety side of things. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=932314&group_id=10894 |