locking problem
Status: Beta
Brought to you by:
worden
A page of mine stopped loading, even though I could still load other pages, even ManageProject for that page's project. I added a little logging to the PHP and found that it was stuck trying to lock one of the upstream projects that the page uses (a pe-git project, but I'm not sure that matters). When I moved the lockfiles aside, it got going again. How did it get stuck on the file locking? How will I diagnose it? Maybe I should do more comprehensive logging.
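One way to make a stuck lock visible instead of hanging the request is to poll with a non-blocking lock attempt and give up after a deadline. This is only a sketch in Python, not the project's PHP, and it assumes the locks are advisory flock-style locks on the lockfile, which this ticket does not confirm. The name `try_lock` and the timeout values are made up for illustration.

```python
import errno
import fcntl
import os
import sys
import time


def try_lock(path, timeout=2.0, poll=0.05):
    """Try to take an exclusive flock on path.

    Returns the open file (lock held) on success, or None if the lock
    could not be obtained before the timeout expired.
    """
    f = open(path, "a+")
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking.
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return f
        except OSError as e:
            # EAGAIN/EACCES mean "someone else holds it"; anything else is a real error.
            if e.errno not in (errno.EAGAIN, errno.EACCES):
                f.close()
                raise
        if time.monotonic() >= deadline:
            f.close()
            # Leave a trace of who gave up on what, for later diagnosis.
            print(f"pid {os.getpid()} gave up waiting for {path}", file=sys.stderr)
            return None
        time.sleep(poll)
```

A caller that gets `None` back can report the stuck lock to the user or the log rather than leaving the page hanging.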
Anonymous
It seems to have gotten stuck again. This time, when I tried to access the page, it didn't even write any log messages about trying to lock the projects. I restarted httpd and it seemed okay after that.
If it's a deadlock problem, I'll want to track who's requesting locks on what, and when. Why would be nice too, but it's probably not a necessity. I think I'll try logging to the error_log file.
This is trivial to implement, except that I would have to record the lock owner's pid, probably in the lockfile itself.
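Roughly what I have in mind, sketched in Python rather than the actual PHP: each lock request, acquisition, and release gets a log line with the pid, and the owner's pid is written into the lockfile so a waiting process can report who it is waiting on. The function names (`lock`, `unlock`, `read_owner`) and the log format are hypothetical, and this assumes flock-style locking on the lockfile.

```python
import fcntl
import logging
import os

log = logging.getLogger("locks")


def read_owner(path):
    """Return the pid text recorded in the lockfile, or None if absent or empty."""
    try:
        with open(path) as f:
            return f.read().strip() or None
    except FileNotFoundError:
        return None


def lock(path, why=""):
    """Block until the exclusive lock is held, logging the request and the result."""
    pid = os.getpid()
    log.info("pid %d requests lock on %s (recorded owner: %s) %s",
             pid, path, read_owner(path), why)
    f = open(path, "a+")
    fcntl.flock(f, fcntl.LOCK_EX)
    # We hold the lock now, so it is safe to overwrite the recorded owner.
    f.seek(0)
    f.truncate()
    f.write(str(pid))
    f.flush()
    log.info("pid %d acquired lock on %s", pid, path)
    return f


def unlock(f):
    """Clear the recorded owner, release the lock, and log the release."""
    path = f.name
    f.seek(0)
    f.truncate()
    f.flush()
    fcntl.flock(f, fcntl.LOCK_UN)
    f.close()
    log.info("pid %d released lock on %s", os.getpid(), path)
```

The recorded pid is only a hint: if a process dies while holding the lock, the kernel drops the flock but the stale pid stays in the file until the next owner overwrites it.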
OK, got that running. We'll see whether it contributes anything besides filling up the log file.
Well, a nice side effect might be getting me to remove more of the PHP notices that are also filling up the log file.
I'm seeing some cases of things getting locked, then locked by a different process, without an unlock being reported in between. My guess is that I'm using exceptions to springboard out of ProjectEngine operations in case of error without explicitly unlocking along the way, and trusting that the locks get abandoned when the process finishes. It is probably worth making sure they get explicitly unlocked, not only to help with the archaeology, but because that trust might become misplaced sometime in the future (I think some server architectures can field multiple requests before terminating the process, for instance).
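The general shape of the fix, again as a Python sketch and not the ProjectEngine code: wrap the locked region so that the release happens on the way out even when an exception propagates. The `locked` helper is a made-up name, and flock-style locking is an assumption. In PHP the same idea would be a try/finally around the operation (finally is available from PHP 5.5 on).

```python
import contextlib
import fcntl


@contextlib.contextmanager
def locked(path):
    """Hold an exclusive lock on path for the duration of the with-block."""
    f = open(path, "a+")
    try:
        fcntl.flock(f, fcntl.LOCK_EX)
        yield f
    finally:
        # Runs on normal exit and when an exception escapes the block,
        # so the unlock is never skipped by an error path.
        fcntl.flock(f, fcntl.LOCK_UN)
        f.close()
```

With this, an operation that fails partway through (`with locked(path): ...` raising an exception) still releases the lock before the exception reaches the caller, without relying on the process exiting.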
Running "tail -f /var/log/httpd/php_errors.log" is fascinating: there's so much activity! WW is constantly fielding requests, probably many of them from search engine indexers, but many from real people at work as well.
I have an example of an error that throws an exception and bypasses unlock(): when the cp call fails in PESpecialSession::initialize_working_directory(). I just watched it bail (I killed the cp process myself) and not report an unlock operation.
I believe (should double check, it's been a long time) that I don't have any of those exception-bypasses-unlock cases anymore. Locks should get released when the process terminates anyway.
I have logging stuff in place now. Closing this because there are other tickets about lock problems, e.g. [#263], [#215].
Related
Bugs: #215
Bugs: #263