From: Bharat M. <bh...@me...> - 2008-10-08 20:18:04
Tom wrote:
> If you can find a way to read-only lock the entire gallery, perhaps at the file system level, then prevent any database changes that affect the data directory, a backup and restore will be successful. It may mess up the cache and a few other items but that's a small price to pay compared to reloading 8000+ items of 2MB each which requires rebuilding all those thumbs plus resizes. I assume this means that each image must be opened, resized, written then reopened and read once for each thumb and final resize. That's a significant OS load on a shared server. Having a backup would go a long way towards a feeling of stability. This would also be a good candidate for a compiled module (see bottom).

These are orthogonal points. Whether or not the Gallery is read-only is unrelated to whether or not the derivative (thumbnail, resized, etc.) images are already built. Ultimately, this problem is fairly straightforward. You dump the database, you dump the filesystem and you have a copy of all the data that Gallery needs. If you want it to be read-only, we can add that capability, but remember that if you want to make it truly read-only (i.e. at the disk level) you can't take advantage of any form of caching unless the cache is pre-built.

>>> 2. Meta data (descriptions, tags, id3, custom fields etc) for items
> ...clip....
>>> An out of context OS viewpoint suggestion:
>>> Drop the use of classes from PHP coding. They are inappropriate for a run-time scripting language adding significant overhead to the bytecoding process because they require the OS to read every mentioned file in every subsequent class definition before any of their functions can be used. They are more difficult for accelerators to cache making them less efficient overall. It's an easy to code philosophy that really doesn't fit embedded scripts.
>> That's a little harsh.
> First, it's a suggestion from someone who does not write PHP code so it's full of chlorides...

Understood, but I think that you're mischaracterizing the problem based on a lack of understanding of how PHP works. I suspect that you need to spend a little more time researching how this works before advancing these topics. I'll explain where you're making some false assumptions below.

> Look at it from an overall system viewpoint. If you have a .php file with three includes, the bytecoder must ask the OS to first /find/, /open/ and /read/ that file plus every other .inc file referenced. It's less of an OS hit to use short inline code than to diversify in separate files, even if the code is duplicated. Consider a compiled C program where the same function is called in nearly every source file, like the printf() or open(). In some cases the compiler will inline those calls (to prevent an expensive FAR call) making the object and load module larger but decreasing system overhead at run time at the expense of a somewhat larger program.

This is all completely orthogonal to the original argument, which was to avoid using PHP classes. Whether or not you use classes is a completely separate decision from how you organize your code into files. Furthermore, you're applying C style optimizing semantics without a deep understanding of what PHP is doing to optimize the code. Without further evidence, I have to believe that applying those semantics to PHP is pure speculation on your part. Please substantiate this in some way, at the very least by running some benchmarks.

> I haven't read the syntax specifics for PHP but I suspect a class definition requires a separate .inc file with some kind of class identifier. I also suspect additional .inc files can be referenced in both a class and its functions. The idea works very well when compiling a source directory to a binary object or load module.
> In fact I was taught to keep each subroutine in a separate file. A "pile" (source file) with more than 100 lines was flagged with a warning. Such a technique works well for managing a compiled C library but can generate significant and unnecessary run-time disk overhead when the same approach is used in a script language.

PHP classes do not require a separate file. Yes, they do have a separate identifier. But again, you're falsely applying C style semantics to PHP.

Disk access is far more expensive than memory access. So minimizing the amount of disk access required to render a page is an obvious win. So at one extreme, let's just stick all the code in one file. But then the parse time is also expensive, so parsing more code than you need to serve your request is wasteful. So at the other extreme we put every function in its own file and only load it if it's needed. Thus far I've seen you advance both arguments ("too much disk access" and "too much time spent parsing"). I think that the proper solution is to measure what code is used for the most common operations and organize that code into the fewest possible disk accesses. I've done this in a coarse fashion in G2 (this is why the helpers are _simple, _medium and _advanced).

> That's what I was getting at.
>
> The web industry is still young in terms of experience, but eventually that experience can be leveraged as feedback that helps define the overall purpose for scripts and their usage. I learned that to use a script as a substitute for a compiled language was a setup for performance disaster. I was taught that scripts are best used as the "cerebral cortex" of a complex system and are intended to operate an underlying group of objects, not solely to develop an application. Only time will tell...

This is a nice generality. But it's not helpful. Please, you've got to start basing these arguments in today's technologies. I've been at this a long time, too, and I remember all these stories.
But we can't have a practical discussion about how PHP works until you base your arguments in the fundamentals about how PHP is implemented.

> I'm sure there are good PHP profilers available. Perhaps when development slows down you'll have time to fiddle with some of them, or delegate it to someone else :) Keeping efficiency as a basic philosophy makes a world of difference to the user end.
>> If the issue is that the time it takes to load a class is a performance hit, there are ways to mitigate that while still keeping PHP classes.
> The issue is how to determine the hit. If you profile code on a single user system you'll never see the difference. If you profile it on a heavily loaded web server it makes a HUGE difference to the user. In many cases it's the difference between serving the page and the user "hanging up" on it. Then you've mobilized resources on a loaded system for no reason adding to the load... Have you heard the term "thrashing"? :(
>> I haven't seen any evidence that they're harder to cache (perhaps you have a reference for that? I looked but couldn't find anything obvious).
> They may not be harder to cache but they do take longer to load or "instantiate" the first time around.

There are good PHP profilers and I've used them extensively. I have profiled our code on low end vs high end machines. I've profiled it on a wide range of shared hosts with varying loads, with and without caches. I've explored the semantics of this thoroughly. This is one of the reasons why I'm suggesting that we follow the models I'm advancing in Gx. Please install a PHP profiler and start profiling the code and then let's start talking about *specifics* of optimization. I am eager to learn about places where we can make things better (it's been a while since I did a thorough pass over G2 performance) but let's move away from generalities.

>> Classes give us the advantage of a limited namespace which makes embedding easier.
> But there's no such thing as a free lunch. It may be easier to code but it's going to cost something, somewhere. Hopefully its only cost is increased storage.

Please measure the impact and then we can discuss it. I suspect that you won't even be able to detect the impact.

>> Now the email you sent about this a while back makes more sense :-) I'm interested in Roadsend. I'd be curious to see the results of trying to compile G2 with it. I'd also be curious to see the results of compiling the Kohana Hello World app with it (since Kohana is one of the leading contenders for PHP frameworks for Gx).
> If I have time I'll poke around and see how Roadsend deals with classes and file names. Obviously the output will be a binary file so web references to .php would need adjustment. That or the files need to be renamed. Then there's web server cgi issues to think about. But it might be really cool for installing optional binary modules. Especially those that tend to be inefficient.
>
> I had trouble getting it set up before. It's not a debian repository installation and it uses some odd packages that I had to download and install manually. But I did manage to get test.php -> test and it executed successfully.

Tom, I like your energy. You're clearly spending a lot of time on the project and I value that. However, you are advancing a lot of ideas and thoughts that are not necessarily productive. I think that this is one of the reasons why people are not taking time to give you an in-depth response to your emails.

I would like you to participate in the design and implementation and testing of Gx moving forward. But I ask that you refrain from hyperbole and try to focus on the real issues at hand. If you're going to advance technical arguments, please gather some data to support them or at least shed some light on them. We cannot make decisions today based on your knowledge of how optimizations work in C.
We must make decisions based on our actual platform.

To give an example of this, I refer you to earlier in this discussion when Jay and I were discussing the performance metrics for the Drupal hook style approach. My response to him was to create a benchmark and measure the actual impact so that we can make an informed decision, as opposed to speculation:

http://www.nabble.com/My-thoughts-on-radically-changing-Gallery2-ts19337418i40.html#a19681800

Please use this approach as a model for our future discussions and I believe that we'll be much more productive.
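
As a rough illustration of the kind of measurement asked for above, here is a minimal sketch that times loading the same set of trivial functions from one file versus from many small files. It is not code from Gallery or from this thread: the function names (bench_single_N, bench_split_N), the file count and the temp directory layout are all hypothetical, and the numbers it prints will depend heavily on the host, the filesystem cache and any opcode cache in use.

```php
<?php
// Sketch only: compares loading 50 functions from one file versus 50 files.
// Every name here (bench_single_*, bench_split_*, $count) is hypothetical.

$count = 50;
$dir = sys_get_temp_dir() . '/include_bench_' . uniqid();
mkdir($dir);

// Layout A: a single file that defines every function.
// Layout B: one file per function.
$single = "<?php\n";
for ($i = 0; $i < $count; $i++) {
    $single .= "function bench_single_$i() { return $i; }\n";
    file_put_contents(
        "$dir/split_$i.php",
        "<?php\nfunction bench_split_$i() { return $i; }\n"
    );
}
file_put_contents("$dir/single.php", $single);

// Time one require of the combined file.
$start = microtime(true);
require "$dir/single.php";
$singleTime = microtime(true) - $start;

// Time one require per small file.
$start = microtime(true);
for ($i = 0; $i < $count; $i++) {
    require "$dir/split_$i.php";
}
$splitTime = microtime(true) - $start;

printf("one file:  %.6f s\n", $singleTime);
printf("%d files: %.6f s\n", $count, $splitTime);

// Remove the generated files and the temp directory.
array_map('unlink', glob("$dir/*.php"));
rmdir($dir);
```

A single run like this says very little on its own. Repeating it several times, on an idle and on a loaded host, with and without an opcode cache, would be closer to the comparison described in this message.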