Work at SourceForge, help us to make it a better place! We have an immediate need for a Support Technician in our San Francisco or Denver office.
In a comment on issue 67, Fred says that the model classes can be awkward to use and that it's sometimes hard to see how to extend them. This is a bug as sure as if the program stopped with a backtrace. But I'm not sure exactly how to fix it, so I'd like some discussion on the issue.
One of the long-term good things about OpenBiblio is that a lot of people feel comfortable diving into the code and extending it as needed. I don't want to break that. My goal with the model classes was to eliminate duplicate code and make the database interface more consistent so that common actions like adding, updating, and deleting records worked the same for all tables. I do not consider the design to be set in stone, and in particular, if it can be made clearer and easier to use, I want to do it.
Older versions of OpenBiblio had a class for each table. If the table was called foo, then the class would be FooQuery. A lot of code was duplicated between these classes. I noticed that most of them behaved in nearly the same way for the basic operations on records: insert, update, delete, and retrieve single records by primary key. Only a few things differed:
list of fields
whether the key was auto-generated
So I wrote a generic class that did the basic operations for any table, given the information above. For each table, DBTable is subclassed and the information above filled in. When operations specific to a particular table are needed, then they are added as methods to that table's class.
One thing in the current design's favor is that it cut code size by hundreds of lines without reducing functionality. I'm a big advocate of code reduction, and I think the design was successful in that area. Most of the model classes are well under 100 lines, which couldn't be said for the FooQuery classes.
The design is based on extension by inheritance, which I think can be confusing. This has never sat well with me (even though I did it), and I wonder if it's part of the problem.
I'd like to hear your thoughts on what is good, what is bad, and what we can do to make it better.
I haven't used BBCode on SF forums before, and it appears that lists aren't supported or don't work the way I thought they did. It would be nice if there were a preview option I could use before posting.
Personally I think the Model approach is way better that what there was in earlier days. And I am a great fan of inheritance. I have written email clients in OO, and currently have a fairly good size Solar Power system monitor written originally in Delphi which I am migrating to Python. Witout inheriatnce I could not have done either of these in nearly so little code.
The problems I have with a few models is that some of the internal coe is so generic that I simply cannot follow it. A good part of that could be because I got into OpenBiblio from the outside trying to solve a single issue - online lookup. This let me simply scratch the surface of the system and apply addons where I could. Sort of like the addons to an older car without knowing or caring how the engine works.
I don not think inheritance is part of the problem really. I tend to do it alot myself. In fact my own code often uses lots of abstract methods, which can be even more confusing to a new commer. But I think about code as an engineer as components with interfaces. But not having a CS background I find some of the code obscure (while not applicable here, I cannot for the life of me work with trees, or pointers - I hate C). The reports code is completely beyond me. I don't know how many time I have looked at it, I simply cannot follow it.
Give me some straight forward FFT with convolutional integrals, or calculations of astronimical ephemeris and I am fine. But the kind of code I see in compilers, etc and my head hurts. So I don't know that I can offer any good ideas in this ara I am afraid. But please dont drop the models or inheritance - that would be a move backward. I even added a model of my own to the mix.
I've been thinking about this some more. If I have a bias in software developement its towards simplicity for the repairman at whatever cost for the developer. I guess that comes from my experience where it seems I spend more time trying to figure out how to modify/extend or repair what the prior author had in mind. So as a result I tend to write lots of short to-the-point modules. while I tend to extract repetitive blocks into libraries of sorts, I dont do it to reduce code size, but rather to minimize the number of places to fix something when it inevitably breaks. Yup, I truly beleive everything code breaks sonner or later, and nothing I have seen has changed my position yet. So I want the repair process to be as simple as possible both for me and the next guy in line. I do this in electronics, woodworking, programming, whatever.
As a result of the above I am getting very uncomfortable with what is happening with the search and onlne code. Its not that I dont like the functionality, I think its great and getting better almost daily. But the code is getting ungainly and awkward. What were short simple functions are getting steadily larger and more complex. And we are seeing more bugs crop up as a result. My solution would be to rewrite portions with many more smaller modules with possibly more lines of code but simplifying the individual modules. Some of the models are like that already with many small reusable functions. But there are a few that don't fit that description. One I came across just before starting this ramble is in Copies.php. Specifically the validate_el() function. I like the idea of such a module, but a part of it is to detect duplicate barcodes. So now that Luuk points out he is seeing duplicate barcodes appear I'd like to use that function; but it's embedded. You'd think this would be no big deal, but it refers to some array $copy which I suppose is the fieldset for the copy table. But if so why aren't we referring to the fields set as properties of the class itself? Perhaps this is due the way PHP4 sees objects, but whatever the reason, it isn't clear what is happening here so I can't be sure just what the best way is to extract this duplication detector because I don't really know what I have to pass to it.
Any way I don't mean any of this as criticism, just trying to better explain the feeling I mentioned in bitBucket issue 67.
Whether criticism or explanation, it's useful.
I think our philosophy about code length is fairly similar. I have two main motivations for reducing the amount of code - both stem from trying to reduce the difficulty of maintenance. The first is that most programmers have a fairly constant bug rate in terms of lines of code, no matter the programming language in use. Fewer lines of code mean fewer bugs. The second is that if there's less code in a project, then there's less for a new developer to read before it's fully understood. I'm not for making things so short that they're cryptic - that's self defeating. I think we're on the same page here.
I haven't even begun to look at the new search and online code, so I can't comment on it yet.
Reporting has a few complicated pieces. There are parts of it that I'm not happy with, but I'm quite pleased with the whole. Since the system includes a little language interpreter, I can see where it runs into areas you aren't comfortable with. Is there something you need fixed in that code?
I think you may not be understanding the purpose of validate_el(). It's generally only called from within DBTable. On every insert or update, DBTable calls validate_el() to determine whether it should put the inserted or updated record in the DB. The first argument is the record, and the second is a boolean flag indicating whether the record is a new insert or an update. If the record doesn't pass validation, then the insert or update doesn't happen and the caller of insert_el or update_el gets the error list.
The duplicate barcode problem itself is worthy of its own thread, so I won't talk about it here.
I think the bottom line is that the working of DBTable.php isn't clear from a quick read. That's the problem we need to fix. I'll take another look at it and maybe write another message here in a bit.
Rereading DBTable.php, I'm not sure exactly where the problem lies. The custom iterator stuff is probably confusing, though that only affects the "get" methods. Could you let me know what you find clear and what you don't? I could just write an article on how it works, but I'd rather that the code be self-explaining. DBTable is probably the single most-called piece of code in the system, and I want it to be easy to understand so nobody's in doubt about what it's doing.
OK, I had missed the call to validate_el() in DBTable.php. That's embaressing, since I was just mentioning how I like abstract methods. I like your approach, just hadn't seen it being used. But in the particular case in Copies.php, I feel that the duplicate test should be extracted so it can be used at an earlier point if wanted.
I think some of the problems I have may be just a Fred thing. Just different styles coupled with a different view of what the target audience is. Not that I disagree with you, just that I got into OpenBiblio for my own very small application and have never had a reason to see its use beyond the small low budget private library. I concede it seems to be spreading out into many other places, but I haven't been considering that larger world in my approach to the code.
And I must say that I too consider it essential that code be understandable without external documentation except to provide a high level strategy sort of thing. Whenever possible, code should be self documenting. But I have to say (for me anyway) that is impossible any time a regular expression is encountered. The purpose should always be stated in plain language. In the same vein, I think we would benifit from a move to the PHP5 object model, though I understand the issue with those whose ISP wont go along. For me, I'd break the tie to ver4 with 1.0. Its got to come someday, just like similar ties to Windows 3.1 16bit code. The use of Private, Public, & Abstract keywords would catch all kinds of erors while making the intent much clearer. Requiring the use of the 'Parent::' prefix makes calls to the underlying parent class clearer too.
I just reviewed DBTable.php and can't say I find anything to obnscure in it except that I find myself having to take so much of it on faith as what is happening is hidden even further down in Query.php. Since you have worked with all this for some time, you know almost instinctively what is going on - I dont. Personally I like to see my SQL written in plain language, not so much parameterized.
You will find in ../catalog/biblio_search.php which is used in the 'existing Item' menu action, that I did not use DBTable or any other of your code but Query(and only the most basic parts of that). That was because at the time I was unable to figure out how to do what I needed with what existed. As a result none of the features you have in your models is used. Parts were subsequently expanded even further by Luuk to satisfy his needs. Not something to brag about, just reality. This is one example of the code bloat I mentioned earlier; you might make quick work of throwing this stuff away.
Im rambling here, probably not saying much. time to stop.
Let me try this again.
Now through a chain of events we have at least 4 defferent styles of code in OpenBiblio. Dave Stevens', Yours, mine, and Luuk. I think that perhaps this stylistic freedom has gotten a tad out of control. :-)
OK, first let me say some "Amens": The variation in code style is bad and getting worse. I also think you're right that 1.0 may be the best time to drop PHP4 compatibility - especially if it helps us work out these style issues. Moving to a more AJAX-based architecture is also the right thing to do - indeed, the model classes were consciously intended as a first step in that direction.
Checking for duplicates from AJAX queries is worthwhile, and I wasn't thinking of it when I wrote my last message. But is it better to check each field individually with an individual entry point or to expose validate_el more directly and check the whole record? The nice thing about giving AJAX direct access to validate_el is that the code can easily make checks that require multiple fields. E.g. "Is this booking trying to make the checkout length too long?", which needs to know the item being booked, who's booking it, and the proposed check out and return dates. If we expose validate_el for each model class, then all the interfaces can be the same, rather than having to figure out which parameters are needed for each server-side check. Is there a downside to that approach?
The logical next step to what we've said is making a single HTTP/JSON interface that exposes most of the functionality of the model classes.
Now for DBTable and Query - I'm not sure I understand what you're saying about the code in DBTable relying on stuff that's hidden in Query. I'm not sure what Query is hiding - it presents a pretty thin layer over the MySQL functions. The Query constructor does hide mysql_connect(). Query->select() is exactly like mysql_query(), only it packages the result set in an Iter. Query->act() is again just mysql_query() only it ensures that there is no result set. Query->select1() and Query->select01() are just shorthands for very common patterns using Query->select(). Query->getInsertID() is mysql_insert_id(). Query->lock() and ->unlock() only need to be used when multiple queries are interdependent. For instance, if you are going to run a query for each result in a select and you need to guarantee that the rows don't go away after the select, then you need to call lock() before the initial select and unlock() after all the operations are done.
This leaves only Query->mkSQL(). That function lets you take this
$sql = "SELECT * FROM biblio_field WHERE data='".mysql_real_escape_string($query, $dbconn)."'";
and turn it into this
$sql = $db->mkSQL("SELECT * FROM biblio_field WHERE data=%Q", $query);
In catalog/biblio_server.php, you're using a style more like this
$sql = "SELECT * FROM biblio_field WHERE data='$query'";
I can understand the appeal, if only from the standpoint of brevity, but this is an SQL injection vulnerability. This goes back to the same escaping concerns we were talking about elsewhere. What happens when a user searches for O'Reilly books? Query->mkSQL() fixes this problem.
We've now talked about numbers 1 and 2 from the OWASP top 10 web app vulnerabilities. I'm sure you aren't the only developer who'll need to hear about those, so I've started a wiki page about security.
It has just occurred to me that some of the excessive generalization that bothers you may really be due to DBTable->_pairs() and DBTable->_keyTerms() rather than Query. They are a bit abstract, but I'm not sure how to make that better and still keep the benefits of DBTable, i.e. that a table description is little more than an array.
AS to validation: should it be done interactively or at the DB interface. That's an 'or' question and the answer is no. I think it always be done in both places. Procesing time is minimal for something like this so that isn't a considerstion I beleive. The user experience is what matters it seems to me and immediate feedback on errors is always a good thing I think. Having a check at the server catches coding oversites before they get out of hand.
I think I aluded to a problem with Copies, etc in my previous. biblio_server is currently not really using Copies/DbTable at all. And I don't like that, but while looking it over last night to fix (again, because I never finished it the first time) I could not see how to get back to using them. I know that I cant use Copies as is since update & insert don't know about custom fields yet. As to the security issues, I mentioned before that I accept the necessity, its simply that never having to deal with them before, they aren't part of my 'normal' thought process. Also I think I am more of a 'just get it working' kind of engineer.
My problem isn't the idea of inheritance as I mentioned before - I strongly like that. But for some reason I am constantly missing the fact that a function being called is in a lower layer. It could be because other languages I am playing with lately seem to require more explicit references to a parent while PHP4 does not. I tried to see if the PHP manual makes any mention of that , but if it does, it isn't obvious. I think I need to review some of my PHP5 OO references to see what pops out. For sure I think use of public, private, etc can only help.
As to validate_el. I think that should normaly be a private method, but portions of it, like barcode duplicates, should be public with an access at the mode case statement in the server module.
I will read over the security wiki and respond to that later.
Honestly I don't think I ever noticed _pairs or _keyterms. I will have to say that it is not at all aparent to me how to prepare the data received in a POSt or GET into a form to pass to Copies or any other model for updates or inserts. I have figured out the various select functions, and while its not my style, I can see what is happening with mkSQL(). If using mkSQL fixed the O'Reilly issue, that alone will make me a beleiver!
I am comfortable with a table being an array. That is how most mySql stuff in PHP or else where is done. What would probably help most is just running into a good example in an area I am familiar with. Let's say the insert function of biblio_search which currently does not call Copies functions at all. Can you modify the folowing in biblio_server:
$theDb = new SrchDB;
if ($theDb->isDuplicateBarcd($_POST[barcode_nmbr], $copyid)) return;
to instead use Copies.php?
Then I will attempt to do the same for Update, and afterward for the rest of it.
Incidently, you haven't commented on the user interface monster I have created in catalogging and Member Search. I have been very concerned you would not care for it. What do you think?
I agree that validation needs to happen both client side and server side - client side for convenience, server side for security. The public use of validate_el and the new user interface should probably be on their own thread, so I'll take them up there.
For updating using Copies.php, I think the code should look like this:
$copies = new Copies;
$errors = $copies->update_el($_POST);
# Do whatever needs to be done on error or success, e.g. returning status in a JSON structure
This almost works right now. DBTable only pays attention to the correct fields and validates all input, so it's safe to give it $_POST. We just need to make sure the form uses the same field names as the DB. The only reason this won't work today is because of custom fields and status changes. These are easily fixed by putting the appropriate code into the Copies class.
Coming up with the code snippet above just by examining the existing code was probably almost impossible. I'm just now realizing that I never made my aim really clear. I got the model classes written, but the forms and other code needed to be changed to make the simple call above really work. But first I wanted to remove all the old DB access code, so I just went through and did a quick-and-dirty switch to the model classes. A clear use like the one above may not exist in the current codebase, but it is the goal. I've added a mention of this idiom to the wiki page on database access. I hope this clears things up a bit.
I think last Jult or so, you mentioned that this was just a work in progress and you were in the midst of creating the models. In fact most of your commit notes at that time pretty much support that.
I'll try to take some time this weekend to pursue your sample above. Even if unsuccessful I will probeably gain a better feel for what you are doing. Somehow just passing $_POST seemed to easy though I couldn't see any other way. Your wiki writeup helps a lot.
Well, it works. I can do an update of a copy using the above code sample, in fact a custom field of a copy. But I'l be darned if I can see how. I can trace the calls for updating the copy itself, bu can see no trace of a way to update the custom fields. I'm sure it is there, but there is no indication that it will happen in Copies.php that is apparent to me.
So tomorrow I will try an insert. If this all pans out the size of biblio_search will shrink dramatically.
I'm more than a little surprised that the custom fields work out of the box. Are you certain that's happening? They should need special processing in both insert_el() and update_el() in model/Copies.php, and I don't see that there.
I think I modified the procedures and inserted the call somewhere in Copies.php, but cannot remember exactly so might be wrong. It is more than two months ago now when I included custom fields in the copy edit, but as far as I recall (I cannot look at the code at this time) I tried top use the class Copy.php for the operations..
Micah, I think it might save a bit of typing for al of us if you would take a quick scan of biblio_search. You wuill see that precious little of your work in Biblios.php or Copies.php is actually in use. I don't think that is desiarable even though the major cause of that is my inability to figure out how to do it the 'right way'. But the fact that the code isn't there in Copies is really irrevelant. We aren't using those functions. Luuk added two functions to Copies, a Get & Set CustomFields, but they dont use insert_el() or update_el() either.
You're right, Fred. I've started to look harder at catalog/biblio_search so I can get my head around what's really happening these days.