From: Daniel J S. <dan...@ie...> - 2004-08-14 18:30:53
|
Hope this doesn't sound like a lecture, but I want to discuss how to keep datafile.c clean and prevent the evolution of convoluted code that is starting to occur with that file. In having done quite of bit of modular and object oriented code for a few years, I'd like to discourage the use of global variables across two conceptually different pieces of code. In particular, in this instance is the global variable "df_datum". Let's begin there. I see that before the histogram strings code was added, df_datum was used outside of datafile.c in one--and only one--case, plot2d.c. I would discourage making df_datum available to the outside world in datafile.h. Here is its use: case 0: /* not blank line, but df_readline couldn't parse it */ { df_close(); int_error(current_plot->token, "Bad data on line %d", df_line_number); } case 1: { /* only one number */ /* x is index, assign number to y */ v[1] = v[0]; v[0] = df_datum; /* nobreak */ } case 2: /* x, y */ /* ylow and yhigh are same as y */ OK. The idea is simple; if only one variable is read from the file then the index into the file should be used as the x value and the read variable should be used as y. But I'd like to point out a few things. First, consider what is being done here, the use of df_readline() starting the loop, i.e., while ((j = df_readline(v, max_cols)) != DF_EOF) { retrieves from df_readline the variables (v[]) and the number of variables read (j) in a modular fashion. However, plot2d.c retrieves a short time later something also generated by df_readline (df_datum) as a global variable. That is not good. If you were evaluating that practise for, say, a software exam, ask yourself what grade you would give that. Alright then, so how to avoid using df_datum like that? Well, I would say that that case 1 situation above really could be easily moved into df_readline itself (** referenced later). Notice that if at the end of df_readline() we knew that this was for 2D data (in the current case, that is a given fact because reading 3D data is its own special routine) would could have this exact same code. For example, at the end of df_readline() if (df_plot_mode == MODE_PLOT && output == 1) { v[1] = v[0]; v[0] = df_datum; output++; } The df_plot_mode variable is something I invented for the image stuff that is recorded upon openning the file. (It isn't the main focus of this point.) You might argue, well, then df_readlin() needs to know something about how the data is going to be used. Yes, but all it needs to know is that it is intended for 2D data or 3D data. In other words, we are making available to df_readline some knowledge of what the *minimum* allowable number of variables is. In that scenario then, our case statement would conclude that *both* 0 variable returned and 1 variable returned is invalid, i.e., case 0: /* not blank line, but df_readline couldn't parse it */ case 1: { df_close(); int_error(current_plot->token, "Bad data on line %d", df_line_number); } case 2: /* x, y */ /* ylow and yhigh are same as y */ In some sense the test for case 0 and case 1 is merely a sanity check. It is when the number of returned variables is 2 or greater that the meaning of the data is open for interpretation based upon the plot style, etc. Alright so let me summarize three alternatives with the various concepts: 1. Allow the use of df_datum as a global variable leaving the code as is. This encourages people to use df_datum at will, and that is starting to happen. It isn't exactly modular and clean practise. 2. Pass into df_readline() a variable indicating the minimum number of columns that is valid. E.g., df_readline(v[], mincols, maxcols), where in plot2d.c mincols would be 2 and in plot3d.c mincols would be 3. I think you understand the point. This is probably the most appropriate "modular" syntax, but I would say that passing in that mincols has little advantage if in fact we know it will always be 2 or 3 depending upon the plot mode. 3. What I described above. It is conceptually the same as 2) but instead the information is passed to datafile.c via df_open(int max_using, int plot_mode) where in plot2d.c plot_mode would be MODE_PLOT and in plot3d.c plot_mode would be MODE_SPLOT. Like case 2), df_datum no longer needs to be globally available. I would opine that alternative #3 is a good balance. Now, having said all that, I'd like to point out some code that Ethan added to df_readline() somewhere: /* FIXME EAM - Trap special case of only a single 'using' column. */ /* But really we need to handle general case of implicit column 0 */ if (output == 1) xpos = (axcol == 0) ? df_datum : v[axcol-1]; else xpos = v[axcol]; I think that is exactly the concept I've discribed in 3) above. (**) So you can see how things are starting to get convoluted. While on this topic of df_readline, I wonder if introducing too much "plot dependent" stuff into df_readline is a good idea. For example, with the histogram tics, this kind of line seems like it shouldn't be in a file-reading routine: add_tic_user(axis,temp_string,xpos,0); Is there some way to move this functionality outside of df_readline() back into plot2d.c? I pose this question because I've been trying to make the case that df_readascii() and df_readbinary(), or whatever, should be transparent to the calling routine. If functionality like above keeps being added to df_readascii (df_readline) then soon the situation arises where certain types of plots can't be done simply because the data comes from a binary data file. Ethan, what is the minimal amount of information that you would need coming back from df_readline() to implement headers from files? If df_readline() were equipped with a char pointer for which df_readline could realloc() memory and assign a string, would that do it? That is, I might propose df_readline(v[], maxcols, string) where string is a character pointer. Then add_tic_user(axis,temp_string,xpos,0); could be moved to plot2d.c. I see absolutely nothing wrong with that addition. I think that is much cleaner than working with so many global variables. In any case, if my comments have motivated anyone to change something, please hold off until after the image patch... or if you want me to make an attempt at removing df_datum from outside datafile.c as part of the image patch, I can do that. Dan PS: I've concluded that moving df_readbinary() to another file would require the sharing of too many "local" variables. So I'm thinking that just putting it to the bottom of datafile.c is best... that's actually where it probably should be organized anyway. |