From: Daniel J S. <dan...@ie...> - 2004-08-14 22:25:31
Ethan Merritt wrote:

>On Saturday 14 August 2004 11:56 am, Daniel J Sebald wrote:
>
>>Hope this doesn't sound like a lecture, but I want to discuss how to
>>keep datafile.c clean and prevent the evolution of convoluted code that
>>is starting to occur with that file.
>>
>
>[snip lengthy rant, some of which is on target, some not]
>
:-)

>Didn't we already have this discussion a few months ago?
>I proposed that the whole notion of tracking input data by how
>many columns were read in has outlived its usefulness.
>I think we should get rid of max_cols and all the various
>tests that depend on it, and instead pass explicit information
>about the requested input data. Hans-Bernard disagreed.
>
Well, yeah. Lots of disagreement, but not much agreement on what the paradigm should be, and I'm suggesting that there be some agreement to avoid too much divergence. The discussion sort of faded...

>>df_open(int max_using, int plot_mode)
>>
>
>Like that, yes, except that
>(1) I think max_using is not necessary or desirable, and
>(2) I proposed passing a pointer to the whole plot
>structure rather than passing only the plot style.
>
Another paradigm is fine; passing in a pointer to the whole plot is fine. Just not a combination of multiple views. Some of the image stuff may fit one paradigm or the other, so it might be good to adhere to only one in the near future.

I know that for ASCII files the number of columns can be determined from the file itself, and gnuplot readjusts accordingly. That code is currently in plot2d.c. Will it remain there? Or will it be moved into datafile.c as part of df_readline()? df_open()?

>>While on this topic of df_readline, I wonder if introducing too much
>>"plot dependent" stuff into df_readline is a good idea.
>>
>
>And that was Hans-Bernard's counterargument.
>>For example,
>>with the histogram tics, this kind of line seems like it shouldn't be in
>>a file-reading routine:
>>
>>   add_tic_user(axis,temp_string,xpos,0);
>>
>>Is there some way to move this functionality outside of df_readline()
>>back into plot2d.c?
>>
>
>Why? It is not specific to 2D plots. But anyway, the answer is no.
>The information being processed, the tic labels, are not specific to
>the current plot; they are a property of the axis. The code belongs
>in axis.c, which is where it is currently. But still you have to call it
>from somewhere, and I maintain the logical place (maybe the only
>possible place) is the point at which you obtain the information.
>That is set.c in the case of axis tic info coming from a "set [xyz]tics"
>command, and datafile.c in the case of tic info read in from a file.
>
OK, let me back up here. I think I now see that the more important issue is that the data to be plotted, the immigration.dat file for example, won't work because it has more columns than the max_cols passed into the df_readline() routine allows. That is, the normal gnuplot ASCII file looks like

    "string" ** <data>
    "string" <data>
    "string" <data>

But the 'immigration.dat' file is

    ** "string" "string" .... "string"
    <data> <data> .... <data>

where the data that is to serve as the tic labels (read as a string rather than a number) is contained in the ** element.

I don't have all the answers, but I'll make some comments. In the latter case, that bunch of strings at the start of the file could all be read at once. In fact, the strategy is similar to that of the "gnuplot binary" type of file, where the x values run along the top and the first column afterward holds the y values. Also, the df_readline() routine could easily be arranged to remove the max_cols restriction and make the returned value j dynamic: 2, 3, 4, etc., all the way up to 500 if one wants.
It may just mean dynamic allocation of memory (which doesn't need to be reallocated if the number of values read doesn't change, so it stays efficient).

>>I pose this question because I've been trying to make the case that
>>df_readascii() and df_readbinary(), or whatever, should be transparent
>>to the calling routine. If functionality like above keeps being added
>>to df_readascii (df_readline) then soon the situation arises where
>>certain types of plots can't be done simply because the data comes from
>>a binary data file.
>>
>
>If that is indeed true then I have reservations about introducing binary
>input at all. Are you saying that it will not be possible to read strings
>in from a binary file, so that the new "plot with labels" and
>"using ...:xticlabels(<col>)" will not work? If so, then the functionality
>has already diverged. And if the two modes have different capabilities
>then all the more reason to keep them separate in the code as well.
>
No, I certainly could add reading strings from binary data files. But I would propose making it a generic thing. For example, there could be a command-line syntax (or it doesn't have to be command line; it could be an internal variable) whereby one of the columns is designated as a string tic label, e.g., "ticlabel <col>", or whatever. But its meaning is generic: it is just a string passed back and treated accordingly. In the case of histograms it is a tic label; for something else it might be something different.

However, my feeling about df_readline(), df_readascii() and df_readbinary() is that these should be small core routines (in scope, anyway), a kernel if you will, that take in data and shuffle it off somewhere else to be processed further. If one mixes dedicated code like plot->histogram, etc. into df_readascii(), then one also needs to remember to make that change in df_readbinary(). If something is tweaked in one location, it has to be touched in another spot as well, which might be forgotten.
So, maybe a version of df_readline() as follows:

    int df_readline(double vector[], char **string)

where the vector can now be of any length, and string is a location where df_readline() puts a pointer to a character string that it dynamically allocates. (At most one of the columns can be treated as a string rather than a number.) Does this get around some of the problems? Am I understanding the big issue now, that there are more columns than max_cols? I guess I'm asking: if the max_cols restriction were dropped, would the current setup allow you to move data into the plot structure as desired? Or is there a paradigm shift here in the way data can be arranged in the file for histograms?

>>Ethan, what is the minimal amount of information that you would need
>>coming back from df_readline() to implement headers from files? If
>>df_readline() were equipped with a char pointer for which df_readline
>>could realloc() memory and assign a string, would that do it?
>>
>
>That's what it does now. Because plot->title is not visible from
>inside df_readline (which actually I would prefer), the title is allocated
>and a pointer to it is stored in a static variable. A helper routine
>df_set_key_title() is later called from get_data(), which is indeed in
>plot2d.c. No global variables are involved.
>
Yeah, that is fine. I assume that df_set_key_title() is not within df_readline(). My major point in all this is to keep df_readline() clean and generic; in the long run that will promote happiness.

>>That is, I might propose
>>   add_tic_user(axis,temp_string,xpos,0);
>>could be moved to plot2d.c.
>>
>
>You are confusing plot titles and axis tic labels. The two things are
>quite different. One is a specific property of the current plot,
>the other is not.
>
>I know, you are going to point to a single place where the histogram
>code stuffs a plot title into an axis tic label.
>I'm not terribly happy about that either, but let's split that off into a totally separate
>discussion that only applies to stacked histograms.
>
No biggie. Got to start somewhere.

>>PS: I've concluded that moving df_readbinary() to another file would
>>require the sharing of too many "local" variables.
>>
>
>I don't agree. Most of those local variables are indeed local.
>They should not *need* to be shared.
>
Well, here is the thing. There is a certain element of this that can't be disentangled (if that's a word). A lot of the parameters for reading from a file are set up by df_open(), because that is where the keywords from the command line are processed. So, at the point of df_open() it isn't yet known whether the file is ASCII or binary. That could be fixed by first checking all the keywords at the start of df_open() to see whether one is "binary", but that's not graceful. So, yes, even a df_open_binary() could be written in which all the keywords are interpreted again. But why repeat all of these in a different file if they are going to be pretty much the same? "every" works the same, "thru" works the same, etc.

Let me make this revision, and maybe that will help things fall into place.

Dan