From: Ethan M. <merritt@u.washington.edu> - 2004-08-14 23:00:29
On Saturday 14 August 2004 03:50 pm, you wrote:
>
> OK, let me back up here. I think I see now the more important issue
> here is that the data to be plotted, the imigration.dat file for
> example, won't work because it has more columns than allowed by max_cols
> passed into the df_readline routine.

No, that's completely wrong. The data is being plotted 1 column at a
time. Sure there are lots of columns in the data file, but there's
nothing special about that.

> **
> "string" "string" .... "string"
> <data> <data> .... <data>
> where the data which is to serve as the tic labels (read as a string
> rather than a number) is contained in the ** element.

No. The scheme you describe does not correspond to any of the
histogramming modes I implemented. Just forget about histograms for
the purpose of this discussion. You are going way off on a tangent
because you have misunderstood how the current code works. I don't
ever need to read in more than 2 columns of data for histogramming.
And the histogram code doesn't use strings anyhow, except insofar as
you normally want to specify tic labels to go with your histograms.
But the tic label business and the histograms are separate bits of code.

[snip]

> No, certainly I could add reading strings from binary data files. But I
> would propose making it a generic thing. Say for example, a command
> line syntax (or it doesn't have to be command line, it could be an
> internal variable) whereby one of the columns can be designated as a
> string tic label, e.g., "ticlabel <col>", or whatever. But its meaning
> is generic, it is just a string passed back and treated accordingly. In
> the case of histograms it is a tic label. Perhaps something different
> for something else.

Ugh. Daniel. Please try to understand what the current code is doing.
You're just so far off base I don't know how to reply to your comments
except to say they are irrelevant.
Hint: In the current code every requested column is returned twice,
once as a number and once as a string. The caller can choose whether
it wants the string value or the numeric value. I don't know how this
fits in with your binary data files, but I assure you it is fully
generic. The histogram code doesn't use this anyhow; histogramming is
not about strings, it's about columns of numbers. You are, I am
guessing, thinking about my *other* new plotting mode - 'with labels'.

> Does this get around some problems? Am I understanding the big issue
> now, that there are more columns now than max_cols?

No. Nothing at all like that. I want to get rid of max_cols not
because I want lots of columns, but just because it is not used for
anything that really has to do with columns. It is only used to try
to deduce what the plot type is - which I think is nuts. If you need
to know the plot type then just pass the plot type.

> Well, here is the thing. There is a certain element of this that can't
> be disentangled (if that's a word). A lot of the parameters for reading
> from a file are set up by df_open() because it is there that the
> keywords from the command line are processed. So, at the point of
> df_open() it isn't known yet whether the file is ascii or binary. That
> could be fixed by first, at the start of df_open, checking all the
> keywords to see if one is "binary", but that's not graceful.

Hunh? Now I'm the one who is confused. I really have not been looking
at that part of your patch because I have no use for binary input.
But I assumed you told the program *somehow* that this was a binary
data file. How does this work at all if there isn't a keyword on the
command line?

> Let me make this revision, and maybe that will help things fall in place.

OK. I will wait and have a look at it. But I seriously hope that you
can basically leave all the existing code in df_readline() untouched.
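
To illustrate the hint above (each requested column handed back both as
a number and as a string), here is a minimal standalone sketch in C. It
is not the code in df_readline(). The names field_value and read_fields
are made up for this example, and the 64-byte text buffer and the
whitespace-only field separator are arbitrary assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type (not a gnuplot struct): one input field kept in
 * both forms so the caller can pick whichever it needs. */
typedef struct {
    double number;     /* numeric value; 0.0 if the field is not a number */
    int    is_numeric; /* 1 if the whole field parsed as a number */
    char   text[64];   /* the same field as text, truncated to fit */
} field_value;

/* Hypothetical helper: split a whitespace-separated line into at most
 * max_fields fields, filling in both the numeric and the string form.
 * Returns the number of fields found. */
int read_fields(const char *line, field_value *out, int max_fields)
{
    int n = 0;
    const char *p = line;

    assert(line != NULL && out != NULL);

    while (n < max_fields) {
        p += strspn(p, " \t\r\n");            /* skip leading blanks */
        if (*p == '\0')
            break;

        size_t len = strcspn(p, " \t\r\n");   /* length of this field */
        size_t room = sizeof(out[n].text) - 1;
        size_t copy = len < room ? len : room;

        memcpy(out[n].text, p, copy);         /* string form */
        out[n].text[copy] = '\0';

        char *end;
        out[n].number = strtod(out[n].text, &end);   /* numeric form */
        out[n].is_numeric = (end != out[n].text && *end == '\0');
        if (!out[n].is_numeric)
            out[n].number = 0.0;

        p += len;
        n++;
    }
    return n;
}
```

Given the line "1891 Austria 234", the second field is flagged as
non-numeric but its text is still there. A caller that wants a label
reads .text, a caller that wants a value reads .number, and the reader
itself never needs to know which kind of plot asked for the data.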