There are 'word' and 'words' functions that tokenize strings and extract tokens. Please add a 'findword' function that finds a token in a string and returns its index, or 0 when it is not found.
Such a function would make it easy to handle the 'string value to color/position' use case and similar ones. Today I do it in a very ugly way (not the full code):
# Collect unique values of a column in one string
addToList(list,col) = list.( strstrt(list,' "'.strcol(col).'"') > 0 ? '' : ' "'.strcol(col).'"')
# Classes
Classes=''
stats $FILE u 1:(Classes=addToList(Classes,1)) nooutput
array Classes_idx[strlen(Classes)]
i=1
# Array of positions of substrings in a string
do for [Class in Classes] {
Classes_idx[strstrt(Classes,Class)]=i
i=i+1
}
# helper function for Classes_idx - returns index or NaN
c_idx_n(col, ii)=(Classes_idx[strstrt(Classes,strcol(col))] == ii ? ii : NaN)
...
# n_idx is doing same for Y position
plot for [ii = 1:words(Classes)] $FILE \
u 3:(n_idx(2)):3:4:(n_idx(2)):(n_idx(2)+0.95):(c_idx_n(1, ii)) \
w boxxyerror fs solid lc var title word(Classes, ii), \
I have in mind an alternative approach, based on a proof-of-principle implementation in a private branch of the development source where I have been playing with a larger set of possible array operators. These include array operations "split" and "join" analogous to those in other scripting languages.
Brief summary
To get the index of a word "Target" in a string you would then be able to do
Larger context
The split and join operations themselves do not seem particularly problematic, although the exact syntax is up for discussion. My enthusiasm for a larger set of possible array operations has foundered on uncertainty about a fundamental decision: whether the order of elements in an array is considered immutable. Algorithms that use a pair of arrays to associate two properties depend on the order remaining fixed. Union, intersection, and sort operators would break this.
Comments and suggestions welcome.
My unfinished code is below. I tried to make it simpler by using data sets and 'index'. That did not work because I cannot use an "index name" as the title of a data set.
From my perspective, a Gnuplot script is hard to run without additional parameters (ARGs). For example, reading both the points and the title from the same data file is hard.
One data set in a file could hold key-value definitions of variables, while the following data sets in the same file would hold the data to plot. ARGs would then not be needed, and the script plus the data file alone would recreate the figure. Storing a triplet of ARGs, script, and data is less convenient than storing just a script and a data file.
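A minimal sketch of this idea using syntax that already exists in gnuplot 5.4, under stated assumptions: the first data set (index 0) holds "key value" lines, the data to plot follow after two blank lines, and the keys Title and Xlabel as well as the variables PlotTitle and PlotXlabel are hypothetical names chosen for illustration. stats is used only for its side effect of evaluating the using expression on every line of that first data set.

```gnuplot
# Assumed layout: data set 0 = "key value" pairs, data set 1 = points.
# Keys "Title"/"Xlabel" and the example values are made up for illustration.
$DATA << EOD
Title Fleet_strength
Xlabel Year


1990 3
1991 5
1992 4
EOD

PlotTitle  = ""
PlotXlabel = ""
# Scan only data set 0. The assignments are side effects of the using
# expression; the trailing ", 0" gives stats a numeric value to analyse.
stats $DATA index 0 using \
    ((strcol(1) eq "Title"  ? (PlotTitle  = strcol(2)) : ""), \
     (strcol(1) eq "Xlabel" ? (PlotXlabel = strcol(2)) : ""), 0) nooutput

set title PlotTitle
set xlabel PlotXlabel
# The points would then be plotted from the second data set, e.g.
# plot $DATA index 1 using 1:2 with linespoints notitle
```

Values containing spaces would need quoting or a different separator; this sketch does not handle that.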
Anything that allows reading variables, arrays, or strings from a data file and using them in the script is a good choice.
The array approach is better than string hacking. In my case, "push" and "index" would probably make the code much simpler: push the unique 'class' or 'name' values into arrays, then use the 'name' array for the labels (Y) and the 'class' array for the key titles and colors.
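gnuplot 5.4 has neither push nor resizable arrays, so one workaround in the spirit of the addToList() helper above is to collect the unique strings in a space-padded string first and then copy them into an array of the right size. This is only a sketch: the datablock, the helper addUniq() and the array name Classes are illustrative assumptions, and the values must not contain spaces.

```gnuplot
# Made-up example data: class name in column 1, a number in column 2.
$DATA << EOD
Destroyer 10
Cruiser   20
Destroyer 30
Frigate   40
Cruiser   50
EOD

# Hypothetical helper: append s to the space-padded list unless " s " is
# already present. The padding avoids partial matches ("Cruiser" in "Cruisers").
addUniq(list, s) = strstrt(list, " ".s." ") > 0 ? list : list.s." "

Uniq = " "
# stats is run only for the side effect of updating Uniq on every line.
stats $DATA using (Uniq = addUniq(Uniq, strcol(1)), $2) nooutput

# Arrays have a fixed size in 5.4, so size it from the word count.
array Classes[words(Uniq)]
do for [i=1:|Classes|] { Classes[i] = word(Uniq, i) }

# Classes are stored in order of first appearance, not sorted.
do for [i=1:|Classes|] { print i, Classes[i] }
```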
Cutting and pasting your data sample does not work for me because the tabs are not preserved. Could you add one attachment for the data and another for the script?
Also, and perhaps most important - could you attach a figure showing what you want your final plot to look like? A hand-drawn sketch or a link to someone else's plot would be fine. You are obviously far along a particular path but it may be more productive for me to step back and see if there is a simpler path, once I know where you're going.
Such a "findword()" function would be nice. In your case you don't necessarily need extra arrays for that. For this type of recurring task, I guess a hash or dictionary would be the desired feature.
That said, you can already create a lookup or hash table by misusing the sum() function. Check these two links, which address tasks similar to yours:
https://stackoverflow.com/a/72289393/7295599
https://stackoverflow.com/a/67710390/7295599
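The sum() trick can already provide the requested findword() with gnuplot 5.4 syntax. The definition below is a hedged sketch, not a built-in: findword is a hypothetical user-defined function, and it assumes each word occurs at most once in the string (repeated matches would add their indices together).

```gnuplot
# findword(s, w): 1-based position of word w in the whitespace-separated
# string s, or 0 if w does not occur. Hypothetical user function built on
# words()/word() and the sum [i=a:b] iteration.
findword(s, w) = sum [i=1:words(s)] (word(s, i) eq w ? i : 0)

Classes = "Destroyer Cruiser Frigate"
print findword(Classes, "Cruiser")    # index of "Cruiser", i.e. 2
print findword(Classes, "Submarine")  # not present, so 0
```

Because a string, unlike an array in 5.4, can be passed as a function argument, the same findword() works for any list string.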
Thanks for the simpler solution.
Mapping strings read from columns to integers (indexes) is a very common feature in other plotting software. IMO, Gnuplot should have a demo file showing it and simple functions to achieve this. Today you need to use external tools or be as creative as theozh is.
For a typical user it is hard to find information on how to do anything not covered by the demo files, so a wide variety of demo files is important. Maybe theozh could contribute some plots to the demo section?
Gnuplot file.
rawgraphs.io version of the figure - not yet fully translated.
It took around 30 minutes to find out how to do it. The key is automatically sorted and I don't like this particular feature.
Gnuplot version.
It works but the code is ugly due to lack of an equivalent of a "findword()" function.
The learning curve for Gnuplot is steep for anything that is not in the demos.
After some refining, the code is smaller. Still, using strings to draw such a plot seems inappropriate; arrays seem a more natural approach for a person with basic programming skills.
'xticlabels' columns already do something similar: they build a value-to-string array. Maybe something along these lines could be added (uniq is a new function working in the spirit of xticlabels):
This will produce an A array with unique values from column 1. With an addition of
function, it will be much easier to find out how to handle categories in Gnuplot.
In my case, the first column of my data could be removed if I were able to use the name of the
as defined by a multi-data-set comment. Something like:
This would also simplify the c_idx_n function, since the NaN case would no longer be needed. Now I need to draw four times and use the NaN from c_idx_n to build the right key. Having 'title index(i)' would turn the drawing into an iteration over the individual data sets.
To sum up: anything that makes arrays more useful (loading from data sets, finding values; regexp would be a dream), together with a demo showing how to use the new array functions, will make Gnuplot better and easier to use.
I think that in practice the standard Linux approach would be to use universal tools like awk and uniq to pre-process the data. But I can sympathize with not feeling comfortable with that, since I myself have never bothered to learn awk. On my own I would probably tackle this whole thing in a perl script and call gnuplot from inside it. But then again, I understand not being familiar with perl. So here is how I would approach it using only gnuplot and only syntax already available in version 5.4.
Running this script gives
At that point you can retrieve indices directly from the array, e.g.
i = index(Classes, "Destroyer")
I didn't try to convert your full script, but maybe that starting point is helpful.
And here's a simpler version using syntax that is in version 5.5. In fact it is also in 5.4 although it is marked EXPERIMENTAL. This does away with the extra columns and the blank lines in the intermediate table by using the syntax
plot ... with table if (condition)
Thank you for the solutions. Although arrays are conceptually the proper objects for this problem, using them in Gnuplot is not convenient. Look at the Korea2.gp file.
Only two helper functions are defined, and thanks to the flexibility of strings only one line is needed to load the data. The string version also has the advantage that the input does not need to be sorted before loading, thanks to its hash-like behavior.
The helper functions are quite a hack. I really appreciate the author's imagination in using the sum function to find the index.
The point is that both solutions are complicated. Look at https://www.rawgraphs.io/ to see how easy it is to load data and use unique values as colors.
Why not extend the stats syntax a bit and expose to the user an array holding the result of an operation similar to "labels" (the operation could be named uniqlabels, and the array STATS_uniq_labels)?
Anyway, the idea of adding more functions to arrays (as you presented it at the beginning of this ticket) is a good step forward.
Could you give a more complete description or pointer to the mechanism you like in rawgraphs.io? I looked on that web site and did not find anything relevant.
Of course gnuplot also suffers from the difficulty of discovering potentially useful features. A large part of this is because a feature added to satisfy a particular task the developer had in mind may also be relevant for applications they didn't even think of... which is great, but those unthought-of applications are obviously not listed in an index or provided with demos or examples.
For instance, stats already does something potentially relevant to the sort of string processing you are interested in. The commands
set datafile columnheaders; stats "FOO.dat" name "FOO"
will, in addition to the usual numerical analysis of the contents of file FOO.dat, also load an array FOO_column_header, a string array containing the strings found in the first row of the file. I realize that in your case you would want an array of the strings found in a particular column, not row, but that's the sort of thing you had in mind, right?
Another possibly relevant hard-to-find feature in gnuplot is the set of "smoothing" operations, many of which are not really smoothing at all.
smooth unique can do something close to what you want (collect the unique values in a particular column). Unfortunately it only works for numerical values, not strings. Extending the functionality to strings might be feasible; I've never thought about it.
I was referring to the Aggregation feature. Unfortunately, the RAWGraphs docs currently return 404.
These are quick and simple aggregation operations on a particular column. Since this is JS, you can get erratic results when you apply sum to strings. :-)
They can be very useful to quickly get some useful information without processing a data file using an external tool - see attached pdf for examples.
Right, that was one of the ideas. xticlabels() can load unique values, but the result is not exposed as an array. Something along these lines, plus a new array index function, would be helpful.
Since I use AWK, I am used to arrays whose index can be a string. Gnuplot does not support this, but you can always use two integer-indexed arrays to overcome this deficiency.
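A minimal sketch of that workaround, assuming made-up keys and color values: Keys[i] and Vals[i] form one entry, which is exactly why neither array may be reordered on its own. keyIndex() and lookup() are hypothetical helper names, not gnuplot built-ins.

```gnuplot
# Two parallel integer-indexed arrays emulate an AWK-style string-keyed map.
# Keys and color values are illustrative only.
array Keys[3]
array Vals[3]
Keys[1] = "Destroyer"; Vals[1] = 0xcc0000
Keys[2] = "Cruiser";   Vals[2] = 0x00aa00
Keys[3] = "Frigate";   Vals[3] = 0x0000cc

# keyIndex(k): position of key k in Keys, or 0 if absent (keys assumed unique).
keyIndex(k) = sum [i=1:|Keys|] (Keys[i] eq k ? i : 0)
# lookup(k, dflt): value stored under k, or dflt when k is not a key.
lookup(k, dflt) = keyIndex(k) > 0 ? Vals[keyIndex(k)] : dflt

# Possible use with a data file whose column 1 holds the class name (FILE is
# a placeholder):
# plot FILE using 2:3:(lookup(strcol(1), 0x888888)) with points lc rgb variable
```

gnuplot evaluates only the selected branch of ?:, so Vals[] is never indexed with 0 for a missing key.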
As for smooth: anything that allows you to process the data a bit and produce an array that can be accessed will help in such cases.
I am fully aware that all of these can be achieved using external tools, but having a bit of aggregation in Gnuplot will not hurt.
There is one more difference between an array and a string that can be tokenized: an array cannot be passed to a function, while a string can. This makes strings a bit more useful.
The development version has more complete support for arrays and array operations. You can pass an array to a function or return an array from a function. I have now added split() and join() to the set of supported array functions.
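For scripts that must run on 5.4, where split() and join() are not available, similar results can be approximated with words()/word() and loops. This sketch splits on whitespace only and joins with ", "; the separator handling and exact semantics of the development-version functions may well differ, and the names Parts and Joined are illustrative.

```gnuplot
Classes = "Destroyer Cruiser Frigate"

# "split": copy each whitespace-separated word into an array element.
array Parts[words(Classes)]
do for [i=1:|Parts|] { Parts[i] = word(Classes, i) }

# "join": concatenate the elements again, with ", " between them.
Joined = ""
do for [i=1:|Parts|] { Joined = Joined . (i > 1 ? ", " : "") . Parts[i] }
print Joined   # prints: Destroyer, Cruiser, Frigate
```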