If a user creates a scatterplot with a factor variable, the plot contains a lot of (I would say) unnecessary text, which makes some aspects hard to see.
Here is an example:
nulldata 50
series y = normal()
series x = normal()
series factor = randgen(i, 1, 4)
strings keys = defarray("Country A", "Country B", "Country C", "Country D")
stringify(factor, keys)
gnuplot y x factor --dummy --output=display
The output is attached. As can be seen, each discrete value of the factor variable is labelled "y (factor=Country X)" in the legend.
I have the following suggestions:
1. Remove the variable name (here y) from the legend and instead use it as the y-axis label, to the left of the y-axis.
2. Since the user knows that they have created a factorized scatterplot and can also see the different point types, I think there is no need to add the string "(factor=)" to the legend.
3. Just print the value of the factor-series:
- If factor is a string-valued series, simply print the string value for each discrete level, e.g. "Country A", "Country B", ...
- If factor is a numeric series, one could print something like "f=numeric_value".
I think this would improve readability, especially for a dataset with many distinct factor values (e.g. regions of the world).
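To make the proposed labelling rule concrete, here is a minimal sketch in Python. It only illustrates suggestion 3 and is not gretl's code: the function legend_label, its parameters string_table and prefix, and the code-to-string mapping are all hypothetical names made up for this example.

```python
def legend_label(value, string_table=None, prefix="f"):
    """Return the legend text for one level of the factor.

    string_table is a hypothetical mapping from numeric code to string,
    standing in for the string values attached to a string-valued series.
    """
    if string_table is not None:
        # String-valued factor: show just the string, e.g. "Country A"
        return string_table[int(value)]
    # Numeric factor: show a short "f=value" tag
    return f"{prefix}={value:g}"


# The four country names from the example script above
keys = ["Country A", "Country B", "Country C", "Country D"]
table = {code: name for code, name in enumerate(keys, start=1)}

labels_string = [legend_label(v, table) for v in (1, 2, 3, 4)]
labels_numeric = [legend_label(v) for v in (1, 2, 3, 4)]

print(labels_string)
print(labels_numeric)
```

With labels like these, the variable name y would appear only once, as the y-axis label (suggestion 1), instead of being repeated in every legend entry.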
Artur
I agree with Artur on all counts.
These suggestions are now implemented in git.
That looks nice, Allin. Thank you for the quick implementation!
I am closing the ticket.