Thread: [Gnuplot-info] Data File with Group-Indexed Lines

[Gnuplot-info] Data File with Group-Indexed Lines

From: Yub Y. <yu...@ro...> - 2012-09-28 14:02:38

Hi all, I'm new to gnuplot.

I'm using gnuplot 4.6.0 on Ubuntu 12.04.

I've done a bunch of reading of tutorials and documentation, though there are some things I'm having trouble figuring out how to do (or even if it's possible to do).

In particular at the moment, I have the following issue -- which has been hard to search for solutions to, since the associated keywords are common to other questions and issues.

I have a file with columns of data which I would like to plot -- which is simple enough.
However, one of the columns is an index, which denotes which data-set the line belongs to.
(The lines are all in one long mixed group -- there are no sections or extra newlines.)

I would like to create a separate graph for each index in the file.

I found in the documentation[1] how to use the ternary operator to omit values based on column values, but this approach has several problems.

First, the documentation recommends setting points to undefined (1/0), which means that if I'm plotting "with linespoints", then any time there are consecutive lines in the file which have different indices, there is a break in the graph.

Second, it seems that I would need to know all of the indices in the file before running gnuplot on it, so that I could tell it to plot each of the different indices. (Using a loop perhaps, but I'd still need to know them all beforehand.)

One solution is to write a small script (e.g. in Python) that preprocesses the data file by sorting or reformatting it, or by breaking it into multiple files or sections. The script could then generate the gnuplot code to plot the result.

However, I would like to know if I can accomplish this using only gnuplot, for an arbitrary file.

Thank you so much!
-Yub

[1] Bottom of page 84: http://www.gnuplot.info/docs_4.6/gnuplot.pdf

Re: [Gnuplot-info] Data File with Group-Indexed Lines

From: walter h. <wh...@bf...> - 2012-09-28 14:34:30



As far as I understand, you have a file with index/value pairs like this:

0       73      0       25      0       124302  000000.003107550
0       74      0       28      0       150160  000000.003754000
0       75      0       39      0       244268  000000.006106700
0       76      1       87      0       259467  000000.006486675
0       77      0       149     0       259537  000000.006488425
0       78      0       32      0       268263  000000.006706575
0       79      1       282     0       281296  000000.007032400
0       80      0       481     0       281366  000000.007034150
0       81      0       13      1       36992   000000.010924800
0       82      1       134     1       56228   000000.011405700
0       83      0       228     1       56298   000000.011407450
0       84      1       100     1       77730   000000.011943250
0       85      0       171     1       77800   000000.011945000
0       86      0       115     1       77981   000000.011949525
0       87      1       34      1       275359  000000.016883975

Let's assume that column 3 is your index and column 7 is your value.
I am not sure whether gnuplot is intended to do such filtering, so I use
awk for that. I find it very flexible, easy to use, and more lightweight
than certain other languages.

awk '{ print $7 >>$3 }' [your file here]

This would generate one file called 1 and a second called 0; feel free to adjust it to your needs.
You can also run such a script from inside gnuplot by using

plot "<awk '{ if ($3==0) print $7 }' [your file here]"

but this would require rerunning awk every time you want a plot.
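As a rough illustration of the splitting step, here is a minimal shell sketch. It reuses four rows of the sample data above; the file name sample.dat, the temporary directory, and the group_ prefix on the output files are assumptions made for the example, not part of the original suggestion.

```shell
# Hypothetical sample file: column 3 is the index, column 7 the value.
workdir=$(mktemp -d)
cat > "$workdir/sample.dat" <<'EOF'
0 73 0 25 0 124302 000000.003107550
0 76 1 87 0 259467 000000.006486675
0 77 0 149 0 259537 000000.006488425
0 79 1 282 0 281296 000000.007032400
EOF

# Append column 7 of every line to a file named after column 3.
# The parentheses around the file name keep the redirection unambiguous,
# and the "group_" prefix (an assumption) avoids files called just "0" or "1".
awk -v dir="$workdir" '{ print $7 >> (dir "/group_" $3 ".dat") }' "$workdir/sample.dat"

# Show what ended up in each per-index file.
cat "$workdir/group_0.dat"
cat "$workdir/group_1.dat"
```

Because awk appends with >>, running the command twice on the same directory would duplicate the data, so the output files should be removed before a rerun.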


re,
 wh

Re: [Gnuplot-info] Data File with Group-Indexed Lines

From: Yub Y. <yu...@ro...> - 2012-09-28 14:44:13




Re: [Gnuplot-info] Data File with Group-Indexed Lines

From: Yub Y. <yu...@ro...> - 2012-09-28 16:02:05


Sorry all, for the accidental empty message a little while ago. >_<


Walter--

Thank you!
I forgot about the ability to call scripts like that from within gnuplot [1]. That seems like it will be exactly what I need!


And that was a good idea including some example data -- sorry I didn't think of it when I posted the question.

Here is the full solution I have now. It's a bit ugly, but it seems to work just fine!


If anyone has a more elegant solution, please let me know!


blah.dat:

    0    1    1
    1    2    12
    1    3    13
    0    4    4
    0    5    5
    1    6    16
    0    7    7


gnuplot code:

    set key on outside
    plot for [groupIndex in "`awk '{ if ($1 != "" ) print $1 }' blah.dat | sort | uniq | sed ':a;N;$!ba;s/\n/ /g'`"] "<awk '{ if ($1==".groupIndex.") print $2\" \"$3 }' blah.dat" using 1:2 title groupIndex with linespoints


Explanation:

    awk '{ if ($1 != "" ) print $1 }' blah.dat | sort | uniq | sed ':a;N;$!ba;s/\n/ /g'
Use awk to get the index of every line,
sort them (because uniq only collapses adjacent duplicate lines),
get the unique values (so that we only count each one once), and
use sed to change the result from a newline-separated list to a space-separated list.

Enclose the above in `` so that gnuplot will do a substitution after running it as a system command [2].


    plot for [groupIndex in "..."]
Plot once per pass through a loop [3], with the variable groupIndex iterating over the list from above.


    "<awk '{ if ($1==".groupIndex.") print $2\" \"$3 }' blah.dat"
Filter the data based on Walter's suggestion, inserting the value of groupIndex.
As the second example of [3] ( basename.".dat" ) shows, the period is gnuplot's string concatenation operator.
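The two shell pipelines that the plot command relies on can also be tried on their own. The following is only a sketch under some assumptions: it recreates blah.dat in a temporary directory, uses GNU sed (as shipped with Ubuntu), and passes the index to awk with -v instead of splicing it into the program text the way the gnuplot string does.

```shell
# Recreate the sample data from above in a scratch directory (an assumption).
workdir=$(mktemp -d)
cat > "$workdir/blah.dat" <<'EOF'
0 1 1
1 2 12
1 3 13
0 4 4
0 5 5
1 6 16
0 7 7
EOF

# Step 1: space-separated list of the distinct values in column 1.
# This is the text that the backquoted command hands to "plot for [...]".
indices=$(awk '$1 != "" { print $1 }' "$workdir/blah.dat" | sort | uniq | sed ':a;N;$!ba;s/\n/ /g')
echo "$indices"

# Step 2: the per-index filter, run once for every value in the list.
for groupIndex in $indices; do
    echo "# group $groupIndex"
    awk -v idx="$groupIndex" '$1 == idx { print $2, $3 }' "$workdir/blah.dat"
done
```

For this data the first echo prints the list 0 1, and the loop then prints the x/y pairs of each group in file order, which is what gnuplot receives as one data set per curve.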


Thanks again!
-Yub



[1] http://www.gnuplot.info/faq/faq.html#SECTION00082000000000000000

[2] Page 39, "Substitution of system commands in backquotes": http://www.gnuplot.info/docs_4.6/gnuplot.pdf

[3] Page 70, "Iteration": http://www.gnuplot.info/docs_4.6/gnuplot.pdf

Re: [Gnuplot-info] Data File with Group-Indexed Lines

From: walter h. <wh...@bf...> - 2012-09-28 16:19:35


On 28.09.2012 18:01, Yub Yub wrote:
> 
>     awk '{ if ($1 != "" ) print $1 }' blah.dat | sort | uniq | sed ':a;N;$!ba;s/\n/ /g'
> Use awk to get indices of all lines, 
> 

I do not know how large your indices are, but I would use "sort -n" to force numeric sorting.
Your sed seems to do more than replace \n with " ", but I prefer to use tr for that:
tr "\n" " " should do the job (and is easier to read afterwards).
re,
 wh
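To make the difference visible, here is a small sketch comparing plain sort with sort -n, followed by tr. The index values 10, 2 and 1 are made up for the illustration; they do not come from the data in this thread.

```shell
# Hypothetical index values, chosen so that text order and numeric order differ.
raw=$(printf '%s\n' 10 2 1 10 2)

# Plain sort compares the values as text, so 10 lands before 2.
lexical=$(printf '%s\n' "$raw" | sort | uniq | tr '\n' ' ')

# sort -n compares them as numbers.
numeric=$(printf '%s\n' "$raw" | sort -n | uniq | tr '\n' ' ')

echo "plain sort: $lexical"
echo "sort -n:    $numeric"
```

Note that tr also turns the final newline into a space, so the list ends with a trailing blank. For a whitespace-separated word list that should be harmless.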
