|
From: Karthikraja V. <vel...@gm...> - 2011-05-25 00:21:38
|
Hello friends,
I am a newbie to matplotlib and I am trying to plot (scatter plot) some
values. The data set is quite big and I have it in a CSV file. As a start, I
thought I would use the loadrec.py example to see if I am able to import the
data from the CSV file. The loadrec.py goes like this:
from matplotlib import mlab
from pylab import figure, show
import matplotlib.cbook as cbook
datafile = cbook.get_sample_data('msft.csv', asfileobj=False)
print 'loading', datafile
a = mlab.csv2rec(datafile)
a.sort()
print a.dtype
fig = figure()
ax = fig.add_subplot(111)
ax.plot(a.date, a.adj_close, '-')
fig.autofmt_xdate()
I believe, for the CSV file to be accessed, it has to be placed in the
sample_data folder (for windows). So I placed my csv file in the sample_data
folder and ran the script.
The output was
Traceback (most recent call last):
  File "C:\Python26\loadrec.py", line 5, in <module>
    datafile = cbook.get_sample_data('ch1.csv', asfileobj=False)
  File "C:\Python26\Lib\site-packages\matplotlib\cbook.py", line 662, in get_sample_data
    return myserver.get_sample_data(fname, asfileobj=asfileobj)
  File "C:\Python26\Lib\site-packages\matplotlib\cbook.py", line 620, in get_sample_data
    raise KeyError(msg)
KeyError: 'file ch1.csv not in cache; received HTTP Error 404: Not Found when trying to retrieve'
The data in my CSV file looks like this
0.9963
0
0.499
0.9901
0.0025
0
1
0.0017
1
0.0173
0.9837
If anyone can understand the problem please give me your suggestions. I will
be very thankful if any of you can show me exactly how to scatter plot this
kind of data.
Karthikraja Velmurugan,
Graduate research assistant,
Dept of Biomedical Informatics,
Arizona State University,
248-421-7394
|
|
From: Daniel M. <dan...@go...> - 2011-05-25 07:07:04
|
Hi,
firstly, I do not fully understand why you have chosen such a complicated
solution to a rather simple problem. If the data in your file really is like
the example then you could simply put the file 'ch1.csv' into the same
directory as your Python script.
I have slightly modified it (I don't like the "from" import statements too
much) and commented your lines.
#from matplotlib import mlab
#from pylab import figure, show
#import matplotlib.cbook as cbook
import pylab
#datafile = cbook.get_sample_data('ch1.csv', asfileobj=False)
datafile = 'ch1.csv'
print 'loading', datafile
#a = mlab.csv2rec(datafile)
a = pylab.loadtxt(datafile, comments='#', delimiter=';')
a.sort()
print a.dtype
fig = pylab.figure()
ax = fig.add_subplot(111)
#ax.plot(a.date, a.adj_close, '-')
#fig.autofmt_xdate()
ax.plot(a, 'o')
fig.show()
I hope it helps, let me know whether you need a different approach!
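For readers on a current Python 3 installation, the same idea can be sketched roughly as follows. This is only an illustration under assumptions: the eleven values are the ones quoted in the first message, the file name `ch1_example.csv` and the helper `load_column` are made up for this sketch, and the `Agg` backend is selected only so the script also runs without a display. Unlike the script above, it does not sort the values, so each point stays at its row position in the file.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this for an interactive window
import matplotlib.pyplot as plt


def load_column(path):
    """Load a one-value-per-line text file into a 1-D float array."""
    # atleast_1d keeps a single-line file from collapsing to a 0-d array
    return np.atleast_1d(np.loadtxt(path))


# Write the sample values to a scratch file (hypothetical name) so the
# sketch is self-contained and never touches a real data file.
sample = [0.9963, 0, 0.499, 0.9901, 0.0025, 0, 1, 0.0017, 1, 0.0173, 0.9837]
np.savetxt("ch1_example.csv", sample, fmt="%g")

values = load_column("ch1_example.csv")
index = np.arange(len(values))  # x axis: row number in the file

fig, ax = plt.subplots()
ax.scatter(index, values, marker="o")
ax.set_xlabel("row")
ax.set_ylabel("value")
fig.savefig("ch1_scatter.png")
```

With real data, only the `load_column` call and the plotting lines would be needed; the `savetxt` step exists purely to create the example file.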
|
|
From: Karthikraja V. <vel...@gm...> - 2011-05-27 20:17:03
|
Hello Daniel,
The code you have given is simple and works fab. Thank you very much. But I
wasn't able to find an example that accesses the columns of a CSV file
when I import data through the datafile = "filename.csv" option. It would be
great if you could help with accessing individual columns. What exactly I am
looking for is to access individual columns (of the same CSV file), do
calculations using two of the columns, and plot them into separate subplots
of the same graph.
I modified the script a little bit. Please find it below:
import matplotlib.pyplot as plt
import pylab
datafile1 = 'ch1_s1_lrr.csv'
datafile2 = 'ch1_s1_baf.csv'
a1 = pylab.loadtxt(datafile1, comments='#', delimiter=';')
b1 = pylab.loadtxt(datafile2, comments='#', delimiter=';')
v1 = [0,98760,0,1]
v2 = [0,98760,-2,2]
plt.figure(1)
plt.subplot(4,1,1)
print 'loading', datafile1
plt.axis(v2)
plt.plot(a1, 'r.')
plt.subplot(4,1,2)
print 'loading', datafile2
plt.axis(v1)
plt.plot(b1, 'b.')
plt.show()
Thank you very much in advance for your time and suggestions.
Karthik
|
|
From: Daniel M. <dan...@go...> - 2011-05-30 12:18:25
|
Hi,
the content of the CSV is stored as an array after reading. You can
simply access rows and columns like in Matlab:
firstrow = a1[0]
firstcol = a1.T[0]
The .T transposes the array.
The second element of the third row would be
elem32 = a1[2][1]
which is equivalent to
elem32 = a1[2,1]
A range of e.g. rows 3 to 6 is
range36 = a1[2:6]
Please have a look here for getting started with scipy/numpy:
http://pages.physics.cornell.edu/~myers/teaching/ComputationalMethods/python/arrays.html
and
http://www.scipy.org/NumPy_for_Matlab_Users
Hope this helps,
Daniel
|
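The indexing rules in this reply can be tried on a small array before touching real data. The sketch below is an illustration only: the 6x3 array is invented and stands in for whatever `loadtxt` returns, and the variable names are chosen for this example. It also shows a calculation on two columns, which is what the question asked about.

```python
import numpy as np

# Hypothetical 6x3 array standing in for the result of loadtxt on a CSV file.
a1 = np.arange(18, dtype=float).reshape(6, 3)

first_row = a1[0]          # row index 0, same as a1[0, :]
first_col = a1[:, 0]       # column index 0, same values as a1.T[0]
elem32 = a1[2, 1]          # third row, second column (indices start at 0)
rows_3_to_6 = a1[2:6]      # slice end is exclusive, so this is rows 3, 4, 5, 6

# Element-wise calculation using two columns of the same array.
col_sum = a1[:, 0] + a1[:, 1]
```

Note that, unlike Matlab, indices start at 0 and use square brackets, which is why the third row is `a1[2]`.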
|
From: Karthikraja V. <vel...@gm...> - 2011-06-03 18:48:18
|
Hello guys,
I was able to plot when I only had 1 column. But now I have a CSV file that
has 10,000 rows and 12 columns. I am trying to write code to plot all these
12 columns into 12 subplots of one graph. Below is my code for just one
column in one CSV file. BTW, csv2rec does not work in my version of
matplotlib.
import matplotlib.pyplot as plt
import pylab
datafile1 = 'ch1_s1_lrr.csv'
datafile2 = 'ch1_s1_baf.csv'
a1 = pylab.loadtxt(datafile1, comments='#', delimiter=';')
b1 = pylab.loadtxt(datafile2, comments='#', delimiter=';')
v1 = [0,98760,0,1]
v2 = [0,98760,-2,2]
plt.figure(1)
plt.subplot(2,1,1)
print 'loading', datafile1
plt.axis(v2)
plt.plot(a1, 'r.')
plt.subplot(2,1,2)
print 'loading', datafile2
plt.axis(v1)
plt.plot(b1, 'b.')
plt.show()
Now I want to be able to import 12 columns from the same file and plot all
the values of the first six columns and only the values less than 0.05 for
the next six columns. I am a beginner in Python and matplotlib and I have
never used arrays before, so I have been stuck at this point for more than a
week. Please help!!! Any help is appreciated.
Thank you for your time and valuable suggestions.
Karthik
|
|
From: Daniel M. <dan...@go...> - 2011-06-04 12:58:03
|
Hi,
have you tried the examples that I have provided a couple days ago,
see below? I cannot see why it should not work. These are the absolute
basics that you need to understand.
Btw, there is no need to use csv2rec unless you want/need column or row headers.
Here's a full script that does what you want. Now, please take the
time and work through the example that I have provided. In case you
need further help, please don't start a new thread but reply to this
one.
Best regards,
Daniel
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import pylab
import scipy
datafile1 = 'ch1_s1_lrr.csv'
datafile2 = 'ch1_s1_baf.csv'
## create dummy data
data = pylab.rand(10000,12)
pylab.savetxt(datafile1, data, delimiter=';')
pylab.savetxt(datafile2, data, delimiter=';')
## load data and transpose
a1 = pylab.loadtxt(datafile1, comments='#', delimiter=';').T
print 'loading', datafile1
b1 = pylab.loadtxt(datafile2, comments='#', delimiter=';').T
print 'loading', datafile2
## axis limits
#v1 = [0,98760,0,1]
#v2 = [0,98760,-2,2]
v1 = [0,1]
v2 = [-2,2]
plt.close('all')
plt.figure()
plt.subplot(2,1,1)
#plt.axis(v2)
plt.ylim(v2)
#plt.plot(a1, 'r.')
for i in range(6):
    plt.plot(a1[i])
plt.subplot(2,1,2)
#plt.axis(v1)
plt.ylim(v1)
#plt.plot(b1, 'b.')
## need masked arrays here
## http://physics.nmt.edu/~raymond/software/python_notes/paper003.html
m = b1 >= 0.05
b1masked = scipy.ma.array(b1,mask=m)
## print first two cols
print b1masked[0:2]
for i in range(6,12):
    plt.plot(b1masked[i])
plt.show()
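The script above draws six lines into each of two subplots. If one subplot per column is wanted, as the question describes, a possible variation for Python 3 looks like the sketch below. It is an assumption-laden illustration, not the script attached to the thread: the data are random dummy values, the 4x3 grid, the output file name and the list `plotted` are choices made here, and `numpy.ma.masked_greater_equal` is used in place of `scipy.ma.array` to hide values that are not below 0.05.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this for an interactive window
import matplotlib.pyplot as plt

# Dummy stand-in for the real 10,000 x 12 file; replace with np.loadtxt(...).
rng = np.random.default_rng(0)
data = rng.random((10000, 12))

threshold = 0.05
fig, axes = plt.subplots(4, 3, sharex=True, figsize=(12, 10))

plotted = []  # keep what was drawn in each subplot, for inspection
for col, ax in enumerate(axes.flat):
    y = data[:, col]
    if col >= 6:
        # Columns 7 to 12: mask every value that is not below the threshold.
        y = np.ma.masked_greater_equal(y, threshold)
    ax.plot(y, ".", markersize=2)
    ax.set_title("column %d" % (col + 1))
    plotted.append(y)

fig.tight_layout()
fig.savefig("twelve_columns.png")
```

Masked points are simply skipped when drawing, so the x positions of the remaining points still correspond to their row numbers.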
|
|
From: Karthikraja V. <vel...@gm...> - 2011-06-11 08:00:15
|
Hi Daniel,
I used the code but there is a small issue. I forgot to mention that my
values are signed and unsigned decimal values.
My values look like this:
0.0023 -0.0456 0.0419 0.094 -0.0004 0.0236 -0.0237 -0.0043 -0.0718 0.0095
0.0592 -0.0417 0.0023 0.0386 -0.0023 -0.0236 -0.1045 0.098 -0.0006 0.0516
0.0463 -0.0035 -0.0442 0.1371 0.022 -0.0222 0.256 0.4903 0.0662 -0.0763
0.0064 0.1404
After running the code, "pylab.savetxt" saves the same data as something
like this:
8.205965840870644800e-01;8.034591567160346300e-01;5.493847743502982000e-01;2.581157685701491700e-01;6.409997826977161800e-01;3.719908502347885100e-01
When I tried to extract the data and print them, they looked like this
(totally different from the actual values!):
[ 0.18353712 0.30468928 0.16164556 ..., 0.98860032 0.49681098
0.77393306]
When I tried not using the "pylab.savetxt" function, it gave an error like
the one below:
ValueError: invalid literal for float(): 0.0023,-0.0456,0.0419,0.094,0.0224,0.0365
Is there a specific way to handle signed decimal numbers? If so, please
suggest some changes. I also tried using "array[]" to access individual
columns, but I get an error saying the numpy.ndarray object is not
callable.
import matplotlib.pyplot as plt
import pylab
import scipy
import numpy
datafile1 = 'vet1.csv'
data = pylab.rand(98760,6)
pylab.savetxt(datafile1, data, delimiter=';')
a1 = pylab.loadtxt(datafile1, comments='#', delimiter=';').T
print 'loading', datafile1
v1 = [0,1]
v2 = [-2,2]
plt.close('all')
plt.figure()
plt.ylim(v2)
for i in range(2):
    plt.plot(a1[i])
plt.show()
-Karthik
|
|
From: Daniel M. <dan...@go...> - 2011-06-11 08:34:48
Attachments:
karthikIN.csv
untitled0.py
|
Hi Karthik,
I cannot find any problem with your code. You are mixing modules a little
too much for my taste but it's not a technical problem. Loading and saving
the data works flawlessly here. Attached is an infile and a modified script,
please try this.
2011/6/11 Karthikraja Velmurugan <vel...@gm...>
> Hi Daniel,
> I used the code but there is small issue. I forgot to mention that my
> values are signed and unsigned decimal values.
> My values look like this
> 0.0023 -0.0456 0.0419 0.094 -0.0004 0.0236 -0.0237 -0.0043 -0.0718
> 0.0095 0.0592 -0.0417 0.0023 0.0386 -0.0023 -0.0236 -0.1045 0.098
> -0.0006 0.0516 0.0463 -0.0035 -0.0442 0.1371 0.022 -0.0222 0.256 0.4903
> 0.0662 -0.0763 0.0064 0.1404
>
> After running the code the "pylab.savetxt" saves the same data something
> like this
>
> 8.205965840870644800e-01;8.034591567160346300e-01;5.493847743502982000e-01;2.581157685701491700e-01;6.409997826977161800e-01;3.719908502347885100e-01
I assume you are confused about the many decimals. Whenever floats are
processed by Python they are real floats, see here:
http://docs.python.org/release/2.5.2/tut/node16.html
To me, it looks as if you have truncated the lines, but otherwise there is
nothing wrong...
> When I tried to extract data and print them they look like this (totally
> different from the actual values!)
>
> [ 0.18353712 0.30468928 0.16164556 ..., 0.98860032 0.49681098
> 0.77393306]
Yes, these are different numbers. But I assume you are comparing different
rows or columns?!
> When I tried not using the "pylab.savetxt" function it gives an error
> like below:
>
> ValueError: invalid literal for float():
> 0.0023,-0.0456,0.0419,0.094,0.0224,0.0365
This error message tells you that you are trying to save non-numeric data to
a file with that command. E.g. this will cause the same error:
scipy.savetxt('asdfasdf.dat', 'asdfasdf')
It is *VERY* hard to tell what you are doing since you don't provide exact
pieces of code.
> Is there a specific way to handle signed decimal number? If so please
> suggest some changes. And also I did try using the "array[]" to access
> individual columns but I get an error saying the numpy.ndarray object not
> callable.
I must ask again: have you played with the examples that I provided? You are
using the function in a wrong way (again, I can't tell for sure since you
don't provide code): in order to access the first row from a data array, you
simply use data[0], the first column is data.T[0].
> import matplotlib.pyplot as plt
> import pylab
> import scipy
> import numpy
> datafile1 = 'vet1.csv'
> data = pylab.rand(98760,6)
> pylab.savetxt(datafile1, data, delimiter=';')
> a1 = pylab.loadtxt(datafile1, comments='#', delimiter=';').T
> print 'loading', datafile1
> v1 = [0,1]
> v2 = [-2,2]
> plt.close('all')
> plt.figure()
> plt.ylim(v2)
> for i in range(2):
>     plt.plot(a1[i])
> plt.show()
>
> -Karthik
Please do provide all steps that cause problems, not just the results. It is
impossible to help you with assumptions and guesses :)
Best regards,
Daniel
|
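The thread ends without a confirmed diagnosis, so the following is only a guess at what may be going on, written as a Python 3 sketch. Two details in the quoted script and error message stand out: the `rand`/`savetxt` lines write random dummy data to the same file name that is loaded afterwards, which would replace the real measurements, and the `ValueError` text shows comma-separated values while `loadtxt` is called with `delimiter=';'`. The file name `signed_example.csv` and the second data row are invented for illustration; the first row is the one shown in the error message.

```python
import numpy as np

# Hypothetical comma-separated file with signed decimals, written here only
# so the sketch is self-contained. With real data, skip this step entirely:
# calling savetxt (or write) on the real file name overwrites the measurements.
rows = ["0.0023,-0.0456,0.0419,0.094,0.0224,0.0365",
        "-0.0004,0.0236,-0.0237,-0.0043,-0.0718,0.0095"]
with open("signed_example.csv", "w") as handle:
    handle.write("\n".join(rows) + "\n")

# A delimiter that does not match the file makes loadtxt try to parse a
# whole line as one number, which fails with a ValueError.
try:
    np.loadtxt("signed_example.csv", delimiter=";")
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# With the matching delimiter, signed values need no special handling.
a1 = np.loadtxt("signed_example.csv", delimiter=",").T  # a1[i] is column i
first_column = a1[0]  # square brackets index an array

# Round brackets would try to call the array like a function.
try:
    a1(0)
    call_error = None
except TypeError as exc:
    call_error = exc  # message says the ndarray object is not callable
```

If the long exponent notation in the saved file is unwanted, `savetxt` accepts a `fmt` argument, for example `fmt='%.4f'`, to write four decimals per value.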