From: Lionel R. <lro...@li...> - 2006-09-15 12:50:31
Hi all,

I'm trying to use a recarray built with rec.fromrecords on time-series data.
The data come from a CSV file in which each data column is followed by a
column giving the state of the data, and the first column holds the dates.
Is it possible to convert a column of strings directly to an integer column
(or a datetime column), and to remove an unused column?

Thanks,
-- 
Lionel
From: Robert K. <rob...@gm...> - 2006-09-15 13:58:22
Lionel Roubeyrie wrote:
> Hi all,
> I'm trying to use a recarray built with rec.fromrecords on time-series
> data. The data come from a CSV file in which each data column is followed
> by a column giving the state of the data, and the first column holds the
> dates. Is it possible to convert a column of strings directly to an
> integer column (or a datetime column), and to remove an unused column?

When I import CSV files into record arrays, I usually read in all of the
data and transpose the list of rows to get a list of columns. Then I can
remove columns and transform them _en masse_, usually with map().

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
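The transpose-then-map approach described above can be sketched roughly as follows. This is only an illustration, not Robert's actual code: the sample text, the `load_columns` helper, and the day/month/year date format are all assumptions, and only the standard library is used.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the layout described in the thread: a date column,
# then each value column followed by its state column.
SAMPLE = """Dates,PM10,c10,PM2.5,c2.5
05/01/2006,33,A,30,A
06/01/2006,41,A,30,A
07/01/2006,20,A,16,A
"""

def load_columns(text):
    """Parse CSV text and return (header, list of columns)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # zip(*body) transposes the list of rows into a sequence of columns.
    columns = [list(col) for col in zip(*body)]
    return header, columns

header, columns = load_columns(SAMPLE)

# Transform whole columns at once; the date format is assumed to be DD/MM/YYYY.
dates = [datetime.strptime(s, "%d/%m/%Y") for s in columns[0]]
pm10 = list(map(int, columns[1]))
pm25 = list(map(int, columns[3]))

# Dropping the state columns simply means not keeping columns[2] and columns[4].
print(dates[0], pm10, pm25)
```

Working on whole columns keeps each conversion in one place, so a different date format or an extra state column only changes one line.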
From: Francesc A. <fa...@ca...> - 2006-09-15 14:06:05
On Friday 15 September 2006 15:57, Robert Kern wrote:
> Lionel Roubeyrie wrote:
> > Hi all,
> > I'm trying to use a recarray built with rec.fromrecords on time-series
> > data. The data come from a CSV file in which each data column is
> > followed by a column giving the state of the data, and the first column
> > holds the dates. Is it possible to convert a column of strings directly
> > to an integer column (or a datetime column), and to remove an unused
> > column?
>
> When I import CSV files into record arrays, I usually read in all of the
> data and transpose the list of rows to get a list of columns. Then I can
> remove columns and transform them _en masse_, usually with map().

Another possibility is to work with the columns directly from the initial
recarray. Here is an example:

In [101]: ra=numpy.rec.array("1"*36, dtype="a4,i4,f4", shape=3)
In [102]: ra
Out[102]:
recarray([('1111', 825307441, 2.5784852031307537e-09),
       ('1111', 825307441, 2.5784852031307537e-09),
       ('1111', 825307441, 2.5784852031307537e-09)],
      dtype=[('f0', '|S4'), ('f1', '<i4'), ('f2', '<f4')])
In [103]: rb=numpy.rec.fromarrays([numpy.array(ra['f0'], 'i4'),ra['f2']],
                                  names='f0,f1')
In [104]: rb
Out[104]:
recarray([(1111, 2.5784852031307537e-09), (1111, 2.5784852031307537e-09),
       (1111, 2.5784852031307537e-09)],
      dtype=[('f0', '<i4'), ('f1', '<f4')])

Here ra is the original recarray and rb is derived from it: the first column
of rb is the first column of ra converted to integers ('i4'), and the second
is the third column of ra (so the second column of ra has been dropped from
rb).

HTH,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
From: Lionel R. <lro...@li...> - 2006-09-18 07:39:18
On Friday 15 September 2006 16:05, Francesc Altet wrote:
> Another possibility is to work with the columns directly from the initial
> recarray. Here is an example:
>
> In [101]: ra=numpy.rec.array("1"*36, dtype="a4,i4,f4", shape=3)
> In [102]: ra
> Out[102]:
> recarray([('1111', 825307441, 2.5784852031307537e-09),
>        ('1111', 825307441, 2.5784852031307537e-09),
>        ('1111', 825307441, 2.5784852031307537e-09)],
>       dtype=[('f0', '|S4'), ('f1', '<i4'), ('f2', '<f4')])
> In [103]: rb=numpy.rec.fromarrays([numpy.array(ra['f0'], 'i4'),ra['f2']],
>                                   names='f0,f1')
> In [104]: rb
> Out[104]:
> recarray([(1111, 2.5784852031307537e-09), (1111, 2.5784852031307537e-09),
>        (1111, 2.5784852031307537e-09)],
>       dtype=[('f0', '<i4'), ('f1', '<f4')])
>
> Here ra is the original recarray and rb is derived from it: the first
> column of rb is the first column of ra converted to integers ('i4'), and
> the second is the third column of ra (so the second column of ra has been
> dropped from rb).

I have a problem with that:

lionel[ETD-2006-01__PM2.5_DALTON]334>datas[0:5]
Sortie[334]:
[['Dates ', 'PM10 ', 'c10', 'PM2.5 ', 'c2.5'],
 ['05/01/2006', '33', 'A', '', 'N'],
 ['06/01/2006', '41', 'A', '30', 'A'],
 ['07/01/2006', '20', 'A', '16', 'A'],
 ['08/01/2006', '16', 'A', '13', 'A']]

lionel[ETD-2006-01__PM2.5_DALTON]335>ra=rec.array(datas[1:],formats='a10,i2,a1,i2,a1')

lionel[ETD-2006-01__PM2.5_DALTON]336>ra[0:5]
Sortie[336]:
recarray([[('05/01/2006', 0, '', 0, ''), ('33', 0, '', 0, ''),
        ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
        ('30', 0, '', 0, ''),
        ('N\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')],
       [('06/01/2006', 0, '', 0, ''), ('41', 0, '', 0, ''),
        ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
        ('30', 0, '', 0, ''),
        ('A\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')],
       [('07/01/2006', 0, '', 0, ''), ('20', 0, '', 0, ''),
        ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
        ('16', 0, '', 0, ''),
        ('A\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')],
       [('08/01/2006', 0, '', 0, ''), ('16', 0, '', 0, ''),
        ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
        ('13', 0, '', 0, ''),
        ('A\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')],
       [('09/01/2006', 0, '', 0, ''), ('18', 0, '', 0, ''),
        ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
        ('15', 0, '', 0, ''),
        ('A\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')]],
      dtype=[('f1', '|S10'), ('f2', '<i2'), ('f3', '|S1'), ('f4', '<i2'),
             ('f5', '|S1')])

I have some missing entries; is that the cause, or do I have to change
something in the date column?

Thanks,

-- 
Lionel Roubeyrie - lro...@li...
LIMAIR
http://www.limair.asso.fr
From: Francesc A. <fa...@ca...> - 2006-09-18 10:17:15
On Mon, 18 Sep 2006 at 09:38 +0200, Lionel Roubeyrie wrote:
> On Friday 15 September 2006 16:05, Francesc Altet wrote:
> > Another possibility is to work with the columns directly from the
> > initial recarray. Here is an example:
> >
> > In [101]: ra=numpy.rec.array("1"*36, dtype="a4,i4,f4", shape=3)
> > In [102]: ra
> > Out[102]:
> > recarray([('1111', 825307441, 2.5784852031307537e-09),
> >        ('1111', 825307441, 2.5784852031307537e-09),
> >        ('1111', 825307441, 2.5784852031307537e-09)],
> >       dtype=[('f0', '|S4'), ('f1', '<i4'), ('f2', '<f4')])
> > In [103]: rb=numpy.rec.fromarrays([numpy.array(ra['f0'], 'i4'),ra['f2']],
> >                                   names='f0,f1')
> > In [104]: rb
> > Out[104]:
> > recarray([(1111, 2.5784852031307537e-09), (1111, 2.5784852031307537e-09),
> >        (1111, 2.5784852031307537e-09)],
> >       dtype=[('f0', '<i4'), ('f1', '<f4')])
> >
> > Here ra is the original recarray and rb is derived from it: the first
> > column of rb is the first column of ra converted to integers ('i4'),
> > and the second is the third column of ra (so the second column of ra
> > has been dropped from rb).
>
> I have a problem with that:
>
> lionel[ETD-2006-01__PM2.5_DALTON]334>datas[0:5]
> Sortie[334]:
> [['Dates ', 'PM10 ', 'c10', 'PM2.5 ', 'c2.5'],
>  ['05/01/2006', '33', 'A', '', 'N'],
>  ['06/01/2006', '41', 'A', '30', 'A'],
>  ['07/01/2006', '20', 'A', '16', 'A'],
>  ['08/01/2006', '16', 'A', '13', 'A']]
>
> lionel[ETD-2006-01__PM2.5_DALTON]335>ra=rec.array(datas[1:],formats='a10,i2,a1,i2,a1')
>
> lionel[ETD-2006-01__PM2.5_DALTON]336>ra[0:5]
> Sortie[336]:
> recarray([[('05/01/2006', 0, '', 0, ''), ('33', 0, '', 0, ''),
>         ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
>         ('30', 0, '', 0, ''),
>         ('N\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')],
>        [('06/01/2006', 0, '', 0, ''), ('41', 0, '', 0, ''),
>         ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
>         ('30', 0, '', 0, ''),
>         ('A\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')],
>        [('07/01/2006', 0, '', 0, ''), ('20', 0, '', 0, ''),
>         ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
>         ('16', 0, '', 0, ''),
>         ('A\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')],
>        [('08/01/2006', 0, '', 0, ''), ('16', 0, '', 0, ''),
>         ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
>         ('13', 0, '', 0, ''),
>         ('A\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')],
>        [('09/01/2006', 0, '', 0, ''), ('18', 0, '', 0, ''),
>         ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
>         ('15', 0, '', 0, ''),
>         ('A\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')]],
>       dtype=[('f1', '|S10'), ('f2', '<i2'), ('f3', '|S1'), ('f4', '<i2'),
>              ('f5', '|S1')])
>
> I have some missing entries; is that the cause, or do I have to change
> something in the date column?

You have two problems here. The first is that you shouldn't have missing
entries, or the conversion from empty strings to ints (or whatever) will
fail:

>>> int('')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: invalid literal for int():

Second, you can't feed a list of string literals directly into the
rec.array constructor (it isn't smart enough to digest this yet). You can
achieve what you want by first massaging the data a bit:

>>> ra=numpy.rec.array(datas[1:])
>>> numpy.rec.fromarrays([ra['f1'],ra['f2'],ra['f3'],ra['f4'],ra['f5']],formats='a10,i2,a1,i2,a1')
recarray([('05/01/2006', 33, 'A', 0, 'N'), ('06/01/2006', 41, 'A', 30, 'A'),
       ('07/01/2006', 20, 'A', 16, 'A'), ('08/01/2006', 16, 'A', 13, 'A')],
      dtype=[('f1', '|S10'), ('f2', '<i2'), ('f3', '|S1'), ('f4', '<i2'),
             ('f5', '|S1')])

or, a bit more simply,

>>> ca=numpy.array(datas[1:])
>>> numpy.rec.fromarrays(ca.transpose(),formats='a10,i2,a1,i2,a1')
recarray([('05/01/2006', 33, 'A', 0, 'N'), ('06/01/2006', 41, 'A', 30, 'A'),
       ('07/01/2006', 20, 'A', 16, 'A'), ('08/01/2006', 16, 'A', 13, 'A')],
      dtype=[('f1', '|S10'), ('f2', '<i2'), ('f3', '|S1'), ('f4', '<i2'),
             ('f5', '|S1')])

Cheers,

-- 
Francesc Altet
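One way to deal with the missing entries before building the record array is to convert each column explicitly and map blank cells to a sentinel. The sketch below is only an illustration under stated assumptions: the `MISSING` sentinel value, the `to_int` helper, and the field names are hypothetical, and it targets a current NumPy on Python 3 (hence the 'U' string dtypes) rather than the release discussed in this thread.

```python
import numpy as np

# Rows as in the message above, including the empty PM2.5 entry.
datas = [
    ['05/01/2006', '33', 'A', '', 'N'],
    ['06/01/2006', '41', 'A', '30', 'A'],
    ['07/01/2006', '20', 'A', '16', 'A'],
    ['08/01/2006', '16', 'A', '13', 'A'],
]

MISSING = -1  # hypothetical sentinel for blank cells (an assumption)

def to_int(cell, missing=MISSING):
    """Convert a CSV cell to int, mapping blank cells to the sentinel."""
    cell = cell.strip()
    return int(cell) if cell else missing

# Transpose the rows into columns, then convert each column explicitly.
dates, pm10, c10, pm25, c25 = zip(*datas)

rec = np.rec.fromarrays(
    [
        np.array(dates, dtype='U10'),
        np.array([to_int(v) for v in pm10], dtype='i2'),
        np.array(c10, dtype='U1'),
        np.array([to_int(v) for v in pm25], dtype='i2'),
        np.array(c25, dtype='U1'),
    ],
    names='date,pm10,c10,pm25,c25',  # hypothetical field names
)

print(rec['pm25'])
```

Doing the string-to-integer conversion in plain Python first means a blank cell never reaches int(), and the choice of sentinel stays visible in one place.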
From: Lionel R. <lro...@li...> - 2006-09-18 15:10:56
On Monday 18 September 2006 12:17, Francesc Altet wrote:
> You have two problems here. The first is that you shouldn't have missing
> entries, or the conversion from empty strings to ints (or whatever) will
> fail:
>
> >>> int('')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> ValueError: invalid literal for int():
>
> Second, you can't feed a list of string literals directly into the
> rec.array constructor (it isn't smart enough to digest this yet). You can
> achieve what you want by first massaging the data a bit:
>
> >>> ra=numpy.rec.array(datas[1:])
> >>> numpy.rec.fromarrays([ra['f1'],ra['f2'],ra['f3'],ra['f4'],ra['f5']],formats='a10,i2,a1,i2,a1')
> recarray([('05/01/2006', 33, 'A', 0, 'N'), ('06/01/2006', 41, 'A', 30, 'A'),
>        ('07/01/2006', 20, 'A', 16, 'A'), ('08/01/2006', 16, 'A', 13, 'A')],
>       dtype=[('f1', '|S10'), ('f2', '<i2'), ('f3', '|S1'), ('f4', '<i2'),
>              ('f5', '|S1')])
>
> or, a bit more simply,
>
> >>> ca=numpy.array(datas[1:])
> >>> numpy.rec.fromarrays(ca.transpose(),formats='a10,i2,a1,i2,a1')
> recarray([('05/01/2006', 33, 'A', 0, 'N'), ('06/01/2006', 41, 'A', 30, 'A'),
>        ('07/01/2006', 20, 'A', 16, 'A'), ('08/01/2006', 16, 'A', 13, 'A')],
>       dtype=[('f1', '|S10'), ('f2', '<i2'), ('f3', '|S1'), ('f4', '<i2'),
>              ('f5', '|S1')])
>
> Cheers,

Hi,
thanks for your help, but I don't understand why it is not working here:

lionel[ETD-2006-01__PM2.5_DALTON]624>datas
Sortie[624]:
[['05/01/2006', '33', 'A', '10', 'N'],
 ['06/01/2006', '41', 'A', '30', 'A'],
 ['07/01/2006', '20', 'A', '16', 'A']]

lionel[ETD-2006-01__PM2.5_DALTON]625>ra=rec.array(datas)

lionel[ETD-2006-01__PM2.5_DALTON]626>ra
Sortie[626]:
recarray([('05/01/2006', '33', 'A', '10', 'N'),
       ('06/01/2006', '41', 'A', '30', 'A'),
       ('07/01/2006', '20', 'A', '16', 'A')],
      dtype=[('f1', '|S10'), ('f2', '|S2'), ('f3', '|S1'), ('f4', '|S2'),
             ('f5', '|S1')])

lionel[ETD-2006-01__PM2.5_DALTON]627>rec.fromarrays( [ra['f1'], ra['f2'],
ra['f4']], formats='a10,i2,i2')
---------------------------------------------------------------------------
exceptions.TypeError                 Traceback (most recent call last)

/home/lionel/Etudes_Techniques/ETD-2006-01__PM2.5_DALTON/<ipython console>

/usr/lib/python2.4/site-packages/numpy/core/records.py in
fromarrays(arrayList, formats, names, titles, shape, aligned)
    235     # populate the record array (makes a copy)
    236     for i in range(len(arrayList)):
--> 237         _array[_names[i]] = arrayList[i]
    238
    239     return _array

TypeError: array cannot be safely cast to required type

-- 
Lionel Roubeyrie
From: Francesc A. <fa...@ca...> - 2006-09-18 15:40:53
On Mon, 18 Sep 2006 at 17:10 +0200, Lionel Roubeyrie wrote:
> On Monday 18 September 2006 12:17, Francesc Altet wrote:
> > You have two problems here. The first is that you shouldn't have
> > missing entries, or the conversion from empty strings to ints (or
> > whatever) will fail:
> >
> > >>> int('')
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > ValueError: invalid literal for int():
> >
> > Second, you can't feed a list of string literals directly into the
> > rec.array constructor (it isn't smart enough to digest this yet). You
> > can achieve what you want by first massaging the data a bit:
> >
> > >>> ra=numpy.rec.array(datas[1:])
> > >>> numpy.rec.fromarrays([ra['f1'],ra['f2'],ra['f3'],ra['f4'],ra['f5']],formats='a10,i2,a1,i2,a1')
> > recarray([('05/01/2006', 33, 'A', 0, 'N'), ('06/01/2006', 41, 'A', 30, 'A'),
> >        ('07/01/2006', 20, 'A', 16, 'A'), ('08/01/2006', 16, 'A', 13, 'A')],
> >       dtype=[('f1', '|S10'), ('f2', '<i2'), ('f3', '|S1'), ('f4', '<i2'),
> >              ('f5', '|S1')])
> >
> > or, a bit more simply,
> >
> > >>> ca=numpy.array(datas[1:])
> > >>> numpy.rec.fromarrays(ca.transpose(),formats='a10,i2,a1,i2,a1')
> > recarray([('05/01/2006', 33, 'A', 0, 'N'), ('06/01/2006', 41, 'A', 30, 'A'),
> >        ('07/01/2006', 20, 'A', 16, 'A'), ('08/01/2006', 16, 'A', 13, 'A')],
> >       dtype=[('f1', '|S10'), ('f2', '<i2'), ('f3', '|S1'), ('f4', '<i2'),
> >              ('f5', '|S1')])
> >
> > Cheers,
>
> Hi,
> thanks for your help, but I don't understand why it is not working here:
>
> lionel[ETD-2006-01__PM2.5_DALTON]624>datas
> Sortie[624]:
> [['05/01/2006', '33', 'A', '10', 'N'],
>  ['06/01/2006', '41', 'A', '30', 'A'],
>  ['07/01/2006', '20', 'A', '16', 'A']]
>
> lionel[ETD-2006-01__PM2.5_DALTON]625>ra=rec.array(datas)
>
> lionel[ETD-2006-01__PM2.5_DALTON]626>ra
> Sortie[626]:
> recarray([('05/01/2006', '33', 'A', '10', 'N'),
>        ('06/01/2006', '41', 'A', '30', 'A'),
>        ('07/01/2006', '20', 'A', '16', 'A')],
>       dtype=[('f1', '|S10'), ('f2', '|S2'), ('f3', '|S1'), ('f4', '|S2'),
>              ('f5', '|S1')])
>
> lionel[ETD-2006-01__PM2.5_DALTON]627>rec.fromarrays( [ra['f1'], ra['f2'],
> ra['f4']], formats='a10,i2,i2')
> ---------------------------------------------------------------------------
> exceptions.TypeError                 Traceback (most recent call last)

Mmm, this works for me:

>>> datas
[['05/01/2006', '33', 'A', '0', 'N'], ['06/01/2006', '41', 'A', '30', 'A'],
 ['07/01/2006', '20', 'A', '16', 'A'], ['08/01/2006', '16', 'A', '13', 'A']]
>>> ra=rec.array(datas)
>>> rec.fromarrays([ra['f1'],ra['f2'],ra['f4']], formats='a10,i2,i2')
recarray([('05/01/2006', 33, 0), ('06/01/2006', 41, 30),
       ('07/01/2006', 20, 16), ('08/01/2006', 16, 13)],
      dtype=[('f1', '|S10'), ('f2', '<i2'), ('f3', '<i2')])

I'm running NumPy 1.0b5. Please check that you are using a recent version
of it.

Cheers,

-- 
Francesc Altet
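Checking which NumPy release is installed is a one-liner. The snippet below is a minimal sketch; the `installed` variable name is only illustrative.

```python
import numpy

# Report the installed NumPy version, so that results from two machines
# can be compared against the same release.
installed = numpy.__version__
print("NumPy version:", installed)
```

Comparing this string on both machines is a quick way to rule out a version mismatch before digging into the data itself.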
From: Lionel R. <lro...@li...> - 2006-09-18 15:47:51
On Monday 18 September 2006 17:40, Francesc Altet wrote:
> I'm running NumPy 1.0b5. Please check that you are using a recent version
> of it.
>
> Cheers,

Argh, sorry, the version here was 0.9; after an upgrade it works fine.
Thanks again.

-- 
Lionel Roubeyrie