From: Alvaro T. C. <al...@mi...> - 2012-12-07 19:22:56
Thanks Francesc, that solved it. Being able to load the on-disk data structures into memory while they are still compressed can make or break things when you have 50 GB+ datasets to process every day!

The carray Google group (I had not noticed it before) seems to be unreachable at the moment, so I am reporting a problem here for now. With the following code:

    ct0 = ca.ctable((h5f.root.c_000[:],), names=('c_000',),
                    rootdir=u'/lfpd1/tmp/ctable-1', mode='w',
                    cparams=ca.cparams(5), dtype='u2',
                    expectedlen=len(h5f.root.c_000))

    for k in h5f.root._v_children.keys()[:3]:  # just some of the HDF5 datasets
        try:
            col = getattr(h5f.root, k)
            ct0.addcol(col[:], name=k, expectedlen=len(col), dtype='u2')
        except ValueError:
            pass  # column already exists

    ct0.flush()

I get:

    >>> ct0
    ctable((303390000,), [('c_000', '<u2'), ('c_007', '<u2'), ('c_006', '<u2'), ('c_005', '<u2')])
      nbytes: 2.26 GB; cbytes: 1.30 GB; ratio: 1.73
      cparams := cparams(clevel=5, shuffle=True)
      rootdir := '/lfpd1/tmp/ctable-1'
    [(312, 37, 65432, 91) (313, 32, 65439, 65) (320, 24, 65433, 66) ...,
     (283, 597, 677, 647) (276, 600, 649, 635) (298, 607, 635, 620)]

The newly added columns exist in memory:

    >>> ct0['c_007']
    carray((303390000,), uint16)
      nbytes: 578.67 MB; cbytes: 333.50 MB; ratio: 1.74
      cparams := cparams(clevel=5, shuffle=True)
    [ 37  32  24 ..., 597 600 607]

but they do not appear in the rootdir, not even after .flush():

    /lfpd1/tmp/ctable-1]$ ls
    __attrs__  c_000  __rootdirs__

and something seems amiss with __rootdirs__, where the added columns have a null directory:

    /lfpd1/tmp/ctable-1]$ cat __rootdirs__
    {"dirs": {"c_007": null, "c_006": null, "c_005": null, "c_000": "/lfpd1/tmp/ctable-1/c_000"}, "names": ["c_000", "c_007", "c_006", "c_005"]}

The compressed size reported in memory:

    >>> ct0.cbytes//1024**2
    1334

vs. the size on disk:

    /lfpd1/tmp]$ du -h ctable-1
    12K     ctable-1/c_000/meta
    340M    ctable-1/c_000/data
    340M    ctable-1/c_000
    340M    ctable-1

and, finally, the ctable cannot be reopened with open():

    ct0_disk = ca.open(rootdir='/lfpd1/tmp/ctable-1', mode='r')
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    /home/tejero/Dropbox/O/nb/nonridge/<ipython-input-26-41e1cb01ffe6> in <module>()
    ----> 1 ct0_disk = ca.open(rootdir='/lfpd1/tmp/ctable-1', mode='r')

    /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/toplevel.pyc in open(rootdir, mode)
        104     # Not a carray.  Now with a ctable
        105     try:
    --> 106         obj = ca.ctable(rootdir=rootdir, mode=mode)
        107     except IOError:
        108         # Not a ctable

    /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in __init__(self, columns, names, **kwargs)
        193             _new = True
        194         else:
    --> 195             self.open_ctable()
        196             _new = False
        197

    /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in open_ctable(self)
        282
        283         # Open the ctable by reading the metadata
    --> 284         self.cols.read_meta_and_open()
        285
        286         # Get the length out of the first column

    /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in read_meta_and_open(self)
         40         # Initialize the cols by instatiating the carrays
         41         for name, dir_ in data['dirs'].items():
    ---> 42             self._cols[str(name)] = ca.carray(rootdir=dir_, mode=self.mode)
         43
         44     def update_meta(self):

    /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so in carray.carrayExtension.carray.__cinit__ (carray/carrayExtension.c:8637)()

    ValueError: You need at least to pass an array or/and a rootdir

-á.

On 7 December 2012 17:04, Francesc Alted <fa...@gm...> wrote:
> Hmm, perhaps cythonizing by hand is your best bet:
>
> $ cython carray/carrayExtension.pyx
>
> If you continue having problems, please write to the carray mailing list.
>
> Francesc
>
> On 12/7/12 5:29 PM, Alvaro Tejero Cantero wrote:
> > I now have similar dependencies to yours, except for NumPy 1.7 beta 2.
> >
> > I wish I could help with the carray flavor.
> >
> > --
> > Running setup.py install for carray
> > * Found Cython 0.17.2 package installed.
> > * Found numpy 1.6.2 package installed.
> > * Found numexpr 2.0.1 package installed.
> > building 'carray.carrayExtension' extension
> > C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall
> > -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
> > --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC
> > -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> > -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
> > -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC
> > compile options: '-Iblosc
> > -I/home/tejero/Local/Envs/test/lib/python2.7/site-packages/numpy/core/include
> > -I/usr/include/python2.7 -c'
> > extra options: '-msse2'
> > gcc: blosc/blosclz.c
> > gcc: carray/carrayExtension.c
> > gcc: error: carray/carrayExtension.c: No such file or directory
> > gcc: fatal error: no input files
> > compilation terminated.
> > gcc: error: carray/carrayExtension.c: No such file or directory
> > gcc: fatal error: no input files
> > compilation terminated.
> > error: Command "gcc -pthread -fno-strict-aliasing -O2 -g -pipe
> > -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
> > --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC
> > -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> > -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
> > -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Iblosc
> > -I/home/tejero/Local/Envs/test/lib/python2.7/site-packages/numpy/core/include
> > -I/usr/include/python2.7 -c carray/carrayExtension.c -o
> > build/temp.linux-x86_64-2.7/carray/carrayExtension.o -msse2" failed
> > with exit status 4
> >
> > -á.
> >
> > On 7 December 2012 12:47, Francesc Alted <fa...@gm...
> > <mailto:fa...@gm...>> wrote:
> >
> > On 12/6/12 1:42 PM, Alvaro Tejero Cantero wrote:
> > > Thank you for the comprehensive round-up. I have some ideas and
> > > reports below.
> > >
> > > What about ctables?
> > > The documentation says that it is specifically
> > > column-access optimized, which is what I need in this scenario
> > > (sometimes sequential, sometimes random).
> >
> > Yes, ctables is optimized for column access.
> >
> > >
> > > Unfortunately I could not get the rootdir parameter for ctables
> > > __init__ to work in carray 0.4, and pip-installing 0.5 or 0.5.1 leads
> > > to compilation errors.
> >
> > Yep, persistence for carray/ctables objects was added in 0.5.
> >
> > >
> > > This is the ctables-to-disk error:
> > >
> > > ct2 = ca.ctable((np.arange(30000000),), names=('range2',),
> > >                 rootdir='/tmp/ctable2.ctable')
> > >
> > > ---------------------------------------------------------------------------
> > > TypeError                                 Traceback (most recent call last)
> > > /home/tejero/Dropbox/O/nb/nonridge/<ipython-input-29-255842877a0b> in <module>()
> > > ----> 1 ct2 = ca.ctable((np.arange(30000000),), names=('range2',), rootdir='/tmp/ctable2.ctable')
> > >
> > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in __init__(self, cols, names, **kwargs)
> > >     158                 if column.dtype == np.void:
> > >     159                     raise ValueError, "`cols` elements cannot be of type void"
> > > --> 160                 column = ca.carray(column, **kwargs)
> > >     161             elif ratype:
> > >     162                 column = ca.carray(cols[name], **kwargs)
> > >
> > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so in carray.carrayExtension.carray.__cinit__ (carray/carrayExtension.c:3917)()
> > >
> > > TypeError: __cinit__() got an unexpected keyword argument 'rootdir'
> > >
> > >
> > > And this is cut from the pip output when trying to upgrade carray:
> > >
> > > gcc: carray/carrayExtension.c
> > >
> > > gcc: error: carray/carrayExtension.c: No such file or directory
> >
> > Hmm, that's strange, because the carrayExtension should have been
> > cythonized automatically.
> > Here is part of my install process with pip:
> >
> > Running setup.py install for carray
> > * Found Cython 0.17.1 package installed.
> > * Found numpy 1.7.0b2 package installed.
> > * Found numexpr 2.0.1 package installed.
> > cythoning carray/carrayExtension.pyx to carray/carrayExtension.c
> > building 'carray.carrayExtension' extension
> > C compiler: gcc -fno-strict-aliasing
> > -I/Users/faltet/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3
> > -Wall -Wstrict-prototypes
> >
> > Hmm, perhaps you need a newer version of Cython?
> >
> > >
> > > Two more notes:
> > >
> > > * a way was added to check on-disk (compressed) vs. in-memory
> > > (uncompressed) node sizes. I was unable to find out how to use it,
> > > either from the 2.4.0 release notes or from the git issue
> > >
> > > https://github.com/PyTables/PyTables/issues/141#issuecomment-5018763
> >
> > You already found the answer.
> >
> > >
> > > * is/will it be possible to load PyTables carrays as in-memory carrays
> > > without decompression?
> >
> > Actually, that has been my idea from the very beginning. The concept of
> > 'flavor' for the returned objects when reading is already there, so it
> > should be relatively easy to add a new 'carray' flavor. Maybe you can
> > contribute this?
> >
> > --
> > Francesc Alted
> >
> > _______________________________________________
> > Pytables-users mailing list
> > Pyt...@li...
> > <mailto:Pyt...@li...>
> > https://lists.sourceforge.net/lists/listinfo/pytables-users
>
> --
> Francesc Alted
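For anyone who runs into the same ValueError: the __rootdirs__ contents above seem to account for it. Judging from the traceback, read_meta_and_open() iterates over data['dirs'] and calls ca.carray(rootdir=dir_, mode=self.mode) for every column, so the three columns that addcol() recorded as null end up as a carray with neither an array nor a rootdir. The sizes point the same way: 303390000 uint16 rows are 606780000 bytes (578.67 MB) per column, and the 340M that du reports is about one compressed column (c_007 alone is 333.50 MB), not the 1334 MB of the whole ctable. Below is a minimal stdlib-only diagnostic sketch, written for Python 3. The helpers unpersisted_columns(), unpersisted_columns_in() and expected_nbytes() are hypothetical, not part of carray, and the parser assumes the __rootdirs__ layout shown above. It only detects the problem; it does not fix addcol().

```python
import json
import os


def unpersisted_columns(rootdirs_text):
    """Return column names whose directory is null (or missing) in __rootdirs__.

    Assumes the layout shown above (hypothetical helper, not carray API):
    {"dirs": {name: path or null, ...}, "names": [name, ...]}
    """
    meta = json.loads(rootdirs_text)  # JSON null becomes Python None
    dirs = meta["dirs"]
    # Report columns in the order recorded under "names"
    return [name for name in meta["names"] if dirs.get(name) is None]


def unpersisted_columns_in(rootdir):
    """Read <rootdir>/__rootdirs__ and apply unpersisted_columns() to it."""
    with open(os.path.join(rootdir, "__rootdirs__")) as f:
        return unpersisted_columns(f.read())


def expected_nbytes(nrows, itemsize, ncols=1):
    """Uncompressed size in bytes of ncols columns with nrows items each."""
    return nrows * itemsize * ncols


if __name__ == "__main__":
    # The __rootdirs__ contents reported above
    sample = ('{"dirs": {"c_007": null, "c_006": null, "c_005": null, '
              '"c_000": "/lfpd1/tmp/ctable-1/c_000"}, '
              '"names": ["c_000", "c_007", "c_006", "c_005"]}')
    print(unpersisted_columns(sample))  # ['c_007', 'c_006', 'c_005']

    # uint16 is 2 bytes per row; these match the nbytes figures above
    # when MB = 1024**2 bytes and GB = 1024**3 bytes
    print(round(expected_nbytes(303390000, 2) / 1024**2, 2))     # 578.67
    print(round(expected_nbytes(303390000, 2, 4) / 1024**3, 2))  # 2.26
```

On the real directory, unpersisted_columns_in('/lfpd1/tmp/ctable-1') should return the same three names.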