From: Seref A. <ser...@gm...> - 2013-06-04 17:30:24
I think I've seen this in the release notes of 3.0. This is actually something that I'm looking into as well, so any experience/feedback about creating files in memory would be much appreciated.

Best regards,
Seref

On Tue, Jun 4, 2013 at 2:09 PM, Andreas Hilboll <li...@hi...> wrote:

> On 04.06.2013 05:35, Tim Burgess wrote:
> > My thoughts are:
> >
> > - Try it without any compression. Assuming 32-bit floats, your monthly
> > 5760 x 2880 array is only about 65 MB. Uncompressed data may perform well,
> > and at the least it will give you a baseline to work from - and will help
> > if you are investigating IO tuning.
> >
> > - I have found with CArray that the auto chunksize works fairly well.
> > Experiment with that chunksize and with some chunksizes that you think
> > are more appropriate (maybe temporal rather than spatial in your case).
> >
> > On Jun 03, 2013, at 10:45 PM, Andreas Hilboll <li...@hi...> wrote:
> >
> >> On 03.06.2013 14:43, Andreas Hilboll wrote:
> >> > Hi,
> >> >
> >> > I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
> >> > (the last dimension represents time, and once per month there'll be one
> >> > more 5760 x 2880 array to add to the end).
> >> >
> >> > Now, extracting timeseries at one index location is slow; e.g., for four
> >> > indices, it takes several seconds:
> >> >
> >> > In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
> >> >
> >> > In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
> >> > CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
> >> > Wall time: 7.17 s
> >> >
> >> > I have the feeling that this performance could be improved, but I'm not
> >> > sure about how to properly use the `chunkshape` parameter in my case.
> >> >
> >> > Any help is greatly appreciated :)
> >> >
> >> > Cheers, Andreas.
> >>
> >> PS: If I could get significant performance gains by not using an EArray
> >> and therefore re-creating the whole database each month, then this would
> >> also be an option.
> >>
> >> -- Andreas.
>
> Thanks a lot, Anthony and Tim! I was able to get the readout time down
> considerably by using chunkshape=(32, 32, 256) for my 5760x2880x150 array.
> Now, reading times are about as fast as I expected.
>
> The downside is that building up the database now takes a lot of time,
> because I get the data in chunks of 5760x2880x1. So I guess that writing
> the data to disk like this causes a load of IO operations ...
>
> My new question: Is there a way to create a file in-memory? If possible,
> I could then build up my database in-memory and then, once it's done,
> just copy the arrays to an on-disk file. Is that possible? If so, how?
>
> Thanks a lot for your help!
>
> -- Andreas.
>
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. A cloud service to automate IT design, transition and operations
> 2. Dashboards that offer high-level views of enterprise services
> 3. A single system of record for all IT processes
> http://p.sf.net/sfu/servicenow-d2d-j
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
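
To make the chunkshape discussion above concrete, here is a minimal sketch assuming the PyTables 3.0 API; the file name, node name, and filter settings are illustrative, not taken from the thread. It creates an EArray with an extendable time axis and the chunkshape=(32, 32, 256) that Andreas reported, appends one monthly 5760 x 2880 slab, and reads a per-pixel timeseries:

    import numpy as np
    import tables

    f = tables.open_file("data.h5", mode="w")
    arr = f.create_earray(
        f.root, "data",
        atom=tables.Float32Atom(),
        shape=(5760, 2880, 0),        # 0 marks the extendable (time) axis
        chunkshape=(32, 32, 256),     # small spatial footprint, long time extent
        filters=tables.Filters(complevel=1, complib="blosc"),  # optional; try uncompressed first
    )

    # Append one monthly 5760 x 2880 slab (placeholder data here).
    month = np.zeros((5760, 2880, 1), dtype=np.float32)
    arr.append(month)

    # Timeseries at a single (i, j) location; only the chunks that
    # intersect that column need to be read.
    ts = arr[5000, 1000, :]
    f.close()

The write slowdown Andreas describes follows from this layout: each monthly 5760 x 2880 x 1 append touches (5760/32) x (2880/32) = 16200 chunks, so faster reads along the time axis are paid for with more IO per append.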
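
On the in-memory question, the 3.0 feature Seref mentions is HDF5 driver selection in open_file. A sketch, assuming the H5FD_CORE driver options documented for PyTables 3.0 (file names are illustrative):

    import tables

    # CORE driver with the backing store disabled: the whole file lives
    # in memory and nothing is written to disk while it is being built.
    mem = tables.open_file("scratch.h5", mode="w",
                           driver="H5FD_CORE",
                           driver_core_backing_store=0)

    # ... create and fill the arrays under mem.root ...

    # When finished, copy everything into a real on-disk file.
    disk = tables.open_file("monthly.h5", mode="w")
    mem.copy_children(mem.root, disk.root, recursive=True)
    disk.close()
    mem.close()

If the driver documentation is to be trusted, leaving driver_core_backing_store at its default instead makes the CORE driver write the in-memory image to the named file when it is closed, which would avoid the explicit copy step.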