From: Francesc A. <fa...@op...> - 2003-09-18 08:50:12
On Thursday 18 September 2003 01:10, you wrote:

> I'll update to the latest version and try it out. I'll let you know if I
> notice any slowness or excessive memory use. What do you mean by *lots* of
> memory? The data itself that is written out varies from 10's to 100's of
> megabytes. Is it comparable to that?

By *lots* of memory I meant that, roughly, 6 KB will be booked for each Group object, 6 KB for each Array and 12 KB for each Table. That is where most of the memory consumption goes: building the structure (i.e. the object tree) from metadata. So a tree with 1000 Groups, 2000 Arrays and 4000 Tables can account for 66 MB (add 10 MB for the python interpreter and the pytables modules).

Also, the I/O buffers for Tables grow with dataset size: to give you an idea, they can be 5 KB for small datasets (less than 100 KB) and up to 60 KB for larger ones (greater than 200 MB). However, these buffers are built dynamically (this is new in the CVS version, I forgot to mention that!), so if you don't access the actual data in a Table, this memory will not be booked. I'm pondering now whether I should release these buffers after use or keep them in memory for possible later use (it's a matter of balancing CPU against memory consumption), but I will probably release all buffers after a read or write operation, at the expense of more CPU consumption.

Array objects are not buffered (you can only read them completely or not at all), so the amount of actual data saved (whether your Array is 1 byte or 1 TB in size) is not going to affect your memory demands much, except that you will need enough memory to hold a large Array if you actually want to read its data!

> In the files, I am writing time-dependent data produced by my code. I
> write the data out as arrays and not as tables since the data size varies
> from step to step. The data that is written each step is seven arrays all
> the same size and an integer. Would it be more optimal to create a table
> for each step and write the seven arrays as elements of the table? The
> total number of steps is typically on the order of one thousand.

Well, I'm afraid your best bet would be to use Variable Length Arrays, like the ones Nicola Larosa was asking about in an earlier message, but those will take some time to be implemented.

In the meantime, if you use Tables you would reduce the number of nodes by a factor of seven. On the other hand, Tables need more memory per node than Arrays (twice as much for the object itself, and about twice as much again for the internal buffer when working with small datasets), so one can conclude that, with Tables, you would need roughly 4/7 of the memory you are using now. In addition, Tables are quite a bit more flexible than Array entities (you can do selections without loading all the data into memory, or load just parts of the dataset), so I would recommend using Tables with arrays as columns. Keep in mind, too, that Array entities do not support on-the-fly compression, while Tables do.

Another possibility with Tables is to define several Tables with two columns: one to store the actual array and the other to save its actual length. You can then set up the series of Tables with different array column lengths in such a way that your arrays fit well into one of them (I mean, without wasting too much space). For example, if your arrays range from (2, 1) to (2, 100), you can set up several Tables with array columns of shapes (2, 10), (2, 20), ..., (2, 100).
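In case it helps, here is a minimal sketch of that two-column layout. Be aware that it is written against a much newer PyTables API than the one discussed in this thread (open_file, create_table, IsDescription, the *Col classes and NumPy), so the spelling of those calls, as well as the file name, bucket width and dtype below, are assumptions you would have to adapt to your own setup:

    import numpy as np
    import tables


    class PaddedArray(tables.IsDescription):
        """One row = one array, zero-padded to the bucket width."""
        length = tables.Int32Col()                  # number of valid columns
        data   = tables.Float64Col(shape=(2, 100))  # bucket for arrays up to (2, 100)


    h5 = tables.open_file("steps.h5", mode="w")
    tbl = h5.create_table("/", "bucket_100", PaddedArray, "arrays up to (2, 100)")

    # Write: pad the actual array into the fixed-size column, record its width.
    arr = np.random.rand(2, 37)                     # one step's data, shape (2, 37)
    padded = np.zeros((2, 100))
    padded[:, :arr.shape[1]] = arr

    row = tbl.row
    row["length"] = arr.shape[1]
    row["data"] = padded
    row.append()
    tbl.flush()

    # Read back: use the length field to strip the padding off again.
    recs = tbl.read()                               # structured NumPy array
    first = recs[0]
    original = first["data"][:, :first["length"]]   # back to shape (2, 37)

    h5.close()

Since Tables do support on-the-fly compression, the zero padding in each bucket should compress away fairly well if you enable it.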
You can then save each array in the Table whose column fits it best and store its actual shape in the other field. After retrieving the arrays, you can use the length field to strip out the padding you are not interested in. I agree that this solution is a bit contrived, but if you have a large number of arrays, it may be your best choice until VLArrays are done.

Cheers,

--
Francesc Alted