From: Francesc A. <fa...@op...> - 2003-07-16 16:11:04
On Wednesday, 16 July 2003 17:42, you wrote:
> The loading of the object tree is dependent on data access, not on
> metadata access.
>
> For example, let's say we have a new function named hdf_load which
> understands caching.
>
> a = hdf_load(file)
>
> At this point, you have not loaded anything except for the cache in the
> hdf file, which may just be a simple python dictionary. The object
> structure is not there. With the cache in place, you do know all of the
> groups. You just do not have access to the data.
>
> So, when I do a:
>
> for i in a.keys():
>
> I am looking through the cache of group names (groups represent keys).
> As soon as I do the a[i], then I load the object tree for a[i] and pull
> out the table.
> The approach is lazy: I only load the part of the object tree that is
> actually needed, and only when the data is accessed.
>
> for i in a.keys():
>     # i equals fred, barney, wilma ... betty
>     print a[i]
>     # print returns [1,2,3,4,5,6] for a['fred']
>     # to get to the data, I load the object tree for a['fred']
>
> Clearer?

I think so. It's a great idea! The only thing is that you should not
worry about the cache right now (especially if it is only a few percent
better than the current code). I think it would be better to start with
the lazy implementation, without the additional cache complications:
if the cache finally can't accelerate things, we can include the new
code without further work, and if the cache does end up speeding things
up significantly, we can always add it later on. I think it's always
better to factor things out and add them bit by bit, after they have
been completely tested.

BTW, I'm sending a copy of some messages to the pytables users' list.
Maybe somebody wants to contribute with fresh ideas.

Cheers,

--
Francesc Alted
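For anyone on the list following along, here is a minimal, self-contained sketch of the lazy-loading idea being discussed. The `LazyHDF` class and its attributes are hypothetical stand-ins for the proposed `hdf_load(file)`; nothing here touches a real HDF5 file, and the "expensive" tree walk is simulated.

```python
class LazyHDF:
    """Dict-like view over a file: keys come from a cheap metadata
    cache; the object tree for a group is built only on first access."""

    def __init__(self, groups):
        # The cache: just the group names, read up front (cheap).
        self._cache = list(groups)   # e.g. ['fred', 'barney', ...]
        self._loaded = {}            # object trees built so far
        self.loads = 0               # counts expensive loads (for demo)

    def keys(self):
        # Iterating keys only consults the cache; no tree is loaded.
        return list(self._cache)

    def _load_tree(self, name):
        # Stand-in for the expensive step: walking the object tree
        # for one group and pulling out its table.
        self.loads += 1
        return [1, 2, 3, 4, 5, 6]    # fake table data

    def __getitem__(self, name):
        # a[i] triggers the lazy load, once per group.
        if name not in self._loaded:
            self._loaded[name] = self._load_tree(name)
        return self._loaded[name]


a = LazyHDF(['fred', 'barney', 'wilma', 'betty'])
print(a.keys())    # cache only: no object trees loaded yet
print(a['fred'])   # first access builds fred's tree and returns the table
```

The point of the demo is the `loads` counter: after `a.keys()` it is still 0, and repeated access to the same group loads its tree only once.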