From: Stef B. <st...@gm...> - 2012-10-30 14:06:14
|
Hi, I'm looking for a way to index entries to: a. enable lookup based upon parent ino and name. Much functions like lookup and mknod use the parent ino and the name, so a lookup function based upon parent ino and name is required here. b. get the directory entries given the parent ino. A readdir with no backend (pure virtual) requires a function to get the entries based upon ino (and entry->parent==parent). Do any of you have some idea to do this? I'm thinking about an hash table: struct entry_struct **name_hash_table and name_hash_table=calloc(maxsize, sizeof(struct entry_struct *)) Now every entry should have a field like: entry->nameindex_next entry->nameindex_prev of type entry_struct So ino % maxsize points to a ordered double linked list of entries. Now it's possible to search through this list using the search "in the middle" and compare the entry found. If bigger, take the upperhalf, and lower take the lower half and so on. (there is a name for it I do not know anymore..) Now this will work, but is silly. You have to compare names over and over again. For example: let's take an ordered list where the first name starts with a a, and the latest name with a z. Now I've got an name starting with a c. So when doing a lookup using the above algoritm, it will look in the middle between a and z, while of course it's much better to get a value much closer to a than to z: a + 26 * (c-a) / (z -a ) when you expect the data is ordered random. Now here only the first letter is used. Much better would be the first 3 or 4 letters, not all I think, I don't know. Does anyone have some experience or advice? Stef Bon |
From: Stef B. <st...@gm...> - 2012-10-30 14:23:45
|
2012/10/30 Stef Bon <st...@gm...>: > Hi, > > Now here only the first letter is used. Much better would be the first > 3 or 4 letters, not all I think, I don't know. > > Does anyone have some experience or advice? Oh first I see already one thing: the index cannot be made like this, it has to be a memory block made of one part. That's the first issue. So fields like nameindex_next and nameindex_prev do not work. Better is seperate double linked list: struct nameindex_struct { uint64_t namevalue; struct entry_struct *entry; } But then for every entry the index should be of fixed size, not too small. normally a directory contains 50 entries average, so this will result in: 50 * sizeof(struct nameindex_struct) Is this doable? Stef |
From: Roger W. <ro...@fi...> - 2012-10-30 14:31:19
|
On 30 Oct 2012, at 14:06, Stef Bon wrote: > Hi, > > I'm looking for a way to index entries to: > > a. enable lookup based upon parent ino and name. > > Much functions like lookup and mknod use the parent ino and the name, > so a lookup function based upon parent ino and name is required here. > > b. get the directory entries given the parent ino. > > A readdir with no backend (pure virtual) requires a function to get > the entries based upon ino (and entry->parent==parent). > > Do any of you have some idea to do this? > I'm thinking about an hash table: > > struct entry_struct **name_hash_table > > and > > name_hash_table=calloc(maxsize, sizeof(struct entry_struct *)) > > Now every entry should have a field like: > > entry->nameindex_next > entry->nameindex_prev > > of type entry_struct > > So > > ino % maxsize > > points to a ordered double linked list of entries. > > Now it's possible to search through this list using the search "in the > middle" and compare the entry found. > If bigger, take the upperhalf, and lower take the lower half and so on. > (there is a name for it I do not know anymore..) > > Now this will work, but is silly. You have to compare names over and > over again. For example: > > let's take an ordered list where the first name starts with a a, and > the latest name with a z. > Now I've got an name starting with a c. So when doing a lookup using > the above algoritm, it will look > in the middle between a and z, while of course it's much better to get > a value much closer > to a than to z: > > a + 26 * (c-a) / (z -a ) > > when you expect the data is ordered random. > > Now here only the first letter is used. Much better would be the first > 3 or 4 letters, not all I think, I don't know. > > Does anyone have some experience or advice? > > Stef Bon 'a + 26 * (c-a) / (z -a )' doesn't do the right thing if all your filenames begin with 'a'. You probably want something like a red-black tree indexed by filename. If it were me, I'd use an stl::map and not worry about efficiency unless and until it turned out to be an issue. -- Roger |
From: Stef B. <st...@gm...> - 2012-10-31 11:18:31
|
2012/10/30 Roger Willcocks <ro...@fi...>: > >> >> a + 26 * (c-a) / (z -a ) >> >> when you expect the data is ordered random. >> >> Now here only the first letter is used. Much better would be the first >> 3 or 4 letters, not all I think, I don't know. >> >> Does anyone have some experience or advice? >> >> Stef Bon > > > 'a + 26 * (c-a) / (z -a )' doesn't do the right thing if all your filenames begin with 'a'. You probably want something like a red-black tree indexed by filename. If it were me, I'd use an stl::map and not worry about efficiency unless and until it turned out to be an issue. Thanks a lot. About the first: yes if all filenames begin with a a, this does not give extra optimization. But that counts for every construction. There are cases where it does not offer extra. What is a read black tree ? Stef |