From: Geoff H. <ghu...@ws...> - 2002-09-19 23:14:09
On Thu, 19 Sep 2002, Neal Richter wrote:

> Merging Loic's latest mifluz is supposed to fix this problem (Geoff
> and I have been working on this), but so far the merge is fairly complex
> and needs much more work and long term testing. This is a decent
> interim solution.

Obviously I'm more concerned with the mifluz merge and figuring out the
lousy performance. But if you've seen that switching to zlib or the newer
codec seems to solve the database bugs, then I'm happy with this as an
interim solution. We could use this for a 3.2.0b4 (which we need) and then
work on the mifluz merge for 3.2.0b5.

> WORD DOCID LOCATION
> affect 323 43

So first off, I should point out that it's not quite as bad as this. Loic
and I worked on "key compression," which means that the database doesn't
actually store multiple keys in full when they're only slightly different.
There's also a rationale behind this system--it was faster to keep all
these keys than to change the length of the records:

> affect 323 43, 53
> affect 336 14, 148, 155

> Value-field in BDB by making it a fixed width.
>
> Ex: Let's say this LOCATION-value is 'Full' @ 32 characters. Further
> locations of 'affect' in doc 400 get new rows

OK, so having a fixed width and multiple rows may be a reasonable idea, but
your description isn't very workable. For one, the keys need to be unique.
So you'd want something like:

Key: WORD DOCID ROW
Record: Location/Flags/Anchor designation list

The key would be to come up with a compact binary representation of these.
Using characters to store integers is a bit inefficient. :-) More on that
later, perhaps.

-Geoff
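
To make the Key/Record layout above a bit more concrete, here is a minimal Python sketch of one way it could be packed into bytes. Everything specific in it is an assumption for illustration only: the names (make_key, make_rows, read_record), the field widths (32-bit DOCID, 16-bit ROW and location, 8-bit flags and anchor), the NUL terminator after the word, and the capacity of 8 entries per row. None of this is the ht://Dig or mifluz on-disk format.

```python
import struct

# Hypothetical sizes, chosen only for illustration.
ENTRIES_PER_ROW = 8                    # fixed capacity of one record
ENTRY = struct.Struct(">HBB")          # location (16-bit), flags (8-bit), anchor (8-bit)
RECORD_SIZE = 1 + ENTRIES_PER_ROW * ENTRY.size  # count byte + fixed slots


def make_key(word, docid, row):
    """Build a unique key: WORD, a NUL terminator, then DOCID and ROW.

    Big-endian integers keep a byte-wise sort in DOCID, then ROW order
    within a single word.
    """
    return word.encode("utf-8") + b"\x00" + struct.pack(">IH", docid, row)


def make_rows(word, docid, entries):
    """Split (location, flags, anchor) tuples into fixed-width rows.

    Returns a list of (key, record) pairs; every record is RECORD_SIZE
    bytes, with unused slots zero-padded.
    """
    rows = []
    for row, start in enumerate(range(0, len(entries), ENTRIES_PER_ROW)):
        chunk = entries[start:start + ENTRIES_PER_ROW]
        body = b"".join(ENTRY.pack(*entry) for entry in chunk)
        record = bytes([len(chunk)]) + body.ljust(RECORD_SIZE - 1, b"\x00")
        rows.append((make_key(word, docid, row), record))
    return rows


def read_record(record):
    """Decode the used slots of one fixed-width record."""
    count = record[0]
    return [ENTRY.unpack_from(record, 1 + i * ENTRY.size) for i in range(count)]
```

With these assumed widths, the three locations of 'affect' in doc 336 fit in a single 33-byte record, and a ninth entry for the same word and document would start ROW 1 under a new, still unique, key. Each location costs 2 bytes as a binary integer, versus up to 5 characters plus a separator in text form.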