Ah. True, I was suggesting something that would operate at the record level, and a record can be either smaller or larger than a page.
-----Original Message-----
From: jdbm-developer-admin@lists.sourceforge.net [mailto:jdbm-developer-admin@lists.sourceforge.net] On Behalf Of Kevin Day
Sent: Tuesday, September 27, 2005 8:41 PM
To: 'JDBM Developer listserv '
Subject: re[2]: [Jdbm-developer] More thoughts on new translation page

I don't think we actually have the logical id -> physical id map that we need...  We'd have to have all records on the source and target pages in cache, read in via their logical row ids, for this to work.  That is not at all likely.
This, of course, assumes that we are just moving pages around.
I suppose it would also be possible to move individual records around instead of entire pages...  The strategy would be to re-order the entire file by logical row id.  That would cause a huge amount of disk thrashing, but it would wind up with a fully packed database.  My idea of doing this at the page level would be much faster (in fact, it might be fast enough to do on the fly), but would not result in as tight a packing.
Even if we sweep the logical row id list, we still need to be able to move a record when we only have its physical location, so a reverse map is still needed...
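To make the reverse-map idea concrete, here is a toy in-memory sketch (not actual jdbm code; the class and method names are hypothetical) of a translation table that keeps a physical -> logical map in step with the usual logical -> physical one, so a record can be relocated given only its physical location:

```java
import java.util.HashMap;
import java.util.Map;

public class TranslationTable {
    private final Map<Long, Long> logicalToPhysical = new HashMap<>();
    private final Map<Long, Long> physicalToLogical = new HashMap<>();

    // Bind a logical row id to its current physical location,
    // maintaining both directions of the mapping.
    public void bind(long logicalId, long physicalId) {
        logicalToPhysical.put(logicalId, physicalId);
        physicalToLogical.put(physicalId, logicalId);
    }

    // Relocate a record knowing only its physical location -- the
    // reverse map replaces a linear scan of the whole translation table.
    public void relocate(long oldPhysicalId, long newPhysicalId) {
        Long logicalId = physicalToLogical.remove(oldPhysicalId);
        if (logicalId == null) {
            throw new IllegalArgumentException("unknown physical id");
        }
        logicalToPhysical.put(logicalId, newPhysicalId);
        physicalToLogical.put(newPhysicalId, logicalId);
    }

    public long physicalFor(long logicalId) {
        return logicalToPhysical.get(logicalId);
    }
}
```

Keeping the second map in memory is only for illustration; on disk the same information would live in the record header, as discussed below in the thread.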
- K
Well,

There is one time when we have the logical id -- that is when we are
in the middle of a jdbm operation.  We could do on the fly relocation
of pages.  The main goal of relocation is, I think, to migrate pages
from further down in the store into free pages earlier in the store.
If we always knew the block of the first free page in the store, then
could we do runtime relocation of any fetched page with a greater
block number?
But that seems wildly suboptimal.  Unless we are trying to get all
pages of a certain type up front, it seems that it is only pages at
the end of the store that we want to swap for free pages earlier in
the store.  There is probably a very simple operation that we can do
to test if a page qualifies.  I.e., there are N free pages so the N
pages from the end of the store are valid candidates for relocation.
If the page is already a free page, then we don't relocate it.  Else
we swap it with the first free page from earlier in the store.

Not a complete algorithm, but it seems possible to devise one.


-----Original Message-----
From: jdbm-developer-admin@lists.sourceforge.net
To: JDBM Developer listserv
Sent: 9/27/2005 12:30 AM
Subject: [Jdbm-developer] More thoughts on new translation page

So I've been thinking...  The main purpose of re-implementing the
translation page is to allow it to be moved around without messing
things up.

I started thinking about what other things would prevent us from moving
pages around, and, unless I'm missing something, the current
architecture will not allow for moving data pages around at all...  The
problem is that there is no efficient way to look up the logical row id
given a physical row id.  If we are trying to re-arrange blocks, we will
have to do a linear search of the entire translation table for every
single block we move.  Ugly.  The same problem applies if we want to
move a record within a block.

I'm pretty sure that this calls for reverse lookup capability, and that
the best place to do it is going to be in the record header.  We could
store it in an external lookup list, but that just complicates things
(and the same amount of disk space still gets used).

So, this got me thinking about the record header...

If we are going to implement a split() operation on a record, then I
think we can actually drastically change the way the current record
headers work.

Instead of storing the available and used lengths (as integers), we can
store available and unused lengths.  available will still be an integer,
but we should be able to make the unused length a single byte.

When we add in the logical row id, the record header becomes 13 bytes
instead of the current 8.  We may also want to carefully consider the
actual # of bytes we need to capture the available range of logical row
ids that can exist in a file.
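As a concrete sketch of the proposed layout (the field widths follow the description above, but the class and method names are my assumptions, not settled jdbm code): a 4-byte available length, a 1-byte unused length (so used = available - unused), and an 8-byte logical row id serving as the reverse pointer.

```java
import java.nio.ByteBuffer;

public class RecordHeader {
    // 4 (available) + 1 (unused) + 8 (logical row id) = 13 bytes,
    // versus the current 8-byte header.
    public static final int SIZE = 4 + 1 + 8;

    public static void write(ByteBuffer buf, int available, byte unused,
                             long logicalRowId) {
        buf.putInt(available);
        buf.put(unused);
        buf.putLong(logicalRowId);
    }

    // Recover the used length from available and the single unused byte.
    public static int usedLength(ByteBuffer buf, int offset) {
        int available = buf.getInt(offset);
        int unused = buf.get(offset + 4) & 0xFF;  // treat as unsigned
        return available - unused;
    }

    // The reverse pointer: read the logical row id straight from the
    // header at a known physical offset, with no table scan.
    public static long logicalRowId(ByteBuffer buf, int offset) {
        return buf.getLong(offset + 5);
    }
}
```

Shrinking the logical row id field to 7 bytes, as suggested below, would drop SIZE to 12 at the cost of slightly fiddlier serialization.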

Let's say that we have a file completely full of records with an average
of 8 bytes each (I think this is a very conservative average).  If we had
only data pages in the file, and we honor the current jdbm limit on the
maximum block id (2^47), we will need only 7 bytes to capture all
possible logical row ids.  So, we can actually drop the new header size
down to 12 bytes.  Just in case anyone cares, the average data size
would have to be 4096 bytes before we could recover an additional byte.

Another way to approach this is to think about the biggest size that we
could reasonably see being used (the addressing above allows for file
sizes up to ~8,000,000 TB - yup, 8 million terabytes).  This seems quite
excessive, but I'm not sure where the reasonable break point should be -
I'm sure that 20 MB used to sound like a lot...

Anyway, I think that implementing an effective defrag system is going to
require this kind of reverse pointer - I'd love to hear if any of you
have any other ideas.

BTW - I haven't heard from Alex in a while - is he OK?  Did he maybe get
hit with one of these hurricanes?  Or is he just ignoring all this
chatter hoping we'll go away?


- K

_______________________________________________
Jdbm-developer mailing list
Jdbm-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jdbm-developer