[Jdbm-general] Further thoughts on free adjacent record combination

 [Jdbm-general] Further thoughts on free adjacent record combination From: Kevin Day - 2005-09-22 15:54:34 ```
OK - I spent some more time thinking about this, and I'm pretty sure that one of my initial assumptions was incorrect.  I spent a lot of time worrying about how to de-allocate slot in the free logical row list, when that isn't at all necessary.

New strategy for combining adjacent free records:

Let's say we have a record (X), bounded by record (X-1) and (X+1).

For the following discussion (X-1) and (X+1) are both free (they have been allocated at some point in the past, and then freed).

If we then free (X), the strategy would be as follows:

1.  If first record on current page is (X), then move to previous page(s) that contains a record start (i.e. first record offset != 0).  At this point, we have a "current page" that contains the start of record (X-1)

2.  Follow records on the current page until we get to the end of the page, or we hit record (X).  At this point, we have the actual block and offset of record (X-1)

3.  Check the available size of (X-1) - if it is negative, then it is a free record and can be merged with (X)

4.  Merging (X-1) with (X) is easy - just increase the available size of (X-1) by the available length of (X) and the header of (X).

5.  We also have to update the length field for (X-1) in the free phyiscal row list.  The slot containing the location of the free physical row id entry for (X-1) will be in the first 10 bytes of the (X-1) data segment.  Read that location, and update the length of the (X-1) free physical row id entry to the new value.

6.  Now we are ready to look at merging (X-1) and (X+1)  (note that there is no longer any record (X)

7.  Check the available size of (X+1) - if it is negative, then it is a free record and can be merged with (X-1)

8.  Increase the size of (X-1) by the available length of (X+1) and the header of (X+1)

9.  Update the length field for (X-1) in the free physical row list

We are now done with freeing (X)

1.  an alloc request is made for a length Y that is less than the length of (X)

2.  The free phyiscal row list is searched for the first available row with length greater than or equal to Y.  Row (X) is found.

We now need to split (X) into two parts:  A used part with data length Y and a free part with data length len(X) - len(Y) - 2 len(header)

3.  Mark free physical row slot containing (X) as available

4.  Set the header of (X) so available and used length are both len(Y)

5.  Create a new free physical row id entry for (X+1) with available length len(X) - len(Y) - 2 len(header)

6.  Set header of (X+1) so used length is negative and available length is len(X) - len(Y) - 2 len(header)

We are now done with allocating (X)

Boundary scenarios (Special cases that need to be addressed):

In the current implementation, the length of a row is limited to Short.MAX_VALUE (this restriction could be aleviated by using my packed long format for the available and length fields - but that makes the row header variable size, which would require careful attention to detail to make sure a fixed size assumption isn't used anywhere).  If we don't want to dink with allowing row lengths larger than Short.MAX_VALUE, then we'll need to come up with some set of rules to determine when to stop merging records.

If splitting (X) would result in len(X+1) being smaller than some threshold, then we wouldn't want to do it

If an entire page becomes free, we may want to consider moving it back to the free page list.  We could use this as a first step to actually recovering physical disk space, or it could facilitate a packing operation that rearranges pages.

If splitting (X) would result in the available space on the current page being smaller than some threshold, then we wouldn't want to do it (this is actually already part of the jdbm row allocation alogorithm)

If the available space in a free row is less than 10 bytes, then we can't store the free physical row slot location.  We could get away with using only 8 bytes (give up the top 2 bytes of the block number), or use a packed representation - but we still need to deal with the situation where a record's available space is small.  We may just say that records will always have available space greater than 10 - with the 4 byte over head of the record header, storing individual items with that little data in the record manager is pretty much insane - but I'm not sure how best to deal with it.

All of the above still doesn't give us packing capability, but I think that a strategy could be developed to pack at the page level.  Moving pages will require careful attention to detail (some types of pages just need to be relinked, others also need to have updates made to the logical row id manager), but I think it's doable.  Maybe we have a policy that says "when there are more than XX pages in the free page list, we run the pack algorithm".

Any others you can think of?

Cheers!

- K
```

 [Jdbm-general] Further thoughts on free adjacent record combination From: Kevin Day - 2005-09-22 15:54:34 ```
OK - I spent some more time thinking about this, and I'm pretty sure that one of my initial assumptions was incorrect.  I spent a lot of time worrying about how to de-allocate slot in the free logical row list, when that isn't at all necessary.

New strategy for combining adjacent free records:

Let's say we have a record (X), bounded by record (X-1) and (X+1).

For the following discussion (X-1) and (X+1) are both free (they have been allocated at some point in the past, and then freed).

If we then free (X), the strategy would be as follows:

1.  If first record on current page is (X), then move to previous page(s) that contains a record start (i.e. first record offset != 0).  At this point, we have a "current page" that contains the start of record (X-1)

2.  Follow records on the current page until we get to the end of the page, or we hit record (X).  At this point, we have the actual block and offset of record (X-1)

3.  Check the available size of (X-1) - if it is negative, then it is a free record and can be merged with (X)

4.  Merging (X-1) with (X) is easy - just increase the available size of (X-1) by the available length of (X) and the header of (X).

5.  We also have to update the length field for (X-1) in the free phyiscal row list.  The slot containing the location of the free physical row id entry for (X-1) will be in the first 10 bytes of the (X-1) data segment.  Read that location, and update the length of the (X-1) free physical row id entry to the new value.

6.  Now we are ready to look at merging (X-1) and (X+1)  (note that there is no longer any record (X)

7.  Check the available size of (X+1) - if it is negative, then it is a free record and can be merged with (X-1)

8.  Increase the size of (X-1) by the available length of (X+1) and the header of (X+1)

9.  Update the length field for (X-1) in the free physical row list

We are now done with freeing (X)

1.  an alloc request is made for a length Y that is less than the length of (X)

2.  The free phyiscal row list is searched for the first available row with length greater than or equal to Y.  Row (X) is found.

We now need to split (X) into two parts:  A used part with data length Y and a free part with data length len(X) - len(Y) - 2 len(header)

3.  Mark free physical row slot containing (X) as available

4.  Set the header of (X) so available and used length are both len(Y)

5.  Create a new free physical row id entry for (X+1) with available length len(X) - len(Y) - 2 len(header)

6.  Set header of (X+1) so used length is negative and available length is len(X) - len(Y) - 2 len(header)

We are now done with allocating (X)

Boundary scenarios (Special cases that need to be addressed):

In the current implementation, the length of a row is limited to Short.MAX_VALUE (this restriction could be aleviated by using my packed long format for the available and length fields - but that makes the row header variable size, which would require careful attention to detail to make sure a fixed size assumption isn't used anywhere).  If we don't want to dink with allowing row lengths larger than Short.MAX_VALUE, then we'll need to come up with some set of rules to determine when to stop merging records.

If splitting (X) would result in len(X+1) being smaller than some threshold, then we wouldn't want to do it

If an entire page becomes free, we may want to consider moving it back to the free page list.  We could use this as a first step to actually recovering physical disk space, or it could facilitate a packing operation that rearranges pages.

If splitting (X) would result in the available space on the current page being smaller than some threshold, then we wouldn't want to do it (this is actually already part of the jdbm row allocation alogorithm)

If the available space in a free row is less than 10 bytes, then we can't store the free physical row slot location.  We could get away with using only 8 bytes (give up the top 2 bytes of the block number), or use a packed representation - but we still need to deal with the situation where a record's available space is small.  We may just say that records will always have available space greater than 10 - with the 4 byte over head of the record header, storing individual items with that little data in the record manager is pretty much insane - but I'm not sure how best to deal with it.

All of the above still doesn't give us packing capability, but I think that a strategy could be developed to pack at the page level.  Moving pages will require careful attention to detail (some types of pages just need to be relinked, others also need to have updates made to the logical row id manager), but I think it's doable.  Maybe we have a policy that says "when there are more than XX pages in the free page list, we run the pack algorithm".

Any others you can think of?

Cheers!

- K
```