Kevin,
 
I'm afraid that I did not see a strategy laid out in enough detail to judge.  Did I miss something?
 
The notion of an analog of DBCache in which objects live in the safe and cache might prove interesting.
However, one of the things which has become evident in prototyping is that the safe is designed for
fast sequential writes of fixed size records.  I tried a design in which the writes on the safe only contained
the region of a page which was actually updated (this often corresponds to a physical row, unless the physical
row spans more than one page) and rejected it for several reasons.  First, since the safe is oriented toward sustained
sequential I/O, I am not sure that there will be a performance gain from only logging updated regions.  Second,
during recovery you cannot simply retrieve the page from the safe, since all you have is part of the page.  You
have to go to the DB and do a random access to get the BFIM and then merge the updated region from the safe.
This means that we are sacrificing fast restart.  Third, when you are making many changes to the same page,
logging those changes as distinct entries on the safe will actually take more I/O than just logging the entire page.
Finally, the design of the safe management algorithms is entirely oriented toward fixed size records in the safe.
We would need a new design which was thought through in depth for the safe if it only stored serialized objects
(or parts of a page).
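 
To make the second point -- the recovery cost -- concrete, here is a minimal sketch of the two restore paths.  The Safe and Db interfaces and the record shapes below are hypothetical, purely for illustration, not the prototype's actual API:

    import java.io.IOException;

    // Hypothetical interfaces, for illustration only.
    interface Safe {
        PageRecord readNextPage() throws IOException;     // fixed-size record, sequential
        RegionRecord readNextRegion() throws IOException; // partial-page record
    }
    interface Db {
        byte[] readPage(long pageId) throws IOException;  // random access
        void writePage(long pageId, byte[] image) throws IOException;
    }
    class PageRecord { long pageId; byte[] image; }
    class RegionRecord { long pageId; int offset; byte[] data; }

    class RecoverySketch {
        // Fixed-size page records: one sequential read from the safe, one
        // write to the DB.  This is the fast-restart path.
        static void restoreWholePage(Safe safe, Db db) throws IOException {
            PageRecord p = safe.readNextPage();
            db.writePage(p.pageId, p.image);
        }
        // Partial-page records: we only have a fragment, so we must do a
        // *random* read against the DB to fetch the BFIM, merge, then write.
        static void restoreRegion(Safe safe, Db db) throws IOException {
            RegionRecord r = safe.readNextRegion();
            byte[] bfim = db.readPage(r.pageId);
            System.arraycopy(r.data, 0, bfim, r.offset, r.data.length);
            db.writePage(r.pageId, bfim);
        }
    }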
 
In general, I feel that we are better off moving incrementally from a design which has been "tested".  We
can build that, critique it, and look for further improvements.  There are more than enough gotchas in doing
just that.  Call it risk management.
 
-bryan
-----Original Message-----
From: jdbm-developer-admin@lists.sourceforge.net [mailto:jdbm-developer-admin@lists.sourceforge.net] On Behalf Of Kevin Day
Sent: Tuesday, January 31, 2006 4:43 PM
To: JDBM Developer listserv
Subject: re[2]: [Jdbm-developer] page locking, row locking.

Bryan-
 
I went down this path as well (had a nifty locking mechanism based on a bitmap, etc...), but after hearing Alex's idea, I changed my mind...  What are your comments on his strategy?
 
BTW - Alex's approach is also conducive to leaving the current file structure in place - what's even better, it allows for a completely orthogonal paging optimization strategy for the underlying file...
 
- K
 
 
> Kevin,
 
I've been working through the DBCache design.  My present inclination is that it is best suited to a transactional manager of page
memory.  I have modified the design to include exclusive write locks on regions of a page, which would serve to support the jdbm
requirement for managing physical rows.  (A physical row which spans multiple pages would require a lock on the appropriate
region of each page.)  The use of exclusive write locks on regions of a page also supports the existing logical to physical row id
translation pages, since we can lock and update individual slots on a page.  I hope to send along a UML model in a day or so which
lays out the data structures to support this design.
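 
Until the UML model is ready, here is a rough sketch of what the core lock record might look like.  The field and method names are mine, nothing is settled:

    // Hypothetical shape of an exclusive write lock on a region of a page;
    // a physical row spanning pages would hold one of these per page.
    final class RegionLock {
        final long pageId;  // page carrying the locked region
        final int offset;   // start of the region within the page image
        final int length;   // extent of the region
        final long txId;    // transaction holding the exclusive write lock

        RegionLock(long pageId, int offset, int length, long txId) {
            this.pageId = pageId;
            this.offset = offset;
            this.length = length;
            this.txId = txId;
        }

        // Two write locks on the same page conflict iff their regions overlap.
        boolean conflictsWith(RegionLock other) {
            return pageId == other.pageId
                && offset < other.offset + other.length
                && other.offset < offset + length;
        }
    }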
 
Given this design, DBCache will appear as an undifferentiated linear address space supporting shared read locks (on pages) and
concurrent exclusive write locks on regions of pages with transactional processing.  jdbm can then impose its existing data structure
on that linear address space.  Ideally we will even be able to preserve the existing file structure - that would be quite something!
In order to support transactional processing in jdbm, we will have to modify the jdbm API to provide a record-oriented transaction
API (based on RecordManager).
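 
One possible shape for that record-oriented API, purely as a sketch -- the explicit txId parameter and the method names are my assumptions, not a commitment:

    import java.io.IOException;

    // Sketch only: a transactional variant of the RecordManager operations.
    interface TxRecordManager {
        long begin(boolean readOnly) throws IOException;           // returns a txId
        long insert(long txId, byte[] record) throws IOException;  // returns a recid
        byte[] fetch(long txId, long recid) throws IOException;    // shared read lock
        void update(long txId, long recid, byte[] record) throws IOException; // exclusive write
        void delete(long txId, long recid) throws IOException;
        void commit(long txId) throws IOException;
        void abort(long txId) throws IOException;
    }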
 
The goal of object locking is more difficult.  While DBCache would support locking at the page image level, there are still design
issues at the object layer for shared read-locks on objects combined with exclusive write locks on copies of objects (if we use the
same locking scheme at the object layer).  It is certainly feasible to develop a protocol either in a common base class for persistent
objects (I know your objections to this!) or an interface which persistent objects would implement (it might be difficult to get the
interface semantics correct as an application developer).  Another option is BCE (byte code engineering), as you have mentioned
before.  The purpose of object locking would be to facilitate shared access to read-only objects within the JVM heap and exclusive
access to an updatable copy of an object within a Tx.  This provides for safe transaction isolation at the object layer, together with
a savings in the heap since read-only objects may be shared across transactions.  The alternative is to have distinct instances of
each persistent object within each transaction, in which case DBCache will provide transaction isolation by rejecting requests for
shared write locks.  That alternative will fall out naturally, so we can then look at the added functionality of a locking protocol for the
object layer.
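 
For the interface option, the contract might be as small as the following sketch.  The names are illustrative; the hard part would be getting the semantics right, not the signatures:

    // Sketch of the interface option for the object-level locking protocol.
    // A persistent object implementing this could hand out a shared read-only
    // view, or a private updatable copy bound to one transaction.
    interface Lockable {
        Object sharedView(long txId);    // read-only view, shared across Txs
        Object copyForWrite(long txId);  // exclusive, updatable copy inside the Tx
        void release(long txId);         // invoked at commit or abort
    }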
 
My own view is that we need to work incrementally.  To that end I am currently exploring the design and a prototype for DBCache.  I
hope to send along a UML model shortly.  Once that is working, I would like to re-factor the RecordManager APIs and their current
implementation to support transactions.  With that in hand, I believe that we will be in a good position to consider object locking.
 
-bryan

-----Original Message-----
From: jdbm-developer-admin@lists.sourceforge.net [mailto:jdbm-developer-admin@lists.sourceforge.net] On Behalf Of Kevin Day
Sent: Monday, January 30, 2006 8:24 PM
To: JDBM Developer listserv
Subject: re: [Jdbm-developer] page locking, row locking.


Bryan-
 

I think that working with pages in the dbcache may prove not to be the way to go...  Nothing about the way that people actually use jdbm has to do with pages - they all deal directly with rows (or objects reconstructed from those rows).  I think that trying to manage locking at the page level creates an artificial barrier, when we can go directly to the row level and work there...
 
I'm actually quite partial to Alex's idea of ditching the page cache altogether and using the dbcache for managing individual rows...  As rows are evicted from dbcache, the underlying page manager can do whatever it wants to with them (i.e. write rows in place, write them to a new page, optimize, etc...).
 
There would still be a need for a page cache in the underlying page manager, but this would be mostly for read optimization purposes only - not transaction management (in other words, the dbCache would operate transactionally at a level above the page manager, so the page manager would not need to do any transaction handling itself - rollback or transaction failure would be managed entirely at the row level in the dbCache).
 
If we go down that path, then we have a very big question to ask about the Isolation requirement of ACID...
 
If we strictly enforce isolation, then we must, by definition, create an object per transaction (which will require one copy of each row per transaction, plus the 'current' copy).  The alternative is to ditch isolation in favor of application level consistency semantics (e.g. synchronization at the object level to ensure that simultaneous changes cannot force an object into an invalid state).
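 
Just to make that cost concrete, here's a minimal sketch of the copy-per-transaction bookkeeping (rows as byte[]; all the names are illustrative, not a proposal for the real API):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of strict isolation via one private row copy per transaction,
    // alongside the shared 'current' copy.
    final class IsolatedRows {
        private final Map<Long, byte[]> current = new HashMap<Long, byte[]>();
        private final Map<Long, Map<Long, byte[]>> perTx =
                new HashMap<Long, Map<Long, byte[]>>();

        byte[] readForUpdate(long txId, long recid) {
            Map<Long, byte[]> copies = perTx.get(txId);
            if (copies == null) {
                copies = new HashMap<Long, byte[]>();
                perTx.put(txId, copies);
            }
            byte[] copy = copies.get(recid);
            if (copy == null) {
                copy = current.get(recid).clone(); // the per-Tx private copy
                copies.put(recid, copy);
            }
            return copy;
        }

        void commit(long txId) {
            Map<Long, byte[]> copies = perTx.remove(txId);
            if (copies != null) current.putAll(copies); // install the Tx's copies
        }

        void rollback(long txId) {
            perTx.remove(txId); // discard copies; 'current' is untouched
        }
    }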
 
At the current time, I don't see a good way around this tradeoff - maybe there is one I'm just not thinking of...
 
 
For what it's worth, the documented dbCache design does allow for multiple copies of a given page to exist in memory.  The safe itself would only contain one version of the page - the 'current' version.  Basically, you have to be able to reconstruct the state of the cache, with all transactions rolled back, by restoring from the safe.
 
- K
 
 
>All,

I've been doing some modeling and some prototyping in order to understand
the DBCache design a little bit better.  DBCache used page locking and
therefore never had more than one image of a page in the cache.  This means
that: (a) the pageId was a unique key for the cache -- and we need a unique
key in order to support an MRU for the cache; and (b) an original image of a
page was no longer available once a transaction had written on that page --
even if that transaction had not yet committed.

Imagine that we have two transactions, one of which is read only.  Here is
one possible serialization of events:

Tx1 begins (readOnly).

Tx1 reads the state of page12.

Tx2 begins.

Tx2 updates page10 (copy-on-write converts an original of page10 into a
copy).

Tx1 reads page10 and is aborted since page10 is not an original.

Tx2 commits.

This shows how a transaction which starts earlier can be aborted by an
update on a page which it has not read yet.  Some stores allow this and
expose the original state of the store to transactions which have begun
before the writer commits.  This could be done by maintaining the original
image of page10 during the copy-on-write.  Any transaction wanting to read
page10 can still do so until Tx2 commits.  I am not clear on the effect of
Tx2 committing.  Does it result in Tx1 seeing the version of page10 from
when Tx1 started, or the updated version of page10 which was committed by
Tx2 -- presumably the latter.

I think that we might want to create or research such serializations of
events in multiple transactions in order to make some decisions about the
policies we want to support for jdbm.  Those cases can be readily captured
as unit tests for validation.
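 
For example, the serialization above might be captured along these lines.
Store, Tx, and the abort exception are placeholders for whatever API we
settle on, not existing classes:

    // Hypothetical stand-ins for the eventual API.
    interface Store { Tx begin(boolean readOnly); }
    interface Tx {
        byte[] readPage(long pageId) throws TxAbortedException;
        void updatePage(long pageId) throws TxAbortedException;
        void commit() throws TxAbortedException;
    }
    class TxAbortedException extends Exception {}

    public class SerializationTest extends junit.framework.TestCase {
        private Store store; // wired up once the API exists

        public void testEarlyReaderAbortedByLaterWriter() throws Exception {
            Tx tx1 = store.begin(true);   // read-only
            Tx tx2 = store.begin(false);
            tx1.readPage(12);             // fine: page12 is still an original
            tx2.updatePage(10);           // copy-on-write: page10 no longer original
            try {
                tx1.readPage(10);
                fail("expected abort: page10 was written by uncommitted Tx2");
            } catch (TxAbortedException expected) {
                // the policy under test: the early reader is aborted
            }
            tx2.commit();
        }
    }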

I have also been thinking about strategies to support row locking.  My
inclination is to expose the page image as an nio ByteBuffer wrapping the
byte[].  This makes it possible to expose an immutable view of the data.  It
also makes it possible to expose only a region of the image for a page.  In
the context of jdbm, and with some additional state for row locks on the
page, this could be used to request only the region of a page corresponding
to a physical row.  This would effect a read (or write) lock on that part of
the page.  Since a physical row can span more than one page in jdbm, row
locking in DBCache would be in terms of an offset and length on a page
image, so jdbm would create a row lock on the appropriate region of each
page that was spanned by the physical record.
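 
A minimal sketch of such a view, using only the standard java.nio API (the
method name is mine):

    import java.nio.ByteBuffer;

    class RegionViewSketch {
        // Expose only [offset, offset + length) of a page image, read-only.
        // Callers cannot modify the underlying byte[] through this view.
        static ByteBuffer regionView(byte[] pageImage, int offset, int length) {
            ByteBuffer buf = ByteBuffer.wrap(pageImage).asReadOnlyBuffer();
            buf.position(offset);
            buf.limit(offset + length);
            return buf.slice(); // independent buffer covering just the region
        }
    }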

I've done a little thinking about the data structure for maintaining such
row locks.  I am inclined to start out with an insertion sort on the offset
into the page and a constraint test during the insert to verify that the
offset + length does not exceed the offset of the next row lock on the page.
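 
Roughly, as a per-page structure (a sketch; the names and the int[]
representation are mine):

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the per-page row lock structure: locks kept sorted by offset
    // via insertion sort, with the non-overlap constraint checked on insert.
    final class PageRowLocks {
        private final List<int[]> locks = new ArrayList<int[]>(); // {offset, length}

        // Returns false (lock refused) if [offset, offset + length) would
        // overlap an existing lock on this page.
        boolean tryLock(int offset, int length) {
            int i = 0;
            while (i < locks.size() && locks.get(i)[0] < offset) {
                i++;
            }
            if (i > 0) { // constraint against the predecessor
                int[] prev = locks.get(i - 1);
                if (prev[0] + prev[1] > offset) return false;
            }
            if (i < locks.size()) { // constraint against the successor
                if (offset + length > locks.get(i)[0]) return false;
            }
            locks.add(i, new int[] { offset, length });
            return true;
        }

        void unlock(int offset) {
            for (int i = 0; i < locks.size(); i++) {
                if (locks.get(i)[0] == offset) { locks.remove(i); return; }
            }
        }
    }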

It seems to me that both allowing page reads of originals which have been
written on by uncommitted transactions and allowing row locking will require
that we allow up to 2C page images in the cache, where C is the capacity
of the safe.  DBCache as published only permitted a single image for a page
in the cache.

Cheers,

-bryan

