From: Kevin D. <ke...@tr...> - 2006-01-17 00:35:45
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.2802" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Tahoma> <DIV><FONT face=Arial size=2>Alex-</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT></DIV> <DIV><FONT face=Arial size=2>On NIO:</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>If your ByteBuffer is direct, then transfers between it and another direct ByteBuffer are performed using very efficient system-level calls. Technically, it is up to the JVM implementation to decide whether to use the efficient system calls or not, but I know for certain that Sun's Windows JRE does (and I suspect the Solaris and other versions do as well).</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>Here's the link to the ByteBuffer spec: <A href="http://java.sun.com/j2se/1.4.2/docs/api/java/nio/ByteBuffer.html">http://java.sun.com/j2se/1.4.2/docs/api/java/nio/ByteBuffer.html</A></FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>Take a look at the "Direct vs. non-direct buffers" commentary. I could definitely see setting up the DB Cache ring buffer as a mapped file with a direct buffer, and just blowing changed page content directly into the file instead of messing with a byte[] copy of a given page.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>At a higher level, bulk copies can be made directly on a FileChannel without needing to create a direct ByteBuffer yourself (or map a file into memory). 
Take a look at <A href="http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/FileChannel.html">http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/FileChannel.html</A> and the transferFrom and transferTo methods.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>To get a handle on how much more efficient this is, try copying a large file using the regular stream (or RAF) approach, then try doing it using FileChannel.transferFrom - it's remarkable how much faster it is. I suspect in a network environment the difference would be even more pronounced, because you could technically do the copy without actually pulling data over the wire (but I haven't actually tested this, so that may be bunk).</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>A while back I kicked around the idea of having the log file become the primary database during a long transaction, then using these fast file operations to copy the data into the db file.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>Even during the current log sync, this would have a drastic impact on performance - but it requires the transaction log to be implemented using ByteBuffers instead of ObjectOutputStream.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>- K</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2> </FONT> <TABLE> <TBODY> <TR> <TD width=1 bgColor=blue><FONT face=Arial size=2></FONT></TD> <TD><FONT face=Arial size=2><FONT color=red>> <BR>Kevin,<BR><BR>See comments inline.<BR><BR>Kevin Day wrote:<BR>> Thought 1: The first question has to do with long transaction support. 
<BR>> I believe that the long transaction support described in the paper has a <BR>> problem with it, and I was wondering if you could help me understand... <BR>> In the paper, they suggest writing the *pre* transaction version of a <BR>> page to a transaction specific file, and writing actual changes into the <BR>> DB itself. If a transaction has to be rolled back, the pre transaction <BR>> version is restored into the DB.<BR>> <BR>> This seems like it has one very serious problem when it comes to <BR>> multiple transaction support: If there are other transactions that <BR>> begin after the long transaction begins, they will wind up restoring a <BR>> changed page from the DB (the page won't be in the cache). This could <BR>> lead to reads of inconsistent data...<BR>> <BR>> Am I missing something here? It sure seems like it would make more <BR>> sense to write changed pages (for pages that overflow the cache due to <BR>> long transaction) to a per-transaction file. Roll-back is performed by <BR>> deleting the file. Commit marks the transaction file, then copies data <BR>> from the transaction file into the DB. If the copy fails, restart can <BR>> detect that the transaction file is marked as complete, and the copy can <BR>> occur (similar to jdbm's current log file - but one file per long <BR>> transaction).<BR>> <BR>> Is there something important I'm missing here? I can't imagine a <BR>> scenario where storing a pre-transaction version of a page external to <BR>> the DB, then updating the DB directly would ever make any sense...<BR><BR>I believe they assume some form of concurrency control (e.g. locks) <BR>which would prevent conflicting changes from being made in the first place.<BR><BR>> Thought 2: Because roll-back is supported at a page level only, some <BR>> higher level synchronization and/or transaction mechanism is going to be <BR>> required. I do not think it is generally acceptable to have a high <BR>> level transaction (i.e. 
the storage of an object/row of data) fail due <BR>> to locking problems unless there is actually a locking issue with that <BR>> particular object. If the transactions are managed at the page level <BR>> only, then an update to one row could fail due to work being done on a <BR>> different row on the same page by a different transaction. I think that <BR>> if we are really going to do this, that we need to have some sort of <BR>> blocking operation that kicks in when a page conflict arises that is not <BR>> also a row conflict.<BR><BR>This issue is not addressed by the paper -- in fact I believe they do <BR>mention this issue and point out that it would be much more complex.<BR><BR>> Thought 3: Java itself may have some interesting implications to the <BR>> implementation of the ring buffer/safe concept. Using NIO, it may be <BR>> faster to write data into the DB directly from the Safe itself, instead <BR>> of from RAM. It is possible to use low level transfer operations in NIO <BR>> that allow the file system to handle the byte transfer, instead of <BR>> having to move the bytes across the native/jvm boundary...<BR><BR>I wasn't aware of this feature. Do you have any pointers to specific <BR>NIO calls to achieve this?<BR><BR>alex<BR><BR><BR><<BR></FONT></FONT></TD></TR></TBODY></TABLE></DIV></FONT></BODY></HTML> |
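On the question of which specific NIO calls to use: the FileChannel.transferTo approach described in the reply above can be sketched roughly as follows. This is only an illustration, not code from jdbm. The class name ChannelCopy and the method copy are hypothetical, and the sketch uses java.nio.file (Java 7+), which is newer than the 1.4.2 API linked above. Whether the transfer is actually done by the operating system without passing through the JVM depends on the JVM and platform.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class ChannelCopy {

    /** Copies src to dst via FileChannel.transferTo and returns the number of bytes copied. */
    static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long moved = in.transferTo(position, size - position, out);
                if (moved <= 0) {
                    break; // nothing more could be transferred (e.g. source was truncated)
                }
                position += moved;
            }
            return position;
        }
    }
}
```

Calling ChannelCopy.copy(source, target) on a large file and timing it against a plain stream copy is one way to try the comparison suggested above. transferFrom works the same way with the roles of the two channels reversed.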