re[2]: [Jdbm-developer] commit: jdbm.recman.DumpUtility

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0
 Transitional//EN">
<HTML><HEAD>
<STYLE type=text/css> P, UL, OL, DL, DIR,
 MENU, PRE { margin: 0 auto;}</STYLE>

<META content="MSHTML 6.00.2900.2668" name=GENERATOR></HEAD>
<BODY leftMargin=1 topMargin=1 rightMargin=1><FONT
 face=Tahoma size=2>
<DIV>Bryan-<BR></DIV>
<DIV>"If a data &nbsp;page is solely used
 as a continuation page for a &nbsp;record,
 does it show up in the used pages list? &nbsp;"</DIV>
<DIV>&nbsp;</DIV>
<DIV>KD - There is no used pages list, per
 se - just the linked list that actually contains
 the page.&nbsp; But, to answer the question:&nbsp;
 Sure.&nbsp; It's a used page so it is in
 the linked list (pointed to by the prior
 page).</DIV>
<DIV>&nbsp;</DIV>
<DIV>&nbsp;</DIV>
<DIV>"If so, what is the return &nbsp;value
 for getFirst() on such a data page since
 &nbsp;there is no record header?"</DIV>
<DIV>&nbsp;</DIV>
<DIV>KD - It returns 0.&nbsp; Each page has
 a header, so it is impossible for actual
 data to reside at position 0.</DIV>
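<DIV>&nbsp;</DIV>
<DIV>A minimal sketch of that traversal, assuming a simplified in-memory model rather than the actual jdbm classes: the Page type, its next and first fields, and countContinuationPages are hypothetical names used only for illustration. A page whose first-record offset is 0 is treated as a pure continuation page.</DIV>

```java
// Hypothetical in-memory model of a page file; none of these names are jdbm's API.
final class UsedPageWalk {
    /** Stand-in for a data page header. */
    static final class Page {
        final int next;    // id of the next page in the list, 0 = end of list
        final short first; // offset of the first record header, 0 = no record starts here
        Page(int next, short first) {
            this.next = next;
            this.first = first;
        }
    }

    /** Walks the linked list from head and counts pages on which no record starts. */
    static int countContinuationPages(Page[] file, int head) {
        int count = 0;
        for (int id = head; id != 0; id = file[id].next) {
            if (file[id].first == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Page[] file = new Page[4];          // slot 0 is unused so that 0 can mean "none"
        file[1] = new Page(2, (short) 20);  // a record starts on this page
        file[2] = new Page(3, (short) 0);   // pure continuation page
        file[3] = new Page(0, (short) 64);  // the record ends here and a new one starts
        System.out.println(countContinuationPages(file, 1));
    }
}
```

<DIV>The walk assumes the list is well formed (no cycles), which a dump utility would probably want to verify separately.</DIV>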
<DIV>&nbsp;</DIV>
<DIV>&nbsp;</DIV>
<DIV>"'interesting' information to extract
 from &nbsp;the free page list"</DIV>
<DIV>&nbsp;</DIV>
<DIV>KD - The data that I'm interested in
 is:</DIV>
<DIV>&nbsp;</DIV>
<DIV>a)&nbsp; The number of pages in the
 entire free page list</DIV>
<DIV>b)&nbsp; Total bytes in the free page
 list</DIV>
<DIV>c)&nbsp; Total DATA bytes in the free
 page list (i.e. total bytes less header bytes)</DIV>
<DIV>d)&nbsp; Number of free pages at the
 end of the file</DIV>
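<DIV>&nbsp;</DIV>
<DIV>The four figures above could be gathered in a single pass over the free list. This is only a sketch under stated assumptions: the free list is modelled as an int array of next pointers (0 = end of list), pages are numbered from 1 to pageCount, and the class name, method name, page size and header size are all hypothetical parameters, not values taken from jdbm.</DIV>

```java
// Hypothetical model: next[id] is the id of the next free page, 0 ends the list.
final class FreeListStats {
    final int freePages;          // (a) pages in the free list
    final long totalBytes;        // (b) freePages * pageSize
    final long dataBytes;         // (c) total bytes less header bytes
    final int trailingFreePages;  // (d) run of free pages at the end of the file

    FreeListStats(int freePages, long totalBytes, long dataBytes, int trailingFreePages) {
        this.freePages = freePages;
        this.totalBytes = totalBytes;
        this.dataBytes = dataBytes;
        this.trailingFreePages = trailingFreePages;
    }

    static FreeListStats compute(int[] next, int head, int pageCount,
                                 int pageSize, int headerSize) {
        boolean[] free = new boolean[pageCount + 1]; // index 0 is never marked
        int freePages = 0;
        for (int id = head; id != 0; id = next[id]) {
            free[id] = true;
            freePages++;
        }
        // Count backwards from the last page; free[0] is false, so the loop stops.
        int trailing = 0;
        for (int id = pageCount; free[id]; id--) {
            trailing++;
        }
        return new FreeListStats(freePages,
                (long) freePages * pageSize,
                (long) freePages * (pageSize - headerSize),
                trailing);
    }

    public static void main(String[] args) {
        int[] next = new int[7];
        next[2] = 6;  // free list: 2, then 6, then 5
        next[6] = 5;
        next[5] = 0;
        // Page and header sizes below are illustrative values only.
        FreeListStats s = compute(next, 2, 6, 8192, 18);
        System.out.println(s.freePages + " free, " + s.trailingFreePages + " at end of file");
    }
}
```

<DIV>Figure (d) is the only one that depends on physical position, which is why the sketch marks the free pages first and then scans back from the last page.</DIV>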
<DIV>&nbsp;</DIV>
<DIV>&nbsp;</DIV>
<DIV>One general comment:&nbsp; Having free
 pages scattered around the file is not a
 bad thing.&nbsp; The file system is doing
 this anyway, so any concern about whether
 two blocks that are adjacent in the list
 are also adjacent on disk isn't going to
 be worth it.&nbsp; Once you are reading data
 in pages, the actual location of those pages
 becomes much less important.</DIV>
<DIV>&nbsp;</DIV>
<DIV>The only time that it actually does
 matter is if we are trying to shrink the
 size of the database file.&nbsp; If we are
 going to be going there (and I'm not entirely
 certain that it is really necessary, given
 most usage patterns), then pages can be moved
 physically towards the front of the file
 without changing their logical location in
 the linked list.&nbsp; For the pages in the
 free list, this can be done easily.&nbsp;
 For the other page types, we'd have to make
 sure that we properly update the translation
 table and free lists...</DIV>
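<DIV>&nbsp;</DIV>
<DIV>To illustrate what "moved physically without changing the logical location" might look like, here is a rough sketch assuming a doubly linked list held in two int arrays. The names (PageRelocation, relocate, next, prev) are hypothetical, the target slot is assumed to be unlinked already, and the translation table update needed for non-free pages is deliberately left out.</DIV>

```java
// Hypothetical model: next[id] and prev[id] hold neighbour page ids, 0 means "none".
final class PageRelocation {
    /**
     * Moves the list entry stored in physical slot "from" to physical slot "to"
     * and repairs the neighbours' pointers, so the logical order is unchanged.
     * Returns the (possibly updated) head of the list.
     */
    static int relocate(int[] next, int[] prev, int head, int from, int to) {
        next[to] = next[from];
        prev[to] = prev[from];
        if (prev[from] != 0) {
            next[prev[from]] = to;   // predecessor now points at the new slot
        }
        if (next[from] != 0) {
            prev[next[from]] = to;   // successor now points back at the new slot
        }
        next[from] = 0;              // the old slot is no longer part of the list
        prev[from] = 0;
        return head == from ? to : head;
    }

    public static void main(String[] args) {
        int[] next = new int[6];
        int[] prev = new int[6];
        next[1] = 5; prev[5] = 1;    // list: 1, then 5, then 3
        next[5] = 3; prev[3] = 5;
        int head = relocate(next, prev, 1, 5, 2); // move page 5 into slot 2
        for (int id = head; id != 0; id = next[id]) {
            System.out.println(id);
        }
    }
}
```

<DIV>Copying the page contents themselves, and taking the target slot off whatever list it was on, would have to happen around this call.</DIV>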
<DIV>&nbsp;</DIV>
<DIV>&nbsp;</DIV>
<DIV>Cheers!</DIV>
<DIV>&nbsp;</DIV>
<DIV>- K</DIV>
<DIV>&nbsp;</DIV>
<DIV><BR>-------------------<BR>&nbsp;&gt;
 Kevin, <BR><BR>Two &nbsp;questions related
 to the traversal of the "used pages" list.
 &nbsp;If a data &nbsp;page is solely used
 as a continuation page <BR>for a &nbsp;record,
 does it show up in the used pages list? &nbsp;If
 so, what is the return &nbsp;value for getFirst()
 on such a data page <BR>since &nbsp;there
 is no record header? <BR><BR>Also, &nbsp;do
 you have thoughts on what would be "interesting"
 information to extract from &nbsp;the free
 page list? &nbsp;I am thinking <BR>of a &nbsp;free
 page count and perhaps determining how much
 fragmentation there is in &nbsp;the free
 page list. &nbsp;Which gets back <BR>to an
 &nbsp;earlier question -- does jdbm memory
 allocation make any effort to allocate
 &nbsp;contiguous pages for large records?
 <BR><BR>-bryan <BR><BR>-----Original Message-----<BR>From:
 &nbsp;<A href="mailto:jdb...@li..."><FONT
 color=#0000ff>jdb...@li...</FONT></A>
 &nbsp;<A href="mailto:jdb...@li..."><FONT
 color=#0000ff>[mailto:jdb...@li...]</FONT></A>
 On Behalf Of &nbsp;Thompson, Bryan B.<BR>Sent:
 Friday, September 23, 2005 8:15 &nbsp;AM<BR>To:
 Kevin Day<BR>Cc: &nbsp;<A href="mailto:jdb...@li..."><FONT
 color=#0000ff>jdb...@li...</FONT></A><BR>Subject:
 RE: [Jdbm-developer] &nbsp;commit: jdbm.recman.DumpUtility<BR><BR><BR>I'll
 &nbsp;try this today. &nbsp;One thing that
 I am not clear about with jdbm is whether
 &nbsp;it makes any attempt (or guarantee)
 <BR>that &nbsp;records which span a page
 will be on contiguous pages. <BR><BR>As I
 &nbsp;mentioned, I have been playing with
 some blob/clob support. &nbsp;My original
 &nbsp;take was to have the records form <BR>a
 &nbsp;linked list. &nbsp;Since this is exactly
 what the jdbm record headers are doing, it
 &nbsp;should be possible to incrementally
 <BR>allocate new pages into a jdbm record,
 thereby supporting streaming &nbsp;from the
 application in addition to the current <BR>Serializer
 approach. &nbsp;An interesting &nbsp;thought.
 <BR><BR>However, it occurred to me that greater
 efficiency could be obtained by &nbsp;blocking
 the linked list of records (in my <BR>current
 implementation of blob/clob) into an array
 of recids in a &nbsp;header record for the
 blob. &nbsp;That would make it <BR>possible
 to do pre-fetch strategies for subsequent
 segments of the &nbsp;blob. &nbsp;If we use
 the record header approach as <BR>it &nbsp;stands
 today, the most pre-fetch that we could do
 is one page read ahead at a &nbsp;time. <BR><BR>You
 &nbsp;mentioned in another thread I/O efficiency.
 &nbsp;The guiding principle as I &nbsp;understand
 it is that you want to get as <BR>much &nbsp;I/O
 concurrency as possible so that you can get
 as many disk arms behind your &nbsp;application
 as possible. <BR>This &nbsp;leads to the
 use of striped disk arrays and cluster storage
 solutions. &nbsp;&nbsp;jdbm today is single
 threaded for read <BR>and &nbsp;write, so
 it is not possible to get I/O concurrency.
 &nbsp;Even if you have &nbsp;I/O concurrency
 at the store layer, your <BR>application
 has to support it as well. &nbsp;For us,
 I/O concurrency is &nbsp;gained by high level
 query languages that can <BR>use &nbsp;parallel
 read operations against the store or (in
 a different application &nbsp;which is a
 fuzzy inference engine based <BR>on a neural
 network model) by &nbsp;being able to model
 the main computation using &nbsp;a parallel
 processing approach. <BR><BR>One &nbsp;change
 that I would like to introduce (if we wind
 up introducing changes in &nbsp;the jdbm
 file structure) is a long <BR>class identifier
 in the record header. &nbsp;This would make
 it &nbsp;possible to accurrately profile
 the contents of a jdbm <BR>store. &nbsp;For
 certain key classes (BTree, BPage, HashDictionary,
 &nbsp;HashBucket, String, Long) we would
 have "magic" <BR>pre-defined values that
 used negative recids. &nbsp;All other classes
 &nbsp;would be assigned recids by interning
 the class name <BR>in a &nbsp;string table.
 &nbsp;The string table itself could be just
 a BTree using a &nbsp;compressed key index
 whose keys are the <BR>string values and
 whose values are the jdbm records whose content
 is &nbsp;that string. &nbsp;The latter is
 necessary so &nbsp;that <BR>you can lookup
 the class &nbsp;name. <BR><BR>I am certainly
 going to do this at the object manager &nbsp;layer
 for our application since some use of Externalizable
 &nbsp;or <BR>even Serializable appears to
 be why the store is so &nbsp;bloated (I had
 no idea just how bad java serialization was!).
 &nbsp;&nbsp;I <BR>think that it would be
 a nice feature for jdbm, but &nbsp;not one
 that we could introduce while maintaining
 binary &nbsp;compatibility <BR>in the store
 file. <BR><BR>At the same time, I would also
 like to introduce &nbsp;version numbers to
 critical classes (BTree, BPage, etc.) so
 that &nbsp;we <BR>have more flexibility in
 evolving jdbm without &nbsp;breaking binary
 compatibility. <BR><BR>-bryan <BR><BR>-----Original
 Message-----<BR>From: Kevin Day &nbsp;<A
 href="mailto:ke...@tr..."><FONT
 color=#0000ff>[mailto:ke...@tr...]</FONT></A>
 <BR>Sent: Thursday, September 22, 2005 &nbsp;11:26
 PM<BR>To: Thompson, Bryan B.<BR>Subject:
 re: &nbsp;[Jdbm-developer] commit: jdbm.recman.DumpUtility<BR><BR><BR>Bryan-
 <BR><BR>Thanks for picking this up right
 now - I am &nbsp;so slammed with
 work that I'm not able to really write any
 jdbm &nbsp;code. &nbsp;For some reason it's
 a lot easier for me to think of conceptual
 &nbsp;design and algorithms in the evening
 than actually bang on the &nbsp;keyboard.
 <BR><BR>Things should clear up a bit in the
 next week &nbsp;or so. &nbsp;Please let me
 know if you need a hand walking the data
 pages - &nbsp;I think that is going to be
 your best bet for getting a solid unused
 space &nbsp;percentage. <BR><BR>- K <BR><BR>&nbsp;<BR>&gt;I
 finally got &nbsp;through the sourceforge
 CVS. &nbsp;I will pick up &nbsp;work on the
 &nbsp;free lists tomorrow. &nbsp;-bryan &nbsp;&lt;<BR>&lt;</DIV></FONT></BODY></HTML>