From: Alex B. <boi...@in...> - 2005-10-12 16:55:49
Thompson, Bryan B. wrote:

>Perhaps some further reading will make it possible to decide whether the
>introduction of a proposed:
>
>    long insert( long hintId, Object obj, Serializer serializer );
>
>method to the RecordManager interface will be more help or hindrance?

My take on this, also based on reading up on the log-structured object
store literature, is that the serializer should be responsible for
providing information back to the object store about the relationships
of the object instance with other instances.

In more concrete terms, this would take the form of an additional method
on the Serializer:

    /**
     * Return recids of referenced objects for storage clustering purposes.
     */
    long[] getReferences();

That way, when the RecordManager reorganizes the database or moves
objects from the log to stable storage, it can easily do object
clustering based on some form of policy (breadth-first, depth-first, ...).

This also centers the clustering concern on the developer of the object
(and associated serializer) rather than the user of the object.

alex
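
To make the proposal above concrete, here is a minimal sketch of how a store could derive a breadth-first placement order from such a method. It is not jdbm code: `ReferenceAwareSerializer`, `ObjectSource`, and `breadthFirstOrder` are hypothetical names, and the sketch assumes `getReferences` receives the object as an argument, whereas the message gives the signature without parameters.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ClusteringSketch {

    /** Hypothetical serializer extension; not the actual jdbm Serializer interface. */
    interface ReferenceAwareSerializer {
        /** Return recids of the objects referenced by obj (assumed signature). */
        long[] getReferences(Object obj);
    }

    /** Hypothetical stand-in for fetching a stored object by recid. */
    interface ObjectSource {
        Object fetch(long recid);
    }

    /**
     * Walk the reference graph breadth-first from rootRecid and return the
     * recids in visit order. Writing records out in this order would place
     * an object next to the objects it references directly.
     */
    static List<Long> breadthFirstOrder(long rootRecid, ObjectSource source,
                                        ReferenceAwareSerializer serializer) {
        List<Long> order = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        Deque<Long> queue = new ArrayDeque<>();
        queue.add(rootRecid);
        seen.add(rootRecid);
        while (!queue.isEmpty()) {
            long recid = queue.remove();
            order.add(recid);
            for (long ref : serializer.getReferences(source.fetch(recid))) {
                if (seen.add(ref)) { // skip recids already scheduled (handles cycles)
                    queue.add(ref);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Toy store: each "object" is simply the array of recids it references.
        Map<Long, long[]> store = new HashMap<>();
        store.put(1L, new long[] {2L, 3L});
        store.put(2L, new long[] {4L});
        store.put(3L, new long[] {4L, 1L}); // 3 -> 1 closes a cycle
        store.put(4L, new long[0]);

        List<Long> order = breadthFirstOrder(
                1L, recid -> store.get(recid), obj -> (long[]) obj);
        System.out.println(order); // prints [1, 2, 3, 4]
    }
}
```

A depth-first policy would differ only in using the deque as a stack, which is one reason the policy could plausibly be made pluggable.
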
From: Thompson, B. B. <BRY...@sa...> - 2005-10-12 18:18:38
Alex,

This seems simple enough. My only critique is that it does not account
for actual usage patterns, which could be tracked by an adaptive
technique, and that I don't know offhand what heuristic would be used
to make the clustering decisions. Do you have some references on using
this approach so that I can read up?

I had been under the impression that jdbm had an object (record) cache
but not a page cache until I read through RecordFile and
TransactionManager. Given that a page cache already exists, there are
clearly going to be co-materialization benefits to placing records on
the same page, since we will not have to fetch those records from disk
once the page is in cache.

Thanks,

-bryan

-----Original Message-----
From: Alex Boisvert
To: Thompson, Bryan B.
Cc: 'Kevin Day '; jdb...@li...; 'JDBM Developer listserv '
Sent: 10/12/2005 11:49 AM
Subject: Re: [Jdbm-developer] logical containers and translation pages
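
The co-materialization point can be illustrated with a toy page cache that counts simulated disk reads. This is a sketch under stated assumptions, not jdbm's RecordFile: `PageCacheSketch`, `pageFor`, `diskReads`, and the 4096-byte page size are all hypothetical, and LRU eviction is chosen here only for simplicity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PageCacheSketch {

    static final int PAGE_SIZE = 4096; // hypothetical page size

    /** Number of simulated disk reads performed so far. */
    int diskReads = 0;

    private final Map<Long, byte[]> pages;

    PageCacheSketch(final int maxPages) {
        // An access-ordered LinkedHashMap gives a simple LRU cache.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxPages;
            }
        };
    }

    /** Return the page with the given id, "reading" it from disk on a miss. */
    byte[] pageFor(long pageId) {
        byte[] page = pages.get(pageId);
        if (page == null) {
            diskReads++;                // cache miss: simulated disk fetch
            page = new byte[PAGE_SIZE]; // stand-in for the page contents
            pages.put(pageId, page);
        }
        return page;
    }

    public static void main(String[] args) {
        // Three records clustered on page 7: one disk read serves all three.
        PageCacheSketch clustered = new PageCacheSketch(2);
        for (long pageId : new long[] {7, 7, 7}) {
            clustered.pageFor(pageId);
        }

        // Three records scattered over pages 7, 8, 9: one disk read each.
        PageCacheSketch scattered = new PageCacheSketch(2);
        for (long pageId : new long[] {7, 8, 9}) {
            scattered.pageFor(pageId);
        }

        System.out.println(clustered.diskReads + " vs " + scattered.diskReads); // prints 1 vs 3
    }
}
```
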
From: Alex B. <boi...@in...> - 2005-10-13 16:46:39
Bryan,

Sorry, I don't have specific references for this approach, although it is
common in most ODBMSs for the engine to introspect/obtain references
within an object for consistency (referential integrity) and performance
optimization.

I don't know of any system that explicitly tracks actual usage patterns to
optimize object clustering. Most systems I'm familiar with use
predefined heuristics that happen to work well for a wide range of
access patterns, and some allow you to tune or substitute the heuristics
if your application doesn't perform well under the common ones.

alex
From: Kevin D. <ke...@tr...> - 2005-10-13 17:19:15
As a general comment: the ObjReference implementation I'm thinking about
would provide this type of information implicitly. It would require that
object clustering be performed at a higher layer in the stack (at the
object level instead of the record level), but there may be something
there that would help.

Right now I do not maintain a list of references, but I think it would be
relatively easy to store a forward and reverse object reference map. When
objects are deleted, the object manager would check whether there are any
references to that object, and either clear them or throw an exception.
That same map could probably be used to provide hints to determine
proximity rules during a pack & rebuild optimization.

It's a bit complicated, but it seems cleaner to me than requiring the
programmer to provide explicit hints to the record manager directly. (Of
course, providing such hints to a container is fine, because that's the
purpose of the container.)

- K
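
The forward and reverse reference map described in this message could look roughly like the sketch below. It is an illustration only, not the ObjReference implementation: `ReferenceMapSketch` and its methods are hypothetical names, and the `clearIncoming` flag is an assumed way of choosing between the two deletion behaviours mentioned (clear the references or throw an exception).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ReferenceMapSketch {

    /** recid -> recids it references. */
    private final Map<Long, Set<Long>> forward = new HashMap<>();
    /** recid -> recids that reference it. */
    private final Map<Long, Set<Long>> reverse = new HashMap<>();

    /** Record that object `from` holds a reference to object `to`. */
    void addReference(long from, long to) {
        forward.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        reverse.computeIfAbsent(to, k -> new HashSet<>()).add(from);
    }

    /** Objects referenced by recid (forward direction). */
    Set<Long> referencesFrom(long recid) {
        return Collections.unmodifiableSet(
                forward.getOrDefault(recid, Collections.<Long>emptySet()));
    }

    /** Objects that reference recid (reverse direction). */
    Set<Long> referrersOf(long recid) {
        return Collections.unmodifiableSet(
                reverse.getOrDefault(recid, Collections.<Long>emptySet()));
    }

    /**
     * Remove recid from the map. If other objects still reference it, either
     * clear those references (clearIncoming == true) or refuse by throwing.
     */
    void delete(long recid, boolean clearIncoming) {
        Set<Long> incoming = reverse.getOrDefault(recid, Collections.<Long>emptySet());
        if (!incoming.isEmpty() && !clearIncoming) {
            throw new IllegalStateException(
                    "recid " + recid + " is still referenced by " + incoming);
        }
        // Drop every forward edge that points at the deleted object.
        for (Long referrer : incoming) {
            forward.get(referrer).remove(recid);
        }
        reverse.remove(recid);
        // Drop the deleted object's own outgoing edges from the reverse map.
        Set<Long> outgoing = forward.remove(recid);
        if (outgoing != null) {
            for (Long target : outgoing) {
                Set<Long> back = reverse.get(target);
                if (back != null) {
                    back.remove(recid);
                }
            }
        }
    }

    public static void main(String[] args) {
        ReferenceMapSketch refs = new ReferenceMapSketch();
        refs.addReference(1, 2);
        refs.addReference(3, 2);
        try {
            refs.delete(2, false); // still referenced by 1 and 3
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
        refs.delete(2, true); // clears the references held by 1 and 3
        System.out.println(refs.referencesFrom(1).isEmpty()); // prints true
    }
}
```

The same two maps could feed a pack and rebuild pass: `referencesFrom` and `referrersOf` give the neighbours of a record in both directions, which is the information a proximity rule would need.
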