From: Thompson, B. B. <BRY...@sa...> - 2005-10-13 17:47:25
|
Kevin,

I've tried a variation of this in which it did extend Serializer and backed out of it since it introduced what appeared to be additional complexity. If we go this way, then we wind up in a position where we have no common interface, but we could add an ISerializer for that. So, what you are proposing might look like:

    interface ISerializer {}; // marker interface.
    interface Serializer extends ISerializer, Serializable {...}; // as now.
    interface CompoundSerializer extends ISerializer, Serializable {}; // w/ the new API methods

I did find that the "state" requirements for the "recman aware" serializer were close to those for the "serialization handler", which is another reason that I didn't introduce a serialization handler interface and backed out of a design in which I used another interface for handling compound records. It seemed a bit confusing and I was not sure that the additional interfaces would clarify anything for people. Another reason that I hesitate to do this is that the additional interfaces might make it more complicated to define a stream-based serialization API since that could be orthogonal to the question of "recman aware" resulting in a 2 x 2 design, which just seems too much.

With respect to the record header, the extensible serialization approach is not placing any data into the record header. It all winds up in a "data" header before the rest of the serialized data. I was initially using the record header for this metadata, but that (a) breaks binary compatibility and (b) does not support the use of the extensible serializer within compound records, hence the revision to use a "data" header.

Can you elaborate on your point a bit more given that the metadata is part of the record and not the record header?

Thanks,

-bryan

-----Original Message-----
From: Kevin Day
To: Thompson, Bryan B.; 'jdb...@li...'
Sent: 10/13/2005 1:16 PM
Subject: re: [Jdbm-developer] extensible serialization

Bryan-

I'm really of two minds about this myself...
There are two competing requirements: making the system easy to use for end users (people who are just using it as a data store), and making it easier for developers who are creating their own containers...

The tradeoff in the design so far has been to lean towards the end users' requirements.

I have one suggestion that may work around all of this.

What do you think about adding a RecManAwareSerializer interface? It would *not* inherit from Serializer, and its interface would be:

    byte[] serialize(RecMan, recid, Obj)
    Object deserialize(RecMan, recid, byte[])

The record manager can do a quick instanceof check on the supplied serializer (or the recovered serializer if we are going to store that info in the record header) and call the appropriate method.

The reality of things is that even my technique of storing meta data for each object requires that the recman be made available during deserialization of the ObjReference objects (but that's the ONLY time it is needed). I'm using a factory to deal with this right now, but your comments on the limitations of factories in the BTree and HTree construction are dead on, and I am sick of jumping through the hoops that requires.

I think that if we recognize that there are actually two layers of objects that need to get stored in the record manager, with two completely different serialization needs, then we can address both needs.

Business objects continue to use the Serializer interface (which allows them to migrate quickly over to one of the containers if the developer decides to do that). Container objects will use the new interface (we'll have to update the BTree and HTree implementation - but that is well worth it if it allows us to easily sub-class them). This keeps the persistence logic separation for the business objects, but doesn't artificially prevent it for lower level container type objects that actually have a valid reason to have access to the record manager implementation.
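A minimal sketch of the instanceof dispatch proposed above may make the two layers easier to compare. It is an illustration only, not jdbm code: RecMan is an empty stand-in for the record manager, the Serializer shown here is simplified (no Serializable, no IOException), ISerializer is the marker interface from Bryan's reply, and SerializerDispatch, StringSerializer and RecidTaggingSerializer are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the record manager; the sketch only passes it through.
interface RecMan {}

// Marker interface so both serializer kinds share a common type.
interface ISerializer {}

// Simplified business-object serializer: sees neither recman nor recid.
interface Serializer extends ISerializer {
    byte[] serialize(Object obj);
    Object deserialize(byte[] data);
}

// Proposed container-level serializer: receives the recman and the recid.
interface RecManAwareSerializer extends ISerializer {
    byte[] serialize(RecMan recman, long recid, Object obj);
    Object deserialize(RecMan recman, long recid, byte[] data);
}

// What the record manager would do internally: one instanceof check per call.
final class SerializerDispatch {
    private SerializerDispatch() {}

    static byte[] serialize(RecMan recman, long recid, ISerializer s, Object obj) {
        if (s instanceof RecManAwareSerializer) {
            return ((RecManAwareSerializer) s).serialize(recman, recid, obj);
        }
        if (s instanceof Serializer) {
            return ((Serializer) s).serialize(obj);
        }
        throw new IllegalArgumentException("unsupported serializer: " + s);
    }

    static Object deserialize(RecMan recman, long recid, ISerializer s, byte[] data) {
        if (s instanceof RecManAwareSerializer) {
            return ((RecManAwareSerializer) s).deserialize(recman, recid, data);
        }
        if (s instanceof Serializer) {
            return ((Serializer) s).deserialize(data);
        }
        throw new IllegalArgumentException("unsupported serializer: " + s);
    }
}

// Business-object style serializer.
final class StringSerializer implements Serializer {
    public byte[] serialize(Object obj) {
        return ((String) obj).getBytes(StandardCharsets.UTF_8);
    }
    public Object deserialize(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}

// Container style serializer: uses the recid it is handed (here just as a prefix).
final class RecidTaggingSerializer implements RecManAwareSerializer {
    public byte[] serialize(RecMan recman, long recid, Object obj) {
        return (recid + ":" + obj).getBytes(StandardCharsets.UTF_8);
    }
    public Object deserialize(RecMan recman, long recid, byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}

class DispatchDemo {
    public static void main(String[] args) {
        RecMan recman = new RecMan() {};
        byte[] plain = SerializerDispatch.serialize(recman, 7L, new StringSerializer(), "abc");
        byte[] tagged = SerializerDispatch.serialize(recman, 7L, new RecidTaggingSerializer(), "abc");
        System.out.println(SerializerDispatch.deserialize(recman, 7L, new StringSerializer(), plain));
        System.out.println(SerializerDispatch.deserialize(recman, 7L, new RecidTaggingSerializer(), tagged));
    }
}
```

Business objects only ever implement the first interface, so the recman stays out of their code; the cost is the extra branch in the record manager and a second interface to document.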
None of the above, of course, gets at whether we need to store the class and serializer in the record header (or whether it should be encoded by some sort of serialization handler interface)... What do you think about my comment about including the serializer id in the record header, but not the class ID (because the class will be recoverable from the serializer if it's needed)? That would help to keep record header overhead down, and I don't think we are actually losing any functionality/information by doing so.

It also doesn't address the question of adjusting the API to include hints (I'm actually quite interested in Alex's comment that this behavior really belongs in the serializer...) - but hopefully it will give some food for thought.

What do you guys think? Any problems with the instanceof technique that I'm not seeing (I researched the performance hit of instanceof (just web research - no actual testing), and the literature says that instanceof check overhead is very, very low)?

- K

> Kevin,

Thanks for your excellent feedback. I think that the question goes (1) to the complexity of supporting serializers that can encapsulate the required state (recman and possibly recid) vs having that state in the API with a stateless serializer and (2) whether the recman and recid are required to be passed through to support compound records (the practice of using serializers within a record as well as for the overall record).

My position on (1) is that it is more complicated to provide for a serializer constructor such as MySerializer( RecordManager recman, long recid ) or to provide a callback on ISerializer that notes the recman and recid than it is to pass these through the API. In fact I require Serializers used by the extensible serialization (the serialization handler) to be stateless and to have a public zero argument constructor. The serialization handler is the only thing with state.

My position on (2) is that I have a use case for compound records.
Without passing through the required state (whether it is encapsulated or not) it is not possible to use compound records. BPage succeeds at this practice because the recman and recid are available as transient state on the BPage. By addressing (1) we also open up the possibility of writing constructors that insert objects (including BTrees) into the store, which means that people can now subclass BTree - a significant advantage in my mind.

With respect to the interesting practice of using a weak hash map to recover the recid (or a reference object), that is a nice way of handling things. Of course it does require access to the recman, which means that we can't practice that inside of a serializer if the intention is to mark that information on the object, which, I know, goes against your recommendation. I am not against the practice of hiding the persistence layer, quite the contrary! However I handle encapsulation in a different manner with a framework over jdbm (or other persistence layers). Without arguing for (or against) any specific encapsulation technique, I feel that not having this information (recman, recid) in the Serializer API raises a significant barrier to other practices.

What I would like to do at this point is update the write-up of the extensible serialization framework, incorporate your recommendation on how to encapsulate references using a weak hash map approach, develop a runtime RecordManager option to support that approach, and commit the existing extensible serialization code which passes the recman and recid through a modified Serializer API. I think that this supports multiple approaches to encapsulating persistence and guides people towards some alternatives.

If you had some code that you would like to contribute to support the practice you have outlined I would be happy to integrate and test that as part of this effort.
Thanks,

-bryan

_______________________________________________
Jdbm-developer mailing list
Jdb...@li...
https://lists.sourceforge.net/lists/listinfo/jdbm-developer
|
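The "data" header Bryan describes in the message above can be sketched roughly as follows. This is an assumption-laden illustration, not the extensible serialization code itself: the header layout (an int serializer id followed by an int payload length), and the names PayloadSerializer, DataHeaderHandler, StringPayload and LongPayload are all hypothetical. It does follow two points from the message: the serializers are stateless with zero-argument constructors, and the handler is the only object holding state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stateless serializer contract.
interface PayloadSerializer {
    byte[] serialize(Object obj);
    Object deserialize(byte[] data);
}

final class StringPayload implements PayloadSerializer {
    public byte[] serialize(Object obj) {
        return ((String) obj).getBytes(StandardCharsets.UTF_8);
    }
    public Object deserialize(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}

final class LongPayload implements PayloadSerializer {
    public byte[] serialize(Object obj) {
        return ByteBuffer.allocate(Long.BYTES).putLong((Long) obj).array();
    }
    public Object deserialize(byte[] data) {
        return ByteBuffer.wrap(data).getLong();
    }
}

// The only stateful piece: maps serializer ids to serializer instances.
final class DataHeaderHandler {
    private final Map<Integer, PayloadSerializer> byId = new HashMap<>();

    void register(int id, PayloadSerializer s) {
        byId.put(id, s);
    }

    // Writes the "data" header (id, length) in front of the payload itself.
    void write(DataOutputStream out, int serializerId, Object obj) throws IOException {
        byte[] payload = lookup(serializerId).serialize(obj);
        out.writeInt(serializerId);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Reads one header plus payload and hands the payload to the named serializer.
    Object read(DataInputStream in) throws IOException {
        PayloadSerializer s = lookup(in.readInt());
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        return s.deserialize(payload);
    }

    private PayloadSerializer lookup(int id) throws IOException {
        PayloadSerializer s = byId.get(id);
        if (s == null) throw new IOException("unknown serializer id " + id);
        return s;
    }
}

class DataHeaderDemo {
    static final int STRING_ID = 1;
    static final int LONG_ID = 2;

    static DataHeaderHandler newHandler() {
        DataHeaderHandler h = new DataHeaderHandler();
        h.register(STRING_ID, new StringPayload());
        h.register(LONG_ID, new LongPayload());
        return h;
    }

    // A compound record: two serialized parts, each with its own data header.
    static byte[] writeCompound(DataHeaderHandler h, String name, long count) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            h.write(out, STRING_ID, name);
            h.write(out, LONG_ID, count);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object[] readCompound(DataHeaderHandler h, byte[] record) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
            return new Object[] { h.read(in), h.read(in) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DataHeaderHandler h = newHandler();
        Object[] parts = readCompound(h, writeCompound(h, "page", 42L));
        System.out.println(parts[0] + " " + parts[1]);
    }
}
```

Because the header travels with each payload rather than with the record, the same mechanism works for a whole record and for each part nested inside a compound record, which is the limitation of the record-header approach that the message points out.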
From: 'Kevin D. ' <ke...@tr...> - 2005-10-13 19:43:42
|
Bryan-

Your comments on the marker interface are dead-on - I should have mentioned that in my original email. It's a tad ugly, but not as ugly as some of the alternatives, and for the most part, it will be transparent to jdbm users (they will just implement Serializer and be done with it).

I'm not sure, but this kind of change may force a recompile (source code compatible, but not byte code compatible)... We may have to give some thought to that.

On the combinatorial explosion of different types of low level serializers, I agree in principle with your concerns, but I don't see a good way around it (without exposing recman details to objects that really shouldn't be concerned with them). No matter what we do, we are going to have to deal with that possibility. If we do it via serialization handlers, then we still have the same problem (unless we only want to support one type of serializer interface at a time, and that just doesn't make sense).

Here is what I was thinking with respect to having the serializer information in the record header. The record manager is responsible for managing serialization (trying to use the record manager without knowledge of the recid->serializer mapping is impossible), so it may make sense to include this information in the record header itself.

This allows for a significant amount of abstraction in that it removes the need to specify the serializer where the objects are actually used. I can envision an architecture that has two development tasks: configuration of the serializers that are going to be used with a given jdbm store, and business logic. By capturing the recid->serializer map in the record manager itself, we remove yet another aspect of the persistence layer from the business logic layer.

It also allows us (as you've pointed out) to have a very easy, consistent mechanism for upgrading serializers. For small apps, that's not a big deal, but in my applications, I am finding that I am constantly having to create ad-hoc versioning information in my serializers.

It also provides a framework for efficiently storing and retrieving sub-classes. The business logic code just applies a cast to the level of the inheritance hierarchy that it needs. In the current recman implementation, the business logic needs to know which serializer to provide, then do the down-cast.

Again, I've wound up creating ad-hoc serializers that manage this, but it's ugly and non-standard.

If the serializer mapping was part of the recman, then the fetch(long, Serializer) method gets deprecated and replaced with fetch(long). update(long, Serializer) becomes a simple update(long). For insert(), we may want to hold on to the old form of the method so we can automatically populate the serializer mapping on the fly (I still think that configuring it all up front is a better approach, but what the heck).

Another aspect of this type of change is (as you've pointed out) that it makes the Externalizable interface efficiently usable in jdbm. You'd still lose the ability to upgrade serializers, but for classes where that's really not an issue, Externalizable is a nice option.

Anyway, just some thinking...

- K
|
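Kevin's idea of keeping the recid->serializer mapping inside the record manager can be sketched as below. This is a rough in-memory illustration under stated assumptions, not a proposal for the actual jdbm API: MappingRecordStore, RecordSerializer and Utf8Serializer are hypothetical names, the mapping lives in a HashMap rather than in a record header, and update takes the new object as a second argument since the message's update(long) is shorthand.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified serializer contract.
interface RecordSerializer {
    byte[] serialize(Object obj);
    Object deserialize(byte[] data);
}

final class Utf8Serializer implements RecordSerializer {
    public byte[] serialize(Object obj) {
        return ((String) obj).getBytes(StandardCharsets.UTF_8);
    }
    public Object deserialize(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}

// In-memory stand-in for a record manager that remembers each record's serializer.
final class MappingRecordStore {
    private final Map<Long, byte[]> records = new HashMap<>();
    private final Map<Long, RecordSerializer> serializers = new HashMap<>();
    private long nextRecid = 1L;

    // The old-style insert: the serializer is supplied once and recorded for the recid.
    long insert(Object obj, RecordSerializer serializer) {
        long recid = nextRecid++;
        records.put(recid, serializer.serialize(obj));
        serializers.put(recid, serializer);
        return recid;
    }

    // No serializer argument: the store looks it up from the recid.
    Object fetch(long recid) {
        return serializerFor(recid).deserialize(records.get(recid));
    }

    void update(long recid, Object obj) {
        records.put(recid, serializerFor(recid).serialize(obj));
    }

    private RecordSerializer serializerFor(long recid) {
        RecordSerializer s = serializers.get(recid);
        if (s == null) {
            throw new IllegalArgumentException("no record with recid " + recid);
        }
        return s;
    }
}

class MappingRecordStoreDemo {
    public static void main(String[] args) {
        MappingRecordStore store = new MappingRecordStore();
        long recid = store.insert("first version", new Utf8Serializer());
        store.update(recid, "second version");
        // Business logic only casts; it never names a serializer after the insert.
        String value = (String) store.fetch(recid);
        System.out.println(value);
    }
}
```

After the insert, calling code never mentions a serializer again, which is the separation between serializer configuration and business logic that the message argues for. A persistent version would need to store a serializer id with each record, which is exactly the record-header versus data-header question still open in the thread.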