From: Conal T. <con...@vu...> - 2008-03-10 05:23:42
|
I have an old topicmap provider implementation which no longer links against recent builds of TM4j because of the addition of the equalsByID method to the TopicMapObject interface. It is certainly an easy method to implement, but I'm concerned because I don't even understand why the method has been added to TopicMapObject. It appears it may have been a "rough idea" that never really developed? In any case, it seems like a needless complication to me, and I am inclined to remove it. First I thought I would ask about it here. The point of the method appears to be to hide the data type of the object's id, presumably so that it could potentially be something other than a String. But this seems pointless to me, since actually these id properties are constrained to be strings (actually, XML names) by the XML Topic Maps specification. There's no way they could be anything other than strings! When I looked at each provider's implementation of the TopicMapObject interface in the TM4j source tree, they were all exactly the same: public boolean equalsByID(TopicMapObject o) { return getID().equals(o.getID()); } That fact in itself is a good indicator that the method doesn't properly belong in the TopicMapProvider interface. If there is no need for different providers to implement the method differently (and indeed the JavaDoc for the interface actually specifies the implementation!), then it belongs in a helper class, or as a protected helper method, but not as a public API. Each of the tm4j topic map providers had an implementation of TopicMapObject which included this method. These methods were used in at most a few places within that provider's package (e.g. org.tm4j.topicmap.hibernate.TopicImpl calls the method as a helper within isOfType(Topic)). In all of those cases the method could just as well have been protected - it didn't need to be part of the public API. Even if it were just a protected method of TopicMapObjectImpl (and not required by the TopicMapObject interface) it would seem hardly justifiable because it is just a single line of code, and used only a few times (not used at all in the unified provider!). Incidentally, I'm not sure if the implementation in the "unified" provider is correct - it's subtly inconsistent with equals(). The only use I could see of the method outside of provider classes was in org.tm4j.topicmap.utils.TopicMapWalker, where it was used just once, and could I think have just as well used equals() or at least getID().equals(o.getID()) So I propose to remove the equalsByID method from the interface, and replace its use in TopicMapWalker with equivalent code comparing id values. Any objections? Con -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |
From: Conal T. <con...@vu...> - 2008-03-11 04:06:21
|
Hi Xuân I'm not convinced, actually. It seems to me you should have introduced a new, TMDM-flavoured TopicMapObject interface, rather than changing the old, XTM 1.0-flavoured interface. As you yourself pointed out, the getID() method isn't appropriate for TMDM-based topicmaps, so a distinct interface for TMDM TopicMapObjects is probably better (rather than an extended one, with "deprecated" bits). Personally, I think your effort to unify the XTM 1 and TMDM models is too ambitious (although I don't know how you intend to achieve this goal, because you haven't described your plan). It seems to me though that you should instead have two distinct models, and to use a layering approach to unify them. The two models are IMHO not sufficiently compatible to be simply "merged" (as you have started to do). Mainly, though, I just think it's important to fully discuss these kinds of disruptive changes before instituting them, or else they should be done in a branch. The trunk is not the proper place for this kind of speculative work. Regards Con -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |
From: Conal T. <con...@vu...> - 2008-03-11 04:16:02
|
On Tue, 2008-03-11 at 16:39 +1300, Xuân Baldauf wrote: > If I remember correctly, I think > http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/TopicMapObjectImpl.java?revision=1.5&view=markup#l_126 is the reason for equalsByID(). This provider is a bridge between XML Topic Maps (in the style of TM4J 0.9.x) and TMDM (in the style of TM4J2). TMDM in general, and the TM4J2-implementation of TMDM in particular, does not require a TopicMapObject to have exactly one ID. Consequently, getID() is not defined, strictly speaking. If getID() is called anyways, some compatibility magic auto-creates an ID if no ID cannot be found or somehow inferred. However, this is a kludge (as a read-only method has a state-changing behaviour), and should be avoided, especially if the only purpose of calling getID() is comparing whether two TopicMapObjects are equal in some sense (which should be read-only-process in turn). You do not want to have an OutOfMemoryError just because you called getID(). Why not just use the equals() method then, Xuan? Isn't that the purpose of equalsByID()? Or is there some difference in semantics between equals() and equalsByID()? -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |
From: Xuân B. <xua...@ba...> - 2008-03-11 08:58:37
|
Conal Tuohy wrote: > Hi Xuân > > I'm not convinced, actually. > > It seems to me you should have introduced a new, TMDM-flavoured > TopicMapObject interface, I did, see here: http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/TopicMapConstruct.java?revision=1.1&view=markup > rather than changing the old, XTM > 1.0-flavoured interface. > I did not change the old XTM 1.0-flavoured interface except for this one abstraction of equality. > As you yourself pointed out, the getID() method isn't appropriate for > TMDM-based topicmaps, so a distinct interface for TMDM TopicMapObjects > is probably better (rather than an extended one, with "deprecated" > bits). > There is, as org.tm4j.topicmap.tmdm.TopicMapConstruct.getItemIdentifiers() shows. > Personally, I think your effort to unify the XTM 1 and TMDM models is > too ambitious I don't unify into a greater model. > (although I don't know how you intend to achieve this > goal, because you haven't described your plan). It seems to me though > that you should instead have two distinct models, I do. > and to use a layering > approach to unify them. I do. http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/ is the translation layer. > The two models are IMHO not sufficiently > compatible to be simply "merged" I agree. > (as you have started to do). > I do not think so. > Mainly, though, I just think it's important to fully discuss these kinds > of disruptive changes before instituting them, or else they should be > done in a branch. The trunk is not the proper place for this kind of > speculative work. > The equalsByID() thing was done as fix to a specific problem, as far as I remember. It is not breaking any code using the API, and I was not aware that there was any code outside of TM4J actually implementing the API. Thus, it was reasonable to not expect any problems. Additionally, I considered (and still consider) this API "public, but extendable" (although I do not intend any further changes). But even though I do know now that there is external software implementing the API, I still think an approach like it is done with the Linux kernel is appropriate: 1. What is in the kernel gets refactored appropriately by those who do the refactoring. 2. What is outside but implements an inside interface has to be updated by the outside maintainers in order to mirror the kernel development. 3. Development freeze (=stopping changing APIs) over extended periods of time is unacceptable. 4. Who wants to be really conservative and not go the way of being up-to-date (both regarding breaking code and regarding reaping the fruits of the development) may stick with an old version. So, I consider a change in a project imposing a need of a change in an unknown external dependent project of 3 lines not necessarily disruptive. It is just normal that APIs evolve for certain needs (such as speed and reduction of side effects), and that external projects implementing these APIs need to implement new methods in these APIs if they want to mirror the development. Additionally, specific to TM4J, there is low to virtually no activity on the project in the last 12 months (at least when I exclude commits by myself), so even for major (API) changes, there is not really any reply expected. Thus, it is quite unreasonable to ask and wait for no reply. Thus, I think, if there is any progress to be expected at all, the burden of disagreeing (like writing e-mails and debating pros and cons) should be on those being otherwise passive (i.e. by default every change is allowed unless challenged, not by default every change is denied, unless allowed). I'd like to ask you to make your API implementation publicly writable (e.g. import it into the TM4J project under some open source licenses), then bad surprises like a need to adapt to new versions can be avoided. > Regards > > Con > > ciao, Xuân. |
From: Lars H. <he...@se...> - 2008-03-11 09:45:54
|
Hi Xuân, [...] > Additionally, specific to TM4J, there is low to virtually no activity on > the project in the last 12 months (at least when I exclude commits by > myself), so even for major (API) changes, there is not really any reply > expected. Thus, it is quite unreasonable to ask and wait for no reply. Leaving the technical perspective aside, I disagree. Pushing TM4J forward is good, but you should have tried to explain what you want to do with TM4J. I think this is the way how Open Source works. Maybe the new "tmdm" package is the holy grail, but you should communicate with the other developers and the user base. How should others commit something if they do not have a plan? There is almost no documentation and you introduced dependencies to your libs (where is the source?). Ignoring the user base and potentially new TM4J-developers is the wrong way to achieve pretended progress. Best regards, Lars -- http://www.semagia.com |
From: Xuân B. <xua...@ba...> - 2008-03-11 09:37:43
|
Conal Tuohy wrote: > On Tue, 2008-03-11 at 16:39 +1300, Xuân Baldauf wrote: > > >> If I remember correctly, I think >> http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/TopicMapObjectImpl.java?revision=1.5&view=markup#l_126 is the reason for equalsByID(). This provider is a bridge between XML Topic Maps (in the style of TM4J 0.9.x) and TMDM (in the style of TM4J2). TMDM in general, and the TM4J2-implementation of TMDM in particular, does not require a TopicMapObject to have exactly one ID. Consequently, getID() is not defined, strictly speaking. If getID() is called anyways, some compatibility magic auto-creates an ID if no ID cannot be found or somehow inferred. However, this is a kludge (as a read-only method has a state-changing behaviour), and should be avoided, especially if the only purpose of calling getID() is comparing whether two TopicMapObjects are equal in some sense (which should be read-only-process in turn). You do not want to have an OutOfMemoryError just because you called getID(). >> > > Why not just use the equals() method then, Xuan? > Personally, I do not care about the name. However, care should be taken because we have multiple notions of equality in topic maps and in TM4J (like: Are two Topic objects, which are merged, equal? Are two Topic objects, which are not yet merged, but should be merged, equal?). > Isn't that the purpose of equalsByID()? > Maybe, maybe not. The purpose of equalsByID() is "If I would take the IDs of the two topic map items involved and compare them, would that comparison yield true?". Whether this definition happens to be equivalent or not to other definitions of equality: I do not know, and this is out of the scope of equalsByID(), and this is because I chose "equalsByID()" as name and not "equals()". > Or is there some difference in semantics between equals() and > equalsByID()? > Depends on the intended semantics of equals() and, as they do not seem to be defined currently (like equals() is not overriden), a difference between equals() and equalsByID() is also not defined. ciao, Xuân. P.S.: If you happen to be in Auckland in the next months, then just drop me a note, then we can meet in person. :-) |
From: Conal T. <con...@vu...> - 2008-03-13 00:54:36
|
On Tue, 2008-03-11 at 22:36 +1300, Xuân Baldauf wrote: > Conal Tuohy wrote: > > On Tue, 2008-03-11 at 16:39 +1300, Xuân Baldauf wrote: > > > > > > > If I remember correctly, I think > > > http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/TopicMapObjectImpl.java?revision=1.5&amp;view=markup#l_126 is the reason for equalsByID(). This provider is a bridge between XML Topic Maps (in the style of TM4J 0.9.x) and TMDM (in the style of TM4J2). TMDM in general, and the TM4J2-implementation of TMDM in particular, does not require a TopicMapObject to have exactly one ID. Consequently, getID() is not defined, strictly speaking. If getID() is called anyways, some compatibility magic auto-creates an ID if no ID cannot be found or somehow inferred. However, this is a kludge (as a read-only method has a state-changing behaviour), and should be avoided, especially if the only purpose of calling getID() is comparing whether two TopicMapObjects are equal in some sense (which > > > should be read-only-process in turn). You do not want to have an OutOfMemoryError just because you called getID(). > > > > > > > Why not just use the equals() method then, Xuan? > > > Personally, I do not care about the name. However, care should be > taken because we have multiple notions of equality in topic maps and > in TM4J (like: Are two Topic objects, which are merged, equal? Are two > Topic objects, which are not yet merged, but should be merged, > equal?). > > Isn't that the purpose of equalsByID()? > > > Maybe, maybe not. The purpose of equalsByID() is "If I would take the > IDs of the two topic map items involved and compare them, would that > comparison yield true?". What does "equals by ID" mean in the case of TMDM topic maps in which objects may have multiple ids? Would the comparison require that the two topic map items had at least one item identifier in common? > Whether this definition happens to be equivalent or not to other > definitions of equality: I do not know, and this is out of the scope > of equalsByID(), and this is because I chose "equalsByID()" as name > and not "equals()". OK. I think it would be good to actually determine the existing semantics of equals() (in the existing providers), and if this is compatible with the requirements of equalsByID() (or can easily be made so), to use the name "equals" rather than "equalsByID". > > Or is there some difference in semantics between equals() and > > equalsByID()? > > > Depends on the intended semantics of equals() and, as they do not seem > to be defined currently (like equals() is not overriden), a difference > between equals() and equalsByID() is also not defined. > > ciao, > Xuân. > > P.S.: If you happen to be in Auckland in the next months, then just > drop me a note, then we can meet in person. :-) I don't get up there much, but you never know; I might well be. When are you planning to be in Auckland? -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |
From: Conal T. <con...@vu...> - 2008-03-13 02:20:14
|
Hi Xuan I can see that you have defined a new set of interfaces for the new data model. The point I'm trying to make is that I think you should have done that WITHOUT ALSO making (any) change to the existing interfaces for the old data model. You say you've only made this one change, and it's true it's not in itself a big change. I would be quite happy for this method to exist (though I'd prefer it to be called equals() as I said earlier), but more importantly, the fact that you added the method at all is indicative of a design approach you have taken which I think is mistaken, and which has actually caused me some problems. You agreed with me that the data models are incompatible, yet your change to the TopicMapObject interface is precisely for the purpose of establishing (limited) compatibility. This still doesn't make sense to me: it seems to me that if the 2 models are really incompatible, then there is no need at all to change the old interface. So I think there's a contradiction in what you have said. It appears to me that the change to the interface was to make it convenient to use common code to process both XTM 1 and XTM 2. Specifically, I'm referring to the revisions to org.tm4j.topicmap.utils in which you introduced some XTM 2 support: http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/XTMParser.java?r1=1.19&r2=1.20 http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/XTMBuilder.java?r1=1.72&r2=1.73 These classes used to parse XTM 1 and build an XTM1-flavoured topic map graph. If I understand it correctly, it still builds a topic map conforming to the old model (i.e. not the TMDM), but you have changed it so that it will parse documents which are either: 1) XTM 1; 2) a subset of (?) XTM 2; or 3) some kind of hybrid of XTM 1 and 2; Dealing with those in order: 1) The XTM 1 support is currently broken in HEAD because of a change to the handling of xtm:member/xtm:topicRef (there's even a comment in the code to this effect, saying that it causes problems but "is correct" - something I don't understand at all). 2) I don't understand how you are planning to finish this; how will you handle the other discrepancies between the 2 data model? How would you handle typed names for example? 3) I don't think is desirable at all. IMHO the parser/builder would be much improved by adding validation to ensure that the markup conformed to the correct schema. Allowing for a mixture of the 2 is a step away from that. In any case I don't think it's a good idea to mix up the 2 markup languages (XTM 1 and 2) in the same parser. In my opinion you should create a distinct XTM2Parser, and it should use an XTM2Builder to build a TMDM graph. At the moment we have a parser/builder which does not fully handle either XTM 1 or 2, which is regrettable. > Additionally, I considered (and still consider) this API "public, but > extendable" (although I do not intend any further changes). Well - my perspective is different. My view is that the old interface is adequate for XTM 1 and should not in general be changed (certainly not without consensus). Just as the XTM 1 standard still exists and remains stable these last several years, so the interface should remain stable as well. Naturally XTM 2 is entitled to exist and to formally "supercede" XTM 1, but it does not actually modify XTM 1, and neither, IMHO, should TMDM support in TM4J2 require changes to the XTM1 parts of TM4J. > But even > though I do know now that there is external software implementing the > API, I still think an approach like it is done with the Linux kernel is > appropriate: > > 1. What is in the kernel gets refactored appropriately by those who > do the refactoring. yes > 2. What is outside but implements an inside interface has to be > updated by the outside maintainers in order to mirror the kernel > development. yes > 3. Development freeze (=stopping changing APIs) over extended periods > of time is unacceptable. Here I disagree. I believe that certain APIs should indeed be frozen over extended periods. These are the APIs that correspond closely to e.g. the standard markup languages and data models. At the very least, when these public APIs are changed they MUST be announced. > 4. Who wants to be really conservative and not go the way of being > up-to-date (both regarding breaking code and regarding reaping the > fruits of the development) may stick with an old version. I believe that being conservative has its place. I am not at all opposed to adding new APIs, but changing existing APIs should not to be done lightly IMHO. There is nothing stopping you from adding new APIs, and adding XTM 2 support - the only problem is that you have disrupted the XTM 1 support in the process. I am currently working from an old revision becaue of bugs in XTM 1 support in the current version. This is why agreement of the other committers to adding XTM 2 support was conditional on that change being made in a branch, and kept separate from the XTM 1-related code. Particularly for people who are running TM4J in production, it's important to be able to have the ability to fix bugs, while otherwise retaining stability. Our project at NZETC is heavily reliant on XTM 1, with little or no prospect of adopting XTM 2. For us, XTM 2 development is significantly less important (at least in the short term) than retaining stability of the XTM 1-compatible code base. So this is why I believe we must now create a branch (based on a revision prior to the introduction of XTM 2 code). We can call the branch "TM4J_1" and we can keep all XTM 2 features out of it. This branch can then be developed until the release of TM4J 1.0. In the meantime, Xuan, you can continue your work on XTM 2 in the trunk. > Additionally, specific to TM4J, there is low to virtually no activity on > the project in the last 12 months (at least when I exclude commits by > myself), so even for major (API) changes, there is not really any reply > expected. Thus, it is quite unreasonable to ask and wait for no reply. > Thus, I think, if there is any progress to be expected at all, the > burden of disagreeing (like writing e-mails and debating pros and cons) > should be on those being otherwise passive (i.e. by default every change > is allowed unless challenged, not by default every change is denied, > unless allowed). I disagree strongly with this. Disruptive changes to the code should be discussed in public, or at least proposed publicly, so that they CAN be discussed. I think there are great advantages to following such a collaborative process. Anyway ... consider your revision "challenged". I am not just going to revert your changes but I do expect that we attempt to reach to a consensus now. I want to come to an arrangement in which the TM4J_1 branch is established purely for XTM 1.0, and the trunk is used for developing XTM 2.0 support in a way which does not mix up XTM 1 and XTM 2. As I explained earlier, my primary interest is in XTM 1.0 (because of my other software which produces XTM 1), and hence I don't think I'll be able to do much on the XTM 2 branch, but all the same I still have an opinion on how it should be done, and I don't want to be left out of the loop :-) > I'd like to ask you to make your API implementation publicly writable > (e.g. import it into the TM4J project under some open source licenses), > then bad surprises like a need to adapt to new versions can be avoided. Yeah - I will be contributing the sqlprovider code when it is substantially more complete. At present it is still missing quite a few bits - a number of TopicMapUtils methods, and most of the indexing interface. It's probably still a couple of months off, at least. -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |
From: Xuân B. <xua...@ba...> - 2008-04-05 12:08:27
|
Conal Tuohy wrote: > Hi Xuan > > I can see that you have defined a new set of interfaces for the new > data model. The point I'm trying to make is that I think you should > have done that WITHOUT ALSO making (any) change to the existing > interfaces for the old data model. > > You say you've only made this one change, and it's true it's not in > itself a big change. I would be quite happy for this method to exist > (though I'd prefer it to be called equals() as I said earlier), but > more importantly, the fact that you added the method at all is > indicative of a design approach you have taken which I think is > mistaken, and which has actually caused me some problems. > > You agreed with me that the data models are incompatible, yet your > change to the TopicMapObject interface is precisely for the purpose of > establishing (limited) compatibility. This still doesn't make sense to > me: it seems to me that if the 2 models are really incompatible, then > there is no need at all to change the old interface. So I think > there's a contradiction in what you have said. > > It appears to me that the change to the interface was to make it > convenient to use common code to process both XTM 1 and XTM 2. If you mean the equalsByID() change, then, no, this change was independent of the XTM 2 reading support. > Specifically, I'm referring to the revisions to > org.tm4j.topicmap.utils in which you introduced some XTM 2 support: > > http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/XTMParser.java?r1=1.19&r2=1.20 > <http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/XTMParser.java?r1=1.19&r2=1.20> > http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/XTMBuilder.java?r1=1.72&r2=1.73 > <http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/XTMBuilder.java?r1=1.72&r2=1.73> > > These classes used to parse XTM 1 and build an XTM1-flavoured topic > map graph. > > If I understand it correctly, it still builds a topic map conforming > to the old model (i.e. not the TMDM), but you have changed it so that > it will parse documents which are either: > > 1) XTM 1; > 2) a subset of (?) XTM 2; or > 3) some kind of hybrid of XTM 1 and 2; > > Dealing with those in order: > > 1) The XTM 1 support is currently broken in HEAD because of a change > to the handling of xtm:member/xtm:topicRef (there's even a comment in > the code to this effect, saying that it causes problems but "is > correct" - something I don't understand at all). Well, that's a bug. Maybe there is a testcase available? > > 2) I don't understand how you are planning to finish this; how will > you handle the other discrepancies between the 2 data model? How would > you handle typed names for example? > > 3) I don't think is desirable at all. IMHO the parser/builder would be > much improved by adding validation to ensure that the markup conformed > to the correct schema. Allowing for a mixture of the 2 is a step away > from that. > > In any case I don't think it's a good idea to mix up the 2 markup > languages (XTM 1 and 2) in the same parser. In my opinion you should > create a distinct XTM2Parser, and it should use an XTM2Builder to > build a TMDM graph. Yes, I agree one should build a nice XTM2 parser because the current parser actually parses a subset of the union of the XTM1 and XTM2 languages. However, building a separate XTM2 parser is a considerable effort, and the current XTM2 support is a hack (that is, it works for what it was needed, it was written relatively quickly using existing infrastructure, but it also allows more than should be allowed). Allowing more than it should is not really a problem, as there are external XML validators available which can reject everything which is a real superset of either XTM1 or XTM2. Breaking correct code is a problem, though (but one which can be solved). If someone finds time to write a clean XTM2 parser - great. But XTM2 is not that far from XTM1 away, so for supporting typed names, a XTM1+2 parser could peek and poke through the wrapper for this special case. That is not best design, however, given that XTM1 and XTM2 are fixed standards and there is only a small list of differences between them, it is probably considerably more economic to make a current XTM1 parser XTM2 aware than to build a new XTM2 parser from scratch. > At the moment we have a parser/builder which does not fully handle > either XTM 1 or 2, which is regrettable. > >> Additionally, I considered (and still consider) this API "public, but >> extendable" (although I do not intend any further changes). >> > > Well - my perspective is different. My view is that the old interface > is adequate for XTM 1 and should not in general be changed (certainly > not without consensus). Just as the XTM 1 standard still exists and > remains stable these last several years, so the interface should > remain stable as well. Naturally XTM 2 is entitled to exist and to > formally "supercede" XTM 1, but it does not actually modify XTM 1, and > neither, IMHO, should TMDM support in TM4J2 require changes to the > XTM1 parts of TM4J. It should not, but TM4J1 (and the Java language itself, too) are not that modular that TM4J2 support can just be "merged in" like you can "merge in" a topic map. TMDM support in TM4J2 is not intended to be a wholly separate TM engine, it should leverage all the existing TM4J1 applications, else I would have made the TMDM support in TM4J2 being a separate open source project right from the start. So if two systems are to connect, I think it is reasonable to build bridges at both ends (i.e. XTM1 parts as one end and and the TMDM support as the other end) where it makes sense most. > >> But even >> though I do know now that there is external software implementing the >> API, I still think an approach like it is done with the Linux kernel is >> appropriate: >> >> 1. What is in the kernel gets refactored appropriately by those who >> do the refacto ring. >> > yes >> 2. What is outside but implements an inside interface has to be >> updated by the outside maintainers in order to mirror the kernel >> develop ment. >> > yes >> 3. Development freeze (=stopping changing APIs) over extended periods >> of time is unaccept able. >> > Here I disagree. I believe that certain APIs should indeed be frozen > over extended periods. > > These are the APIs that correspond closely to e.g. the standard markup > languages and data models. > > At the very least, when these public APIs are changed they MUST be > announced. I agree. As I said, I had a closed-world assumption (I expected that the only recipients of such an announcement would be the 3 existing backends, and in this case it was easier and faster to just change them). >> 4. Who wants to be really conservative and not go the way of being >> up-to-date (both regarding breaking code and regarding reaping the >> fruits of the development) may stick with an old ver sion. >> > I believe that being conservative has its place. I am not at all > opposed to adding new APIs, but changing existing APIs should not to > be done lightly IMHO. > > There is nothing stopping you from adding new APIs, and adding XTM 2 > support - the only problem is that you have disrupted the XTM 1 > support in the process. I am currently working from an old revision > becaue of bugs in XTM 1 support in the current version. This is why > agreement of the other committers to adding XTM 2 support was > conditional on that change being made in a branch, and kept separate > from the XTM 1-related code. Particularly for people who are running > TM4J in production, it's important to be able to have the ability to > fix bugs, while otherwise retaining stability. Our project at NZETC is > heavily reliant on XTM 1, with little or no prospect of adopting XTM > 2. For us, XTM 2 development is significantly less important (at least > in the short term) than retaining stability of the XTM 1-compatible > code base. > > So this is why I believe we must now create a branch (based on a > revision prior to the introduction of XTM 2 code). We can call the > branch "TM4J_1" and we can keep all XTM 2 features out of it. Now, for what it is worth, I have created such a branch long ago, and I have called it "TM4J_1_x", very similar to what you propose. However, I created this branch deliberately _after_ introducing the XTM2 reading code (which, however, only reads XTM2 files into the XTM1 data model), because I felt it would benefit TM4J1 users by allowing them to remain longer on TM4J1 before switching to TM4J2 while the rest of the world starts emitting XTM2 documents, similar to the .odt support in OpenOffice 1.1.5 (.odt is the file format of OpenOffice 2). Of course, I was not aware of that XTM1 topicRef bug. Do you think if I fix this bug (both in the trunk and in the "TM4J_1_x" branch), then this "TM4J_1_x" branch reflects your intended "TM4J_1" branch enough? If not, please feel free to create another branch before the introduction of XTM2 reading code. > This branch can then be developed until the release of TM4J 1.0. In > the meantime, Xuan, you can continue your work on XTM 2 in the trunk. :-) > >> Additionally, specific to TM4J, there is low to virtually no activity on >> the project in the last 12 months (at least when I exclude commits by >> myself), so even for major (API) changes, there is not really any reply >> expected. Thus, it is quite unreasonable to ask and wait for no reply. >> Thus, I think, if there is any progress to be expected at all, the >> burden of disagreeing (like writing e-mails and debating pros and cons) >> should be on those being otherwise passive (i.e. by default every change >> is allowed unless challenged, not by default every change is denied, >> unless allowed). >> > I disagree strongly with this. Disruptive changes to the code should > be discussed in public, or at least proposed publicly, so that they > CAN be discussed. I think there are great advantages to following such > a collaborative process. > > Anyway ... consider your revision "challenged". I am not just going to > revert your changes but I do expect that we attempt to reach to a > consensus now. > > I want to come to an arrangement in which the TM4J_1 branch is > established purely for XTM 1.0, and the trunk is used for developing > XTM 2.0 support in a way which does not mix up XTM 1 and XTM 2. As I > explained earlier, my primary interest is in XTM 1.0 (because of my > other software which produces XTM 1), and hence I don't think I'll be > able to do much on the XTM 2 branch, but all the same I still have an > opinion on how it should be done, and I don't want to be left out of > the loop :-) I think, with your branching suggestion, we are well at such an agreement (or at least pretty near to it, as I still think that it sometimes makes sense to change the TM4J1-legacy within TM4J2 to integrate with the TMDM backend). What do you think? (It is a little bit a shame that the code is not (and presumably never has been) in the state where all the testcases succeeded. If so, we could switch to test-driven development where nearly every change is okay unless a testcase fails.) > >> I'd like to ask you to make your API implementation publicly writable >> (e.g. import it into the TM4J project under some open source licenses), >> then bad surprises like a need to adapt to new versions can be avoided. >> > > Yeah - I will be contributing the sqlprovider code when it is > substantially more complete. At present it is still missing quite a > few bits - a number of TopicMapUtils methods, and most of the indexing > interface. It's probably still a couple of months off, at least. Maybe you want to share it anyways, even if it is not complete. :-) This is okay in the CVS HEAD (not in releases, though), and it would give other people the opportunity to do some parts of the work needed. > > -- > Conal Tuohy > New Zealand Electronic Text Centre > www.nzetc.org <http://www.nzetc.org> > > > > ------------------------------------------------------------------------ > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > ------------------------------------------------------------------------ > > _______________________________________________ > Tm4j-developers mailing list > Tm4...@li... > https://lists.sourceforge.net/lists/listinfo/tm4j-developers > |
From: Xuân B. <xua...@ba...> - 2008-04-05 10:59:58
|
Conal Tuohy wrote: > On Tue, 2008-03-11 at 22:36 +1300, Xuân Baldauf wrote: > >> Conal Tuohy wrote: >> >>> On Tue, 2008-03-11 at 16:39 +1300, Xuân Baldauf wrote: >>> >>> >>> >>>> If I remember correctly, I think >>>> http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/TopicMapObjectImpl.java?revision=1.5&amp;view=markup#l_126 is the reason for equalsByID(). This provider is a bridge between XML Topic Maps (in the style of TM4J 0.9.x) and TMDM (in the style of TM4J2). TMDM in general, and the TM4J2-implementation of TMDM in particular, does not require a TopicMapObject to have exactly one ID. Consequently, getID() is not defined, strictly speaking. If getID() is called anyways, some compatibility magic auto-creates an ID if no ID cannot be found or somehow inferred. However, this is a kludge (as a read-only method has a state-changing behaviour), and should be avoided, especially if the only purpose of calling getID() is comparing whether two TopicMapObjects are equal in some sens >>>> e (which >>>> should be read-only-process in turn). You do not want to have an OutOfMemoryError just because you called getID(). >>>> >>>> >>> Why not just use the equals() method then, Xuan? >>> >>> >> Personally, I do not care about the name. However, care should be >> taken because we have multiple notions of equality in topic maps and >> in TM4J (like: Are two Topic objects, which are merged, equal? Are two >> Topic objects, which are not yet merged, but should be merged, >> equal?). >> >>> Isn't that the purpose of equalsByID()? >>> >>> >> Maybe, maybe not. The purpose of equalsByID() is "If I would take the >> IDs of the two topic map items involved and compare them, would that >> comparison yield true?". >> > > What does "equals by ID" mean in the case of TMDM topic maps in which > objects may have multiple ids? Would the comparison require that the two > topic map items had at least one item identifier in common? > I have been digging into this (again). When looking at the call sites where equalsByID() is used apart from being used internally to a backend, there is only http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/TopicMapWalker.java?r1=1.24&r2=1.25 . In this case, the walk() method first walks over all topic objects and then over all association objects. Over all topic objects? No. It does not walk over topic objects which are in "slave" mode when being merged. That needs a little bit more elaboration. In TM4J1, topic objects just exist initially. However, when two topic objects are merged, they still exist independently as two distinct Java objects. However (unless specified explicitly by the user), after merging, the two topic objects are somehow connected internally and behave like one merged topic. However, in one respect they cannot behave like one merged topic, and this is regarding the pointers to them, the pointers are still not equal. So if somebody is encountering both pointers (i.e. by walking over the list of topic objects of a topic map object or by other means), the topic will be walked over twice, which should not happen, because two topics are merged into one. Now, in TM4J1, the "somehow connected internally" between two or more topic objects is implemented by electing one master topic object, all the other ones being slave topic objects. For example, in the memory backend, the slave topic objects have the "m_baseTopic" field set to point to the master topic object, while the master topic object has the "m_mergedTopics" field set to an ArrayList of slave topic objects. The topic objects in the slave mode and topic objects in the master mode (should) behave identical, however, apparently, they differ with respect to the method getID(). Thus, the expression "t.getID().equals(t.getBaseTopic().getID())" (which was scattered all over TM4J1 before I changed this a little bit) evaluates only to "true" if "t" is actually the master mode topic object of a merged topic. (Topic objects which were not merged with any other topic object are in master mode by default.) Because there is only one master mode topic object per merged topic, using this expression, it was tried to ensure that all topics, being merged or not, are walked over only once. What the author of this expression presumably meant, is "is this topic object in master mode?", and thus the author should have refactored all occurrences of this expression to "t.isInMasterMode()" (or "t.isBaseTopic()" when retaining the nomenclature). However, he did not, and refactoring "t.getID().equals(t.getBaseTopic().getID())" into "t.equalsByID(t.getBaseTopic())" was one step into this direction. For the way TMDM topic maps are implemented in TM4J2, there is no notion at all of a master mode or a slave mode a topic object can be in. In the TMDM implementation, topic objects are either always in slave mode (then they are instance of class BasicTopic, see http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/basic/ ), or they are always in master mode (then they are currently instance of class MergedTopic, see http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/merged/ ). There is a distinction of where you apply the changes (in BasicTopic) and where you see the consequences (in MergedTopic). Why this is so, is a different, well, topic (but I'm happy to explain that). For what is needed here, both BasicTopic and MergedTopic are wrapped by org.tm4j.topicmap.tmdm.tm4j1.TopicImpl to an TM4J1-compatible interface, and the method tm.getTopicsIterator() called by method TopicMapWalker.walk() always yields (in the TMDM wrapping layer case) a list of these TopicImpl objects, where merged topic objects do not appear twice. Thus, "t.equalsByID(t.getBaseTopic())" needs to yield "true" in every case when it is called at this call site, and, indeed, it does, as TopicImpl.getBaseTopic() always returns "this" (see http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/TopicImpl.java?revision=1.10&view=markup#l_553 ). > >> Whether this definition happens to be equivalent or not to other >> definitions of equality: I do not know, and this is out of the scope >> of equalsByID(), and this is because I chose "equalsByID()" as name >> and not "equals()". >> > > OK. > > I think it would be good to actually determine the existing semantics of > equals() (in the existing providers), and if this is compatible with the > requirements of equalsByID() (or can easily be made so), to use the name > "equals" rather than "equalsByID". > Having the analysis of above in mind, I'd suggest doing the refactoring into the direction of Topic.isInMasterMode() (or Topic.isBaseTopic() respectively). Then we can get rid of equalsByID() in the interface. (However, we can go even further by defining TopicMap.getMasterModeTopicsIterator(), that is, this method's contract guarantees that no slave topics (i.e. "double topics") are returned. This is what the TMDM wrapper backend can provide (and other backends can provide, too, by using the legacy method TopicMap.getTopicsIterator() and filtering out any slave topic object), while Topic.isInMasterMode() is not meaningful at all for the TMDM wrapper backend.) > >>> Or is there some difference in semantics between equals() and >>> equalsByID()? >>> >>> >> Depends on the intended semantics of equals() and, as they do not seem >> to be defined currently (like equals() is not overriden), a difference >> between equals() and equalsByID() is also not defined. >> >> ciao, >> Xuân. >> >> P.S.: If you happen to be in Auckland in the next months, then just >> drop me a note, then we can meet in person. :-) >> > > I don't get up there much, but you never know; I might well be. When are > you planning to be in Auckland? > From between now and the next 3 month (I'm already there). ciao, Xuân. |
From: Conal T. <con...@vu...> - 2008-04-15 02:45:37
|
On Sat, 2008-04-05 at 23:58 +1300, Xuân Baldauf wrote: > What the author of this expression presumably meant, is "is this topic > object in master mode?", and thus the author should have refactored > all occurrences of this expression to "t.isInMasterMode()" (or > "t.isBaseTopic()" when retaining the nomenclature). However, he did > not, and refactoring "t.getID().equals(t.getBaseTopic().getID())" into > "t.equalsByID(t.getBaseTopic())" was one step into this direction. I understand your analysis, but I don't see why, on the basis of that analysis, you did not just refactor it to "t.isBaseTopic()"? Or did you make this analysis after the earlier refactoring to use "t.equalsByID(t.getBaseTopic())"? -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |