From: Conal T. <con...@vu...> - 2008-03-10 05:23:43
|
I have an old topicmap provider implementation which no longer links against recent builds of TM4j because of the addition of the equalsByID method to the TopicMapObject interface. It is certainly an easy method to implement, but I'm concerned because I don't even understand why the method has been added to TopicMapObject. It appears it may have been a "rough idea" that never really developed? In any case, it seems like a needless complication to me, and I am inclined to remove it. First I thought I would ask about it here. The point of the method appears to be to hide the data type of the object's id, presumably so that it could potentially be something other than a String. But this seems pointless to me, since actually these id properties are constrained to be strings (actually, XML names) by the XML Topic Maps specification. There's no way they could be anything other than strings! When I looked at each provider's implementation of the TopicMapObject interface in the TM4j source tree, they were all exactly the same: public boolean equalsByID(TopicMapObject o) { return getID().equals(o.getID()); } That fact in itself is a good indicator that the method doesn't properly belong in the TopicMapProvider interface. If there is no need for different providers to implement the method differently (and indeed the JavaDoc for the interface actually specifies the implementation!), then it belongs in a helper class, or as a protected helper method, but not as a public API. Each of the tm4j topic map providers had an implementation of TopicMapObject which included this method. These methods were used in at most a few places within that provider's package (e.g. org.tm4j.topicmap.hibernate.TopicImpl calls the method as a helper within isOfType(Topic)). In all of those cases the method could just as well have been protected - it didn't need to be part of the public API. Even if it were just a protected method of TopicMapObjectImpl (and not required by the TopicMapObject interface) it would seem hardly justifiable because it is just a single line of code, and used only a few times (not used at all in the unified provider!). Incidentally, I'm not sure if the implementation in the "unified" provider is correct - it's subtly inconsistent with equals(). The only use I could see of the method outside of provider classes was in org.tm4j.topicmap.utils.TopicMapWalker, where it was used just once, and could I think have just as well used equals() or at least getID().equals(o.getID()) So I propose to remove the equalsByID method from the interface, and replace its use in TopicMapWalker with equivalent code comparing id values. Any objections? Con -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |
From: Conal T. <con...@vu...> - 2008-03-10 05:55:28
|
I have an old topicmap provider implementation which no longer links against recent builds of TM4j because of the addition of the equalsByID method to the TopicMapObject interface. It is certainly an easy method to implement, but I'm concerned because I don't even understand why the method has been added to TopicMapObject. It appears it may have been a "rough idea" that never really developed? In any case, it seems like a needless complication to me, and I am inclined to remove it. First I thought I would ask about it here. The point of the method appears to be to hide the data type of the object's id, presumably so that it could potentially be something other than a String. But this seems pointless to me, since actually these id properties are constrained to be strings (actually, XML names) by the XML Topic Maps specification. There's no way they could be anything other than strings! When I looked at each provider's implementation of the TopicMapObject interface in the TM4j source tree, they were all exactly the same: public boolean equalsByID(TopicMapObject o) { return getID().equals(o.getID()); } That fact in itself is a good indicator that the method doesn't properly belong in the TopicMapProvider interface. If there is no need for different providers to implement the method differently (and indeed the JavaDoc for the interface actually specifies the implementation!), then it belongs in a helper class, or as a protected helper method, but not as a public API. Each of the tm4j topic map providers had an implementation of TopicMapObject which included this method. These methods were used in at most a few places within that provider's package (e.g. org.tm4j.topicmap.hibernate.TopicImpl calls the method as a helper within isOfType(Topic)). In all of those cases the method could just as well have been protected - it didn't need to be part of the public API. Even if it were just a protected method of TopicMapObjectImpl (and not required by the TopicMapObject interface) it would seem hardly justifiable because it is just a single line of code, and used only a few times (not used at all in the unified provider!). Incidentally, I'm not sure if the implementation in the "unified" provider is correct - it's subtly inconsistent with equals(). The only use I could see of the method outside of provider classes was in org.tm4j.topicmap.utils.TopicMapWalker, where it was used just once, and could I think have just as well used equals() or at least getID().equals(o.getID()) So I propose to remove the equalsByID method from the interface, and replace its use in TopicMapWalker with equivalent code comparing id values. Any objections? Con -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |
From: Xuân B. <xua...@ba...> - 2008-03-11 03:40:42
|
Hi Conal, nice to read you. :-) Conal Tuohy wrote: > I have an old topicmap provider implementation which no longer links > against recent builds of TM4j because of the addition of the > equalsByID method to the TopicMapObject interface. > > It is certainly an easy method to implement, but I'm concerned because > I don't even understand /why/ the method has been added to > TopicMapObject. It appears it may have been a "rough idea" that never > really developed? In any case, it seems like a needless complication > to me, and I am inclined to remove it. First I thought I would ask > about it here. > > The point of the method *appears* to be to hide the data type of the > object's id, presumably so that it could potentially be something > other than a String. But this seems pointless to me, since actually > these id properties are constrained to be strings (actually, XML > names) by the XML Topic Maps specification. There's no way they could > be anything other than strings! > > When I looked at each provider's implementation of the TopicMapObject > interface in the TM4j source tree, they were all exactly the same: > > public boolean equalsByID(TopicMapObject o) { > return getID().equals(o.getID()); > } > > That fact in itself is a good indicator that the method doesn't > properly belong in the TopicMapProvider interface. If there is no need > for different providers to implement the method differently (and > indeed the JavaDoc for the interface actually /specifies/ the > implementation!), then it belongs in a helper class, or as a protected > helper method, but not as a public API. > > Each of the tm4j topic map providers had an implementation of > TopicMapObject which included this method. These methods were used in > at most a few places within that provider's package (e.g. > org.tm4j.topicmap.hibernate.TopicImpl calls the method as a helper > within isOfType(Topic)). In all of those cases the method could just > as well have been protected - it didn't need to be part of the public API. > > Even if it were just a protected method of TopicMapObjectImpl (and not > required by the TopicMapObject interface) it would seem hardly > justifiable because it is just a single line of code, and used only a > few times (not used at all in the unified provider!). > > Incidentally, I'm not sure if the implementation in the "unified" > provider is correct - it's subtly inconsistent with equals(). > > The only use I could see of the method outside of provider classes was > in org.tm4j.topicmap.utils.TopicMapWalker, where it was used just > once, and could I think have just as well used equals() or at least > getID().equals(o.getID()) > > So I propose to remove the equalsByID method from the interface, and > replace its use in TopicMapWalker with equivalent code comparing id > values. > > Any objections? If I remember correctly, I think http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/TopicMapObjectImpl.java?revision=1.5&view=markup#l_126 is the reason for equalsByID(). This provider is a bridge between XML Topic Maps (in the style of TM4J 0.9.x) and TMDM (in the style of TM4J2). TMDM in general, and the TM4J2-implementation of TMDM in particular, does not require a TopicMapObject to have exactly one ID. Consequently, getID() is not defined, strictly speaking. If getID() is called anyways, some compatibility magic auto-creates an ID if no ID cannot be found or somehow inferred. However, this is a kludge (as a read-only method has a state-changing behaviour), and should be avoided, especially if the only purpose of calling getID() is comparing whether two TopicMapObjects are equal in some sense (which should be read-only-process in turn). You do not want to have an OutOfMemoryError just because you called getID(). Thus, equalsByID() avoids all the performance- and other penalties involved with calling getID(). I do not want to make any changes to the TM4J 0.9.x public API (because of exactly the reasons of incompatibilities on the side of the users or implementors of the API). However, because providing compatibility with old users of the old API is the very reason for the existence of the bridge between XML Topic Maps and TMDM, and because there are much more users than implementors for that API, I think that some sacrifice on the part of the implementors of the API (like implementing a new method in 3 lines) is justified in this case. I also did not know that your provider implementation existed. If I did (and I had write-access to your implementation), I would have changed yours, too, as I did with the other implementations I've been aware of. It would also be nice if there was something like an AbstractTopicMapObject which could be mixed into each of the providers implementation such that such a change would only be needed to be done once. However, Java does not support mixins or multiple inheritance, and thus, new methods in public interfaces cannot have a default implementation in every case. However, this should not prevent us from providing a decent implementation, it just means a little bit more pain and more work on each implementation when refactoring becomes necessary, including sudden compile errors when some provider, which should have been covered by refactoring, actually has been left out. So, how do you think now regarding this issue? > > Con ciao, Xuân. :-) |
From: Conal T. <con...@vu...> - 2008-03-11 04:06:21
|
Hi Xuân I'm not convinced, actually. It seems to me you should have introduced a new, TMDM-flavoured TopicMapObject interface, rather than changing the old, XTM 1.0-flavoured interface. As you yourself pointed out, the getID() method isn't appropriate for TMDM-based topicmaps, so a distinct interface for TMDM TopicMapObjects is probably better (rather than an extended one, with "deprecated" bits). Personally, I think your effort to unify the XTM 1 and TMDM models is too ambitious (although I don't know how you intend to achieve this goal, because you haven't described your plan). It seems to me though that you should instead have two distinct models, and to use a layering approach to unify them. The two models are IMHO not sufficiently compatible to be simply "merged" (as you have started to do). Mainly, though, I just think it's important to fully discuss these kinds of disruptive changes before instituting them, or else they should be done in a branch. The trunk is not the proper place for this kind of speculative work. Regards Con -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |
From: Conal T. <con...@vu...> - 2008-03-11 04:16:03
|
On Tue, 2008-03-11 at 16:39 +1300, Xuân Baldauf wrote: > If I remember correctly, I think > http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/TopicMapObjectImpl.java?revision=1.5&view=markup#l_126 is the reason for equalsByID(). This provider is a bridge between XML Topic Maps (in the style of TM4J 0.9.x) and TMDM (in the style of TM4J2). TMDM in general, and the TM4J2-implementation of TMDM in particular, does not require a TopicMapObject to have exactly one ID. Consequently, getID() is not defined, strictly speaking. If getID() is called anyways, some compatibility magic auto-creates an ID if no ID cannot be found or somehow inferred. However, this is a kludge (as a read-only method has a state-changing behaviour), and should be avoided, especially if the only purpose of calling getID() is comparing whether two TopicMapObjects are equal in some sense (which should be read-only-process in turn). You do not want to have an OutOfMemoryError just because you called getID(). Why not just use the equals() method then, Xuan? Isn't that the purpose of equalsByID()? Or is there some difference in semantics between equals() and equalsByID()? -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |
From: Conal T. <con...@vu...> - 2008-03-13 02:20:14
|
Hi Xuan I can see that you have defined a new set of interfaces for the new data model. The point I'm trying to make is that I think you should have done that WITHOUT ALSO making (any) change to the existing interfaces for the old data model. You say you've only made this one change, and it's true it's not in itself a big change. I would be quite happy for this method to exist (though I'd prefer it to be called equals() as I said earlier), but more importantly, the fact that you added the method at all is indicative of a design approach you have taken which I think is mistaken, and which has actually caused me some problems. You agreed with me that the data models are incompatible, yet your change to the TopicMapObject interface is precisely for the purpose of establishing (limited) compatibility. This still doesn't make sense to me: it seems to me that if the 2 models are really incompatible, then there is no need at all to change the old interface. So I think there's a contradiction in what you have said. It appears to me that the change to the interface was to make it convenient to use common code to process both XTM 1 and XTM 2. Specifically, I'm referring to the revisions to org.tm4j.topicmap.utils in which you introduced some XTM 2 support: http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/XTMParser.java?r1=1.19&r2=1.20 http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/XTMBuilder.java?r1=1.72&r2=1.73 These classes used to parse XTM 1 and build an XTM1-flavoured topic map graph. If I understand it correctly, it still builds a topic map conforming to the old model (i.e. not the TMDM), but you have changed it so that it will parse documents which are either: 1) XTM 1; 2) a subset of (?) XTM 2; or 3) some kind of hybrid of XTM 1 and 2; Dealing with those in order: 1) The XTM 1 support is currently broken in HEAD because of a change to the handling of xtm:member/xtm:topicRef (there's even a comment in the code to this effect, saying that it causes problems but "is correct" - something I don't understand at all). 2) I don't understand how you are planning to finish this; how will you handle the other discrepancies between the 2 data model? How would you handle typed names for example? 3) I don't think is desirable at all. IMHO the parser/builder would be much improved by adding validation to ensure that the markup conformed to the correct schema. Allowing for a mixture of the 2 is a step away from that. In any case I don't think it's a good idea to mix up the 2 markup languages (XTM 1 and 2) in the same parser. In my opinion you should create a distinct XTM2Parser, and it should use an XTM2Builder to build a TMDM graph. At the moment we have a parser/builder which does not fully handle either XTM 1 or 2, which is regrettable. > Additionally, I considered (and still consider) this API "public, but > extendable" (although I do not intend any further changes). Well - my perspective is different. My view is that the old interface is adequate for XTM 1 and should not in general be changed (certainly not without consensus). Just as the XTM 1 standard still exists and remains stable these last several years, so the interface should remain stable as well. Naturally XTM 2 is entitled to exist and to formally "supercede" XTM 1, but it does not actually modify XTM 1, and neither, IMHO, should TMDM support in TM4J2 require changes to the XTM1 parts of TM4J. > But even > though I do know now that there is external software implementing the > API, I still think an approach like it is done with the Linux kernel is > appropriate: > > 1. What is in the kernel gets refactored appropriately by those who > do the refactoring. yes > 2. What is outside but implements an inside interface has to be > updated by the outside maintainers in order to mirror the kernel > development. yes > 3. Development freeze (=stopping changing APIs) over extended periods > of time is unacceptable. Here I disagree. I believe that certain APIs should indeed be frozen over extended periods. These are the APIs that correspond closely to e.g. the standard markup languages and data models. At the very least, when these public APIs are changed they MUST be announced. > 4. Who wants to be really conservative and not go the way of being > up-to-date (both regarding breaking code and regarding reaping the > fruits of the development) may stick with an old version. I believe that being conservative has its place. I am not at all opposed to adding new APIs, but changing existing APIs should not to be done lightly IMHO. There is nothing stopping you from adding new APIs, and adding XTM 2 support - the only problem is that you have disrupted the XTM 1 support in the process. I am currently working from an old revision becaue of bugs in XTM 1 support in the current version. This is why agreement of the other committers to adding XTM 2 support was conditional on that change being made in a branch, and kept separate from the XTM 1-related code. Particularly for people who are running TM4J in production, it's important to be able to have the ability to fix bugs, while otherwise retaining stability. Our project at NZETC is heavily reliant on XTM 1, with little or no prospect of adopting XTM 2. For us, XTM 2 development is significantly less important (at least in the short term) than retaining stability of the XTM 1-compatible code base. So this is why I believe we must now create a branch (based on a revision prior to the introduction of XTM 2 code). We can call the branch "TM4J_1" and we can keep all XTM 2 features out of it. This branch can then be developed until the release of TM4J 1.0. In the meantime, Xuan, you can continue your work on XTM 2 in the trunk. > Additionally, specific to TM4J, there is low to virtually no activity on > the project in the last 12 months (at least when I exclude commits by > myself), so even for major (API) changes, there is not really any reply > expected. Thus, it is quite unreasonable to ask and wait for no reply. > Thus, I think, if there is any progress to be expected at all, the > burden of disagreeing (like writing e-mails and debating pros and cons) > should be on those being otherwise passive (i.e. by default every change > is allowed unless challenged, not by default every change is denied, > unless allowed). I disagree strongly with this. Disruptive changes to the code should be discussed in public, or at least proposed publicly, so that they CAN be discussed. I think there are great advantages to following such a collaborative process. Anyway ... consider your revision "challenged". I am not just going to revert your changes but I do expect that we attempt to reach to a consensus now. I want to come to an arrangement in which the TM4J_1 branch is established purely for XTM 1.0, and the trunk is used for developing XTM 2.0 support in a way which does not mix up XTM 1 and XTM 2. As I explained earlier, my primary interest is in XTM 1.0 (because of my other software which produces XTM 1), and hence I don't think I'll be able to do much on the XTM 2 branch, but all the same I still have an opinion on how it should be done, and I don't want to be left out of the loop :-) > I'd like to ask you to make your API implementation publicly writable > (e.g. import it into the TM4J project under some open source licenses), > then bad surprises like a need to adapt to new versions can be avoided. Yeah - I will be contributing the sqlprovider code when it is substantially more complete. At present it is still missing quite a few bits - a number of TopicMapUtils methods, and most of the indexing interface. It's probably still a couple of months off, at least. -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |
From: Xuân B. <xua...@ba...> - 2008-03-11 08:58:37
|
Conal Tuohy wrote: > Hi Xuân > > I'm not convinced, actually. > > It seems to me you should have introduced a new, TMDM-flavoured > TopicMapObject interface, I did, see here: http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/TopicMapConstruct.java?revision=1.1&view=markup > rather than changing the old, XTM > 1.0-flavoured interface. > I did not change the old XTM 1.0-flavoured interface except for this one abstraction of equality. > As you yourself pointed out, the getID() method isn't appropriate for > TMDM-based topicmaps, so a distinct interface for TMDM TopicMapObjects > is probably better (rather than an extended one, with "deprecated" > bits). > There is, as org.tm4j.topicmap.tmdm.TopicMapConstruct.getItemIdentifiers() shows. > Personally, I think your effort to unify the XTM 1 and TMDM models is > too ambitious I don't unify into a greater model. > (although I don't know how you intend to achieve this > goal, because you haven't described your plan). It seems to me though > that you should instead have two distinct models, I do. > and to use a layering > approach to unify them. I do. http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/ is the translation layer. > The two models are IMHO not sufficiently > compatible to be simply "merged" I agree. > (as you have started to do). > I do not think so. > Mainly, though, I just think it's important to fully discuss these kinds > of disruptive changes before instituting them, or else they should be > done in a branch. The trunk is not the proper place for this kind of > speculative work. > The equalsByID() thing was done as fix to a specific problem, as far as I remember. It is not breaking any code using the API, and I was not aware that there was any code outside of TM4J actually implementing the API. Thus, it was reasonable to not expect any problems. Additionally, I considered (and still consider) this API "public, but extendable" (although I do not intend any further changes). But even though I do know now that there is external software implementing the API, I still think an approach like it is done with the Linux kernel is appropriate: 1. What is in the kernel gets refactored appropriately by those who do the refactoring. 2. What is outside but implements an inside interface has to be updated by the outside maintainers in order to mirror the kernel development. 3. Development freeze (=stopping changing APIs) over extended periods of time is unacceptable. 4. Who wants to be really conservative and not go the way of being up-to-date (both regarding breaking code and regarding reaping the fruits of the development) may stick with an old version. So, I consider a change in a project imposing a need of a change in an unknown external dependent project of 3 lines not necessarily disruptive. It is just normal that APIs evolve for certain needs (such as speed and reduction of side effects), and that external projects implementing these APIs need to implement new methods in these APIs if they want to mirror the development. Additionally, specific to TM4J, there is low to virtually no activity on the project in the last 12 months (at least when I exclude commits by myself), so even for major (API) changes, there is not really any reply expected. Thus, it is quite unreasonable to ask and wait for no reply. Thus, I think, if there is any progress to be expected at all, the burden of disagreeing (like writing e-mails and debating pros and cons) should be on those being otherwise passive (i.e. by default every change is allowed unless challenged, not by default every change is denied, unless allowed). I'd like to ask you to make your API implementation publicly writable (e.g. import it into the TM4J project under some open source licenses), then bad surprises like a need to adapt to new versions can be avoided. > Regards > > Con > > ciao, Xuân. |
From: Lars H. <he...@se...> - 2008-03-11 09:45:54
|
Hi Xuân, [...] > Additionally, specific to TM4J, there is low to virtually no activity on > the project in the last 12 months (at least when I exclude commits by > myself), so even for major (API) changes, there is not really any reply > expected. Thus, it is quite unreasonable to ask and wait for no reply. Leaving the technical perspective aside, I disagree. Pushing TM4J forward is good, but you should have tried to explain what you want to do with TM4J. I think this is the way how Open Source works. Maybe the new "tmdm" package is the holy grail, but you should communicate with the other developers and the user base. How should others commit something if they do not have a plan? There is almost no documentation and you introduced dependencies to your libs (where is the source?). Ignoring the user base and potentially new TM4J-developers is the wrong way to achieve pretended progress. Best regards, Lars -- http://www.semagia.com |
From: Xuân B. <xua...@ba...> - 2008-03-11 09:37:44
|
Conal Tuohy wrote: > On Tue, 2008-03-11 at 16:39 +1300, Xuân Baldauf wrote: > > >> If I remember correctly, I think >> http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/TopicMapObjectImpl.java?revision=1.5&view=markup#l_126 is the reason for equalsByID(). This provider is a bridge between XML Topic Maps (in the style of TM4J 0.9.x) and TMDM (in the style of TM4J2). TMDM in general, and the TM4J2-implementation of TMDM in particular, does not require a TopicMapObject to have exactly one ID. Consequently, getID() is not defined, strictly speaking. If getID() is called anyways, some compatibility magic auto-creates an ID if no ID cannot be found or somehow inferred. However, this is a kludge (as a read-only method has a state-changing behaviour), and should be avoided, especially if the only purpose of calling getID() is comparing whether two TopicMapObjects are equal in some sense (which should be read-only-process in turn). You do not want to have an OutOfMemoryError just because you called getID(). >> > > Why not just use the equals() method then, Xuan? > Personally, I do not care about the name. However, care should be taken because we have multiple notions of equality in topic maps and in TM4J (like: Are two Topic objects, which are merged, equal? Are two Topic objects, which are not yet merged, but should be merged, equal?). > Isn't that the purpose of equalsByID()? > Maybe, maybe not. The purpose of equalsByID() is "If I would take the IDs of the two topic map items involved and compare them, would that comparison yield true?". Whether this definition happens to be equivalent or not to other definitions of equality: I do not know, and this is out of the scope of equalsByID(), and this is because I chose "equalsByID()" as name and not "equals()". > Or is there some difference in semantics between equals() and > equalsByID()? > Depends on the intended semantics of equals() and, as they do not seem to be defined currently (like equals() is not overriden), a difference between equals() and equalsByID() is also not defined. ciao, Xuân. P.S.: If you happen to be in Auckland in the next months, then just drop me a note, then we can meet in person. :-) |
From: Conal T. <con...@vu...> - 2008-03-13 00:54:36
|
On Tue, 2008-03-11 at 22:36 +1300, Xuân Baldauf wrote: > Conal Tuohy wrote: > > On Tue, 2008-03-11 at 16:39 +1300, Xuân Baldauf wrote: > > > > > > > If I remember correctly, I think > > > http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/TopicMapObjectImpl.java?revision=1.5&amp;view=markup#l_126 is the reason for equalsByID(). This provider is a bridge between XML Topic Maps (in the style of TM4J 0.9.x) and TMDM (in the style of TM4J2). TMDM in general, and the TM4J2-implementation of TMDM in particular, does not require a TopicMapObject to have exactly one ID. Consequently, getID() is not defined, strictly speaking. If getID() is called anyways, some compatibility magic auto-creates an ID if no ID cannot be found or somehow inferred. However, this is a kludge (as a read-only method has a state-changing behaviour), and should be avoided, especially if the only purpose of calling getID() is comparing whether two TopicMapObjects are equal in some sense (which > > > should be read-only-process in turn). You do not want to have an OutOfMemoryError just because you called getID(). > > > > > > > Why not just use the equals() method then, Xuan? > > > Personally, I do not care about the name. However, care should be > taken because we have multiple notions of equality in topic maps and > in TM4J (like: Are two Topic objects, which are merged, equal? Are two > Topic objects, which are not yet merged, but should be merged, > equal?). > > Isn't that the purpose of equalsByID()? > > > Maybe, maybe not. The purpose of equalsByID() is "If I would take the > IDs of the two topic map items involved and compare them, would that > comparison yield true?". What does "equals by ID" mean in the case of TMDM topic maps in which objects may have multiple ids? Would the comparison require that the two topic map items had at least one item identifier in common? > Whether this definition happens to be equivalent or not to other > definitions of equality: I do not know, and this is out of the scope > of equalsByID(), and this is because I chose "equalsByID()" as name > and not "equals()". OK. I think it would be good to actually determine the existing semantics of equals() (in the existing providers), and if this is compatible with the requirements of equalsByID() (or can easily be made so), to use the name "equals" rather than "equalsByID". > > Or is there some difference in semantics between equals() and > > equalsByID()? > > > Depends on the intended semantics of equals() and, as they do not seem > to be defined currently (like equals() is not overriden), a difference > between equals() and equalsByID() is also not defined. > > ciao, > Xuân. > > P.S.: If you happen to be in Auckland in the next months, then just > drop me a note, then we can meet in person. :-) I don't get up there much, but you never know; I might well be. When are you planning to be in Auckland? -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |
From: Xuân B. <xua...@ba...> - 2008-04-05 10:59:58
|
Conal Tuohy wrote: > On Tue, 2008-03-11 at 22:36 +1300, Xuân Baldauf wrote: > >> Conal Tuohy wrote: >> >>> On Tue, 2008-03-11 at 16:39 +1300, Xuân Baldauf wrote: >>> >>> >>> >>>> If I remember correctly, I think >>>> http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/TopicMapObjectImpl.java?revision=1.5&amp;view=markup#l_126 is the reason for equalsByID(). This provider is a bridge between XML Topic Maps (in the style of TM4J 0.9.x) and TMDM (in the style of TM4J2). TMDM in general, and the TM4J2-implementation of TMDM in particular, does not require a TopicMapObject to have exactly one ID. Consequently, getID() is not defined, strictly speaking. If getID() is called anyways, some compatibility magic auto-creates an ID if no ID cannot be found or somehow inferred. However, this is a kludge (as a read-only method has a state-changing behaviour), and should be avoided, especially if the only purpose of calling getID() is comparing whether two TopicMapObjects are equal in some sens >>>> e (which >>>> should be read-only-process in turn). You do not want to have an OutOfMemoryError just because you called getID(). >>>> >>>> >>> Why not just use the equals() method then, Xuan? >>> >>> >> Personally, I do not care about the name. However, care should be >> taken because we have multiple notions of equality in topic maps and >> in TM4J (like: Are two Topic objects, which are merged, equal? Are two >> Topic objects, which are not yet merged, but should be merged, >> equal?). >> >>> Isn't that the purpose of equalsByID()? >>> >>> >> Maybe, maybe not. The purpose of equalsByID() is "If I would take the >> IDs of the two topic map items involved and compare them, would that >> comparison yield true?". >> > > What does "equals by ID" mean in the case of TMDM topic maps in which > objects may have multiple ids? Would the comparison require that the two > topic map items had at least one item identifier in common? > I have been digging into this (again). When looking at the call sites where equalsByID() is used apart from being used internally to a backend, there is only http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/TopicMapWalker.java?r1=1.24&r2=1.25 . In this case, the walk() method first walks over all topic objects and then over all association objects. Over all topic objects? No. It does not walk over topic objects which are in "slave" mode when being merged. That needs a little bit more elaboration. In TM4J1, topic objects just exist initially. However, when two topic objects are merged, they still exist independently as two distinct Java objects. However (unless specified explicitly by the user), after merging, the two topic objects are somehow connected internally and behave like one merged topic. However, in one respect they cannot behave like one merged topic, and this is regarding the pointers to them, the pointers are still not equal. So if somebody is encountering both pointers (i.e. by walking over the list of topic objects of a topic map object or by other means), the topic will be walked over twice, which should not happen, because two topics are merged into one. Now, in TM4J1, the "somehow connected internally" between two or more topic objects is implemented by electing one master topic object, all the other ones being slave topic objects. For example, in the memory backend, the slave topic objects have the "m_baseTopic" field set to point to the master topic object, while the master topic object has the "m_mergedTopics" field set to an ArrayList of slave topic objects. The topic objects in the slave mode and topic objects in the master mode (should) behave identical, however, apparently, they differ with respect to the method getID(). Thus, the expression "t.getID().equals(t.getBaseTopic().getID())" (which was scattered all over TM4J1 before I changed this a little bit) evaluates only to "true" if "t" is actually the master mode topic object of a merged topic. (Topic objects which were not merged with any other topic object are in master mode by default.) Because there is only one master mode topic object per merged topic, using this expression, it was tried to ensure that all topics, being merged or not, are walked over only once. What the author of this expression presumably meant, is "is this topic object in master mode?", and thus the author should have refactored all occurrences of this expression to "t.isInMasterMode()" (or "t.isBaseTopic()" when retaining the nomenclature). However, he did not, and refactoring "t.getID().equals(t.getBaseTopic().getID())" into "t.equalsByID(t.getBaseTopic())" was one step into this direction. For the way TMDM topic maps are implemented in TM4J2, there is no notion at all of a master mode or a slave mode a topic object can be in. In the TMDM implementation, topic objects are either always in slave mode (then they are instance of class BasicTopic, see http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/basic/ ), or they are always in master mode (then they are currently instance of class MergedTopic, see http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/merged/ ). There is a distinction of where you apply the changes (in BasicTopic) and where you see the consequences (in MergedTopic). Why this is so, is a different, well, topic (but I'm happy to explain that). For what is needed here, both BasicTopic and MergedTopic are wrapped by org.tm4j.topicmap.tmdm.tm4j1.TopicImpl to an TM4J1-compatible interface, and the method tm.getTopicsIterator() called by method TopicMapWalker.walk() always yields (in the TMDM wrapping layer case) a list of these TopicImpl objects, where merged topic objects do not appear twice. Thus, "t.equalsByID(t.getBaseTopic())" needs to yield "true" in every case when it is called at this call site, and, indeed, it does, as TopicImpl.getBaseTopic() always returns "this" (see http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/tmdm/tm4j1/TopicImpl.java?revision=1.10&view=markup#l_553 ). > >> Whether this definition happens to be equivalent or not to other >> definitions of equality: I do not know, and this is out of the scope >> of equalsByID(), and this is because I chose "equalsByID()" as name >> and not "equals()". >> > > OK. > > I think it would be good to actually determine the existing semantics of > equals() (in the existing providers), and if this is compatible with the > requirements of equalsByID() (or can easily be made so), to use the name > "equals" rather than "equalsByID". > Having the analysis of above in mind, I'd suggest doing the refactoring into the direction of Topic.isInMasterMode() (or Topic.isBaseTopic() respectively). Then we can get rid of equalsByID() in the interface. (However, we can go even further by defining TopicMap.getMasterModeTopicsIterator(), that is, this method's contract guarantees that no slave topics (i.e. "double topics") are returned. This is what the TMDM wrapper backend can provide (and other backends can provide, too, by using the legacy method TopicMap.getTopicsIterator() and filtering out any slave topic object), while Topic.isInMasterMode() is not meaningful at all for the TMDM wrapper backend.) > >>> Or is there some difference in semantics between equals() and >>> equalsByID()? >>> >>> >> Depends on the intended semantics of equals() and, as they do not seem >> to be defined currently (like equals() is not overriden), a difference >> between equals() and equalsByID() is also not defined. >> >> ciao, >> Xuân. >> >> P.S.: If you happen to be in Auckland in the next months, then just >> drop me a note, then we can meet in person. :-) >> > > I don't get up there much, but you never know; I might well be. When are > you planning to be in Auckland? > From between now and the next 3 month (I'm already there). ciao, Xuân. |
From: Conal T. <con...@vu...> - 2008-04-15 02:45:36
|
On Sat, 2008-04-05 at 23:58 +1300, Xuân Baldauf wrote: > What the author of this expression presumably meant, is "is this topic > object in master mode?", and thus the author should have refactored > all occurrences of this expression to "t.isInMasterMode()" (or > "t.isBaseTopic()" when retaining the nomenclature). However, he did > not, and refactoring "t.getID().equals(t.getBaseTopic().getID())" into > "t.equalsByID(t.getBaseTopic())" was one step into this direction. I understand your analysis, but I don't see why, on the basis of that analysis, you did not just refactor it to "t.isBaseTopic()"? Or did you make this analysis after the earlier refactoring to use "t.equalsByID(t.getBaseTopic())"? -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |
From: Xuân B. <xua...@ba...> - 2008-04-05 12:08:27
|
Conal Tuohy wrote: > Hi Xuan > > I can see that you have defined a new set of interfaces for the new > data model. The point I'm trying to make is that I think you should > have done that WITHOUT ALSO making (any) change to the existing > interfaces for the old data model. > > You say you've only made this one change, and it's true it's not in > itself a big change. I would be quite happy for this method to exist > (though I'd prefer it to be called equals() as I said earlier), but > more importantly, the fact that you added the method at all is > indicative of a design approach you have taken which I think is > mistaken, and which has actually caused me some problems. > > You agreed with me that the data models are incompatible, yet your > change to the TopicMapObject interface is precisely for the purpose of > establishing (limited) compatibility. This still doesn't make sense to > me: it seems to me that if the 2 models are really incompatible, then > there is no need at all to change the old interface. So I think > there's a contradiction in what you have said. > > It appears to me that the change to the interface was to make it > convenient to use common code to process both XTM 1 and XTM 2. If you mean the equalsByID() change, then, no, this change was independent of the XTM 2 reading support. > Specifically, I'm referring to the revisions to > org.tm4j.topicmap.utils in which you introduced some XTM 2 support: > > http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/XTMParser.java?r1=1.19&r2=1.20 > <http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/XTMParser.java?r1=1.19&r2=1.20> > http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/XTMBuilder.java?r1=1.72&r2=1.73 > <http://tm4j.cvs.sourceforge.net/tm4j/tm4j/src/org/tm4j/topicmap/utils/XTMBuilder.java?r1=1.72&r2=1.73> > > These classes used to parse XTM 1 and build an XTM1-flavoured topic > map graph. > > If I understand it correctly, it still builds a topic map conforming > to the old model (i.e. not the TMDM), but you have changed it so that > it will parse documents which are either: > > 1) XTM 1; > 2) a subset of (?) XTM 2; or > 3) some kind of hybrid of XTM 1 and 2; > > Dealing with those in order: > > 1) The XTM 1 support is currently broken in HEAD because of a change > to the handling of xtm:member/xtm:topicRef (there's even a comment in > the code to this effect, saying that it causes problems but "is > correct" - something I don't understand at all). Well, that's a bug. Maybe there is a testcase available? > > 2) I don't understand how you are planning to finish this; how will > you handle the other discrepancies between the 2 data model? How would > you handle typed names for example? > > 3) I don't think is desirable at all. IMHO the parser/builder would be > much improved by adding validation to ensure that the markup conformed > to the correct schema. Allowing for a mixture of the 2 is a step away > from that. > > In any case I don't think it's a good idea to mix up the 2 markup > languages (XTM 1 and 2) in the same parser. In my opinion you should > create a distinct XTM2Parser, and it should use an XTM2Builder to > build a TMDM graph. Yes, I agree one should build a nice XTM2 parser because the current parser actually parses a subset of the union of the XTM1 and XTM2 languages. However, building a separate XTM2 parser is a considerable effort, and the current XTM2 support is a hack (that is, it works for what it was needed, it was written relatively quickly using existing infrastructure, but it also allows more than should be allowed). Allowing more than it should is not really a problem, as there are external XML validators available which can reject everything which is a real superset of either XTM1 or XTM2. Breaking correct code is a problem, though (but one which can be solved). If someone finds time to write a clean XTM2 parser - great. But XTM2 is not that far from XTM1 away, so for supporting typed names, a XTM1+2 parser could peek and poke through the wrapper for this special case. That is not best design, however, given that XTM1 and XTM2 are fixed standards and there is only a small list of differences between them, it is probably considerably more economic to make a current XTM1 parser XTM2 aware than to build a new XTM2 parser from scratch. > At the moment we have a parser/builder which does not fully handle > either XTM 1 or 2, which is regrettable. > >> Additionally, I considered (and still consider) this API "public, but >> extendable" (although I do not intend any further changes). >> > > Well - my perspective is different. My view is that the old interface > is adequate for XTM 1 and should not in general be changed (certainly > not without consensus). Just as the XTM 1 standard still exists and > remains stable these last several years, so the interface should > remain stable as well. Naturally XTM 2 is entitled to exist and to > formally "supercede" XTM 1, but it does not actually modify XTM 1, and > neither, IMHO, should TMDM support in TM4J2 require changes to the > XTM1 parts of TM4J. It should not, but TM4J1 (and the Java language itself, too) are not that modular that TM4J2 support can just be "merged in" like you can "merge in" a topic map. TMDM support in TM4J2 is not intended to be a wholly separate TM engine, it should leverage all the existing TM4J1 applications, else I would have made the TMDM support in TM4J2 being a separate open source project right from the start. So if two systems are to connect, I think it is reasonable to build bridges at both ends (i.e. XTM1 parts as one end and and the TMDM support as the other end) where it makes sense most. > >> But even >> though I do know now that there is external software implementing the >> API, I still think an approach like it is done with the Linux kernel is >> appropriate: >> >> 1. What is in the kernel gets refactored appropriately by those who >> do the refacto ring. >> > yes >> 2. What is outside but implements an inside interface has to be >> updated by the outside maintainers in order to mirror the kernel >> develop ment. >> > yes >> 3. Development freeze (=stopping changing APIs) over extended periods >> of time is unaccept able. >> > Here I disagree. I believe that certain APIs should indeed be frozen > over extended periods. > > These are the APIs that correspond closely to e.g. the standard markup > languages and data models. > > At the very least, when these public APIs are changed they MUST be > announced. I agree. As I said, I had a closed-world assumption (I expected that the only recipients of such an announcement would be the 3 existing backends, and in this case it was easier and faster to just change them). >> 4. Who wants to be really conservative and not go the way of being >> up-to-date (both regarding breaking code and regarding reaping the >> fruits of the development) may stick with an old ver sion. >> > I believe that being conservative has its place. I am not at all > opposed to adding new APIs, but changing existing APIs should not to > be done lightly IMHO. > > There is nothing stopping you from adding new APIs, and adding XTM 2 > support - the only problem is that you have disrupted the XTM 1 > support in the process. I am currently working from an old revision > becaue of bugs in XTM 1 support in the current version. This is why > agreement of the other committers to adding XTM 2 support was > conditional on that change being made in a branch, and kept separate > from the XTM 1-related code. Particularly for people who are running > TM4J in production, it's important to be able to have the ability to > fix bugs, while otherwise retaining stability. Our project at NZETC is > heavily reliant on XTM 1, with little or no prospect of adopting XTM > 2. For us, XTM 2 development is significantly less important (at least > in the short term) than retaining stability of the XTM 1-compatible > code base. > > So this is why I believe we must now create a branch (based on a > revision prior to the introduction of XTM 2 code). We can call the > branch "TM4J_1" and we can keep all XTM 2 features out of it. Now, for what it is worth, I have created such a branch long ago, and I have called it "TM4J_1_x", very similar to what you propose. However, I created this branch deliberately _after_ introducing the XTM2 reading code (which, however, only reads XTM2 files into the XTM1 data model), because I felt it would benefit TM4J1 users by allowing them to remain longer on TM4J1 before switching to TM4J2 while the rest of the world starts emitting XTM2 documents, similar to the .odt support in OpenOffice 1.1.5 (.odt is the file format of OpenOffice 2). Of course, I was not aware of that XTM1 topicRef bug. Do you think if I fix this bug (both in the trunk and in the "TM4J_1_x" branch), then this "TM4J_1_x" branch reflects your intended "TM4J_1" branch enough? If not, please feel free to create another branch before the introduction of XTM2 reading code. > This branch can then be developed until the release of TM4J 1.0. In > the meantime, Xuan, you can continue your work on XTM 2 in the trunk. :-) > >> Additionally, specific to TM4J, there is low to virtually no activity on >> the project in the last 12 months (at least when I exclude commits by >> myself), so even for major (API) changes, there is not really any reply >> expected. Thus, it is quite unreasonable to ask and wait for no reply. >> Thus, I think, if there is any progress to be expected at all, the >> burden of disagreeing (like writing e-mails and debating pros and cons) >> should be on those being otherwise passive (i.e. by default every change >> is allowed unless challenged, not by default every change is denied, >> unless allowed). >> > I disagree strongly with this. Disruptive changes to the code should > be discussed in public, or at least proposed publicly, so that they > CAN be discussed. I think there are great advantages to following such > a collaborative process. > > Anyway ... consider your revision "challenged". I am not just going to > revert your changes but I do expect that we attempt to reach to a > consensus now. > > I want to come to an arrangement in which the TM4J_1 branch is > established purely for XTM 1.0, and the trunk is used for developing > XTM 2.0 support in a way which does not mix up XTM 1 and XTM 2. As I > explained earlier, my primary interest is in XTM 1.0 (because of my > other software which produces XTM 1), and hence I don't think I'll be > able to do much on the XTM 2 branch, but all the same I still have an > opinion on how it should be done, and I don't want to be left out of > the loop :-) I think, with your branching suggestion, we are well at such an agreement (or at least pretty near to it, as I still think that it sometimes makes sense to change the TM4J1-legacy within TM4J2 to integrate with the TMDM backend). What do you think? (It is a little bit a shame that the code is not (and presumably never has been) in the state where all the testcases succeeded. If so, we could switch to test-driven development where nearly every change is okay unless a testcase fails.) > >> I'd like to ask you to make your API implementation publicly writable >> (e.g. import it into the TM4J project under some open source licenses), >> then bad surprises like a need to adapt to new versions can be avoided. >> > > Yeah - I will be contributing the sqlprovider code when it is > substantially more complete. At present it is still missing quite a > few bits - a number of TopicMapUtils methods, and most of the indexing > interface. It's probably still a couple of months off, at least. Maybe you want to share it anyways, even if it is not complete. :-) This is okay in the CVS HEAD (not in releases, though), and it would give other people the opportunity to do some parts of the work needed. > > -- > Conal Tuohy > New Zealand Electronic Text Centre > www.nzetc.org <http://www.nzetc.org> > > > > ------------------------------------------------------------------------ > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > ------------------------------------------------------------------------ > > _______________________________________________ > Tm4j-developers mailing list > Tm4...@li... > https://lists.sourceforge.net/lists/listinfo/tm4j-developers > |
From: Conal T. <con...@vu...> - 2008-04-15 00:17:41
|
Back from TM2008! On Sun, 2008-04-06 at 01:07 +1300, Xuân Baldauf wrote: > Yes, I agree one should build a nice XTM2 parser because the current > parser actually parses a subset of the union of the XTM1 and XTM2 > languages. However, building a separate XTM2 parser is a considerable > effort, and the current XTM2 support is a hack (that is, it works for > what it was needed, it was written relatively quickly using existing > infrastructure, but it also allows more than should be allowed). > Allowing more than it should is not really a problem, as there are > external XML validators available which can reject everything which is > a real superset of either XTM1 or XTM2. I disagree ... I think allowing more than should be allowed IS a problem, which should be fixed. I think the parsers should in fact enforce their schemas, rather than leaving this entirely to a separate validation process. To me, a parser which accepts an invalid input is a *bug*, pure and simple. Therefore I think the "hack" should be rolled back. > If someone finds time to write a clean XTM2 parser - great. But XTM2 > is not that far from XTM1 away, so for supporting typed names, a XTM1 > +2 parser could peek and poke through the wrapper for this special > case. That is not best design, however, given that XTM1 and XTM2 are > fixed standards and there is only a small list of differences between > them, it is probably considerably more economic to make a current XTM1 > parser XTM2 aware than to build a new XTM2 parser from scratch. I disagree with this too. IMHO it would have been easier to copy and paste the XTM1 code and then convert it into XTM2. Then we'd have 2 distinct parsers/builders. > > So this is why I believe we must now create a branch (based on a > > revision prior to the introduction of XTM 2 code). We can call the > > branch "TM4J_1" and we can keep all XTM 2 features out of it. > Now, for what it is worth, I have created such a branch long ago, and > I have called it "TM4J_1_x", very similar to what you propose. > However, I created this branch deliberately _after_ introducing the > XTM2 reading code (which, however, only reads XTM2 files into the XTM1 > data model), because I felt it would benefit TM4J1 users by allowing > them to remain longer on TM4J1 before switching to TM4J2 while the > rest of the world starts emitting XTM2 documents, similar to the .odt > support in OpenOffice 1.1.5 (.odt is the file format of OpenOffice 2). > Of course, I was not aware of that XTM1 topicRef bug. Do you think if > I fix this bug (both in the trunk and in the "TM4J_1_x" branch), then > this "TM4J_1_x" branch reflects your intended "TM4J_1" branch enough? No I don't. As I said earlier, I do not at all like the mixture of XTM1 and 2 in the same parser/builder; I appreciate that you intended it as an aid for XTM1 users to migrate to XTM2, but I have two problems with it: 1) I think that relaxing the strictness of the parser is not an improvement, and in general I don't like the approach of "mixing" the two models. 2) In any case I don't think this would be a good way for an XTM1-based project to read XTM2 topic maps. Better would be to use XSLT to define a transformation to XTM1; this would also allow such a project to define how they want to handle such things as typed names, data types, etc, in XTM1 (either by discarding them or by implementing them in some way on top of the XTM1 model). > If not, please feel free to create another branch before the > introduction of XTM2 reading code. OK then; since I think the TM4J 1 branch should not include ANY XTM2 support, I will go ahead and roll back the TM4J_1_x branch to exclude those changes. > > I want to come to an arrangement in which the TM4J_1 branch is > > established purely for XTM 1.0, and the trunk is used for developing > > XTM 2.0 support in a way which does not mix up XTM 1 and XTM 2. As I > > explained earlier, my primary interest is in XTM 1.0 (because of my > > other software which produces XTM 1), and hence I don't think I'll > > be able to do much on the XTM 2 branch, but all the same I still > > have an opinion on how it should be done, and I don't want to be > > left out of the loop :-) > I think, with your branching suggestion, we are well at such an > agreement (or at least pretty near to it, as I still think that it > sometimes makes sense to change the TM4J1-legacy within TM4J2 to > integrate with the TMDM backend). What do you think? Well ... I'm not sure what you mean by "change the TM4J1-legacy within TM4J2 to integrate with the TMDM backend". I agree that integration between the XTM1 and XTM2 versions could be useful, but I'm strongly opposed to making any changes to the XTM1 legacy which change the behaviour or compliance or interfaces of client code, and for architectural reasons I don't believe the TM4J core should have any dependency on XTM2 or the TMDM. It seems to me it could well be useful to implement a TM4J1 TopicMapProvider which is an adaptor of (or wrapper around) a TMDM provider, and which hides or in some way re-presents the portions of the TMDM which don't exist in XTM1. Perhaps the reverse could also be useful (i.e. a TMDM provider which is an adaptor of a TM4J1 provider). But let me stress that I think it's essential that the integration be done in a strictly layered fashion, which means that the TM4J1 core (including the parser etc) should not include ANY code which has ANYTHING to do with XTM2 or the TMDM, and similarly, the TM4J2 core should also not have any knowledge of the TM4J1 model at all. -- Conal Tuohy New Zealand Electronic Text Centre www.nzetc.org |