Re: [Tm4j-developers] TopicMapObject.equalsByID - is it really needed?

Back from TM2008!

On Sun, 2008-04-06 at 01:07 +1300, Xuân Baldauf wrote:

> Yes, I agree one should build a nice XTM2 parser because the current
> parser actually parses a subset of the union of the XTM1 and XTM2
> languages. However, building a separate XTM2 parser is a considerable
> effort, and the current XTM2 support is a hack (that is, it works for
> what it was needed for and was written relatively quickly using
> existing infrastructure, but it also allows more than should be
> allowed).
> Allowing more than it should is not really a problem, as there are
> external XML validators available which can reject any document that
> is not valid XTM1 or valid XTM2.

I disagree ... I think allowing more than should be allowed IS a
problem, which should be fixed. I think the parsers should in fact
enforce their schemas, rather than leaving this entirely to a separate
validation process. To me, a parser which accepts an invalid input is a
*bug*, pure and simple. Therefore I think the "hack" should be rolled
back.
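
To illustrate what "enforcing" could mean at the parser level, here is a rough sketch (not TM4J's actual parser code). It assumes a SAX-based reader and a hypothetical class `StrictXtm1Check`; the element list is the XTM 1.0 vocabulary as I recall it. It only checks element names and namespace, not the full content model, so a real implementation would still need the DTD or a schema on top of this.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of TM4J: rejects any element outside XTM 1.0.
class StrictXtm1Check {
    static final String XTM1_NS = "http://www.topicmaps.org/xtm/1.0/";

    // XTM 1.0 element names (assumed complete; verify against the spec).
    static final Set<String> XTM1_ELEMENTS = Set.of(
        "topicMap", "topic", "instanceOf", "topicRef", "subjectIndicatorRef",
        "subjectIdentity", "resourceRef", "baseName", "baseNameString", "scope",
        "variant", "variantName", "parameters", "occurrence", "resourceData",
        "association", "member", "roleSpec", "mergeMap");

    // Returns true only if every element is a known XTM 1.0 element.
    static boolean accepts(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts)
                            throws SAXException {
                        if (!XTM1_NS.equals(uri)
                                || !XTM1_ELEMENTS.contains(localName)) {
                            // Fail fast instead of silently accepting XTM2.
                            throw new SAXException(
                                "Not an XTM 1.0 element: " + qName);
                        }
                    }
                });
            return true;
        } catch (SAXException e) {
            return false; // unknown element, or not well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a check like this, an XTM2-style `<name>` element inside an XTM1 document is refused by the parser itself rather than slipping through to the builder.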

> If someone finds time to write a clean XTM2 parser - great. But XTM2
> is not that far away from XTM1, so for supporting typed names, an
> XTM1+2 parser could peek and poke through the wrapper for this special
> case. That is not the best design; however, given that XTM1 and XTM2
> are fixed standards and there is only a small list of differences
> between them, it is probably considerably more economical to make the
> current XTM1 parser XTM2-aware than to build a new XTM2 parser from
> scratch.

I disagree with this too. IMHO it would have been easier to copy and
paste the XTM1 code and then convert it into XTM2. Then we'd have 2
distinct parsers/builders.

> > So this is why I believe we must now create a branch (based on a
> > revision prior to the introduction of XTM 2 code). We can call the
> > branch "TM4J_1" and we can keep all XTM 2 features out of it.

> Now, for what it is worth, I have created such a branch long ago, and
> I have called it "TM4J_1_x", very similar to what you propose.
> However, I created this branch deliberately _after_ introducing the
> XTM2 reading code (which, however, only reads XTM2 files into the XTM1
> data model), because I felt it would benefit TM4J1 users by allowing
> them to remain longer on TM4J1 before switching to TM4J2 while the
> rest of the world starts emitting XTM2 documents, similar to the .odt
> support in OpenOffice 1.1.5 (.odt is the file format of OpenOffice 2).
> Of course, I was not aware of that XTM1 topicRef bug. Do you think
> that if I fix this bug (both in the trunk and in the "TM4J_1_x"
> branch), this "TM4J_1_x" branch would reflect your intended "TM4J_1"
> branch closely enough?

No, I don't. As I said earlier, I do not at all like the mixture of XTM1
and 2 in the same parser/builder; I appreciate that you intended it as
an aid for XTM1 users to migrate to XTM2, but I have two problems with
it:

1) I think that relaxing the strictness of the parser is not an
improvement, and in general I don't like the approach of "mixing" the
two models.

2) In any case I don't think this would be a good way for an XTM1-based
project to read XTM2 topic maps. Better would be to use XSLT to define a
transformation to XTM1; this would also allow such a project to define
how they want to handle such things as typed names, data types, etc., in
XTM1 (either by discarding them or by implementing them in some way on
top of the XTM1 model).
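
As a rough sketch of the XSLT approach in point 2, something like the following could run ahead of an unmodified XTM1 parser. The class `Xtm2ToXtm1` and the stylesheet are hypothetical and deliberately tiny: they handle only topics and their names, and they implement the "discard" policy for typed names by dropping the `type` element. A project wanting to keep name types would write different templates.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical pre-processing step, not part of TM4J.
class Xtm2ToXtm1 {
    // Minimal illustrative stylesheet: XTM2 name/value becomes
    // XTM1 baseName/baseNameString; the name's type is discarded.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:x2='http://www.topicmaps.org/xtm/'"
      + " xmlns='http://www.topicmaps.org/xtm/1.0/'"
      + " exclude-result-prefixes='x2'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='x2:topicMap'>"
      + "<topicMap><xsl:apply-templates select='x2:topic'/></topicMap>"
      + "</xsl:template>"
      + "<xsl:template match='x2:topic'>"
      + "<topic id='{@id}'><xsl:apply-templates select='x2:name'/></topic>"
      + "</xsl:template>"
      + "<xsl:template match='x2:name'>"
      + "<baseName><baseNameString>"
      + "<xsl:value-of select='x2:value'/>"
      + "</baseNameString></baseName>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms an XTM2 document string into an XTM1 document string.
    static String downgrade(String xtm2) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xtm2)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the design is that the policy lives in the stylesheet, which each project can edit, while the XTM1 parser stays strict and unchanged.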

> If not, please feel free to create another branch before the
> introduction of XTM2 reading code.

OK then; since I think the TM4J 1 branch should not include ANY XTM2
support, I will go ahead and roll back the TM4J_1_x branch to exclude
those changes.

> > I want to come to an arrangement in which the TM4J_1 branch is
> > established purely for XTM 1.0, and the trunk is used for developing
> > XTM 2.0 support in a way which does not mix up XTM 1 and XTM 2. As I
> > explained earlier, my primary interest is in XTM 1.0 (because of my
> > other software which produces XTM 1), and hence I don't think I'll
> > be able to do much on the XTM 2 branch, but all the same I still
> > have an opinion on how it should be done, and I don't want to be
> > left out of the loop :-)

> I think, with your branching suggestion, we have pretty much reached
> such an agreement (or are at least pretty near to it, as I still think
> that it sometimes makes sense to change the TM4J1-legacy within TM4J2
> to integrate with the TMDM backend). What do you think?

Well ... I'm not sure what you mean by "change the TM4J1-legacy within
TM4J2 to integrate with the TMDM backend". 

I agree that integration between the XTM1 and XTM2 versions could be
useful, but I'm strongly opposed to making any changes to the XTM1
legacy which change the behaviour or compliance or interfaces of client
code, and for architectural reasons I don't believe the TM4J core should
have any dependency on XTM2 or the TMDM.

It seems to me it could well be useful to implement a TM4J1
TopicMapProvider which is an adaptor of (or wrapper around) a TMDM
provider, and which hides or in some way re-presents the portions of the
TMDM which don't exist in XTM1. Perhaps the reverse could also be useful
(i.e. a TMDM provider which is an adaptor of a TM4J1 provider). 
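
A very rough sketch of the adaptor idea follows. All of the types here (`TypedName`, `TmdmNames`, `Xtm1Names`, `TmdmBackedXtm1Names`) are hypothetical and much simpler than the real TM4J interfaces; they only show the shape of the layering, with typed names as the example of something that is hidden from the XTM1 side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical TMDM-side value: a name that may carry a type.
final class TypedName {
    final String value;
    final String type; // may be null; has no counterpart in XTM1

    TypedName(String value, String type) {
        this.value = value;
        this.type = type;
    }
}

// Hypothetical TMDM-side interface (stands in for a TMDM provider).
interface TmdmNames {
    List<TypedName> namesOf(String topicId);
}

// Hypothetical XTM1-side interface (stands in for the TM4J1 API).
interface Xtm1Names {
    List<String> baseNamesOf(String topicId);
}

// The adaptor lives in its own layer: it knows both models,
// while neither core needs to know about the other.
final class TmdmBackedXtm1Names implements Xtm1Names {
    private final TmdmNames backend;

    TmdmBackedXtm1Names(TmdmNames backend) {
        this.backend = backend;
    }

    @Override
    public List<String> baseNamesOf(String topicId) {
        List<String> result = new ArrayList<>();
        for (TypedName name : backend.namesOf(topicId)) {
            // Hide the name type, which XTM1 cannot express.
            result.add(name.value);
        }
        return result;
    }
}
```

Client code written against the XTM1-side interface sees plain base names and never needs to be recompiled against anything TMDM-specific.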

But let me stress that I think it's essential that the integration be
done in a strictly layered fashion, which means that the TM4J1 core
(including the parser etc) should not include ANY code which has
ANYTHING to do with XTM2 or the TMDM, and similarly, the TM4J2 core
should also not have any knowledge of the TM4J1 model at all.

-- 
Conal Tuohy
New Zealand Electronic Text Centre
www.nzetc.org