[Tm4j-developers] Proposal for new TM4J package [Long]


Hi all,

You will probably have seen the recent posting from Florian Haas about work
he has done on implementing the TM4J API on top of a DOM representation of
an XTM document. The approach he has taken is interesting in a number of
respects:

1) The use of XPath queries to provide the topic map indexes
2) The maintenance of an underlying data structure (the DOM) which much more
closely represents the source XTM file being processed.
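
To make point 1 concrete, here is a minimal sketch of XPath queries standing in for topic map indexes. It uses the standard JAXP javax.xml.xpath API rather than the Xalan XPathAPI class Florian's code uses. The class and method names are hypothetical, and the sample markup leaves out the XTM 1.0 namespace so the expressions can stay unprefixed:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class: the DOM is the only data structure; XPath queries act as indexes.
class XPathTopicIndex {
    private final Document doc;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    XPathTopicIndex(String xtm) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xtm)));
    }

    // Plays the role of an id -> topic map; returns null when no topic has that id.
    public Element topicById(String id) throws XPathExpressionException {
        requireNoApostrophe(id);
        return (Element) xpath.evaluate(
            "/topicMap/topic[@id='" + id + "']", doc, XPathConstants.NODE);
    }

    // All topic ids, in the order the <topic> elements appear in the source file.
    public List<String> allTopicIds() throws XPathExpressionException {
        NodeList ids = (NodeList) xpath.evaluate(
            "/topicMap/topic/@id", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < ids.getLength(); i++) {
            result.add(ids.item(i).getNodeValue());
        }
        return result;
    }

    // Queries are built by string concatenation here, so reject values that
    // would break out of the XPath string literal.
    private static void requireNoApostrophe(String s) {
        if (s.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("apostrophes are not handled in this sketch");
        }
    }
}
```

Because each query runs against the live DOM, there is no separate index to keep in sync after an edit; the trade-off is that every lookup walks the tree.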

My feeling is that were this package to be developed, it would provide a
robust foundation for building a topic map editing package that can do some
of the useful editor-type things that the existing in-memory package could
not do. For example, the DOM implementation could be made to preserve
element ordering, and so preserve topic map elements on a "round-trip"
editing cycle...this is something which would be a big advantage for manual
editing. Also, having an integration with an XPath processor would enable an
editor to construct arbitrary queries quite easily.

Florian and I have discussed how this package relates to the existing
in-memory implementation and I think that this package has strengths that
make it ideally suited for building an editing environment.

As a new project team, we don't really have a "process" for introducing new
packages into the project. Perhaps at some stage in the future we should
formalise the process...however, right now I am interested in hearing what
the views of the other members of this list are. The main question is "Does
this sound like something that should be a part of TM4J ?"...the second
question is "Is it something that you feel you could / would help out with ?"

I look forward to hearing your thoughts. [The email discussion between
Florian and myself is copied at the end of this email].

Cheers,

Kal

-----------------------------------
Kal Ahmed
XML and Topic Maps Consultant
e: ka...@te...
p: +44 (0)7968 529 531
w: www.techquila.com
-----------------------------------

>
> Hi Kal,
>
> | Firstly, thanks for telling me about this development!
>
> No sweat at all. You're very welcome. :-)
>
> | I like the use of the XPath evaluations to provide the indexing.
>
> Well, I do think it's a nice approach, although I'm not too happy with the
> way I've implemented it so far -- it's actually quite quick and dirty. I
> think I could get this to run a lot more elegantly if I evaluated the XPath
> expression by some other means than using the static methods in the
> XPathAPI class. But for now, it's all "go with what works".
>

Always the best way ;-)
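
One obvious step beyond the static XPathAPI methods is to compile each expression once and reuse it. The sketch below assumes the standard javax.xml.xpath API; the cache class is hypothetical and deliberately not thread-safe, since JAXP does not guarantee that a compiled XPathExpression is safe for concurrent use.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: compiles each XPath string once, then reuses it.
// Not thread-safe: use one instance per thread.
class CompiledXPathCache {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private final Map<String, XPathExpression> cache = new HashMap<>();

    public XPathExpression get(String expr) throws XPathExpressionException {
        XPathExpression compiled = cache.get(expr);
        if (compiled == null) {
            // compile() throws a checked exception, so computeIfAbsent is awkward here.
            compiled = xpath.compile(expr);
            cache.put(expr, compiled);
        }
        return compiled;
    }

    public NodeList select(Node context, String expr) throws XPathExpressionException {
        return (NodeList) get(expr).evaluate(context, XPathConstants.NODESET);
    }

    public int size() {
        return cache.size();
    }
}
```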

> | I suppose that theoretically, this DOM implementation could be layered
> | over any persistent storage mechanism that provides a DOM interface,
> | right ?
>
> Honestly: when I started on this I was merely playing with TM4J and
> thought something like "implementing this using the DOM would be nice".
> Just the let's-see-if-this-can-be-done type of thing, so no real worries
> about whether there is any real applicability in life. :-) But what you
> are saying certainly makes sense.
>

It would be interesting to see if the DOM implementation can play with one
of the XML databases that provides a DOM interface. The Ozone database which
TM4J uses to provide a persistent backend is also used by another project
which layers XML content management on the database - perhaps that might be
an interesting starting point...

> | I also think that if you can work out a way to make application of the
> | topic naming constraint and topic merging in general work without
> | modifying the DOM tree [...]
>
> Not quite sure what you mean by "without modifying the DOM tree". Say I
> have a topic that's already in the tree, and the tree already represents a
> consistent TM. Now I want to add a new topic, which has the same base name
> in the same scope as that existing topic. Classic case of TNC-based merge:
> REMOVE existing topic, ADD merged topic. That's modifying the tree, right?
> I take nodes out and I put nodes in. How should I do this without
> modifying the tree? Please clarify.
>

What you describe is what should "logically" happen to the topic map;
however, an application should be free to "physically" implement that in any
way it sees fit. For example, if topic A merges with topic B, TM4J does not
remove either topic; instead, it makes B a "merged topic" of A. All topics
know their list of merged topics and all merged topics know which topic they
are merged with. This means that any topic, when asked for its
characteristics (such as its names or occurrences), can actually return a
collection containing the values of all of the topics it is merged with. So
logically, it looks to an application as if A and B were merged; physically,
they are separate, and using the API it is possible to get at A and B and
modify them separately, and even to modify them in such a way that they
"unmerge" and go back to being separate topics again.

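The logical/physical split can be sketched in a few lines. This is not the TM4J code, just a hypothetical class following the description above: the merging topic keeps a list of merged topics, each merged topic remembers which topic it was merged into, and names are reported for the whole group while each topic's own data stays untouched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of "logical" merging: nothing is removed, the merge is a link.
class MergeableTopic {
    private final String id;
    private final List<String> ownNames = new ArrayList<>();
    // Topics merged into this one (populated only on the topic that owns the merge).
    private final List<MergeableTopic> mergedTopics = new ArrayList<>();
    // The topic this one was merged into, or null if it has not been merged.
    private MergeableTopic mergedInto;

    MergeableTopic(String id) { this.id = id; }

    public String getId() { return id; }
    public void addName(String name) { ownNames.add(name); }
    public List<String> getOwnNames() { return Collections.unmodifiableList(ownNames); }

    private MergeableTopic owner() { return mergedInto == null ? this : mergedInto; }

    public void merge(MergeableTopic other) {
        MergeableTopic a = owner();
        MergeableTopic b = other.owner();
        if (a == b) return; // already in the same merge group
        // Re-point b's merged topics at a, so every link stays one hop deep.
        for (MergeableTopic t : b.mergedTopics) {
            t.mergedInto = a;
            a.mergedTopics.add(t);
        }
        b.mergedTopics.clear();
        b.mergedInto = a;
        a.mergedTopics.add(b);
    }

    // Detach a merged topic again; its own names were never touched.
    public void unmerge() {
        if (mergedInto == null) return; // owners cannot unmerge in this sketch
        mergedInto.mergedTopics.remove(this);
        mergedInto = null;
    }

    // Logical view: names of the owner plus every topic merged into it.
    public List<String> getNames() {
        MergeableTopic o = owner();
        List<String> names = new ArrayList<>(o.ownNames);
        for (MergeableTopic t : o.mergedTopics) names.addAll(t.ownNames);
        return names;
    }
}
```
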
This kind of functionality would be incredibly useful in an editing
environment where a merge may happen because the user enters a name string
which happens to be used by another topic. And from an editor's point of
view, I think it would be nice to have a file round-trip all <topic>
elements regardless of whether they merge when processed or not...
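
In an editor, spotting that a newly typed name string would trigger a merge could itself be an XPath query. A minimal sketch, assuming the JAXP XPath API, unprefixed element names, and (to keep the query short) only the unconstrained scope, i.e. base names with no <scope> child; comparing non-empty scopes would need more work. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: which existing topics would a new unscoped base name merge with?
final class TncCheck {
    private TncCheck() {}

    public static List<String> mergeCandidates(Document doc, String name)
            throws XPathExpressionException {
        if (name.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("apostrophes are not handled in this sketch");
        }
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Matches topics having any scope-less baseName whose string equals `name`.
        String expr = "/topicMap/topic[baseName[not(scope)]/baseNameString = '" + name + "']";
        NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            ids.add(((Element) hits.item(i)).getAttribute("id"));
        }
        return ids;
    }
}
```

The editor could then record the merge logically, as described above, instead of rewriting the tree.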

> | [...] because the application would then be able to guarantee to
> | maintain the ordering of the XTM elements (something which the
> | com.techquila.topicmap.memory package cannot do).
>
> And com.techquila.topicmap.dom can't yet, either. For example, currently,
> if I want to add a scope to a base name that already contains a base name
> string, the DOM implementation simply appends the new <scope> AFTER the
> existing <baseNameString>, which makes the whole TM no longer valid XTM.
> Needless to say, this has to get fixed. And it will. :-)
>

We all have to start somewhere! :-)
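
For the <scope> ordering problem, the DOM-level fix is insertBefore rather than appendChild, since the XTM 1.0 DTD declares baseName as (scope?, baseNameString, variant*). A hypothetical helper, assuming the scope element was created by the same Document (otherwise the DOM raises WRONG_DOCUMENT_ERR):

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: keep <scope> as the first child element of <baseName>.
final class BaseNameEditor {
    private BaseNameEditor() {}

    public static Element setScope(Element baseName, Element scope) {
        Element existing = firstChildElement(baseName, "scope");
        if (existing != null) {
            // A baseName has at most one scope, so replace rather than add.
            baseName.replaceChild(scope, existing);
        } else {
            // insertBefore(x, null) appends, so an empty baseName is handled too.
            baseName.insertBefore(scope, firstChildElement(baseName, null));
        }
        return scope;
    }

    // First child element with the given name, or the first child element of any
    // name when `name` is null; returns null if there is none.
    public static Element firstChildElement(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && (name == null || name.equals(n.getNodeName()))) {
                return (Element) n;
            }
        }
        return null;
    }
}
```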

> | So my gut feeling is that this implementation is most especially useful
> | for editing apps - does that sound right to you ?
>
> Sounds good! Again, this is a classic let's-see-if-this-works venture.
> Compare this to a paragraph above; you'll see a pattern emerging here. :-)
> To be frank, this whole thing was like, OK, I'll give it a shot, as far as
> applicability is concerned -- that's where Kal comes in. And you did! :-)
> Kind of a naive approach, I know.
>
> | In principle, I would have nothing at all against including this as part
> | of the TM4J Project if we can make a clear distinction between what the
> | dom package is intended to do and what the memory package is intended to
> | do.
>
> Well, I guess that distinction is easily made. Firstly, I suppose (gut
> feeling) that the DOM implementation trades perhaps enhanced flexibility
> for a major overhead. Just take a look at the object hierarchy. I would
> assume that the in-memory implementation is lightning fast and extremely
> lightweight by comparison. Secondly, for the time being, the DOM
> implementation is at best pre-experimental and far from even half-way
> complete. Its worst downside is, I guess, the lack of a parser. Anyway,
> everything is really under major construction, as I'm sure you've noticed.
>
> Now if you want a formal "yes, I'd like to join the development effort",
> you have it. But please be advised, as I already wrote, that my time is
> extremely limited in the next three weeks, so I doubt that I'll be making
> any progress during that time. After that, I can get things going again.
>

That would be cool. I would like to copy this message over to the TM4J
Developers' list, and propose it as a new sub-project for TM4J. Is that OK
with you ? (I'm making it sound like we have some formal process...but we
don't; I would just like it to be discussed in that forum too.)

I'm in favour of adding this in as a foundation for building an editing
environment that I think would provide useful features such as element-level
round-tripping and arbitrary XPath expression evaluation. If the others
agree (there are only two of us currently active developers), then I would
be happy to add you in to the development team and you can then upload your
code to CVS whenever you feel ready.

How does that sound ?

Kal