From: Lars H. <he...@se...> - 2008-06-29 19:52:20
|
Hi all,

Recently, there were some bug reports which are more related to MIO than to tinyTiM. So, the question is, if we want that MIO becomes the default import mechanism or not. We can either fix and use tmapi-utils or we develop a dedicated importer for tinyTiM.

The advantage of MIO is that it is independent of a particular syntax. A small tinyTiM-related adaptor interprets the events from a MIO deserializer and translates them into tinyTiM's Topic Maps constructs. If someone writes a MIO deserializer for syntax XY, it is immediately usable by tinyTiM.

The disadvantage of MIO is that it is not controlled by this project; it is an independent project controlled by me. ;) The license shouldn't be a problem, because MIO will be released under a Sleepycat-style license. Maybe the API will be published under another license in the future to support wider adoption, but the first versions will use the Sleepycat license and the first deserializers will use that license as well.

What do you think? I am open minded, I'd use tinyTiM's MIO adaptor for testing purposes anyway, so I don't care if the adaptor is part of this project or if I keep it as personal project.

Sleepycat Lic.: <http://opensource.org/licenses/sleepycat.php>

Best regards,
Lars
--
Semagia <http://www.semagia.com/> |
From: Stefan L. <li...@ap...> - 2008-07-01 10:51:06
|
hi,

Lars Heuer wrote:
> Hi all,
>
> Recently, there were some bug reports which are more related to MIO than to
> tinyTiM. So, the question is, if we want that MIO becomes the default import
> mechanism or not. We can either fix and use tmapi-utils or we develop a
> dedicated importer for tinyTiM.

i would prefer to "fix" the tmapi-utils parser. Not just fixing it, but using the latest StAX parser and api. This design is easier to understand compared to a callback-handler / stack-using parser. the goal of tinyTIM was always to be as simple as possible, that's why so many students who started with topic maps used tinytim and the tmapi-utils parser: they understood the source code.

> What do you think? I am open minded, I'd use tinyTiM's MIO adaptor
> for testing purposes anyway, so I don't care if the adaptor is part of
> this project or if I keep it as personal project.

i think the adapter (not the mio libraries) could stay as a part of the project, for example as tinytim-mio-xtm and tinytim-mio-cxtm, but we need to provide a very very simple way of parsing and writing xtm 1.0/2.0 within the project, with all sources. So maybe we can migrate the tmapi-utils parser to the tinytim project.

thoughts?

stefan |
From: Lars H. <he...@se...> - 2008-07-01 13:17:52
|
Hi Stefan,

> i would prefer to "fix" the tmapi-utils parser. Not just fixing it,
> but using latest StAX parser and api. This design is easier to
> understand compared to a callback-handler / stack-using parser.

Okay, no problem. Will you fix it? Should be straightforward, since we have a CXTM serializer. Just import every topic map from
<http://cxtm-tests.cvs.sourceforge.net/cxtm-tests/cxtm-tests/xtm1/in/>
and
<http://cxtm-tests.cvs.sourceforge.net/cxtm-tests/cxtm-tests/xtm2/in/>
and check if the CXTM output is identical to the output in the "baseline" directory.

I have a TestCase impl. that does the work more or less automatically, I'll contribute it this week to the CXTM subproject.

> i think the adapter (not the mio libraries) could stay as a part of
> the project, for example as tinytim-mio-xtm and tinytim-mio-cxtm,
> but we need to provide a very very simple way of parsing and writing
> xtm 1.0/2.0 within the project, with all sources.

Hmm... The CXTM package has nothing to do with MIO. And we need just one "tinytim-mio" package, since you need only one concrete adapter to "parse" every syntax you can think of. ;)

Proposal:
* Leave tinytim-cxtm as it is
* Leave the MapInputHandler in the tinytim-io package or put it into a new package "tinytim-mio".
* Replace TopicMapImporter with something that does not use MIO

Reasonable?

Best regards,
Lars
--
Semagia <http://www.semagia.com> |
From: Lars H. <he...@se...> - 2008-07-01 15:53:45
|
Hi Stefan,

[StAX deserializer]
> OK, i hope to find some time in the nights ahead, i want to commit it to tinyTIM directly no
> subpackage, please vote!

IMO a subpackage would be better, otherwise the .core would rely on StAX for no reason.

[Package reorg.]
>> * Leave the MapInputHandler in the tinytim-io package or put it into a
>> new package "tinytim-mio".
> +1 for renaming to tinytim-mio (btw. can subprojects be renamed?)

Yes, IIRC SVN provides a command for renaming. Maybe it's just "move". Not sure. But it is possible.

>> * Replace TopicMapImporter with something that does not use MIO
> which TopicMapImporter?

This one:
<http://tinytim.svn.sourceforge.net/viewvc/tinytim/tinytim-io/trunk/src/main/java/org/tinytim/io/TopicMapImporter.java?view=markup>

But maybe it should also be moved into the .mio package since it automates the search for a suitable MIO deserializer.

Okay, then I'll rename ".io" into ".mio" and leave the rest up to you.

Best regards,
Lars
--
Semagia <http://www.semagia.com> |
From: Lars H. <he...@se...> - 2008-07-01 17:08:41
|
[...]
> Okay, then I'll rename ".io" into ".mio" and leave the rest up to you.

Done.

Best regards,
Lars
--
Semagia <http://www.semagia.com> |
From: Lars H. <he...@se...> - 2008-07-02 12:08:38
|
[...]
> IMO it would be nice if we'd create a generic API to import topic
> maps, independently of the concrete syntax, so I'd favour .io.

Maybe the proposal was a bit vague, here are the possible layouts I'd prefer:

a)
org.tinytim.io:
* Deserializer (interface)
* Serializer (interface)

org.tinytim.io.xtm
* XTMDeserializer
* XTMSerializer (or XTM10Serializer and XTM20Serializer)

org.tinytim.io.another-syntax
* AnotherSyntaxDeserializer
* ...

b)
org.tinytim.io:
* Deserializer (interface)
* Serializer (interface)
* XTMDeserializer
* XTMSerializer (or XTM10/XTM20Serializer)
* AnotherSyntaxDeserializer

Personally, I like (a) more.

Best regards,
Lars
--
Semagia <http://www.semagia.com> |
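Layout (a) could be sketched roughly as follows. This is only an illustration, not the tinyTiM API: `ToyTopicMap`, `LineDeserializer` and `LineSerializer` are hypothetical stand-ins (one topic id per line instead of real XTM), the `read` / `write` method names are assumptions, and the package split is indicated in comments only so that the sketch compiles as a single file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a topic map; not a tinyTiM or TMAPI type.
final class ToyTopicMap {
    final List<String> topicIds = new ArrayList<>();
}

// Layout (a): these two interfaces would live in org.tinytim.io.
interface Deserializer {
    void read(ToyTopicMap target, Reader source);
}

interface Serializer {
    void write(ToyTopicMap source, Writer out);
}

// Layout (a): syntax-specific classes would live in a subpackage such as
// org.tinytim.io.xtm. The toy "syntax" here is one topic id per line.
final class LineDeserializer implements Deserializer {
    @Override
    public void read(ToyTopicMap target, Reader source) {
        try {
            BufferedReader in = new BufferedReader(source);
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    target.topicIds.add(line.trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class LineSerializer implements Serializer {
    @Override
    public void write(ToyTopicMap source, Writer out) {
        try {
            for (String id : source.topicIds) {
                out.write(id);
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class LayoutDemo {
    public static void main(String[] args) {
        ToyTopicMap tm = new ToyTopicMap();
        Deserializer in = new LineDeserializer();
        in.read(tm, new StringReader("tinytim\nmio\n"));

        StringWriter out = new StringWriter();
        Serializer ser = new LineSerializer();
        ser.write(tm, out);
        System.out.print(out);
    }
}
```

Callers depend only on the two interfaces, so adding another syntax would mean adding another subpackage without touching existing code.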
From: Lars H. <he...@se...> - 2008-07-02 12:21:00
|
Hi Stefan,

> (a) sounds good, but do we need an IOFactory? or let the user simply call
> new XTM20Serializer(...)

IMO for simplicity

    Serializer ser = new XTM20Serializer();

is enough. A factory would be overkill, otherwise we could also use MIO ;)

Best regards,
Lars
--
Semagia <http://www.semagia.com> |
From: Lars H. <he...@se...> - 2008-07-02 12:53:37
|
Hi Stefan,

[...]
> next thing i wanna talk about is naming of:
> serializer/deserialize (i mistyped it 2 times while writing :-) or
> writer/parser

I don't care. How about Writer / Reader? Maybe with "TopicMap" as prefix to distinguish it better from the Java io.Reader / io.Writer: TopicMapReader, TopicMapWriter.

> then we have to talk about the interfaces. i'd prefer the tmapi-utils way
> TopicMapSystem tmsystem;
> TopicMap tm = parser.parse(tmsystem, file)
> or mio way
> TopicMapSystem tmsystem;
> TopicMap tm = tmsystem.createTopicmap(???,???)
> tm = parser.parse(tm, file)
> while using this i did not know what to insert in ???,??? cause its
> overwritten by the parser, so the first way is better in my view.
> let the parser call createTopicMap

Actually, I'd prefer the latter (maybe that's the reason why it is handled that way in MIO ;)). If the user wants that the topic map is accessible under the same IRI as the document IRI, she can use the document IRI to create the topic map.

IMO the latter style is nice because it would be possible to add more content to an *existing* topic map. Importing more topic map content into an existing topic map should usually be faster than letting the TMReader create a topic map, merge it with an existing topic map and delete the topic map the TMReader has created.

The latter style allows the user to create a topic map and then

    TopicMapReader reader = new XTMTopicMapReader();
    reader.read(tm, file);
    reader = new LTMTopicMapReader();
    reader.read(tm, file2);

After that, the topic map "tm" would contain the content from the XTM source and the LTM source.

Best regards,
Lars
--
Semagia <http://www.semagia.com> |
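The "read into an existing topic map" style described above might look like the following sketch. Everything here is a hypothetical stand-in: `ToyTopicMap` replaces the real topic map type, the two readers parse toy syntaxes (one topic per line, and comma-separated) rather than XTM and LTM, and the source is passed as a `String` instead of a file to keep the example self-contained.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a topic map that the caller has already created.
final class ToyTopicMap {
    final String iri;
    final Set<String> topics = new LinkedHashSet<>();

    ToyTopicMap(String iri) {
        this.iri = iri;
    }
}

// The reader fills a topic map it is given; it never creates one itself.
interface TopicMapReader {
    void read(ToyTopicMap tm, String source);
}

// Toy syntax 1: one topic id per line.
final class LineTopicMapReader implements TopicMapReader {
    @Override
    public void read(ToyTopicMap tm, String source) {
        for (String line : source.split("\n")) {
            if (!line.trim().isEmpty()) {
                tm.topics.add(line.trim());
            }
        }
    }
}

// Toy syntax 2: comma-separated topic ids.
final class CommaTopicMapReader implements TopicMapReader {
    @Override
    public void read(ToyTopicMap tm, String source) {
        for (String id : source.split(",")) {
            if (!id.trim().isEmpty()) {
                tm.topics.add(id.trim());
            }
        }
    }
}

final class ReadIntoExistingDemo {
    public static void main(String[] args) {
        ToyTopicMap tm = new ToyTopicMap("http://example.org/map");

        TopicMapReader reader = new LineTopicMapReader();
        reader.read(tm, "tinytim\nmio\n");
        reader = new CommaTopicMapReader();
        reader.read(tm, "mio,cxtm");

        // Content from both sources ends up in the same topic map.
        System.out.println(tm.topics); // prints [tinytim, mio, cxtm]
    }
}
```

No intermediate topic map is created, merged and deleted; the second reader simply adds to what the first one left behind.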
From: Lars H. <he...@se...> - 2008-07-02 14:14:38
|
[...]
>> TopicMapSystem tmsystem;
>> TopicMap tm = parser.parse(tmsystem, file)
>> or mio way
>> TopicMapSystem tmsystem;
>> TopicMap tm = tmsystem.createTopicmap(???,???)
>> tm = parser.parse(tm, file)
[...]
> The latter style allows the user to create a topic map and then
[...]

... and the latter style wouldn't make the first style impossible. We can create a TopicMapImportUtil (cannot come up with a better name), which takes a TopicMapSystem, a deserializer and a File / Stream, creates the topic map, imports the content from the source and returns that filled topic map. Not sure if that is necessary, though.

Best regards,
Lars
--
Semagia <http://www.semagia.com> |
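A convenience helper of the kind suggested above could be layered on top of the `read(tm, source)` style, roughly like this. It is a sketch under assumptions: `ToyTopicMapSystem` and `ToyTopicMap` are hypothetical stand-ins for the real system and topic map types, the method name `importInto` is invented, and the explicit `iri` parameter is an assumption about where the new topic map's IRI would come from.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a topic map.
final class ToyTopicMap {
    final String iri;
    final List<String> topics = new ArrayList<>();

    ToyTopicMap(String iri) {
        this.iri = iri;
    }
}

// Hypothetical stand-in for a topic map system that owns topic maps by IRI.
final class ToyTopicMapSystem {
    private final Map<String, ToyTopicMap> maps = new LinkedHashMap<>();

    ToyTopicMap createTopicMap(String iri) {
        if (maps.containsKey(iri)) {
            throw new IllegalArgumentException("Topic map already exists: " + iri);
        }
        ToyTopicMap tm = new ToyTopicMap(iri);
        maps.put(iri, tm);
        return tm;
    }
}

// The core reader API only knows how to fill an existing topic map.
interface TopicMapReader {
    void read(ToyTopicMap tm, String source);
}

// Shortcut for the "first style": create the topic map, then import into it.
final class TopicMapImportUtil {
    private TopicMapImportUtil() {
    }

    static ToyTopicMap importInto(ToyTopicMapSystem system, TopicMapReader reader,
                                  String iri, String source) {
        ToyTopicMap tm = system.createTopicMap(iri);
        reader.read(tm, source);
        return tm;
    }
}

final class ImportUtilDemo {
    public static void main(String[] args) {
        ToyTopicMapSystem system = new ToyTopicMapSystem();
        // Toy reader: treats the whole source as a single topic id.
        TopicMapReader reader = (tm, source) -> tm.topics.add(source.trim());
        ToyTopicMap tm = TopicMapImportUtil.importInto(
                system, reader, "http://example.org/map", "tinytim");
        System.out.println(tm.iri + " " + tm.topics);
    }
}
```

The helper adds nothing the caller could not do in two lines, which fits the doubt expressed above about whether it is necessary.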
From: Lars H. <he...@se...> - 2008-07-02 15:50:37
|
[...]
>> but if i don't know the document IRI? just parsing a file from the
>> web/p2p network. So i think, the first style is needed for this
>> case.
> I think the document IRI is overestimated. IMO it would be better if
[...]

Example: If you send me a file "mymap.ctm" with the following content:

    tinytim.

and I place it into the directory "/home/lars/maps/" and I read it into my engine, I get a topic with the item identifier
<file:///home/lars/maps/mymap.ctm#tinytim>.

And if I send that file to my friend Donald Duck (assuming that Duckburg has a broadband connection ;)) and he places that file into the directory "/home/donald/", and he reads that file in, he gets a topic with the item identifier
<file:///home/donald/mymap.ctm#tinytim>.

You see, the local identifiers are never stable. If you want, that I get back exactly the same topic map, you have to tell me your document IRI and I can provide that document IRI to my TopicMapReader.

Best regards,
Lars
--
Semagia <http://www.semagia.com> |
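The effect described above can be reproduced with plain `java.net.URI` resolution. This is only an illustration of how a local id turns into an item identifier; the helper name `itemIdentifier` and the `http://example.org/...` override IRI are made up for the example, and a real deserializer does more than this.

```java
import java.net.URI;

final class ItemIdentifiers {
    private ItemIdentifiers() {
    }

    // Resolves a local id such as "tinytim" against the document IRI.
    static URI itemIdentifier(String documentIri, String localId) {
        return URI.create(documentIri).resolve("#" + localId);
    }

    public static void main(String[] args) {
        // Same file content, two different locations.
        URI lars = itemIdentifier("file:///home/lars/maps/mymap.ctm", "tinytim");
        URI donald = itemIdentifier("file:///home/donald/mymap.ctm", "tinytim");
        System.out.println(lars.equals(donald)); // prints false

        // If both sides agree on one document IRI (hypothetical here),
        // the item identifiers are identical again.
        String agreed = "http://example.org/maps/mymap.ctm";
        URI a = itemIdentifier(agreed, "tinytim");
        URI b = itemIdentifier(agreed, "tinytim");
        System.out.println(a.equals(b)); // prints true
    }
}
```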
From: Lars H. <he...@se...> - 2008-07-02 16:05:50
|
Hi Stefan,

[...]
> correct me if i'm wrong, and don't blame me, i did not work with tm
> for quite some time, but isn't the document IRI contained in the
> file?

In XTM 1.0 it may be contained in the file (xml:base), and LTM provides also a #BASEURI directive, but this isn't the case for "modern" Topic Maps syntaxes. The IRI of the document is used or the application has to provide a document IRI.

C.f. XTM 2.0: <http://www.isotopicmaps.org/sam/sam-xtm/#d0e361>

    4.2. Deserialization
    [...]

    The input to the deserialization process is:
    [...]
    * An absolute IRI. This is the IRI from which the XTM document was
      retrieved, known as the document IRI. This IRI shall always be
      provided, as it is necessary in order to assign the item
      identifiers of the topic items created during deserialization. If
      the XTM document was not read from any particular IRI the
      application is responsible for providing an IRI considered
      suitable.
    [...]

Best regards,
Lars
--
Semagia <http://www.semagia.com> |
From: Stefan L. <li...@ap...> - 2008-07-01 14:50:44
|
Hi,

>> i would prefer to "fix" the tmapi-utils parser. Not just fixing it,
>> but using latest StAX parser and api. This design is easier to
>> understand compared to a callback-handler / stack-using parser.
>
> Okay, no problem. Will you fix it? Should be straightforward, since

OK, i hope to find some time in the nights ahead. i want to commit it to tinyTIM directly, no subpackage, please vote!

> I have a TestCase impl. that does the work more or less automatically,
> I'll contribute it this week to the CXTM subproject.

take your time, first i will redesign it and then we will test it with your nice stuff.

>> i think the adapter (not the mio libraries) could stay as a part of
>> the project, for example as tinytim-mio-xtm and tinytim-mio-cxtm,
>> but we need to provide a very very simple way of parsing and writing
>> xtm 1.0/2.0 within the project, with all sources.
>
> Hmm... . The CXTM package has nothing to do with MIO. And we need just
> one "tinytim-mio" package, since you need only one concrete adapter to
> "parse" every syntax you can think of. ;)
>
> Proposal:
> * Leave tinytim-cxtm as it is

right +1, no dependencies on mio

> * Leave the MapInputHandler in the tinytim-io package or put it into a
> new package "tinytim-mio".

+1 for renaming to tinytim-mio (btw. can subprojects be renamed?)

> * Replace TopicMapImporter with something that does not use MIO

which TopicMapImporter? |
From: Stefan L. <li...@ap...> - 2008-07-02 07:35:18
|
Lars Heuer wrote:
> Hi Stefan,
>
> [StAX deserializer]
>> OK, i hope to find some time in the nights ahead, i want to commit it to tinyTIM directly no
>> subpackage, please vote!
>
> IMO a subpackage would be better, otherwise the .core would rely on
> StAX for no reason.

convinced, but we need to find a name, tinytim-io or tinytim-xtm
i would prefer tinytim-xtm
what do you think? |
From: Lars H. <he...@se...> - 2008-07-02 11:59:38
|
Hi Stefan,

[...]
> convinced, but we need to find a name, tinytim-io or tinytim-xtm
> i would prefer tinytim-xtm
> what do you think?

IMO it would be nice if we'd create a generic API to import topic maps, independently of the concrete syntax, so I'd favour .io.

Within the .io package we can provide a XTMDeserializer, LTMDeserializer, WhateverDeserializer which would implement a generic Deserializer interface. The Deserializer interface may be very primitive, just a "parse" method which takes a Stream / Reader / File / URL.

Not sure if we want to provide other syntaxes != XTM, but we'd be forward-compatible. ;)

If you implement the XTMDeserializer, you may want to use tinyTiM's "native" API and not TMAPI, because the native API provides faster lookups for Topics than the TMAPI TopicsIndex.

Best regards,
Lars
--
Semagia <http://www.semagia.com> |
From: Stefan L. <li...@ap...> - 2008-07-02 12:13:32
|
(a) sounds good, but do we need an IOFactory? or let the user simply call

    new XTM20Serializer(...)

?

Lars Heuer wrote:
> [...]
>> IMO it would be nice if we'd create a generic API to import topic
>> maps, independently of the concrete syntax, so I'd favour .io.
>
> Maybe the proposal was a bit vague, here possible layouts I'd prefer:
>
> a)
> org.tinytim.io:
> * Deserializer (interface)
> * Serializer (interface)
>
> org.tinytim.io.xtm
> * XTMDeserializer
> * XTMSerializer (or XTM10Serializer and XTM20Serializer)
>
> org.tinytim.io.another-syntax
> * AnotherSyntaxDeserializer
> * ...
>
> b)
> org.tinytim.io:
> * Deserializer (interface)
> * Serializer (interface)
> * XTMDeserializer
> * XTMSerializer (or XTM10/XTM20Serializer)
> * AnotherSyntaxDeserializer
>
>
> Personally, I like (a) more.
>
>
> Best regards,
> Lars |
From: Stefan L. <li...@ap...> - 2008-07-02 12:35:46
|
+1 for plain constructor

next thing i wanna talk about is naming of:
serializer/deserialize (i mistyped it 2 times while writing :-) or
writer/parser

then we have to talk about the interfaces. i'd prefer the tmapi-utils way

    TopicMapSystem tmsystem;
    TopicMap tm = parser.parse(tmsystem, file)

or mio way

    TopicMapSystem tmsystem;
    TopicMap tm = tmsystem.createTopicmap(???,???)
    tm = parser.parse(tm, file)

while using this i did not know what to insert in ???,??? cause it's overwritten by the parser, so the first way is better in my view. let the parser call createTopicMap

what do you think?

Lars Heuer wrote:
> Hi Stefan,
>
>> a sounds good, but do we need an IOFactory? or let the user simple call
>
>> new XTM20Serializer(...)
>
> IMO for simplicity
>
> Serializer ser = new XTM20Serializer();
>
> is enough. A factory would be an overkill, otherwise we could also use
> MIO ;)
>
> Best regards,
> Lars |
From: Stefan L. <li...@ap...> - 2008-07-02 15:03:58
|
hi,

> I don't care. How about Writer / Reader? Maybe with "TopicMap" as
> prefix to distinguish it better from the Java io.Reader / .io.Writer:
> TopicMapReader, TopicMapWriter.

+1 great idea to use java io style names

>> then we have to talk about the interfaces. i'd prefer the tmapi-utils way
[...]
> If the user wants that the topic map is accessible under the same IRI
> as the document IRI, she can use the document IRI to create the
> topic map.

but if i don't know the document IRI? just parsing a file from the web/p2p network. So i think, the first style is needed for this case.

> to add more content to an *existing* topic map. Importing more topic
> map content into an existing topic map should usually be faster than
> letting the TMReader create a topic map, merge it with an existing
> topic map and to delete the topic map the TMReader has created.

OK i got your point, let's do both

    read(tmsystem, input)
    read(tm, input)

[...]
> After that, the topic map "tm" would contain the content from the XTM
> source and the LTM source.

but what about clashes? is there a merging? when there is a merge, then the user must clearly see it and invoke it from outside by calling tm.merge, and setting the right merge properties first. so if there is a "hidden" merge, i would recommend only using approach a and forcing the user to call merge later. If someone wants some handy shortcut methods he could write them himself. the core api should be atomic.

stefan |
From: Lars H. <he...@se...> - 2008-07-02 15:26:40
|
Hi Stefan,

[...]
> but if i don't know the document IRI? just parsing a file from the web/p2p network.
> So i think, the first style is needed for this case.

I think the document IRI is overestimated. IMO it would be better if the user has the possibility to provide an IRI which is used to override the document IRI. The document IRI is "just" used to resolve IRIs against. If the user wants stable IRIs, she should use (absolute) subject identifiers / locators. If the document IRI is unknown or should be overridden, the user simply sets the IRI which should be used to resolve IRIs against.

[...]
>> After that, the topic map "tm" would contain the content from the XTM
>> source and the LTM source.
> but what about clashes? is there a merging? when there is a merge,

Yes, of course, there is always merging, even if you read a single topic map. No Topic Maps syntax mandates that the document contains a merged topic map (well, CXTM is an exception, these topic maps are always merged). You have to look up topics all the time. If a topic is read and it is equal to an existing topic, you merge them transparently; no user interaction necessary. And I'd do the same if the user imports a serialized topic map into an existing topic map. The user expects that merging is done, she cannot assume that topics do not merge.

Best regards,
Lars
--
Semagia <http://www.semagia.com> |
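Transparent merging during import can be illustrated with a deliberately simplified sketch. It assumes that two topics are "equal" when they share a subject identifier and that merging means taking the union of their names; the real merging rules cover more (item identifiers, subject locators and so on), and `MergingTopicMap` with its methods is a hypothetical class, not part of tinyTiM.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Simplified model: a topic is identified by one subject identifier
// and carries a set of names.
final class MergingTopicMap {
    private final Map<String, Set<String>> namesBySubjectIdentifier = new LinkedHashMap<>();

    // Called by a reader for every topic it encounters. If a topic with the
    // same subject identifier already exists, the new name is added to it
    // instead of creating a second topic.
    void addTopic(String subjectIdentifier, String name) {
        namesBySubjectIdentifier
                .computeIfAbsent(subjectIdentifier, key -> new LinkedHashSet<>())
                .add(name);
    }

    int topicCount() {
        return namesBySubjectIdentifier.size();
    }

    Set<String> names(String subjectIdentifier) {
        Set<String> names = namesBySubjectIdentifier.get(subjectIdentifier);
        return names == null ? Collections.<String>emptySet()
                             : Collections.unmodifiableSet(names);
    }
}

final class MergingDemo {
    public static void main(String[] args) {
        MergingTopicMap tm = new MergingTopicMap();
        // First source.
        tm.addTopic("http://tinytim.sf.net/", "tinyTiM");
        tm.addTopic("http://www.semagia.com/", "Semagia");
        // Second source mentions the same subject under another name.
        tm.addTopic("http://tinytim.sf.net/", "tinyTiM engine");

        System.out.println(tm.topicCount()); // prints 2
        System.out.println(tm.names("http://tinytim.sf.net/")); // prints [tinyTiM, tinyTiM engine]
    }
}
```

The lookup happens on every `addTopic` call, whether the topic comes from the first source or a later one, so importing into an existing topic map needs no separate merge step.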
From: Stefan L. <li...@ap...> - 2008-07-02 15:55:56
|
correct me if i'm wrong, and don't blame me, i did not work with tm for quite some time, but isn't the document IRI contained in the file?

Lars Heuer wrote:
> [...]
>>> but if i don't know the document IRI? just parsing a file from the
>>> web/p2p network. So i think, the first style is needed for this
>>> case.
>
>> I think the document IRI is overestimated. IMO it would be better if
> [...]
>
> Example: If you send me a file "mymap.ctm" with the following content:
>
> tinytim.
>
> and I place it into the directory "/home/lars/maps/" and I read it
> into my engine, I get a topic with the item identifier
> <file:///home/lars/maps/mymap.ctm#tinytim>.
>
> And if I send that file to my friend Donald Duck (assuming that
> Duckburg has a broadband connection ;)) and he places that file into
> the directory "/home/donald/", and he reads that file in, he gets a
> topic with the item identifier
> <file:///home/donald/mymap.ctm#tinytim>.
>
>
> You see, the local identifiers are never stable. If you want, that I
> get back exactly the same topic map, you have to tell me your document
> IRI and I can provide that document IRI to my TopicMapReader.
>
> Best regards,
> Lars |
From: Stefan L. <li...@ap...> - 2008-07-02 16:17:19
|
So do we have to provide the parse(tmsystem,input) for xtm 1.0 ?

Lars Heuer wrote:
> Hi Stefan,
>
> [...]
>> correct me if i'm wrong, and don't blame me, i did not work with tm
>> for quite some time, but isn't the document IRI contained in the
>> file?
>
> In XTM 1.0 it may be contained in the file (xml:base), and LTM
> provides also a #BASEURI directive, but this isn't the case for
> "modern" Topic Maps syntaxes. The IRI of the document is used or the
> application has to provide a document IRI.
>
> C.f. XTM 2.0: <http://www.isotopicmaps.org/sam/sam-xtm/#d0e361>
>
> 4.2. Deserialization
> [...]
>
> The input to the deserialization process is:
> [...]
> * An absolute IRI. This is the IRI from which the XTM document was
> retrieved, known as the document IRI. This IRI shall always be
> provided, as it is necessary in order to assign the item
> identifiers of the topic items created during deserialization. If
> the XTM document was not read from any particular IRI the
> application is responsible for providing an IRI considered
> suitable.
> [...]
>
>
> Best regards,
> Lars |
From: Lars H. <he...@se...> - 2008-07-02 16:32:44
|
Hi Stefan,

> So do we have to provide the parse(tmsystem,input) for xtm 1.0 ?

No, not necessarily; parsing XTM 1.0 is not very different from XTM 2.0: first, the document IRI is used. If you find an xml:base in the XTM 1.0 file, that xml:base is used. Note that an XTM 1.0 file may contain any number of xml:base attributes. You have to take care that you use the *current* xml:base.

So, it would be pretty okay if the user wants to override the (initial) document IRI, but it may not be used for all Topic Maps constructs, because the xml:base overrides the document IRI. Yes, that sucks, and this was one of the reasons why xml:base is not used in XTM 2.0. :)

Example: If I have a topic map "mymap.xtm" and I place it into the directory /home/lars/maps, the document IRI would be
<file:///home/lars/maps/mymap.xtm>
but it would be okay if I dictate that the document IRI
<http://www.semagia.com/maps/mymap.xtm>
should be used. But even if I dictate that, I cannot assume that every local id is resolved against that IRI, because xml:base may override my document IRI with e.g. <http://tinytim.sf.net/>, so some topics may use <http://tinytim.sf.net/> as base IRI.

Complicated stuff. :) But XTM 2.0 is easier to read and to serialize, maybe you want to start with XTM 2.0 and postpone XTM 1.0.

Best regards,
Lars
--
Semagia <http://www.semagia.com> |
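Tracking the *current* xml:base while reading XTM 1.0 could be done with a small stack of base IRIs, as in the sketch below. This is an assumption about one possible implementation, not code from tmapi-utils or MIO: `BaseIriStack` and its method names are hypothetical, and a parser would call `startElement` / `endElement` from whatever event loop it uses.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;

// Keeps the base IRI that is in effect for the element currently being read.
final class BaseIriStack {
    private final Deque<URI> bases = new ArrayDeque<>();

    // The (possibly user-overridden) document IRI is the initial base.
    BaseIriStack(String documentIri) {
        bases.push(URI.create(documentIri));
    }

    // Call on every start tag; xmlBase is null if the element has no
    // xml:base attribute, in which case the enclosing base stays in effect.
    void startElement(String xmlBase) {
        URI current = bases.peek();
        bases.push(xmlBase == null ? current : current.resolve(xmlBase));
    }

    // Call on every end tag; restores the base of the enclosing element.
    void endElement() {
        bases.pop();
    }

    // Resolves a local id against the base that is currently in effect.
    URI resolveLocalId(String id) {
        return bases.peek().resolve("#" + id);
    }
}

final class XmlBaseDemo {
    public static void main(String[] args) {
        BaseIriStack stack = new BaseIriStack("http://www.semagia.com/maps/mymap.xtm");

        stack.startElement(null); // e.g. <topicMap> without xml:base
        System.out.println(stack.resolveLocalId("a"));
        // prints http://www.semagia.com/maps/mymap.xtm#a

        stack.startElement("http://tinytim.sf.net/"); // element with xml:base
        System.out.println(stack.resolveLocalId("b"));
        // prints http://tinytim.sf.net/#b
        stack.endElement();

        System.out.println(stack.resolveLocalId("c"));
        // prints http://www.semagia.com/maps/mymap.xtm#c
        stack.endElement();
    }
}
```

Overriding the document IRI only changes the bottom of the stack, which is why a dictated document IRI does not reach elements that sit under their own xml:base.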
From: Lars H. <he...@se...> - 2008-07-02 16:41:42
|
[...]
>> So do we have to provide the parse(tmsystem,input) for xtm 1.0 ?
> No, not necessarily; parsing XTM 1.0 not very different from XTM 2.0,
[...]

Maybe I should send my mails to myself first so I can cut them down to one sentence and *then* send them to the ml. ;)

To cut the long story short: It's always okay to let the user override the document IRI regardless of the concrete Topic Maps syntax, because it is such a fragile thing that is never stable and cannot be used to get back stable topic identities. In XTM 1.0 / LTM 1.x the document IRI may be overridden, though.

Best regards,
Lars
--
Semagia <http://www.semagia.com> |