From: Haejoong L. <hae...@un...> - 2003-04-30 17:35:39
Gilles,

>It's been a long time... I don't know if you ever received my last e-mail (it dates back five months!); I never got the answer anyway.
>

Sorry if I didn't respond to some of your messages. In your last message, dated Nov. 27, 2002, I see one thing that is not addressed in this email:

"It would be interesting to give it a try. As I have not really looked into the aglib source code until now, I'd appreciate if you could point where this filter code should belong, and maybe some programming guidelines for such a module."

For this question, please check:

http://agtk.sourceforge.net/doc/aglib/2.0/newformat.html

This is a file I/O plugin development guideline for aglib 2.0, which we hope to release soon. It's quite stable. We haven't released it because proofreading of the document is not complete yet. I'm thinking about releasing a beta version before we finish proofreading.

>But never mind. Just to remind you of me: I have been working with Linguistics people who try to establish a convention for annotating audio-visual interaction. The design was researched by L. Balthasar in his PhD (based on current practice in Pragmatics): it is called STAVIS. As many categories of data must be transcribed, the problem of the representation of these data was posed; Balthasar's thesis presents a typographical convention, suitable for a human reader but rather inadequate for computer processing. After reading S. Bird's articles, I started to try and convince my colleagues of using the AG as the basis for the data structure of the STAVIS convention.
>
>

Hmm... So how would you address the offset problem? Obviously, we can't mix two offset systems, orthographic and chronological, in one annotation graph.

>One thing I had proposed in my previous mail is to send you a Perl script I wrote to convert the Transcriber format (conforming to 'trans-13.dtd') into AIF (ag.dtd). I'd be interested if someone could give some feedback on it...
>
>

That's great.
Would you send me the script and some sample files, please?

>The second thing which I write for is to ask confirmation on the assumptions concerning the AG structure. The reason is that I'm trying to process AIF files using the XQuery language, and I'm starting to find some things rather difficult to achieve without assumptions stronger than the structure imposed by the AG DTD.
>
>I am sorry if the questions seem obvious...
>
>1. <AGSet> can comprise multiple <AG>; what is the purpose of this possibility (i.e. how did you intend it to be used)?
>

Well, it's not clear, and different applications use the possibility in various ways. In some applications, one AG represents one annotation file. In others, an annotation file consists of many sentences and each AG represents one sentence. There is another application where each AG represents one channel (or one speaker) of a wave file.

>2. Inside an <AG>, is it assumed that the <Anchor> are all somehow connected by <Annotation> elements (by this I mean that, starting from the first anchor, it must be possible to reach the last one by following a sequence of annotations), or can the <AG> consist of multiple disconnected annotations? As an example (some mandatory elements and attributes, but not necessary for the sake of the argument, have been omitted), consider:
>
><AGSet>
>  <AG>
>    <Anchor id="a"/>
>    <Anchor id="b"/>
>    <Anchor id="c"/>
>    <Anchor id="d"/>
>    <Anchor id="e"/>
>    <Anchor id="f"/>
>    <Annotation start="a" end="b">
>      <Feature name="txt">how</Feature>
>    </Annotation>
>    <Annotation start="c" end="d">
>      <Feature name="txt">are</Feature>
>    </Annotation>
>    <Annotation start="e" end="f">
>      <Feature name="txt">you</Feature>
>    </Annotation>
>  </AG>
></AGSet>
>
>Is the previous graph acceptable? Although valid by the dtd, you can't deduce that it represents the sentence "how are you" because the annotations are disconnected. Without "offset" attributes there is no ordering.
>A more adequate representation (i.e. with explicit ordering) would be with the graph:
>
><AGSet>
>  <AG>
>    <Anchor id="a"/>
>    <Anchor id="b"/>
>    <Anchor id="c"/>
>    <Anchor id="d"/>
>    <Annotation start="a" end="b">
>      <Feature name="txt">how</Feature>
>    </Annotation>
>    <Annotation start="b" end="c">
>      <Feature name="txt">are</Feature>
>    </Annotation>
>    <Annotation start="c" end="d">
>      <Feature name="txt">you</Feature>
>    </Annotation>
>  </AG>
></AGSet>
>

The second one looks better and is easier to deal with. However, the first one should also be acceptable as a valid AG. For me, the annotation graph is just a data modeling language. How to apply the language depends entirely on the implementor's decision. For instance, there could be an alternative for the above example:

<AGSet>
  <AG>
    <Anchor id="a"/>
    <Anchor id="b"/>
    <Anchor id="c"/>
    <Anchor id="d"/>
    <Anchor id="e"/>
    <Anchor id="f"/>
    <Annotation id="ann1" start="a" end="b">
      <Feature name="txt">how</Feature>
      <Feature name="right_sibling">ann2</Feature>
    </Annotation>
    <Annotation id="ann2" start="c" end="d">
      <Feature name="txt">are</Feature>
      <Feature name="right_sibling">ann3</Feature>
    </Annotation>
    <Annotation id="ann3" start="e" end="f">
      <Feature name="txt">you</Feature>
      <Feature name="right_sibling"></Feature>
    </Annotation>
  </AG>
</AGSet>

This doesn't look good, but should we prevent people from doing this? I don't know... probably not.

>Finally, a practical question: About two weeks ago it was possible to subscribe to the "agtk-devel" mailing list, but it seems to have disappeared again from the project page... I think it would be the place to discuss such things.
>
>

My understanding is that posting is done by email. The messages are archived at the following URL (it's tricky to find this archive URL though):

http://sourceforge.net/mailarchive/forum.php?forum_id=1720

Steven has set the forum not to be public on the project web site. This makes the forum disappear. There must be a reason for him to do this.

Thanks,
Haejoong
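
P.S. The connectivity question in point 2 can be checked mechanically. The following is only a minimal Python sketch, not part of aglib: it assumes the simplified AIF fragments used in this thread (no namespaces, mandatory attributes omitted), and it takes "first" and "last" anchor to mean document order, which is an assumption rather than something the AG DTD defines. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified fragments from this thread (assumption: no namespaces,
# mandatory AIF attributes omitted for brevity).
DISCONNECTED = """<AGSet><AG>
  <Anchor id="a"/><Anchor id="b"/><Anchor id="c"/>
  <Anchor id="d"/><Anchor id="e"/><Anchor id="f"/>
  <Annotation start="a" end="b"><Feature name="txt">how</Feature></Annotation>
  <Annotation start="c" end="d"><Feature name="txt">are</Feature></Annotation>
  <Annotation start="e" end="f"><Feature name="txt">you</Feature></Annotation>
</AG></AGSet>"""

CHAINED = """<AGSet><AG>
  <Anchor id="a"/><Anchor id="b"/><Anchor id="c"/><Anchor id="d"/>
  <Annotation start="a" end="b"><Feature name="txt">how</Feature></Annotation>
  <Annotation start="b" end="c"><Feature name="txt">are</Feature></Annotation>
  <Annotation start="c" end="d"><Feature name="txt">you</Feature></Annotation>
</AG></AGSet>"""


def reachable_anchors(ag, start):
    """Return the anchor ids reachable from `start` by following
    annotations from their start anchor to their end anchor."""
    edges = {}
    for ann in ag.iter("Annotation"):
        edges.setdefault(ann.get("start"), []).append(ann.get("end"))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def first_reaches_last(ag):
    """True if the last anchor (in document order) can be reached
    from the first one through a sequence of annotations."""
    anchors = [a.get("id") for a in ag.iter("Anchor")]
    if not anchors:
        return True  # an empty AG is trivially connected
    return anchors[-1] in reachable_anchors(ag, anchors[0])


def check(xml_text):
    """Return one boolean per <AG> in the given <AGSet> document."""
    root = ET.fromstring(xml_text)
    return [first_reaches_last(ag) for ag in root.iter("AG")]


print(check(DISCONNECTED))  # [False]
print(check(CHAINED))       # [True]
```

Both graphs are accepted as valid AGs under the reading given above; the check only tells an application which of its own stronger assumptions a given file satisfies.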