[agtk-devel] [gilles@harfang.homelinux.org: AG Structure]

----- Forwarded message from gi...@ha... -----

Date: Wed, 30 Apr 2003 01:25:31 +0200
To: hae...@un...
Cc: sde...@ul...
Subject: AG Structure
From: gi...@ha...

Hello Haejoong,

It's been a long time...  I don't know if you ever received my last e-mail (it dates back five months!); in any case, I never got an answer.
But never mind.  As a reminder: I have been working with linguists who are trying to establish a convention for annotating audio-visual interaction. The design was researched by L. Balthasar in his PhD (based on current practice in Pragmatics); it is called STAVIS.  Since many categories of data must be transcribed, the question of how to represent these data arose; Balthasar's thesis presents a typographical convention, suitable for a human reader but rather inadequate for computer processing.  After reading S. Bird's articles, I started trying to convince my colleagues to use the AG as the basis for the data structure of the STAVIS convention.

One thing I had proposed in my previous mail was to send you a Perl script I wrote to convert the Transcriber format (conforming to 'trans-13.dtd') into AIF (ag.dtd).  I'd be interested if someone could give some feedback on it...

The second reason I am writing is to ask for confirmation of my assumptions concerning the AG structure.  I'm trying to process AIF files using the XQuery language, and I'm starting to find some things rather difficult to achieve without assumptions stronger than the structure imposed by the AG DTD.

I am sorry if the questions seem obvious...

1. <AGSet> can comprise multiple <AG>; what is the purpose of this possibility (i.e. how did you intend it to be used)?
2. Inside an <AG>, is it assumed that the <Anchor> elements are all somehow connected by <Annotation> elements (by this I mean that, starting from the first anchor, it must be possible to reach the last one by following a sequence of annotations), or can the <AG> consist of multiple disconnected annotations?  As an example (some mandatory elements and attributes that are not necessary for the sake of the argument have been omitted), consider:

<AGSet>
  <AG>
    <Anchor id="a"/>
    <Anchor id="b"/>
    <Anchor id="c"/>
    <Anchor id="d"/>
    <Anchor id="e"/>
    <Anchor id="f"/>
    <Annotation start="a" end="b">
      <Feature name="txt">how</Feature>
    </Annotation>
    <Annotation start="c" end="d">
      <Feature name="txt">are</Feature>
    </Annotation>
    <Annotation start="e" end="f">
      <Feature name="txt">you</Feature>
    </Annotation>
  </AG>
</AGSet>

Is the previous graph acceptable?  Although it is valid according to the DTD, you can't deduce that it represents the sentence "how are you" because the annotations are disconnected.  Without "offset" attributes there is no ordering.  A more adequate representation (i.e. with explicit ordering) would be the graph:

<AGSet>
  <AG>
    <Anchor id="a"/>
    <Anchor id="b"/>
    <Anchor id="c"/>
    <Anchor id="d"/>
    <Annotation start="a" end="b">
      <Feature name="txt">how</Feature>
    </Annotation>
    <Annotation start="b" end="c">
      <Feature name="txt">are</Feature>
    </Annotation>
    <Annotation start="c" end="d">
      <Feature name="txt">you</Feature>
    </Annotation>
  </AG>
</AGSet>
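
To make the connectivity property in question 2 concrete, here is a minimal Python sketch of the check. It is only an illustration, not part of AGTK: the function name ordered_features is hypothetical, the "first" and "last" anchors are assumed to be the first and last <Anchor> in document order, at most one annotation is assumed to leave each anchor, and the XML is the simplified, namespace-free form used in the two examples above (with the root element properly closed).

```python
import xml.etree.ElementTree as ET

# The two simplified graphs from the message (illustrative, no namespaces).
DISCONNECTED = """<AGSet><AG>
  <Anchor id="a"/><Anchor id="b"/><Anchor id="c"/>
  <Anchor id="d"/><Anchor id="e"/><Anchor id="f"/>
  <Annotation start="a" end="b"><Feature name="txt">how</Feature></Annotation>
  <Annotation start="c" end="d"><Feature name="txt">are</Feature></Annotation>
  <Annotation start="e" end="f"><Feature name="txt">you</Feature></Annotation>
</AG></AGSet>"""

CHAINED = """<AGSet><AG>
  <Anchor id="a"/><Anchor id="b"/><Anchor id="c"/><Anchor id="d"/>
  <Annotation start="a" end="b"><Feature name="txt">how</Feature></Annotation>
  <Annotation start="b" end="c"><Feature name="txt">are</Feature></Annotation>
  <Annotation start="c" end="d"><Feature name="txt">you</Feature></Annotation>
</AG></AGSet>"""


def ordered_features(agset_xml, feature="txt"):
    """Return the feature values in chain order, or None if the annotations
    do not form one chain from the first anchor to the last one."""
    ag = ET.fromstring(agset_xml).find("AG")
    anchors = [a.get("id") for a in ag.findall("Anchor")]
    if not anchors:
        return None

    # Map each start anchor to (end anchor, feature text).
    # Assumption: at most one annotation leaves each anchor.
    next_hop = {}
    for ann in ag.findall("Annotation"):
        feat = ann.find("Feature[@name='%s']" % feature)
        text = feat.text if feat is not None else None
        next_hop[ann.get("start")] = (ann.get("end"), text)

    # Follow annotations from the first anchor for as long as possible.
    current = anchors[0]
    seen = {current}
    values = []
    while current in next_hop:
        current, text = next_hop[current]
        if current in seen:  # a cycle gives no usable ordering
            return None
        seen.add(current)
        values.append(text)

    # The chain must end on the last anchor and visit every anchor.
    if current == anchors[-1] and seen == set(anchors):
        return values
    return None
```

Under these assumptions the first graph yields no ordering, since the walk stops at anchor "b", while the second graph yields the three words in sentence order. A real AIF file would additionally need namespace handling and the mandatory elements omitted here.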

Finally, a practical question:  about two weeks ago it was possible to subscribe to the "agtk-devel" mailing list, but it seems to have disappeared again from the project page...  I think it would be the place to discuss such things.

Best regards,

Gilles Sadowski

----- End forwarded message -----