Re: [agtk-devel] topological sorting of anchors

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Steven,

Steven Bird wrote:
 > Matthias Thomae <th...@ei...> wrote:
 >
 >>So apart from the problem above, wouldn't the topological sorting of the
 >>anchors be a nice feature for the AG library?
 >
 > If some application needs this we should add it to the library.

I am currently developing a tool that would benefit from this :)
The tool is intended for speech recognition and understanding purposes.
I assume that all annotations are based on the word sequence, so that
there are no other anchors than the ones that mark word boundaries. I
have one AG per utterance. All AGs are linear and connected with respect
to the word sequence.

Since I do not necessarily need the word but only the utterance
segmentation, only the first and last anchors of each AG are guaranteed
to have offsets.

I am now circumventing the problem by assigning offsets to the
within-utterance anchors as well, so that the temporal sorting is
possible, by I am not very happy with this. Maybe at some point in the
future I will want a word segmentation for some part of a corpus, then I
will have the problem of discriminating 'real' and 'dummy' offsets.

So the AGs I am using are connected and linear with respect to a certain
annotation type, as stated above, so the topsort of this type of AG
would be sufficent for my needs. But I understand that you would like to
implement the sorting on a more general basis.

 > Note that there are some issues that would need to be addressed:
 >
 > Disconnected AGs: In general, AGs aren't required to be connected, and
 > so there is no guarantee you can traverse an AG from some unique
 > earliest node (and the earliest node may not be unique).  What allows
 > us to sort the disjoint sections of an AG relative to each other are
 > the offsets (when they exist), but these offsets would be ignored in a
 > purely topological sort.

Why not combine topological and temporal sorting? Is it possible to have
an AG whose topological and temporal sortings contradict each other?
(e.g. consider (anchor1, offset 2 sec. -> anchor2, offset 1 sec.))
If this is not allowed anyway, I could imagine somthing like this:

As a first step, a topsort (maybe restricted to some type) could
indentify the ambiguities:
- disjoint sections of an AG
- sets of anchors of the same topological 'grade', i.e. where the
relative ordering cannot be determined from the topology alone

In a second step, a temporal sort could resolve (some of) the
ambiguities left from the topsort and at the same time check if any
violations (see example above) exist. If there are still ambiguities
left these may be made aware to the user of the AG library.

By the way, wouldn't it be possible to sort an AGSet on a temporal basis
just like the disjoint sections of an AG?

 > Ambiguity: In many AGs, such as the ones derived from trees using the
 > chart construction, there is a total ordering on the anchors thanks to
 > the total ordering of the word string on which the tree is based.  In
 > general though, there's no guaranteed unique ordering among anchors
 > (e.g. consider the graph {(1,2), (1,3), (2,4), (3,4)} where the
 > relative ordering of 2 and 3 can't be read off the graph).  Would the
 > mapping from partial to total order be arbitrary, or would a canonical
 > order be defined somehow?

If I again assume that topological, temporal and canonical order do not
contradict each other, two possibilities are imaginable:
- the canonical order is defined before the top/temporal sort
- the AGLIB user takes care of the canonical sort himself

 > Types: In general, we'd like our tools to be able to add new layers of
 > annotation to existing annotations; tools shouldn't assume that they
 > know about all of the annotations in the AG.  The simplest way to
 > handle this is to use an explicit type argument on calls to all of the
 > getX functions.  To support this style of application programming,
 > we'd need a version of the topological sort on anchors that only
 > considered anchors incident to an arc of type t.

I agree. So the getXSorted function might return only a subset of all
anchors in an AG?

 > At some point in the future we may decide that particular kinds of AG
 > need special treatment (e.g. connected AGs, totally anchored AGs,
 > linear AGs).  There's two approaches available: enriching the class
 > structure or creating new high-level APIs.  We could subclass our AG
 > class and add extra methods (e.g. a getAnchorSeq method which did a
 > topological sort on ConnectedAGs and a temporal sort on TotalAGs.)
 > Some existing functions can be written more efficiently when we can
 > make safe assumptions about the topology of the AG.  The other
 > approach is to build a new API on top of the AG API which exposes a
 > subset of the AG functions and adds some new ones which can be
 > implemented using the existing API (cf our treebank API).

I'd tend to the first approach, only because I think the topological and
temporal sort is a basic feature that should be integrated closely with
the library.

 > Anyway, to sum up, before adding new sort functions we should first
 > analyze the stated needs carefully...

I hope I could contribute to clarifying some needs :)

Regards.
Matthias