From: Mark L. <my...@un...> - 2002-08-28 13:36:15
|
Dear all, It might be appropriate to use the 'sort' command and 'qsort' function as models. One often winds up sorting things according to a primary criterion that does not impose a total order. In that case, one can define secondary (and tertiary, and ...) sort criteria, or one can just accept whatever comes back, recognizing that the resulting behavior is partly undefined (though some particular order always emerges). The 'sort' command offers the possibility to define different primary, secondary, etc. sort criteria, within a class of possibilities (i.e. numerical, lexicographical, reverse, choice of field within line, etc.). The qsort() function requires the user to supply a comparison function. In both cases, the result is always a particular ordering, but it is explicitly allowed that the order might be undefined in some (or even all!) cases. As long as the relation in question (e.g. the topological order of anchors, or the temporal order of anchors) is a partial order, this seems to be OK. Such sorts will not always be useful things to do, of course, but that is up to users/programmers to figure out. There is a semantic issue lurking here, I guess, which is that nodes without time marks may implicit inherit time region constraints from other nodes that they are connected to; and therefore may have an implicit ordering with respect to other nodes (with or without time marks) that they are not connected to. Thus in A.1------>A.2----->A.3 B.1----->B.2----->B.3 0.1 0.3 1.6 1.9 we can prove that B.2 must be later than A.2, but neither a topological sort nor a temporal sort (of the explicit kind at least) tells us this directly. The easy thing to do with this problem is to ignore it; this just means that the semantics of such sorting has to be clearly understood, and recognized *not* to be equivalent to logically-deducible time order relations in all cases. I think this would be the Unix Way, since it results in a maximally simple treatment that is often but not always the Right Thing. It would also not be terribly hard to associate time ranges with nodes that lack time marks but are connected to nodes that do. Perhaps something like this is already done to ensure that new time marks are consistent with old ones? Regards, Mark Liberman >Steven, > >Steven Bird wrote: > > Matthias Thomae <th...@ei...> wrote: > > > >>So apart from the problem above, wouldn't the topological sorting of the > >>anchors be a nice feature for the AG library? > > > > If some application needs this we should add it to the library. > >I am currently developing a tool that would benefit from this :) >The tool is intended for speech recognition and understanding purposes. >I assume that all annotations are based on the word sequence, so that >there are no other anchors than the ones that mark word boundaries. I >have one AG per utterance. All AGs are linear and connected with respect >to the word sequence. > >Since I do not necessarily need the word but only the utterance >segmentation, only the first and last anchors of each AG are guaranteed >to have offsets. > >I am now circumventing the problem by assigning offsets to the >within-utterance anchors as well, so that the temporal sorting is >possible, by I am not very happy with this. Maybe at some point in the >future I will want a word segmentation for some part of a corpus, then I >will have the problem of discriminating 'real' and 'dummy' offsets. > >So the AGs I am using are connected and linear with respect to a certain >annotation type, as stated above, so the topsort of this type of AG >would be sufficent for my needs. But I understand that you would like to >implement the sorting on a more general basis. > > > Note that there are some issues that would need to be addressed: > > > > Disconnected AGs: In general, AGs aren't required to be connected, and > > so there is no guarantee you can traverse an AG from some unique > > earliest node (and the earliest node may not be unique). What allows > > us to sort the disjoint sections of an AG relative to each other are > > the offsets (when they exist), but these offsets would be ignored in a > > purely topological sort. > >Why not combine topological and temporal sorting? Is it possible to have >an AG whose topological and temporal sortings contradict each other? >(e.g. consider (anchor1, offset 2 sec. -> anchor2, offset 1 sec.)) >If this is not allowed anyway, I could imagine somthing like this: > >As a first step, a topsort (maybe restricted to some type) could >indentify the ambiguities: >- disjoint sections of an AG >- sets of anchors of the same topological 'grade', i.e. where the >relative ordering cannot be determined from the topology alone > >In a second step, a temporal sort could resolve (some of) the >ambiguities left from the topsort and at the same time check if any >violations (see example above) exist. If there are still ambiguities >left these may be made aware to the user of the AG library. > >By the way, wouldn't it be possible to sort an AGSet on a temporal basis >just like the disjoint sections of an AG? > > > Ambiguity: In many AGs, such as the ones derived from trees using the > > chart construction, there is a total ordering on the anchors thanks to > > the total ordering of the word string on which the tree is based. In > > general though, there's no guaranteed unique ordering among anchors > > (e.g. consider the graph {(1,2), (1,3), (2,4), (3,4)} where the > > relative ordering of 2 and 3 can't be read off the graph). Would the > > mapping from partial to total order be arbitrary, or would a canonical > > order be defined somehow? > >If I again assume that topological, temporal and canonical order do not >contradict each other, two possibilities are imaginable: >- the canonical order is defined before the top/temporal sort >- the AGLIB user takes care of the canonical sort himself > > > Types: In general, we'd like our tools to be able to add new layers of > > annotation to existing annotations; tools shouldn't assume that they > > know about all of the annotations in the AG. The simplest way to > > handle this is to use an explicit type argument on calls to all of the > > getX functions. To support this style of application programming, > > we'd need a version of the topological sort on anchors that only > > considered anchors incident to an arc of type t. > >I agree. So the getXSorted function might return only a subset of all >anchors in an AG? > > > At some point in the future we may decide that particular kinds of AG > > need special treatment (e.g. connected AGs, totally anchored AGs, > > linear AGs). There's two approaches available: enriching the class > > structure or creating new high-level APIs. We could subclass our AG > > class and add extra methods (e.g. a getAnchorSeq method which did a > > topological sort on ConnectedAGs and a temporal sort on TotalAGs.) > > Some existing functions can be written more efficiently when we can > > make safe assumptions about the topology of the AG. The other > > approach is to build a new API on top of the AG API which exposes a > > subset of the AG functions and adds some new ones which can be > > implemented using the existing API (cf our treebank API). > >I'd tend to the first approach, only because I think the topological and >temporal sort is a basic feature that should be integrated closely with >the library. > > > Anyway, to sum up, before adding new sort functions we should first > > analyze the stated needs carefully... > >I hope I could contribute to clarifying some needs :) > >Regards. >Matthias > > > >------------------------------------------------------- >This sf.net email is sponsored by: Jabber - The world's fastest growing >real-time communications platform! Don't just IM. Build it in! >http://www.jabber.com/osdn/xim >_______________________________________________ >agtk-devel mailing list >agt...@li... >https://lists.sourceforge.net/lists/listinfo/agtk-devel -- -Mark Liberman |