From: Matthias T. <th...@ei...> - 2002-10-04 08:00:03
|
Hello AGTK developers, I would like to know whether the current tree implementation is tailored specifically to treebanks, or whether it is generally usable, e.g. for semantic trees as well. I would like to use it for an annotation task involving non-overlapping hierarchies of semantic annotations, taking advantage of the data integrity that is ensured by the elementary tree operations described in the Cotton and Bird paper. My doubts about the generality of the current implementation arose because annotation types such as "syn", "pos" and "wrd" are hard-coded in the kernel functions (e.g. tree_left() in tree_kernel.cc). Regards. Matthias |
From: Kazuaki M. <ma...@un...> - 2002-10-03 19:33:31
|
Hi Jerome,

I thought I used the "keep them on disk" option as the default behavior in TableTrans. Is it faster to load the same file in wavesurfer (with the "keep them on disk" option on) than in TableTrans? If you could send me <ma...@ld...> what you have tried, I will look into this.

Thank you,
-Kazuaki Maeda
Linguistic Data Consortium

From: Jerome Besnard <jer...@mi...>
Subject: [agtk-devel] keep on disk ?
Date: 03 Oct 2002 17:29:11 +0200

> Hello,
>
> I hope this is the place to ask such questions; if not, I apologize.
> I'd like to open huge sound files (up to 500 MB), and in wavesurfer I can
> choose to "keep them on disk" and not in memory. I did not find such an
> option in TableTrans. I am using the Windows version, and I would like to
> know if it is possible.
> I grepped the code to find different places where that should be possible,
> but it does not seem to work (different options in Python or Tcl files,
> mainly in the wsurf loadfile method or equivalent). Is there a way to enable
> such a feature in TableTrans?
>
> Thank you very much in advance
> --
> Jerome BESNARD - Tél : (33) 1 56 43 18 36
> ------------------------------
> Miriad Technologies - 8, avenue Hoche 75008 Paris
> Tél : (33) 1 56 43 18 00 - Fax : (33) 1 56 43 18 28
> www.miriadtech.com |
From: Jerome B. <jer...@mi...> - 2002-10-03 15:28:46
|
Hello,

I hope this is the place to ask such questions; if not, I apologize.
I'd like to open huge sound files (up to 500 MB), and in wavesurfer I can choose to "keep them on disk" and not in memory. I did not find such an option in TableTrans. I am using the Windows version, and I would like to know if it is possible.
I grepped the code to find different places where that should be possible, but it does not seem to work (different options in Python or Tcl files, mainly in the wsurf loadfile method or equivalent). Is there a way to enable such a feature in TableTrans?

Thank you very much in advance
--
Jerome BESNARD - Tél : (33) 1 56 43 18 36
------------------------------
Miriad Technologies - 8, avenue Hoche 75008 Paris
Tél : (33) 1 56 43 18 00 - Fax : (33) 1 56 43 18 28
www.miriadtech.com |
From: garbe n. <Nic...@et...> - 2002-09-06 13:23:49
|
---------- Forwarded message ----------
Subject: transcriber and ag
Date: Fri, 6 Sep 2002 15:17:55 -0400
From: garbe nicolas <ga...@et...>
To: Xiaoyi Ma <xm...@un...>
Cc: Haejoong Lee <hae...@un...>, agt...@li...

Hello,

I have a problem with Transcriber.cc. If I take a Transcriber file which begins with

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE Trans SYSTEM "trans-13.dtd">

and try a demo like this:

    #!/usr/bin/wish
    load ../src/ag_wrapper/tcl/.libs/ag.so
    set transId [AG::CreateAGSet "Transcriber"]
    AG::Load Transcriber "know.trs" $transId
    puts [AG::toXML $transId]
    AG::Store Transcriber "titi.Transcriber" $transId

I get a segmentation fault when loading the Transcriber file, so I am obliged to delete the line

    <!DOCTYPE Trans SYSTEM "trans-13.dtd">

Once I delete the DTD line, Load works, but Store fails with this error message:

    Error in startup script: AGExceptionAG Exception caught when calling GetFeatureNames: is not a valid AnnotationId, AGSetId, AGId, TimelineId or SignalId!
        while executing
    "AG::Store Transcriber "titi.Transcriber" $transId"

If you have an idea....

nicolas garbe |
From: garbe n. <Nic...@et...> - 2002-09-05 14:38:05
|
---------- Forwarded message ----------
Subject: transcriber and ag
Date: Wed, 4 Sep 2002 17:09:25 -0400
From: garbe nicolas <ga...@et...>
To: Haejoong Lee <hae...@un...>, Xiaoyi Ma <xm...@un...>

Hello,

I work on Transcriber and I tried to use libag, but I have a problem with Transcriber.cc. I wrote this file:

    #!/usr/bin/wish
    load ../src/ag_wrapper/tcl/.libs/ag.so
    set transId [AG::CreateAGSet "Transcriber"]
    set toto [AG::Load Transcriber "frint980428.xml" $transId]

But I have a problem with Load, and more particularly with this function:

    static void mk_mdata(DOMNode* e, map<string,string>& m)
    {
      cout << "rentre dans mdata\n";          // "entering mdata"
      if (e == NULL) {
        cout << "arbre non initialise\n";     // "tree not initialised"
        return;
      }
      DOMNamedNodeMap* A = e->getAttributes();
      cout << "attributes\n";
      if (A == NULL)
        return;
      for (int i=0; i < A->getLength(); ++i) {
        DOMNode* a = A->item(i);
        m[get_fname(e->getNodeName(),a->getNodeName())] = StrX()(a->getNodeValue());
      }
      cout << "sortie mdata\n";               // "leaving mdata"
    }

DOMNode isn't initialised....

I attached the file I used. If you can give me an example in Tcl which uses Load and Store, it would be very useful for me.

Must I give an AGSetId for Load with Transcriber?

nicolas |
From: Xiaoyi Ma <xm...@un...> - 2002-09-03 22:14:48
|
Matthias,

I've fixed this. You can get an update from the CVS server.

Xiaoyi

On Thu, Aug 22, 2002 at 10:39:59AM +0200, Matthias Thomae wrote:
> Hello Xiaoyi,
>
> Xiaoyi Ma wrote:
> > I couldn't replicate the problem you encountered with
> > AG::UnsetAnchorOffset. Do you have some sample code I can use to
> > figure out what went wrong?
>
> Yes, I have attached some sample code, and a sample AIF file. When
> calling "test.tcl Text.xml", you should get something like this:
>
> trying to unset anchor Test:AG-901-1.3.1:Anchor10, offset 147.313241
> Test:AG-901-1.3.1:Anchor1 Test:AG-901-1.3.1:Anchor2
> Test:AG-901-1.3.1:Anchor3 Test:AG-901-1.3.1:Anchor4
> Test:AG-901-1.3.1:Anchor5 Test:AG-901-1.3.1:Anchor6
> Test:AG-901-1.3.1:Anchor7 Test:AG-901-1.3.1:Anchor8
> Test:AG-901-1.3.1:Anchor10 Test:AG-901-1.3.1:Anchor11
> Test:AG-901-1.3.1:Anchor12 Test:AG-901-1.3.1:Anchor9
> AGExceptionAG Exception caught when calling UnsetAnchorOffset:
> Anchor not found!
> while executing
> "AG::UnsetAnchorOffset $anchor"
>
> Have you also noticed my mail from the 8th of August, where I wrote:
>
> > I also noticed that before the exception occurs, [AG::GetAnchorSet
> > $agId] returns the anchor set in a different order than before, which is
> > the correct topological and chronological order.
> >
> > Maybe this problem is related to the sorting of the anchors?
> > As I understood from the papers, an AG is a directed acyclic graph, so
> > that a topological sorting would be possible regardless of the anchor
> > offsets. But as I understand from the sources, the sorting of the
> > anchors is only done chronologically, i.e. if all anchors have an offset.
> >
> > So apart from the problem above, wouldn't the topological sorting of the
> > anchors be a nice feature for the AG library?
>
> Regards.
> Matthias

--
Regards,
Xiaoyi
--------------------------------
Linguistic Data Consortium
University of Pennsylvania
email: xm...@ld...
phone: (215)573-5491
fax: (215)573-2175 |
From: Matthias T. <th...@ei...> - 2002-08-30 08:16:58
|
Steven Bird wrote:
> Mark Liberman wrote:
> Note that prec (defined as in my last posting) will do this. However, we
> don't maintain this information in AGLIB, since it gets about as complex as
> a truth maintenance system (updates can have non-local consequences; a
> single unsetAnchorOffset may lead to having to retract unbounded amounts of
> precedence data).

How about not maintaining the prec information on every update, but generating it only when required, i.e. when the user calls a getXSorted function?

Matthias |
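A minimal C++ sketch of the on-demand approach suggested above: updates only mark a cached ordering as stale, and the ordering is rebuilt the next time it is requested. The class and member names (`LazyAnchorOrder`, `set_offset`, `unset_offset`, `sorted`) are hypothetical and not part of AGLIB, and a plain sort by offset stands in for whatever prec computation would really be used.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical helper, not AGLIB API: keeps anchors ordered by offset,
// but only recomputes the ordering when somebody asks for it.
class LazyAnchorOrder {
public:
    void set_offset(int anchor, double seconds) {
        offsets_[anchor] = seconds;
        dirty_ = true;   // cheap: just invalidate the cached order
    }
    void unset_offset(int anchor) {
        offsets_.erase(anchor);
        dirty_ = true;
    }
    // Rebuilds the cached order only if an update happened since the last call.
    const std::vector<int>& sorted() {
        if (dirty_) {
            rebuild();
            dirty_ = false;
        }
        return cache_;
    }
    int rebuilds() const { return rebuilds_; }

private:
    void rebuild() {
        cache_.clear();
        for (const auto& entry : offsets_) cache_.push_back(entry.first);
        std::stable_sort(cache_.begin(), cache_.end(), [this](int a, int b) {
            return offsets_.at(a) < offsets_.at(b);
        });
        assert(cache_.size() == offsets_.size());
        ++rebuilds_;
    }

    std::map<int, double> offsets_;
    std::vector<int> cache_;
    bool dirty_ = false;
    int rebuilds_ = 0;
};
```

With this design a run of updates costs nothing until the next call to `sorted()`, at the price of one full rebuild per query that follows a change.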
From: Matthias T. <th...@ei...> - 2002-08-30 08:09:53
|
Steven,

Steven Bird wrote:
>>have one AG per utterance. All AGs are linear and connected with respect
>>to the word sequence.
>
> I guess you would permit coterminous annotations, and annotations that span
> multiple words. Thus the anchors form a linear order, but the annotations
> themselves do not.

Sorry, I was not precise. I meant that all my annotations of TYPE "Word" are linear and connected, so that

    AG::CheckLinear $agId "Word"
    AG::CheckConnected $agId "Word"

succeed in the current implementation.

>>I'd tend to the first approach, only because I think the topological and
>>temporal sort is a basic feature that should be integrated closely with
>>the library.
>
> Yes, but that unfortunately makes the closed-world assumption. Your tool
> would only be able to add its kind of annotation to existing AGs that
> obey the same constraints. It seems strange to require that you cannot
> add connected annotations to an AG which happens to be disconnected.

I thought we were talking about the possibility of restricting the sorting to certain annotation types, meaning that an AG may be connected and disconnected at the same time, depending on which annotation type(s) are viewed as 'relevant'. My tool would then view only a subset of all annotation types as its own and ignore the others, so that only the relevant types have to meet the constraints imposed by my tool.

>> > Anyway, to sum up, before adding new sort functions we should first
>> > analyze the stated needs carefully...
>>
>>I hope I could contribute to clarifying some needs :)
>
> You'd be very welcome to. In the fall I'd like to redesign the API a
> little (stronger typing) and add better support for sorting, and we'll
> need to work through the details on this list...

I'd be happy to contribute as far as I am able to.

Regards.
Matthias |
From: Steven B. <sb...@un...> - 2002-08-29 19:47:54
|
Mark Liberman wrote:
> It might be appropriate to use the 'sort' command and 'qsort' function
> as models.

I think this is an excellent suggestion for where we should start...

> A.1------>A.2----->A.3     B.1----->B.2----->B.3
> 0.1                0.3     1.6               1.9
>
> we can prove that B.2 must be later than A.2, but neither a topological
> sort nor a temporal sort (of the explicit kind at least) tells us this
> directly.

Note that prec (defined as in my last posting) will do this. However, we don't maintain this information in AGLIB, since it gets about as complex as a truth maintenance system (updates can have non-local consequences; a single unsetAnchorOffset may lead to having to retract unbounded amounts of precedence data).

> It would also not be terribly hard to associate time ranges with nodes
> that lack time marks but are connected to nodes that do. Perhaps
> something like this is already done to ensure that new time marks are
> consistent with old ones?

No, but there's a paper which discusses this proposal in detail (section 6, Bird, Buneman & Tan 2000, http://arXiv.org/abs/cs/0007023)

Pace my comments above, perhaps we need to think about how to integrate this into AGLIB. For some applications it will improve efficiency (but it could be a significant overhead for others).

-S |
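The prec relation mentioned above can be sketched in C++ roughly as follows. This assumes prec is the transitive closure of the union of structural precedence (arcs) and temporal precedence (offset comparison), and it reads Mark's example as having offsets on A.1, A.3, B.1 and B.3 only. The `Anchor` struct and `precedence` function are illustrative names, not AGLIB API, and strict `<` is used on offsets so that the relation stays irreflexive.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Anchor {
    bool has_offset;
    double offset;   // only meaningful when has_offset is true
};

using Relation = std::vector<std::vector<bool>>;

// prec[i][j] is true when anchor i must precede anchor j.
Relation precedence(const std::vector<Anchor>& anchors,
                    const std::vector<std::pair<int, int>>& arcs) {
    const int n = static_cast<int>(anchors.size());
    Relation prec(n, std::vector<bool>(n, false));

    // Structural precedence: every arc goes forward.
    for (const auto& arc : arcs) {
        assert(arc.first >= 0 && arc.first < n);
        assert(arc.second >= 0 && arc.second < n);
        prec[arc.first][arc.second] = true;
    }
    // Temporal precedence: compare offsets where both anchors have one.
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (anchors[i].has_offset && anchors[j].has_offset &&
                anchors[i].offset < anchors[j].offset)
                prec[i][j] = true;
    // Transitive closure of the union (Warshall's algorithm).
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            if (prec[i][k])
                for (int j = 0; j < n; ++j)
                    if (prec[k][j]) prec[i][j] = true;
    return prec;
}
```

Numbering the anchors A.1, A.2, A.3, B.1, B.2, B.3 as 0 to 5, the closure relates A.2 to B.2 through the chain A.2, A.3, B.1, B.2, which is the deduction neither sort makes on its own. The cubic closure step also illustrates why keeping this relation up to date on every edit would be expensive.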
From: Steven B. <sb...@un...> - 2002-08-29 19:36:33
|
Matthias,

> I assume that all annotations are based on the word sequence, so that
> there are no other anchors than the ones that mark word boundaries. I
> have one AG per utterance. All AGs are linear and connected with respect
> to the word sequence.

I guess you would permit coterminous annotations, and annotations that span multiple words. Thus the anchors form a linear order, but the annotations themselves do not.

> I am now circumventing the problem by assigning offsets to the
> within-utterance anchors as well, so that the temporal sorting is
> possible, but I am not very happy with this.

Agreed, you shouldn't be forced to do this in order to get our API to do what you want.

> Why not combine topological and temporal sorting? Is it possible to have
> an AG whose topological and temporal sortings contradict each other?

No, by definition, the AG arrows go forward in time. If we have an annotation from anchor a to anchor b, then t(a) <= t(b).

> (e.g. consider (anchor1, offset 2 sec. -> anchor2, offset 1 sec.))
> If this is not allowed anyway, I could imagine something like this:
>
> As a first step, a topsort (maybe restricted to some type) could
> identify the ambiguities:
> - disjoint sections of an AG
> - sets of anchors of the same topological 'grade', i.e. where the
> relative ordering cannot be determined from the topology alone
>
> In a second step, a temporal sort could resolve (some of) the
> ambiguities left from the topsort and at the same time check if any
> violations (see example above) exist. If there are still ambiguities
> left these may be made aware to the user of the AG library.

Actually, the B&L paper defines prec_s (structural precedence) and prec_t (temporal precedence), and prec, an ordering which is the transitive closure of the union of prec_s and prec_t.

> By the way, wouldn't it be possible to sort an AGSet on a temporal basis
> just like the disjoint sections of an AG?

Only if there is a common timeline.

> > At some point in the future we may decide that particular kinds of AG
> > need special treatment (e.g. connected AGs, totally anchored AGs,
> > linear AGs). There's two approaches available: enriching the class
> > structure or creating new high-level APIs. We could subclass our AG
> > class and add extra methods (e.g. a getAnchorSeq method which did a
> > topological sort on ConnectedAGs and a temporal sort on TotalAGs.)
> > Some existing functions can be written more efficiently when we can
> > make safe assumptions about the topology of the AG. The other
> > approach is to build a new API on top of the AG API which exposes a
> > subset of the AG functions and adds some new ones which can be
> > implemented using the existing API (cf our treebank API).
>
> I'd tend to the first approach, only because I think the topological and
> temporal sort is a basic feature that should be integrated closely with
> the library.

Yes, but that unfortunately makes the closed-world assumption. Your tool would only be able to add its kind of annotation to existing AGs that obey the same constraints. It seems strange to require that you cannot add connected annotations to an AG which happens to be disconnected.

> > Anyway, to sum up, before adding new sort functions we should first
> > analyze the stated needs carefully...
>
> I hope I could contribute to clarifying some needs :)

You'd be very welcome to. In the fall I'd like to redesign the API a little (stronger typing) and add better support for sorting, and we'll need to work through the details on this list...

-Steven |
From: Mark L. <my...@un...> - 2002-08-28 13:36:15
|
Dear all,

It might be appropriate to use the 'sort' command and 'qsort' function as models.

One often winds up sorting things according to a primary criterion that does not impose a total order. In that case, one can define secondary (and tertiary, and ...) sort criteria, or one can just accept whatever comes back, recognizing that the resulting behavior is partly undefined (though some particular order always emerges).

The 'sort' command offers the possibility to define different primary, secondary, etc. sort criteria, within a class of possibilities (i.e. numerical, lexicographical, reverse, choice of field within line, etc.). The qsort() function requires the user to supply a comparison function. In both cases, the result is always a particular ordering, but it is explicitly allowed that the order might be undefined in some (or even all!) cases.

As long as the relation in question (e.g. the topological order of anchors, or the temporal order of anchors) is a partial order, this seems to be OK. Such sorts will not always be useful things to do, of course, but that is up to users/programmers to figure out.

There is a semantic issue lurking here, I guess, which is that nodes without time marks may implicitly inherit time region constraints from other nodes that they are connected to; and therefore may have an implicit ordering with respect to other nodes (with or without time marks) that they are not connected to. Thus in

    A.1------>A.2----->A.3     B.1----->B.2----->B.3
    0.1                0.3     1.6               1.9

we can prove that B.2 must be later than A.2, but neither a topological sort nor a temporal sort (of the explicit kind at least) tells us this directly.

The easy thing to do with this problem is to ignore it; this just means that the semantics of such sorting has to be clearly understood, and recognized *not* to be equivalent to logically-deducible time order relations in all cases.
I think this would be the Unix Way, since it results in a maximally simple treatment that is often but not always the Right Thing. It would also not be terribly hard to associate time ranges with nodes that lack time marks but are connected to nodes that do. Perhaps something like this is already done to ensure that new time marks are consistent with old ones? Regards, Mark Liberman >Steven, > >Steven Bird wrote: > > Matthias Thomae <th...@ei...> wrote: > > > >>So apart from the problem above, wouldn't the topological sorting of the > >>anchors be a nice feature for the AG library? > > > > If some application needs this we should add it to the library. > >I am currently developing a tool that would benefit from this :) >The tool is intended for speech recognition and understanding purposes. >I assume that all annotations are based on the word sequence, so that >there are no other anchors than the ones that mark word boundaries. I >have one AG per utterance. All AGs are linear and connected with respect >to the word sequence. > >Since I do not necessarily need the word but only the utterance >segmentation, only the first and last anchors of each AG are guaranteed >to have offsets. > >I am now circumventing the problem by assigning offsets to the >within-utterance anchors as well, so that the temporal sorting is >possible, by I am not very happy with this. Maybe at some point in the >future I will want a word segmentation for some part of a corpus, then I >will have the problem of discriminating 'real' and 'dummy' offsets. > >So the AGs I am using are connected and linear with respect to a certain >annotation type, as stated above, so the topsort of this type of AG >would be sufficent for my needs. But I understand that you would like to >implement the sorting on a more general basis. 
> > > Note that there are some issues that would need to be addressed: > > > > Disconnected AGs: In general, AGs aren't required to be connected, and > > so there is no guarantee you can traverse an AG from some unique > > earliest node (and the earliest node may not be unique). What allows > > us to sort the disjoint sections of an AG relative to each other are > > the offsets (when they exist), but these offsets would be ignored in a > > purely topological sort. > >Why not combine topological and temporal sorting? Is it possible to have >an AG whose topological and temporal sortings contradict each other? >(e.g. consider (anchor1, offset 2 sec. -> anchor2, offset 1 sec.)) >If this is not allowed anyway, I could imagine somthing like this: > >As a first step, a topsort (maybe restricted to some type) could >indentify the ambiguities: >- disjoint sections of an AG >- sets of anchors of the same topological 'grade', i.e. where the >relative ordering cannot be determined from the topology alone > >In a second step, a temporal sort could resolve (some of) the >ambiguities left from the topsort and at the same time check if any >violations (see example above) exist. If there are still ambiguities >left these may be made aware to the user of the AG library. > >By the way, wouldn't it be possible to sort an AGSet on a temporal basis >just like the disjoint sections of an AG? > > > Ambiguity: In many AGs, such as the ones derived from trees using the > > chart construction, there is a total ordering on the anchors thanks to > > the total ordering of the word string on which the tree is based. In > > general though, there's no guaranteed unique ordering among anchors > > (e.g. consider the graph {(1,2), (1,3), (2,4), (3,4)} where the > > relative ordering of 2 and 3 can't be read off the graph). Would the > > mapping from partial to total order be arbitrary, or would a canonical > > order be defined somehow? 
> >If I again assume that topological, temporal and canonical order do not >contradict each other, two possibilities are imaginable: >- the canonical order is defined before the top/temporal sort >- the AGLIB user takes care of the canonical sort himself > > > Types: In general, we'd like our tools to be able to add new layers of > > annotation to existing annotations; tools shouldn't assume that they > > know about all of the annotations in the AG. The simplest way to > > handle this is to use an explicit type argument on calls to all of the > > getX functions. To support this style of application programming, > > we'd need a version of the topological sort on anchors that only > > considered anchors incident to an arc of type t. > >I agree. So the getXSorted function might return only a subset of all >anchors in an AG? > > > At some point in the future we may decide that particular kinds of AG > > need special treatment (e.g. connected AGs, totally anchored AGs, > > linear AGs). There's two approaches available: enriching the class > > structure or creating new high-level APIs. We could subclass our AG > > class and add extra methods (e.g. a getAnchorSeq method which did a > > topological sort on ConnectedAGs and a temporal sort on TotalAGs.) > > Some existing functions can be written more efficiently when we can > > make safe assumptions about the topology of the AG. The other > > approach is to build a new API on top of the AG API which exposes a > > subset of the AG functions and adds some new ones which can be > > implemented using the existing API (cf our treebank API). > >I'd tend to the first approach, only because I think the topological and >temporal sort is a basic feature that should be integrated closely with >the library. > > > Anyway, to sum up, before adding new sort functions we should first > > analyze the stated needs carefully... > >I hope I could contribute to clarifying some needs :) > >Regards. 
>Matthias

--
-Mark Liberman |
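The idea above of associating time ranges with nodes that lack time marks could look roughly like this in C++. It is a sketch under stated assumptions, not something AGLIB provides: `Anchor`, `Range` and `time_ranges` are made-up names, and the example is read as having offsets on A.1, A.3, B.1 and B.3 only. Lower bounds are pushed forward along arcs and upper bounds are pulled backward until nothing changes.

```cpp
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

struct Anchor {
    bool has_offset;
    double offset;   // only meaningful when has_offset is true
};

struct Range {
    double lo;   // earliest time consistent with the graph
    double hi;   // latest time consistent with the graph
};

std::vector<Range> time_ranges(const std::vector<Anchor>& anchors,
                               const std::vector<std::pair<int, int>>& arcs) {
    const double inf = std::numeric_limits<double>::infinity();
    const int n = static_cast<int>(anchors.size());
    std::vector<Range> ranges;
    for (const Anchor& a : anchors)
        ranges.push_back(a.has_offset ? Range{a.offset, a.offset}
                                      : Range{-inf, inf});
    bool changed = true;
    while (changed) {   // bounds only ever tighten, so this terminates
        changed = false;
        for (const auto& arc : arcs) {
            assert(arc.first >= 0 && arc.first < n);
            assert(arc.second >= 0 && arc.second < n);
            Range& from = ranges[arc.first];
            Range& to = ranges[arc.second];
            if (from.lo > to.lo) { to.lo = from.lo; changed = true; }
            if (to.hi < from.hi) { from.hi = to.hi; changed = true; }
        }
    }
    return ranges;
}
```

On the example, A.2 ends up with the range [0.1, 0.3] and B.2 with [1.6, 1.9], so the two ranges do not overlap and B.2 must be later. A range whose `lo` exceeds its `hi` would indicate offsets that contradict the arc directions.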
From: Matthias T. <th...@ei...> - 2002-08-28 13:04:41
|
Steven,

Steven Bird wrote:
> Matthias Thomae <th...@ei...> wrote:
>
>>So apart from the problem above, wouldn't the topological sorting of the
>>anchors be a nice feature for the AG library?
>
> If some application needs this we should add it to the library.

I am currently developing a tool that would benefit from this :)
The tool is intended for speech recognition and understanding purposes. I assume that all annotations are based on the word sequence, so that there are no other anchors than the ones that mark word boundaries. I have one AG per utterance. All AGs are linear and connected with respect to the word sequence.

Since I do not necessarily need the word segmentation but only the utterance segmentation, only the first and last anchors of each AG are guaranteed to have offsets.

I am now circumventing the problem by assigning offsets to the within-utterance anchors as well, so that the temporal sorting is possible, but I am not very happy with this. Maybe at some point in the future I will want a word segmentation for some part of a corpus; then I will have the problem of discriminating 'real' and 'dummy' offsets.

So the AGs I am using are connected and linear with respect to a certain annotation type, as stated above, so the topsort of this type of AG would be sufficient for my needs. But I understand that you would like to implement the sorting on a more general basis.

> Note that there are some issues that would need to be addressed:
>
> Disconnected AGs: In general, AGs aren't required to be connected, and
> so there is no guarantee you can traverse an AG from some unique
> earliest node (and the earliest node may not be unique). What allows
> us to sort the disjoint sections of an AG relative to each other are
> the offsets (when they exist), but these offsets would be ignored in a
> purely topological sort.

Why not combine topological and temporal sorting? Is it possible to have an AG whose topological and temporal sortings contradict each other? (e.g. consider (anchor1, offset 2 sec. -> anchor2, offset 1 sec.))
If this is not allowed anyway, I could imagine something like this:

As a first step, a topsort (maybe restricted to some type) could identify the ambiguities:
- disjoint sections of an AG
- sets of anchors of the same topological 'grade', i.e. where the relative ordering cannot be determined from the topology alone

In a second step, a temporal sort could resolve (some of) the ambiguities left from the topsort and at the same time check if any violations (see example above) exist. If there are still ambiguities left, the user of the AG library could be made aware of them.

By the way, wouldn't it be possible to sort an AGSet on a temporal basis just like the disjoint sections of an AG?

> Ambiguity: In many AGs, such as the ones derived from trees using the
> chart construction, there is a total ordering on the anchors thanks to
> the total ordering of the word string on which the tree is based. In
> general though, there's no guaranteed unique ordering among anchors
> (e.g. consider the graph {(1,2), (1,3), (2,4), (3,4)} where the
> relative ordering of 2 and 3 can't be read off the graph). Would the
> mapping from partial to total order be arbitrary, or would a canonical
> order be defined somehow?

If I again assume that topological, temporal and canonical order do not contradict each other, two possibilities are imaginable:
- the canonical order is defined before the top/temporal sort
- the AGLIB user takes care of the canonical sort himself

> Types: In general, we'd like our tools to be able to add new layers of
> annotation to existing annotations; tools shouldn't assume that they
> know about all of the annotations in the AG. The simplest way to
> handle this is to use an explicit type argument on calls to all of the
> getX functions. To support this style of application programming,
> we'd need a version of the topological sort on anchors that only
> considered anchors incident to an arc of type t.

I agree. So the getXSorted function might return only a subset of all anchors in an AG?

> At some point in the future we may decide that particular kinds of AG
> need special treatment (e.g. connected AGs, totally anchored AGs,
> linear AGs). There's two approaches available: enriching the class
> structure or creating new high-level APIs. We could subclass our AG
> class and add extra methods (e.g. a getAnchorSeq method which did a
> topological sort on ConnectedAGs and a temporal sort on TotalAGs.)
> Some existing functions can be written more efficiently when we can
> make safe assumptions about the topology of the AG. The other
> approach is to build a new API on top of the AG API which exposes a
> subset of the AG functions and adds some new ones which can be
> implemented using the existing API (cf our treebank API).

I'd tend to the first approach, only because I think the topological and temporal sort is a basic feature that should be integrated closely with the library.

> Anyway, to sum up, before adding new sort functions we should first
> analyze the stated needs carefully...

I hope I could contribute to clarifying some needs :)

Regards.
Matthias |
From: Steven B. <sb...@un...> - 2002-08-24 15:28:23
|
Matthias Thomae <th...@ei...> wrote:
> So apart from the problem above, wouldn't the topological sorting of the
> anchors be a nice feature for the AG library?

If some application needs this we should add it to the library.

Note that there are some issues that would need to be addressed:

Disconnected AGs: In general, AGs aren't required to be connected, and so there is no guarantee you can traverse an AG from some unique earliest node (and the earliest node may not be unique). What allows us to sort the disjoint sections of an AG relative to each other are the offsets (when they exist), but these offsets would be ignored in a purely topological sort.

Ambiguity: In many AGs, such as the ones derived from trees using the chart construction, there is a total ordering on the anchors thanks to the total ordering of the word string on which the tree is based. In general though, there's no guaranteed unique ordering among anchors (e.g. consider the graph {(1,2), (1,3), (2,4), (3,4)} where the relative ordering of 2 and 3 can't be read off the graph). Would the mapping from partial to total order be arbitrary, or would a canonical order be defined somehow?

Types: In general, we'd like our tools to be able to add new layers of annotation to existing annotations; tools shouldn't assume that they know about all of the annotations in the AG. The simplest way to handle this is to use an explicit type argument on calls to all of the getX functions. To support this style of application programming, we'd need a version of the topological sort on anchors that only considered anchors incident to an arc of type t.

At some point in the future we may decide that particular kinds of AG need special treatment (e.g. connected AGs, totally anchored AGs, linear AGs). There's two approaches available: enriching the class structure or creating new high-level APIs. We could subclass our AG class and add extra methods (e.g. a getAnchorSeq method which did a topological sort on ConnectedAGs and a temporal sort on TotalAGs.) Some existing functions can be written more efficiently when we can make safe assumptions about the topology of the AG. The other approach is to build a new API on top of the AG API which exposes a subset of the AG functions and adds some new ones which can be implemented using the existing API (cf our treebank API).

Anyway, to sum up, before adding new sort functions we should first analyze the stated needs carefully...

-Steven |
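The ambiguity point above can be made concrete with a small C++ sketch. It groups the nodes of a DAG into levels by longest distance from a source, so that any level holding more than one node is a set whose relative order the topology alone does not fix. `topo_levels` is an illustrative function written for this note, not part of the AG library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Returns the nodes of a DAG grouped by longest distance from a source.
// Nodes that share a level are never ordered with respect to each other.
std::vector<std::vector<int>> topo_levels(
        const std::vector<std::pair<int, int>>& arcs) {
    std::map<int, std::vector<int>> succ;
    std::map<int, int> indegree;
    for (const auto& arc : arcs) {
        succ[arc.first].push_back(arc.second);
        indegree[arc.first];      // make sure the source node is registered
        ++indegree[arc.second];
    }

    std::vector<int> current;
    for (const auto& entry : indegree)
        if (entry.second == 0) current.push_back(entry.first);

    std::vector<std::vector<int>> levels;
    std::size_t placed = 0;
    while (!current.empty()) {
        levels.push_back(current);
        placed += current.size();
        std::vector<int> next;
        for (int node : current)
            for (int target : succ[node])
                if (--indegree[target] == 0) next.push_back(target);
        current = next;
    }
    assert(placed == indegree.size() && "input must be acyclic");
    return levels;
}
```

For the graph {(1,2), (1,3), (2,4), (3,4)} this yields the levels {1}, {2, 3}, {4}: nodes 2 and 3 share a level, which is exactly the pair whose order cannot be read off the graph. Nodes on different levels may still be unordered, so the levels give a sufficient test for ambiguity, not a complete one.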
From: Matthias T. <th...@ei...> - 2002-08-22 08:40:20
|
Hello Xiaoyi,

Xiaoyi Ma wrote:
> I couldn't replicate the problem you encounted when with
> AG::UnsetAnchorOffset. Do you have some sample code I can use to figure out
> what went wrong?

Yes, I have attached some sample code, and a sample AIF File. When calling "test.tcl Text.xml", you should get something like this:

    trying to unset anchor Test:AG-901-1.3.1:Anchor10, offset 147.313241
    Test:AG-901-1.3.1:Anchor1 Test:AG-901-1.3.1:Anchor2
    Test:AG-901-1.3.1:Anchor3 Test:AG-901-1.3.1:Anchor4
    Test:AG-901-1.3.1:Anchor5 Test:AG-901-1.3.1:Anchor6
    Test:AG-901-1.3.1:Anchor7 Test:AG-901-1.3.1:Anchor8
    Test:AG-901-1.3.1:Anchor10 Test:AG-901-1.3.1:Anchor11
    Test:AG-901-1.3.1:Anchor12 Test:AG-901-1.3.1:Anchor9
    AGExceptionAG Exception caught when calling UnsetAnchorOffset:
    Anchor not found!
    while executing
    "AG::UnsetAnchorOffset $anchor"

Have you also noticed my mail from the 8th of August, where I wrote:

> I also noticed that before the exception occurs, [AG::GetAnchorSet
> $agId] returns the anchor set in a different order than before, which is
> the correct topological and chronological order.
>
> Maybe this problem is related to the sorting of the anchors?
> As I understood from the papers, an AG is a directed acyclic graph, so
> that a topological sorting would be possible regardless of the anchor
> offsets. But as I understand from the sources, the sorting of the
> anchors is only done chronologically, i.e. if all anchors have an offset.
>
> So apart from the problem above, wouldn't the topological sorting of the
> anchors be a nice feature for the AG library?

Regards.
Matthias |
From: Xiaoyi Ma <xm...@un...> - 2002-08-21 20:25:38
|
On Mon, Aug 05, 2002 at 02:32:03PM +0200, Matthias Thomae wrote:
> Dear AG developers,
>
> I encountered a problem using the AG::SplitAnnotation function from Tcl.
> When I try to reference the newly created anchor, e.g. by setting its
> offset, I get an exception with a message:
>
> "<anchorId> is an invalid AnchorId"
>
> I think this problem originates from the AG::splitAnnotation method in
> AG.cc, in that the anchor which is constructed is not added to the
> identifiers. After adding
>
> addAnchor(n);
>
> to AG::splitAnnotation and recompiling, the problem disappears.
>
> I have a similar problem when unsetting anchor offsets using
> AG::UnsetAnchorOffset, but no solution.

Matthias,

I couldn't replicate the problem you encountered with AG::UnsetAnchorOffset. Do you have some sample code I can use to figure out what went wrong?

Thanks.
Xiaoyi

> Maybe one of the developers could check the sources with respect to the
> Identifier management, in case more of these problems exist?
>
> Regards.
> Matthias

--
Regards,
Xiaoyi
--------------------------------
Linguistic Data Consortium
University of Pennsylvania
email: xm...@ld...
phone: (215)573-5491
fax: (215)573-2175 |
From: Haejoong L. <hae...@un...> - 2002-08-20 15:06:58
|
Matthias,

> Maybe the two calls to string::clear() should be replaced by
> string::erase()?

Right. string::clear() was only introduced with gcc 3.0. I always forget that older gcc versions do not support it.

> I have also noticed that the compilation won't work with Xerces-C V1.7.0
> any more. I had success with V2.0.0.
>
> Which version of Xerces-C do you recommend?

The Transcriber loader uses the DOM of Xerces-C 2.0.0, which has many API-level changes compared with the previous version. Maybe I can rewrite it using SAX, but for now it is necessary to use 2.0.0. I'll probably do some work on the configuration script to enable or disable the Transcriber format.

Thanks,
Haejoong |
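The replacement agreed on above can be sketched as follows. This is not the actual trim() from agf/Transcriber.cc, whose body is not shown in this thread; `trimWhitespace` is a hypothetical helper that strips surrounding whitespace using only erase(), so it does not rely on string::clear().

```cpp
#include <string>

// Hypothetical helper: strip leading and trailing whitespace in place.
// Only erase() is used, so the code avoids string::clear().
void trimWhitespace(std::string& s) {
    const char* ws = " \t\r\n";
    std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) {
        s.erase();            // nothing but whitespace: same effect as clear()
        return;
    }
    std::string::size_type last = s.find_last_not_of(ws);
    s.erase(last + 1);        // drop trailing whitespace first
    s.erase(0, first);        // then drop leading whitespace
}
```

Calling erase() with no arguments removes the whole contents, which is why it can stand in for clear() on older compilers.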
From: Matthias T. <th...@ei...> - 2002-08-20 13:20:12
|
Hello AGTK developers,

trying to compile the latest CVS version of AGLIB on Linux (gcc 2.95.3), I get the following error:

    agf/Transcriber.cc: In function `void trim(string &)':
    agf/Transcriber.cc:134: no matching function for call to `basic_string<char,string_char_traits<char>,__default_alloc_template<true,0> >::clear ()'
    agf/Transcriber.cc: In function `void mk_ag(const AGId &, DOMNode *)':
    agf/Transcriber.cc:261: no matching function for call to `basic_string<char,string_char_traits<char>,__default_alloc_template<true,0> >::clear ()'

Maybe the two calls to string::clear() should be replaced by string::erase()?

I have also noticed that the compilation won't work with Xerces-C V1.7.0 any more. I had success with V2.0.0.

Which version of Xerces-C do you recommend?

Regards.
Matthias |
From: Steven B. <sb...@un...> - 2002-08-10 13:55:57
|
Folks,

> Steve Cassidy wrote:
>
> > The compile time switch might be a good way to manage the change
> > but in the end everyone will be sure to move over given the obvious
> > need for this.
>
> Mark Liberman wrote:
>
> > Another way to manage the change would be to re-name the functions
> > (e.g. "GetAnnotationSetByOffset" -> "GetAnnotationListByOffset")
> > and deprecate the old ones.

I'm online briefly, so let me put in my 2c worth. I think we're agreed that everyone will want to make the switch. If it were a difficult switch to make, such that different projects would have to allocate time well in advance, then I think Steve's proposal would be warranted. However, since the changes to application code are quite trivial, I think our developers will cope with a sudden switch that comes from upgrading to a new library version. The extra work for us in adding conditional compilation flags throughout our code, plus supporting documentation, just isn't worth it in my opinion.

As far as the function names go, note that we also have functions called GetAnnotationSeqByXXXX (NB Seq, not Set). Thus the function name is clear about whether the set is ordered or not. The name "List" is ambiguous to me, and only provides one name when we'd need two. A stronger objection is that using this name would represent a long-term change to the API definition, which is not well motivated (in my opinion) for such a trivial change.

> I think both are good ideas. Are there any opinions as to which of these
> is better? Perhaps, we can choose one of these two methods, and release
> the new API as version 1.2 or 2.0 quite soon.

So I vote to make a well documented change to the return value of all the Set and Seq functions, and release it in version 1.2 of the library. I vote to hold off on 2.0 until there's much broader experience with the library among developers in the community.

Steven |
From: Kazuaki M. <ma...@un...> - 2002-08-09 19:12:42
|
Hello everyone,

Thanks for the discussion.

Matthias Thomae wrote:
> > The new types would be something like the following. These will be
> > lists (or arrays) in C++, Python, Java and Tcl.
>
> Does that mean, if I now process the data as list in Tcl, I don't have
> to make any changes?

With the current API, I think you'd convert the return values of the affected functions to lists by, perhaps, using "split".

    set annSet [split [::AG::GetAnnotationSetByOffset $AGID 0.100]]

This would need to be changed to something like the following when the new API comes out.

    set annSet [::AG::GetAnnotationSetByOffset $AGID 0.100]

The changes required are perhaps simply 1) grep'ing for the affected functions in the code, and 2) removing a few things from the code.

Steve Cassidy wrote:
> The compile time switch might be a good way to manage the change
> but in the end everyone will be sure to move over given the obvious
> need for this.

Mark Liberman wrote:
> Another way to manage the change would be to re-name the functions
> (e.g. "GetAnnotationSetByOffset" -> "GetAnnotationListByOffset")
> and deprecate the old ones.

I think both are good ideas. Are there any opinions as to which of these is better? Perhaps we can choose one of these two methods, and release the new API as version 1.2 or 2.0 quite soon.

Thank you,
- Kazuaki Maeda (LDC) |
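For C++ callers, the extra processing that the string return type currently requires looks roughly like the sketch below. `splitIds` is a hypothetical application-side helper, not part of the AG API; it assumes the IDs in the returned string are separated by whitespace, as in the "AGSet1:AG1:Annotation1 AGSet1:AG1:Annotation2 ..." example from the proposal.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn a whitespace-separated string of IDs
// into a vector with one element per ID.
std::vector<std::string> splitIds(const std::string& ids) {
    std::vector<std::string> out;
    std::istringstream in(ids);
    std::string id;
    while (in >> id)          // operator>> skips any run of whitespace
        out.push_back(id);
    return out;
}
```

With list-valued return types, a helper of this kind would simply be deleted from application code, which is the "removing a few things" step described above.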
From: Matthias T. <th...@ei...> - 2002-08-09 12:00:40
|
Hello Haejoong,

Matthias Thomae wrote:
> I think the problem is on the 'other side', when *building* annotation
> graphs and storing them in AIF format via the "toXML" function. It is
> not possible to set the encoding of the input strings, as far as I have
> seen.

I seem to have a kind of solution or workaround for this problem, at least for the Tcl API:

    set outXmlFile [open "agset.xml" "w"]
    fconfigure $outXmlFile -encoding "utf-8"
    puts $outXmlFile [AG::toXML $agsetName]
    close $outXmlFile

This stores the XML document as UTF-8. As the load function of the AG library assumes UTF-8 encoding by default, loading and storing now use the same encoding and my error message disappears.

Regards.
Matthias |
From: Mark L. <my...@un...> - 2002-08-09 11:21:25
|
>On Fri, 9 Aug 2002 07:23, Kazuaki Maeda wrote:
>> We are interested in hearing your opinions about the following possible
>> changes in the API.
>>
>> In the current AG API, functions that return sets of ID's, names, etc. have
>> the return type of "string" instead of "list". For example,
>> GetAnnotationSetByOffset returns a string that may look like
>> "AGSet1:AG1:Annotation1 AGSet1:AG1:Annotation2 ...". This requires
>> additional processing on the application side. We are thinking about
>> changing the API in a future version of AGLIB such that these functions
>> return lists instead of strings.
>
>Absolutely, please make this change. These should never have been strings in
>the first place.
>
>The compile time switch might be a good way to manage the change but in the
>end everyone will be sure to move over given the obvious need for this.
>
>Steve

Another way to manage the change would be to re-name the functions (e.g. "GetAnnotationSetByOffset" -> "GetAnnotationListByOffset") and deprecate the old ones.

-Mark |
From: Matthias T. <th...@ei...> - 2002-08-09 11:17:17
|
Hello Kazuaki,

Kazuaki Maeda wrote:
> Dear developers who use the AG API,

I use the Tcl API.

> We are interested in hearing your opinions about the following possible
> changes in the API.

> The new types would be something like the following. These will be
> lists (or arrays) in C++, Python, Java and Tcl.

Does that mean that, if I already process the data as a list in Tcl, I don't have to make any changes?

Regards.
Matthias |
From: Matthias T. <th...@ei...> - 2002-08-09 11:00:04
|
Hello Haejoong,

Haejoong Lee wrote:
> Have you tried 'encoding' option to load the file?
>
> 'encoding' option is available for AIF loader.
> Here is an example code:
>
> package require ag
> array set sig {}
> set opt(encoding) UTF-8
> AG::Load AIF timit.xml "" sig opt
> ...

OK, I didn't know that. I tried it, but it made no difference: I still get the same error, whether the encoding is set to UTF-8, ISO-8859-1 or ASCII. By the way, as I understand from the sources, UTF-8 is the default encoding.

I think the problem is on the 'other side', when *building* annotation graphs and storing them in AIF format via the "toXML" function. It is not possible to set the encoding of the input strings, as far as I have seen.

I have also noticed that the AG library creates the XML document "by hand", not using the DOM XML document building functions of Xerces. Let me take some guesses about this: Since you seem to use C++ strings (ASCII) in the AG library, the encoding of the generated document is always ASCII. Using the DOM to build the XML document would allow you to set different encodings.

Regards.
Matthias |
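One application-side way to avoid writing raw non-ASCII bytes into the document is to convert feature values to UTF-8 before passing them to the library. The following is a minimal sketch under the assumption that the input strings are ISO-8859-1; `latin1ToUtf8` is a hypothetical helper and not an AG library function.

```cpp
#include <string>

// Hypothetical helper: convert an ISO-8859-1 (Latin-1) byte string to UTF-8.
// Every Latin-1 byte maps to the Unicode code point with the same value,
// so bytes below 0x80 are copied and all others become two-byte sequences.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);                  // plain ASCII
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

The reverse direction needs an extra decision about code points above U+00FF, which have no Latin-1 representation, so it is left out of this sketch.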
From: Steve C. <ste...@mq...> - 2002-08-09 05:54:18
|
On Fri, 9 Aug 2002 07:23, Kazuaki Maeda wrote:
> We are interested in hearing your opinions about the following possible
> changes in the API.
>
> In the current AG API, functions that return sets of ID's, names, etc. have
> the return type of "string" instead of "list". For example,
> GetAnnotationSetByOffset returns a string that may look like
> "AGSet1:AG1:Annotation1 AGSet1:AG1:Annotation2 ...". This requires
> additional processing on the application side. We are thinking about
> changing the API in a future version of AGLIB such that these functions
> return lists instead of strings.

Absolutely, please make this change. These should never have been strings in the first place.

The compile time switch might be a good way to manage the change but in the end everyone will be sure to move over given the obvious need for this.

Steve |
From: Haejoong L. <hae...@un...> - 2002-08-08 22:10:49
|
Matthias,

Have you tried the 'encoding' option when loading the file?

The 'encoding' option is available for the AIF loader. Here is some example code:

    package require ag
    array set sig {}
    set opt(encoding) UTF-8
    AG::Load AIF timit.xml "" sig opt
    ...

The Xerces-C++ FAQ says:

    Xerces-C has intrinsic support for ASCII, UTF-8, UTF-16 (Big/Small Endian), UCS4 (Big/Small Endian), EBCDIC code pages IBM037 and IBM1140 encodings, ISO-8859-1 (aka Latin1) and Windows-1252. This means that it can parse input XML files in these above mentioned encodings.

Thanks,
Haejoong

On Thu, Aug 08, 2002 at 08:05:34PM +0200, Matthias Thomae wrote:
> Hello agtk developers,
>
> I have just stumbled over a non-ASCII label which I added as a feature
> value to an annotation graph via the tcl wrapper, and saved the
> resulting AGSet in AIF Format.
>
> When trying to read that file back, I get an error message like:
>
> loading agset from file Test.xml...agf:Expected end of tag 'Feature'
> Error in startup script: LoadErroragf:Expected end of tag 'Feature'
>     while executing
> "AG::load "AIF" $agSetFileName"
>
> I checked the XML file, and apparently the characters are saved 'as-is',
> which was the source of the error. I had a similar problem in a
> different matter, and solved it by storing the data as UTF-8. That meant
> converting all strings read to and from the XML document via the parser,
> using helper functions like "iso2utf8 and utf82iso".
>
> Is that something the AG Library user has to take care of, or should
> that be implemented in the library itself?
>
> Regards.
> Matthias |