From: Steven B. <sb...@un...> - 2002-07-01 14:56:50
I have recently added eight validation functions to aglib; they are listed below. Their purpose is to validate the content and structure of an annotation graph (AG). Special-purpose applications that load AG data from an untrusted source may need to verify the assumptions they make about the data before operating on it. This is onerous with the existing API, but much simpler with the new functions.

The functions are in the CVS tree [http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/agtk/AGLIB/src/ag/Validation.cc?rev=1.5&content-type=text/vnd.viewcvs-markup], but they are not yet available through the scripting-language interfaces, nor are they included in any AGLIB distribution. Once they are, their documentation will appear in our online library documentation [http://agtk.sourceforge.net/doc/api/].

The fundamental principle behind these functions is that they do not make a "closed world assumption": they allow for an annotation graph that contains other layers of annotation we don't care about. This is achieved using types. Each function only examines annotations of the specified type (and the anchors those annotations are connected to). For example, to check that an AG containing TIMIT data is well-formed, we could write:

    if ( checkAnchorOffsetTotal(ag, "phn") &&
         checkFeatureExists(ag, "txt", "label") &&
         checkFeatureExists(ag, "wrd", "label") &&
         checkFeatureExists(ag, "phn", "label") &&
         checkLinear(ag, "wrd") &&
         checkLinear(ag, "phn") &&
         checkSpan(ag, "txt", "wrd") &&
         checkSpan(ag, "wrd", "phn") ) {
        // WELL-FORMED
    }

Please see the function definitions below for a brief explanation of what each one does, and let me know if you need any other low-level validation functions.

(Note that some format-conversion programs will need to check that an AG contains nothing other than the validated structure, to ensure lossless conversion.
An easy way to do this is to get a list of all the annotation types in the AG and check that the list contains only types that the application knows about and has validated.)

Steven Bird

**********************************************************************
VALIDATION FUNCTIONS:

bool checkAnchorOffsetTotal(AG *ag, AnnotationType type);
    Check that all anchors of annotations of a given type have an offset.

bool checkAnchorOffsetBounded(AG *ag, AnnotationType type);
    Check that all anchors of annotations of a given type are bounded by
    anchors that have an offset, following paths of this type.

bool checkFeatureExists(AG *ag, AnnotationType type, FeatureName featureName);
    Check that all annotations of a given type have the specified feature.

bool checkFeatureIsAnnotationId(AG *ag, AnnotationType type, FeatureName featureName);
    Check that all annotations of a given type have the specified feature
    and that its value is a valid AnnotationId.

bool checkLinear(AG *ag, AnnotationType type);
    Check that all annotations of a given type form a connected linear
    sequence.

bool checkConnected(AG *ag, AnnotationType type);
    Check that all annotations of a given type form a connected subgraph.

bool checkCoextensive(AG *ag, AnnotationType type1, AnnotationType type2);
    Check that every annotation of type type1 is coextensive with an
    annotation of type type2, and vice versa (an existence test, not a
    uniqueness test).

bool checkSpan(AG *ag, AnnotationType spanType, AnnotationType spannedType);
    Check that all annotations of type spanType span annotations of type
    spannedType, and that all annotations of type spannedType are spanned
    by annotations of type spanType.
**********************************************************************
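To make the intended semantics of a few of these checks concrete, here is a minimal C++17 sketch. It does not use aglib at all: ToyAG, Anchor and Annotation are hypothetical stand-ins for an annotation graph in which each annotation has a type, a start anchor, an end anchor and a feature map, and each anchor may or may not have an offset. The bodies of checkAnchorOffsetTotal, checkFeatureExists, checkLinear and checkSpan are one plausible reading of the one-line descriptions above, not the code in Validation.cc. Note that every function skips annotations of other types, in keeping with the no-closed-world principle.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified annotation graph (NOT the aglib API).
struct Anchor {
    std::optional<double> offset;  // time offset, if the anchor has one
};
struct Annotation {
    std::string type;                             // e.g. "wrd", "phn"
    int start;                                    // index into ToyAG::anchors
    int end;                                      // index into ToyAG::anchors
    std::map<std::string, std::string> features;  // e.g. {"label", "hh"}
};
struct ToyAG {
    std::vector<Anchor> anchors;
    std::vector<Annotation> annotations;
};

// Every anchor used by an annotation of `type` has an offset.
bool checkAnchorOffsetTotal(const ToyAG& ag, const std::string& type) {
    for (const auto& a : ag.annotations)
        if (a.type == type &&
            (!ag.anchors[a.start].offset || !ag.anchors[a.end].offset))
            return false;
    return true;
}

// Every annotation of `type` has the feature `name`.
bool checkFeatureExists(const ToyAG& ag, const std::string& type,
                        const std::string& name) {
    for (const auto& a : ag.annotations)
        if (a.type == type && a.features.count(name) == 0) return false;
    return true;
}

// Annotations of `type` form one unbranched chain, each annotation
// starting at the anchor where the previous one ends.
bool checkLinear(const ToyAG& ag, const std::string& type) {
    std::map<int, int> next;  // start anchor -> end anchor
    std::set<int> ends;
    for (const auto& a : ag.annotations) {
        if (a.type != type) continue;
        if (!next.emplace(a.start, a.end).second) return false;  // fork
        if (!ends.insert(a.end).second) return false;            // join
    }
    if (next.empty()) return true;
    int head = -1, heads = 0;  // the chain must have exactly one first anchor
    for (const auto& entry : next)
        if (ends.count(entry.first) == 0) { head = entry.first; ++heads; }
    if (heads != 1) return false;
    std::size_t steps = 0;  // walking from the head must visit every annotation
    for (auto it = next.find(head); it != next.end(); it = next.find(it->second))
        ++steps;
    return steps == next.size();
}

// Is anchor `to` reachable from anchor `from` via annotations of `type`?
bool reaches(const ToyAG& ag, const std::string& type, int from, int to) {
    std::vector<int> stack{from};
    std::set<int> seen{from};
    while (!stack.empty()) {
        int u = stack.back();
        stack.pop_back();
        if (u == to) return true;
        for (const auto& a : ag.annotations)
            if (a.type == type && a.start == u && seen.insert(a.end).second)
                stack.push_back(a.end);
    }
    return false;
}

// Each spanType annotation has at least one spannedType annotation on a
// spannedType path from its start anchor to its end anchor, and every
// spannedType annotation lies on such a path for some spanType annotation.
bool checkSpan(const ToyAG& ag, const std::string& spanType,
               const std::string& spannedType) {
    std::vector<bool> covered(ag.annotations.size(), false);
    for (const auto& s : ag.annotations) {
        if (s.type != spanType) continue;
        bool spansSomething = false;
        for (std::size_t i = 0; i < ag.annotations.size(); ++i) {
            const auto& x = ag.annotations[i];
            if (x.type == spannedType && reaches(ag, spannedType, s.start, x.start) &&
                reaches(ag, spannedType, x.end, s.end)) {
                covered[i] = true;
                spansSomething = true;
            }
        }
        if (!spansSomething) return false;
    }
    for (std::size_t i = 0; i < ag.annotations.size(); ++i)
        if (ag.annotations[i].type == spannedType && !covered[i]) return false;
    return true;
}
```

With these definitions, a TIMIT-style conjunction such as checkLinear(ag, "phn") && checkSpan(ag, "wrd", "phn") reads just as in the aglib example, except that the toy graph is passed by const reference rather than as an AG pointer.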
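The lossless-conversion check mentioned in the note above can be sketched as follows. The function name containsOnlyKnownTypes is hypothetical and not part of aglib, and the sketch assumes the application has already collected the annotation types present in the AG into a vector of strings; how to obtain that list through aglib is not shown here.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of aglib). Returns true only if every
// annotation type found in the AG is one the application knows about and
// has validated, so a conversion will not silently drop a layer.
// `typesInAg` is assumed to have been collected from the AG beforehand.
bool containsOnlyKnownTypes(const std::vector<std::string>& typesInAg,
                            const std::set<std::string>& knownTypes) {
    for (const auto& type : typesInAg)
        if (knownTypes.count(type) == 0)
            return false;  // an unvalidated annotation layer is present
    return true;
}
```

For the TIMIT example, knownTypes would be {"txt", "wrd", "phn"}. An AG that also carried some extra layer would fail this check, even though the validation functions themselves, which ignore other types, would still accept it.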