From: Mani, I. <im...@mi...> - 2007-04-19 14:05:06
Apologies for the resend, but some folks didn't receive these two messages, for reasons yet to be determined.

-----Original Message-----
From: Mani, Inderjeet
Sent: Wednesday, April 18, 2007 9:11 PM
To: 'spa...@li...'
Subject: RE: [Spatialml-discussion] some comments

Greg, thanks for your insightful comments! Some responses below (see **).

-----Original Message-----
From: spa...@li... [mailto:spa...@li...] On Behalf Of Greg Janée
Sent: Wednesday, April 18, 2007 3:27 PM
To: spa...@li...
Subject: [Spatialml-discussion] some comments

Generally, this looks good to me.

**Good to hear that.

Comments below:

Content-related comments:

1. A gazetteer reference is made using the gazref attribute, which takes the form prefix:identifier. Have you considered making such references href-style URLs? For then a client could automatically follow a gazref link and retrieve the associated place information. I realize that gazetteer record formats are not yet standardized, nor is the identification of places by URIs, but this is a use case that could help drive such standardization. As SpatialML stands now, a client needs three document-external pieces of information to follow a gazref: the location of the gazetteer being referred to; the protocol for accessing that gazetteer; and the format of that gazetteer's records. (Also, see comment #6 below).

**This is an excellent point. XML Schemas should be made use of in the guidelines, to facilitate such an integration.

2. PLACEs identify (by surrounding with XML tags) relevant portions of the document. PATHs do, too, via SIGNALs. But LINKs don't. Perhaps, by symmetry, they should? For example, a LINK could surround the relevant preposition ("in") or punctuation (",").

**Both PATHs and LINKs are non-text-consuming. SIGNALs are textual indicators motivating a particular type of attribute, e.g., direction, distance (for Paths).
The indicators here are usually explicitly provided in the text, e.g., "30 miles", "on top of", etc. Do LINKs merit SIGNALs? From cases like "near", "in", etc., one might think so. However, some indicators may be expressed in certain languages just by punctuation, e.g., "Bedford, MA". Further, we've included equality, which can be expressed anaphorically, without an explicit signal, e.g., "Baltimore .. The city ...". Worth thinking about more.

3. Seems like the document should at least suggest a format for latitude/longitude coordinates.

**Yes, certainly, and it's in the works, as Section 24 suggests.

4. Why is PLACE's id attribute required when no other attribute is?

**That's because the annotation editing/authoring tool being used here to annotate SpatialML (Callisto) requires that.

XML comments:

5. XML elements should be defined in a namespace, so that they can be embedded in namespace-aware documents.

**OK.

6. If gazetteers are identified by prefixes, consider using the XML namespace mechanism to correlate prefixes with gazetteer URLs. This technique is used in XML Schema, for example.

Exposition comments:

7. Consider rephrasing sentences having "we" in them (e.g., "we try to keep the extents as small as possible...") to passive requirements ("guideline: extents should be kept as small as possible...").

**Thanks -- will certainly do that. The first-person plural doesn't sound very professional.

As it is, the spec reads a little like a project-specific document instead of a community standard. This is also an opportunity to revisit some of those statements: are they really requirements? Or guidelines? Best practices that everybody should follow? MITRE- or project-specific? Etc.
**One important requirement of an annotation scheme for natural language is that it should be effective for a human to annotate; in particular, people should be able to annotate documents according to the scheme with high inter-annotator reliability (e.g., as measured by an automatic scoring program). One of the benefits of doing this is increased sharing of data and resources, and common evaluation standards. To satisfy this requirement, annotation guidelines have to be more like requirements, stated as rules, allowing for less slack, enforcing obedience, etc. Since we're dealing with natural language, however, it's hard to specify necessary and sufficient conditions for use of a particular annotation. Thus, we call them guidelines, as the NLP community has traditionally done. [To pontificate further -- human judgment always enters into this process, and that's often a good thing. There is not yet a scientific method for developing guidelines, or for training humans to annotate documents, but there are best practices that can be and are usually/often followed. Once the data is annotated, however, there are some basic empirical methods that are used for training machines to produce the annotations.]

8. I don't understand the distinction between cities, towns, and villages in CTV. Should I?

**CTV is rather like a constrained version of the description attribute on a PLACE tag. If the text characterizes a place as being one of these, e.g., "village of Upper Slaughter" or "town of Chipping Campden", then the guidelines say it should be marked as such (i.e., CTV="VILLAGE" and CTV="TOWN"). Rather than attempt to decompose these fuzzy concepts further in terms of gazetteer features, it may be desirable to record these characterizations 'at face value' in case the information is useful downstream. But it does seem somewhat awkward, not sure exactly why.

Minor points:

9. Some codes are abbreviated (RGN), some are spelled out (BODYOFWATER).
I take it some were copied, but I guess I would strive for consistency here.

**OK.

10. More generally, the use of abbreviations counters XML's goal and virtue of self-documentation. Coded abbreviations like mod="BR" are already inscrutable to me, and I read the spec all of 5 minutes ago. Why not spell it out, i.e., modifier="BORDER"?

**OK.

-Greg

**Thanks again for the detailed comments!
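
For illustration only, the suggestion in comment 6 (correlating gazref prefixes with gazetteer URLs through XML namespace declarations) might look roughly like the sketch below. Nothing here comes from the SpatialML spec: the "igdb" prefix, the example.org gazetteer URL, the identifier, the helper name resolve_gazrefs, and the idea of forming a URL by appending the identifier to the namespace URI are all assumptions. The sketch also collects prefix declarations document-wide and ignores namespace scoping.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: the "igdb" prefix, the gazetteer URL and the
# identifier are invented for illustration, not taken from SpatialML.
SAMPLE = """<doc xmlns:igdb="http://gazetteer.example.org/igdb/">
  <PLACE id="1" gazref="igdb:12345">Bedford</PLACE>, <PLACE id="2">MA</PLACE>
</doc>"""


def resolve_gazrefs(xml_text):
    """Map each PLACE id to a URL built from its gazref (assumed convention)."""
    prefixes = {}  # prefix -> namespace URI, collected from xmlns declarations
    resolved = {}  # PLACE id -> expanded gazetteer URL
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            # payload is a (prefix, uri) tuple for each xmlns declaration
            prefix, uri = payload
            prefixes[prefix] = uri
        elif payload.tag == "PLACE" and "gazref" in payload.attrib:
            # gazref has the form prefix:identifier
            prefix, _, identifier = payload.attrib["gazref"].partition(":")
            if prefix in prefixes:
                resolved[payload.attrib["id"]] = prefixes[prefix] + identifier
    return resolved


print(resolve_gazrefs(SAMPLE))
```

With a declaration like this in the document itself, a client would need no document-external table to learn where the gazetteer lives, although the access protocol and record format raised in comment 1 would still have to be agreed separately.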