From: <gj...@al...> - 2007-04-19 17:13:25
|
Justin Richer wrote:
> ...it's our position that every element SHOULD have an id, since an
> external language might, say, string multiple PATHs and LINKs
> together.
> [...]
> Maybe we need to say this a bit stronger in the guideline documents?

I would argue just the opposite: while ids might be necessary in some cases, and might be desirable, the fact that they're not always necessary means they shouldn't be a requirement. It's just an application of the general principle of minimizing requirements. It's easy to imagine scenarios in which placenames might be marked up without using LINKs or other structures requiring ids, and hence the additional requirement would represent an unnecessary burden (and it is quite a burden, given the uniqueness requirement of ids).

-Greg |
From: Hitzeman, J. M. <hi...@mi...> - 2007-04-19 15:55:51
|
In annotating using SpatialML, I have found there to be a natural difference between LINKs and PATHs. LINKs are used to pinpoint a PLACE while PATHs are used to give directions on how to get to a PLACE.

The signals for a LINK, when present, are never more complicated than the possible linkTypes. [Boston, MA] is the same as [Boston in MA] and the relationship is clearly indicated by IN. Similarly, [Boston (42.358°N 71.060°W)] indicates an EQ relationship between Boston and its latlong; the signals, if any, are the two parens.

In contrast, the PATH indicates a relationship between two distinct places, two pins on a map. The signals have more semantic content than the finite list of linkTypes. If I tell you that I live in the house with the blue door across from the Commonwealth Pool, I've LINKed my house with the house with the blue door (one pin) and I've given you a PATH between the Pool (pin #1) and the house (pin #2) so that if you can find the pool you have instructions on how to find my house.

PATHs are particularly useful if you're looking for a PLACE that doesn't have coordinates in any gazetteer, e.g., [We put down anchor five miles off of the port of Leith.] If you can find the latlong for Leith, you can guess at the latlong for our location. A LINK gives a much greater search space, e.g., [We put down anchor in the Waters of Leith.]

Janet

-----Original Message-----
From: spa...@li... [mailto:spa...@li...] On Behalf Of Greg Janée
Sent: Wednesday, April 18, 2007 3:27 PM
To: spa...@li...
Subject: [Spatialml-discussion] some comments

2. PLACEs identify (by surrounding with XML tags) relevant portions of the document. PATHs do, too, via SIGNALs. But LINKs don't. Perhaps, by symmetry, they should? For example, a LINK could surround the relevant preposition ("in") or punctuation (",").

[snip] |
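The LINK/PATH distinction drawn in the message above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the element names (PLACE, LINK, PATH, SIGNAL), the id attribute, and the linkType value IN come from the thread, but the `source`, `target`, `destination`, and `signals` attribute names and the `describe` helper are hypothetical and may not match the SpatialML guidelines.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup. Only the element names, "id", and linkType="IN" are taken
# from the discussion; "source", "target", "destination", and "signals" are
# placeholder attribute names chosen for this sketch.
SAMPLE = """<DOC>
<PLACE id="p1">Boston</PLACE>, <PLACE id="p2">MA</PLACE>
<LINK id="l1" source="p1" target="p2" linkType="IN"/>
I live in <PLACE id="p3">the house with the blue door</PLACE>
<SIGNAL id="s1">across from</SIGNAL> <PLACE id="p4">the Commonwealth Pool</PLACE>.
<PATH id="pa1" source="p4" destination="p3" signals="s1"/>
</DOC>"""


def describe(xml_text):
    """Return one readable string per LINK and PATH in the document."""
    root = ET.fromstring(xml_text)
    # Index the text of every element that carries an id, so that LINKs and
    # PATHs can refer to PLACEs and SIGNALs by id.
    text = {el.get("id"): "".join(el.itertext())
            for el in root.iter() if el.get("id")}
    relations = []
    for link in root.iter("LINK"):
        # A LINK carries its whole meaning in a value from a closed list.
        relations.append(
            f'{text[link.get("source")]} {link.get("linkType")} '
            f'{text[link.get("target")]}'
        )
    for path in root.iter("PATH"):
        # A PATH points at a SIGNAL, whose free text carries the meaning.
        relations.append(
            f'{text[path.get("source")]} --[{text[path.get("signals")]}]--> '
            f'{text[path.get("destination")]}'
        )
    return relations


if __name__ == "__main__":
    for line in describe(SAMPLE):
        print(line)
```

In this sketch the LINK needs nothing beyond its linkType, while the PATH depends on the text of a SIGNAL, which mirrors the point that PATH signals carry more semantic content than the finite list of linkTypes.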
From: Justin R. <jr...@mi...> - 2007-04-19 15:23:24
|
A few comments and clarifications that I wanted to make here as well.

> 1. A gazetteer reference is made using the gazref attribute, which
> takes the form prefix:identifier. Have you considered making such
> references href-style URLs? For then a client could automatically
> follow a gazref link and retrieve the associated place information.
> I realize that gazetteer record formats are not yet standardized, nor
> is the identification of places by URIs, but this is a use case that
> could help drive such standardization. As SpatialML stands now, a
> client needs three document-external pieces of information to follow
> a gazref: the location of the gazetteer being referred to; the
> protocol for accessing that gazetteer; and the format of that
> gazetteer's records. (Also, see comment #6 below).
>
> **This is an excellent point. XML Schemas should be made use of in the
> guidelines, to facilitate such an
> integration.

We basically are using URNs here, noting both where to get the info and what to ask for, though it makes no mention of how to get there, how to ask, or what the answer will look like. Maybe we could allow for full URLs by further specifying some type of URI scheme, such as "gaz:igdb:12345" in addition to allowing "http://igdb.foo/getrecord?r=12345" types.

I'm of the opinion that a URL-fronted interface shouldn't be a requirement. Using a URN allows for more flexibility in defining the accessor methods of a particular gazetteer. I do agree that it's something to be aspired to, but I don't think it should be required.

But even if we were to require a URL accessor in front of each gazetteer, doing so would still say nothing about the format of the returned result. It does cut down on the external information needed to make full use of gazRefs, but it doesn't eliminate it entirely.

> 3. Seems like the document should at least suggest a format for
> latitude/longitude coordinates.
>
> **Yes, certainly, and it's in the works, as Section 24 suggests.

Most definitely. The syntax of what are currently plain-text fields (such as latLong and gazRef) is something that we're working out at the moment. We are also considering moving it to a more general "location" field and allowing specification in other coordinate systems (such as UTM). Comments and suggestions are welcome!

> 4. Why is PLACE's id attribute required when no other attribute is?
>
> **That's because the annotation editing/authoring tool being used here
> to annotate SpatialML (Callisto) requires that.

Actually, that's not quite true. Callisto doesn't require id attributes to be set, but they're still a really good idea anyway. The id attribute is required here because that's the only way that a PLACE can be referenced by a PATH or LINK. Since SpatialML is a language with a fairly flat syntactic structure, we need document-global-unique referents such as this in order to tie anything together. Further, it's our position that every element SHOULD have an id, since an external language might, say, string multiple PATHs and LINKs together. (You'll note that it's a #REQUIRED attribute on PATH, LINK, and SIGNAL as well as PLACE.) Maybe we need to say this a bit stronger in the guideline documents?

Regardless, the other attributes are optional because sometimes the only thing that you can say is "This is a place" with no further details.

> 6. If gazetteers are identified by prefixes, consider using the XML
> namespace mechanism to correlate prefixes with gazetteer URLs. This
> technique is used in XML Schema, for example.

See note above about URN vs. URL, but I do think that moving to a full W3C XML Schema will give us more expressive control than our current DTD allows.

> 7. Consider rephrasing sentences having "we" in them (e.g., "we try
> to keep the extents as small as possible...") to passive requirements
> ("guideline: extents should be kept as small as possible...").
>
> **Thanks -- will certainly do that. The first-person plural doesn't
> sound very professional.

By the same token, excessive use of the passive voice is tiresome. I'm not sure what the general rule of thumb here is, but I do recall most standard description documents sounding dry and stuffy. :)

> 8. I don't understand the distinction between cities, towns, and
> villages in CTV. Should I?
>
> **CTV is rather like a constrained version of the description
> attribute on a PLACE tag. If the text characterizes a place as being
> one of these, e.g., "village of Upper Slaughter" or "town of Chipping
> Campden" then the guidelines say it should be marked as such (i.e.,
> CTV="VILLAGE", and CTV="TOWN"). Rather than attempt to
> decompose these fuzzy concepts further in terms of gazetteer features,
> instead it may be desirable to record these characterizations 'at face
> value' in case the information is useful downstream. But it does seem
> somewhat awkward, not sure exactly why.

Further, it's kind of a first attempt at a notation of *scale* within gazetteer feature types. Maybe we should move on to a more general "scale" attribute, with defined ranges for each allowable PLACE type?

> Minor points:
>
> 9. Some codes are abbreviated (RGN), some are spelled out
> (BODYOFWATER). I take it some were copied, but I guess I would
> strive for consistency here.
>
> **OK.
>
> 10. More generally, the use of abbreviations counters XML's goal and
> virtue of self-documentation. Coded abbreviations like mod="BR" are
> already inscrutable to me, and I read the spec all of 5 minutes ago.
> Why not spell it out, i.e., modifier="BORDER"?
>
> **OK.

I agree, though I think that the country codes should remain a copy of the ISO 3166 standard. I would also say that the sixteen compass points allowable for different attributes (N, NNE, NE, ENE, etc.) could reasonably stay abbreviated without causing overmuch confusion. Apart from this small subset, I think things should be spelled out.

Thank you for your great feedback.

-- Justin |
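The gazRef forms weighed in the message above (the existing prefix:identifier form, the suggested "gaz:igdb:12345" scheme, and a plain URL) could be told apart along the following lines. This is a rough sketch, not the project's parser: the `GazRef` class and `parse_gaz_ref` function are hypothetical names, and treating "gaz" as an optional leading scheme is an assumption drawn only from the example in the message.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class GazRef:
    """A parsed gazetteer reference (hypothetical structure for illustration)."""
    kind: str        # "url" or "urn"
    gazetteer: str   # gazetteer prefix (URN form) or host name (URL form)
    identifier: str  # record id (URN form) or the full URL (URL form)


def parse_gaz_ref(value: str) -> GazRef:
    """Parse prefix:identifier, gaz:prefix:identifier, or an http(s) URL."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        host = urlparse(value).netloc
        if not host:
            raise ValueError(f"URL gazRef has no host: {value!r}")
        # A URL says where and how to ask, but not what the reply looks like.
        return GazRef("url", host, value)
    parts = value.split(":")
    if len(parts) == 3 and parts[0] == "gaz":
        parts = parts[1:]  # drop the assumed optional "gaz" scheme
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"not a recognizable gazRef: {value!r}")
    # A URN-style reference names the gazetteer and the record only; the
    # access protocol and record format remain external knowledge.
    return GazRef("urn", parts[0], parts[1])


if __name__ == "__main__":
    for ref in ("igdb:12345", "gaz:igdb:12345",
                "http://igdb.foo/getrecord?r=12345"):
        print(parse_gaz_ref(ref))
```

Either form leaves the record format unspecified, which is the residual external information the message points out.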
From: Mani, I. <im...@mi...> - 2007-04-19 14:05:06
|
Apologies for the resend, but some folks didn't receive these two messages, for reasons yet to be determined.

-----Original Message-----
From: Mani, Inderjeet
Sent: Wednesday, April 18, 2007 9:11 PM
To: 'spa...@li...'
Subject: RE: [Spatialml-discussion] some comments

Greg, thanks for your insightful comments! Some responses below (see **).

-----Original Message-----
From: spa...@li... [mailto:spa...@li...] On Behalf Of Greg Janée
Sent: Wednesday, April 18, 2007 3:27 PM
To: spa...@li...
Subject: [Spatialml-discussion] some comments

Generally, this looks good to me.

**Good to hear that.

Comments below:

Content-related comments:

1. A gazetteer reference is made using the gazref attribute, which takes the form prefix:identifier. Have you considered making such references href-style URLs? For then a client could automatically follow a gazref link and retrieve the associated place information. I realize that gazetteer record formats are not yet standardized, nor is the identification of places by URIs, but this is a use case that could help drive such standardization. As SpatialML stands now, a client needs three document-external pieces of information to follow a gazref: the location of the gazetteer being referred to; the protocol for accessing that gazetteer; and the format of that gazetteer's records. (Also, see comment #6 below).

**This is an excellent point. XML Schemas should be made use of in the guidelines, to facilitate such an integration.

2. PLACEs identify (by surrounding with XML tags) relevant portions of the document. PATHs do, too, via SIGNALs. But LINKs don't. Perhaps, by symmetry, they should? For example, a LINK could surround the relevant preposition ("in") or punctuation (",").

**Both PATHs and LINKs are non-text-consuming. SIGNALs are textual indicators motivating a particular type of attribute, e.g., direction, distance (for PATHs). The indicators here are usually explicitly provided in the text, e.g., "30 miles", "on top of", etc. Do LINKs merit SIGNALs? From cases like "near", "in", etc., one might think so. However, some indicators may be expressed in certain languages just by punctuation, e.g., "Bedford, MA". Further, we've included equality, which can be expressed anaphorically, without an explicit signal, e.g., "Baltimore .. The city ...". Worth thinking about more.

3. Seems like the document should at least suggest a format for latitude/longitude coordinates.

**Yes, certainly, and it's in the works, as Section 24 suggests.

4. Why is PLACE's id attribute required when no other attribute is?

**That's because the annotation editing/authoring tool being used here to annotate SpatialML (Callisto) requires that.

XML comments:

5. XML elements should be defined in a namespace, so that they can be embedded in namespace-aware documents.

**OK.

6. If gazetteers are identified by prefixes, consider using the XML namespace mechanism to correlate prefixes with gazetteer URLs. This technique is used in XML Schema, for example.

Exposition comments:

7. Consider rephrasing sentences having "we" in them (e.g., "we try to keep the extents as small as possible...") to passive requirements ("guideline: extents should be kept as small as possible...").

**Thanks -- will certainly do that. The first-person plural doesn't sound very professional.

As it is, the spec reads a little like a project-specific document instead of a community standard. This is also an opportunity to revisit some of those statements: are they really requirements? Or guidelines? Best practices that everybody should follow? MITRE- or project-specific? Etc.

**One important requirement of an annotation scheme for natural language is that it should be effective for a human to annotate; in particular, people should be able to annotate documents according to the scheme with high inter-annotator reliability (e.g., as measured by an automatic scoring program). One of the benefits of doing this is increased sharing of data and resources, and common evaluation standards. To satisfy this requirement, annotation guidelines have to be more like requirements, stated as rules, allowing for less slack, enforcing obedience, etc. Since we're dealing with natural language, however, it's hard to specify necessary and sufficient conditions for use of a particular annotation. Thus, we call them guidelines, as the NLP community has traditionally done.

[To pontificate further -- human judgment always enters into this process, and that's often a good thing. There is not yet a scientific method for developing guidelines, or for training humans to annotate documents, but there are best practices that can be and are usually/often followed. Once the data is annotated, however, there are some basic empirical methods that are used for training machines to produce the annotations.]

8. I don't understand the distinction between cities, towns, and villages in CTV. Should I?

**CTV is rather like a constrained version of the description attribute on a PLACE tag. If the text characterizes a place as being one of these, e.g., "village of Upper Slaughter" or "town of Chipping Campden" then the guidelines say it should be marked as such (i.e., CTV="VILLAGE", and CTV="TOWN"). Rather than attempt to decompose these fuzzy concepts further in terms of gazetteer features, instead it may be desirable to record these characterizations 'at face value' in case the information is useful downstream. But it does seem somewhat awkward, not sure exactly why.

Minor points:

9. Some codes are abbreviated (RGN), some are spelled out (BODYOFWATER). I take it some were copied, but I guess I would strive for consistency here.

**OK.

10. More generally, the use of abbreviations counters XML's goal and virtue of self-documentation. Coded abbreviations like mod="BR" are already inscrutable to me, and I read the spec all of 5 minutes ago. Why not spell it out, i.e., modifier="BORDER"?

**OK.

-Greg

**Thanks again for the detailed comments! |
From: <gj...@al...> - 2007-04-18 19:28:19
|
Generally, this looks good to me. Comments below:

Content-related comments:

1. A gazetteer reference is made using the gazref attribute, which takes the form prefix:identifier. Have you considered making such references href-style URLs? For then a client could automatically follow a gazref link and retrieve the associated place information. I realize that gazetteer record formats are not yet standardized, nor is the identification of places by URIs, but this is a use case that could help drive such standardization. As SpatialML stands now, a client needs three document-external pieces of information to follow a gazref: the location of the gazetteer being referred to; the protocol for accessing that gazetteer; and the format of that gazetteer's records. (Also, see comment #6 below).

2. PLACEs identify (by surrounding with XML tags) relevant portions of the document. PATHs do, too, via SIGNALs. But LINKs don't. Perhaps, by symmetry, they should? For example, a LINK could surround the relevant preposition ("in") or punctuation (",").

3. Seems like the document should at least suggest a format for latitude/longitude coordinates.

4. Why is PLACE's id attribute required when no other attribute is?

XML comments:

5. XML elements should be defined in a namespace, so that they can be embedded in namespace-aware documents.

6. If gazetteers are identified by prefixes, consider using the XML namespace mechanism to correlate prefixes with gazetteer URLs. This technique is used in XML Schema, for example.

Exposition comments:

7. Consider rephrasing sentences having "we" in them (e.g., "we try to keep the extents as small as possible...") to passive requirements ("guideline: extents should be kept as small as possible..."). As it is, the spec reads a little like a project-specific document instead of a community standard. This is also an opportunity to revisit some of those statements: are they really requirements? Or guidelines? Best practices that everybody should follow? MITRE- or project-specific? Etc.

8. I don't understand the distinction between cities, towns, and villages in CTV. Should I?

Minor points:

9. Some codes are abbreviated (RGN), some are spelled out (BODYOFWATER). I take it some were copied, but I guess I would strive for consistency here.

10. More generally, the use of abbreviations counters XML's goal and virtue of self-documentation. Coded abbreviations like mod="BR" are already inscrutable to me, and I read the spec all of 5 minutes ago. Why not spell it out, i.e., modifier="BORDER"?

-Greg |
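Comment 6 in the message above, correlating gazetteer prefixes with URLs through XML namespace declarations, might look roughly like this in practice. The sketch rests on several assumptions: the `igdb` prefix, the `http://igdb.foo/records/` URL, the sample document, and the `resolve_gazrefs` helper are all made up for illustration, and building a record URL by concatenating the namespace URI and the identifier is one possible convention, not something the thread specifies.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: the "igdb" prefix, its URL, and the element layout are
# illustrative only and are not taken from the SpatialML guidelines.
DOC = """<DOC xmlns:igdb="http://igdb.foo/records/">
  <PLACE id="1" gazref="igdb:12345">Boston</PLACE>
  <PLACE id="2">the harbor</PLACE>
</DOC>"""


def resolve_gazrefs(xml_text):
    """Map each PLACE id to a URL built from its gazref prefix's namespace URI."""
    prefixes = {}   # namespace prefix -> declared URI
    resolved = {}   # PLACE id -> assumed record URL
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events report each xmlns declaration as (prefix, uri) before
    # the "start" event of the element that declares it.
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif payload.tag == "PLACE" and "gazref" in payload.attrib:
            prefix, _, identifier = payload.get("gazref").partition(":")
            if prefix in prefixes:
                # Assumed convention: namespace URI + identifier = record URL.
                resolved[payload.get("id")] = prefixes[prefix] + identifier
    return resolved


if __name__ == "__main__":
    print(resolve_gazrefs(DOC))
```

For brevity the sketch treats prefix declarations as document-wide rather than tracking namespace scope per element, which a fuller implementation would need to do.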
From: Mani, I. <im...@mi...> - 2007-04-16 12:55:44
|
From im...@mi... -- ignore. |
From: Discussion of S. G. <spa...@li...> - 2007-04-16 12:50:03
|
from inderjeet at his gmail account. |
From: Discussion of S. G. <spa...@li...> - 2007-04-16 12:47:55
|
From im...@mi.... |
From: Discussion of S. G. <spa...@li...> - 2007-04-16 12:23:50
|
This is just a test. |
From: Discussion of S. G. <spa...@li...> - 2007-04-16 00:26:18
|
Initiating the discussion... |
From: Discussion of S. G. <spa...@li...> - 2007-04-15 23:26:31
|
Welcome to the SpatialML discussion list!

SpatialML is a markup language for representing spatial expressions in natural language documents. The goal is to allow for potentially better integration of text collections with resources such as databases that provide spatial information about a domain, including gazetteers, physical feature databases, mapping services, etc.

A draft version of the first set of SpatialML Guidelines (version 1.0) is available at: http://sourceforge.net/projects/spatialml

We expect that subsequent releases will incorporate feedback from many others in the research and development community. As the guidelines mature, we will be providing additional links for resources related to SpatialML, including annotation editors, annotated data, and automatic taggers.

Best wishes,
Inderjeet. |