From: <gj...@al...> - 2007-04-19 17:13:25

Justin Richer wrote:
> ...it's our position that every element SHOULD have an id, since an
> external language might, say, string multiple PATHs and LINKs
> together.
> [...]
> Maybe we need to say this a bit stronger in the guideline documents?

I would argue just the opposite: while ids might be necessary in some
cases, and might be desirable, the fact that they're not always
necessary means they shouldn't be a requirement. It's just an
application of the general principle of minimizing requirements. It's
easy to imagine scenarios in which placenames might be marked up
without using LINKs or other structures requiring ids, and hence the
additional requirement would represent an unnecessary burden (and it
is quite a burden, given the uniqueness requirement of ids).

-Greg
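Greg's objection turns on the cost of keeping ids unique, while the case for requiring them rests on PATHs and LINKs referring to PLACEs by id. A minimal Python sketch, under stated assumptions, of what that requirement buys and costs: checking a document for duplicate ids and for references that point at no element. The LINK attribute names (`source`, `target`, `linkType`) and the sample fragment are assumptions for illustration, not quoted from the guidelines. A DTD-validating parser could enforce ID/IDREF constraints itself, but Python's ElementTree does not validate, so the check is done by hand here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical SpatialML-style fragment; LINK attribute names are assumed.
SAMPLE = """<doc>
<PLACE id="p1">Boston</PLACE>, <PLACE id="p2">MA</PLACE>
<LINK id="l1" source="p1" target="p2" linkType="IN"/>
</doc>"""

# Assumed names of attributes whose values refer to other elements' ids.
REF_ATTRS = ("source", "target")


def check_ids(xml_text):
    """Return (duplicate ids, dangling references) for an annotated document."""
    root = ET.fromstring(xml_text)
    # Collect every id attribute in document order, including on the root.
    ids = [el.get("id") for el in root.iter() if el.get("id") is not None]
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    known = set(ids)
    dangling = []
    for el in root.iter():
        for attr in REF_ATTRS:
            ref = el.get(attr)
            if ref is not None and ref not in known:
                dangling.append((el.tag, attr, ref))
    return duplicates, dangling
```

For `SAMPLE`, both lists come back empty; a document with two PLACEs sharing `id="p1"` and a LINK whose `target` names a missing id would report both problems.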

From: Hitzeman, J. M. <hi...@mi...> - 2007-04-19 15:55:51

In annotating using SpatialML, I have found there to be a natural
difference between LINKs and PATHs. LINKs are used to pinpoint a PLACE
while PATHs are used to give directions on how to get to a PLACE. The
signals for a LINK, when present, are never more complicated than the
possible linkTypes. [Boston, MA] is the same as [Boston in MA] and the
relationship is clearly indicated by IN. Similarly, [Boston (42.358°N
71.060°W)] indicates an EQ relationship between Boston and its latlong;
the signals, if any, are the two parens. In contrast, the PATH
indicates a relationship between two distinct places, two pins on a
map. The signals have more semantic content than the finite list of
linkTypes. If I tell you that I live in the house with the blue door
across from the Commonwealth Pool, I've LINKed my house with the house
with the blue door (one pin) and I've given you a PATH between the Pool
(pin#1) and the house (pin#2) so that if you can find the pool you have
instructions on how to find my house.
PATHs are particularly useful if you're looking for a PLACE that
doesn't have coordinates in any gazetteer, e.g., [We put down anchor
five miles off of the port of Leith.] If you can find the latlong for
Leith, you can guess at the latlong for our location. A LINK gives a
much greater search space, e.g., [We put down anchor in the Waters of
Leith.]
Janet
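Janet's point that a PATH lets a reader estimate an otherwise ungazetteered location from a known landmark can be made concrete with a small Python sketch. It is illustrative only and rests on several assumptions: the degree-and-hemisphere latlong form from the Boston example (SpatialML has not yet fixed a latlong syntax), a spherical Earth, the sixteen compass points mentioned later in the thread as PATH directions, and a hypothetical PATH reading "five miles NE of Boston". None of the function names come from SpatialML or its tools.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a sphere is only an approximation
KM_PER_MILE = 1.609344

# The sixteen compass points, clockwise from north, 22.5 degrees apart.
COMPASS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

# Assumed latlong form, e.g. "42.358°N 71.060°W".
LATLONG_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)°([NS])\s+(\d+(?:\.\d+)?)°([EW])\s*$")


def parse_latlong(text):
    """Parse e.g. '42.358°N 71.060°W' into signed decimal degrees (lat, lon)."""
    m = LATLONG_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized latlong: {text!r}")
    lat = float(m.group(1)) * (1 if m.group(2) == "N" else -1)
    lon = float(m.group(3)) * (1 if m.group(4) == "E" else -1)
    return lat, lon


def compass_to_bearing(point):
    """Map a compass point such as 'NE' to degrees clockwise from north."""
    return COMPASS.index(point) * 22.5


def destination(lat, lon, bearing_deg, distance_km):
    """Great-circle destination from a start point, bearing and distance."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    # Normalize longitude into [-180, 180).
    lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0
    return math.degrees(phi2), lon2
```

Usage: `destination(*parse_latlong("42.358°N 71.060°W"), compass_to_bearing("NE"), 5 * KM_PER_MILE)` gives an estimated point for the hypothetical PATH. The Leith sentence supplies a distance but no direction, so from it alone one could only derive a ring of candidate positions around the port, which is still a much smaller search space than "the Waters of Leith".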
-----Original Message-----
From: spa...@li...
[mailto:spa...@li...] On Behalf
Of Greg Janée
Sent: Wednesday, April 18, 2007 3:27 PM
To: spa...@li...
Subject: [Spatialml-discussion] some comments
2. PLACEs identify (by surrounding with XML tags) relevant portions
of the document. PATHs do, too, via SIGNALs. But LINKs don't.
Perhaps, by symmetry, they should? For example, a LINK could surround
the relevant preposition ("in") or punctuation (",").
[snip]

From: Justin R. <jr...@mi...> - 2007-04-19 15:23:24

A few comments and clarifications that I wanted to make here as well.

> 1. A gazetteer reference is made using the gazref attribute, which
> takes the form prefix:identifier. Have you considered making such
> references href-style URLs? For then a client could automatically
> follow a gazref link and retrieve the associated place information.
> I realize that gazetteer record formats are not yet standardized, nor
> is the identification of places by URIs, but this is a use case that
> could help drive such standardization. As SpatialML stands now, a
> client needs three document-external pieces of information to follow
> a gazref: the location of the gazetteer being referred to; the
> protocol for accessing that gazetteer; and the format of that
> gazetteer's records. (Also, see comment #6 below).
>
> **This is an excellent point. XML Schemas should be made use of in the
> guidelines, to facilitate such an integration.

We're basically using URNs here, which note both where to get the info
and what to ask for, though they say nothing about how to get there,
how to ask, or what the answer will look like. Maybe we could further
specify some type of URI scheme, such as "gaz:igdb:12345", in addition
to allowing full URLs like "http://igdb.foo/getrecord?r=12345". I'm of
the opinion that a URL-fronted interface shouldn't be a requirement.
Using a URN allows for more flexibility in defining the accessor
methods of a particular gazetteer. I do agree that it's something to
aspire to, just not something to require. And even if we were to
require a URL accessor in front of each gazetteer, doing so would still
say nothing about the format of the returned result. It would cut down
on the external information needed to make full use of gazRefs, but it
wouldn't eliminate it entirely.

> 3. Seems like the document should at least suggest a format for
> latitude/longitude coordinates.
>
> **Yes, certainly, and it's in the works, as Section 24 suggests.

Most definitely. The syntax of what are currently plain-text fields
(such as latLong and gazRef) is something that we're working out at
the moment. We are also considering moving it to a more general
"location" field and allowing specification in other coordinate
systems (such as UTM). Comments and suggestions are welcome!

> 4. Why is PLACE's id attribute required when no other attribute is?
>
> **That's because the annotation editing/authoring tool being used here
> to annotate SpatialML (Callisto) requires that.

Actually, that's not quite true. Callisto doesn't require id attributes
to be set, but they're still a really good idea anyway. The id
attribute is required here because that's the only way that a PLACE can
be referenced by a PATH or LINK. Since SpatialML is a language with a
fairly flat syntactic structure, we need document-global unique
referents such as this in order to tie anything together. Further, it's
our position that every element SHOULD have an id, since an external
language might, say, string multiple PATHs and LINKs together. (You'll
note that it's a #REQUIRED attribute on PATH, LINK, and SIGNAL as well
as PLACE.) Maybe we need to say this a bit stronger in the guideline
documents? Regardless, the other attributes are optional because
sometimes the only thing that you can say is "This is a place" with no
further details.

> 6. If gazetteers are identified by prefixes, consider using the XML
> namespace mechanism to correlate prefixes with gazetteer URLs. This
> technique is used in XML Schema, for example.

See the note above about URN vs. URL, but I do think that moving to a
full W3C XML Schema will give us more expressive control than our
current DTD allows.

> 7. Consider rephrasing sentences having "we" in them (e.g., "we try
> to keep the extents as small as possible...") to passive requirements
> ("guideline: extents should be kept as small as possible...").
>
> **Thanks -- will certainly do that. The first-person plural doesn't
> sound very professional.

But by the same token, excessive use of the passive voice is tiresome.
I'm not sure what the general rule of thumb here is, but I do recall
most standard description documents sounding dry and stuffy. :)

> 8. I don't understand the distinction between cities, towns, and
> villages in CTV. Should I?
>
> ***CTV is rather like a constrained version of the description
> attribute on a PLACE tag. If the text characterizes a place as being
> one of these, e.g., "village of Upper Slaughter" or "town of Chipping
> Campden", then the guidelines say it should be marked as such (i.e.,
> CTV="VILLAGE" and CTV="TOWN"). Rather than attempt to decompose these
> fuzzy concepts further in terms of gazetteer features, it may be
> desirable to record these characterizations 'at face value' in case
> the information is useful downstream. But it does seem somewhat
> awkward, not sure exactly why.

Further, it's kind of a first attempt at a notion of *scale* within
gazetteer feature types. Maybe we should move on to a more general
"scale" attribute, with defined ranges for each allowable PLACE type?

> Minor points:
>
> 9. Some codes are abbreviated (RGN), some are spelled out
> (BODYOFWATER). I take it some were copied, but I guess I would
> strive for consistency here.
>
> **OK.
>
> 10. More generally, the use of abbreviations counters XML's goal and
> virtue of self-documentation. Coded abbreviations like mod="BR" are
> already inscrutable to me, and I read the spec all of 5 minutes ago.
> Why not spell it out, i.e., modifier="BORDER"?
>
> **OK.

I agree, though I think that the country codes should remain a copy of
the ISO 3166 standard. I would also say that the sixteen compass points
allowable for different attributes (N, NNE, NE, ENE, etc.) could
reasonably stay abbreviated without causing overmuch confusion. Apart
from this small subset, I think things should be spelled out.

Thank you for your great feedback.

-- Justin
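To make the URN-vs-URL trade-off concrete, here is a minimal Python sketch of a client resolving the three gazref shapes mentioned in the thread: plain `prefix:identifier`, the suggested `gaz:igdb:12345` scheme, and a full `http://igdb.foo/getrecord?r=12345` URL. The prefix-to-URL registry is a hypothetical stand-in for the document-external knowledge a client needs; the `igdb.foo` template reuses the thread's example and is not a real service. Whichever form is used, the format of the returned record is still left unspecified.

```python
from urllib.parse import urlparse

# Hypothetical registry: the document-external knowledge a client needs
# when gazrefs are bare prefix:identifier pairs.
GAZETTEER_URL_TEMPLATES = {
    "igdb": "http://igdb.foo/getrecord?r={ident}",
}


def resolve_gazref(gazref, templates=GAZETTEER_URL_TEMPLATES):
    """Turn a gazref into a fetchable URL, or raise if it cannot be resolved."""
    parsed = urlparse(gazref)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Already an href-style URL: the location is self-describing.
        return gazref
    parts = gazref.split(":")
    if len(parts) == 3 and parts[0] == "gaz":
        prefix, ident = parts[1], parts[2]  # "gaz:igdb:12345" style
    elif len(parts) == 2:
        prefix, ident = parts  # plain "igdb:12345" style
    else:
        raise ValueError(f"unrecognized gazref: {gazref!r}")
    try:
        return templates[prefix].format(ident=ident)
    except KeyError:
        raise LookupError(
            f"no accessor registered for gazetteer {prefix!r}") from None
```

Both `resolve_gazref("igdb:12345")` and `resolve_gazref("gaz:igdb:12345")` map to the same URL through the registry, while an unregistered prefix fails with `LookupError`. That failure is exactly the case a URL-form gazref avoids.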

From: Mani, I. <im...@mi...> - 2007-04-19 14:05:06

Apologies for the resend, but some folks didn't receive these two
messages, for reasons yet to be determined.
-----Original Message-----
From: Mani, Inderjeet
Sent: Wednesday, April 18, 2007 9:11 PM
To: 'spa...@li...'
Subject: RE: [Spatialml-discussion] some comments
Greg, thanks for your insightful comments! Some responses below (see
**).
-----Original Message-----
From: spa...@li...
[mailto:spa...@li...] On Behalf
Of Greg Janée
Sent: Wednesday, April 18, 2007 3:27 PM
To: spa...@li...
Subject: [Spatialml-discussion] some comments
Generally, this looks good to me.
**Good to hear that.
Comments below:
Content-related comments:
1. A gazetteer reference is made using the gazref attribute, which
takes the form prefix:identifier. Have you considered making such
references href-style URLs? For then a client could automatically
follow a gazref link and retrieve the associated place information.
I realize that gazetteer record formats are not yet standardized, nor
is the identification of places by URIs, but this is a use case that
could help drive such standardization. As SpatialML stands now, a
client needs three document-external pieces of information to follow
a gazref: the location of the gazetteer being referred to; the
protocol for accessing that gazetteer; and the format of that
gazetteer's records. (Also, see comment #6 below).
**This is an excellent point. XML Schemas should be made use of in the
guidelines, to facilitate such an integration.
2. PLACEs identify (by surrounding with XML tags) relevant portions
of the document. PATHs do, too, via SIGNALs. But LINKs don't.
Perhaps, by symmetry, they should? For example, a LINK could
surround the relevant preposition ("in") or punctuation (",").
**Both PATHs and LINKs are non-text-consuming. SIGNALs are textual
indicators motivating a particular type of attribute, e.g., direction,
distance (for PATHs). The indicators here are usually explicitly
provided in the text, e.g., "30 miles", "on top of", etc. Do LINKs
merit SIGNALs? From cases like "near", "in", etc., one might think so.
However, some indicators may be expressed in certain languages just by
punctuation, e.g., "Bedford, MA". Further, we've included equality,
which can be expressed anaphorically, without an explicit signal,
e.g., "Baltimore ... The city ...". Worth thinking about more.
3. Seems like the document should at least suggest a format for
latitude/longitude coordinates.
**Yes, certainly, and it's in the works, as Section 24 suggests.
4. Why is PLACE's id attribute required when no other attribute is?
**That's because the annotation editing/authoring tool being used here
to annotate SpatialML (Callisto) requires that.
XML comments:
5. XML elements should be defined in a namespace, so that they can be
embedded in namespace-aware documents.
**OK.
6. If gazetteers are identified by prefixes, consider using the XML
namespace mechanism to correlate prefixes with gazetteer URLs. This
technique is used in XML Schema, for example.
Exposition comments:
7. Consider rephrasing sentences having "we" in them (e.g., "we try
to keep the extents as small as possible...") to passive requirements
("guideline: extents should be kept as small as possible...").
**Thanks -- will certainly do that. The first-person plural doesn't
sound very professional.
As it is, the spec reads a little like a project-specific document
instead of a community standard. This is also an opportunity to revisit
some of those statements: are they really requirements? Or guidelines?
Best practices that everybody should follow? MITRE- or
project-specific? Etc.
**One important requirement of an annotation scheme for natural
language is that it should be effective for humans to annotate. In
particular, people should be able to annotate documents according to
the scheme with high inter-annotator reliability (e.g., as measured by
an automatic scoring program). One of the benefits of doing this is
increased sharing of data and resources, and common evaluation
standards. To satisfy this requirement, annotation guidelines have to
be more like requirements, stated as rules, allowing for less slack,
enforcing obedience, etc. Since we're dealing with natural language,
however, it's hard to specify necessary and sufficient conditions for
use of a particular annotation. Thus, we call them guidelines, as the
NLP community has traditionally done. [To pontificate further -- human
judgment always enters into this process, and that's often a good
thing. There is not yet a scientific method for developing guidelines,
or for training humans to annotate documents, but there are best
practices that can be and are usually/often followed. Once the data is
annotated, however, there are some basic empirical methods that are
used for training machines to produce the annotations.]
8. I don't understand the distinction between cities, towns, and
villages in CTV. Should I?
***CTV is rather like a constrained version of the description
attribute on a PLACE tag. If the text characterizes a place as being
one of these, e.g., "village of Upper Slaughter" or "town of Chipping
Campden", then the guidelines say it should be marked as such (i.e.,
CTV="VILLAGE" and CTV="TOWN"). Rather than attempt to decompose these
fuzzy concepts further in terms of gazetteer features, it may be
desirable to record these characterizations 'at face value' in case
the information is useful downstream. But it does seem somewhat
awkward, not sure exactly why.
Minor points:
9. Some codes are abbreviated (RGN), some are spelled out
(BODYOFWATER). I take it some were copied, but I guess I would
strive for consistency here.
**OK.
10. More generally, the use of abbreviations counters XML's goal and
virtue of self-documentation. Coded abbreviations like mod="BR" are
already inscrutable to me, and I read the spec all of 5 minutes ago.
Why not spell it out, i.e., modifier="BORDER"?
**OK.
-Greg
**Thanks again for the detailed comments!
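The reply above leaves the "automatic scoring program" unspecified. As a hedged illustration only, here is a minimal Python sketch of one common way to score agreement between two annotators on tagged extents: treat one annotator as the key and the other as the response, and compute exact-match precision, recall, and F1. The `(start, end, tag)` extent representation and the function name are assumptions. Real scorers may also compare attribute values or give partial credit for overlapping extents.

```python
def extent_agreement(key, response):
    """Exact-match precision, recall and F1 between two sets of extents.

    Each extent is a (start, end, tag) tuple of character offsets. One
    annotator is treated as the key and the other as the response. Empty
    inputs score 0.0 rather than raising ZeroDivisionError.
    """
    key, response = set(key), set(response)
    matched = len(key & response)  # extents both annotators marked identically
    precision = matched / len(response) if response else 0.0
    recall = matched / len(key) if key else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Because F1 is symmetric in precision and recall, swapping which annotator counts as the key leaves the F1 score unchanged. That property makes F1 a reasonable single agreement figure when neither annotator is privileged.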
_______________________________________________
Spatialml-discussion mailing list
Spa...@li...
https://lists.sourceforge.net/lists/listinfo/spatialml-discussion

From: Mani, I. <im...@mi...> - 2007-04-19 01:10:53

Greg, thanks for your insightful comments! Some responses below (see
**).
-----Original Message-----
From: spa...@li...
[mailto:spa...@li...] On Behalf
Of Greg Janée
Sent: Wednesday, April 18, 2007 3:27 PM
To: spa...@li...
Subject: [Spatialml-discussion] some comments
Generally, this looks good to me.
**Good to hear that.
Comments below:
Content-related comments:
1. A gazetteer reference is made using the gazref attribute, which
takes the form prefix:identifier. Have you considered making such
references href-style URLs? For then a client could automatically
follow a gazref link and retrieve the associated place information.
I realize that gazetteer record formats are not yet standardized, nor
is the identification of places by URIs, but this is a use case that
could help drive such standardization. As SpatialML stands now, a
client needs three document-external pieces of information to follow
a gazref: the location of the gazetteer being referred to; the
protocol for accessing that gazetteer; and the format of that
gazetteer's records. (Also, see comment #6 below).
**This is an excellent point. XML Schemas should be made use of in the
guidelines, to facilitate such an integration.
2. PLACEs identify (by surrounding with XML tags) relevant portions
of the document. PATHs do, too, via SIGNALs. But LINKs don't.
Perhaps, by symmetry, they should? For example, a LINK could
surround the relevant preposition ("in") or punctuation (",").
**Both PATHs and LINKs are non-text-consuming. SIGNALs are textual
indicators motivating a particular type of attribute, e.g., direction,
distance (for PATHs). The indicators here are usually explicitly
provided in the text, e.g., "30 miles", "on top of", etc. Do LINKs
merit SIGNALs? From cases like "near", "in", etc., one might think so.
However, some indicators may be expressed in certain languages just by
punctuation, e.g., "Bedford, MA". Further, we've included equality,
which can be expressed anaphorically, without an explicit signal,
e.g., "Baltimore ... The city ...". Worth thinking about more.
3. Seems like the document should at least suggest a format for
latitude/longitude coordinates.
**Yes, certainly, and it's in the works, as Section 24 suggests.
4. Why is PLACE's id attribute required when no other attribute is?
**That's because the annotation editing/authoring tool being used here
to annotate SpatialML (Callisto) requires that.
XML comments:
5. XML elements should be defined in a namespace, so that they can be
embedded in namespace-aware documents.
**OK.
6. If gazetteers are identified by prefixes, consider using the XML
namespace mechanism to correlate prefixes with gazetteer URLs. This
technique is used in XML Schema, for example.
Exposition comments:
7. Consider rephrasing sentences having "we" in them (e.g., "we try
to keep the extents as small as possible...") to passive requirements
("guideline: extents should be kept as small as possible...").
**Thanks -- will certainly do that. The first-person plural doesn't
sound very professional.
As it is, the spec reads a little like a project-specific document
instead of a community standard. This is also an opportunity to revisit
some of those statements: are they really requirements? Or guidelines?
Best practices that everybody should follow? MITRE- or
project-specific? Etc.
**One important requirement of an annotation scheme for natural
language is that it should be effective for humans to annotate. In
particular, people should be able to annotate documents according to
the scheme with high inter-annotator reliability (e.g., as measured by
an automatic scoring program). One of the benefits of doing this is
increased sharing of data and resources, and common evaluation
standards. To satisfy this requirement, annotation guidelines have to
be more like requirements, stated as rules, allowing for less slack,
enforcing obedience, etc. Since we're dealing with natural language,
however, it's hard to specify necessary and sufficient conditions for
use of a particular annotation. Thus, we call them guidelines, as the
NLP community has traditionally done. [To pontificate further -- human
judgment always enters into this process, and that's often a good
thing. There is not yet a scientific method for developing guidelines,
or for training humans to annotate documents, but there are best
practices that can be and are usually/often followed. Once the data is
annotated, however, there are some basic empirical methods that are
used for training machines to produce the annotations.]
8. I don't understand the distinction between cities, towns, and
villages in CTV. Should I?
***CTV is rather like a constrained version of the description
attribute on a PLACE tag. If the text characterizes a place as being
one of these, e.g., "village of Upper Slaughter" or "town of Chipping
Campden", then the guidelines say it should be marked as such (i.e.,
CTV="VILLAGE" and CTV="TOWN"). Rather than attempt to decompose these
fuzzy concepts further in terms of gazetteer features, it may be
desirable to record these characterizations 'at face value' in case
the information is useful downstream. But it does seem somewhat
awkward, not sure exactly why.
Minor points:
9. Some codes are abbreviated (RGN), some are spelled out
(BODYOFWATER). I take it some were copied, but I guess I would
strive for consistency here.
**OK.
10. More generally, the use of abbreviations counters XML's goal and
virtue of self-documentation. Coded abbreviations like mod="BR" are
already inscrutable to me, and I read the spec all of 5 minutes ago.
Why not spell it out, i.e., modifier="BORDER"?
**OK.
-Greg
**Thanks again for the detailed comments!

From: <gj...@al...> - 2007-04-18 19:28:19

Generally, this looks good to me. Comments below:
Content-related comments:
1. A gazetteer reference is made using the gazref attribute, which
takes the form prefix:identifier. Have you considered making such
references href-style URLs? For then a client could automatically
follow a gazref link and retrieve the associated place information.
I realize that gazetteer record formats are not yet standardized, nor
is the identification of places by URIs, but this is a use case that
could help drive such standardization. As SpatialML stands now, a
client needs three document-external pieces of information to follow
a gazref: the location of the gazetteer being referred to; the
protocol for accessing that gazetteer; and the format of that
gazetteer's records. (Also, see comment #6 below).
2. PLACEs identify (by surrounding with XML tags) relevant portions
of the document. PATHs do, too, via SIGNALs. But LINKs don't.
Perhaps, by symmetry, they should? For example, a LINK could
surround the relevant preposition ("in") or punctuation (",").
3. Seems like the document should at least suggest a format for
latitude/longitude coordinates.
4. Why is PLACE's id attribute required when no other attribute is?
XML comments:
5. XML elements should be defined in a namespace, so that they can be
embedded in namespace-aware documents.
6. If gazetteers are identified by prefixes, consider using the XML
namespace mechanism to correlate prefixes with gazetteer URLs. This
technique is used in XML Schema, for example.
Exposition comments:
7. Consider rephrasing sentences having "we" in them (e.g., "we try
to keep the extents as small as possible...") to passive requirements
("guideline: extents should be kept as small as possible..."). As it
is, the spec reads a little like a project-specific document instead
of a community standard. This is also an opportunity to revisit some
of those statements: are they really requirements? Or guidelines?
Best practices that everybody should follow? MITRE- or project-
specific? Etc.
8. I don't understand the distinction between cities, towns, and
villages in CTV. Should I?
Minor points:
9. Some codes are abbreviated (RGN), some are spelled out
(BODYOFWATER). I take it some were copied, but I guess I would
strive for consistency here.
10. More generally, the use of abbreviations counters XML's goal and
virtue of self-documentation. Coded abbreviations like mod="BR" are
already inscrutable to me, and I read the spec all of 5 minutes ago.
Why not spell it out, i.e., modifier="BORDER"?
-Greg
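Greg's items 5 and 6 can be combined into one illustration: SpatialML elements placed in a default namespace, and an `xmlns:` declaration used to bind a gazetteer prefix to a base URL, so that `gazref="igdb:12345"` can be expanded by the client without outside knowledge. This minimal Python sketch rests on stated assumptions. Both namespace URIs are hypothetical (the `igdb.foo` base reuses the example from elsewhere in the thread), and the sketch treats every prefix declaration as document-global. A faithful implementation would track prefix scope per element, as XML Schema processors do for QName-valued attributes.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, for illustration only.
SML = "http://example.org/spatialml"
DOC = """<doc xmlns="http://example.org/spatialml"
     xmlns:igdb="http://igdb.foo/getrecord?r=">
  <PLACE id="p1" gazref="igdb:12345">Boston</PLACE>
</doc>"""


def resolve_gazrefs(xml_text):
    """Map each PLACE id to a gazetteer URL built from xmlns prefix bindings."""
    prefixes = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = item  # "" is the default-namespace prefix
            prefixes[prefix] = uri  # simplification: ignores declaration scope
        else:
            root = item  # the final "end" event belongs to the root element
    resolved = {}
    # Namespaced elements are matched with ElementTree's {uri}local syntax.
    for place in root.iter(f"{{{SML}}}PLACE"):
        gazref = place.get("gazref")
        if gazref and ":" in gazref:
            prefix, ident = gazref.split(":", 1)
            if prefix in prefixes:
                resolved[place.get("id")] = prefixes[prefix] + ident
    return resolved
```

Here `resolve_gazrefs(DOC)` maps `p1` to `http://igdb.foo/getrecord?r=12345`. As noted elsewhere in the thread, this locates the record but still says nothing about its format.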

From: Mani, I. <im...@mi...> - 2007-04-16 12:55:44

From im...@mi... -- ignore.

From: Discussion of S. G. <spa...@li...> - 2007-04-16 12:50:03

from inderjeet at his gmail account.

From: Discussion of S. G. <spa...@li...> - 2007-04-16 12:47:55

From im...@mi....

From: Discussion of S. G. <spa...@li...> - 2007-04-16 12:23:50

This is just a test.

From: Discussion of S. G. <spa...@li...> - 2007-04-16 00:26:18

Initiating the discussion...

From: Discussion of S. G. <spa...@li...> - 2007-04-15 23:26:31

Welcome to the SpatialML discussion list!

SpatialML is a markup language for representing spatial expressions in
natural language documents. The goal is to allow for potentially better
integration of text collections with resources such as databases that
provide spatial information about a domain, including gazetteers,
physical feature databases, mapping services, etc. A draft version of
the first set of SpatialML Guidelines (version 1.0) is available at:
http://sourceforge.net/projects/spatialml

We expect that subsequent releases will incorporate feedback from many
others in the research and development community. As the guidelines
mature, we will be providing additional links for resources related to
SpatialML, including annotation editors, annotated data, and automatic
taggers.

Best wishes,
Inderjeet.