#330 Introduce standard attributes to refer to ISOCat

Milestone: GREEN
Status: closed-accepted
Owner: Piotr Banski
Labels: None
Priority: 5
Updated: 2014-08-20
Created: 2011-11-03
Creator: Laurent Romary
Private: No

ISO 12620:2009 is a standard describing the data model and procedures for a Data Category Registry (DCR). Data categories are defined as elementary descriptors in a linguistic structure. In the DCR data model, each data category is assigned a unique Persistent IDentifier (PID), i.e., a URI. Linguistic resources, or preferably their schemas, that make use of data categories from a DCR should refer to them using this PID. For XML-based resources, like TEI documents, the normative Annex A of ISO 12620:2009 gives a small Data Category Reference XML vocabulary (also available online at http://www.isocat.org/12620/), which provides two attributes: dcr:datcat and dcr:valueDatcat. The following TEI example illustrates its usage in a TEI feature (structure):

<tei:TEI xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:dcr="http://www.isocat.org/ns/dcr">
  ...
  <tei:fs>
    ...
    <tei:f
      name="part of speech"
      dcr:datcat="http://www.isocat.org/datcat/DC-1345"
      fVal="common noun"
      dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"
    />
    ...
  </tei:fs>
  ...
</tei:TEI>

In this example, @dcr:datcat relates the feature name to a /partOfSpeech/ data category, and @dcr:valueDatcat relates the feature value to a /commonNoun/ data category. Both data categories reside in the ISOcat DCR at www.isocat.org, which is the DCR in use by ISO TC37 and is hosted by its registration authority, the MPI for Psycholinguistics. The given example currently results in an invalid TEI document; the proposal is to remedy that by adding the dcr:datcat and dcr:valueDatcat attributes to the TEI global attribute list. This would allow references to the data categories in use from any place in a TEI document.
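To make the intended use concrete, here is a minimal Python sketch of how a consumer might collect the data category references from such a document. It is an illustration only: the function name `datcat_references` and the `SAMPLE` string (the example above with the elided parts dropped) are assumptions made for this sketch, not part of the proposal or of any ISOcat tooling.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example in this ticket.
DCR_NS = "http://www.isocat.org/ns/dcr"

# The ticket's example, with the "..." placeholders removed so it parses.
SAMPLE = """<tei:TEI xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:dcr="http://www.isocat.org/ns/dcr">
  <tei:fs>
    <tei:f name="part of speech"
           dcr:datcat="http://www.isocat.org/datcat/DC-1345"
           fVal="common noun"
           dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
  </tei:fs>
</tei:TEI>"""


def datcat_references(xml_text):
    """Return (element local name, dcr attribute, PID) triples in document order.

    Hypothetical helper: it only looks for the two attributes named in the
    ticket, dcr:datcat and dcr:valueDatcat.
    """
    found = []
    for elem in ET.fromstring(xml_text).iter():
        for attr in ("datcat", "valueDatcat"):
            # ElementTree exposes namespaced attributes as "{uri}local".
            pid = elem.get(f"{{{DCR_NS}}}{attr}")
            if pid is not None:
                local_name = elem.tag.split("}")[-1]
                found.append((local_name, attr, pid))
    return found


for reference in datcat_references(SAMPLE):
    print(reference)
```

Because the attributes live in their own namespace, a reader like this does not depend on which TEI elements are allowed to carry them, which is the question the discussion below turns to.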

Discussion

  • Piotr Banski
    2011-11-03

    Yes, please... That would increase the TEI's attractiveness to language-resource peeps.

    And it opens the way to nicely handle tagUsage, too, at least with respect to dcr:datcat (I'm thinking of my TEI-MM-2010 presentation on that, and the discussion that we had afterwards).

     
  • stuart yeates
    2011-11-04

    I've been struggling to find a good introduction to ISO 12620:2009 / Data Category Registry.

    If there's no decent overview, I may end up writing a Wikipedia page for this, like I did for genetic editing, but I'd prefer not to, since it's harder than it looks.

     
  • Piotr Banski
    2011-11-07

    • assigned_to: nobody --> bansp
     
  • Piotr Banski
    2011-11-07

    • status: open --> open-accepted
     
  • Laurent Romary
    2011-11-08

    • milestone: --> 871209
     
  • Piotr Banski
    2011-11-09

    • status: open-accepted --> open
     
  • Lou Burnard
    2011-11-09

    • status: open --> pending
     
  • Lou Burnard
    2011-11-09

    The proposal is to define a new attribute class, att.datcat, which will provide the dcr attributes. Initial members will be <gram> and the members of model.gramPart.

     
  • Lou Burnard
    2012-03-26

    I'm changing the status of this ticket so that it doesn't disappear. Piotr, please implement, or re-assign the ticket to someone else (e.g. me) for implementation!

     
  • Lou Burnard
    2012-03-26

    • milestone: 871209 --> GREEN
    • status: pending --> open
     
  • Piotr Banski
    2012-03-26

    I'll do my best to implement it this week. Sorry about the horrible delay.

     
  • James Cummings
    2012-04-13

    Setting 'Resolution' to be 'accepted' since consensus is to implement this.
    -James

     
  • James Cummings
    2012-04-13

    • status: open --> open-accepted
     
  • Piotr Banski
    2012-04-15

    Thanks, James. "This week" got somewhat extended here, but I hope to implement this tomorrow after I join the hack session (I'd prefer to make sure I'm not a pain in Mr. Jenkins's build process).

     
  • Piotr Banski
    2012-04-16

    The prototype, thankfully, got corrected by Sebastian, but what's still missing is placing this in the text of the Guidelines. Historically, the DI chapter might be its primary house, but that actually feels a bit random -- it probably needs to be mentioned at least in the chapter on corpora, and also the header, analysis, feature structures and ODD chapters qualify. Doesn't that suggest a separate section somewhere, with references from the other places (to keep the documentation manageable)?

     
  • Piotr Banski
    2012-04-22

    At this point, the membership in this class is as follows: att.lexicographic and att.segLike as well as the FSR elements: fs, f, binary, numeric, string, symbol.
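As a rough illustration of what class membership means in practice, the following Python sketch flags dcr attributes that appear on elements outside the class. It is an assumption-laden sketch, not the schema's actual validation: only the six FSR elements named above are hard-coded, the members of att.lexicographic and att.segLike are not enumerated in this thread and must be supplied by the caller, and the names `misplaced_dcr_attributes`, `extra_members`, and `DOC` are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
DCR_NS = "http://www.isocat.org/ns/dcr"

# FSR elements listed in this ticket as direct members of att.datcat.
FSR_MEMBERS = {"fs", "f", "binary", "numeric", "string", "symbol"}


def misplaced_dcr_attributes(xml_text, extra_members=frozenset()):
    """Return (element, attribute) pairs where a dcr:* attribute sits on an
    element that is not in the allowed set.

    extra_members stands in for the members of att.lexicographic and
    att.segLike, which this sketch does not know.
    """
    allowed = FSR_MEMBERS | set(extra_members)
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        local_name = elem.tag.split("}")[-1]
        for attr_name in elem.attrib:
            if attr_name.startswith(f"{{{DCR_NS}}}") and local_name not in allowed:
                problems.append((local_name, attr_name.split("}")[-1]))
    return problems


# Illustrative document: <f> and <symbol> are class members, <p> is not.
DOC = f"""<TEI xmlns="{TEI_NS}" xmlns:dcr="{DCR_NS}">
  <fs>
    <f name="pos" dcr:datcat="http://www.isocat.org/datcat/DC-1345">
      <symbol value="commonNoun" dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
    </f>
  </fs>
  <p dcr:datcat="http://www.isocat.org/datcat/DC-1345">text</p>
</TEI>"""

print(misplaced_dcr_attributes(DOC))
```

Under a global solution, the allowed set would simply be every TEI element and a check like this would never report anything.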

     
  • Piotr Banski
    2012-04-25

    (Sigh,) and the <equiv> element, which would be the natural locus of DCR references in the schema, already uses @uri; I will add it to att.datcat as well, because it seems a much better choice than to add several tagdocs elements there instead. Discussion, as always, is welcome.

     
  • Piotr Banski
    2012-04-25

    An equivalent of all these additions (which originate from the attempt to keep it minimal, I guess in the spirit of Lou's remark of 2011-11-09) would be to follow Laurent's suggestion of making att.global a member of att.datcat and thus making datcat attributes available everywhere. I sense, however, that that might be considered too bold a move, eh?

     
  • Piotr Banski
    2012-06-17

    * equiv is not a member of this class, thanks to an enlightening discussion that we had on the Council list some time ago
    * Laurent's suggestion for a global solution is not implemented; the datcat class currently ranges over att.lexicographic, att.segLike, f, fs, binary, numeric, string, and symbol.
    * chapters 9 (DI) and 15 (FS) have some new prose and examples (finished with rev. 10529).

    While I have a feeling that this is not the end of the datcat story, I believe that this ticket may be closed.

     
  • Lou Burnard
    2012-09-12

    • status: open-accepted --> closed-accepted
     
  • Lou Burnard
    2012-09-12

    Closing this as it seems to have been completed.

     