#330 Introduce standard attributes to refer to ISOCat

GREEN
closed-accepted
None
5
2014-08-20
2011-11-03
No

ISO 12620:2009 is a standard describing the data model and procedures for a Data Category Registry (DCR). Data categories are defined as elementary descriptors in a linguistic structure. In the DCR data model each data category is assigned a unique Persistent IDentifier (PID), i.e., a URI. Linguistic resources, or preferably their schemas, that make use of data categories from a DCR should refer to them using this PID. For XML-based resources, like TEI documents, ISO 12620:2009 normative Annex A gives a small Data Category Reference XML vocabulary (also available online at http://www.isocat.org/12620/) which provides two attributes, dcr:datcat and dcr:valueDatcat. The following example illustrates their usage in a TEI feature (structure):

<tei:TEI xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:dcr="http://www.isocat.org/ns/dcr">
  ...
  <tei:fs>
    ...
    <tei:f
      name="part of speech"
      dcr:datcat="http://www.isocat.org/datcat/DC-1345"
      fVal="common noun"
      dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"
    />
    ...
  </tei:fs>
  ...
</tei:TEI>

In this example @dcr:datcat relates the feature name to a /partOfSpeech/ data category and @dcr:valueDatcat relates the feature value to a /commonNoun/ data category. Both of these data categories reside in the ISOcat DCR at www.isocat.org, which is the DCR in use by ISO TC37 and hosted by its registration authority, the MPI for Psycholinguistics. The given example currently results in an invalid TEI document, and the proposal is to remedy that by adding the dcr:datcat and dcr:valueDatcat attributes to the TEI global attribute list. This would allow referring to the data categories used from any place in a TEI document.
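
To make the role of the two attributes concrete, here is a minimal sketch of how a consumer of such a document might collect the data category PIDs. It assumes Python's standard xml.etree.ElementTree module and the two namespace URIs from the example above. The function name collect_datcat_refs and the SAMPLE string (the example with its "..." placeholders removed) are illustrative only and are not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example in this ticket.
DCR_NS = "http://www.isocat.org/ns/dcr"

# The ticket's example, with the "..." placeholders removed (illustrative only).
SAMPLE = """<tei:TEI xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:dcr="http://www.isocat.org/ns/dcr">
  <tei:fs>
    <tei:f
      name="part of speech"
      dcr:datcat="http://www.isocat.org/datcat/DC-1345"
      fVal="common noun"
      dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"
    />
  </tei:fs>
</tei:TEI>"""


def collect_datcat_refs(xml_text):
    """Return (local element name, dcr:datcat, dcr:valueDatcat) tuples.

    Hypothetical helper: one tuple per element that carries at least
    one of the two DCR attributes; a missing attribute is None.
    """
    root = ET.fromstring(xml_text)
    refs = []
    for elem in root.iter():
        # ElementTree exposes namespaced attributes as "{namespace-uri}local".
        datcat = elem.get(f"{{{DCR_NS}}}datcat")
        value_datcat = elem.get(f"{{{DCR_NS}}}valueDatcat")
        if datcat is not None or value_datcat is not None:
            local_name = elem.tag.split("}", 1)[-1]
            refs.append((local_name, datcat, value_datcat))
    return refs


if __name__ == "__main__":
    for name, datcat, value_datcat in collect_datcat_refs(SAMPLE):
        print(name, datcat, value_datcat)
```

Because the lookup is by namespace URI rather than by the dcr: prefix, the sketch does not depend on which prefix a given document binds to the DCR namespace.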

Discussion

  • Piotr Banski

    Piotr Banski - 2011-11-03

    Yes, please... That would increase the TEI's attractiveness to language-resource peeps.

    And it opens the way to nicely handle tagUsage, too, at least with respect to dcr:datcat (I'm thinking of my TEI-MM-2010 presentation on that, and the discussion that we had afterwards).

     
  • stuart yeates

    stuart yeates - 2011-11-04

    I've been struggling to find a good introduction to ISO 12620:2009 / Data Category Registry.

    If there's no decent overview I may end up writing a wikipedia page for this, like I did for genetic editing, but I'd prefer not to, since it's harder than it looks.

     
  • Piotr Banski

    Piotr Banski - 2011-11-07
    • assigned_to: nobody --> bansp
     
  • Piotr Banski

    Piotr Banski - 2011-11-07
    • status: open --> open-accepted
     
  • Laurent Romary

    Laurent Romary - 2011-11-08
    • milestone: --> 871209
     
  • Piotr Banski

    Piotr Banski - 2011-11-09
    • status: open-accepted --> open
     
  • Lou Burnard

    Lou Burnard - 2011-11-09
    • status: open --> pending
     
  • Lou Burnard

    Lou Burnard - 2011-11-09

    Proposal is to define a new attribute class, att.datcat, which will provide the dcr attributes. Initial members will be <gram> and the members of model.gramPart.

     
  • Lou Burnard

    Lou Burnard - 2012-03-26

    I'm changing status of this ticket so that it doesn't disappear. Piotr, please implement, or re-assign ticket to someone else (e.g. me) for implementation!

     
  • Lou Burnard

    Lou Burnard - 2012-03-26
    • milestone: 871209 --> GREEN
    • status: pending --> open
     
  • Piotr Banski

    Piotr Banski - 2012-03-26

    I'll do my best to implement it this week. Sorry about the horrible delay.

     
  • James Cummings

    James Cummings - 2012-04-13

    Setting 'Resolution' to be 'accepted' since consensus is to implement this.
    -James

     
  • James Cummings

    James Cummings - 2012-04-13
    • status: open --> open-accepted
     
  • Piotr Banski

    Piotr Banski - 2012-04-15

    Thanks, James. "This week" got somewhat extended here, but I hope to implement this tomorrow after I join the hack session (I'd prefer to make sure I'm not a pain in Mr. Jenkins's build process).

     
  • Piotr Banski

    Piotr Banski - 2012-04-16

    The prototype, thankfully, got corrected by Sebastian, but what's still missing is placing this in the text of the Guidelines. Historically, the DI chapter might be its primary house, but that actually feels a bit random -- it probably needs to be mentioned at least in the chapter on corpora, and also the header, analysis, feature structures and ODD chapters qualify. Doesn't that suggest a separate section somewhere, with references from the other places (to keep the documentation manageable)?

     
  • Piotr Banski

    Piotr Banski - 2012-04-22

    At this point, the membership in this class is as follows: att.lexicographic and att.segLike as well as the FSR elements: fs, f, binary, numeric, string, symbol.

     
  • Piotr Banski

    Piotr Banski - 2012-04-25

    (Sigh,) and the <equiv> element, which would be the natural locus of DCR references in the schema, already uses @uri; I will add it to att.datcat as well, because it seems a much better choice than to add several tagdocs elements there instead. Discussion, as always, is welcome.

     
  • Piotr Banski

    Piotr Banski - 2012-04-25

    An equivalent of all these additions (which originate from the attempt to keep it minimal, I guess in the spirit of Lou's remark of 2011-11-09) would be to follow Laurent's suggestion of making att.global a member of att.datcat and thus making datcat attributes available everywhere. I sense, however, that that might be considered too bold a move, eh?

     
  • Piotr Banski

    Piotr Banski - 2012-06-17

    * equiv is not a member of this class, thanks to an enlightening discussion that we had on the Council list some time ago
    * Laurent's suggestion for a global solution is not implemented, the datcat class currently ranges over att.lexicographic, att.segLike, f, fs, binary, numeric, string, and symbol.
    * chapters 9 (DI) and 15 (FS) have some new prose and examples (finished with rev. 10529).

    While I have a feeling that this is not the end of the datcat story, I believe that this ticket may be closed.

     
  • Lou Burnard

    Lou Burnard - 2012-09-12
    • status: open-accepted --> closed-accepted
     
  • Lou Burnard

    Lou Burnard - 2012-09-12

    Closing this as it seems to have been completed.

     
