
#496 `<taxonomy>` should be allowed as a child of `<category>`

AMBER
open
None
5(default)
2015-05-29
2014-01-31
No

A taxonomy naturally contains categories:

<taxonomy xml:id="docType">
  <desc>Document type</desc>
  <category xml:id="dtManuscript">
    <catDesc>Handwritten manuscript</catDesc>
  </category>
  <category xml:id="dtPrint">
    <catDesc>Printed document</catDesc>
  </category>
</taxonomy>

and it's also obvious that categories should be able to nest:

  <category xml:id="dtManuscript">
    <catDesc>Handwritten manuscript</catDesc>
    <category xml:id="dtLetter">
      <catDesc>Handwritten letter</catDesc>
    </category>
    <category xml:id="dtMemo">
      <catDesc>Handwritten memo</catDesc>
    </category>
  </category>

It's clear that you could assign a document to any of the subcategories, and this would imply membership of the parent category; so if a document has <catRef target="#dtLetter"/>, then it is by definition also "dtManuscript".

However, there are some areas in a nested tree like this which don't follow this pattern. Consider this:

  <category xml:id="dtPaperSize">
    <catDesc>The size of paper on which a manuscript document is written.</catDesc>
    <category xml:id="dtA4">
      <catDesc>A4 paper</catDesc>
    </category>
    <category xml:id="dtA5">
      <catDesc>A5 paper</catDesc>
    </category>
  </category>

Any document may belong to either of the child categories (even both, if it happens to include both paper sizes); so we would see e.g. <catRef target="#dtA4"/>, meaning that the document falls into the category of documents which consist wholly or partially of A4 paper. However, this does not make any claim as to the parent category; it makes no sense to say that a letter "is" or "has" a paper size, without specifying what that size is.

In other words, the "category" of paper size is not a category at all; it's a taxonomy. This issue comes up frequently in complex, nested taxonomies. It would be valuable to be able to create a structure like this:

<category xml:id="dtManuscript">
  <catDesc>Handwritten manuscript</catDesc>
  <category xml:id="dtLetter">
    <catDesc>Handwritten letter</catDesc>
  </category>
  <category xml:id="dtMemo">
    <catDesc>Handwritten memo</catDesc>
  </category>
  <taxonomy xml:id="dtPaperSize">
    <desc>The size of paper on which a manuscript document is written.</desc>
    <category xml:id="dtA4">
      <catDesc>A4 paper</catDesc>
    </category>
    <category xml:id="dtA5">
      <catDesc>A5 paper</catDesc>
    </category>
  </taxonomy>
</category>

I submit that <taxonomy> should be available as a child of <category>, to allow for such rich multi-layered taxonomies.
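To make the intended difference concrete, here is a minimal sketch in Python (standard library only) of how a processor might resolve implied membership against the proposed structure. It rests on an assumption that is only one possible reading of this proposal, not anything the TEI Guidelines define: membership propagates upwards through directly nested <category> elements and stops at the first <taxonomy> boundary. The function name `implied_categories` is hypothetical, and the TEI namespace is omitted, as in the samples above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The proposed structure from this ticket (TEI namespace omitted, as above).
SAMPLE = """
<category xml:id="dtManuscript">
  <catDesc>Handwritten manuscript</catDesc>
  <category xml:id="dtLetter"><catDesc>Handwritten letter</catDesc></category>
  <category xml:id="dtMemo"><catDesc>Handwritten memo</catDesc></category>
  <taxonomy xml:id="dtPaperSize">
    <desc>The size of paper on which a manuscript document is written.</desc>
    <category xml:id="dtA4"><catDesc>A4 paper</catDesc></category>
    <category xml:id="dtA5"><catDesc>A5 paper</catDesc></category>
  </taxonomy>
</category>
"""


def implied_categories(root, target):
    """Return the category ids implied by a catRef pointing at `target`.

    ASSUMPTION (not defined by TEI): membership propagates upwards through
    directly nested <category> elements and stops at a <taxonomy> boundary.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    wanted = target.lstrip("#")
    node = next(
        (el for el in root.iter("category") if el.get(XML_ID) == wanted), None
    )
    if node is None:
        raise KeyError(target)
    implied = []
    # Walk upwards while we are still inside an unbroken chain of categories.
    while node is not None and node.tag == "category":
        implied.append(node.get(XML_ID))
        node = parents.get(node)
    return implied


root = ET.fromstring(SAMPLE)
print(implied_categories(root, "#dtLetter"))  # ['dtLetter', 'dtManuscript']
print(implied_categories(root, "#dtA4"))      # ['dtA4']
```

Under that assumption, a letter is still a manuscript, but pointing at "dtA4" claims nothing about any enclosing category. If dtPaperSize were a <category> instead, the same walk would also yield "dtPaperSize" for an A4 document, which is the unwanted claim described above.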

Discussion

  • James Cummings

    James Cummings - 2014-05-19
    • assigned_to: Paul Schaffner
     
  • James Cummings

    James Cummings - 2014-05-19

    Assigning to Paul Schaffner to triage and report to Council with a proposal.

     
  • Syd Bauman

    Syd Bauman - 2014-07-01

    Martin to provide a better example — probably a trimmed-down version of the real problem.

     
  • Martin Holmes

    Martin Holmes - 2014-11-18
    <taxonomy xml:id="paperDescriptors">
    ...
    
    <category xml:id="paperSize">
      <catDesc>Various paper sizes</catDesc>
      <taxonomy xml:id="americanPaperSizes">
        <category xml:id="apsLetter">
          <catDesc>Letter paper</catDesc>
        </category>
        <category xml:id="apsLegal">
          <catDesc>Legal paper</catDesc>
        </category>
      </taxonomy>
      <taxonomy xml:id="europeanPaperSizes">
        <category xml:id="apsA4">
          <catDesc>A4 paper</catDesc>
        </category>
        <category xml:id="apsA5">
          <catDesc>A5 paper</catDesc>
        </category>
      </taxonomy>
    </category>
    
    ...
    </taxonomy>
    
     
  • James Cummings

    James Cummings - 2014-11-18
    • assigned_to: Paul Schaffner --> Martin Holmes
    • Group: AMBER --> GREEN
     
  • James Cummings

    James Cummings - 2014-11-18

    MH to provide better examples of use case.

     
  • James Cummings

    James Cummings - 2014-11-18
    • Group: GREEN --> AMBER
     
  • Martin Holmes

    Martin Holmes - 2015-05-25

    First, I argue that taxonomies should be able to nest.

    This example is based on a real use case from the Map of Early Modern London.

    We define the nature of contributors' contributions to the project, or to a specific document, using taxonomies. We draw most of our responsibility definitions from the MARC relator codes, as defined by the Library of Congress:

    http://www.loc.gov/marc/relators/

    However, we use only a subset of that very long list, expressed as a TEI <taxonomy>:

    <taxonomy xml:id="marcRelators">
      <category xml:id="aft">
        <catDesc>
          <term>Author of Afterword, Colophon, etc.</term>
          <gloss type="marcRelator"> Use for a person or organization responsible for an
            afterword, postface, colophon, etc. but who is not the chief author of a
            work.</gloss>
        </catDesc>
      </category>
    
      <category xml:id="aui">
        <catDesc>
          <term>Author of Introduction</term>
          <gloss type="marcRelator">Use for a person or organization responsible for an
            introduction, preface, foreword, or other critical introductory matter, but who is
            not the chief author.</gloss>
        </catDesc>
      </category>
      [... and many more...]
    </taxonomy>
    

    However, the MARC relator codes do not cover all of our needs; we also have our own supplementary responsibility codes, likewise defined as a <taxonomy>:

    <taxonomy xml:id="molRelators">
      <category xml:id="cpy">
        <catDesc>
          <term>Copy Editor</term>
          <gloss type="mol"><title level="m">MoEML</title> uses the term <mentioned>copy editor</mentioned> to designate the person who brings the document into conformity with <title level="m">MoEML</title> stylistic and citational practice. Acceptable names for this role are copy editor, principal copy editor, secondary copy editor, or copy editor of (a particular section of text).</gloss>
        </catDesc>
      </category>
      <category xml:id="top">
        <catDesc>
          <term>Toponymist</term>
          <gloss type="mol"><title level="m">MoEML</title> uses the term <mentioned>toponymist</mentioned> to designate the person who identifies the place references in a text and points them to the right place in our locations database. The toponymist does not necessarily encode the toponyms. In most cases, the author of a born-digital article or the editor of a primary-source document will also be the toponymist.</gloss>
        </catDesc>
      </category>
      [... and several more...]
    </taxonomy>
    

    These taxonomies are used together, so it would make more sense to express them as a single taxonomy composed of the two:

    <taxonomy xml:id="relators">
      <taxonomy xml:id="marcRelators">[...]</taxonomy>
      <taxonomy xml:id="molRelators">[...]</taxonomy>
    </taxonomy>
    

    Since taxonomies are often composed of other taxonomies, I believe <taxonomy> should be nestable.
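As a rough illustration of what "used together" could mean in practice, the following Python sketch (standard library only) checks a list of responsibility codes against every category in the combined taxonomy, whichever sub-taxonomy defines it. The sample is a trimmed stand-in with the descriptions omitted, the TEI namespace is left out as in the examples above, and the function name `unknown_codes` and the code "xyz" are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed stand-in for the combined taxonomy; category descriptions omitted.
SAMPLE = """
<taxonomy xml:id="relators">
  <taxonomy xml:id="marcRelators">
    <category xml:id="aft"/>
    <category xml:id="aui"/>
  </taxonomy>
  <taxonomy xml:id="molRelators">
    <category xml:id="cpy"/>
    <category xml:id="top"/>
  </taxonomy>
</taxonomy>
"""


def unknown_codes(root, codes):
    """Return the codes that no category in the nested taxonomy defines."""
    # iter() descends through nested <taxonomy> elements, so both
    # sub-taxonomies contribute to a single set of known codes.
    known = {cat.get(XML_ID) for cat in root.iter("category")}
    return [code for code in codes if code not in known]


root = ET.fromstring(SAMPLE)
print(unknown_codes(root, ["aui", "cpy", "xyz"]))  # ['xyz']
```

With two sibling top-level taxonomies the same check needs to know about both roots; with a single enclosing <taxonomy> there is one place to look.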

     
  • Martin Holmes

    Martin Holmes - 2015-05-25

    Next, consider a taxonomy of literary forms (this arises out of a different project). You might characterize literary work in many different ways:

    <taxonomy xml:id="literaryForms">
      <category xml:id="lfProse">
        <category xml:id="lfShortStory"/>
        <category xml:id="lfNovella"/>
        <category xml:id="lfNovel"/>
      </category>
      <category xml:id="verse">
    [...]
      </category>
    </taxonomy>
    

    Now, inside the "verse" category we need to characterize different features of the verse, including foot type, line-length and stanza type:

    <category xml:id="verse">
      <category xml:id="lfFootAmphibrach"/>
      <category xml:id="lfFootAnapaest"/>
        [...more foot types...]
      <category xml:id="lfLineMonometer"/>
      <category xml:id="lfLineDimeter"/>
        [...more line lengths...]
      <category xml:id="lfStanzaCouplet"/>
      <category xml:id="lfStanzaTercet"/>
        [...more stanza types...]
    </category>
    

    It's clear here that we have three distinct types of category; they don't belong together as siblings. What we really have are three distinct taxonomies that should be marked up as such:

    <category xml:id="verse">
      <taxonomy xml:id="lfFoot">
        <category xml:id="lfFootAmphibrach"/>
        <category xml:id="lfFootAnapaest"/>
        [...more foot types...]
      </taxonomy>
      <taxonomy xml:id="lfLine">
        <category xml:id="lfLineMonometer"/>
        <category xml:id="lfLineDimeter"/>
        [...more line lengths...]
      </taxonomy>
      <taxonomy xml:id="lfStanza">
        <category xml:id="lfStanzaCouplet"/>
        <category xml:id="lfStanzaTercet"/>
        [...more stanza types...]
      </taxonomy>
    </category>
    

    It makes no sense to use nested categories for this; these are sub-taxonomies within the overall taxonomy of literary forms. On this basis, I argue that <taxonomy> should be available inside <category>.
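One practical consequence can be sketched in Python (standard library only): if the sub-taxonomies are marked up explicitly, a processor can sort a poem's classifications into facets simply by finding the nearest enclosing <taxonomy> of each referenced category. This is an illustrative sketch, not a defined TEI processing model; the function name `group_by_facet` and the sample list of targets are hypothetical, the elided categories are left out, and the TEI namespace is omitted as in the examples above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The "verse" branch of the proposed structure; elided entries are omitted.
SAMPLE = """
<taxonomy xml:id="literaryForms">
  <category xml:id="verse">
    <taxonomy xml:id="lfFoot">
      <category xml:id="lfFootAmphibrach"/>
      <category xml:id="lfFootAnapaest"/>
    </taxonomy>
    <taxonomy xml:id="lfLine">
      <category xml:id="lfLineMonometer"/>
      <category xml:id="lfLineDimeter"/>
    </taxonomy>
    <taxonomy xml:id="lfStanza">
      <category xml:id="lfStanzaCouplet"/>
      <category xml:id="lfStanzaTercet"/>
    </taxonomy>
  </category>
</taxonomy>
"""


def group_by_facet(root, targets):
    """Group catRef targets by the id of their nearest enclosing <taxonomy>."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    by_id = {el.get(XML_ID): el for el in root.iter("category")}
    facets = defaultdict(list)
    for target in targets:
        # An unknown target raises KeyError here.
        node = parents.get(by_id[target.lstrip("#")])
        # Climb past any intermediate <category> ancestors.
        while node is not None and node.tag != "taxonomy":
            node = parents.get(node)
        facets[node.get(XML_ID) if node is not None else None].append(target)
    return dict(facets)


root = ET.fromstring(SAMPLE)
facets = group_by_facet(
    root, ["#verse", "#lfFootAnapaest", "#lfLineDimeter", "#lfStanzaCouplet"]
)
print(facets["lfFoot"])         # ['#lfFootAnapaest']
print(facets["literaryForms"])  # ['#verse']
```

With the flat version, where all of these are sibling categories under "verse", the only way to recover the three facets is to parse the xml:id prefixes.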

     
  • Martin Holmes

    Martin Holmes - 2015-05-29

    Council 2015-05-29: MH to bug everyone on the Council list about this, and if no objections in 2 weeks, go ahead.