
#539 New element annotatedU

AMBER
open
None
5(default)
2015-03-24
2014-12-05
Lou Burnard
No

[This is the second of a few tickets related to the TEI/ISO standard for transcriptions of spoken language: see http://bit.ly/1jyZC37 ]

It is usual to segment transcribed speech into smaller chunks for which the existing <u> element is appropriate. This proposal suggests a way of grouping each such chunk with one or more tiers of annotation, as is common practice.

Discussion

  • Laurent Romary

    Laurent Romary - 2014-12-05

    We should probably see how we could also deal with such cases by means of the stand-off element. I see the two options as complementary flavors (for many pieces of speech annotation software an interleaved representation à la annotatedU is easier, whereas for some other use cases it is better to leave the primary transcription "untouched").

     
  • Laurent Romary

    Laurent Romary - 2014-12-17

    After going back and forth between the ISO proposal and the stdf proposal, I see the possibility of creating an element that would be slightly more generic than annotatedU, which we could call annotationGrp. This element could be used to group together series of annotations associated with the same primary object (e.g. the same u element), either by having this object as a child (i.e. what we wanted with annotatedU: a u with a series of spanGrp, for instance) or in a stand-off mode within the annotations sub-element of stdf. The specification of this element could be as follows:

    <elementSpec ident="annotationGrp" mode="add" ns="http://standoff.proposal">
       <desc>Groups together various annotations, for instance for parallel interpretations of a spoken segment</desc>
       <classes>
          <memberOf key="model.annotationPart"/>
          <memberOf key="model.divPart.spoken"/>
          <memberOf key="att.timed"/>
          <memberOf key="att.global"/>
          <memberOf key="att.ascribed"/>
       </classes>
       <content>
          <rng:zeroOrMore>
             <rng:choice>
                <rng:ref name="u"/>
                <rng:ref name="model.global.meta"/>
                <rng:ref name="model.annotationPart"/>
             </rng:choice>
          </rng:zeroOrMore>
       </content>
    </elementSpec>
    

    with the idea that model.annotationPart would be the hook where one could add any kind of internal or external annotation object. For instance in my tests, I make model.global.meta member of this class to get spanGrp and the like in it.
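
    To illustrate the inline case, here is a minimal sketch of what an instance conforming to the specification above might look like. It is an assumption-laden example, not taken from the proposal: the stdf prefix, the speaker and timeline pointers (#SPK1, #T1, #T2), the xml:id values and the part-of-speech labels are all hypothetical.

    ```xml
    <!-- Hypothetical instance of the proposed annotationGrp (not part of released TEI). -->
    <!-- who comes from att.ascribed; start and end come from att.timed. -->
    <stdf:annotationGrp xmlns:stdf="http://standoff.proposal"
                        xmlns="http://www.tei-c.org/ns/1.0"
                        who="#SPK1" start="#T1" end="#T2">
      <!-- The primary object: one utterance, tokenized so annotations can point at it -->
      <u xml:id="u1">
        <w xml:id="w1">good</w>
        <w xml:id="w2">morning</w>
      </u>
      <!-- One annotation tier, admitted here via model.global.meta -->
      <spanGrp type="pos">
        <span from="#w1" to="#w1">ADJ</span>
        <span from="#w2" to="#w2">NOUN</span>
      </spanGrp>
    </stdf:annotationGrp>
    ```

    Because the content model is a repeatable choice, further tiers (additional spanGrp elements, or any other member of model.annotationPart) could simply be appended as siblings.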

     

    Last edit: Laurent Romary 2015-03-16
  • Lou Burnard

    Lou Burnard - 2015-01-29

    Generalizing is always nice. But what is "stdf" please?

     
  • Piotr Banski

    Piotr Banski - 2015-01-29

    stdf is a proposed element badly in need of a name approved for all audiences.

    Please see ticket #378, then the google doc linked from there, then Peter Stadler's ODD proposal for standoff annotations, linked from the google doc...

     
  • Laurent Romary

    Laurent Romary - 2015-01-29

    There is also a GitHub project (https://github.com/laurentromary/stdfSpec), where I maintain updates on the stdf proposal and some samples, which show how annotatedU can be used inline or stand-off in relation to speech transcription.

     
  • Lou Burnard

    Lou Burnard - 2015-01-30
    • assigned_to: Lou Burnard
     
  • Paul Schaffner

    Paul Schaffner - 2015-03-02

    Referring to the document at https://docs.google.com/document/d/1BTjYHSiPjD6GhKMNFmZrrvCkLQAa1RK7aGbG5K50uN4

    Section 6.5.2 ("Representation as unclear or gap") says that when a string of words is unclear, and alternatives are proposed, the strings should each be wrapped in a separate span element (within choice, within unclear). I think this was meant to say "a separate seg element"; and indeed the examples given two sections later (6.5.4) use seg, not span. Probably just the usual code-switching problem between HTML span and TEI seg.

    Section 5.7 (6.7 as listed in the TOC) on "Global divisions" proposes that divisions of the transcription at levels superordinate to the utterance should be accomplished by the use of non-tessellating divs. Unless utterance and annotated utterance themselves are regarded as syntactic sugar for div type="utterance", this is surely a very un-TEI way of doing things. Do we really mean to slip floating divs into the scheme by this means?

     
  • Lou Burnard

    Lou Burnard - 2015-03-16

    I have suggested a revision to the document precluding non-tessellating divs. In the meantime, do we have agreement on introducing a new <annotatedU> element, a spec for which would look something like this:

    <elementSpec ident="annotatedU" ns="http://iso-tei-spoken.org/ns/1.0">
       <desc>groups an utterance with the annotation layers associated with it</desc>
       <classes>
          <memberOf key="model.divPart.spoken"/>
       </classes>
       <content>
          <group xmlns="http://relaxng.org/ns/structure/1.0">
             <ref name="u"/>
             <oneOrMore>
                <ref name="spanGrp"/>
             </oneOrMore>
          </group>
       </content>
    </elementSpec>
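
    As a rough sketch of what this content model permits (exactly one u followed by one or more spanGrp), an instance might look like the following. The iso prefix, the speaker pointer, the xml:id values and the annotation text are hypothetical and serve only as illustration.

    ```xml
    <!-- Hypothetical instance of the proposed annotatedU (not part of released TEI). -->
    <iso:annotatedU xmlns:iso="http://iso-tei-spoken.org/ns/1.0"
                    xmlns="http://www.tei-c.org/ns/1.0">
      <!-- Required first child: the utterance itself -->
      <u xml:id="u2" who="#SPK1">
        <w xml:id="w3">see</w>
        <w xml:id="w4">you</w>
      </u>
      <!-- At least one annotation tier must follow; more may be added -->
      <spanGrp type="pos">
        <span from="#w3" to="#w3">VERB</span>
        <span from="#w4" to="#w4">PRON</span>
      </spanGrp>
      <spanGrp type="comment">
        <span from="#w3" to="#w4">leave-taking formula</span>
      </spanGrp>
    </iso:annotatedU>
    ```

    Unlike a repeatable-choice model, this sequence would reject an annotatedU with no spanGrp, or one in which the u does not come first.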
    
     
  • Laurent Romary

    Laurent Romary - 2015-03-16

    @Lou: please see above the new name + specification for annotationGrp, comprising the creation of a class model.annotationPart allowing an easy customization of the content depending on the kind of annotation object people will use (e.g. term entries, NER, open annotation objects, what have you)

     
  • Lou Burnard

    Lou Burnard - 2015-03-16

    So you want to replace "annotatedU" with "annotationGrp" ?

     
  • Laurent Romary

    Laurent Romary - 2015-03-16

    Yes. See Thomas' last document.

     
  • Lou Burnard

    Lou Burnard - 2015-03-16

    For the benefit of others trying to follow this ticket, "Thomas' last document" is an entirely new docx version of the googledoc, the existence of which I learned of about 20 minutes ago when he sent me a copy !

     
  • Laurent Romary

    Laurent Romary - 2015-03-24

    Could we put this in a password-protected place? We may have a problem with ISO-copyrighted documents. (I am +not+ opening a debate, just mentioning)

     
  • Lou Burnard

    Lou Burnard - 2015-03-24

    Well, we have the wiki, but that is hardly secure. If you want to restrict access to this document, then clearly it is not yet ready for discussion by the TEI, so I will remove it.