

#407 datatype of @n to plain text

Status: closed-accepted
Owner: Lou Burnard
Labels: None
Priority: 5
Updated: 2012-06-17
Created: 2012-06-03
Creator: Sebastian Rahtz
Private: No

att.global/@n is currently defined as having a datatype of

<datatype maxOccurs="unbounded">
<ref xmlns="http://relaxng.org/ns/structure/1.0" name="data.word"/>
</datatype>

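which excludes certain characters (a data.word may contain "only letters, digits, punctuation characters, or symbols"),
but because maxOccurs is unbounded the attribute can hold several space-separated words, and so can contain spaces.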
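To make the current behaviour concrete, here is a minimal Python sketch of what the declaration above accepts. It assumes that maxOccurs="unbounded" becomes a RELAX NG list of one or more items separated by XML whitespace, and it approximates data.word from the quoted description only (every character in Unicode general category L, N, P or S); the exact pattern TEI used at the time is not reproduced here. The function names are hypothetical, and this is not how a RELAX NG validator works internally.

```python
import re
import unicodedata
from typing import List, Optional


def is_word_char(ch: str) -> bool:
    """Approximate data.word's rule: the character's Unicode general
    category is a letter (L*), number (N*), punctuation (P*) or symbol (S*)."""
    return unicodedata.category(ch)[0] in "LNPS"


def is_data_word(token: str) -> bool:
    """A single non-empty token made only of word characters."""
    return token != "" and all(is_word_char(ch) for ch in token)


def split_xml_list(value: str) -> List[str]:
    """Split an attribute value on runs of XML whitespace (space, tab, CR, LF),
    dropping empty pieces, the way a list pattern tokenizes its value."""
    return [t for t in re.split(r"[ \t\r\n]+", value) if t]


def accepts_current_n(value: str) -> Optional[List[str]]:
    """Model of the current @n datatype: one or more data.word items.
    Returns the parsed items, or None if the value would be invalid."""
    tokens = split_xml_list(value)
    if tokens and all(is_data_word(t) for t in tokens):
        return tokens
    return None


print(accepts_current_n("Book 2"))    # ['Book', '2']: two values, not one label
print(accepts_current_n(""))          # None: at least one word is required
print(accepts_current_n("x\u00a0y"))  # None: a no-break space is not a word character
```

The first call shows the oddity being objected to: a label like "Book 2" only validates because it is read as two separate values.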

This is silly. It's plain text, and needs a plain <rng:text> datatype. What would it mean to have
multiple values for @n?

Discussion

  • After discussing with Syd and James, a lesser alternative would be to change data.word to allow characters like † and its double equivalent ‡. But I still don't think it makes sense to have multiple values for @n...

     
  • Laurent Romary
    2012-06-04

    I think it is even sillier to patch data.word with a few extra characters. The Guidelines need to present a simple face, not patches all over the place. Let us *simply* go for <rng:text>.

     
  • BODARD Gabriel
    2012-06-04

    I agree with Laurent: if what we want is to allow any text in @n, let's do that, rather than patching data.word (which might conceivably break, or at least endanger, the things it is more correctly used for). I also agree with Sebastian that multiple values for @n are odd (although I can think of cases where I would use it that way), and I bet that data.word is only allowed to repeat here so that people can have valid values of @n that contain spaces.

     
  • Lou Burnard
    2012-06-04

    I think the right answer here is to define a new TEI datatype (data.text, maybe) which maps to plain text (<rng:text>), but has the specific semantics that it is a single value which may happen to contain spaces. This would also be appropriate for the @key attribute, of course.
    If we felt strongly, we *could* then define a fancy regexp to specify exactly which characters we wanted to exclude from it.
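A minimal Python sketch of the semantics proposed here, for contrast with the list-of-words model: the whole attribute value is one value, spaces included. The restricted variant is purely hypothetical; it excludes Unicode category C (control, format, unassigned, private use) only as an example of what such a regexp might do, roughly like an XSD-style pattern [^\p{C}]+. Python's standard re module has no \p{...} escapes, so the sketch checks categories with unicodedata instead.

```python
import unicodedata
from typing import Optional


def accepts_data_text(value: str) -> str:
    """Model of <rng:text>: any string is valid, and the whole attribute
    value is kept as ONE value, spaces included."""
    return value


def accepts_restricted_text(value: str) -> Optional[str]:
    """Hypothetical stricter variant, roughly what a pattern such as
    [^\\p{C}]+ would express: a non-empty single value containing no
    characters from Unicode category C."""
    if value and not any(unicodedata.category(ch).startswith("C") for ch in value):
        return value
    return None


print(accepts_data_text("Book 2"))                          # Book 2
print(accepts_restricted_text("Book 2 \u2020") is not None)  # True: dagger is punctuation
print(accepts_restricted_text("a\u200bb"))                   # None: U+200B is category Cf
```

Keeping the restriction in a separate, named check mirrors the idea of giving the datatype its own name: the plain-text version stays simple, and any exclusions can be added later in one place.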

     
  • I like Lou's notion: actually using <rng:text>, but bundling it up in a named concept, gives us a place to distinguish "plain ole text" from "a label which is allowed to contain any (or most) Unicode characters".

     
  • Lou Burnard
    2012-06-17

    • assigned_to: nobody --> louburnard
    • status: open --> closed-accepted
     
  • Lou Burnard
    2012-06-17

    I have defined data.text accordingly and specified it as the value for @n. I've also made it the value for attributes previously classed as data.key (@key, the three weirdos on <distinct>, and @lemma on <w>), since that is effectively the same; this means we don't need data.key any more, so I've removed it. All at rev 10512.