#425 TEI using outmoded ISO 5218 for sex value attribute


TEI uses ISO 5218:2004 to assign sexuality of persons in a document (with values given as 1 for male, 2 for female, 9 for not applicable, and 0 for unknown). This is an outmoded and problematic representation of sexuality, and in particular it formally assigns women to be secondary to men.

There are other discussions online regarding how best to tackle sexuality in markup, and the problems in using ISO 5218; see the W3C list archive here: http://lists.w3.org/Archives/Public/public-contacts-coord/2010JulSep/0010.html

I would like to see TEI move away from enshrining women as the second sex in their markup - as Steven Ramsay tweeted:
<author>Simone de Beauvoir</author> <sex value="2">female</sex> *sigh*

Can a discussion be had about how best to achieve this? Your current approach is both outmoded and offensive.


Feature Requests: #487


  • BODARD Gabriel

    BODARD Gabriel - 2013-01-21

    I have long argued that ISO 5218 was inadequate for recording sex, not only because of the precedence of "male" in the numbering (although I've heard people interpret it the other way around--female is double the value of male--but that's a derailment), but because the values "unknown" and "not applicable" (which presumably can only refer to things like an anonymous blogger and a robot, respectively) are completely inadequate for representing the many other sexes and sexual-identities possible, including intersex, genderqueer, gender-neutral, fluid, trans* and many others.

    I did ask around various online queer communities if there were other proposed open standards for representing sex more inclusively, but no one could think of any. (The problem being, of course, that any such standard would be inadequate.)

    I agree very strongly with Melissa that we need a discussion about how to improve this situation, both at the TEI level, where we have a chance to improve the situation in the short term, and at ISO (which will no doubt take a lot longer). Failing any other external standard to adopt, I suggest the datatype of @sex be changed from data.sex to data.enumerated, allowing project-specific definition of sex (with documentation). This would allow anyone who is currently using ISO 5218 validly to continue doing so, but let anyone who wants to do better define their taxonomy in the teiHeader (perhaps in a sexDesc element provided for the purpose?).

  • Elena Pierazzo

    Elena Pierazzo - 2013-01-21

    I have been arguing for this for the past 6 years, since the presentation of P5, and I don't buy the retrospective argument that 2 is assigned to a woman because it is twice as good as a man. The TEI should not be an accomplice to the sexism of ISO. I agree with Melissa and Gabby.

  • Martin Holmes

    Martin Holmes - 2013-01-21

    I agree with Gabby's proposed solution to go to data.enumerated for sex/@value in the short term, and revise the prose of the Guidelines. Then we need a working group to come up with a proper solution.

  • Sebastian Rahtz

    Sebastian Rahtz - 2013-01-21

    My feeling is that we should stick to the principle that we re-use external standards wherever possible. There isn't an obvious contender other than ISO for a categorisation of sex, so if that's inadequate, lobby ISO, not TEI.

    Why do people care? If you want to ignore the normalization to ISO, use the body of the element to say whatever is needed. If you want to use another normalization scheme, redefine data.sex in your customization as usual.

    By the way, I don't regard TEI as "them" or "you". It's "we" and "our" standard. Similarly, ISO.
    Let's not fall into the "leave the EU, they take away all our rights" trap...

  • James Cummings

    James Cummings - 2013-01-22

    I don't seriously make the argument that '2' is better than '1' because it is more. When I've said that, it is to point out how silly I find the assumption that a numbering system of 1 and 2 somehow implies precedence or order (especially when '9' is also used). IMHO, it doesn't truly 'assign women to be secondary to men', any more than, if ISO 5218 were expanded to include, let's say, trans people as '3', they would be considered tertiary and below women. I'm sorry that you find it offensive. I've always felt that it is just a different number. It is simply an agreed machine-processable label. Yes, people can be offended by that, but that isn't inherent to the number itself; it comes from the interpretations people place on it. I'm not saying those interpretations aren't real or don't have weight, validity or consequences. But that has to be balanced against using some ad hoc, linguistically specific system, which is why we moved away from 'm', 'f', 'u', and 'x'. There are many, many other possible ways that people could record information about sex and/or gender using TEI should they wish to do so on a point of principle (using @ana to point to a full taxonomy of possibilities, for example). Or, as Sebastian points out, local implementations are free to change the values associated with data.sex if they wish. I've always felt uneasy at the limitation of the 4 values and would happily argue for updating it if an agreed standard could be formalised.

    I'm not saying the TEI shouldn't change this, but I would instead be trying to get ISO to redefine the standard in some appropriate way and then TEI would, happily and without argument, implement that.

    No one has suggested what possible values we *should* use, so discussion would need to develop a clear proposal. Part of the problem is that any numerical system has the perception of ordering, and alphabetic ones have linguistic, culture-specific assumptions that we'd prefer to shy away from if possible. One possibility suggested to me was to code for chromosome types, XY for males and XX for females (which nicely gives a way to deal with sex chromosome abnormalities like Klinefelter syndrome); however, while this may work for biological sex identification, it does not deal with gender and/or sexual identification such as the list proposed by Gabby. I honestly do not know what the right values should be. If I were encoding texts where it was felt important to have more than the four categories, as I suggested, I would probably use a <taxonomy> with a range of values suitable to the task.

    [Since Gabby has already done some research in this area, I'm assigning the ticket to him to make sure we don't lose sight of it. Marking it as group 'RED' at the moment because we'd need a clear proposal to discuss, and it isn't at all clear what that proposal should be.]

  • James Cummings

    James Cummings - 2013-01-22
    • assigned_to: nobody --> gabrielbodard
    • milestone: --> RED
  • BODARD Gabriel

    BODARD Gabriel - 2013-01-22

    I agree that TEI is not--and shouldn't be--in the business of creating new standards, but we are currently in the business of recommending the use of existing standards, and we should be careful that the standards we recommend are fit for the purpose our users are going to employ them for. It's clear that for several reasons ISO 5218 is not fit.

    I do have a concrete proposal, in fact: change the datatype of person/@sex and sex/@value to data.enumerated, with instructions in the classSpec to use a project-defined or other standard taxonomy of sex (e.g. ISO 5218, or something better if it arises). This is backward compatible and doesn't prevent anyone who likes the current scheme from continuing to use it as their default normalization; it just deprivileges it.

    Those of us who care can also petition ISO to improve or remove the inadequate standard (but good luck with that), or work with other communities to come up with a rival standard. It's not TEI's business to do that, though.

  • Martin Holmes

    Martin Holmes - 2013-01-22

    Having thought about this a bit more, Gabby's proposal (changing data.sex to data.enumerated) won't work transparently, because data.enumerated is data.name, and data.name is an XML name which cannot begin with a digit, and so ISO 5218 numerical values would become invalid. They could be prefixed with a letter, of course, but such a change would break backwards compatibility.
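    To make the constraint concrete, here is a minimal Python sketch. It assumes, as stated above, that data.enumerated resolves to an XML name; the regular expression is a simplified, ASCII-only approximation of the XML Name production, and the prefixed value "iso1" is purely hypothetical.

```python
import re

# Simplified, ASCII-only approximation of the XML "Name" production:
# a name must start with a letter, underscore, or colon, and may then
# contain letters, digits, periods, underscores, colons, or hyphens.
# (The full XML production also admits many non-ASCII characters.)
XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")

# The four ISO 5218 codes as described in this ticket.
ISO_5218 = {"0": "not known", "1": "male", "2": "female", "9": "not applicable"}


def is_xml_name(value: str) -> bool:
    """Return True if value matches the simplified XML Name pattern."""
    return XML_NAME.fullmatch(value) is not None


for code in ISO_5218:
    print(code, is_xml_name(code))  # every bare ISO code fails

# A hypothetical letter prefix makes the value a valid name, but it is no
# longer the bare ISO code, so existing documents would have to change.
print("iso1", is_xml_name("iso1"))
```

    All four ISO codes fail the check while a prefixed form passes, which is the backwards-compatibility break described above.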

  • BODARD Gabriel

    BODARD Gabriel - 2013-01-22

    That's a problem. This isn't the first time that the datatype of data.enumerated has turned out to be a problem (cf. the discussion of the datatype of @ed). Is there some other datatype we can use (e.g. data.code) that allows arbitrary enumerated values?

    Why does data.enumerated need to start with an alphabetic character anyway?

  • Martin Holmes

    Martin Holmes - 2013-01-22

    The discussion Melissa has pointed to advocates an open-ended approach in which there are some suggested values ("male", "female") but the category is open so that users can express their sexuality or gender in a way that suits them. One option would be to create a new attribute with open data.enumerated values, and suggest some values. This could coexist alongside the @sex and sex/@value, which could then be deprecated if we wanted to make a clear statement of disapproval of 5218.

    Naming this attribute would be problematic. @gender springs to mind, but the potential confusion between sex and gender, transsexual vs transgendered, etc. would probably rule it out. We need a lot of input from the community here.

  • Sebastian Rahtz

    Sebastian Rahtz - 2013-01-22

    It seems decidedly retrograde to go back to an open list of arbitrary tokens made up by each project as it deems fit. Either let's use tokens from a recognized authority or convention (as we do for dates and times, for example), or let's use the "pointer to a classification" system which we espouse elsewhere. So just as we say hand="#hand1", let's say sex="#sex1", where "sex1" is the ID of an item in a typology. People can then make up as many categories as they see fit, and making it a pointer forces them to decide which scheme to use.
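    As an illustration of the pointer approach, here is a minimal Python sketch using only the standard library. It assumes a hypothetical variant of TEI in which @sex holds a local pointer rather than an ISO 5218 code; the header fragment is abbreviated (no fileDesc, so not schema-valid), and the category IDs sex1 to sex3 and their descriptions are illustrative only.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, abbreviated header: @sex is a pointer (the proposal under
# discussion), not an ISO 5218 code. IDs and descriptions are illustrative.
DOC = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <encodingDesc><classDecl>
    <taxonomy xml:id="sexTaxonomy">
      <category xml:id="sex1"><catDesc>female</catDesc></category>
      <category xml:id="sex2"><catDesc>male</catDesc></category>
      <category xml:id="sex3"><catDesc>intersex</catDesc></category>
    </taxonomy>
  </classDecl></encodingDesc>
  <profileDesc><particDesc>
    <listPerson>
      <person xml:id="p1" sex="#sex3"/>
    </listPerson>
  </particDesc></profileDesc>
</teiHeader>"""


def resolve_local_pointer(root, pointer):
    """Resolve a same-document pointer such as '#sex3' to its element."""
    if not pointer.startswith("#"):
        raise ValueError(f"only local pointers are handled here: {pointer!r}")
    target = pointer[1:]
    for elem in root.iter():
        if elem.get(XML_ID) == target:
            return elem
    raise KeyError(f"no element with xml:id={target!r}")


root = ET.fromstring(DOC)
for person in root.iter(f"{TEI}person"):
    category = resolve_local_pointer(root, person.get("sex"))
    print(person.get(XML_ID), category.find(f"{TEI}catDesc").text)
```

    A dangling pointer raises an error instead of silently passing, which is one practical upside of forcing encoders to declare the scheme they use.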

    Using arbitrary magic codes is the worst possible solution.

    (My survey of people, asking them if they find the 1 and 2 thing offensive, so far yields more or less equal numbers of "I have no idea what you're talking about", "oh yes, that old chestnut, but there are far more important issues to solve", and "it's an unordered set of arbitrary tokens, what's the issue?")

  • Sebastian Rahtz

    Sebastian Rahtz - 2013-01-22

    I was amused to read that Sweden used to use (or still uses) a citizen identifier where "The number uses ten digits, YYMMDD-NNGC. The first six give the birth date in YYMMDD format. Digits seven to nine (NNG) are used to make the number unique, where digit nine (G) is odd for men and even for women." Scotland does something similar. It raises the possibility that one could use any old numbering system, but follow the convention of "odd is sort of like men and even is sort of like women" (leaving 0 or negative numbers for other uses). That would allow one to use 100 for women and 101 for men. A trivial function will return the ISO equivalent for those who want to map to it.

    http://en.wikipedia.org/wiki/National_identification_number is fascinating reading :-}
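    The "trivial function" could look something like this Python sketch, assuming the convention described above: positive odd codes map to ISO 5218 value 1 (male) and positive even codes to 2 (female). The comment does not say which ISO value 0 or negative codes should map to, so this sketch returns None for them rather than guessing; the name to_iso5218 is hypothetical.

```python
from typing import Optional

# ISO 5218 values for the two codes this convention can map to.
ISO_MALE = 1
ISO_FEMALE = 2


def to_iso5218(code: int) -> Optional[int]:
    """Map a project code to ISO 5218 under the odd/even convention.

    Positive odd codes correspond to male, positive even codes to female.
    Zero and negative codes are reserved for other uses; no ISO mapping
    is defined for them here, so None is returned.
    """
    if code <= 0:
        return None
    return ISO_MALE if code % 2 == 1 else ISO_FEMALE


print(to_iso5218(100))  # even, so 2 (female)
print(to_iso5218(101))  # odd, so 1 (male)
print(to_iso5218(-3))   # reserved, so None
```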

  • Melissa Terras

    Melissa Terras - 2013-01-22

    Thanks for your comments on this - and glad to see that some are taking this seriously (when someone says that they are offended at something, it is generally useful to believe that they are offended, rather than telling them that they can't possibly be offended, or that there are better things to be doing.)

    I'm following this discussion with interest (although markup isn't my forte) - I agree that using an outmoded standard, just because it is a standard, isn't a useful approach.

    Fwiw, I'd be interested in working with someone on petitioning the ISO about this, if anyone else is willing to join forces.

  • Sebastian Rahtz

    Sebastian Rahtz - 2013-01-22

    ISO standards have a very detailed and carefully designed process, to make sure they don't just hang on for ever. This one was last examined and renewed by due process in 2004. I don't think the right process is to "petition ISO", however. They don't _make_ standards, they merely publish the work done by their working groups, which are composed of representatives of the national standard bodies, i.e. the BSI in our case. So I'd suggest contacting BSI, and finding out when the next examination is due, and which the relevant committee is.

    On a quick browse of 5218, it is very clear that this isn't a group of people sitting down and making up codes; it is (as is often the case) formalizing existing processes in member countries. My other investigation suggests that the convention odd=male, even=female is probably the origin of it. It may well be, then, a very uphill task indeed to argue for a revision.
    Maybe the argument of offence caused would have an impact. Maybe the argument that sex is not regarded as binary these days would have an impact. Still, the point about formalizing existing processes remains; arguably, one first has to establish a majority practice across the member countries of no longer regarding sex as a binary divide.

    I cannot see which of the lengthy and detailed replies on the ticket is not taking it seriously, by the way.

    I would _not_ agree that "using an outmoded standard, just because it is a standard, isn't a useful approach." I'd argue that it is a great deal better than having no interchange of information at all. It is pretty obvious, isn't it, that the standard of doing our calendar based on the supposed birth of a Jewish prophet in a religion which is a minority worldwide is outmoded - but it's jolly useful!

  • Sebastian Rahtz

    Sebastian Rahtz - 2013-01-23

    I think there are 6 answers here, some backward compatible and some not:

    1. stay aligned with ISO, adding a sentence or two apologizing for any offence caused and explaining more background

    2. allow any digits, except 0 and 9, and say that "odd means male, even means female"

    3. just change to an open value, taxonomy up to the project

    4. make data.sex into a pointer, and say that it must point to a taxonomy

    5. invent our own tokenization based on a non-ordered set (e.g. symbols, though almost any choice is open to causing offence; symbols are powerful beasts)

    6. some combination of the above, with several attributes; e.g. keep @sex as @isosex, and add a new @gender (or whatever) pointing at a taxonomy

  • James Cummings

    James Cummings - 2013-01-23

    @melissaterras: I don't think anyone has said that you can't possibly be offended at this, you certainly can and as I said your interpretation has weight and validity, just as much as anyone else's. I believe we honestly don't want to cause offence to any of our users while still providing a useful and robust encoding scheme. It is tricky and would be much easier for the TEI if the ISO standard was changed.

    @rahtz: You are right in your summary of ISO workings, I had forgotten that. I still think the right course of action is to get involved with any others out there who are working to change the ISO standard whether that is via the BSI or elsewhere.

    I will encourage the rest of tei-council to comment on this (and if it seems reasonable later to draw attention to this ticket on TEI-L).

    To spell out my proposed solution: for those who don't want to use a numerical @sex attribute, I would use @ana with two URI pointers in it pointing to a taxonomy in the header (or elsewhere online):

    <person ana="#idOfBiologicalSexualCategory #idOfGenderIdentification">...</person>

    which would then point to a taxonomy with categories with the appropriate IDs. I include the (debatable) bio vs genderIdentification here simply to highlight that such an approach allows multiple vectors however the encoders feel would be useful to categorise their taxonomies. If we adopted @rahtz's proposal #4 then @sex would do this. The drawback is one of perceived interoperability -- a large corpus of texts from disparate projects would have to be normalised back into a single system (whatever that system is).
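    To make the @ana approach and its interoperability cost concrete, here is a minimal Python sketch. Everything in it is hypothetical: the two taxonomies (bioSex, genderId), their category IDs, and the project crosswalk TO_ISO_5218 that a corpus builder would need in order to normalise such data back into a single system. Categories with no ISO equivalent come back as None, which is exactly the information loss that normalisation implies.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, abbreviated header: one taxonomy per "vector".
HEADER = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <encodingDesc><classDecl>
    <taxonomy xml:id="bioSex">
      <category xml:id="bioF"><catDesc>female</catDesc></category>
      <category xml:id="bioM"><catDesc>male</catDesc></category>
      <category xml:id="bioI"><catDesc>intersex</catDesc></category>
    </taxonomy>
    <taxonomy xml:id="genderId">
      <category xml:id="gW"><catDesc>woman</catDesc></category>
      <category xml:id="gNB"><catDesc>non-binary</catDesc></category>
    </taxonomy>
  </classDecl></encodingDesc>
</teiHeader>"""

# Hypothetical project crosswalk for interchange: only part of the
# biological-sex taxonomy maps onto ISO 5218; anything else yields None.
TO_ISO_5218 = {"bioF": 2, "bioM": 1}


def index_categories(header):
    """Map each category xml:id to (taxonomy xml:id, catDesc text)."""
    index = {}
    for taxonomy in header.iter(f"{TEI}taxonomy"):
        for category in taxonomy.iter(f"{TEI}category"):
            desc = category.find(f"{TEI}catDesc")
            index[category.get(XML_ID)] = (taxonomy.get(XML_ID), desc.text)
    return index


def classify(ana, index):
    """Group the local pointers in an @ana value by their taxonomy."""
    result = {}
    for token in ana.split():
        taxonomy, label = index[token.lstrip("#")]
        result.setdefault(taxonomy, []).append(label)
    return result


def iso_code(ana):
    """Return the first ISO 5218 code reachable via the crosswalk, or None."""
    for token in ana.split():
        code = TO_ISO_5218.get(token.lstrip("#"))
        if code is not None:
            return code
    return None


index = index_categories(ET.fromstring(HEADER))
ana = "#bioF #gNB"  # as it might appear in <person ana="...">
print(classify(ana, index))  # {'bioSex': ['female'], 'genderId': ['non-binary']}
print(iso_code(ana))         # 2
```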

    Of his suggestions:

    1. I'd vote for this one.
    2. So women become 4 and men become 5? (Or something else... it doesn't necessarily solve the problem.) And it makes interchange more difficult.
    3. possible but we lose the benefit of having a datatype in the first place.
    4. If not 1. then I'd vote for this one (with backwards compatibility arguments pending)
    5. This sounds like just a different can of worms
    6. Possible, but a more major change... I think we'd need more community input on how desired this change was.

    Of all of them, 1. is easiest, but it may not really solve the problem of offence caused; it just apologises for it, shifting the blame to ISO. I realise that isn't very satisfactory, but it at least recognises the problem while causing minimal side-effects in backwards compatibility for the community.

  • Laurent Romary

    Laurent Romary - 2013-01-23

    I would vote for 1. as well, maybe referring to James' good proposal of using @ana. It is my understanding that anything going beyond the baseline reflected in the ISO standard (which I do not read as a political or sociological stance, but rather as the simplest way to ensure interoperability within administrative information systems) relates to interpretative processes that @ana can take into account quite well.

  • Sebastian Rahtz

    Sebastian Rahtz - 2013-01-23

    I agree with Laurent. Promoting/explaining the use of @ana on <sex> (surely not on <person>? that would not give enough context) _alongside_ the existing minimal @(iso)sex seems desirable. One could imagine a new section (after http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDPERSEpc) which discussed the issue of how to represent non-binary sex.

  • James Cummings

    James Cummings - 2013-01-23

    Using @ana on <sex> would also be fine. I've usually preferred the @sex attribute because projects I've been involved with have not been recording human-readable text alongside the sex, just a very limited (and usually binary) interpretation of it, viz. male/female represented by a digit. Using <sex> then just adds extra markup that isn't required in those situations. But yes, that would work as well.

    None of this addresses the central problem of ISO 5218 and that our use of it may be offensive. I think there is a choice (before @rahtz's more detailed options) of either:
    a) we still use ISO 5218 for data.sex datatype (possibly encouraging alternatives and explaining its limitations and potential for offence) or
    b) we abandon ISO 5218 in favour of some-other-system, as yet not codified, that either we create or adopt from a different group.

    Of the two I would prefer a) but with an explanation of possible problems and limitations added in chapters and reference pages, and use of something like @ana pointing to a detailed taxonomy suggested as a mechanism for those who want more fine-grained (and potentially less offensive) methods.

  • BODARD Gabriel

    BODARD Gabriel - 2013-01-23

    I think we need a slightly more coherent approach than is being discussed here. I suggest:

    1. deprecating (but not removing) both person/@sex and sex/@value

    2. replacing both with a new @sex-iso (presumably classed, e.g. in att.sex), parallel with @when-iso, which we don't recommend but provide for people who want to use the ISO standard, even though it is not ideal in our view

    3. also add (maybe in the same class) @sexRef, as suggested by Sebastian, which allows linking to internal or externally defined taxonomies of sex via url/uri/pointer (I prefer this to @ana, especially on <person>, but am flexible)

    For those who wish to continue using ISO 5218 in the meantime, the only difference is that the attribute they are using is mildly disrecommended in favour of @sex-iso, and we encourage them to move over to that in the next couple of years. Of course, when ISO next updates that standard, we'll change the model of that attribute to follow.

    Brief additional prose to point out the problems with ISO 5218 would be welcome. I don't think we want to discuss "how to represent non-binary sex" especially, as again that's not our place. How to use taxonomies other than ISO (whatever the reason for your dissatisfaction with it) would be essential, however.

  • Sebastian Rahtz

    Sebastian Rahtz - 2013-01-23

    I'd simplify this to

    1. add @sexRef to <person> and <sex>
    2. add extra prose around @sex/<sex> pointing out the issues

    and forget @sex-iso/deprecated @sex stuff.

    I don't think our deprecation mechanisms are strong enough to have much effect.

    If we are prepared to weigh this issue against Birnbaum, rename @sex to @iso-sex now.

  • Martin Holmes

    Martin Holmes - 2013-01-23

    <personGrp> would also need any new attribute (it currently has @sex), and while we're at it we could add it to <listPerson> as well. If a <personGrp> can have a consistent sex value, I don't see why a <listPerson> couldn't either.

    A new attribute class for @sex and @sexRef would be a good idea, but there are two problems: <sex> uses @value rather than @sex, and I can't think of any name for such a class that doesn't seem ridiculous.

  • BODARD Gabriel

    BODARD Gabriel - 2013-01-23

    Martin: This is why I suggested we replace *both* person/@sex (and as you say personGrp/@sex) and sex/@value with the new @sex-iso and @sexRef. It has the added advantage of making the attributes more coherent. (And groupable in a class.)

    Why does att.sex sound ridiculous?

    Sebastian: I still think that renaming to @sex-iso (either with deprecation or without) has two advantages: consistency (as above) and quarantining ourselves a little bit from the problems with the ISO standard.

  • Martin Holmes

    Martin Holmes - 2013-01-23

    Gabby: I do like your solution, but it is a bit disruptive compared with Sebastian's. Re att.sex: I was thinking it should be an adjective, and I couldn't think of an acceptable one.

  • Sebastian Rahtz

    Sebastian Rahtz - 2013-01-23

    I am OK with renaming @sex to @iso-sex, but less keen on doing that _and_ keeping the old @sex (albeit deprecated). It is a toss-up between Birnbaum and user confusion. I'd take that one to TEI-L, to get a sense of how much @sex is used and relied upon in processing.
