#425 TEI using outmoded ISO 5218 for sex value attribute


TEI uses ISO 5218:2004 to record the sex of persons in a document (with values given as 1 for male, 2 for female, 9 for not applicable, and 0 for unknown). This is an outmoded and problematic representation of sex, and in particular it formally assigns women a position secondary to men.

There are other discussions online regarding how best to tackle sex in markup, and the problems in using ISO 5218; see the W3C list archive here: http://lists.w3.org/Archives/Public/public-contacts-coord/2010JulSep/0010.html

I would like to see TEI move away from enshrining women as the second sex in its markup. As Steven Ramsay tweeted:
<author>Simone de Beauvoir</author> <sex value="2">female</sex> *sigh*

Can a discussion be had about how best to achieve this? Your current approach is both outmoded and offensive.


Feature Requests: #487


(Page 2 of 4)
  • Sebastian Rahtz - 2013-01-22

    It seems decidedly retrograde to go back to an open list of arbitrary tokens made up by each project as it deems fit. Either let's use tokens from a recognized authority or convention (as we do for dates and times, for example), or let's use the "pointer to a classification" system which we espouse elsewhere. So just as we say hand="#hand1", let's say sex="#sex1", where "sex1" is the ID of an item in a typology. People can then make up as many categories as they see fit, and making it a pointer forces them to decide which scheme to use.

    Using arbitrary magic codes is the worst possible solution.
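
    As a rough illustration of the pointer idea above, here is a minimal Python sketch that resolves sex="#sex1" against a <taxonomy> declared in the header. It assumes the proposed pointer semantics (current TEI expects an ISO 5218 code in @sex, not a pointer); the taxonomy, category IDs and descriptions are hypothetical, and only local "#id" pointers are handled.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical encoding: @sex holds a pointer into a project-defined taxonomy.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <classDecl>
        <taxonomy xml:id="sexes">
          <category xml:id="sex1"><catDesc>female</catDesc></category>
          <category xml:id="sex2"><catDesc>male</catDesc></category>
          <category xml:id="sex3"><catDesc>not recorded in the source</catDesc></category>
        </taxonomy>
      </classDecl>
    </encodingDesc>
    <profileDesc>
      <particDesc>
        <listPerson>
          <person xml:id="p1" sex="#sex1"><persName>Simone de Beauvoir</persName></person>
        </listPerson>
      </particDesc>
    </profileDesc>
  </teiHeader>
</TEI>"""


def category_index(root):
    """Map each category's xml:id to the text of its catDesc."""
    index = {}
    for cat in root.iter(TEI + "category"):
        desc = cat.find(TEI + "catDesc")
        index[cat.get(XML_ID)] = desc.text if desc is not None else None
    return index


def resolve_sex(person, index):
    """Follow a local '#id' pointer in @sex to the category it names."""
    value = person.get("sex", "")
    if not value.startswith("#"):
        raise ValueError(f"expected a local pointer such as '#sex1', got {value!r}")
    return index[value[1:]]


root = ET.fromstring(DOC)
index = category_index(root)
for person in root.iter(TEI + "person"):
    print(person.get(XML_ID), resolve_sex(person, index))  # p1 female
```

    Changing the categories means editing the taxonomy, not the syntax of @sex, which is where the flexibility of the pointer approach comes from.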

    (My survey of people, asking them whether they find the 1 and 2 thing offensive, has so far yielded roughly equal numbers of "I have no idea what you're talking about", "oh yes, that old chestnut, but there are far more important issues to solve", and "it's an unordered set of arbitrary tokens, what's the issue?")

  • Sebastian Rahtz - 2013-01-22

    I was amused to read that Sweden uses (or used to use) a citizen identifier where "The number uses ten digits, YYMMDD-NNGC. The first six give the birth date in YYMMDD format. Digits seven to nine (NNG) are used to make the number unique, where digit nine (G) is odd for men and even for women." Scotland does something similar. It raises the possibility that one could use any old numbering system, but follow the convention of "odd is sort of like men and even is sort of like women" (leaving 0 or negative numbers for other uses). That would allow one to use 100 for women and 101 for men. A trivial function will return the ISO equivalent for those who want to map to it.

    http://en.wikipedia.org/wiki/National_identification_number is fascinating reading :-}
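
    A minimal Python sketch of the "trivial function" mentioned above. The comment leaves the edge cases open, so this assumes one possible reading: 0 and 9 keep their ISO 5218 meanings (not known, not applicable) so that existing ISO data maps to itself, negative numbers are reserved and rejected, other odd numbers map to ISO 1 and other even numbers to ISO 2. The function name is hypothetical.

```python
# ISO 5218 codes and their labels.
ISO_5218 = {0: "not known", 1: "male", 2: "female", 9: "not applicable"}


def to_iso5218(code: int) -> int:
    """Map a project's numeric code to ISO 5218 under the odd/even convention.

    Assumptions (not fixed by the proposal): 0 and 9 pass through unchanged,
    and negative codes are reserved for other uses with no ISO equivalent.
    """
    if code < 0:
        raise ValueError(f"{code}: negative codes are reserved, no ISO 5218 mapping")
    if code in (0, 9):
        return code
    return 1 if code % 2 == 1 else 2  # odd -> 1 (male), even -> 2 (female)


for code in (0, 1, 2, 9, 100, 101):
    print(code, "->", to_iso5218(code), ISO_5218[to_iso5218(code)])
```

    An encoder could then choose, say, 100 and 101, while any system that needs ISO 5218 still receives 2 and 1.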

  • Melissa Terras - 2013-01-22

    Thanks for your comments on this, and I'm glad to see that some are taking this seriously (when someone says that they are offended at something, it is generally useful to believe that they are offended, rather than telling them that they can't possibly be offended, or that there are better things to be doing).

    I'm following this discussion with interest (although markup isn't my forte). I agree that using an outmoded standard, just because it is a standard, isn't a useful approach.

    Fwiw, I'd be interested in working with someone on petitioning the ISO about this, if anyone else is willing to join forces.

  • Sebastian Rahtz - 2013-01-22

    ISO standards are subject to a very detailed and carefully designed review process, to make sure they don't just hang on forever. This one was last examined and renewed by due process in 2004. I don't think the right process is to "petition ISO", however. They don't _make_ standards; they merely publish the work done by their working groups, which are composed of representatives of the national standards bodies, i.e. the BSI in our case. So I'd suggest contacting the BSI and finding out when the next review is due and which committee is responsible.

    On a quick browse of 5218, it is very clear that this isn't a group of people sitting down and making up codes; it is (as is often the case) formalizing existing processes in member countries. My other investigation suggests that the convention odd=male, even=female is probably the origin of it. It may well be, then, a very uphill task indeed to argue for a revision.
    Maybe the argument from the offence caused would have an impact. Maybe the argument that sex is not regarded as binary these days would have an impact. Still, the point about formalizing existing processes remains; arguably, a majority of the member countries would first have to stop treating sex as a binary divide in practice.

    I cannot see which of the lengthy and detailed replies on the ticket is not taking it seriously, by the way.

    I would _not_ agree that "using an outmoded standard, just because it is a standard, isn't a useful approach". I'd argue that it is a great deal better than having no interchange of information at all. It is pretty obvious, isn't it, that basing our calendar on the supposed birth of a Jewish prophet, in a religion which is a minority worldwide, is outmoded - but it's jolly useful!

  • Sebastian Rahtz - 2013-01-23

    I think there are 6 answers here, some backward compatible and some not:

    1. stay aligned with ISO, adding a sentence or two apologizing for any offence caused and explaining more background

    2. allow any digits, except 0 and 9, and say that "odd means male, even means female"

    3. just change to an open value, taxonomy up to the project

    4. make data.sex into a pointer, and say that it must point to a taxonomy

    5. invent our own tokenization based on a non-ordered set (e.g. symbols; though almost any choice is open to causing offence, symbols are powerful beasts)

    6. some combination of the above, with several attributes; e.g. keep @sex as @isosex, and add a new @gender (or whatever) pointing at a taxonomy

  • James Cummings - 2013-01-23

    @melissaterras: I don't think anyone has said that you can't possibly be offended at this; you certainly can, and as I said, your interpretation has weight and validity, just as much as anyone else's. I believe we honestly don't want to cause offence to any of our users while still providing a useful and robust encoding scheme. It is tricky, and it would be much easier for the TEI if the ISO standard were changed.

    @rahtz: You are right in your summary of ISO workings; I had forgotten that. I still think the right course of action is to get involved with any others out there who are working to change the ISO standard, whether that is via the BSI or elsewhere.

    I will encourage the rest of tei-council to comment on this (and if it seems reasonable later to draw attention to this ticket on TEI-L).

    To spell out my proposed solution: for those who don't want to use a numerical @sex attribute, I would use @ana with two URI pointers in it, pointing to a taxonomy in the header (or elsewhere online):

    <person ana="#idOfBiologicalSexualCategory #idOfGenderIdentification">...</person>

    which would then point to a taxonomy with categories with the appropriate IDs. I include the (debatable) biological vs. gender-identification distinction here simply to highlight that such an approach allows as many vectors as the encoders find useful for categorising people in their taxonomies. If we adopted @rahtz's proposal #4, then @sex would do this. The drawback is one of perceived interoperability: a large corpus of texts from disparate projects would have to be normalised back into a single system (whatever that system is).
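
    The @ana approach can be sketched in Python as follows: split @ana into its pointers, look each one up in the header, and group the results by the taxonomy they belong to, so a person carries one category per vector. The taxonomy and category IDs are hypothetical and only local "#id" pointers are handled. The crosswalk to ISO 5218 is an assumed example of the normalisation step mentioned above; it reads only the hypothetical bioSex taxonomy and falls back to 0 ("not known").

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical header with one taxonomy per vector of categorisation.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <classDecl>
        <taxonomy xml:id="bioSex">
          <category xml:id="bioF"><catDesc>female</catDesc></category>
          <category xml:id="bioM"><catDesc>male</catDesc></category>
        </taxonomy>
        <taxonomy xml:id="genderId">
          <category xml:id="gW"><catDesc>woman</catDesc></category>
          <category xml:id="gM"><catDesc>man</catDesc></category>
          <category xml:id="gNB"><catDesc>non-binary</catDesc></category>
        </taxonomy>
      </classDecl>
    </encodingDesc>
    <profileDesc>
      <particDesc>
        <listPerson>
          <person xml:id="p1" ana="#bioF #gW"><persName>Simone de Beauvoir</persName></person>
        </listPerson>
      </particDesc>
    </profileDesc>
  </teiHeader>
</TEI>"""

# Hypothetical crosswalk for exchange with systems that expect ISO 5218.
BIO_TO_ISO = {"bioM": 1, "bioF": 2}


def owning_taxonomy(root):
    """Map each category xml:id to the xml:id of the taxonomy that contains it."""
    owner = {}
    for tax in root.iter(TEI + "taxonomy"):
        for cat in tax.iter(TEI + "category"):
            owner[cat.get(XML_ID)] = tax.get(XML_ID)
    return owner


def classify(person, owner):
    """Group the local '#id' pointers in @ana by taxonomy."""
    axes = {}
    for pointer in person.get("ana", "").split():
        if not pointer.startswith("#"):
            raise ValueError(f"only local pointers are handled here: {pointer!r}")
        cat_id = pointer[1:]
        axes[owner[cat_id]] = cat_id
    return axes


root = ET.fromstring(DOC)
owner = owning_taxonomy(root)
for person in root.iter(TEI + "person"):
    axes = classify(person, owner)
    iso = BIO_TO_ISO.get(axes.get("bioSex"), 0)  # 0 = ISO 5218 "not known"
    print(person.get(XML_ID), axes, iso)
```

    Every project would need its own crosswalk of this kind, which is exactly the interoperability cost described above.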

    Of his suggestions:

    1. I'd vote for this one.
    2. So women become 4 and men become 5? (Or something else; it doesn't necessarily solve the problem.) It also makes interchange more difficult.
    3. Possible, but we lose the benefit of having a datatype in the first place.
    4. If not 1, then I'd vote for this one (with backwards compatibility arguments pending).
    5. This sounds like just a different can of worms.
    6. Possible, but a more major change... I think we'd need more community input on how desired this change was.

    Of all of them, 1 is the easiest, but it may not really solve the problem of the offence caused; it just apologises for it, shifting the blame to ISO. I realise that isn't very satisfactory, but it at least recognises the problem while causing minimal side-effects in backwards compatibility for the community.

  • Laurent Romary - 2013-01-23

    I would vote for 1 as well, perhaps referring to James' good proposal of using @ana. It is my understanding that anything going beyond the baseline reflected in the ISO standard (which I do not read as a political or sociological stance, but rather as the simplest way to ensure interoperability within administrative information systems) relates to interpretative processes that @ana can take into account quite well.

  • Sebastian Rahtz - 2013-01-23

    I agree with Laurent. Promoting/explaining the use of @ana on <sex> (surely not on <person>? That would not have enough context) _alongside_ the existing minimal @(iso)sex seems desirable. One could imagine a new section (after http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDPERSEpc) discussing how to represent non-binary sex.

  • James Cummings - 2013-01-23

    Using @ana on <sex> would also be fine. I've usually preferred the @sex attribute because the projects I've been involved with have not been recording human-readable text alongside the sex, just a very limited (and usually binary) interpretation of it, viz. male/female represented by a digit. Using <sex> then just adds markup that isn't required in those situations. But yes, that would work as well.

    None of this addresses the central problem of ISO 5218: that our use of it may be offensive. I think there is a choice (before getting to @rahtz's more detailed options) between:
    a) we still use ISO 5218 for data.sex datatype (possibly encouraging alternatives and explaining its limitations and potential for offence) or
    b) we abandon ISO 5218 in favour of some-other-system, as yet not codified, that either we create or adopt from a different group.

    Of the two, I would prefer a), but with an explanation of possible problems and limitations added in chapters and reference pages, and with something like @ana pointing to a detailed taxonomy suggested as a mechanism for those who want more fine-grained (and potentially less offensive) methods.

  • BODARD Gabriel - 2013-01-23

    I think we need a slightly more coherent approach than is being discussed here. I suggest:

    1. deprecating (but not removing) both person/@sex and sex/@value

    2. replacing both with a new attribute, @sex-iso (presumably in a class, e.g. att.sex), parallel with @when-iso, which we don't recommend but provide for people who want to use the (to us not ideal) ISO standard

    3. also adding (maybe in the same class) @sexRef, as suggested by Sebastian, which allows linking to internally or externally defined taxonomies of sex via URL/URI/pointer (I prefer this to @ana, especially on <person>, but am flexible)

    For those who wish to continue using ISO 5218 in the meantime, the only difference is that the attribute they are using is mildly discouraged in favour of @sex-iso, and we would encourage them to move over to it within the next couple of years. Of course, when ISO next updates the standard, we'll change the model of that attribute to follow.
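
    A minimal Python sketch of the migration this proposal implies: copy ISO 5218 codes from the deprecated person/@sex and sex/@value into @sex-iso and drop the old attribute, leaving anything that is not an ISO 5218 code in place for manual review (for instance, for moving into @sexRef by hand). @sex-iso and @sexRef are the names proposed here, not attributes of the current TEI schema, and the sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
TEI = "{" + TEI_NS + "}"
ISO_5218_CODES = {"0", "1", "2", "9"}


def migrate(root):
    """Move ISO 5218 codes from person/@sex and sex/@value to the proposed @sex-iso.

    Returns (element tag, value) pairs that were not ISO 5218 codes and were
    therefore left untouched.
    """
    targets = [(el, "sex") for el in root.iter(TEI + "person")]
    targets += [(el, "value") for el in root.iter(TEI + "sex")]
    needs_review = []
    for el, old_attr in targets:
        value = el.get(old_attr)
        if value is None:
            continue
        if value in ISO_5218_CODES:
            el.set("sex-iso", value)
            del el.attrib[old_attr]
        else:
            needs_review.append((el.tag, value))
    return needs_review


# Hypothetical input mixing both deprecated forms and one non-ISO value.
DOC = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person sex="2"><persName>Simone de Beauvoir</persName></person>
  <person><persName>Anonymous</persName><sex value="9"/></person>
  <person sex="#sex1"><persName>Someone else</persName></person>
</listPerson>"""

ET.register_namespace("", TEI_NS)  # serialise with TEI as the default namespace
root = ET.fromstring(DOC)
print(migrate(root))
print(ET.tostring(root, encoding="unicode"))
```

    Running it twice is harmless: migrated elements no longer carry the old attribute, so only the non-ISO value is reported again.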

    Brief additional prose pointing out the problems with ISO 5218 would be welcome. I don't think we want to discuss "how to represent non-binary sex" as such since, again, that's not our place. Guidance on how to use taxonomies other than ISO (whatever the reason for one's dissatisfaction with it) would be essential, however.
