
#548 Give more structure to abstract

AMBER
open
None
5(default)
2015-05-28
2015-03-15
No

The abstract element is particularly important for representing metadata associated with scholarly works. Many disciplines, however, have developed a tradition of reflecting the argument of a paper in a structured abstract.

For instance, see the attached screenshot, where the abstract has two subsections with corresponding headings.

We suggest allowing abstract to contain divs, so that the preceding example could be represented as:

~~~~~~
<profileDesc>
  <abstract>
    <div>
      <head>Motivation:</head>
      <p>A number of available program packages determine the significant enrichments and/or depletions of GO categories among a class of genes of interest. Whereas a correct formulation of the problem leads to a single exact null distribution, these GO tools use a large variety of statistical tests whose denominations often do not clarify the underlying P-value computations.</p>
    </div>
    <div>
      <head>Summary:</head>
      <p>We review the different formulations of the problem and the tests they lead to: the binomial, χ², equality of two probabilities, Fisher's exact and hypergeometric tests. We clarify the relationships existing between these tests, in particular the equivalence between the hypergeometric test and Fisher's exact test. We recall that the other tests are valid only for large samples, the test of equality of two probabilities and the χ²-test being equivalent. We discuss the appropriateness of one- and two-sided P-values, as well as some discreteness and conservatism issues.</p>
    </div>
  </abstract>
</profileDesc>
~~~~~~

We currently have two use cases for this: one is the back-office format for HAL, the French national open publication repository (yes, it does use the TEI for all content interchange), and the second is the Istex initiative (http://www.inist.fr/?ISTEX-51&lang=en), where data from publishers are ingested in the context of a national licence program.

1 Attachments

Discussion

  • Laurent Romary

    Laurent Romary - 2015-03-15

    Another example from: Kaur J, Lamb MM, Ogden CL. "The Association between Food Insecurity and Obesity in Children-The National Health and Nutrition Examination Survey."

    which could be encoded as:

    ~~~~~
    <profileDesc>
      <abstract>
        <div>
          <head>BACKGROUND:</head>
          <p>Food insecurity can put children at greater risk of obesity because of altered food choices and nonuniform consumption patterns.</p>
        </div>
        <div>
          <head>OBJECTIVE:</head>
          <p>We examined the association between obesity and both child-level food insecurity and personal food insecurity in US children.</p>
        </div>
        <div>
          <head>DESIGN:</head>
          <p>Data from 9,701 participants in the National Health and Nutrition Examination Survey, 2001-2010, aged 2 to 11 years were analyzed. Child-level food insecurity was assessed with the US Department of Agriculture's Food Security Survey Module based on eight child-specific questions. Personal food insecurity was assessed with five additional questions. Obesity was defined, using physical measurements, as body mass index (calculated as kg/m²) greater than or equal to the age- and sex-specific 95th percentile of the Centers for Disease Control and Prevention growth charts. Logistic regressions adjusted for sex, race/ethnic group, poverty level, and survey year were conducted to describe associations between obesity and food insecurity.</p>
        </div>
        <div>
          <head>RESULTS:</head>
          <p>Obesity was significantly associated with personal food insecurity for children aged 6 to 11 years (odds ratio=1.81; 95% CI 1.33 to 2.48), but not in children aged 2 to 5 years (odds ratio=0.88; 95% CI 0.51 to 1.51). Child-level food insecurity was not associated with obesity among 2- to 5-year-olds or 6- to 11-year-olds.</p>
        </div>
        <div>
          <head>CONCLUSIONS:</head>
          <p>Personal food insecurity is associated with an increased risk of obesity only in children aged 6 to 11 years. Personal food-insecurity measures may give different results than aggregate food-insecurity measures in children.</p>
        </div>
      </abstract>
    </profileDesc>
    ~~~~~

     
  • Laurent Romary

    Laurent Romary - 2015-03-15

    Note that this usage is reflected in most publishers' formats (from which we derive the TEI pivot format): Elsevier (with an abstract-sec element), Springer (AbstractSection), and of course JATS, as shown in the example below:

    ~~~~~~
    <abstract abstract-type="executive-summary">
      <sec id="st1">
        <title>Background</title>
        <p>Important controversies exist about the extent to which people's health status as adults is shaped by their living conditions in early life compared to adulthood. These debates have important policy implications, and one obstacle to resolving them is the relative lack of sufficient high-quality data on childhood and adult socioeconomic position and adult health status. We accordingly compared the health status among monozygotic and dizygotic women twin pairs who lived together through childhood (until at least age 14) and subsequently were discordant or concordant on adult socioeconomic position. This comparison permitted us to ascertain the additional impact of adult experiences on adult health in a population matched on early life experiences.</p>
      </sec>
      <sec id="st2">
        <title>Methods and Findings</title>
        <p>Our study employed data from a cross-sectional survey and physical examinations of twins in a population-based twin registry, the Kaiser Permanente Women Twins Study Examination II, conducted in 1989 to 1990 in Oakland, California, United States. The study population was composed of 308 women twin pairs (58% monozygotic, 42% dizygotic); data were obtained on childhood and adult socioeconomic position and on blood pressure, cholesterol, post-load glucose, body mass index, waist-to-hip ratio, physical activity, and self-rated health. Health outcomes among adult women twin pairs who lived together through childhood varied by their subsequent adult occupational class. Cardiovascular factors overall differed more among monozygotic twin pairs that were discordant compared to concordant on occupational class. Moreover, among the monozygotic twins discordant on adult occupational class, the working class twin fared worse and, compared to her professional twin, on average had significantly higher systolic blood pressure (mean matched difference = 4.54 mm Hg; 95% confidence interval [CI], 0.10–8.97), diastolic blood pressure (mean matched difference = 3.80 mm Hg; 95% CI, 0.44–7.17), and low-density lipoprotein cholesterol (mean matched difference = 7.82 mg/dl; 95% CI, 1.07–14.57). By contrast, no such differences were evident for analyses based on educational attainment, which does not capture post-education socioeconomic position.</p>
      </sec>
      <sec id="st3">
        <title>Conclusion</title>
        <p>These results provide novel evidence that lifetime socioeconomic position influences adult health and highlight the utility of studying social plus biological aspects of twinship.</p>
      </sec>
    </abstract>
    ~~~~~~

     
  • Lou Burnard

    Lou Burnard - 2015-03-15

    Permitting <div> elements within the header is quite a major change. I can think of at least two other, less transgressive ways of meeting this use case:
    (a) if the abstract is long enough to be subdivided, shouldn't it be a <div type="abstract"> within the <front>?
    (b) why not structure these examples using <list> <label> and <item>?
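
    Both alternatives can be sketched against the first example in this ticket. This is only an illustration, not a worked-out proposal: the `type` values are assumptions, each section is cut down to a single sentence taken from the example, and option (b) presumes that <list> is accepted as a child of <abstract>.

    ```xml
    <!-- (a) the structured abstract as front matter of the text -->
    <front>
      <div type="abstract">
        <div>
          <head>Motivation:</head>
          <p>A number of available program packages determine the significant enrichments and/or depletions of GO categories among a class of genes of interest.</p>
        </div>
        <div>
          <head>Summary:</head>
          <p>We clarify the relationships existing between these tests, in particular the equivalence between the hypergeometric test and Fisher's exact test.</p>
        </div>
      </div>
    </front>

    <!-- (b) the same headings as a labelled list inside the header abstract -->
    <abstract>
      <list type="gloss">
        <label>Motivation</label>
        <item>A number of available program packages determine the significant enrichments and/or depletions of GO categories among a class of genes of interest.</item>
        <label>Summary</label>
        <item>We clarify the relationships existing between these tests, in particular the equivalence between the hypergeometric test and Fisher's exact test.</item>
      </list>
    </abstract>
    ```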

     
  • Sebastian Rahtz

    Sebastian Rahtz - 2015-03-15

    These abstracts with formatting in look as if they belong in the <text>. It strikes me as a very undesirable change to allow the structural elements in the header. I would note that the ref page for <abstract> is quite specific: "The abstract for a born digital document should be located within the <front>".

     
  • Martin Holmes

    Martin Holmes - 2015-03-15

    The problem is that the header is the right place for an abstract relating to primary source text (which of course may have its own <front>). Since a primary source text can be a large volume, it's unrealistic to think that an abstract of it can be constrained to a single section. Therefore the <abstract> element in the header must be able to handle multi-section abstracts. How that's best achieved is another matter. Is there a principled objection to the use of <div> anywhere in the header? I've never heard one.

    The rule that says "The abstract for a born digital document should be located within the <front>" is, IMHO, completely wrong. An abstract written by the authors of the original document should presumably be in the <front>; if I come along later (as an editor, publisher or whatever) and add an abstract for my own purposes, that's metadata, and it belongs in the header.

     
  • Piotr Banski

    Piotr Banski - 2015-03-15

    And the abstracts adduced by Laurent are all of the <front> flavour, aren't they.

    1. Martin writes: "Since a primary source text can be a large volume, it's unrealistic to think that an abstract of it can be constrained to a single section." Can we have a look at examples of this kind of abstract? Would they not be good enough with <p>s, by analogy to the project description and other potentially wordy sections of the header?

    2. I am a bit afraid (though I will agree that my worry of itself does not constitute a general argument) that if <div>s are introduced for the sake of handling merely a sequence of paragraphs in a metadata-kind of abstract, encoders will try to (mis?)use that for the purpose of including <front>-type abstracts.

    3. Anticipating replies to point (1) above, I can imagine an artistically crafted metadata-kind of abstract (well, people have various hobbies), but in such cases, would it not be better to encode such an artistic abstract as a TEI text, and merely reference it from the header? Such an artistic abstract would become a related work, rather than just metadata for the original text.

     
  • Sebastian Rahtz

    Sebastian Rahtz - 2015-03-15

    Using the metadata header to contain what looks like a <floatingText> of the abstract just seems, well, weird. How is the header now any different from the front matter? We might as well go straight down the road to JATS and have no distinct metadata at all.

    Is this really needed? Laurent's examples should plainly use <front>, since he's using TEI to simulate JATS in a controlled environment (cf earlier TEI/ISO work), and Martin's example doesn't actually exist yet of a new structured abstract being written by a 3rd party for a born-digital doc. I'd say let sleeping dogs lie until you have to wake them up.

     
  • Martin Holmes

    Martin Holmes - 2015-03-15

    @Piotr: One possible example of an abstract we will need to write is for John Stow's Survey of London, for which we're producing digital versions of several editions. If you look at the already-online BHO version of [the 1908 version of] the 1603 text:

    http://www.british-history.ac.uk/no-series/survey-of-london-stow/1603

    you'll see that the text is divided into several distinct sections with different purposes, including a series of individual chapters about the wards of the city, chapters on the municipal government system, sections on hospitals, parish churches, and various other curiosities. It's difficult to imagine constructing a worthwhile abstract for this text without dividing the abstract into headed sections; and only some sort of kludge would enable the use of headings interspersed inside a series of paragraphs.

    The important thing here is that front matter is from the original text; you shouldn't really put modern editorial content into the <front> alongside Stow's own front matter, which includes a dedicatory epistle. This, oddly enough, is what the BHO edition does: it has modern introductory material in its front matter, which then transitions (inside the same roman-numbered page sequence) into Stow's dedication. But this is because they're actually doing an edition of the Kingsford 1908 edition, and that's what Kingsford did, if I understand correctly.

    But I take the point that we haven't written this abstract yet. It may turn itself into a critical introduction, which would itself be a born-digital text external to the Stow.

    I also see the point about Laurent's cases, but I maintain there is a difference between an editor-supplied abstract (of which the author may know nothing at all) and an authorial abstract (such as an author would supply for an article they submit to a journal). The former belongs in the header; the latter in <front>.

    I worry about Piotr's point #2: this amounts to saying "if we provide this feature, people may abuse it for something it wasn't intended for." That's always true of every feature.
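
    The placement distinction drawn above (editor-supplied abstract in the header, authorial abstract in <front>) can be sketched as follows. This is a minimal skeleton under stated assumptions: the paragraph texts are placeholders, the required children of <fileDesc> are elided as a comment, and `type="abstract"` on the front-matter div is an illustrative value.

    ```xml
    <TEI xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader>
        <fileDesc>
          <!-- titleStmt, publicationStmt and sourceDesc go here -->
        </fileDesc>
        <profileDesc>
          <!-- metadata: added later by an editor, publisher or repository -->
          <abstract>
            <p>Placeholder for an editor-supplied summary of the source text.</p>
          </abstract>
        </profileDesc>
      </teiHeader>
      <text>
        <front>
          <!-- part of the document itself: written by its author -->
          <div type="abstract">
            <p>Placeholder for the abstract the author supplied with the work.</p>
          </div>
        </front>
        <body>
          <p>Placeholder for the text proper.</p>
        </body>
      </text>
    </TEI>
    ```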

     
  • Lou Burnard

    Lou Burnard - 2015-03-16

    I think Laurent's examples could all be handled by using a list inside the existing abstract element, with label to indicate the types. Alternatively, if there really is an agreed typology for these pseudo-divs within an abstract, typed ab elements could be used.
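
    The typed-ab variant might look like the sketch below, using the first two sections of the food-insecurity example. The `type` values are hypothetical: no agreed typology is defined anywhere in this ticket.

    ```xml
    <abstract>
      <!-- one ab per pseudo-division; the type attribute replaces the heading -->
      <ab type="background">Food insecurity can put children at greater risk of obesity because of altered food choices and nonuniform consumption patterns.</ab>
      <ab type="objective">We examined the association between obesity and both child-level food insecurity and personal food insecurity in US children.</ab>
    </abstract>
    ```

    Note that the printed headings ("BACKGROUND:", "OBJECTIVE:") are not recorded as text in this form; they would have to be regenerated from the type values.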

     
  • Laurent Romary

    Laurent Romary - 2015-03-16

    I am completely in line with Martin. These are editorial abstracts which are maintained by publishers or, within publication repositories, by librarians as part of the metadata attached to whatever document. Finding arguments about why not to use the existing abstract element in such cases is weird. What is clear to me is that a) abstract allows very few possible children and b) div would ideally cover observed usages. I do not understand the kind of ancestral fear generated by the idea of having divs there. This is a very localized use within a very specific element.

     
  • Laurent Romary

    Laurent Romary - 2015-03-16

    Just seeing Lou's last comment: we are speaking here of a huge corpus of documents across all possible scientific domains where local practices have induced a wide variety of forms for abstracts. And yes, they do look like "small" documents at times. Hacking lists to this purpose would definitely not be generalizable.

     
  • Lou Burnard

    Lou Burnard - 2015-05-28

    Discussion suggests we may need to define a different kind of <div> for the header (<section>)

     
  • Lou Burnard

    Lou Burnard - 2015-05-28
    • assigned_to: Hugh A. Cayless
     
  • Laurent Romary

    Laurent Romary - 2015-05-28

    Makes sense. Something simpler than div, I guess.

     
  • Hugh A. Cayless

    Hugh A. Cayless - 2015-05-28

    Reassigning to Lou to produce non-transcriptional divs and ps.

     
  • Hugh A. Cayless

    Hugh A. Cayless - 2015-05-28
    • assigned_to: Hugh A. Cayless --> Lou Burnard