
#880 PubMed Import broken

- Status: open
- Owner: nobody
- Labels: None
- Priority: 5
- Updated: 2015-02-18
- Created: 2010-05-19
- Creator: Anonymous
- Private: No

Hi!

It seems the PubMed import is currently broken. Take, for example, http://www.ncbi.nlm.nih.gov/sites/entrez/5063906, select Send To -> File -> Medline, save the resulting file to disk, and try to import it into JabRef. The importer seems to:

- confuse the FAU and AU fields, taking FAU as part of the author names;
- ignore the document type (PT) and set the entry type to OTHER :(
- put the ISSN into "number", and import only the second IS field.

All I get is:

    @OTHER{,
    author = {Aaij C FAU Borst, P and P, Borst},
    number = {0006-3002 (Linking)},
    pages = {--},
    timestamp = {2010.05.19},
    title = {{T}he gel electrophoresis of {DNA}.}
    }
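For comparison, the MEDLINE text export itself is easy to read without mixing up tags: each line is `TAG - value`, lines indented by six spaces continue the previous field, and a blank line ends a record. The following is a minimal sketch in core Perl, not JabRef's importer (JabRef is written in Java). The sub names `parse_medline` and `medline_to_bibtex`, the PT-to-type table, and the abbreviated sample record (built from values visible above, with the PT line assumed) are all hypothetical.

```perl
use strict;
use warnings;

# Split MEDLINE-format text into records. Each record maps a tag (PMID, TI,
# FAU, AU, PT, IS, ...) to an array of values, so repeated tags are all kept.
sub parse_medline {
    my ($text) = @_;
    my (@records, $rec, $last_tag);
    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^\s*$/) {                              # blank line ends a record
            push @records, $rec if $rec;
            ($rec, $last_tag) = (undef, undef);
        }
        elsif ($line =~ /^([A-Z0-9]+)\s*- (.*)$/) {          # "TAG - value"
            $last_tag = $1;
            push @{ $rec->{$last_tag} }, $2;
        }
        elsif (defined $last_tag && $line =~ /^ {6}(.*)$/) { # continuation line
            $rec->{$last_tag}[-1] .= " $1";
        }
    }
    push @records, $rec if $rec;
    return @records;
}

# Hypothetical PT -> BibTeX type table; unknown types fall back to "misc".
my %TYPE_FOR_PT = ('Journal Article' => 'article');

sub medline_to_bibtex {
    my ($rec) = @_;
    my ($type) = grep { defined } map { $TYPE_FOR_PT{$_} } @{ $rec->{PT} || [] };
    $type //= 'misc';

    # FAU ("Borst, P") is already in BibTeX's "Last, First" form.
    # AU ("Borst P") is only a fallback and needs the comma added.
    my @authors = $rec->{FAU}
        ? @{ $rec->{FAU} }
        : map { my $n = $_; $n =~ s/ (\S+)$/, $1/; $n } @{ $rec->{AU} || [] };

    my %field;
    $field{author} = join ' and ', @authors if @authors;
    $field{title}  = $rec->{TI}[0]   if $rec->{TI};
    $field{pmid}   = $rec->{PMID}[0] if $rec->{PMID};
    # Keep every IS line, not just the last one.
    $field{issn}   = join ', ', @{ $rec->{IS} } if $rec->{IS};

    my $body = join '', map { sprintf "  %s = {%s},\n", $_, $field{$_} } sort keys %field;
    return "\@$type\{" . ($field{pmid} // '') . ",\n$body}\n";
}

# Abbreviated, illustrative record; the PT line is an assumption.
my $sample = <<'END';
PMID- 5063906
IS  - 0006-3002 (Linking)
TI  - The gel electrophoresis of DNA.
FAU - Aaij, C
AU  - Aaij C
FAU - Borst, P
AU  - Borst P
PT  - Journal Article
END

print medline_to_bibtex($_) for parse_medline($sample);
```

Keeping every tag as an array is what prevents the symptoms above: FAU and AU never share a slot, and a second IS line no longer overwrites the first.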

This is actually pretty useless. :( A mapping along the lines of the following Perl pseudocode might be more appropriate.

Starting out with parsing the PubMed XML export:

    use strict;
    use warnings;
    use XML::LibXML;    # assumed parser; findvalue/findnodes match its API

    # Load a PubMed XML export and take its first <PubmedArticle>;
    # loop over findnodes() instead to handle several records.
    my $file = shift @ARGV;
    my ($node) = XML::LibXML->load_xml(location => $file)->findnodes('//PubmedArticle');
    my %record;

    # Paths start with "." so they are evaluated inside the current record;
    # a leading "//" would search the whole document.
    $record{journal}  = $node->findvalue('.//MedlineJournalInfo/MedlineTA');
    $record{issn}     = $node->findvalue('.//Article/Journal/ISSN');
    $record{abbrev}   = $node->findvalue('.//Article/Journal/ISOAbbreviation');

    $record{vol}      = $node->findvalue('.//Article/Journal/JournalIssue/Volume');
    $record{issue}    = $node->findvalue('.//Article/Journal/JournalIssue/Issue');
    $record{year}     = $node->findvalue('.//Article/Journal/JournalIssue/PubDate/Year');
    $record{month}    = $node->findvalue('.//Article/Journal/JournalIssue/PubDate/Month');
    $record{day}      = $node->findvalue('.//Article/Journal/JournalIssue/PubDate/Day');
    $record{page}     = $node->findvalue('.//Article/Pagination/MedlinePgn');

    $record{title}    = $node->findvalue('.//Article/ArticleTitle');
    $record{abstract} = $node->findvalue('.//Article/Abstract/AbstractText');

    $record{aff}      = $node->findvalue('.//Article/Affiliation');
    $record{lang}     = $node->findvalue('.//Article/Language');

    # One entry per existing <Author>; no extra empty slot at the end.
    for my $author ($node->findnodes('.//Article/AuthorList/Author')) {
        push @{ $record{surnames} },   $author->findvalue('LastName');
        push @{ $record{givennames} }, $author->findvalue('ForeName');
        push @{ $record{initials} },   $author->findvalue('Initials');
    }

    # MeSH terms as "Descriptor: Qualifier1, Qualifier2". findvalue() and
    # textContent already decode entities such as &amp;.
    for my $heading ($node->findnodes('.//MeshHeadingList/MeshHeading')) {
        my $descriptor = $heading->findvalue('DescriptorName');
        my @qualifiers = map { $_->textContent } $heading->findnodes('QualifierName');
        push @{ $record{MeSH} },
            @qualifiers ? "$descriptor: " . join(', ', @qualifiers) : $descriptor;
    }

    for my $chemical ($node->findnodes('.//ChemicalList/Chemical')) {
        push @{ $record{Chemicals} }, $chemical->findvalue('NameOfSubstance');
        push @{ $record{CAS} },       $chemical->findvalue('RegistryNumber');
    }

    $record{pmid}     = $node->findvalue("PubmedData/ArticleIdList/ArticleId[\@IdType='pubmed']");
    $record{pmc}      = $node->findvalue("PubmedData/ArticleIdList/ArticleId[\@IdType='pmc']");
    $record{doi}      = $node->findvalue("PubmedData/ArticleIdList/ArticleId[\@IdType='doi']");
    $record{so}       = "$record{journal}. $record{year} $record{month};"
                      . "$record{vol} ($record{issue}):$record{page}";

This could then translate to BibTeX:

    # Split MedlinePgn into start and end page for BibTeX's "--" range.
    my ($startpage, $endpage) = split /-/, $record{page};
    my $pages = defined $endpage ? "$startpage--$endpage" : ($startpage // '');

    my $BibTeX = '';    # entry header (type and key) is written elsewhere

    # "Last, First and Last, First ...", skipping empty names.
    my @surnames   = @{ $record{surnames}   || [] };
    my @givennames = @{ $record{givennames} || [] };
    my @names = map  { "$surnames[$_], $givennames[$_]" }
                grep { $surnames[$_] ne '' } 0 .. $#surnames;
    $BibTeX .= "author      = {" . join(' and ', @names) . "},\n";
    $BibTeX .= "title       = {$record{title}},\n";
    $BibTeX .= "journal     = {$record{journal}},\n";
    $BibTeX .= "volume      = {$record{vol}},\n";
    $BibTeX .= "number      = {$record{issue}},\n";
    $BibTeX .= "year        = {$record{year}},\n";
    $BibTeX .= "pages       = {$pages},\n";
    $BibTeX .= "issn        = {$record{issn}},\n";
    $BibTeX .= "jabbrev     = {$record{abbrev}},\n";
    $BibTeX .= "month       = {$record{month}},\n";
    $BibTeX .= "day         = {$record{day}},\n";
    $BibTeX .= "institution = {$record{aff}},\n";
    $BibTeX .= "abstract    = {$record{abstract}},\n";
    $BibTeX .= "language    = {$record{lang}},\n";
    $BibTeX .= "doi         = {$record{doi}},\n";
    $BibTeX .= "pmid        = {$record{pmid}},\n";
    $BibTeX .= "pmc         = {$record{pmc}},\n";

    # Keywords: MeSH terms, then "CAS (substance)" pairs, separated by " / ".
    my @cas       = @{ $record{CAS}       || [] };
    my @chemicals = @{ $record{Chemicals} || [] };
    my @keywords  = ( @{ $record{MeSH} || [] },
                      map  { "$cas[$_] ($chemicals[$_])" }
                      grep { $chemicals[$_] ne '' } 0 .. $#chemicals );
    $BibTeX .= "keywords    = {" . join(' / ', grep { $_ ne '' } @keywords) . "},\n";

    # JabRef file links "description:link:type;", with ":" in the URL written as "\:".
    $BibTeX .= "file        = {";
    $BibTeX .= "Pubmed:http\\://www.ncbi.nlm.nih.gov/sites/entrez/$record{pmid}:URL;";
    $BibTeX .= 'PMC Fulltext:http\://www.pubmedcentral.nih.gov/articlerender.fcgi?artid='
             . $record{pmc} . '&blobtype=pdf:URL;' if $record{pmc} ne '';
    $BibTeX .= "},\n";
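MedlinePgn values are frequently abbreviated (e.g. `1159-65` for pages 1159 to 1165), so splitting on `-` alone yields an end page like `65`. A minimal sketch of expanding such ranges before writing `pages`; `expand_pages` is a hypothetical helper and only plain numeric ranges are handled.

```perl
use strict;
use warnings;

# Expand an abbreviated MEDLINE page range ("123-9") into BibTeX form
# ("123--129"). Anything that is not "digits-digits" is returned unchanged.
sub expand_pages {
    my ($pgn) = @_;
    return $pgn unless defined $pgn && $pgn =~ /^(\d+)-(\d+)$/;
    my ($start, $end) = ($1, $2);
    # Borrow the missing leading digits of the end page from the start page.
    if (length $end < length $start) {
        $end = substr($start, 0, length($start) - length($end)) . $end;
    }
    return "$start--$end";
}

print expand_pages('1159-65'), "\n";
print expand_pages('e1234'), "\n";
```

`expand_pages($record{page})` could then replace the `split` on `-` in the translation above.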
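The `file` field above escapes the colon in each URL by hand. A small hypothetical helper can do this for any description, link and type. It assumes the file field layout is `description:link:type`, with `\`, `:` and `;` inside each part escaped by a backslash; that layout is inferred from the `http\:` escaping in the snippet above, not taken from JabRef's documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: build one "file" field entry from a description, link
# and type, backslash-escaping "\", ":" and ";" inside each part (assumed format).
sub file_entry {
    my @parts = @_;                  # copies, so the caller's values are untouched
    s/([\\:;])/\\$1/g for @parts;
    return join ':', @parts;
}

my $pmid = '5063906';
my $file = join ';', file_entry('Pubmed', "http://www.ncbi.nlm.nih.gov/sites/entrez/$pmid", 'URL');
print "file = {$file},\n";
```

Several entries are then joined with `;`, as in the PubMed and PMC links above.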
