#112 Articles with duplicate pubmed ids are getting into the db

When trying to input the scanned date for this paper:

I get an error message saying the pubmed id is already
in use by another paper. The link is to

This should have been caught before: they are the same
paper, yet both are in Pub.


  • Danny Yoo

    Danny Yoo - 2005-08-09


    The issue appears to be that duplicate checking does not
    treat articles with the same pubmed id as duplicates of
    each other, and it should.

    First, how extensive is this problem?

    mysql> select a.id from pub_article a, pub_article b where
    a.id < b.id and a.pubmed_id = b.pubmed_id and a.pubmed_id is
    not null and a.pubmed_id != '' and a.is_obsolete='n' and

    We're getting about 32 candidates here.
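A minimal sketch of the same kind of check, for illustration only. It is not a reconstruction of the truncated query above: it groups rows in Python instead of using a self-join, and it assumes the obsolete filter applies to every article in a group. The table and column names (`pub_article`, `pubmed_id`, `is_obsolete`) come from the query in this comment; the in-memory SQLite database and the sample rows are made up.

```python
import sqlite3


def find_duplicate_pubmed_ids(conn):
    """Return (pubmed_id, [article ids]) for each pubmed id that is
    shared by more than one non-obsolete article."""
    rows = conn.execute(
        """
        SELECT pubmed_id, id
        FROM pub_article
        WHERE pubmed_id IS NOT NULL
          AND pubmed_id != ''
          AND is_obsolete = 'n'
        ORDER BY pubmed_id, id
        """
    ).fetchall()
    groups = {}
    for pubmed_id, article_id in rows:
        groups.setdefault(pubmed_id, []).append(article_id)
    # Keep only the pubmed ids that appear on two or more articles.
    return [(pmid, ids) for pmid, ids in groups.items() if len(ids) > 1]


# Made-up sample data in an in-memory database (stand-in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pub_article ("
    "id INTEGER PRIMARY KEY, pubmed_id TEXT, is_obsolete TEXT)"
)
conn.executemany(
    "INSERT INTO pub_article VALUES (?, ?, ?)",
    [
        (1, "111", "n"),
        (2, "111", "n"),   # duplicate of article 1
        (3, "222", "n"),
        (4, "", "n"),      # empty pubmed ids are ignored
        (5, "", "n"),
        (6, None, "n"),    # NULL pubmed ids are ignored
        (7, "222", "y"),   # obsolete, so not a live duplicate of 3
    ],
)
duplicates = find_duplicate_pubmed_ids(conn)
print(duplicates)
```

Returning the full list of article ids per pubmed id, instead of one id per pair, makes it easier to review each group when deciding which article to obsolete.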

    Second, how do we correct existing damage to the database?
    We need to arbitrarily choose one of the articles to
    obsolete, and merge information as necessary. I'll look at
    the articles and see if this is a particularly simple thing
    to do.

    Finally, how do we prevent this from happening again?
    Offhand, we should have pubmed_id be a unique index, to
    prevent duplicate articles from having the same pubmed_id.
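A rough sketch of how a unique index would reject a second article with the same pubmed id. It uses an in-memory SQLite database as a stand-in, so behavior on the real MySQL schema should be checked separately. The index name `pub_article_pubmed_id` and the helper `try_insert` are hypothetical. It also assumes that an empty string means "no pubmed id" and converts those to NULL first, since several empty strings would otherwise collide under the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pub_article (id INTEGER PRIMARY KEY, pubmed_id TEXT)"
)

# Assumption: '' means "no pubmed id". Store NULL instead so that
# articles without a pubmed id do not violate the unique index.
conn.execute("UPDATE pub_article SET pubmed_id = NULL WHERE pubmed_id = ''")
conn.execute(
    "CREATE UNIQUE INDEX pub_article_pubmed_id ON pub_article (pubmed_id)"
)


def try_insert(conn, article_id, pubmed_id):
    """Insert an article; return False if the unique index rejects it."""
    try:
        conn.execute(
            "INSERT INTO pub_article (id, pubmed_id) VALUES (?, ?)",
            (article_id, pubmed_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(conn, 1, "111"),  # first article with this pubmed id
    try_insert(conn, 2, "111"),  # same pubmed id: rejected by the index
    try_insert(conn, 3, None),   # no pubmed id
    try_insert(conn, 4, None),   # SQLite allows several NULLs in a unique index
]
print(results)
```

The index can only be created once the existing duplicates have been merged or obsoleted, so this step has to come after the cleanup.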

  • Tanya Berardini

    Tanya Berardini - 2005-08-23

    When merging, please do NOT do this arbitrarily but retain
    the article that has annotations associated with it. If
    neither article has associations, that's fine, keep either one.
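The rule above could be sketched as a small helper. The function name `choose_article_to_keep` and the shape of its input (a mapping from article id to annotation count) are hypothetical. The comment does not say what to do when both articles have annotations, so this sketch returns `None` in that case to flag the pair for manual review.

```python
def choose_article_to_keep(annotation_counts):
    """Pick which duplicate article to keep.

    annotation_counts maps article id -> number of annotations.
    Returns the id to keep, or None when more than one article has
    annotations and a person needs to decide.
    """
    annotated = sorted(
        article_id
        for article_id, count in annotation_counts.items()
        if count > 0
    )
    if len(annotated) == 1:
        # Exactly one article has annotations: keep that one.
        return annotated[0]
    if not annotated:
        # No article has annotations: either is fine, so take the lowest id.
        return min(annotation_counts)
    # Several articles have annotations: not covered by the rule above.
    return None


print(choose_article_to_keep({10: 0, 11: 3}))
print(choose_article_to_keep({10: 0, 11: 0}))
print(choose_article_to_keep({10: 1, 11: 2}))
```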

  • Tanya Berardini

    Tanya Berardini - 2005-09-28

    I believe this is the item that describes the duplicate PMID
    problem. There needs to be some troubleshooting on how the
    duplicates are getting in and what criteria we need to use
    to merge them. I think Aleksey will be of some help when
    attacking this problem, at the very least in providing some
    background and some examples.

  • Tanya Berardini

    Tanya Berardini - 2005-09-28

    I wanted to move this guy up to priority 9, but since I don't
    have ownership of the item, I can't do that. Can you,
    Danny? This is actually a very pressing problem, and if I
    could, I'd make this priority 10.



  • Danny Yoo

    Danny Yoo - 2005-10-13
    • priority: 5 --> 9
  • Danny Yoo

    Danny Yoo - 2005-10-14
    • summary: cannot validate paper --> Articles with duplicate pubmed ids are getting into the db
  • Danny Yoo

    Danny Yoo - 2005-10-18

    I corrected the existing data problems in the database and
    modified the bulk pubmed loader to take the pubmed id into
    consideration when loading.
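As an illustration of what "taking the pubmed id into consideration" might look like, here is a minimal sketch. It is not the actual pubsearch loader: the function `load_records`, the record layout (dicts with `pubmed_id` and `title` keys), and the sample data are all assumptions. Records whose pubmed id is already known are skipped instead of being loaded as new articles.

```python
def load_records(existing_pubmed_ids, records):
    """Split incoming records into those to load and those to skip.

    existing_pubmed_ids: pubmed ids already present in the database.
    records: iterable of dicts with 'pubmed_id' and 'title' keys.
    Returns (loaded, skipped).
    """
    seen = set(existing_pubmed_ids)
    loaded, skipped = [], []
    for record in records:
        pubmed_id = record.get("pubmed_id")
        if pubmed_id and pubmed_id in seen:
            # Already in the database or earlier in this batch.
            skipped.append(record)
            continue
        if pubmed_id:
            seen.add(pubmed_id)
        # Records without a pubmed id cannot be checked this way.
        loaded.append(record)
    return loaded, skipped


batch = [
    {"pubmed_id": "111", "title": "Already in the database"},
    {"pubmed_id": "333", "title": "New article"},
    {"pubmed_id": "333", "title": "Repeated within the batch"},
    {"pubmed_id": None, "title": "No pubmed id"},
]
loaded, skipped = load_records({"111", "222"}, batch)
print([r["title"] for r in loaded])
print([r["title"] for r in skipped])
```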

    This still doesn't resolve the issue of pulling updated
    information from pubmed into pubsearch, but I think that
    should be considered a separate (but related) issue. I'll
    close this bug, and open a new one that describes the problem.

    Closing bug.

  • Danny Yoo

    Danny Yoo - 2005-10-18
    • status: open --> closed-fixed
