#24 "nuccore" for NCBI Nucleotide?

pending
None
2015-11-23
2015-11-23
No

It would be nice to have http://identifiers.org/ncbinucleotide as a complement to http://identifiers.org/ncbigene and http://identifiers.org/ncbiprotein. I don't have any strong opinion on what the namespace should be called; their URL scheme suggests nuccore, and the name of the collection seems to be Nucleotide.

Discussion

  • Nick Juty

    Nick Juty - 2015-11-23

    Hi Jon,
    Since 'nucleotide' comes under the INSDC collaboration (a good link here: http://www.ddbj.nig.ac.jp/sub/acc_def-e.html), we had always intended nucleotides to be referenced through that Registry entry: http://identifiers.org/insdc/
    Can you let me know if that would solve your current issue? I have added 'NCBI nucleotide' as a synonym to make this entry more 'findable' for others, if that was how you were searching for it? Please let me know, or else we can look at other potential solutions for your problem.

    cheers

    Nick

     
    • Jon Olav Vik

      Jon Olav Vik - 2015-11-23

      On Mon, Nov 23, 2015 at 11:58 AM, Nick Juty <njuty@users.sf.net> wrote:

      Since 'nucleotide' comes under the INSDC collaboration (a good link here:
      http://www.ddbj.nig.ac.jp/sub/acc_def-e.html), we had always intended
      nucleotides to be referenced through that Registry entry:
      http://identifiers.org/insdc/
      Can you let me know if that would solve your current issue?

      That may well be. However, the second link here:
      http://www.ncbi.nlm.nih.gov/nuccore/195970460
      http://identifiers.org/insdc/195970460
      fails with
      '195970460' does not match the regular expression
      '^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{4}\d{8}|[A-J][A-Z]{2}\d{5})(.\d+)?$' for
      insdc.

      So it seems that the regexp may need relaxing (as for
      http://identifiers.org/dbest).
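To make the failure concrete, here is a minimal Python sketch that applies the pattern quoted in the error message above. The pattern is copied as printed in the message; the helper name `matches_insdc` is made up for illustration, and the real resolver may apply further checks. Note that, as printed, the dot before the version number is unescaped and so matches any character; the registry's actual pattern may well escape it.

```python
import re

# Pattern copied verbatim from the error message quoted in this thread.
# Assumption: the resolver anchors the match the same way re.match does here.
INSDC_PATTERN = re.compile(
    r"^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{4}\d{8}|[A-J][A-Z]{2}\d{5})(.\d+)?$"
)


def matches_insdc(identifier):
    """Return True if the identifier satisfies the quoted insdc pattern."""
    return INSDC_PATTERN.match(identifier) is not None


if __name__ == "__main__":
    # The all-digit identifier from the nuccore URL is rejected,
    # because every alternative in the pattern starts with a letter.
    print(matches_insdc("195970460"))
```

Every alternative in the pattern begins with one or more capital letters, which is why a purely numeric identifier cannot match without relaxing the expression.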

      I have added 'NCBI nucleotide' as a synonym to make this entry more
      'findable' for others, if that was how you were searching for it? Please
      let me know, or else we can look at other potential solutions for your
      problem.

      Thanks for the synonym. I obviously need to try a little harder when
      searching for entries at identifiers.org. I have mostly used fragments of
      the URL scheme, such as nucest and nuccore. One idea might perhaps be to
      have the search return identifiers.org entries where URL fragments match
      the search term.

      Thanks again for your help,
      Jon Olav

       
  • Nick Juty

    Nick Juty - 2015-11-23
    • status: open --> pending
    • assigned_to: Nick Juty
    • Collection name: NCBI Nucleotide, ncbinucleotide
    • Priority: low --> normal
     
  • Nick Juty

    Nick Juty - 2015-11-23

    Hi Jon,

    You are certainly giving me some interesting work today! Thanks!

    The identifiers.org URI you have (http://identifiers.org/insdc/195970460) is actually a gi, so this will work:
    http://identifiers.org/ncbigi/gi:195970460
    (Resolves to http://www.ncbi.nlm.nih.gov/nuccore/gi:195970460)

    Alternatively, there is another accession given of EU880417. That would actually work with the insdc URI (http://identifiers.org/insdc/EU880417.1), both with and without the version ('.1').
    (Resolves to: http://www.ncbi.nlm.nih.gov/nuccore/EU880417.1)

    Does that help you? (And I am truly very sorry about the number of ways to access this identical information, i.e. 'nuccore' with just digits, 'gi' using the same digits but prefixed, or insdc with a different identifier. Grr.)
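The choice between the two registry entries described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `identifiers_org_uri` is hypothetical, the URI shapes are the ones quoted in this thread, and the insdc pattern is the one from the error message earlier in the discussion.

```python
import re

# insdc pattern as quoted in the resolver's error message in this thread.
INSDC_PATTERN = re.compile(
    r"^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{4}\d{8}|[A-J][A-Z]{2}\d{5})(.\d+)?$"
)


def identifiers_org_uri(identifier):
    """Pick an identifiers.org URI for a nuccore identifier (hypothetical helper).

    All-digit identifiers are treated as GI numbers and routed to ncbigi;
    accession-style identifiers are routed to insdc.
    """
    identifier = identifier.strip()
    if identifier.isdigit():
        return "http://identifiers.org/ncbigi/gi:" + identifier
    if INSDC_PATTERN.match(identifier):
        return "http://identifiers.org/insdc/" + identifier
    raise ValueError("unrecognised nucleotide identifier: %r" % identifier)


if __name__ == "__main__":
    for example in ("195970460", "EU880417.1", "EU880417"):
        print(example, "->", identifiers_org_uri(example))
```

Treating every all-digit string as a GI number is a simplification; it matches the one example discussed here but is not a rule stated by the registry.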

    And yes, searching for URL fragments is a good idea. That would be another route in to find appropriate annotation/URI sources. Thanks for the suggestion!

    cheers

    Nick

     
    • Jon Olav Vik

      Jon Olav Vik - 2015-11-25

      On Mon, Nov 23, 2015 at 1:37 PM, Nick Juty <njuty@users.sf.net> wrote:

      The identifiers.org URI you have (http://identifiers.org/insdc/195970460)
      is actually a gi, so this will work:

      http://identifiers.org/ncbigi/gi:195970460
      (Resolves to http://www.ncbi.nlm.nih.gov/nuccore/gi:195970460)

      Alternatively, there is another accession given of EU880417. That would
      actually work with the insdc URI (http://identifiers.org/insdc/EU880417.1),
      both with and without the version ('.1').
      (Resolves to: http://www.ncbi.nlm.nih.gov/nuccore/EU880417.1)

      Does that help you? (And I am truly very sorry about the number of ways to
      access this identical information, i.e. 'nuccore' with just digits, 'gi'
      using same digits but prefixed, or insdc with a different identifier. Grr.)

      For this application I think I'll stick with
      http://www.ncbi.nlm.nih.gov/nuccore/ URLs such as
      http://www.ncbi.nlm.nih.gov/nuccore/EU880417.1. What I need is quick
      access to records identified by my collaborators. I can live with multiple
      synonyms for the same NCBI record, especially as NCBI actually keeps two
      sets of primary keys for sequences
      (http://www.ncbi.nlm.nih.gov/Sitemap/sequenceIDs.html). However, I will
      script the lookup of an unambiguous identifier that I can use as a join
      key in our further analyses.
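One possible shape for such a lookup script, as a rough Python sketch rather than what was actually used: it pulls the identifier out of a nuccore URL and asks NCBI's E-utilities `efetch` service for the accession.version of the record. The E-utilities endpoint and the `rettype=acc` parameter are not mentioned in this thread and are assumptions here, as are the helper names `nuccore_id`, `efetch_accession_url` and `lookup_join_key`.

```python
from urllib.parse import urlencode, urlparse
from urllib.request import urlopen

# Assumption: NCBI E-utilities efetch endpoint (not taken from this thread).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def nuccore_id(url_or_id):
    """Extract the identifier from a nuccore URL, dropping any 'gi:' prefix."""
    text = url_or_id.strip()
    if "://" in text:
        # Keep only the last path segment of the URL.
        text = urlparse(text).path.rstrip("/").rsplit("/", 1)[-1]
    if text.lower().startswith("gi:"):
        text = text[3:]
    return text


def efetch_accession_url(identifier):
    """Build an efetch request that asks for the accession only."""
    params = {"db": "nuccore", "id": identifier, "rettype": "acc", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def lookup_join_key(url_or_id, timeout=30):
    """Fetch the accession for a record over the network (not run here)."""
    request_url = efetch_accession_url(nuccore_id(url_or_id))
    with urlopen(request_url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    print(efetch_accession_url(nuccore_id("http://www.ncbi.nlm.nih.gov/nuccore/gi:195970460")))
```

Using the returned accession as the join key means that GI-style and accession-style links supplied by collaborators collapse onto one value per record, provided the service responds as assumed.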

      And yes, searching for URL fragments is a good idea. That would be another
      route in to find appropriate annotation/URI sources. Thanks for the
      suggestion!

      You're welcome. Thanks for your assistance!

      Best regards,
      Jon Olav

       
