Hello all, again!

 

Just wondering if someone out there may have a backup (it doesn’t have to be recent) of records harvested from PubMed Central, but with a special twist: metadataPrefix = pmc_fm

 

Even if I harvest only at night (US time), stop in the morning and resume at night or on weekends (off-peak times), we are talking about 2.1 million records, and that is a considerable amount of work to ask of the PubMed servers.

 

If someone has such a backup and would be so kind as to share it (I don’t think PubMed Central would mind, quite the opposite)…
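
For anyone who does end up harvesting, the stop-in-the-morning, resume-at-night routine described above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the endpoint URL in BASE_URL, the off-peak hours in is_off_peak() and the helper names are illustrative and not taken from PubMed Central's documentation; only the ListRecords verb, the metadataPrefix argument and the resumptionToken mechanism come from the OAI-PMH protocol itself.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumption: PMC OAI-PMH endpoint; verify against the current PMC documentation.
BASE_URL = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi"
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def is_off_peak(now):
    """Hypothetical rule: weekends, or 21:00-05:00 on weekdays (by the given clock)."""
    return now.weekday() >= 5 or now.hour >= 21 or now.hour < 5


def build_url(token=None, prefix="pmc_fm"):
    """The first request carries metadataPrefix; follow-ups carry only the token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def next_token(xml_text):
    """Return the resumptionToken of a response, or None if the list is complete."""
    node = ET.fromstring(xml_text).find(f".//{OAI_NS}resumptionToken")
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


def http_get(url):
    """Fetch one response page as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def harvest(save, token=None, fetch=http_get, clock=datetime.now, pause=60):
    """Fetch page after page while it is off-peak.

    Returns (finished, token): (True, None) when the whole list was harvested,
    or (False, token) when peak hours began; pass that token back in to resume.
    """
    while True:
        if not is_off_peak(clock()):
            return (False, token)
        page = fetch(build_url(token))
        save(page)  # e.g. write the raw XML page to disk
        token = next_token(page)
        if token is None:
            return (True, None)
        time.sleep(pause)  # be gentle with the server between pages
```

A cron job could call harvest() every evening with the token stored from the previous run. Resumption tokens may expire on the server side, so an overnight gap might not always be resumable; in that case a date-limited request (the protocol's from/until arguments) would be the fallback.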

 

But you may ask why metadataPrefix = pmc_fm instead of the usual metadataPrefix = oai_dc…

 

Well, Demian’s latest patch in http://vufind.org/jira/browse/VUFIND-258 (Use VuFind as article index) makes it possible to take full advantage of the richness of the information PubMed sends in this metadata schema, which is not present in the oai_dc one. Besides that, pmc_fm carries a full set of keywords, whereas in oai_dc the subject is (usually) just:

 

<dc:subject>Primary Research</dc:subject>

 

In pmc_fm:

 

      <kwd-group>

        <kwd>breast cancer</kwd>

        <kwd>breast screening</kwd>

        <kwd>cohort study</kwd>

        <kwd>hormone replacement therapy</kwd>

        <kwd>lifestyle factors</kwd>

        <kwd>morbidity</kwd>

        <kwd>mortality</kwd>

      </kwd-group>

 

Besides that, what I really want is this group:

 

<journal-meta>

      <journal-id journal-id-type="nlm-ta">Breast Cancer Res</journal-id>

      <journal-title>Breast Cancer Research</journal-title>

      <issn pub-type="ppub">1465-5411</issn>

      <issn pub-type="epub">1465-542X</issn>

      <publisher>

        <publisher-name>BioMed Central</publisher-name>

        <publisher-loc>London</publisher-loc>

      </publisher>

    </journal-meta>

 

and

 

      <volume>1</volume>

      <issue>1</issue>

      <fpage>73</fpage>

      <lpage>80</lpage>
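
As a rough illustration of how the pmc_fm fields quoted above could be pulled out of a harvested record, here is a minimal sketch. The wrapping `<record>` element, the SAMPLE string and the helper names are assumptions made for the example; the element names and values are the ones quoted in this message. Real harvested records may carry an XML namespace, so the sketch matches on local element names only.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper around the fragments quoted in this message.
SAMPLE = """<record>
  <journal-meta>
    <journal-id journal-id-type="nlm-ta">Breast Cancer Res</journal-id>
    <journal-title>Breast Cancer Research</journal-title>
    <issn pub-type="ppub">1465-5411</issn>
    <issn pub-type="epub">1465-542X</issn>
    <publisher>
      <publisher-name>BioMed Central</publisher-name>
      <publisher-loc>London</publisher-loc>
    </publisher>
  </journal-meta>
  <volume>1</volume>
  <issue>1</issue>
  <fpage>73</fpage>
  <lpage>80</lpage>
  <kwd-group>
    <kwd>breast cancer</kwd>
    <kwd>cohort study</kwd>
    <kwd>mortality</kwd>
  </kwd-group>
</record>"""


def local(tag):
    """Strip a '{namespace}' prefix so matching works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def texts(root, name):
    """Text of every element whose local name is `name`, in document order."""
    return [(el.text or "").strip() for el in root.iter() if local(el.tag) == name]


def first(root, name):
    """Text of the first matching element, or None if there is none."""
    found = texts(root, name)
    return found[0] if found else None


def extract(xml_text):
    """Collect the journal, citation and keyword fields into a plain dict."""
    root = ET.fromstring(xml_text)
    issns = {
        el.get("pub-type"): (el.text or "").strip()
        for el in root.iter()
        if local(el.tag) == "issn"
    }
    return {
        "journal": first(root, "journal-title"),
        "issn": issns,
        "publisher": first(root, "publisher-name"),
        "volume": first(root, "volume"),
        "issue": first(root, "issue"),
        "pages": (first(root, "fpage"), first(root, "lpage")),
        "keywords": texts(root, "kwd"),
    }


if __name__ == "__main__":
    for field, value in extract(SAMPLE).items():
        print(field, "=", value)
```

Matching on local names keeps the sketch independent of whichever namespace the harvested XML declares, at the cost of ignoring namespaces entirely; an indexer that has to tell apart same-named elements from different schemas would need proper namespace handling instead.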

 

Yes, it is all there, whilst in oai_dc the record is “just” (at least for this random record):

 

---------------

<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><identifier>pubmed-13913</identifier><datestamp>2001-02-27</datestamp>

  <dc:title>The Million Women Study: design and characteristics of the study population</dc:title>

  <dc:creator/>

  <dc:subject>Primary Research</dc:subject>

  <dc:description/>

  <dc:publisher>BioMed Central</dc:publisher>

  <dc:identifier>http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=13913</dc:identifier>

  <dc:type>Text</dc:type>

  <dc:language>en</dc:language>

  <dc:rights>Copyright © 1999 Current Science Ltd</dc:rights>

</oai_dc:dc>

 

-----------------

 

Ok, I guess I will soon receive an e-mail from PubMed Central blaming me for all the extra requests from those of you out there who start re-harvesting their repository because of this message of mine… :)

 

Thanks in advance to anyone who has done this harvesting and might share the .tar (.zip, whatever) of it.

 

All the best,

 

Filipe

 

--------------------------
Filipe Manuel S. Bento  |  http://about.me/filipeb
Universidade de Aveiro | Campus Universitário Santiago 
3810-193 Aveiro | Portugal