I have been using BibDesk for years and I really appreciate your effort.
I am on the latest version, v1.9.4. When I searched for PMID:38788221 (for example) on PubMed, BibDesk found the paper but failed to extract its page number, which should be "btae333". Earlier BibDesk versions worked fine: the older papers in my library all have page numbers. The bug has probably been there for several months (a production editor recently pointed out missing page numbers in multiple references), but I don't know which version introduced it.
The parser has not effectively changed for a long time, so this is not caused by BibDesk. In fact, the data we get for this item does not contain any page information. So it is probably a change on PubMed's side that causes the page to be missing.
There seem to be multiple ways to retrieve data from PubMed. Which method are you using? The page number "btae333" is shown on the PubMed page: https://pubmed.ncbi.nlm.nih.gov/38788221/
We use their API to get XML data. What you see on a web page is not useful for automated download.
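For anyone who wants to look at the same raw data, here is a minimal Python sketch that builds a request URL, assuming the "API" is NCBI's public E-utilities efetch service (the thread does not show BibDesk's exact request, so its parameters may differ). `pubmed_efetch_url` is a hypothetical helper name, and the sketch only builds the URL; it does not download anything.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities efetch endpoint (assumed; BibDesk's actual request is not shown here).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def pubmed_efetch_url(pmid: str) -> str:
    """Build an efetch URL that asks for the PubMed XML record of one PMID."""
    params = {"db": "pubmed", "id": pmid, "retmode": "xml"}
    return f"{EFETCH_URL}?{urlencode(params)}"

print(pubmed_efetch_url("38788221"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=38788221&retmode=xml
```

Opening that URL in a browser (or with `urllib.request.urlopen`) returns the XML that an automated client parses, which is not necessarily what the PubMed web page displays.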
Oh, and BTW, btae333 does not look like a page. It is in fact part of the article's identifier (it also appears in the DOI). So if the PubMed page gives you btae333 as the page in its pretty-printed result, then they made (another) mistake there as well.
At least in our field (biomedical research), most online-only journals give you a "page" number like "btae333"; there are no actual page numbers. I haven't checked old PubMed XML files because I don't have any. I guess old PubMed put "btae333" in the Pagination element [1] but later decided that was confusing. Now they put "btae333" in an ELocationID element [2]. It looks like the following now:
<ELocationID EIdType="pii" ValidYN="Y">btae333</ELocationID>
Is it possible to use this as the page number when the Pagination element is missing?
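The fallback asked about here (use the pii ELocationID only when Pagination is missing) can be sketched in Python with the standard library's ElementTree. This is a minimal sketch, not BibDesk's parser: the element paths (`Article/Pagination/MedlinePgn` and `Article/ELocationID`) are taken from the PubMed DTD as I understand it, `pages_from_pubmed_article` is a hypothetical helper, and the two sample records are trimmed, illustrative XML rather than verbatim PubMed output.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def pages_from_pubmed_article(article: ET.Element) -> Optional[str]:
    """Return a pages value for one PubMed <Article> element (hypothetical helper).

    Prefer Pagination/MedlinePgn; fall back to an ELocationID with
    EIdType="pii" only when no pagination text is present.
    """
    medline_pgn = article.findtext("Pagination/MedlinePgn")
    if medline_pgn and medline_pgn.strip():
        return medline_pgn.strip()
    for eloc in article.findall("ELocationID"):
        # Skip DOI entries and identifiers flagged as invalid (ValidYN="N").
        if eloc.get("EIdType") == "pii" and eloc.get("ValidYN", "Y") == "Y":
            value = (eloc.text or "").strip()
            if value:
                return value
    return None  # neither pagination nor a usable pii

# Trimmed, illustrative records (not verbatim PubMed responses).
NEW_STYLE = """<PubmedArticleSet><PubmedArticle><MedlineCitation><Article>
<ELocationID EIdType="pii" ValidYN="Y">btae333</ELocationID>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/btae333</ELocationID>
</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"""

OLD_STYLE = """<PubmedArticleSet><PubmedArticle><MedlineCitation><Article>
<Pagination><MedlinePgn>123-30</MedlinePgn></Pagination>
</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"""

for xml_text in (NEW_STYLE, OLD_STYLE):
    article = ET.fromstring(xml_text).find("PubmedArticle/MedlineCitation/Article")
    print(pages_from_pubmed_article(article))
# btae333
# 123-30
```

Checking `EIdType` explicitly matters because a record can carry several ELocationID elements, and the DOI entry should never end up in the pages field.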
That does not make much sense to me. PII is an identifier, and it is not appropriate for a Pages field in BibTeX. It would give ill-formed citations from BibTeX.
We need something equivalent to a page number to pinpoint a paper in a volume/issue, and current practice unfortunately uses this identifier. For example, a Nature article cites PMID:35357911 as follows (without DOI):
Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022).
eabl4178 is de facto the page number. If you click "cite" on the Science website, the citation format is basically the same (plus the DOI). In the PDF of this paper downloaded from Science, the page numbering starts at 1. I can give you many more examples.
Journals require page numbers in references, and they all use something like "eabl4178" as the page number. This is a de facto standard no matter how you argue. Zotero follows this practice, and EndNote and Paperpile probably do too. We'd better have a solution in BibDesk.
As PubMed was probably putting strings like "eabl4178" in Pagination, many papers in my BibDesk library have pages like this, and I haven't had any problems with BibTeX. The only problem is the missing pages.
If you are worried that a page number like "eabl4178" may interfere with BibTeX, you could add an option and let users decide whether to use such a string as the page number. The bottom line is that we need page numbers for citations, and many papers don't have real page numbers.
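An opt-in setting like the one suggested could decide the pages value roughly as in this Python sketch. `use_eloc_as_pages` is a hypothetical preference name, not an existing BibDesk setting, and `bibtex_pages_line` is a hypothetical helper; the point is only that real pagination wins and that an article number such as "eabl4178" is written into the field verbatim as plain text.

```python
from typing import Optional

def bibtex_pages_line(pagination: Optional[str], eloc_pii: Optional[str],
                      use_eloc_as_pages: bool = True) -> Optional[str]:
    """Return a BibTeX 'pages' line, or None when nothing should be written.

    use_eloc_as_pages is a hypothetical user preference, not a BibDesk setting.
    """
    if pagination:
        value = pagination  # real pagination always takes precedence
    elif use_eloc_as_pages and eloc_pii:
        value = eloc_pii  # article number such as "eabl4178", stored as plain text
    else:
        return None  # user opted out, or no identifier available
    return f"  pages = {{{value}}},"

print(bibtex_pages_line(None, "eabl4178"))
print(bibtex_pages_line(None, "eabl4178", use_eloc_as_pages=False))
#   pages = {eabl4178},
# None
```

How the stored string is then printed ("p. eabl4178", "eabl4178", and so on) is decided by the bibliography style, not by the entry itself.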
PS: this problem is not specific to PubMed as many journals, like Science, don't have real page numbers these days.
Last edit: lh3 2024-07-30
I'll add the ELocationID as the Pages field when there is no Pagination.
It is a weird practice to label it as a pii ID, since it is not even the PII, which is a totally different number.
Thank you so much! I do agree with you that the current practice is weird. I didn't realize this mess until now.
To make sure pii is the right tag, I asked PubMed helpdesk how to retrieve "page numbers" for online-only papers without pagination. They replied:
and pointed me to the latest PubMed DTD.