|
From: Alan M. <am...@gm...> - 2024-09-06 13:58:38
|
<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>Ah, ok that makes sense. Guessing encodings can be a real pain. Thanks for looking into this.</div> <div> </div> <div>Alan</div> <div><br/> </div> <div class="signature">--<br/> Alan Munn<br/> am...@gm...</div> <div> <div> <div name="quote" style="margin:10px 5px 5px 10px; padding: 10px 0 10px 10px; border-left:2px solid #C3D9E5; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"> <div style="margin:0 0 10px 0;"><b>Sent:</b> Friday, September 06, 2024 at 8:39 AM<br/> <b>From:</b> "Christiaan Hofman" <cmh...@gm...><br/> <b>To:</b> "BibDesk users list" <bib...@li...><br/> <b>Subject:</b> Re: [Bibdesk-users] Spurious characters from DOI import</div> <div name="quoted-content"> <div>The problem seems to be that the DOI server does not tell us which encoding it is returning its data in. And we seem to guess the wrong encoding. This will be improved in the next release. <div> </div> <div>Christiaan <div> <blockquote> <div>On 6 Sep 2024, at 11:18, Christiaan Hofman <<a href="mailto:cmh...@gm..." onclick="" target="_blank">cmh...@gm...</a>> wrote:</div> <div> <div>The problem is with the DOI site, which provides us with the bibtex record. <div> </div> <div>Christiaan <div> <blockquote> <div>On 6 Sep 2024, at 01:18, Alan Munn via Bibdesk-users <<a href="mailto:bib...@li..." onclick="" target="_blank">bib...@li...</a>> wrote:</div> <div> <div> <div style="font-family: Verdana;font-size: 12.0px;"> <div>Hi, when I paste the following DOI into BibDesk,</div> <div> </div> <div><a href="http://dx.doi.org/10.1080/10489223.2012.685026" target="_blank">http://dx.doi.org/10.1080/10489223.2012.685026</a></div> <div> </div> <div>the resulting record looks like this:</div> <div> </div> <div> <div><br/> @article{Miller_2012,<br/> author = {Miller, Karen L. 
and Schmitt, Cristina},<br/> date-added = {2024-09-05 16:26:28 -0400},<br/> date-modified = {2024-09-05 16:26:28 -0400},<br/> doi = {10.1080/10489223.2012.685026},<br/> issn = {1532-7817},<br/> journal = {Language Acquisition},<br/> month = jun,<br/> number = {3},<br/> pages = {223�261},<br/> publisher = {Informa UK Limited},<br/> title = {Variable Input and the Acquisition of Plural Morphology},<br/> url = {<a href="http://dx.doi.org/10.1080/10489223.2012.685026" target="_blank">http://dx.doi.org/10.1080/10489223.2012.685026</a>},<br/> volume = {19},<br/> year = {2012},<br/> bdsk-url-1 = {<a href="http://dx.doi.org/10.1080/10489223.2012.685026" target="_blank">http://dx.doi.org/10.1080/10489223.2012.685026</a>}}</div> <div> </div> <div>As you can see, the pages field contains an odd character, but the problem is worse than that: it also contains two invisible characters, so what ends up in the .bib entry is 226 (latin small a circumflex), 128 (PAD), 147 (STS = set transmit state). This causes various problems when doing other things with the bibliography (in my case using pandoc to generate an html bibliography; see <a href="https://github.com/jgm/pandoc/discussions/10151#discussioncomment-10548191" target="_blank">https://github.com/jgm/pandoc/discussions/10151#discussioncomment-10548191</a> for some discussion on the pandoc repo).</div> <div> </div> <div>Is this a problem with BibDesk's use of the scraped DOI data or is it coming directly from the DOI server itself?</div> <div> </div> <div>Thanks</div> <div> </div> <div>Alan</div> <div> </div> </div> <div class="signature">--<br/> Alan Munn<br/> <a href="mailto:am...@gm..." onclick="" target="_blank">am...@gm...</a></div> </div> </div> </div> </blockquote> </div> </div> </div> </div> </blockquote> </div> </div> _______________________________________________ Bibdesk-users mailing list Bib...@li... 
<a href="https://lists.sourceforge.net/lists/listinfo/bibdesk-users" target="_blank">https://lists.sourceforge.net/lists/listinfo/bibdesk-users</a></div> </div> </div> </div> </div></div></body></html> |