Re: [Refdb-users] med2ris failure
From: Markus H. <mar...@mh...> - 2007-10-15 10:28:51
Hi Bruce,

Quoting Bruce Hayward <b.h...@le...>:

> Recently I've been getting lots of failures when I try to convert a
> Pubmed xml file to ris format.

I can feel your pain, brother :-( This time it's not me but Pubmed who
screwed things up. They have introduced a couple of changes lately
which create invalid XML.

> junk after document element at line 216, column 4, byte 11204 at
> /usr/local/lib/perl5/site_perl/5.8.8/mach/XML/Parser.pm line 187
>
> In this case the ris file contains only the first reference

This is because the XML file contains consecutive top-level
PubmedArticle elements. In a valid XML file these should be wrapped in
a PubmedArticleSet element as they used to be until a few weeks ago. A
workaround is to request the datasets individually.

> not well-formed (invalid token) at line 3, column 27, byte 44 at
> /usr/local/lib/perl5/site_perl/5.8.8/mach/XML/Parser.pm line 187
>
> In this case the ris file is empty.

This is because Pubmed no longer correctly exports attribute values.
Compare the second element of a valid (until mid last week) and a
recent entry:

<MedlineCitation Owner="NLM" Status="MEDLINE">

vs.

<MedlineCitation Owner Status>

This again is invalid XML which the expat parser used by med2ris
rightfully barfs at.

One workaround is to copy the pretty-printed version (which happens to
be valid) to your editor and save that to a file, instead of using the
built-in "File" or "Text" tools in the drop-down box of the Pubmed
interface.

I have encountered additional XML problems in the last week or so.
E.g. the author name "O'Brien" is no longer exported as is, but as
"O'Brien" which again is invalid XML as the non-standard entity
isn't declared anywhere.

I've complained at the help desk about the latter two issues (I didn't
know about the first one yet), and they told me they were working on a
fix.
But I think it won't hurt if you go ahead and contact them as
well to tighten the thumbscrews a bit further.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
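
The first problem above (consecutive top-level PubmedArticle elements without an enclosing PubmedArticleSet) could probably also be patched up locally before running med2ris. What follows is only a minimal sketch in Python, assuming that nothing of interest lives outside the PubmedArticle elements. The names wrap_pubmed_articles and is_well_formed are made up for illustration and are not part of med2ris or refdb. The sketch discards any original XML declaration and DOCTYPE, and it does nothing about the missing attribute values or the undeclared entity.

```python
import re
import xml.etree.ElementTree as ET

# Matches one complete PubmedArticle element. The \b keeps the pattern from
# matching the opening tag of a PubmedArticleSet element.
ARTICLE_RE = re.compile(r"<PubmedArticle\b.*?</PubmedArticle>", re.DOTALL)


def wrap_pubmed_articles(text: str) -> str:
    """Wrap all PubmedArticle elements in a single PubmedArticleSet root.

    Hypothetical helper: assumes the input contains zero or more sibling
    PubmedArticle elements and that everything else (XML declaration,
    DOCTYPE, whitespace) may be dropped.
    """
    if "<PubmedArticleSet" in text:
        return text  # already has a root element, leave it alone
    articles = ARTICLE_RE.findall(text)
    if not articles:
        raise ValueError("no PubmedArticle elements found")
    body = "\n".join(articles)
    return (
        '<?xml version="1.0"?>\n'
        "<PubmedArticleSet>\n" + body + "\n</PubmedArticleSet>\n"
    )


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML (ElementTree uses expat)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

Running is_well_formed on the result before handing the file to med2ris should show whether one of the other two problems is still present in the data.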