While importing via DOI string, I received an error:
Fatal error: Call to a member function getTagContent() on a non-object in /var/www/localhost/htdocs/refbase/includes/import.inc.php on line 378
To debug what happened, I added a few print_r lines. I am fairly sure that splitSourceText() around line 323 is misbehaving.
This is what I get when trying to import:
Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences (1934-1990)Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences (1934-1990)0080-46300080463819843921802M. V.Berry38198463020064557Q62807W5JK4176U610.1098/rspa.1984.002320080220093117http://rspa.royalsocietypublishing.org/cgi/doi/10.1098/rspa.1984.0023http://journals.royalsociety.org/index/10.1098/rspa.1984.0023
print_r($recordArray) after splitSourceText gives:
Array ( [0] => [1] => Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences (1934-1990)Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences (1934-1990)0080-46300080463819843921802M. V.Berry38198463020064557Q62807W5JK4176U610.1098/rspa.1984.002320080220093117http://rspa.royalsocietypublishing.org/cgi/doi/10.1098/rspa.1984.0023http://journals.royalsociety.org/index/10.1098/rspa.1984.0023 )
Notice that the first array element is empty, which leads to an empty XML object later in the for loop.
I will try to understand splitSourceText later when I have some free time.
Here's the culprit:
After preg_split in splitSourceText, the first element of the resulting $sourceArray is:
<?xml version = "1.0" encoding="UTF-8" ?>
<doi_records xmlns="http://www.crossref.org/xschema/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.crossref.org/xschema/1.1 http://www.crossref.org/schema/unixref1.1.xsd http://www.crossref.org/xschema/1.0 http://www.crossref.org/schema/unixref1.0.xsd">
And it should be ignored.
Your analysis is correct, thanks for the report. It seems that CrossRef adjusted their XML output recently. They've added an XML declaration and added namespaces & other attributes to the `<doi_records>` and `<doi_record>` tags. Also, the first (root) tag now contains a newline character in its opening tag. Together, these changes caused the split pattern to fail.
To fix this, open file 'includes/import.inc.php' and replace this line (in function 'crossrefToRefbase()'):
$recordDelimiter = "(\s*<doi_records*>)?\s*(?=<doi_record*>)" // splits before '<doi_record>'
with this one:
$recordDelimiter = "(\s*(<\?xml*\?>\s*)?<doi_records*>)?\s*(?=<doi_record*>)" // splits before '<doi_record>'
Let me know if this doesn't work for you.
Sorry for the messed-up code; the SourceForge Markdown parser (and its preview) still seems to be quite buggy, and it certainly does not adhere to the Markdown format… :-(
The XML entities in the above code must be decoded, of course. I.e. &quot ; (sans the space) should be ", &lt ; should be <, and &gt ; should be >.
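To make the effect of the changed delimiter easier to see, here is a minimal, self-contained sketch. It is not refbase's own code: the sample XML is a shortened, hypothetical stand-in for a CrossRef response, `splitRecords()` is a hypothetical helper, the "strict" pattern is only an illustration of a delimiter that expects a bare root tag, and the `[^<>]*` character class plus the closing-tag alternative are assumptions (the pattern as displayed above may have lost characters in rendering).

```php
<?php
// Hypothetical, shortened stand-in for a CrossRef response: an XML
// declaration, a root tag with attributes and an embedded newline, and
// two records. Attribute and element names are illustrative only.
$sourceText = '<?xml version = "1.0" encoding="UTF-8" ?>' . "\n"
    . '<doi_records xmlns="http://www.crossref.org/xschema/1.1"' . "\n"
    . '  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">' . "\n"
    . '<doi_record owner="10.1098">' . "\n"
    . '<title>First</title>' . "\n"
    . '</doi_record>' . "\n"
    . '<doi_record owner="10.1098">' . "\n"
    . '<title>Second</title>' . "\n"
    . '</doi_record>' . "\n"
    . '</doi_records>';

// Assumed companion alternative: split after a closing record tag and
// swallow the closing root tag, if present.
$closing = '(?<=<\/doi_record>)\s*(<\/doi_records>\s*)?';

// Strict variant (hypothetical): only a bare root tag is consumed.
$strict = '(\s*<doi_records>)?\s*(?=<doi_record\b)' . '|' . $closing;

// Tolerant variant: optional XML declaration, attributes allowed in tags.
// The [^<>]* character class is an assumption.
$tolerant = '(\s*(<\?xml[^<>]*\?>\s*)?<doi_records[^<>]*>)?\s*(?=<doi_record[^<>]*>)'
    . '|' . $closing;

// Hypothetical helper, not refbase's splitSourceText().
function splitRecords($delimiter, $text)
{
    // Drop empty chunks so that only real content is returned.
    return preg_split('/' . $delimiter . '/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$strictChunks = splitRecords($strict, $sourceText);
$tolerantChunks = splitRecords($tolerant, $sourceText);

echo count($strictChunks), " chunks with the strict pattern\n";
echo count($tolerantChunks), " chunks with the tolerant pattern\n";
```

With this sample input, the strict variant should leave the declaration and root tag behind as a leading non-record chunk, which is the kind of element that later fails to parse as a record, while the tolerant variant should return only the two records.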
I had a dumb solution yesterday to shift the first element out, then later managed to learn some basic perl regexp and came up with the same match pattern.
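For reference, the "shift the first element out" workaround could look roughly like the sketch below. This is an assumption about what such a workaround might be, not the code that was actually used: the array contents are hypothetical, and the guard that checks whether the first chunk looks like a record is an addition so that a valid first record is not discarded.

```php
<?php
// Hypothetical chunks as they might come back from the split: a leading
// non-record chunk followed by one record.
$recordArray = array(
    '<?xml version = "1.0" encoding="UTF-8" ?>',
    '<doi_record><title>Example</title></doi_record>',
);

// Drop the first chunk only if it does not start with a <doi_record> tag.
// The [\s>] class keeps the check from matching the <doi_records> root tag.
if (!empty($recordArray)
    && !preg_match('/^\s*<doi_record[\s>]/', $recordArray[0])) {
    array_shift($recordArray);
}

echo count($recordArray), " record(s) left\n";
```

Fixing the delimiter is the cleaner approach, since it keeps the non-record chunk from being produced in the first place.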
After applying the patch, the problem is fixed for the DOI mathfield used (10.1098/rspa.1984.0023). But using the following DOI (10.1007/978-3-540-68636-1) still results in the above-mentioned error.
The only difference I could figure out is that one is a journal and the other a book. Could this be the problem?
Yes; the resource type issue is probably why that does not import. The DOI importer was made before the unixref-1.1 schema was released & much of the syntax found in that file is not offered by unixref-1.0 (and we weren't using a schema when we wrote the import routine, but only examples).
We will have to improve the importer to follow the newer schema in the future.