DOI import error

Matt - Zee
2009-09-16
2013-05-28
  • Matt - Zee
    2009-09-16

    While importing via DOI string, I received an error:
    Fatal error: Call to a member function getTagContent() on a non-object in /var/www/localhost/htdocs/refbase/includes/import.inc.php on line 378

    To find out what happened, I added a few print_r lines. I am fairly sure that splitSourceText(), around line 323, is misbehaving.

    The following is obtained when trying to import
    "doi:10.1098/rspa.1984.0023"

    print_r($sourceText) gives:
    Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences (1934-1990)Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences (1934-1990)0080-46300080463819843921802M. V.Berry38198463020064557Q62807W5JK4176U610.1098/rspa.1984.002320080220093117http://rspa.royalsocietypublishing.org/cgi/doi/10.1098/rspa.1984.0023http://journals.royalsociety.org/index/10.1098/rspa.1984.0023

    print_r($recordArray) after splitSourceText gives:
    Array (  =>   => Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences (1934-1990)Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences (1934-1990)0080-46300080463819843921802M. V.Berry38198463020064557Q62807W5JK4176U610.1098/rspa.1984.002320080220093117http://rspa.royalsocietypublishing.org/cgi/doi/10.1098/rspa.1984.0023http://journals.royalsociety.org/index/10.1098/rspa.1984.0023 )

    Notice that the first array element is empty, which leads to an empty XML object later in the for loop.

    I will try to understand splitSourceText later when I have some free time.

    -MZ

     
  • Hi MZ,

    Your analysis is correct; thanks for the report. It seems that CrossRef recently changed their XML output: they've added an XML declaration, and added namespaces & other attributes to the `<doi_records>` and `<doi_record>` tags. Also, the opening tag of the first (root) element now contains a newline character. Together, these changes caused the split pattern to fail.

    To fix this, open file 'includes/import.inc.php' and replace this line (in function 'crossrefToRefbase()'):

        $recordDelimiter = "(\s*<doi_records*>)?\s*(?=<doi_record*>)" // splits before '<doi_record>'

    with this one:

        $recordDelimiter = "(\s*(<\?xml*\?>\s*)?<doi_records*>)?\s*(?=<doi_record*>)" // splits before '<doi_record>'

    Let me know if this doesn't work for you.
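
For readers who want to see the failure mode in isolation, here is a minimal sketch in Python (not the refbase PHP code). The sample response, both patterns, and the helper `split_records` are assumptions made for illustration; in particular, the `[^<>]*` parts are a guess at how attributes could be tolerated, not a quote of the actual refbase pattern.

```python
import re

# Hypothetical stand-in for a CrossRef response: an XML declaration,
# a root tag with attributes and an embedded newline, then two records.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<doi_records xmlns="http://example.org/ns"\n'
    '  version="1.0">\n'
    '<doi_record owner="10.1098">A</doi_record>\n'
    '<doi_record owner="10.1007">B</doi_record>\n'
    '</doi_records>'
)

# Assumed "before" pattern: knows the root tag but not the XML declaration.
STRICT = r"(?:\s*<doi_records[^<>]*>)?\s*(?=<doi_record[\s>])"

# Assumed "after" pattern: additionally swallows an optional '<?xml ...?>'.
TOLERANT = (
    r"(?:\s*(?:<\?xml[^<>]*\?>\s*)?<doi_records[^<>]*>)?"
    r"\s*(?=<doi_record[\s>])"
)


def split_records(text, delimiter):
    """Split text at the delimiter and drop empty or whitespace-only chunks."""
    return [chunk for chunk in re.split(delimiter, text) if chunk.strip()]


strict_chunks = split_records(SAMPLE, STRICT)
tolerant_chunks = split_records(SAMPLE, TOLERANT)

# With STRICT, the XML declaration survives as a bogus first "record".
print(len(strict_chunks), strict_chunks[0])
# With TOLERANT, every chunk starts with a <doi_record ...> tag.
print(len(tolerant_chunks), [c[:12] for c in tolerant_chunks])
```

With the stricter pattern the declaration is left over as a leading chunk that contains no record, which is the kind of input that would make a later tag lookup fail.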

    Matthias

     
  • Sorry for the messed-up code, the Sourceforge Markdown parser (and its preview) still seems to be quite buggy, it certainly does not adhere to the Markdown format… :-(

    The XML entities in the above code must be decoded, of course. I.e. &quot ; (sans the space) should be ", &lt ; should be <, and &gt ; should be >.

    Matthias

     
  • Matt - Zee
    2009-09-18

    Yesterday I had a dumb solution that shifted the first element out; later I managed to learn some basic Perl regexp and came up with the same match pattern.

    Thanks.
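
The "shift the first element out" workaround can be sketched roughly as follows, again in Python rather than refbase's PHP. The helper `drop_non_records` and the sample chunks are hypothetical; the sketch only contrasts blindly dropping the first chunk with keeping chunks that actually look like records.

```python
def drop_non_records(chunks):
    """Keep only chunks that begin with a <doi_record> opening tag."""
    prefixes = ("<doi_record>", "<doi_record ")
    return [c for c in chunks if c.lstrip().startswith(prefixes)]


# Chunks as they might look when the split leaves a stray preamble behind.
with_preamble = ['<?xml version="1.0"?>', '<doi_record owner="x">A</doi_record>']
# Chunks as they might look when the split already worked.
clean = ['<doi_record owner="x">A</doi_record>']

# Shifting the first element helps in the first case only ...
shifted_preamble = with_preamble[1:]
shifted_clean = clean[1:]  # ... and silently loses a real record here.

# Filtering by content behaves the same in both cases.
filtered_preamble = drop_non_records(with_preamble)
filtered_clean = drop_non_records(clean)
```

Filtering on content is a little more defensive than an unconditional shift, since it does not depend on whether the stray leading chunk is present.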

     
  • Martin Fluegge
    2009-09-28

    Hi guys,

    after applying the patch, the problem is fixed for the DOI mathfield used (10.1098/rspa.1984.0023). But the following DOI (10.1007/978-3-540-68636-1) still results in the above-mentioned error.

    The only difference I could find is that one is a journal and the other a book. Could this be the problem?
    Cheers,

    Martin

     
  • Yes; the resource type issue is probably why that DOI does not import. The DOI importer was written before the unixref-1.1 schema was released, and much of the syntax found in that file is not offered by unixref-1.0 (also, we weren't working from a schema when we wrote the import routine, only from examples).

    We will have to improve the importer to follow the newer schema in the future.
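
As a rough illustration of what a type-aware importer could check first, here is a small Python sketch (not refbase code). It assumes that each record wraps its metadata in a container element named after the resource type, such as `journal`, `book` or `conference`; those names, the sample records, and the helpers `local_name` and `resource_type` are assumptions for illustration. Unknown types return `None` so a caller could skip the record instead of hitting a fatal error.

```python
import xml.etree.ElementTree as ET

# Assumed container element names, one per supported resource type.
KNOWN_TYPES = ("journal", "book", "conference")


def local_name(element):
    """Return the tag name without any '{namespace}' prefix."""
    return element.tag.rsplit("}", 1)[-1]


def resource_type(record_xml):
    """Return the resource type of one record, or None if unrecognised."""
    root = ET.fromstring(record_xml)
    for element in root.iter():
        name = local_name(element)
        if name in KNOWN_TYPES:
            return name
    return None


# Hypothetical, heavily trimmed records.
journal_record = (
    "<doi_record><crossref><journal><journal_metadata/>"
    "</journal></crossref></doi_record>"
)
book_record = (
    '<doi_record xmlns="http://example.org/ns"><crossref><book>'
    "<book_metadata/></book></crossref></doi_record>"
)
other_record = "<doi_record><crossref><error>n/a</error></crossref></doi_record>"

for record in (journal_record, book_record, other_record):
    kind = resource_type(record)
    if kind is None:
        print("skipping record of unknown type")
    else:
        print("would import as:", kind)
```

Comparing local names keeps the check working whether or not the response declares a default namespace.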