DOI import error

Matt - Zee
2009-09-16
2013-05-28
  • Matt - Zee

    Matt - Zee - 2009-09-16

    While importing via DOI string, I received an error:
    Fatal error: Call to a member function getTagContent() on a non-object in /var/www/localhost/htdocs/refbase/includes/import.inc.php on line 378

    To debug what happened, I added a few print_r lines. I am fairly sure that splitSourceText(), around line 323, is misbehaving.

    The following is obtained when trying to import
    "doi:10.1098/rspa.1984.0023"

    print_r($sourceText) gives:
    Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences (1934-1990)Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences (1934-1990)0080-46300080463819843921802M. V.Berry38198463020064557Q62807W5JK4176U610.1098/rspa.1984.002320080220093117http://rspa.royalsocietypublishing.org/cgi/doi/10.1098/rspa.1984.0023http://journals.royalsociety.org/index/10.1098/rspa.1984.0023

    print_r($recordArray) after splitSourceText gives:
    Array (  =>   => Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences (1934-1990)Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences (1934-1990)0080-46300080463819843921802M. V.Berry38198463020064557Q62807W5JK4176U610.1098/rspa.1984.002320080220093117http://rspa.royalsocietypublishing.org/cgi/doi/10.1098/rspa.1984.0023http://journals.royalsociety.org/index/10.1098/rspa.1984.0023 )

    Notice the empty array element, which leads to empty XML later in the for loop.

    I will try to understand splitSourceText later when I have some free time.

    -MZ

     
  • Matthias Steffens

    Hi MZ,

    Your analysis is correct, thanks for the report. It seems that CrossRef recently adjusted their XML output. They've added an XML declaration, plus namespaces & other attributes on the `<doi_records>` and `<doi_record>` tags. Also, the opening tag of the first (root) element now contains a newline character. Together, these changes caused the split pattern to fail.

    To fix this, open file 'includes/import.inc.php' and replace this line (in function 'crossrefToRefbase()'):

        $recordDelimiter = "(\s*<doi_records*>)?\s*(?=<doi_record*>)" // splits before '<doi_record>'

    with this one:

        $recordDelimiter = "(\s*(<\?xml*\?>\s*)?<doi_records*>)?\s*(?=<doi_record*>)" // splits before '<doi_record>'

    Let me know if this doesn't work for you.

    Matthias

     
  • Matthias Steffens

    Sorry for the messed-up code. The Sourceforge Markdown parser (and its preview) still seems to be quite buggy; it certainly does not adhere to the Markdown format… :-(

    The XML entities in the above code must be decoded, of course. I.e. &quot ; (sans the space) should be ", &lt ; should be <, and  &gt ; should be >.

    Matthias
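
    For readers who cannot recover the pattern from the mangled post, here is a minimal sketch of the kind of split described above. It is not the verbatim refbase code: the `[^<>]*` character classes (standing in for "any attributes inside the tag"), the closing-tag alternative, the sample XML, and the direct call to `preg_split()` are all assumptions made for illustration.

    ```php
    <?php
    // Hypothetical stand-in for a CrossRef response: an XML declaration,
    // a root tag with attributes and an embedded newline, then two records.
    $sourceText = implode("\n", [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<doi_records xmlns="http://example.org/ns"',   // root tag continues on the next line
        '    version="1.0">',
        '<doi_record owner="a"><title>First</title></doi_record>',
        '<doi_record owner="b"><title>Second</title></doi_record>',
        '</doi_records>',
    ]);

    // Assumed split after '</doi_record>', also swallowing the closing root tag.
    $afterRecord = '|(?<=<\/doi_record>)\s*(<\/doi_records>\s*)?';

    // Assumed "old style" pattern: it does not know about the XML declaration.
    $oldDelimiter = '(\s*<doi_records[^<>]*>)?\s*(?=<doi_record[^<>]*>)' . $afterRecord;

    // Assumed "new style" pattern: an optional XML declaration is swallowed too.
    $newDelimiter = '(\s*(<\?xml[^<>]*\?>\s*)?<doi_records[^<>]*>)?\s*(?=<doi_record[^<>]*>)' . $afterRecord;

    // PREG_SPLIT_NO_EMPTY drops the empty chunks produced by zero-length matches.
    $oldRecords = preg_split('/' . $oldDelimiter . '/', $sourceText, -1, PREG_SPLIT_NO_EMPTY);
    $records    = preg_split('/' . $newDelimiter . '/', $sourceText, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($records as $i => $record) {
        echo $i, ': ', $record, "\n";
    }
    ```

    With the old-style pattern, the XML declaration survives as a chunk of its own, and a chunk like that is not a parseable record. With the new-style pattern, every chunk should begin with `<doi_record`.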

     
  • Matt - Zee

    Matt - Zee - 2009-09-18

    Yesterday I had a dumb solution that shifted the first element out of the array; later I managed to learn some basic Perl regexp and came up with the same match pattern.

    Thanks.

     
  • Martin Fluegge

    Martin Fluegge - 2009-09-28

    Hi guys,

    after applying the patch, the problem is fixed for the DOI mathfield used (10.1098/rspa.1984.0023). But using the following DOI (10.1007/978-3-540-68636-1) still results in the above-mentioned error.

    The only difference I could figure out is that one is a journal and the other a book. Could this be the problem?
    Cheers,

    Martin

     
  • Richard Karnesky

    Yes; the resource type issue is probably why that does not import.  The DOI importer was made before the unixref-1.1 schema was released & much of the syntax found in that file is not offered by unixref-1.0 (and we weren't using a schema when we wrote the import routine, but only examples).

    We will have to improve the importer to follow the newer schema in the future.
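
    Until then, one way to avoid the fatal error is to check what kind of record came back before reading any tags from it. The sketch below is only an illustration and is not part of refbase: the function name `detectResourceType()` is hypothetical, it uses PHP's SimpleXML rather than the XML classes refbase uses, and it assumes that each `<doi_record>` wraps a `<crossref>` element whose first child (`<journal>`, `<book>`, ...) names the resource type.

    ```php
    <?php
    /**
     * Hypothetical helper: returns the name of the first element inside
     * <crossref> (e.g. 'journal' or 'book'), or null if the record cannot
     * be parsed or has no such element.
     */
    function detectResourceType(string $recordXml): ?string
    {
        // Collect parse errors internally instead of emitting warnings.
        $previous = libxml_use_internal_errors(true);
        $record = simplexml_load_string($recordXml);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        if ($record === false || !isset($record->crossref)) {
            return null; // not XML, or not the assumed record layout
        }

        foreach ($record->crossref->children() as $child) {
            return $child->getName(); // the first child decides the type
        }

        return null; // <crossref> was empty
    }

    // Sample record, invented for illustration.
    $type = detectResourceType('<doi_record><crossref><book/></crossref></doi_record>');
    if ($type !== 'journal') {
        echo 'Skipping unsupported resource type: ', var_export($type, true), "\n";
    }
    ```

    An importer could call such a check per record and skip or report unsupported types, instead of calling a method on a non-object.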

     
