Import Problem Bibtex/Endnote/Mods XML

Help
Jan Lorenz
2009-12-09
2013-05-28
  • Jan Lorenz
    2009-12-09

    I have a problem importing BibTeX, Endnote and MODS XML, both from file and from snippet. I checked it with the corresponding example snippets from the web reference and with my own BibTeX files.

    For all formats I get the message:

    There were validation errors regarding the data you entered:
    Record 1: Unrecognized data format! Required field missing: TY
    Skip records with unrecognized data format

    When I check the checkbox (and reassign the path to the file when importing a file) it works, but it omits the very first item in the file/snippet. The rest is imported mostly correctly.

    What is wrong?
    Export works.

    Another problem is when I import the example bibtex snippet from the web reference. The item Rosler1990 is imported with everything in the author field like:
    Author: R"osler, J.; Arzt, E. title =. A.N.M.-B.C.E. for D.S.M. journal =. A.M. et M. year =. 1990 volume =. 38 number =. 4. pages =. 671–683
    (except for the Key Rosler1990 which is imported correctly)

     
  • Jan Lorenz
    2009-12-14

    Hi Matthias,

    thank you for your reply.
    It worked with RIS. The first entry is included then. I installed bibutils locally on my computer for this.

    Now my problem is that, due to some encoding trouble, the special characters are wrong after the bibutils conversions
    bib2xml xml2ris

    They were amazingly correct before (but in exchange there was the first-entry-missing-trouble). But perhaps I need to play around with bibutils.
    Any hints?

    Now, I think that my initial problem might be related to my MySQL database having
    MySQL charset:  UTF-8 Unicode (utf8)
    (I cannot change this; it is on a shared host.)
    Could that be?

    Best regards,
    Jan

     
  • Jan Lorenz
    2009-12-15

    News:

    With

    bib2xml my.bib
    xml2ris -o utf8 my.xml

    I managed to produce a correct RIS file in utf8 from my utf8 bib file.

    But then the initial error this thread is about reappears. The first entry's type is not recognized and an error is reported. If this entry is skipped, the rest is fine.

    It seems to be a problem of refbase to handle utf8, right?

    Best,

    Jan

     
  • Hi Jan,

    thanks for the additional info.

    > It worked with RIS. The first entry is included then.

    Ok, this hints at a problem with the Bibutils integration then, since Bibutils is required for import of e.g. BibTeX, Endnote & MODS but not for RIS.

    > I installed bibutils locally on my computer for this.

    Is your refbase installation also on your local machine?

    If it's on a server instead, you'll need to install Bibutils on that same server in order to be able to import formats that require Bibutils.

    Also, which version of Bibutils are you using?

    If you're using Bibutils 4.x and you'd like to install Bibutils on the refbase-0.9.5 server (see http://www.refbase.net/index.php/Bibutils#Installing_Bibutils_for_use_with_refbase), you should try Bibutils v3.4 instead. This version is available at http://bibutils.refbase.org/.

    What character encoding did you use when installing refbase? latin1 or utf8? And what's the value of variable '$contentTypeCharset' in file 'initialize/ini.inc.php'?

    If you are using Bibutils 4.x locally to convert your BibTeX file to RIS, the following terminal commands should work for a UTF-8 based workflow (i.e. if your input files as well as your refbase installation are UTF-8):

        bib2xml -i utf8 literature.bib > literature.xml
        xml2ris -o utf8 literature.xml > literature.ris

    Thanks for sending me your bib files privately. I see that your BibTeX file contains a JabRef header comment. Please remove the first three lines from the file:

        % This file was created with JabRef 2.3.1.
        % Encoding: UTF8
       

    Does the BibTeX file now import correctly, i.e. including the first entry?

    > It seems to be a problem of refbase to handle utf8, right?

    No, not necessarily. refbase is able to handle UTF-8, but the whole system (server, refbase, Bibutils) must have been set up correctly to allow for a correct UTF-8 workflow. I can give you more info if necessary.

    HTH, Matthias

     
  • Jan Lorenz
    2009-12-15

    Hi,

    my refbase installation is on a shared host (1und1) which has a MySQL charset UTF8 database. I think I use bibutils 4 there, but I had to ask the guys at 1und1 to compile it again, because the binaries did not work when I copied them over. So I'd rather not bother them too often. Locally I also have bibutils 4 because that's what I get with apt-get in Kubuntu.

    I know that the JabRef lines are not the problem, because the error persists when they are removed. Also, the RIS file that fails when it is in utf8 has no such lines at the beginning. RIS only works when the file is not in utf8, but then I get special character errors. I could try bibutils 3.4 locally, but I am not sure it is worth the try.
    How do I downgrade quickly in Kubuntu with apt-get?

    Can you import the ris-file (in utf8) I sent you?

    I think I did a re-install with utf8 when I realized that my database is utf8.
    refbase is initialized with

        $contentTypeCharset = "UTF-8";

    Furthermore,

        bib2xml -i utf8 literature.bib > literature.xml
        xml2ris -o utf8 literature.xml > literature.ris

    did not improve on

        bib2xml literature.bib > literature.xml
        xml2ris -o utf8 literature.xml > literature.ris

    Thanks for your efforts!

    Best,

    Jan

     
  • Hi Jan,

    > my refbase installation is on a shared host (1und1) which has a
    > MySQL charset UTF8 database. I think I use bibutils 4 there

    ATM, refbase-0.9.5 only works reliably with Bibutils v3.4 (http://bibutils.refbase.org/). Bibutils 4.x introduced changes that are incompatible with refbase-0.9.5 but which should get fixed in the next refbase update.

    Of course, you're fine to convert your BibTeX data locally to RIS (using Bibutils 4.x), then import that RIS file instead.

    > I know that the jabref lines are not the problem,
    > because the error repeats when removed.

    IIRC, the JabRef header comments have caused problems previously, but maybe Bibutils now accounts for them…

    > Can you import the ris-file (in utf8) I sent you?

    I first only tried with my local refbase installation (which is latin1-based). There, I can successfully import the RIS file you've sent me. I also tried to import your RIS file at http://refbase.textdriven.com/beta/ (which is very similar to the stock refbase-0.9.5 release).

    Trying to import your RIS file into the utf8-based refbase database, I can replicate your issue, i.e. I also get this warning:

        Record 1: Unrecognized data format! Required field missing: TY

    The problem seems to be the BOM character (byte order mark) at the beginning of the UTF-8 file which seems to confuse either PHP or refbase (I dunno).
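
    To illustrate why a leading BOM could produce the "Required field missing: TY" message, here is a minimal Python sketch. It is not refbase's actual code: the function `first_record_has_type_tag` and the naive "starts with TY" check are assumptions made only for this example.

```python
import codecs

# RIS records begin with a type tag: "TY", two spaces, hyphen, space.
RIS_TYPE_TAG = "TY  - "


def first_record_has_type_tag(data: bytes) -> bool:
    """Hypothetical naive check: does the decoded text start with the TY tag?

    Decoding with plain "utf-8" keeps a leading BOM as the character U+FEFF,
    so the text no longer starts with "TY" even though it looks fine in an editor.
    """
    text = data.decode("utf-8")
    return text.startswith(RIS_TYPE_TAG)


ris = "TY  - JOUR\nAU  - Rösler, J.\nER  - \n".encode("utf-8")
with_bom = codecs.BOM_UTF8 + ris  # same content, preceded by the bytes EF BB BF

print(first_record_has_type_tag(ris))       # file saved without a BOM
print(first_record_has_type_tag(with_bom))  # file saved with a BOM
```

    Under this assumption only the first record is affected, since the BOM sits in front of the first "TY" line and nowhere else, which would match the symptom that skipping record 1 lets the rest import.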

    If I open your RIS file in a text editor and re-save it as "Unicode (UTF-8, no BOM)" (i.e. without a byte order mark), then I can import your RIS file just fine, and non-ASCII characters seem to get imported correctly.
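
    If no suitable text editor is at hand, the same re-save can be scripted. The following is only a sketch in Python; the function name `strip_utf8_bom` and the file names in the usage note are made up for illustration.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(src: str, dst: str) -> bool:
    """Copy src to dst without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False if the file had none.
    The rest of the file is copied byte for byte, so line endings and
    non-ASCII characters are left untouched.
    """
    data = Path(src).read_bytes()
    had_bom = data.startswith(codecs.BOM_UTF8)
    if had_bom:
        data = data[len(codecs.BOM_UTF8):]
    Path(dst).write_bytes(data)
    return had_bom
```

    For example, `strip_utf8_bom("literature.ris", "literature-nobom.ris")` would write a copy that can then be tried in the refbase import form.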

    Also note that due to these issues with file encoding (and sometimes also line ending issues), it is often easier to open your file in a text editor (making sure that the file gets displayed correctly), and copy & paste everything into the refbase import text entry field, then hit the "Import" button.

    So, using Bibutils v4.x, please try to convert your BibTeX file to RIS (UTF-8, no BOM):

        bib2xml -i utf8 -un literature.bib | xml2ris -o utf8 -nb > literature.ris

    I have tested this locally using Bibutils v4.3. The version history at http://www.scripps.edu/~cdputnam/software/bibutils/v4hist.html (Bibutils home page: http://www.scripps.edu/~cdputnam/software/bibutils/bibutils.html) lists changes related to the handling of byte order marks in UTF8-encoded files. So you might also want to try the versions listed there.

    HTH, Matthias

     
  • I realize I'm bumping a very old thread, but I just checked in code that will remove BOMs from uploaded files, if present.  RefWorks & some other providers include BOM by default.