[Refdb-devel] [ refdb-Bugs-2644739 ] refdbd yields invalid XML as RISX
From: SourceForge.net <no...@so...> - 2009-02-28 19:57:44
Bugs item #2644739, was opened at 2009-02-27 10:35
Message generated for change (Comment added) made by mhoenicka
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2644739&group_id=26091

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request, not
just the latest update.

Category: refdbd
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Torsten Bronger (bronger)
Assigned to: Markus Hoenicka (mhoenicka)
Summary: refdbd yields invalid XML as RISX

Initial Comment:
Using "getref" with refdb using the RISX format, the output is not
valid XML because the <entry> end tags are missing the closing angle
bracket:

...
    <libinfo user="chantal">
      <notes>accepted for publication</notes>
      <reprint status="NOTINFILE"/>
    </libinfo>
  </entry
</ris>

----------------------------------------------------------------------

>Comment By: Markus Hoenicka (mhoenicka)
Date: 2009-02-28 20:57

Message:
I've committed the fix in refdbdgetref.c and a similar one in
refdbdnote.c, available in revision 660. Please give it a try if time
permits.

----------------------------------------------------------------------

Comment By: Markus Hoenicka (mhoenicka)
Date: 2009-02-28 20:11

Message:
I believe the "utf8" in your risx output is a consequence of a problem
we discussed a few days ago. PostgreSQL changed the encoding name from
UNICODE to UTF8. Older versions of the pgsql libdbi driver do not
recognize this encoding name and pass it on unaltered instead of
translating it to the IANA name, hoping that the application using
libdbi can make sense of it. Revision 1.62 of the pgsql driver takes
care of this, so you'd have to upgrade libdbi-drivers to get rid of
the problem.
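
The encoding-name translation described in the comment above can be
illustrated with a small sketch. This is not the libdbi-drivers code:
the tables OLD_PG_TO_IANA and NEW_PG_TO_IANA and the function
to_iana_name are hypothetical names, and the tables contain only the
two encoding names mentioned in this thread.

```python
# Hypothetical translation tables. The real pgsql driver keeps its own
# list of encoding names, which is not reproduced here.
OLD_PG_TO_IANA = {"UNICODE": "UTF-8"}                   # driver before the fix
NEW_PG_TO_IANA = {"UNICODE": "UTF-8", "UTF8": "UTF-8"}  # driver after the fix


def to_iana_name(pg_name, table):
    """Translate a PostgreSQL encoding name to its IANA name.

    Names missing from the table are passed through unaltered, which
    mirrors the fallback behaviour described in the comment above.
    """
    return table.get(pg_name, pg_name)


# With the old table the PostgreSQL name leaks through unchanged;
# with the new table it is translated to the IANA name.
print(to_iana_name("UTF8", OLD_PG_TO_IANA))
print(to_iana_name("UTF8", NEW_PG_TO_IANA))
```

With the old table, an application that copies the returned name into
an XML declaration would end up with an encoding name that is not the
IANA one, which matches the symptom reported in this thread.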

----------------------------------------------------------------------

Comment By: Torsten Bronger (bronger)
Date: 2009-02-28 15:10

Message:
I must correct myself: I didn't see that refdbc passes utf8. I simply
assumed that it does so (possibly implicitly) because it's what comes
back from the server, and because my client worked with utf8, too.

----------------------------------------------------------------------

Comment By: Markus Hoenicka (mhoenicka)
Date: 2009-02-28 15:05

Message:
I've also spent some time analyzing this problem. It turns out that
the last character (the > in risx output and a CR in ris output) is
only missing if refdbd has to do a character encoding transformation.
This is why I didn't see this problem in my regular setup. However,
requesting the data in any encoding other than the database encoding
reproduces the problem. This will be fixed in svn shortly.

As to your utf-8 vs. utf8 experiments: refdbc should not pass this
string unless you tell it to. RefDB (or rather libdbi) expects
encoding names to be IANA names, so utf8 is simply invalid. utf-8
should work ok with refdbc with the svn revisions that I'll check in
shortly. I'd appreciate it if you could verify the fix with your
setup.

----------------------------------------------------------------------

Comment By: Torsten Bronger (bronger)
Date: 2009-02-28 14:38

Message:
Supplement: Please don't consider this a workaround, because utf8 is
not a valid encoding name in XML. Even worse, the parser that I use
rejects it.

----------------------------------------------------------------------

Comment By: Torsten Bronger (bronger)
Date: 2009-02-28 14:26

Message:
I sniffed the socket communication with Wireshark and compared my
client with refdbc. The difference is that I pass "-E utf-8" and
refdbc passes "-E utf8". This is then embedded into the XML
declaration of the response, so my response becomes one byte too long.

----------------------------------------------------------------------

Comment By: Markus Hoenicka (mhoenicka)
Date: 2009-02-27 11:06

Message:
I've found no trace of this problem in the svn logs of backend-risx.c.
I cannot reproduce it with refdbc, as you've noticed yourself. I'll
test this with the Perl module tonight. However, as refdbc does not
add anything to the data sent by refdbd, I'd rather suspect a bug in
your implementation of the protocol, or a bug in the documentation of
the protocol. It looks like a typical off-by-one error. If nothing
else helps, I might try to read your code (although I'm not familiar
with the language you're using) and see if something jumps out at me.

----------------------------------------------------------------------

Comment By: Torsten Bronger (bronger)
Date: 2009-02-27 10:43

Message:
Clarification: It happens if you connect through a socket to refdbd,
but not with refdbc. Each entry is sent as one socket chunk; however,
the four NUL bytes at the end come one character too early:

"...</libinfo>\n </entry\x00\x00\x00\x00"

----------------------------------------------------------------------
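
The truncation shown in the clarification above can be detected on the
client side with a check along these lines. This is only a sketch
under the assumption that every chunk ends with exactly four NUL
bytes, as in the quoted example; it does not implement the refdbd
protocol, and the names TERMINATOR, payload_of and
ends_with_complete_tag are hypothetical.

```python
TERMINATOR = b"\x00" * 4  # four NUL bytes, as in the example above


def payload_of(chunk: bytes) -> bytes:
    """Return the chunk without its four-NUL terminator."""
    if not chunk.endswith(TERMINATOR):
        raise ValueError("chunk is not terminated by four NUL bytes")
    return chunk[:-len(TERMINATOR)]


def ends_with_complete_tag(payload: bytes) -> bool:
    """Return True if the payload, ignoring trailing whitespace, ends in '>'."""
    return payload.rstrip().endswith(b">")


# The chunk quoted in the report, and the same chunk with the missing
# closing angle bracket restored.
broken = b"...</libinfo>\n </entry\x00\x00\x00\x00"
fixed = b"...</libinfo>\n </entry>\x00\x00\x00\x00"

print(ends_with_complete_tag(payload_of(broken)))
print(ends_with_complete_tag(payload_of(fixed)))
```

A check like this only flags the symptom for RISX output; it says
nothing about where in refdbd the last character is lost.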