[Refdb-users] Re: Is refdb international?
From: Markus H. <hoe...@co...> - 2002-05-27 21:29:24
Janusz S. Bień writes:

> I hope so. On the other hand, the problem of different endianness
> occurs only if you use UTF-16, while in practice the UTF-8 form seems
> more popular and is already supported `out of the box' by GNU Emacs.

You're right, I had multibyte character sets in mind, not Unicode per se.

> Are you aware that the most official (ISO 2709) and popular MARC
> format for bibliographic records also has an SGML/XML form? It's
> somewhere on the Library of Congress pages. When EndNote downloads a
> record from a real library catalog, it probably gets it in MARC format
> and only later converts it and stores it in the RIS file. I had no time

My background is Biochemistry, and the bibliographic data I use mostly come from Pubmed. There are only three formats available: XML, ASN.1, and Medline, the latter being almost identical to RIS. Medline is the official download format for importing Pubmed data into reference manager programs. The XML DTD is pretty specific to the Pubmed database, so I currently do not plan to support it as an input format. MARC is not supported at all, so what counts as "most popular" apparently depends on your field of expertise.

> yet to check what is the output of free Z39.50 clients such as YAZ
> (which will hopefully be integrated with refdb some time in the future
> :-)), but I suspect it must be quite close to MARC.

Well, at least I took the first steps towards this goal a while ago, but I haven't started to code yet. I just browsed the YAZ documentation again but couldn't find any mention of MARC. My understanding of both YAZ and MARC is quite poor, so I can't tell whether this is a problem.

> Now the question is: do you have an idea how difficult it would be to
> provide support for the MARC format, either as an advanced option or
> as the primary format? Perhaps it is just a matter of switching to a
> different DTD in the right place?

www.loc.gov was down when I tried, so I don't yet have all the information I'd like about MARC. Without documentation, the MARC XML DTD looks like complete nonsense (I'm sure this impression will change as soon as I have access to the docs). In general, XML would be the best way to feed MARC datasets to RefDB. There are two different issues when adding new input formats to RefDB:

- teaching refdbd to digest a new reference format is fairly easy. The current implementation simply sends the incoming chunk of data to a RIS parser. Adding a switch to send it to an XML parser instead would be trivial (see the toy sketch after this list). The hard part would then be to add the handlers for a specific XML DTD.

- teaching RefDB databases to hold full MARC datasets could turn out to be a complex task, depending on the complexity of the MARC DTD. I'd prefer a way to use a common database format for both RIS and MARC data.
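To make it concrete what I mean by "adding a switch", here is a toy sketch in C. All the names are invented -- this is not the actual refdbd code, just the shape of the dispatch I have in mind:

  #include <stdio.h>

  enum ref_format { REF_RIS, REF_XML };

  /* stand-ins for the real parsers; they just acknowledge the data */
  static int parse_ris(const char *data)
  {
      printf("RIS parser received: %s\n", data);
      return 0;
  }

  static int parse_xml(const char *data)
  {
      printf("XML parser received: %s\n", data);
      return 0;
  }

  /* the switch: route the incoming chunk by its declared format */
  static int dispatch(const char *data, enum ref_format fmt)
  {
      switch (fmt) {
      case REF_RIS:
          return parse_ris(data);
      case REF_XML:
          return parse_xml(data);
      default:
          fprintf(stderr, "unknown input format\n");
          return -1;
      }
  }

  int main(void)
  {
      dispatch("TY  - JOUR", REF_RIS);
      dispatch("<record>...</record>", REF_XML);
      return 0;
  }

All the real work would hide behind parse_xml(), i.e. in the handlers for the specific DTD.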
Another issue is which MARC standard should be supported. USMARC sounds fairly straightforward since it uses ASCII, but that wouldn't solve your problem. UNIMARC, however, allows the ISO character set to be specified in each dataset.

> I had a very, very quick look at the EndNote demo and I am intrigued
> by the lack of any mention of character code issues. This is strange,
> because bibliographic references are inherently multilingual (the
> libraries, before switching to UNICODE, seemed to use some other
> multibyte encodings internally). Looks like there is a need for some
> experiments consisting of downloading bibliographic references in
> different languages and checking the character code in the resulting
> RIS file.

I'd be thrilled to read the results of your experiments. I don't have access to a Windows box currently, so I can't run these tests myself.

regards,
Markus

--
Markus Hoenicka <hoe...@co...>
http://ourworld.compuserve.com/homepages/hoenicka_markus/
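P.S. For anyone on the list who hasn't seen the two tagged formats side by side, here is roughly what one and the same reference looks like in RIS and in Medline format. The field values are invented and this is not a complete field mapping; it is only meant to show how close the two formats are:

  RIS:

    TY  - JOUR
    AU  - Doe,J.
    TI  - An invented title
    JO  - J Invent Res
    PY  - 1999
    SP  - 1
    EP  - 10
    ER  -

  Medline:

    PMID- 99999999
    AU  - Doe J
    TI  - An invented title
    TA  - J Invent Res
    DP  - 1999
    PG  - 1-10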