[Refdb-users] Re: Is refdb international?
From: Markus H. <hoe...@co...> - 2002-05-27 21:29:24
Janusz S. Bień writes:

> I hope so. On the other hand, the problem of different endianness
> occurs only if you use UTF-16, while in practice the UTF-8 form seems
> more popular and is already supported `out of the box' by GNU Emacs.

You're right, I had multibyte character sets in mind, not Unicode per se.

> Are you aware that the most official (ISO 2709) and popular MARC
> format for bibliographic records also has an SGML/XML form? It's
> somewhere on the Library of Congress pages. When EndNote downloads a
> record from a real library catalog, it probably gets it in MARC format
> and only later converts it and stores it in the RIS file. I had no time

My background is Biochemistry, and the bibliographic data I use mostly come from Pubmed. There are only three formats available: XML, ASN.1, and Medline, the latter being almost identical to RIS. Medline is the official download format for importing Pubmed data into reference manager programs. The XML DTD is pretty specific to the Pubmed database, so I currently do not plan to support it as an input format. MARC is not supported at all, so what counts as "most popular" apparently depends on your field of expertise.

> yet to check what is the output of free Z39.50 clients such as YAZ
> (which will hopefully be integrated with refdb some time in the future
> :-)), but I suspect it must be quite close to MARC.

Well, at least I took the first steps towards this goal a while ago, but I haven't started to code yet. I just browsed the YAZ documentation again but couldn't find any mention of MARC. My understanding of both YAZ and MARC is quite poor, so I can't tell whether this is a problem.

> Now the question is: do you have an idea how difficult it would be to
> provide support for the MARC format, either as an advanced option or
> as the primary format? Perhaps it is just a matter of switching to a
> different DTD in the right place?

www.loc.gov was down when I tried, so I don't yet have all the information I'd like about MARC. Without documentation, the MARC XML DTD looks like complete nonsense (I'm sure this impression will change as soon as I have access to the docs). In general, XML would be the best way to feed MARC datasets to RefDB. There are two different issues when adding new input formats to RefDB:

- teaching refdbd to digest a new reference format is fairly easy. The current implementation simply sends the incoming chunk of data to a RIS parser. Adding a switch to send it to an XML parser instead would be trivial (see the toy sketch after this list). The hard part would then be to add the handlers for a specific XML DTD.

- teaching RefDB databases to hold full MARC datasets could turn out to be a complex task, depending on the complexity of the MARC DTD. I'd prefer a way to use a common database format for both RIS and MARC data.
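To make it concrete what I mean by "adding a switch", here is a toy sketch in C. All the names are invented -- this is not the actual refdbd code, just the shape of the dispatch I have in mind:

  #include <stdio.h>

  enum ref_format { REF_RIS, REF_XML };

  /* stand-ins for the real parsers; they just acknowledge the data */
  static int parse_ris(const char *data)
  {
      printf("RIS parser received: %s\n", data);
      return 0;
  }

  static int parse_xml(const char *data)
  {
      printf("XML parser received: %s\n", data);
      return 0;
  }

  /* the switch: route the incoming chunk by its declared format */
  static int dispatch(const char *data, enum ref_format fmt)
  {
      switch (fmt) {
      case REF_RIS:
          return parse_ris(data);
      case REF_XML:
          return parse_xml(data);
      default:
          fprintf(stderr, "unknown input format\n");
          return -1;
      }
  }

  int main(void)
  {
      dispatch("TY  - JOUR", REF_RIS);
      dispatch("<record>...</record>", REF_XML);
      return 0;
  }

All the real work would hide behind parse_xml(), i.e. in the handlers for the specific DTD.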
Another issue is which MARC standard should be supported. USMARC sounds fairly straightforward since it uses ASCII, but that wouldn't solve your problem. UNIMARC, however, allows the ISO character set to be specified in each dataset.

> I had a very, very quick look at the EndNote demo and I am intrigued
> by the lack of any mention of character code issues. This is strange,
> because bibliographic references are inherently multilingual (the
> libraries, before switching to UNICODE, seemed to use some other
> multibyte encodings internally). Looks like there is a need for some
> experiments consisting of downloading bibliographic references in
> different languages and checking the character code in the resulting
> RIS file.

I'd be thrilled to read the results of your experiments. I don't have access to a Windows box currently, so I can't run these tests myself.

regards,
Markus

--
Markus Hoenicka <hoe...@co...>
http://ourworld.compuserve.com/homepages/hoenicka_markus/
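P.S. For anyone on the list who hasn't seen the two tagged formats side by side, here is roughly what one and the same reference looks like in RIS and in Medline format. The field values are invented and this is not a complete field mapping; it is only meant to show how close the two formats are:

  RIS:

    TY  - JOUR
    AU  - Doe,J.
    TI  - An invented title
    JO  - J Invent Res
    PY  - 1999
    SP  - 1
    EP  - 10
    ER  -

  Medline:

    PMID- 99999999
    AU  - Doe J
    TI  - An invented title
    TA  - J Invent Res
    DP  - 1999
    PG  - 1-10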