Thread: [Refdb-users] new bibliographic data schema proposal

[Refdb-users] new bibliographic data schema proposal

From: Markus H. <mar...@mh...> - 2006-04-22 22:52:56

Hi,

I've been working on a successor for the risx.dtd for quite a while,
and I'd like to get some thoughts from interested fellows on the
results. The new RelaxNG schema (or rather an improved version
thereof) may or may not become the underlying data format of future
RefDB releases. The schema, a description, and some example entries
are available here:

http://refdb.sourceforge.net/rbib.html

Some of the reasoning behind this schema can be found in a blog entry
right here:

http://mhoenicka.de/system-cgi/blog/index.php?itemid=567

I apologize that there is no element reference or at least thoroughly
commented example data yet. I'd appreciate it if you could give the
schema a try. Let me know how you get along with encoding the data
that you usually work with and feel free to send your example data for
a discussion. For best results use a validating XML editor for data
entry.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-users] new bibliographic data schema proposal

From: Bruce D'A. <bda...@gm...> - 2006-04-23 01:09:41

On 4/22/06, Markus Hoenicka <mar...@mh...> wrote:

> I've been working on a successor for the risx.dtd for quite a while,
> and I'd like to get some thoughts from interested fellows on the
> results.

Markus, I posted something on my blog about it (had been working on
the Atom stuff anyway):

<http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2006/04/22/rbib-comments>

I suppose in a nutshell my argument is that the three-level structure
and flat typing is still not very, um, relational. Also, I think there
are some recent developments around syndication technologies in
particular, as well as RDF, that make it pretty important to fit new
formats into those methods of modelling data, identifying things
(increasingly using uris) and so forth.

That said, if all you're interested in is improving RISX, then it
seems you're on the right track. But even so, I think you can bring it
closer in line with my suggestions.

Bruce

Re: [Refdb-users] new bibliographic data schema proposal

From: <sh...@we...> - 2006-04-23 07:31:43

I agree with Bruce. If this is mostly about improving the "internal" 
data structure, the most important issue (at least in my eyes) is the 
integration of legacy formats, since these are the formats users 
actually work with (DocBook, BibTeX, EndNote and so on).

We should be thinking one step ahead, take into account upcoming 
technologies, and define some kind of "standards-compliant exchange 
format". This is just a buzzword, but integrating the reference data in 
an RSS format sounds great to me, since this can help to interconnect 
different reference databases more easily. If we think of the web as 
a possible interface, it's a great idea to be able to fetch entries from 
different sites and integrate them into your publications.

That's just my opinion,

Sebastian


Re: [Refdb-users] new bibliographic data schema proposal

From: Markus H. <mar...@mh...> - 2006-04-23 21:03:15

Bruce D'Arcus writes:
 > Markus, I posted something on my blog about it (had been working on
 > the Atom stuff anyway):
 > 
 > <http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2006/04/22/rbib-comments>
 > 

Thanks for these comments.

 > I suppose in a nutshell my argument is that the three-level structure
 > and flat typing is still not very, um, relational.

I can't follow you here. In risx, only periodicals were treated in a
more or less relational fashion. rbib treats the relationships of all
levels in a relational fashion, regardless of the reference type. You
can even define relations between datasets, regardless of the level of
the dataset. Is it that the relations are hard-linked only in the
database, not in the XML file?

 > Also, I think there
 > are some recent developments around syndication technologies in
 > particular, as well as RDF, that make it pretty important to fit new
 > formats into those methods of modelling data, identifying things
 > (increasingly using uris) and so forth.
 > 

Two things:

- isn't the data format definition (the RelaxNG schema in this case)
independent of the modes of distribution (e.g. an Atom feed)? I don't
know much about Atom, but it seems to me that you can embed just about
anything as a payload (at least with namespaces).

- if you work with URIs to address relational parts of a reference,
who defines/publishes these items? Do you expect them to be
universally available on the web? Do you expect the bibliographic
dataset a URI points to to be unique on the web? Why would you want
to store the stuff in a database then? What do you do when you're
offline? Seems like I don't really understand how this is supposed to
work.

 > That said, if all you're interested in is improving RISX, then it
 > seems you're on the right track. But even so, I think you can bring it
 > closer in line with my suggestions.
 > 

I wouldn't mind, but I guess I'll have to learn quite a bit.

regards,
Markus

Re: [Refdb-users] new bibliographic data schema proposal

From: Bruce D'A. <bda...@gm...> - 2006-04-23 22:15:52

On 4/23/06, Markus Hoenicka <mar...@mh...> wrote:

>  > I suppose in a nutshell my argument is that the three-level structure
>  > and flat typing is still not very, um, relational.
>
> I can't follow you here. In risx, only periodicals were treated in a
> more or less relational fashion. rbib treats the relationships of all
> levels in a relational fashion, regardless of the reference type. You
> can even define relations between datasets, regardless of the level of
> the dataset. Is it that the relations are hard-linked only in the
> database, not in the XML file?

I guess I just mean that levels are only one kind of relation of
relevance, and that there are other important ones. Likewise, to have
top-level elements like journalarticle is flattening the typing model.

More below ...

>  > Also, I think there
>  > are some recent developments around syndication technologies in
>  > particular, as well as RDF, that make it pretty important to fit new
>  > formats into those methods of modelling data, identifying things
>  > (increasingly using uris) and so forth.
>  >
>
> Two things:
>
> - isn't the data format definition (the RelaxNG schema in this case)
> independent of the modes of distribution (e.g. an Atom feed)? I don't
> know much about Atom, but it seems to me that you can embed just about
> anything as a payload (at least with namespaces).

Correct. It's just that for an atom:entry, there is always a title. As
with most recent formats, the focus is on the object of interest. So
if I write an xpath to grab all titles from an atom feed, it'd be
"//atom:title".

Consider the xpath to grab all primary titles with the level structure
(not using your precise element names, but you get the idea):
"//part/title|//publication/title[not(part)]".

Or say you were modeling this with objects. In the RIS-ish approach, you'd get:

   article_title = ref1.part.title
   book_title = ref2.publication.title

... while with the other approach you'd have:

   article_title = ref1.title
   book_title = ref2.title
   article_journal_title = ref1.container.title

So there's a mismatch between the models that is somewhat awkward. I
think it makes more sense, and it's just as easy to handle
programmatically, to do:

title
isPartOf (or whatever you want to call the level relation)
  title
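To make the XPath contrast concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`refs`, `ref`, `part`, `publication`, `feed`, `entry`, `isPartOf`, `title`) are hypothetical stand-ins, not rbib's or Atom's actual vocabulary. ElementTree only supports a limited XPath subset (no `|` union, no `not()`), so the level-based rule is written out in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical level-based layout: the primary title sits in <part>
# when there is one, otherwise in <publication>.
LEVELED = """
<refs>
  <ref><part><title>An Article</title></part>
       <publication><title>A Journal</title></publication></ref>
  <ref><publication><title>A Book</title></publication></ref>
</refs>
"""

# Hypothetical entity-centred layout: every entry has its own title,
# and the container is a nested, related entity.
ENTITY = """
<feed>
  <entry><title>An Article</title>
         <isPartOf><title>A Journal</title></isPartOf></entry>
  <entry><title>A Book</title></entry>
</feed>
"""

def primary_titles_leveled(xml_text):
    """Rough equivalent of //part/title | publication titles of refs without a part."""
    root = ET.fromstring(xml_text)
    titles = []
    for ref in root.iter("ref"):
        part_title = ref.find("part/title")
        if part_title is not None:
            titles.append(part_title.text)
        else:
            titles.append(ref.find("publication/title").text)
    return titles

def primary_titles_entity(xml_text):
    """One path, no special cases: the direct title of each entry."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall("entry/title")]

print(primary_titles_leveled(LEVELED))  # ['An Article', 'A Book']
print(primary_titles_entity(ENTITY))    # ['An Article', 'A Book']
```

Note that in the entity layout a descendant search such as `//title` would also pick up the container's title, so the sketch uses the direct-child path `entry/title`.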

Certainly in a database, one need not store parts and publications in
separate tables.

> - if you work with URIs to address relational parts of a reference,
> who defines/publishes these items?

Anybody, really. But having a standard id helps to at least solve the
problem of "how do we identify something". That it's hooked into the
architecture of the web is all the more useful.

> Do you expect them to be universally available on the web?

In the long run, yes, and even now we can grab a lot of this data off
the web using standard ids like this. See, for example, this little
python script:

<http://onebiglibrary.net/project/opa/opa-0.2-release-with-json-wrapper>

But even if not available now, you can still always include that
information. For example, with RDF, if I have a link that says:

<dcterms:isPartOf rdf:resource="urn:isbn:34354466"/>

I can always add that book too:

<Book rdf:about="urn:isbn:34354466">
  ...
</Book>
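The "link first, describe later" idea can be sketched without any RDF library by treating statements as plain `(subject, predicate, object)` tuples. This is a toy illustration, not rdflib or any real triple store: the predicate strings are abbreviated, the article URN and the book title "Some Book" are hypothetical, and only the ISBN URN is taken from the message.

```python
# Each statement is a (subject, predicate, object) triple; URIs are plain strings.
graph = set()

article = "urn:example:article1"  # hypothetical identifier for the citing item
book = "urn:isbn:34354466"        # URN from the message above

# At first, only the link is known.
graph.add((article, "dcterms:isPartOf", book))

# Later, anyone can describe the same book by reusing its URI.
graph.add((book, "rdf:type", "Book"))
graph.add((book, "dc:title", "Some Book"))  # hypothetical title

def describe(graph, subject):
    """Return all (predicate, object) pairs recorded for a subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)

container = next(o for s, p, o in graph
                 if s == article and p == "dcterms:isPartOf")
print(describe(graph, container))
# [('dc:title', 'Some Book'), ('rdf:type', 'Book')]
```

Because both statements use the same URI as subject or object, the description added later attaches to the earlier link without any key mapping.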

> Do you expect the bibliographic dataset a URI points to to be unique on the web?

Not necessarily.  Exploiting uris and standard ids really just makes
it easier to associate and merge data. Even for citation purposes this
can be critical.

Consider, for example, if you are collaborating on a document with
someone who uses a different database. How do you know you're
referring to the same items when you cite something with just a plain
key? Answer: you don't. It's why these days my citations look like:

<citation><biblioref xlink:href="urn:isbn:343542566"/></citation>
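A minimal sketch of why a URI in the citation helps collaborators: two databases with different local keys can each resolve the same `xlink:href` value to their own record. The dictionary layout, keys, and titles are hypothetical; only the URN comes from the message.

```python
# Two collaborators' databases with different local citation keys (hypothetical).
db_alice = {"smith2005": {"uri": "urn:isbn:343542566", "title": "A Book"}}
db_bob = {"Smith05a": {"uri": "urn:isbn:343542566", "title": "A Book"}}

def resolve(href, db):
    """Return the local key whose record carries the cited URI, or None."""
    for key, record in db.items():
        if record["uri"] == href:
            return key
    return None

# Value of xlink:href in <biblioref xlink:href="..."/>
href = "urn:isbn:343542566"

# Comparing the local keys says nothing about identity...
print("smith2005" == "Smith05a")  # False
# ...but each database can resolve the shared URI to its own record.
print(resolve(href, db_alice))    # smith2005
print(resolve(href, db_bob))      # Smith05a
```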

There's a pretty good discussion of uris in the context of RDF here:

<http://taubz.for.net/code/semweb/whatisrdf/#Distributed%20Information>

> Why would you want to store the stuff in a database then? What do you do when you're
> offline?

Per above, I'm not saying the data needs to be only available on the
web, just that identifying things in this way opens up a lot of
flexibility, and enhanced interoperability.

>  > That said, if all you're interested in is improving RISX, then it
>  > seems you're on the right track. But even so, I think you can bring it
>  > closer in line with my suggestions.
>
> I wouldn't mind, but I guess I'll have to learn quite a bit.

:-)

Bruce

Re: [Refdb-users] new bibliographic data schema proposal

From: Bruce D'A. <bda...@gm...> - 2006-04-23 21:53:53

On 4/23/06, Bruce D'Arcus <bda...@gm...> wrote:

> In the long run, yes, and even now we can grab a lot of this data off
> the web using standard ids like this. See, for example, this little
> python script:
>
> <http://onebiglibrary.net/project/opa/opa-0.2-release-with-json-wrapper>

Actually this post is better:

<http://onebiglibrary.net/story/opa-release-0.1>

Bruce

Re: [Refdb-users] new bibliographic data schema proposal

From: Markus H. <mar...@mh...> - 2006-04-26 20:05:11

Hi,

Bruce D'Arcus writes:
 > I guess I just mean that levels are only one kind of relation of
 > relevance, and that there are other important ones. Likewise, to have
 > top-level elements like journalarticle is flattening the typing model.
 > 

Again I don't quite understand. journalarticle is a type that
describes an article published in a journal (using the analytical
information). You can link it to a periodical entry describing the
journal (using the monographic info). Alternatively you can provide
the monographic info in the same dataset, leaving it to the reference
database application to split it into two related entries. I wouldn't
call this flat.

 > Correct. It's just that for an atom:entry, there is always a title. As
 > with most recent formats, the focus is on the object of interest. So
 > if I write an xpath to grab all titles from an atom feed, it'd be
 > "//atom:title".
 > 
 > Consider the xpath to grab all primary titles with the level structure
 > (not using your precise element names, but you get the idea):
 > "//part/title|//publication/title[not(part)]".
 > 
 > Or say you were modeling this with objects. In the RIS-ish approach, you'd get:
 > 
 >    article_title = ref1.part.title
 >    book_title = ref2.publication.title
 > 
 > ... while with the other approach you'd have:
 > 
 >    article_title = ref1.title
 >    book_title = ref2.title
 >    article_journal_title = ref1.container.title
 > 
 > So there's a mismatch between the models that is somewhat awkward. I
 > think it makes more sense, and it's just as easy to handle
 > programmatically, to do:
 > 
 > title
 > isPartOf (or whatever you want to call the level relation)
 >   title
 > 

Isn't that the ill-fated approach that RIS took with the TI element
(which can mean anything from an analytical to a monographic title)
and the likewise illogical AU to A1/A2/A3 mappings?

I took great care to make each level of a reference citable all by
itself. That is, a chapter in a book is represented as two entries in
the database: a chapter (analytical) and a book (monographic). The
former must be associated with the latter, whereas the latter may be a
standalone item. Querying for titles of the chapter reference thusly
means:

chapter_title = analytical.title
book_title = monographic.title

A direct query for the very same book results in:

book_title = monographic.title

In your approach the first query will give something like:

chapter_title = ref1.title
book_title = ref1.container.title

and the second:

book_title = ref2.title

That is, you have to run different queries depending on how the book
was initially added to the database. To me, this is just reinventing
the downsides of RIS.
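A minimal Python sketch of the model described here, with hypothetical class names (`Monographic`, `Analytical`) rather than rbib's element names. The level is carried by the record's class, so the book title used when citing a chapter and when citing the book directly comes from the same `Monographic` record. Whether that counts as "the same query" is exactly the point under debate in this thread.

```python
from dataclasses import dataclass

@dataclass
class Monographic:
    """Monographic level: a record that can be cited on its own."""
    title: str

@dataclass
class Analytical:
    """Analytical level: must be associated with a monographic record."""
    title: str
    monographic: Monographic

def monographic_title(entry):
    """Book title of an entry: always read from a Monographic record."""
    record = entry.monographic if isinstance(entry, Analytical) else entry
    return record.title

book = Monographic("A Book")
chapter = Analytical("A Chapter", monographic=book)

print(chapter.title)               # A Chapter
print(monographic_title(chapter))  # A Book
print(monographic_title(book))     # A Book
```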

 > Certainly in a database, one need not store parts and publications in
 > separate tables.
 > 

In order to be as relational as possible, you should store them in
separate tables. Both levels have different storage requirements.
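As a rough illustration of the separate-tables position, here is an in-memory SQLite schema using Python's standard `sqlite3` module. Table and column names are hypothetical and not RefDB's actual schema; `publisher` and `pages` stand in for the level-specific storage requirements.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE monographic (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    publisher TEXT              -- typically monographic-only
);
CREATE TABLE analytical (
    id             INTEGER PRIMARY KEY,
    title          TEXT NOT NULL,
    pages          TEXT,        -- locator, typically analytical-only
    monographic_id INTEGER NOT NULL REFERENCES monographic(id)
);
""")
con.execute("INSERT INTO monographic VALUES (1, 'A Book', 'A Publisher')")
con.execute("INSERT INTO analytical VALUES (1, 'A Chapter', '10-20', 1)")

# A book title is always read from the monographic table.
row = con.execute("""
    SELECT a.title, m.title FROM analytical a
    JOIN monographic m ON m.id = a.monographic_id
""").fetchone()
print(row)  # ('A Chapter', 'A Book')
```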

 > > - if you work with URIs to address relational parts of a reference,
 > > who defines/publishes these items?
 > 
 > Anybody, really. But having a standard id helps to at least solve the
 > problem of "how do we identify something". That it's hooked into the
 > architecture of the web is all the more useful.
 > 

Well, in my area of work the DOI seems to be more suitable to identify
an article. But you do have a point here.

regards,
Markus

Re: [Refdb-users] new bibliographic data schema proposal

From: Bruce D'A. <bda...@gm...> - 2006-04-26 21:10:09

On 4/25/06, Markus Hoenicka <mar...@mh...> wrote:

> Bruce D'Arcus writes:
>  > I guess I just mean that levels are only one kind of relation of
>  > relevance, and that there are other important ones. Likewise, to have
>  > top-level elements like journalarticle is flattening the typing model.
>
> Again I don't quite understand. journalarticle is a type that
> describes an article published in a journal (using the analytical
> information).

But notice the difference even in how you describe it using natural
language. There are articles that are published in periodicals
(newspapers, magazines, journals, newsletters). The related type is
not intrinsic to the primary reference type.

> You can link it to a periodical entry describing the
> journal (using the monographic info). Alternatively you can provide
> the monographic info in the same dataset, leaving it to the reference
> database application to split it into two related entries. I wouldn't
> call this flat.

Just in the sense that you are collapsing the types for two related
items into a single type.

>  > So there's a mismatch between the models that is somewhat awkward. I
>  > think it makes more sense, and its just as easy to handle
>  > programatically, to do:
>  >
>  > title
>  > isPartOf (or whatever you want to call the level relation)
>  >   title
>
> Isn't that the ill-fated approach that RIS took with the TI element
> (which can mean anything from an analytical to a monographic title)
> and the likewise illogical AU to A1/A2/A3 mappings?

No, it's fundamentally different because the RIS approach is
incredibly limited.  You indicate a relation with an opaque integer
whose meaning varies by reference type.

In this approach, you are in fact very precisely modelling exactly how
reference metadata works. If you cite a journal article, you say:

    [article] has title "X" and <is published in> [journal] "Y", etc.

... where the stuff in brackets indicates entities and the <>
indicates a relation.

> I took great care to make each level of a reference citable all by
> itself. That is, a chapter in a book is represented as two entries in
> the database: a chapter (analytical) and a book (monographic).

Good; makes sense.

> The former must be associated with the latter, whereas the latter may be a
> standalone item.

Right.

> Querying for titles of the chapter reference thusly
> means:
>
> chapter_title = analytical.title
> book_title = monographic.title
>
> A direct query for the very same book results in:
>
> book_title = monographic.title
>
> In your approach the first query will give something like:
>
> chapter_title = ref1.title
> book_title = ref1.container.title
>
> and the second:
>
> book_title = ref2.title

Correct.

> That is, you have to run different queries depending on how the book
> was initially added to the database.

Hmm ... am not following you here.

In the database, the book in any case is stored separately from the
chapter and they are linked. That should be true regardless of "how
the book was initially added."

But sure, the query to grab the part vs. monographic title is going to
be different when referring to the chapter than it is when referring
to the book it is in. This is only logical; isn't it?

E.g. you say to someone "look at the chapter titled 'abc' in the book
called 'xyz'." You don't say "look at analytical title 'abc' that has
a monographic title 'xyz'."

In Rails code, you might do:

    recommendation = Chapter.find_by_title("abc")
    buy_this_book =3D recommendation.publication
    puts buy_this_book.title

If I just want all books (including that one):

    Book.find_all

> To me, this is just reinventing the downsides of RIS.

Not at all. In the approach I am advocating, you can reliably
represent everything in RIS, and MUCH more. It is basically the MODS
approach, but with a much tighter modelling. Each entity gets a type,
the types have some sort of hierarchy (a journal is a subclass of
periodical, for example, which is a subclass of Collection) so that
stuff doesn't fall through the cracks, and the linking is clear.
Likewise, the locator information in analyticals always is associated
with the main level.
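The point about a type hierarchy keeping things from "falling through the cracks" can be sketched with Python classes. Only `Journal`, `Periodical`, and `Collection` come from the message; `Item`, `Newsletter`, the rule names, and the dispatch function are hypothetical, and the actual schema being discussed is RDF rather than Python.

```python
class Item:
    """Root of a hypothetical type hierarchy."""

class Collection(Item):
    pass

class Periodical(Collection):
    pass

class Journal(Periodical):
    pass

def container_rule(container):
    """Pick a formatting rule from the most specific superclass that has one."""
    if isinstance(container, Periodical):
        return "periodical"
    if isinstance(container, Collection):
        return "collection"
    return "generic"

# A subtype added later still matches an existing rule instead of falling through.
class Newsletter(Periodical):
    pass

print(container_rule(Journal()))     # periodical
print(container_rule(Newsletter()))  # periodical
print(container_rule(Collection()))  # collection
```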

>  > Certainly in a database, one need not store parts and publications in
>  > separate tables.
>
> In order to be as relational as possible, you should store them in
> separate tables. Both levels have different storage requirements.

Not necessarily. The primary difference is that parts contain locators
like volume, issue, pages, while monographic items often contain
publishers. Those can (and probably should) be stored in separate
tables so that what is stored in parts and monographic rows per se is
pretty much the same.
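For comparison, a sketch of this alternative layout in SQLite: parts and monographs share one `item` table linked by `is_part_of`, while locators and publishers live in their own tables. All names are hypothetical. The two lookups roughly mirror the Rails calls in the earlier message, but this is plain SQL, not ActiveRecord.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE item (           -- parts and monographs share one table
    id         INTEGER PRIMARY KEY,
    type       TEXT NOT NULL, -- e.g. 'Chapter', 'Book'
    title      TEXT NOT NULL,
    is_part_of INTEGER REFERENCES item(id)
);
CREATE TABLE locator (        -- volume, issue, pages of a part
    item_id INTEGER REFERENCES item(id),
    pages   TEXT
);
CREATE TABLE publisher (      -- publisher data of a monograph
    item_id INTEGER REFERENCES item(id),
    name    TEXT
);
""")
con.executemany("INSERT INTO item VALUES (?, ?, ?, ?)",
                [(1, "Book", "A Book", None),
                 (2, "Chapter", "A Chapter", 1)])
con.execute("INSERT INTO locator VALUES (2, '10-20')")
con.execute("INSERT INTO publisher VALUES (1, 'A Publisher')")

# Roughly: the chapter titled "A Chapter", then the title of its publication.
book_title = con.execute(
    "SELECT b.title FROM item c JOIN item b ON b.id = c.is_part_of "
    "WHERE c.type = 'Chapter' AND c.title = ?", ("A Chapter",)).fetchone()[0]

# Roughly: every book, whether or not it contains chapters.
all_books = [r[0] for r in con.execute(
    "SELECT title FROM item WHERE type = 'Book'")]

# Locators hang off the shared rows via their own table.
pages = con.execute("SELECT pages FROM locator WHERE item_id = 2").fetchone()[0]
print(book_title, all_books, pages)  # A Book ['A Book'] 10-20
```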

I do think periodicals and such (collections) are a different matter though.

>  > > - if you work with URIs to address relational parts of a reference,
>  > > who defines/publishes these items?
>  >
>  > Anybody, really. But having a standard id helps to at least solve the
>  > problem of "how do we identify something". That it's hooked into the
>  > architecture of the web is all the more useful.
>
> Well, in my area of work the DOI seems to be more suitable to identify
> an article.

Oh, yes, certainly. In fact, you can use the info URI scheme to do
that like so "info:doi/10.1111/j.1467-8306.2005.00468.x".  I think
crossref does something similar:
"doi:10.1111/j.1467-8306.2005.00468.x". Both are valid uris.

Bruce
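The two DOI URI forms mentioned above can be produced with a small helper. This is a hypothetical function, not part of RefDB or any DOI library, and its percent-encoding (via the standard `urllib.parse.quote`, keeping `/` unescaped) is a simplification; DOIs containing unusual characters may need the exact escaping rules of the info URI registry.

```python
from urllib.parse import quote

def doi_uris(doi):
    """Return the info: and doi: URI forms of a DOI (hypothetical helper)."""
    # Leave letters, digits, '_.-~' and '/' as-is; percent-encode everything else.
    encoded = quote(doi, safe="/")
    return "info:doi/" + encoded, "doi:" + encoded

print(doi_uris("10.1111/j.1467-8306.2005.00468.x"))
# ('info:doi/10.1111/j.1467-8306.2005.00468.x', 'doi:10.1111/j.1467-8306.2005.00468.x')
```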

Re: [Refdb-users] new bibliographic data schema proposal

From: Markus H. <mar...@mh...> - 2006-04-26 22:19:45

Bruce D'Arcus writes:
 > On 4/25/06, Markus Hoenicka <mar...@mh...> wrote:
 > 
 > > Bruce D'Arcus writes:
 > >  > I guess I just mean that levels are only one kind of relation of
 > >  > relevance, and that there are other important ones. Likewise, to have
 > >  > top-level elements like journalarticle is flattening the typing model.
 > >
 > > Again I don't quite understand. journalarticle is a type that
 > > describes an article published in a journal (using the analytical
 > > information).
 > 
 > But notice the difference even in how you describe it using natural
 > language. There are articles that are published in periodicals
 > (newspapers, magazines, journals, newsletters). The related type is
 > not intrinsic to the primary reference type.
 > 

What is your primary reference type here? For me it is
newspaperarticle, journalarticle and so on. This is because these have
different content models. E.g. a newspaperarticle reference may
contain the section that the article appeared in (U.S., world, business,
arts, sports etc). A journalarticle reference doesn't. Remember that
the schema was designed to guide the user what data to enter.

 > But sure, the query to grab the part vs. monographic title is going to
 > be different when referring to the chapter than it is when referring
 > to the book it is in. This is only logical; isn't it?
 > 

What I was trying to say is this: in your system you have to query for
a container title if you're interested in what book a chapter was
published in. You have to query for a standalone title if you're
interested in the book. But in both cases the data relate to the same
physical book. This distinction does not make sense to me. If you want
a book title, you're asking for monographic data, regardless of
whether that book contains chapters or is part of a series.

 > E.g. you say to someone "look at the chapter titled 'abc' in the book
 > called 'xyz'." You don't say "look at analytical title 'abc' that has
 > a monographic title 'xyz'."
 > 

But the meaning is equivalent (as any librarian could
attest). Whereas it is utterly confusing if you tell someone "look at
the item 'xyz' unless it is a book that contains chapters. In that
case, look at the container 'xyz' which will contain the item
'abc'". This just allows you to make the same errors as RIS in a
much more sophisticated way.

regards,
Markus

Re: [Refdb-users] new bibliographic data schema proposal

From: Bruce D'A. <bda...@gm...> - 2006-04-26 23:53:59

On 4/26/06, Markus Hoenicka <mar...@mh...> wrote:
> Bruce D'Arcus writes:
>  > But notice the difference even in how you describe it using natural
>  > language. There are articles that are published in periodicals
>  > (newspapers, magazines, journals, newsletters). The related type is
>  > not intrinsic to the primary reference type.
>  >
>
> What is your primary reference type here?

Article.

> For me it is newspaperarticle, journalarticle and so on. This is because these have
> different content models. E.g. a newspaperarticle reference may
> contain the section that the article appeared in (U.S., world, business,
> arts, sports etc). A journalarticle reference doesn't. Remember that
> the schema was designed to guide the user what data to enter.

OK, though a) I'm not so sure journals don't have sections (indeed,
Nature's RSS feeds include section info), and b) even if not, allowing
an optional "section" element on article would hardly be problematic.

>  > But sure, the query to grab the part vs. monographic title is going to
>  > be different when referring to the chapter than it is when referring
>  > to the book it is in. This is only logical; isn't it?
>
> What I was trying to say is this: in your system you have to query for
> a container title if you're interested in what book a chapter was
> published in. You have to query for a standalone title if you're
> interested in the book. But in both cases the data relate to the same
> physical book.

The point is that queries are contextual. If I ask for a person's
name, that is a different query than to ask for the name of an author,
even though they are the same person.

> This distinction does not make sense to me. If you want
> a book title, you're asking for monographic data, regardless of
> whether that book contains chapters or is part of a series.

I guess we're just going to have to agree to disagree. I don't find it
at all intuitive, and as I said before, I think consensus is on my
side.

>  > E.g. you say to someone "look at the chapter titled 'abc' in the book
>  > called 'xyz'." You don't say "look at analytical title 'abc' that has
>  > a monographic title 'xyz'."
>
> But the meaning is equivalent (as any librarian could
> attest). Whereas it is utterly confusing if you tell someone "look at
> the item 'xyz' unless it is a book that contains chapters. In that
> case, look at the container 'xyz' which will contain the item
> 'abc'". This just allows you to make the same errors as RIS in a
> much more sophisticated way.

I don't see how. I've already written a schema that works. It's as easy as:

element biblio:Article { rdf.id, (biblio.base_properties &
biblio.part_properties) }

The problem is that this level business comes from librarians, who in
fact a) have different metadata needs than scholars (their focus is
the "monographic"), and b) have moved on to more advanced modelling as
witnessed in MODS, and in particular, FRBR. Nowhere will you find in
those models hard-coded notions like part and monograph/publication.

Bruce

Re: [Refdb-users] new bibliographic data schema proposal

From: Markus H. <mar...@mh...> - 2006-04-27 08:09:09

Bruce D'Arcus <bda...@gm...> was heard to say:

>
> OK, though a) I'm not so sure journals don't have sections (indeed,
> Nature's RSS feeds include section info), and b) even if not, allowing
> an optional "section" element on article would hardly be problematic.

In fact, I've done just that, now that I reread my schema. journalabstract,
journalarticle, magazinearticle, and newspaperarticle share the same content
model (the schema went through dozens of reincarnations, so I'm confused at
times). However, as I noted previously, I have to record the distinction
between e.g. an abstract and an article published in a journal, although they
are the very same according to your model. When it comes to formatting, they are
no longer the same, so I must be able to tell them apart. Hence two different
types.
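A minimal sketch of why two types with an identical content model can still matter: the fields are the same, but the formatter has to know which type it is handling. The output style (including the "[Abstract]" marker) and the field names are hypothetical, not taken from any RefDB stylesheet.

```python
def format_reference(ref_type, ref):
    """Format two types that share one content model (hypothetical style)."""
    head = f'{ref["author"]}. {ref["title"]}'
    if ref_type == "journalabstract":
        # Identical fields, but the output must still show it is an abstract.
        head += " [Abstract]"
    elif ref_type != "journalarticle":
        raise ValueError(f"unknown type: {ref_type}")
    return f'{head}. {ref["journal"]} {ref["volume"]}:{ref["pages"]}.'

ref = {"author": "Doe J", "title": "A Study", "journal": "J Ex",
       "volume": "12", "pages": "1-5"}
print(format_reference("journalarticle", ref))   # Doe J. A Study. J Ex 12:1-5.
print(format_reference("journalabstract", ref))  # Doe J. A Study [Abstract]. J Ex 12:1-5.
```

Whether the type is a separate schema element or, as suggested later in the thread, a subclass in an RDF schema, the formatter needs some explicit type information to tell the two apart.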

> I guess we're just going to have to agree to disagree. I don't find it
> at all intuitive, and as I said before, I think consensus is on my
> side.
>

I'm not so sure, although I don't have any statistical data. There was a
discussion on the TEI list about the biblStruct and biblItem elements just the
other day. There were several posts in favour of the existing
analytical/monographic/series distinction, and none in favour of replacing it
with something else. And most of these posts were from people who were
concerned about transforming the bibliographic data to particular output
formats.

> I don't see how. I've already written a schema that works. It's as easy as:
>
> element biblio:Article { rdf.id, (biblio.base_properties &
> biblio.part_properties) }
>

I've got here:

periodicalarticle-content = attlist.entry, periodicalarticle-part, (relation |
periodical-publication), refinfo, libinfo

I don't see a major difference, except that in my case "xyz-part" invariably
means analytical info, and that the monographic container can either be
referenced (if it already exists), or its information can be provided directly
(as a convenience when writing or importing datasets).
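The `(relation | periodical-publication)` choice can be sketched as an import step that either links to an existing periodical or splits inline periodical data into its own entry. The dict-based store and the keys `relation`, `periodical`, and `container_id` are hypothetical stand-ins loosely modelled on the pattern above, not RefDB's import code.

```python
def new_id(db):
    """Next free integer id in a dict-based toy store."""
    return max(db, default=0) + 1

def store_article(entry, db):
    """Store an article, splitting an inline periodical into its own linked entry."""
    if "relation" in entry:
        # Container already in the database: reference it by id.
        container_id = entry["relation"]
    else:
        # Container given inline: create a separate periodical entry first.
        container_id = new_id(db)
        db[container_id] = {"type": "periodical",
                            "title": entry["periodical"]["title"]}
    article_id = new_id(db)
    db[article_id] = {"type": "periodicalarticle", "title": entry["title"],
                      "container_id": container_id}
    return article_id

db = {}
first = store_article({"title": "Article One",
                       "periodical": {"title": "A Journal"}}, db)
journal_id = db[first]["container_id"]
second = store_article({"title": "Article Two", "relation": journal_id}, db)
```

Either way the database ends up with one periodical entry linked from both articles.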

> The problem is that this level business comes from librarians, who in
> fact a) have different metadata needs than scholars (their focus is
> the "monographic"), and b) have moved on to more advanced modelling as
> witnessed in MODS, and in particular, FRBR. Nowhere will you find in
> those models hard-coded notions like part and monograph/publication.
>

In MODS, they shamefully insert a "monographic" into the issuance element of the
origin information of a monographic item. If you have an item a "is_part_of"
item b, you could not tell whether to format it as an analytical item that
appeared in a monograph or as a monographic item that appeared in a series
unless you mark one of the items as a monograph. And I don't see why this
should not be done explicitly.
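A minimal sketch of the ambiguity argument: given only "item is part of container", a formatter cannot choose between "chapter in book" and "book in series" without an explicit level marker. The `level` key plays the role of such a marker; the citation patterns are hypothetical. A typed model (e.g. `Chapter` and `Book` classes) would carry the same information in the type instead.

```python
def format_part_of(item, container):
    """Pick a citation pattern for 'item is_part_of container' (hypothetical patterns)."""
    # The relation alone does not say which case applies; the levels do.
    levels = (item["level"], container["level"])
    if levels == ("analytical", "monographic"):
        return f'{item["title"]}. In: {container["title"]}.'
    if levels == ("monographic", "series"):
        return f'{item["title"]} ({container["title"]}).'
    raise ValueError(f"no pattern for levels {levels}")

chapter = {"title": "A Chapter", "level": "analytical"}
book = {"title": "A Book", "level": "monographic"}
series = {"title": "A Series", "level": "series"}

print(format_part_of(chapter, book))  # A Chapter. In: A Book.
print(format_part_of(book, series))   # A Book (A Series).
```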

regards,
Markus

Re: [Refdb-users] new bibliographic data schema proposal

From: Markus H. <mar...@mh...> - 2006-04-27 11:09:05

Markus Hoenicka <mar...@mh...> was heard to say:


> I'm not so sure, although I don't have any statistical data. There was a
> discussion on the TEI list about the biblStruct and biblItem elements just
> the other day. There were several posts in favour of the existing
> analytical/monographic/series distinction, and none in favour of replacing it
> with something else. And most of these posts were from people who were
> concerned about transforming the bibliographic data to particular output
> formats.
>

I have to correct myself here. The discussion is still going on, and a few
minutes ago one of the proponents of the new biblItem raised his voice:

http://listserv.brown.edu/archives/cgi-bin/wa?A2=ind0604&L=tei-l&T=0&P=11282

It is interesting to see that the new structure is not far from what I proposed.
You can code a two-level item (like chapter in book or article in journal)
either as nested biblItems or as linked biblItems. It is a bit weird that you
can code the former as either "chapter in book" or as "book containing chapter"
depending on how you intend to cite it. It somewhat runs against common sense to
have a child element encode the container of a parent element, but if you do it
in the more natural order (book as a parent biblItem, chapter as a child
biblItem) you're forced to reference the innermost biblItem when you cite it.
rbib offers only "chapter in book" in this case, and the chapter and book data
are not nested but put side-by-side, thus avoiding a possible mismatch between
XML parent/child and bibliographic container/part relationships.

You should note, though, that even this fairly flexible system uses <title
level="a"> and <title level="m"> to denote analytical and monographic titles,
respectively. There seems to be an understanding that this is required to
get things like citations and bibliography formatting right.
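As a rough illustration of why formatters lean on the `level` attribute, the following Python sketch formats a title pair purely from `level="a"` and `level="m"`, without knowing the item's type. The flat record layout and the output convention (quotes for analytic titles, underscores standing in for italics) are assumptions for illustration, not any particular citation style or the TEI biblStruct structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat record; real TEI nests titles inside analytic/monogr.
RECORD = """
<biblStruct>
  <title level="a">Some Chapter</title>
  <title level="m">Some Book</title>
</biblStruct>
"""

def format_citation(xml_text):
    """Format titles by level: analytic in quotes, monographic 'italic'."""
    root = ET.fromstring(xml_text)
    titles = {t.get("level"): t.text for t in root.iter("title")}
    analytic = titles.get("a")
    monographic = titles.get("m")
    if analytic is None:
        # A monograph cited on its own.
        return f"_{monographic}_."
    return f'"{analytic}." In _{monographic}_.'

print(format_citation(RECORD))
```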

regards,
Markus

Re: [Refdb-users] new bibliographic data schema proposal

From: Bruce D'A. <bda...@gm...> - 2006-04-27 12:36:08

On 4/27/06, Markus Hoenicka <mar...@mh...> wrote:

> You should note, though, that even this fairly flexible system uses <title
> level="a"> and <title level="m"> to denote analytical and monographic titles,
> respectively. There seems to be an understanding that this is required to
> get things like citations and bibliography formatting right.

I was involved in the development of the new TEI stuff, and I actually
argued strongly they should have gotten rid of this precisely because
it's unnecessary and inconsistent.

See:

<http://sourceforge.net/mailarchive/forum.php?thread_id=6708394&forum_id=41283>

I guess I lost that argument though.

Bruce

Re: [Refdb-users] new bibliographic data schema proposal

From: Bruce D'A. <bda...@gm...> - 2006-04-27 13:04:27

On 4/27/06, Markus Hoenicka <mar...@mh...> wrote:

> However, as I noted previously, I have to record the distinction
> between e.g. an abstract and an article published in a journal, although
> these are the very same according to your model.

IIRC, I have Abstract as a subclass of Article in my RDF schema. So
it does get its own type.

> > The problem is that this level business comes from librarians, who in
> > fact a) have different metadata needs than scholars (their focus is
> > the "monographic"), and b) have moved on to more advanced modelling as
> > witnessed in MODS, and in particular, FRBR. Nowhere will you find in
> > those models hard-coded notions like part and monograph/publication.
>
> In MODS, they shamefully insert a "monographic" into the issuance element of the
> origin information of a monographic item. If item a "is_part_of"
> item b, you cannot tell whether to format it as an analytical item that
> appeared in a monograph or as a monographic item that appeared in a series
> unless you mark one of the items as a monograph. And I don't see why this
> should not be done explicitly.

Yes, I agree.
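The ambiguity described in the quoted paragraph can be shown with a small Python sketch: a bare `is_part_of` link does not say how to format the pair, while a single explicit "monographic" flag settles it. The `Item` class, its fields, and the `describe` helper are hypothetical and do not correspond to MODS element names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Item:
    title: str
    is_part_of: Optional["Item"] = None
    monographic: bool = False  # hypothetical explicit marker for monographs

def describe(item: Item) -> str:
    """Decide how to format an item from its container link and marker."""
    container = item.is_part_of
    if container is None:
        return "standalone item"
    if item.monographic:
        return "monographic item in a series"
    if container.monographic:
        return "analytical item in a monograph"
    # Neither side is marked: is_part_of alone cannot tell the cases apart.
    raise ValueError("ambiguous: mark one of the items as a monograph")

series = Item("Some Series")
book = Item("Some Book", is_part_of=series, monographic=True)
chapter = Item("Some Chapter", is_part_of=book)

print(describe(chapter))
print(describe(book))
```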

To explain, I'm just using RDF -- the model and the class system -- to
achieve this.

So if you look at the schema (http://purl.org/net/biblio), you'll see
Collection as a top-level class. Periodical is a subclass of
Collection, and Journal a subclass of Periodical.
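The class hierarchy just described, plus Abstract as a subclass of Article from earlier in this message, can be sketched as a minimal RDFS-style subclass closure in Python. The dictionary only mirrors the relationships named in this message; it is not read from the actual schema at http://purl.org/net/biblio, and it assumes one superclass per class, a simplification since RDFS allows several.

```python
# subclass -> direct superclass, as named in this message (not the real schema);
# single inheritance is a simplifying assumption.
SUBCLASS_OF = {
    "Periodical": "Collection",
    "Journal": "Periodical",
    "Abstract": "Article",
}

def superclasses(cls):
    """Return all transitive superclasses of cls, nearest first."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result

def is_a(cls, ancestor):
    """True if cls is ancestor or (transitively) a subclass of it."""
    return cls == ancestor or ancestor in superclasses(cls)

print(superclasses("Journal"))
print(is_a("Abstract", "Article"))
```

This is how an abstract can keep its own type while still being treated as an article wherever articles are expected.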

In RDF/XML, you then have:

<Article rdf:about="info:doi/343254254556557x">
  <title>Whatever</title>
  <publishedIn rdf:resource="urn:issn:2343-2314"/>
</Article>

<Journal rdf:about="urn:issn:2343-2314">
  <title>Some Journal</title>
</Journal>

This is the RDF approach to data normalization, but you could also do:

<Article rdf:about="info:doi/343254254556557x">
  <title>Whatever</title>
  <publishedIn>
    <Journal>
      <title>Some Journal</title>
    </Journal>
  </publishedIn>
</Article>
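The two snippets above carry the same information. As a rough sketch, the Python below wraps each in an `rdf:RDF` element and resolves the article's journal title either way: following the `rdf:resource` URI in the normalized form, or reading the inline `Journal` in the nested form. The default namespace `http://purl.org/net/biblio#` is an assumption (the snippets omit namespace declarations), and `journal_title` is a hypothetical helper that only handles these two shapes; real RDF/XML should go through a proper RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIB = "http://purl.org/net/biblio#"  # assumed namespace URI for the vocabulary

HEADER = f'<rdf:RDF xmlns:rdf="{RDF}" xmlns="{BIB}">'

NORMALIZED = HEADER + """
<Article rdf:about="info:doi/343254254556557x">
  <title>Whatever</title>
  <publishedIn rdf:resource="urn:issn:2343-2314"/>
</Article>
<Journal rdf:about="urn:issn:2343-2314">
  <title>Some Journal</title>
</Journal>
</rdf:RDF>"""

NESTED = HEADER + """
<Article rdf:about="info:doi/343254254556557x">
  <title>Whatever</title>
  <publishedIn>
    <Journal>
      <title>Some Journal</title>
    </Journal>
  </publishedIn>
</Article>
</rdf:RDF>"""

def journal_title(xml_text):
    """Return the title of the journal the (single) article is published in."""
    root = ET.fromstring(xml_text)
    # Index top-level resources by their rdf:about URI.
    by_uri = {node.get(f"{{{RDF}}}about"): node for node in root}
    article = root.find(f"{{{BIB}}}Article")
    link = article.find(f"{{{BIB}}}publishedIn")
    target = link.get(f"{{{RDF}}}resource")
    if target is not None:
        journal = by_uri[target]  # normalized: follow the URI reference
    else:
        journal = link[0]         # nested: the Journal element is inline
    return journal.findtext(f"{{{BIB}}}title")

print(journal_title(NORMALIZED))
print(journal_title(NESTED))
```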

I started looking into RDF seriously after having finished my book and
going through and trying to fix a lot of my MODS data. I realized it
was hard to normalize it (in the XML), in part because MODS isn't RDF.
You can see the results of the conversion here:

<http://www.users.muohio.edu/darcusb/meta/references/>

I've now upgraded citeproc to format this stuff.

The OpenDocument group at OASIS (which I am part of now) has been
looking into using RDF to provide an extensible metadata system in the
format, something like Adobe's XMP (which is an RDF subset). The fact
that RDF has a simple and clear model makes extension trivial; it's
designed for mixing data.
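A minimal sketch of what "designed for mixing data" means in practice: because every RDF statement is a self-contained (subject, predicate, object) triple keyed by URIs, merging metadata from two sources reduces to a set union. The predicate names here are simplified, and `ex:readingStatus` is a hypothetical extension property, not part of any real vocabulary.

```python
DOI = "info:doi/343254254556557x"

# Core bibliographic statements (predicates simplified, no namespaces).
bibliographic = {
    (DOI, "title", "Whatever"),
    (DOI, "publishedIn", "urn:issn:2343-2314"),
}

# An independent extension; "ex:readingStatus" is a hypothetical property.
extension = {
    (DOI, "ex:readingStatus", "read"),
}

# Merging two graphs is just the union of their triples.
merged = bibliographic | extension

def objects(graph, subject, predicate):
    """Return all objects for a subject/predicate pair, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

print(objects(merged, DOI, "ex:readingStatus"))
```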

Bruce

[Refdb-users] new bibliographic data schema proposal

From: Markus H. <mar...@mh...> - 2006-04-27 20:50:06

Markus Hoenicka writes:
 > RefDB releases. The schema, a description, and some example entries
 > are available here:
 > 
 > http://refdb.sourceforge.net/rbib.html
 > 

It just came to my attention that I forgot to upload three images
along with the above web page. If you ever wondered what the
nonsensical stuff roughly in the middle of the page was meant to
say, please have another look. It might make a few things clearer.

regards,
Markus
