Thread: [Refdb-users] new bibliographic data schema proposal
From: Markus H. <mar...@mh...> - 2006-04-22 22:52:56
|
Hi, I've been working on a successor for the risx.dtd for quite a while, and I'd like to get some thoughts from interested fellows on the results. The new RelaxNG schema (or rather an improved version thereof) may or may not become the underlying data format of future RefDB releases. The schema, a description, and some example entries are available here: http://refdb.sourceforge.net/rbib.html Some of the reasoning behind this schema can be found in a blog entry right here: http://mhoenicka.de/system-cgi/blog/index.php?itemid=567 I apologize that there is no element reference or at least thoroughly commented example data yet. I'd appreciate if you could give the schema a try. Let me know how you get along with encoding the data that you usually work with and feel free to send your example data for a discussion. For best results use a validating XML editor for data entry. regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
From: Bruce D'A. <bda...@gm...> - 2006-04-23 01:09:41
|
On 4/22/06, Markus Hoenicka <mar...@mh...> wrote:
> I've been working on a successor for the risx.dtd for quite a while,
> and I'd like to get some thoughts from interested fellows on the
> results.

Markus, I posted something on my blog about it (had been working on the Atom stuff anyway):

<http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2006/04/22/rbib-comments>

I suppose in a nutshell my argument is that the three-level structure and flat typing is still not very, um, relational. Also, I think there are some recent developments around syndication technologies in particular, as well as RDF, that make it pretty important to fit new formats into those methods of modelling data, identifying things (increasingly using uris) and so forth.

That said, if all you're interested in is improving RISX, then it seems you're on the right track. But even so, I think you can bring it closer in line with my suggestions.

Bruce
|
From: <sh...@we...> - 2006-04-23 07:31:43
|
I agree with Bruce. If it is about improving more or less the "internal" data structure, the most important issue (at least in my eyes) is the integration of legacy formats, since these formats are the ones the users work with (for docbook, bibtex, endnote and so on).

We should be thinking one step ahead and take into account the upcoming technologies and define some kind of "standard compliant exchange format". This is just a buzzword, but integrating the reference data in an RSS format sounds great to me, since this can help to interconnect different reference databases more easily. If we think about the web as a possible interface, it is a great idea to be able to fetch entries from different sites and integrate them in your publications.

That's just my opinion,
Sebastian
|
From: Markus H. <mar...@mh...> - 2006-04-23 21:03:15
|
Bruce D'Arcus writes: > Markus, I posted something on my blog about it (had been working on > the Atom stuff anyway): > > <http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2006/04/22/rbib-comments> > Thanks for these comments. > I suppose in a nutshell my argument is that the three-level structure > and flat typing is still not very, um, relational. I can't follow you here. In risx, only periodicals were treated in a more or less relational fashion. rbib treats the relationships of all levels in a relational fashion, regardless of the reference type. You can even define relations between datasets, regardless of the level of the dataset. Is it that the relations are hard-linked only in the database, not in the XML file? > Also, I think there > are some recent developments around syndication technologies in > particular, as well as RDF, that make it pretty important to fit new > formats into those methods of modelling data, identifying things > (increasingly using uris) and so forth. > Two things: - isn't the data format definition (the RelaxNG schema in this case) independent of the modes of distribution (e.g. an Atom feed)? I don't know much about Atom, but it seems to me that you can embed just about anything as a payload (at least with namespaces). - if you work with URIs to address relational parts of a reference, who defines/publishes these items? Do you expect them to be universally available on the web? Do you expect the bibliographic dataset an URI points to to be unique on the web? Why would you want to store the stuff in a database then? What do you do when you're offline? Seems like I don't really understand how this is supposed to work. > That said, if all you're interested in is improving RISX, then it > seems you're on the right track. But even so, I think you can bring it > closer in line with my suggestions. > I wouldn't mind, but I guess I'll have to learn quite a bit. regards, Markus -- Markus Hoenicka mar...@ca... 
(Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
From: Bruce D'A. <bda...@gm...> - 2006-04-23 22:15:52
|
On 4/23/06, Markus Hoenicka <mar...@mh...> wrote:
> > I suppose in a nutshell my argument is that the three-level structure
> > and flat typing is still not very, um, relational.
>
> I can't follow you here. In risx, only periodicals were treated in a
> more or less relational fashion. rbib treats the relationships of all
> levels in a relational fashion, regardless of the reference type. You
> can even define relations between datasets, regardless of the level of
> the dataset. Is it that the relations are hard-linked only in the
> database, not in the XML file?

I guess I just mean that levels are only one kind of relation of relevance, and that there are other important ones. Likewise, to have top-level elements like journalarticle is flattening the typing model. More below ...

> > Also, I think there
> > are some recent developments around syndication technologies in
> > particular, as well as RDF, that make it pretty important to fit new
> > formats into those methods of modelling data, identifying things
> > (increasingly using uris) and so forth.
>
> Two things:
>
> - isn't the data format definition (the RelaxNG schema in this case)
> independent of the modes of distribution (e.g. an Atom feed)? I don't
> know much about Atom, but it seems to me that you can embed just about
> anything as a payload (at least with namespaces).

Correct. It's just that for an atom:entry, there is always a title. As with most recent formats, the focus is on the object of interest. So if I write an xpath to grab all titles from an atom feed, it'd be "//atom:title".

Consider the xpath to grab all primary titles with the level structure (not using your precise element names, but you get the idea): "//part/title|//publication/title[not(part)]".

Or say you were modeling this with objects. In the RIS-ish approach, you'd get:

    article_title = ref1.part.title
    book_title = ref2.publication.title

... while with the other approach you'd have:

    article_title = ref1.title
    book_title = ref2.title
    article_journal_title = ref1.container.title

So there's a mismatch between the models that is somewhat awkward. I think it makes more sense, and it's just as easy to handle programmatically, to do:

    title
    isPartOf (or whatever you want to call the level relation)
    title

Certainly in a database, one need not store parts and publications in separate tables.

> - if you work with URIs to address relational parts of a reference,
> who defines/publishes these items?

Anybody, really. But having a standard id helps to at least solve the problem of "how do we identify something". That it's hooked into the architecture of the web is all the more useful.

> Do you expect them to be universally available on the web?

In the long run, yes, and even now we can grab a lot of this data off the web using standard ids like this. See, for example, this little python script:

<http://onebiglibrary.net/project/opa/opa-0.2-release-with-json-wrapper>

But even if not available now, you can still always include that information. For example, with RDF, if I have a link that says:

    <dcterms:isPartOf rdf:resource="urn:isbn:34354466"/>

I can always add that book too:

    <Book rdf:about="urn:isbn:34354466">
    ...
    </Book>

> Do you expect the bibliographic dataset an URI points to to be unique on the web?

Not necessarily. Exploiting uris and standard ids really just makes it easier to associate and merge data. Even for citation purposes this can be critical. Consider, for example, if you are collaborating on a document with someone who uses a different database. How do you know you're referring to the same items when you cite something with just a plain key? Answer: you don't.

It's why these days my citations look like:

    <citation><biblioref xlink:href="urn:isbn:343542566"/></citation>

There's a pretty good discussion of uris in the context of RDF here:

<http://taubz.for.net/code/semweb/whatisrdf/#Distributed%20Information>

> Why would you want to store the stuff in a database then? What do you do when you're
> offline?

Per above, I'm not saying the data needs to be only available on the web, just that identifying things in this way opens up a lot of flexibility, and enhanced interoperability.

> > That said, if all you're interested in is improving RISX, then it
> > seems you're on the right track. But even so, I think you can bring it
> > closer in line with my suggestions.
>
> I wouldn't mind, but I guess I'll have to learn quite a bit.

:-)

Bruce
|
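The two title lookups contrasted in the message above can be sketched in Python. This is only an illustration under stated assumptions: the element names (`refs`, `ref`, `part`, `publication`, `container`) and the sample data are hypothetical, following the message's own caveat that it does not use the real rbib element names. Because the standard library's ElementTree supports only a subset of XPath (no union, no `not()`), the level-structured lookup is written as explicit Python logic rather than as the single XPath expression quoted in the message.

```python
import xml.etree.ElementTree as ET

# Hypothetical level-structured data: the primary title sits under
# <part> when a part exists, otherwise under <publication>.
LEVELLED = """
<refs>
  <ref id="ref1">
    <part><title>An Article</title></part>
    <publication><title>Some Journal</title></publication>
  </ref>
  <ref id="ref2">
    <publication><title>A Book</title></publication>
  </ref>
</refs>
"""

# Hypothetical item-centred data: the primary title is always a direct
# child of the reference; the container carries its own title.
ITEM_CENTRED = """
<refs>
  <ref id="ref1">
    <title>An Article</title>
    <container><title>Some Journal</title></container>
  </ref>
  <ref id="ref2">
    <title>A Book</title>
  </ref>
</refs>
"""


def primary_titles_levelled(xml_text):
    """Return each reference's primary title from level-structured data."""
    titles = []
    for ref in ET.fromstring(xml_text).findall("ref"):
        title = ref.find("part/title")
        if title is None:  # no analytical level, fall back to the publication
            title = ref.find("publication/title")
        titles.append(title.text)
    return titles


def primary_titles_item_centred(xml_text):
    """Return each reference's primary title from item-centred data."""
    return [t.text for t in ET.fromstring(xml_text).findall("ref/title")]


print(primary_titles_levelled(LEVELLED))
print(primary_titles_item_centred(ITEM_CENTRED))
```

Both functions return the same list of primary titles; the difference is only in how much the lookup needs to know about levels.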
From: Bruce D'A. <bda...@gm...> - 2006-04-23 21:53:53
|
On 4/23/06, Bruce D'Arcus <bda...@gm...> wrote: > In the long run, yes, and even now we can grab a lot of this data off > the web using standard ids like this. See, for example, this little > python script: > > <http://onebiglibrary.net/project/opa/opa-0.2-release-with-json-wrapper> Actually this post is better: <http://onebiglibrary.net/story/opa-release-0.1> Bruce |
From: Markus H. <mar...@mh...> - 2006-04-26 20:05:11
|
Hi,

Bruce D'Arcus writes:
> I guess I just mean that levels are only one kind of relation of
> relevance, and that there are other important ones. Likewise, to have
> top-level elements like journalarticle is flattening the typing model.

Again I don't quite understand. journalarticle is a type that describes an article published in a journal (using the analytical information). You can link it to a periodical entry describing the journal (using the monographic info). Alternatively you can provide the monographic info in the same dataset, leaving it to the reference database application to split it into two related entries. I wouldn't call this flat.

> Correct. It's just that for an atom:entry, there is always a title. As
> with most recent formats, the focus is on the object of interest. So
> if I write an xpath to grab all titles from an atom feed, it'd be
> "//atom:title".
>
> Consider the xpath to grab all primary titles with the level structure
> (not using your precise element names, but you get the idea):
> "//part/title|//publication/title[not(part)]".
>
> Or say you were modeling this with objects. In the RIS-ish approach, you'd
> get:
>
> article_title = ref1.part.title
> book_title = ref2.publication.title
>
> ... while the other approach you'd have:
>
> article_title = ref1.title
> book_title = ref2.title
> article_journal_title = ref1.container.title
>
> So there's a mismatch between the models that is somewhat awkward. I
> think it makes more sense, and it's just as easy to handle
> programmatically, to do:
>
> title
> isPartOf (or whatever you want to call the level relation)
> title

Isn't that the ill-fated approach that RIS took with the TI element (which can mean anything from an analytical to a monographic title) and the likewise illogical AU to A1/A2/A3 mappings?

I took great care to make each level of a reference citable all by itself. That is, a chapter in a book is represented as two entries in the database: a chapter (analytical) and a book (monographic). The former must be associated with the latter, whereas the latter may be a standalone item. Querying for titles of the chapter reference thus means:

    chapter_title = analytical.title
    book_title = monographic.title

A direct query for the very same book results in:

    book_title = monographic.title

In your approach the first query will give something like:

    chapter_title = ref1.title
    book_title = ref1.container.title

and the second:

    book_title = ref2.title

That is, you have to run different queries depending on how the book was initially added to the database. To me, this is just reinventing the downsides of RIS.

> Certainly in a database, one need not store parts and publications in
> separate tables.

In order to be as relational as possible, you should store them in separate tables. Both levels have different storage requirements.

> > - if you work with URIs to address relational parts of a reference,
> > who defines/publishes these items?
>
> Anybody, really. But having a standard id helps to at least solve the
> problem of "how do we identify something". That it's hooked into the
> architecture of the web is the all the more useful.

Well, in my area of work the DOI seems to be more suitable to identify an article. But you do have a point here.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
|
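The disagreement in the message above is about where the book title lives when a reference is queried. A minimal Python sketch of both models follows, assuming only two levels (chapter and book); the class and function names (`Level`, `LevelledRef`, `Item`, `book_title_levelled`, `book_title_item`) are hypothetical and belong to neither rbib nor any real library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Level:
    """One bibliographic level (analytical or monographic)."""
    title: str


@dataclass
class LevelledRef:
    """Level-based model: every reference has a monographic level,
    and optionally an analytical one (e.g. a chapter)."""
    monographic: Level
    analytical: Optional[Level] = None


@dataclass
class Item:
    """Item-centred model: a reference has its own title and,
    optionally, a container it is part of."""
    title: str
    container: Optional["Item"] = None


def book_title_levelled(ref: LevelledRef) -> str:
    # The book title is always found at the monographic level.
    return ref.monographic.title


def book_title_item(ref: Item) -> str:
    # The lookup depends on context: a chapter's book is its container,
    # while a book reference carries the title itself.
    return ref.container.title if ref.container else ref.title


# Level-based: chapter reference and book reference share the same book.
book_level = Level("xyz")
chapter_ref = LevelledRef(monographic=book_level, analytical=Level("abc"))
book_ref = LevelledRef(monographic=book_level)

# Item-centred: the chapter points at the book item as its container.
book_item = Item("xyz")
chapter_item = Item("abc", container=book_item)

print(book_title_levelled(chapter_ref), book_title_levelled(book_ref))
print(book_title_item(chapter_item), book_title_item(book_item))
```

Both models store the book once and both lookups return the same title; they differ in whether the caller or the data structure encodes the notion of a level.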
From: Bruce D'A. <bda...@gm...> - 2006-04-26 21:10:09
|
On 4/25/06, Markus Hoenicka <mar...@mh...> wrote:
> Bruce D'Arcus writes:
> > I guess I just mean that levels are only one kind of relation of
> > relevance, and that there are other important ones. Likewise, to have
> > top-level elements like journalarticle is flattening the typing model.
>
> Again I don't quite understand. journalarticle is a type that
> describes an article published in a journal (using the analytical
> information).

But notice the difference even in how you describe it using natural language. There are articles that are published in periodicals (newspapers, magazines, journals, newsletters). The related type is not intrinsic to the primary reference type.

> You can link it to a periodical entry describing the
> journal (using the monographic info). Alternatively you can provide
> the monographic info in the same dataset, leaving it to the reference
> database application to split it into two related entries. I wouldn't
> call this flat.

Just in the sense that you are collapsing the types for two related items into a single type.

> > So there's a mismatch between the models that is somewhat awkward. I
> > think it makes more sense, and its just as easy to handle
> > programatically, to do:
> >
> > title
> > isPartOf (or whatever you want to call the level relation)
> > title
>
> Isn't that the ill-fated approach that RIS took with the TI element
> (which can mean anything from an analytical to a monographic title)
> and the likewise illogical AU to A1/A2/A3 mappings?

No, it's fundamentally different because the RIS approach is incredibly limited. You indicate a relation with an opaque integer whose meaning varies by reference type. In this approach, you are in fact very precisely modelling exactly how reference metadata works. If you cite a journal article, you say:

    [article] has title "X" and <is published in> [journal] "Y", etc.

... where the stuff in brackets indicates entities and the <> indicates a relation.

> I took great care to make each level of a reference citable all by
> itself. That is, a chapter in a book is represented as two entries in
> the database: a chapter (analytical) and a book (monographic).

Good; makes sense.

> The former must be associated with the latter, whereas the latter may be a
> standalone item.

Right.

> Querying for titles of the chapter reference thusly
> means:
>
> chapter_title = analytical.title
> book_title = monographic.title
>
> A direct query for the very same book results in:
>
> book_title = monographic.title
>
> In your approach the first query will give something like:
>
> chapter_title = ref1.title
> book_title = ref1.container.title
>
> and the second:
>
> book_title = ref2.title

Correct.

> That is, you have to run different queries depending on how the book
> was initially added to the database.

Hmm ... am not following you here. In the database, the book in any case is stored separately from the chapter and they are linked. That should be true regardless of "how the book was initially added."

But sure, the query to grab the part vs. monographic title is going to be different when referring to the chapter than it is when referring to the book it is in. This is only logical; isn't it?

E.g. you say to someone "look at the chapter titled 'abc' in the book called 'xyz'." You don't say "look at analytical title 'abc' that has a monographic title 'xyz'."

In Rails code, you might do:

    recommendation = Chapter.find_by_title("abc")
    buy_this_book = recommendation.publication
    puts buy_this_book.title

If I just want all books (including that one):

    Book.find_all

> To me, this is just reinventing the downsides of RIS.

Not at all. In the approach I am advocating, you can reliably represent everything in RIS, and MUCH more. It is basically the MODS approach, but with a much tighter modelling.

Each entity gets a type, the types have some sort of hierarchy (a journal is a subclass of periodical, for example, which is a subclass of Collection) so that stuff doesn't fall through the cracks, and the linking is clear. Likewise, the locator information in analyticals always is associated with the main level.

> > Certainly in a database, one need not store parts and publications in
> > separate tables.
>
> In order to be as relational as possible, you should store them in
> separate tables. Both levels have different storage requirements.

Not necessarily. The primary difference is that parts contain locators like volume, issue, pages, while monographic items often contain publishers. Those can (and probably should) be stored in separate tables so that what is stored in parts and monographic rows per se is pretty much the same.

I do think periodicals and such (collections) are a different matter though.

> > > - if you work with URIs to address relational parts of a reference,
> > > who defines/publishes these items?
> >
> > Anybody, really. But having a standard id helps to at least solve the
> > problem of "how do we identify something". That it's hooked into the
> > architecture of the web is the all the more useful.
>
> Well, in my area of work the DOI seems to be more suitable to identify
> an article.

Oh, yes, certainly. In fact, you can use the info uri scheme to do that like so "info:doi/10.1111/j.1467-8306.2005.00468.x". I think crossref does something similar: "doi:10.1111/j.1467-8306.2005.00468.x". Both are valid uris.

Bruce
|
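The two DOI spellings mentioned at the end of the message above identify the same article, which is what makes them useful for merging records from different databases. A small Python sketch of that idea, assuming only the two prefixes quoted in the message; the function name `bare_doi` is hypothetical and no resolver or web service is contacted.

```python
from typing import Optional

# The two URI forms quoted in the message; other spellings are not handled.
DOI_PREFIXES = ("info:doi/", "doi:")


def bare_doi(identifier: str) -> Optional[str]:
    """Strip a known DOI URI prefix, or return None for anything else."""
    for prefix in DOI_PREFIXES:
        if identifier.lower().startswith(prefix):
            return identifier[len(prefix):]
    return None


a = bare_doi("info:doi/10.1111/j.1467-8306.2005.00468.x")
b = bare_doi("doi:10.1111/j.1467-8306.2005.00468.x")
print(a == b)  # both forms reduce to the same bare DOI
```

Two records whose identifiers reduce to the same bare DOI can be treated as candidates for the same article, which a plain local citation key cannot offer.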
From: Markus H. <mar...@mh...> - 2006-04-26 22:19:45
|
Bruce D'Arcus writes: > On 4/25/06, Markus Hoenicka <mar...@mh...> wrote: > > > Bruce D'Arcus writes: > > > I guess I just mean that levels are only one kind of relation of > > > relevance, and that there are other important ones. Likewise, to have > > > top-level elements like journalarticle is flattening the typing model. > > > > Again I don't quite understand. journalarticle is a type that > > describes an article published in a journal (using the analytical > > information). > > But notice the difference even in how you describe it using natural > language. There are articles that are published in periodicals > (newspapers, magazines, journals, newsletters). The related type is > not intrinsic to the primary reference type. > What is your primary reference type here? For me it is newspaperarticle, journalarticle and so on. This is because these have different content models. E.g. a newspaperarticle reference may contain the section that the article appeared in (U.S., world, business, arts, sports etc). A journalarticle reference doesn't. Remember that the schema was designed to guide the user what data to enter. > But sure, the query to grab the part vs. monographic title is going to > be different when referring to the chapter than it is when referring > to the book it is in. This is only logical; isn't it? > What I was trying to say is this: in your system you have to query for a container title if you're interested what book a chapter was published in. You have to query for a standalone title if you're interested in the book. But in both cases the data relate to the same physical book. This distinction does not make sense to me. If you want a book title, you're asking for monographic data, regardless of whether that book contains chapters or is part of a series. > E.g. you say to someone "look at the chapter titled 'abc' in the book > called 'xyz'." You don't say "look at analytical title 'abc' that has > a monographic title 'xyz'." 
> But the meaning is equivalent (as every librarian could witness). Whereas it is utmost confusing if you tell someone "look at the item 'xyz' unless it is a book that contains chapters. In that case, look at the container 'xyz' which will contain the item 'abc'". This just allows you to make the same errors like RIS in a much more sophisticated way. regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
From: Bruce D'A. <bda...@gm...> - 2006-04-26 23:53:59
|
On 4/26/06, Markus Hoenicka <mar...@mh...> wrote:
> Bruce D'Arcus writes:
> > But notice the difference even in how you describe it using natural
> > language. There are articles that are published in periodicals
> > (newspapers, magazines, journals, newsletters). The related type is
> > not intrinsic to the primary reference type.
>
> What is your primary reference type here?

Article.

> For me it is newspaperarticle, journalarticle and so on. This is because these have
> different content models. E.g. a newspaperarticle reference may
> contain the section that the article appeared in (U.S., world, business,
> arts, sports etc). A journalarticle reference doesn't. Remember that
> the schema was designed to guide the user what data to enter.

OK, though a) I'm not so sure journals don't have sections (indeed, Nature's RSS feeds include section info), and b) even if not, allowing an optional "section" element on article would hardly be problematic.

> > But sure, the query to grab the part vs. monographic title is going to
> > be different when referring to the chapter than it is when referring
> > to the book it is in. This is only logical; isn't it?
>
> What I was trying to say is this: in your system you have to query for
> a container title if you're interested what book a chapter was
> published in. You have to query for a standalone title if you're
> interested in the book. But in both cases the data relate to the same
> physical book.

The point is that queries are contextual. If I ask for a person's name, that is a different query than to ask for the name of an author, even though they are the same person.

> This distinction does not make sense to me. If you want
> a book title, you're asking for monographic data, regardless of
> whether that book contains chapters or is part of a series.

I guess we're just going to have to agree to disagree. I don't find it at all intuitive, and as I said before, I think consensus is on my side.

> > E.g. you say to someone "look at the chapter titled 'abc' in the book
> > called 'xyz'." You don't say "look at analytical title 'abc' that has
> > a monographic title 'xyz'."
>
> But the meaning is equivalent (as every librarian could
> witness). Whereas it is utmost confusing if you tell someone "look at
> the item 'xyz' unless it is a book that contains chapters. In that
> case, look at the container 'xyz' which will contain the item
> 'abc'". This just allows you to make the same errors like RIS in a
> much more sophisticated way.

I don't see how. I've already written a schema that works. It's as easy as:

    element biblio:Article { rdf.id, (biblio.base_properties & biblio.part_properties) }

The problem is that this level business comes from librarians, who in fact a) have different metadata needs than scholars (their focus is the "monographic"), and b) have moved on to more advanced modelling as witnessed in MODS, and in particular, FRBR. Nowhere will you find in those models hard-coded notions like part and monograph/publication.

Bruce
|
From: Markus H. <mar...@mh...> - 2006-04-27 08:09:09
|
Bruce D'Arcus <bda...@gm...> was heard to say:
> OK, though a) I'm not so sure journals don't have sections (indeed,
> Nature's RSS feeds include section info), and b) even if not, allowing
> an optional "section" element on article would hardly be problematic.

In fact, I've done just that, now that I reread my schema. journalabstract, journalarticle, magazinearticle, and newspaperarticle share the same content model (the schema went through dozens of reincarnations, so I'm confused at times). However, as I noticed previously, I have to record the distinction between e.g. an abstract and an article published in a journal, although this is the very same according to your model. When it comes to formatting, it is the same no more, therefore I must be able to tell them apart. Hence two different types.

> I guess we're just going to have to agree to disagree. I don't find it
> at all intuitive, and as I said before, I think consensus is on my
> side.

I'm not so sure, although I don't have any statistical data. There was a discussion on the TEI list about the biblStruct and biblItem elements just the other day. There were several posts in favour of the existing analytical/monographic/series distinction, and none in favour of replacing it with something else. And most of these posts were from people who were concerned about transforming the bibliographic data to particular output formats.

> I don't see how. I've already written a schema that works. It's as easy as:
>
> element biblio:Article { rdf.id, (biblio.base_properties &
> biblio.part_properties) }

I've got here:

    periodicalarticle-content = attlist.entry, periodicalarticle-part, (relation | periodical-publication), refinfo, libinfo

I don't see a major difference, except that in my case "xyz-part" invariably means analytical info, and that the monographic container can either be referenced (if it already exists), or its information can be provided directly (as a convenience when writing or importing datasets).

> The problem is that this level business comes from librarians, who in
> fact a) have different metadata needs than scholera (their focus is
> the "monographic"), and b) have moved on to more advanced modelling as
> witnessed in MODS, and in particular, FRBR. Nowhere will you find in
> those models hard-coded notions like part and monograph/publication.

In MODS, they shamefully insert a "monographic" into the issuance element of the origin information of a monographic item. If you have an item a "is_part_of" item b, you could not tell whether to format it as an analytical item that appeared in a monograph or as a monographic item that appeared in a series unless you mark one of the items as a monograph. And I don't see why this should not be done explicitly.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
|
From: Markus H. <mar...@mh...> - 2006-04-27 11:09:05
|
Markus Hoenicka <mar...@mh...> was heard to say:
> I'm not so sure although I don't have any statistical data. There was a
> discussion on the TEI list about the biblStruct and biblItem elements just
> the other day. There were several posts in favour of the existing
> analytical/monographic/series distinction, and none in favour of replacing it
> with something else. And most of these posts were from people who were
> concerned about transforming the bibliographic data to particular output
> formats.

I have to correct myself here. The discussion is still going on, and a few minutes ago one of the proponents of the new biblItem raised his voice:

http://listserv.brown.edu/archives/cgi-bin/wa?A2=ind0604&L=tei-l&T=0&P=11282

It is interesting to see that the new structure is not far from what I proposed. You can code a two-level item (like chapter in book or article in journal) either as nested biblItems or as linked biblItems. It is a bit weird that you can code the former as either "chapter in book" or as "book containing chapter" depending on how you intend to cite it. It somewhat runs against common sense to have a child element encode the container of a parent element, but if you do it in the more natural order (book as a parent biblItem, chapter as a child biblItem) you're forced to reference the innermost biblItem when you cite it. rbib offers only "chapter in book" in this case, and the chapter and book data are not nested but put side-by-side, thus avoiding the possible orthogonality of XML parent/child and bibliographic container/part relationships.

You should note though that even this fairly flexible system uses <title level="a"> and <title level="m"> to denote analytical and monographic titles, respectively. There seems to be an understanding that this is required to get things like citations and bibliography formatting right.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
|
From: Bruce D'A. <bda...@gm...> - 2006-04-27 12:36:08
|
On 4/27/06, Markus Hoenicka <mar...@mh...> wrote:
> You should note though that even this fairly flexible system uses <title
> level="a"> and <title level="m"> to denote analytical and monographic titles,
> respectively. There seems to be an understanding that this is required to
> getting things like citations and bibliography formatting right.

I was involved in the development of the new TEI stuff, and I actually argued strongly they should have gotten rid of this precisely because it's unnecessary and inconsistent. See:

<http://sourceforge.net/mailarchive/forum.php?thread_id=6708394&forum_id=41283>

I guess I lost that argument though.

Bruce
|
From: Bruce D'A. <bda...@gm...> - 2006-04-27 13:04:27
|
On 4/27/06, Markus Hoenicka <mar...@mh...> wrote:
> However, as I noticed previously I have to record the distinction
> between e.g. an abstract and an article published in a journal, although this
> is the very same according to your model.

IIRC, I have Abstract as a subclass of Article in my RDF schema. So it does get its own type.

> > The problem is that this level business comes from librarians, who in
> > fact a) have different metadata needs than scholera (their focus is
> > the "monographic"), and b) have moved on to more advanced modelling as
> > witnessed in MODS, and in particular, FRBR. Nowhere will you find in
> > those models hard-coded notions like part and monograph/publication.
>
> In MODS, they shamefully insert a "monographic" into the issuance element of the
> origin information of a monographic item. If you have an item a "is_part_of"
> item b, you could not tell whether to format it as an analytical item that
> appeared in a monograph or as a monographic item that appeared in a series
> unless you mark one of the items as a monograph. And I don't see why this
> should not be done explicitly.

Yes, I agree. To explain, I'm just using RDF -- the model and the class system -- to achieve this.

So if you look at the schema (http://purl.org/net/biblio), you'll see Collection as a top-level class. Periodical is a subclass of Collection, and Journal of Periodical.

Representing in XML/RDF, you then have:

    <Article rdf:about="info:doi/343254254556557x">
      <title>Whatever</title>
      <publishedIn rdf:resource="urn:issn:2343-2314"/>
    </Article>

    <Journal rdf:about="urn:issn:2343-2314">
      <title>Some Journal</title>
    </Journal>

This is the RDF approach to data normalization, but you could also do:

    <Article rdf:about="info:doi/343254254556557x">
      <title>Whatever</title>
      <publishedIn>
        <Journal>
          <title>Some Journal</title>
        </Journal>
      </publishedIn>
    </Article>

I started looking into RDF seriously after having finished my book and going through and trying to fix a lot of my MODS data. I realized it was hard to normalize it (in the XML), in part because MODS isn't RDF.

You can see the results of the conversion here:

<http://www.users.muohio.edu/darcusb/meta/references/>

I've now upgraded citeproc to format this stuff.

The OpenDocument group at OASIS (which I am part of now) has been looking into using RDF to provide an extensible metadata system in the format, something like Adobe's XMP (which is an RDF subset). The fact that RDF has a simple and clear model makes extension trivial; it's designed for mixing data.

Bruce
|
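The normalized form in the message above (an Article pointing at a separately described Journal through `rdf:resource`) can be followed with a few lines of Python. This is a sketch under stated assumptions: the message's snippets omit namespace declarations, so the wrapping `rdf:RDF` element and the default namespace `http://purl.org/net/biblio#` are guesses made for illustration, and the function name `container_title` is hypothetical. It uses plain XML parsing from the standard library, not an RDF toolkit.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIB_NS = "http://purl.org/net/biblio#"  # assumed; not stated in the message

# The message's two linked descriptions, wrapped in an assumed rdf:RDF root.
DOC = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{BIB_NS}">
  <Article rdf:about="info:doi/343254254556557x">
    <title>Whatever</title>
    <publishedIn rdf:resource="urn:issn:2343-2314"/>
  </Article>
  <Journal rdf:about="urn:issn:2343-2314">
    <title>Some Journal</title>
  </Journal>
</rdf:RDF>
"""


def container_title(xml_text, article_uri):
    """Follow publishedIn from an article URI to its container's title."""
    root = ET.fromstring(xml_text)
    # Index every top-level description by its rdf:about URI.
    by_uri = {el.get(f"{{{RDF_NS}}}about"): el for el in root}
    article = by_uri[article_uri]
    link = article.find(f"{{{BIB_NS}}}publishedIn")
    journal = by_uri[link.get(f"{{{RDF_NS}}}resource")]
    return journal.find(f"{{{BIB_NS}}}title").text


print(container_title(DOC, "info:doi/343254254556557x"))
```

The lookup goes through the identifier rather than through XML nesting, so the Journal description could equally well come from a different file or database, as long as it uses the same URI.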
From: Markus H. <mar...@mh...> - 2006-04-27 20:50:06
|
Markus Hoenicka writes: > RefDB releases. The schema, a description, and some example entries > are available here: > > http://refdb.sourceforge.net/rbib.html > It just came to my attention that I forgot to upload three images along with the above web page. If you ever wondered what the nonsensical stuff approximately at the middle of the page was meant to say, please go back again. It might make a few things clearer. regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |