libdb-develop Mailing List for LibDB (Page 8)
Status: Inactive
Brought to you by:
morbus
From: Bruce D'A. <bd...@fa...> - 2004-01-28 15:46:06
Morbus, while I got your announcement, it seems I'm having stuff bounce back from the sourceforge lists (cc-ing here). A message I sent to the libdb list was trying to understand how to represent the FRBR for my needs. I've gotten farther with it, now pasted again below. Do I have it right?

Also, a question: why a separate "character" table? Are not characters (virtual) people with perhaps a different role?

> They'll be a number of pre-built ... roles (Director, Writer, Special
> Effects, etc.), and annotations (Chapters, Review, Summary, etc.) as
> well.

On the first, have you looked at the marc relator list? It's quite extensive, and covers most of what I need (though is missing a few things too).

I'm big on annotation (my own) myself, but don't you need a way to track *who* is doing the annotating?

Bruce

<work ID="one">
  <isCreatedBy role="speaker">
    <person ID="doej">
      <name>
        <given>John</given>
        <other abbrev="yes">Q</other>
        <family>Doe</family>
      </name>
    </person>
  </isCreatedBy>
  <hasTitle>
    <titleMain>Title</titleMain>
    <titleSub>Subtitle</titleSub>
  </hasTitle>
  <isRealizedThrough ID="one-A">
    <event>
      <hasTitle>
        <titleMain>A Conference</titleMain>
      </hasTitle>
      <date>2002-10-10</date>
      <place>New York</place>
    </event>
    <isEmbodiedIn status="published">
      <text>
        <isPartOf>
          <monograph>
            <hasTitle>
              <titleMain>A Book</titleMain>
            </hasTitle>
            <hasOrigin>
              <publisher>
                <organization>
                  <name>
                    <full>ABC Publishers</full>
                  </name>
                  <place>New York</place>
                </organization>
              </publisher>
              <dateIssued>2003</dateIssued>
            </hasOrigin>
            <hasNumbers>
              <range unit="page">
                <start>21</start>
                <end>34</end>
              </range>
            </hasNumbers>
          </monograph>
        </isPartOf>
      </text>
      <isExemplifiedIn>
        <location>archive</location>
      </isExemplifiedIn>
    </isEmbodiedIn>
  </isRealizedThrough>
</work>
From: Bruce D'A. <bd...@fa...> - 2004-01-26 13:33:32
There's been an interesting discussion going on at the mods list about titles, and how to handle sort order coding. I think this suggestion is a good solution myself.

<titleInfo>
  <title>A shield in space?</title>
  <titleSub>technology, politics, and the strategic defense initiative</titleSub>
  <titleSort>shield in space?</titleSort>
  <titleAbbrev>A shield in space?</titleAbbrev>
</titleInfo>

This is in contrast to the current solution:

<titleInfo>
  <nonSort>A</nonSort>
  <title>shield in space?</title>
  <subTitle>technology, politics, and the strategic defense initiative</subTitle>
</titleInfo>

An alternate suggestion was to rename title to titleMain, which is also good, but breaks existing practice in MODS, something not too important for libdb.

The question of names is an even bigger issue, and my earlier post reflects my thinking on this (need for articular, for abbreviation coding, and an other name element, etc. influenced by Morten's doc).

Bruce
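The difference between the two encodings can be made concrete by deriving the non-sorting prefix mechanically. What follows is only a minimal Python sketch and not anything defined by MODS or LibDB: `LEADING_ARTICLES` and `split_non_sort` are hypothetical names, and the English-only article list is an assumption.

```python
# Hypothetical helper: split a title into a non-sorting prefix and the part
# used for sorting, mirroring the <nonSort> / <title> pair above.
LEADING_ARTICLES = ("a", "an", "the")  # assumption: English articles only


def split_non_sort(title: str) -> tuple[str, str]:
    """Return (non_sort, sort_part) for a title string."""
    first, sep, rest = title.partition(" ")
    if sep and first.lower() in LEADING_ARTICLES:
        return first, rest
    # No leading article (or a one-word title): nothing is skipped.
    return "", title


non_sort, sort_part = split_non_sort("A shield in space?")
print(non_sort)   # A
print(sort_part)  # shield in space?
```

A separate titleSort element stores the second value directly, while nonSort stores the first and leaves the consumer to reassemble the display title.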
From: Morbus I. <mo...@di...> - 2004-01-26 12:42:10
>Morbus, while I got your announcement, it seems I'm having stuff bounce
>back from the sourceforge lists (cc-ing here). A message I sent to the

Hmm. Do you have the bounce message still?

>Also, a question: why a separate "character" table? Are not
>characters (virtual) people with perhaps a different role?

They are, but things get muddled a bit more when you consider that characters can be based off real people. If I have "Morbus Iff" as a real life person, and then you play me in a movie, there'd also be a "Morbus Iff" character. I fear the confusion that can arise: which of these (now) two "person" entries is the character, and which is the real person?

This can be solved by adding a new column like "isCharacter", but then we run into issues (?) of database cleanliness: very rarely do characters in a movie (or even a book) have dates for when they were born and died - all those fields would go empty, resulting in a table that is only half being used.

I also worried about it from a table length point of view: a million records in one table will certainly slow down queries, and merging characters into persons could easily create 50+ rows for *one* movie in *one* table. I dunno. Seemed like the inevitable "redesign the schema" event would come a lot sooner with something like that.

As for the whole "role" thing and characters, I'm basing it on inference, really:

 * if movie has character entry...
 * and character has person entry...
 * then person "has role Cast" of movie.

A similar relationship can be made from book author to character, though "Cast" wouldn't be the right term. I've not thought of what term it would be, solely because a) I've found no source for raw metadata about characters in a book, b) the current goals focus on movies, c) I don't think anyone, at this point in the game, is going to be too concerned with modeling characters in a book.

>On the first, have you looked at the marc relator list? It's quite
>extensive, and covers most of what I need (though is missing a few

I don't recall that one - link?

>I'm big on annotation (my own) myself, but don't you
>need a way to track *who* is doing the annotating?

Yup, this is possible in the current database. Annotations have referers (the entity doing the annotation) and referents (the entity receiving the annotation). So, in the current sample data, we've got these relationships:

 * the DVD manifestation has summary of the movie expression.
 * the DVD manifestation has tagline of the movie expression.
 * the DVD manifestation has chapters of the movie expression.
 * the person "Morbus Iff" has reviewed the movie expression.

Likewise, you could add:

 * the person "Bruce D'Arcus" has noted the DVD manifestation.
 * the person "Bruce D'Arcus" left testimonial of the person "Morbus Iff".

And so on and so forth. The referer/referent ideals are used throughout the entire database schema, to support the idea that every row of every table should be able to relate, somehow, to every other row of every other table.

><work ID="one">

This seems good to me, yes.

-- 
Morbus Iff ( small pieces of morbus loosely joined )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
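The three-step inference described in this message (movie has character, character has person, therefore person "has role Cast") can be sketched in a few lines of Python. This is an illustration only: the dictionaries, the `infer_cast` function, and every name in the sample data are hypothetical stand-ins, not LibDB's actual tables, columns, or records.

```python
# Hypothetical in-memory stand-ins for the movie -> character and
# character -> person links. None of these names come from LibDB's schema.
movie_characters = {"EXAMPLE MOVIE": ["Character A", "Character B", "Character C"]}
character_persons = {"Character A": "Actor One", "Character B": "Actor Two"}


def infer_cast(movie: str) -> list[str]:
    """Infer who "has role Cast" of a movie.

    If the movie has a character entry, and that character has a person
    entry, then the person is treated as cast of the movie.
    """
    cast = []
    for character in movie_characters.get(movie, []):
        person = character_persons.get(character)
        if person is not None:  # characters without a person are skipped
            cast.append(person)
    return cast


print(infer_cast("EXAMPLE MOVIE"))  # ['Actor One', 'Actor Two']
```

Under this reading, no explicit "Cast" row needs to be stored; the role falls out of the two existing relationships.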
From: Bruce D'A. <bd...@fa...> - 2004-01-25 14:13:26
Was just playing around trying to understand how to represent FRBR in XML. Is this more-or-less right? It's meant to be a speech, performed at a conference, and then published as, say, a book chapter that is in electronic form. Among other things, should these be nested levels, as I have them here?

<record ID="one">
  <work>
    <creator role="speaker">
      <person ID="doej">
        <name>
          <given>John</given>
          <other abbrev="yes">Q</other>
          <family>Doe</family>
        </name>
      </person>
    </creator>
    <title>
      <titleMain>Title</titleMain>
      <titleSub>Subtitle</titleSub>
    </title>
  </work>
  <expression ID="one-A">
    <isPartOf>
      <event>
        <title>
          <titleMain>A Conference</titleMain>
        </title>
        <date>2002-10-10</date>
        <place>New York</place>
      </event>
    </isPartOf>
    <manifestation ID="one-B" status="published">
      <isPartOf>
        <origin>
          <publisher>
            <organization>
              <name>
                <full>ABC Publishers</full>
              </name>
              <place>New York</place>
            </organization>
          </publisher>
          <dateIssued>2003</dateIssued>
        </origin>
        <partDesc>
          <range unit="page">
            <start>21</start>
            <end>33</end>
          </range>
        </partDesc>
      </isPartOf>
      <item>
        <location>http://www.example.com/one.pdf</location>
      </item>
    </manifestation>
  </expression>
</record>
From: Morbus I. <mo...@di...> - 2004-01-22 02:14:01
Hey all. The first iteration of the database has been finalized and modeled with data from a rather unremarkable movie I own (THE POOL). The latest schema and sample data is heading up to CVS now. Here's your TODO list, if you're interested:

 * sanity check my SQL. ask questions. poke, prod.

 * sanity check my data modeling. ask questions. poke. prod.

 * help me define the following roles. "roles", in the database, are "occupations" or "jobs" that a person or corporate body can have. naturally, for a movie, there are an awful lot of roles (read: an awful lot of different types of crew members). I've defined (either myself, or from IMDb) about 75% of the roles found in THE POOL, but I need some help on the following, either from discovery on Google, or what have you:

     2D Artist, Insurance, Lighting, Music Coordinator, Orchestra Contractor, Payroll Accountant, Post-Production Accountant, Score Engineer, Score Mixer, Score Recording, Score Performance, Set Directory, Sound Post-Production, Sound Re-Recording Mixer, Special Effects Coordinator, Special Effects Technician, Stereo Sound Consultant, Titles (company), Video Playback Engineer, Visual Effects Producer

Over the next day or so, I'll be fiddling with the wiki to get that in a more presentable form, as well as creating a press release sort of thing for the dozen or so blogs I know waiting to announce it. Then, I'll be coding. Yup.

Either way, the Database Schema on the wiki is updated to the latest/final:

  http://disobey.com/noos/LibDB/?DatabaseSchema

-- 
Morbus Iff ( i still fail to see what this has to do with morocco )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
From: Morbus I. <mo...@di...> - 2004-01-22 00:02:32
>Nice list!

Thanks!

>I'd like to throw in a suggestion for the DOI here:

No problem-o. The following three identifiers have been added:

  "UPC Type A"
  "The basic UPC code, referred to as Type A, is composed of twelve digits. These twelve digits are broken up into four groups: Number System Character, Manufacturer's Code, Product Code, and Check Digit."

  "UPC Type E"
  "All Type-E UPC codes are eight digits long, the first being the Number System Character and the last being the check digit. The number system character is ALWAYS ZERO. The check digit is calculated from the digits in the Type-A UPC."

  "DOI"
  "DOIs are names \(characters and/or digits\) assigned to objects of intellectual property \(physical, digital or abstract\) such as electronic journal articles, images, learning objects, ebooks, images, any kind of content. They are used to provide current information, including where they \(or information about them\) can be found on the Internet. Information about a digital object may change over time, including where to find it, but its DOI will not change."

-- 
Morbus Iff ( if i could change the future, i'd change the past instead )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
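The "UPC Type A" description mentions a check digit as the last of the twelve digits. As a rough illustration of how that digit is commonly computed (the standard modulo-10 rule, stated here from general knowledge and not from the LibDB schema), here is a small Python sketch. The function names are hypothetical, and the sample code `036000291452` is a widely used textbook value, not an identifier from this thread.

```python
def upc_a_check_digit(first_eleven: str) -> int:
    """Compute the UPC-A check digit from the first eleven digits.

    Digits in odd positions (1st, 3rd, ...) are weighted by 3, digits in
    even positions by 1; the check digit brings the total to a multiple of 10.
    """
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 digits")
    digits = [int(ch) for ch in first_eleven]
    odd_sum = sum(digits[0::2])   # 1-based positions 1, 3, 5, 7, 9, 11
    even_sum = sum(digits[1::2])  # 1-based positions 2, 4, 6, 8, 10
    return (10 - (odd_sum * 3 + even_sum) % 10) % 10


def is_valid_upc_a(code: str) -> bool:
    """Check that a 12-digit string ends in the expected check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_a_check_digit(code[:11]) == int(code[11])


print(upc_a_check_digit("03600029145"))  # 2
print(is_valid_upc_a("036000291452"))    # True
```

Type-E codes are not handled here; per the description above, their check digit comes from the equivalent Type-A code, so they would need to be expanded first.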
From: Bruce D'A. <bd...@fa...> - 2004-01-21 17:05:34
On Jan 21, 2004, at 12:02 PM, Morbus Iff wrote:

> LibDB can be configured with an unlimited
> number of identifiers per entity, and will ship with at least the
> following (from the schema):

Nice list! I'd like to throw in a suggestion for the DOI here:

  http://www.doi.org/

Bruce
From: Morbus I. <mo...@di...> - 2004-01-21 17:01:29
>I thought this was kind of a neat service:
>http://www.oclc.org/research/projects/xisbn/

Yeah, I had first read of this from Udell a month or so back:

  http://weblog.infoworld.com/udell/2003/11/13.html

It is useful to LibDB, but I'm not entirely sure how just yet. The ISBNs are certainly helpful: LibDB can be configured with an unlimited number of identifiers per entity, and will ship with at least the following (from the schema):

  "Artisan Home Entertainment Cat. No."
  "The Artisan Home Entertainment Catalog Number is displayed in the UPC box of their DVD purchases."

  "ASIN"
  "ASIN stands for Amazon Standard Identification Number. Almost every product on our site has its own ASIN -- a unique code we use to identify it. For books, the ASIN is the same as the ISBN number, but for all other products a new ASIN is created when the item is uploaded to our catalogue."

  "IMDb"
  "The Internet Movie Database uses a nine-digit string to identify its resources; the first two characters determine whether it's a movie title or person's name."

  "ISSN"
  "The ISSN \(International Standard Serial Number\) is an eight-digit number which identifies periodical publications as such, including electronic serials. More than one million ISSN numbers have so far been assigned."

  "ISBN"
  "The ISBN is a unique machine-readable identification number, which marks any book unmistakably. For 30 years the ISBN has revolutionized the international book-trade. 159 countries and territories are officially ISBN members."

  "LCCN"
  "The Library of Congress began to print catalog cards in 1898 and began to distribute them in 1901. The Library of Congress Card Number was the number used to identify and control catalog cards."

And here's an example (shortened for brevity) of THE POOL's identifiers:

  THE POOL (expression), IMDB: "tt0283027"
  THE POOL (dvd manifestation), ASIN: "B00006FD94"
  THE POOL (dvd manifestation), Artisan: "12997"

I've also been meaning to add UPC identifiers as a default as well, since http://www.upcdatabase.com/ is a handy collection of them (which does contain THE POOL's DVD UPC, although my book SPIDERING HACKS had a different UPC than what I expected. Haven't investigated that one yet).

As for the XISBN utility, it will certainly allow us to suck down more information about other manifestations (and possibly, expressions), but a human would have to be around to manually say "yeah, this ISBN is an expression, and oop, this one is a manifestation of this European printing", etc. Programmatically, at most I'd be able to say "all these matching ISBN numbers from XISBN are related, somehow, to the FRBR work". The relationships would be revised more stringently when more information became available to the human cataloger.

-- 
Morbus Iff ( if i could change the future, i'd change the past instead )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
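The "unlimited number of identifiers per entity" idea can be pictured as a simple one-to-many mapping. The Python sketch below uses THE POOL's identifiers from the message above as sample data; the `identifiers` registry and the `add_identifier` and `find_entities` helpers are hypothetical names chosen for illustration and say nothing about how LibDB's tables are actually laid out.

```python
from collections import defaultdict

# Hypothetical registry: entity label -> list of (scheme, value) pairs.
# An entity may carry any number of identifiers under any number of schemes.
identifiers = defaultdict(list)


def add_identifier(entity: str, scheme: str, value: str) -> None:
    """Attach one more identifier to an entity."""
    identifiers[entity].append((scheme, value))


def find_entities(scheme: str, value: str) -> list[str]:
    """Return every entity carrying the given identifier."""
    return [entity for entity, pairs in identifiers.items()
            if (scheme, value) in pairs]


# Sample data taken from the message above.
add_identifier("THE POOL (expression)", "IMDb", "tt0283027")
add_identifier("THE POOL (dvd manifestation)", "ASIN", "B00006FD94")
add_identifier("THE POOL (dvd manifestation)", "Artisan", "12997")

print(find_entities("ASIN", "B00006FD94"))  # ['THE POOL (dvd manifestation)']
```

Adding UPC or DOI later would then be a matter of new scheme names, with no change to the structure.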
From: Ed S. <eh...@po...> - 2004-01-21 16:46:01
I thought this was kind of a neat service:

  http://www.oclc.org/research/projects/xisbn/

That has some parallels to libdb.

//Ed
From: Morbus I. <mo...@di...> - 2004-01-21 02:40:35
At 3:15 PM -0800 1/16/04, Andrea Leigh wrote:
>You may want to take a look at the Moving Image Collections (MIC) project,
>which contains a MIC/ViDe Application Profile database in Microsoft Access
>to create records in MPEG-7 and Dublin Core for digital video, audio and
>images, and it is available for download
>http://gondolin.rutgers.edu/MIC/text/how/cataloging_utility.htm. I'm far
>from an expert on MPEG-7, but have been informed that: 1) MPEG-7 uses XML
>as the language of choice for the textual representation of content, 2)
>supports creation of descriptions of dynamic and permanent segments,
>3) supports textual and non-textual data, and can marry both in indexing, 4)
>can reside native on an MPEG-4 stream, and 5) is inherently "FRBR-ized,"
>meaning descriptions can be structured in terms of work (e.g. Luhrman's
>Romeo and Juliet), expression (director's cut of Luhrman's Romeo and
>Juliet), and manifestation (VHS instantiation of director's cut of Luhrman's
>Romeo and Juliet).
>
>Some other brief notes:
>I was intrigued by the concept that Morbus brought up of contextualizing
>works by bringing together the film along with ancillary materials used in
>the making of the film, such as a script (conceptually more an archival
>descriptive model than a bibliographic model of access through related
>works). Howard Besser identifies this as a paradigm shift in user
>expectations (See "Digital Preservation of Moving Image Material?" <
>http://www.gseis.ucla.edu/~howard/Papers/amia-longevity.html>). A project
>attempting to bring that type of contextualization through virtual means is
>the British Film Institute's screenonline (http://www.screenonline.org.uk/).

-- 
Morbus Iff ( oh, i wish i was a hoggle )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
From: Morbus I. <mo...@di...> - 2004-01-21 01:57:42
Hey all, just catching up on old messages. My original email on this subject was:

>Assuming, for the sake of conversational brevity:
>
> * the normal print book is letter-for-letter
>   the same as the large print book.
>
> * an NTSC movie is the frame-for-frame the same
>   as an American version, but shorter.
>
>Should they be split as different expressions or manifestations?

For the archives and other readers, some clarification concerning NTSC and PAL movies:

  FPS - Frames Per Second. All NTSC video (the standard in North America and Japan) unreels at 24 fps, the same speed at which motion picture film is projected. PAL video (the European standard), on the other hand, plays at a slightly faster speed of 25 fps speed. Consequently, the same film will play shorter in PAL than in NTSC, even if it is absolutely uncut.
  -- http://www.videowatchdog.com/home/Glossary.htm

"same film" is the key term here. An 89 minute movie filmed in Germany plays out as a 92 minute movie in the US, with absolutely no modification save for frame speed. Since a movie's frame speed is equivalent to a book's print size (i.e., faster or slower frames in a movie have no effect on its content, and enlarging the print size of a book has no effect on its content), my determination is that:

 * one single movie, released in NTSC and PAL formats, is equivalent to one single book, released in normal and large print.

Thus, they are two different manifestations of the same expression, even though the extent (page count, film duration) is different. Note that this is the *film* itself, not counting ephemera such as video company logos (the distributor of an NTSC film is invariably different from the distributor of the PAL version), FBI warnings, MPAA ratings certificates (or similar), etc.

Thanks to Barbara B. Tillett and Martin Doerr for responding.

-- 
Morbus Iff ( for safety's sake, don't humiliate me )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
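The running-time difference follows from simple arithmetic on the frame rates in the quoted glossary entry: the frame count stays fixed and only the playback speed changes. The Python sketch below takes the 24 fps and 25 fps figures from that quotation as given; the constant and function names are hypothetical.

```python
FILM_FPS = 24  # projection speed, per the glossary entry quoted above
PAL_FPS = 25   # PAL playback speed, per the same entry


def pal_runtime_minutes(film_minutes: float) -> float:
    """Running time of the same, uncut frames when played back at 25 fps."""
    total_frames = film_minutes * 60 * FILM_FPS
    return total_frames / PAL_FPS / 60


print(round(pal_runtime_minutes(92), 1))  # 88.3
```

A 92 minute running time at 24 fps therefore comes out at a little over 88 minutes at 25 fps, which is in the same neighbourhood as the 89 and 92 minute figures used in the message. Nothing about the content changes, which is the basis for treating the two as manifestations of one expression.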
From: Morbus I. <mo...@di...> - 2004-01-21 01:44:28
Re: http://rdfweb.org/topic/NamesInFoaf

I don't profess to have any answers, but I do profess to know how I've dealt with the naming issue in my newest project (which has yet to be officially announced):

  http://disobey.com/noos/LibDB/
  http://disobey.com/noos/LibDB/?ProjectGoals

Currently, I'm doing things very similarly to FOAF:

 * have a master "name" element, containing the "full" username.
 * have more specific name elements, for splitting/sorting.
 * have a "language" (xml:lang) that determines display rulings.
 * variant names ("Jack Berger", "John Berger", "Morbus Iff", "Kevin Hemenway", "C.S. Lewis", "Clive Staples Lewis") are handled via a variant table that defines relationships back to the "authority record". Variant names would also cover the foaf:nick property.

Myself, I know very little about names in other countries, and how they all work, besides the fact that "given+family" is not typical the world over. I continually read reviews and documentation about movies that have incorrectly parsed and displayed a foreign actor's name, blah, blah, blah.

As such, I've attempted to make some efforts to support the parsing of names, but it's a novice approach. This is what I'm currently using, based on the comments and description of the "Person" entity in FRBR. For the sake of archival, I'll inline the relevant parts below.

  http://disobey.com/noos/LibDB/?DatabaseSchema#h-table__libdb_person

  NAME
    The name by which the person is known. A name may include one or more forenames (or given names), matronymics, patronymics, family names (or surnames), sobriquets, dynastic names, etc. A person may be known by more than one name, or by more than one form of the same name. A bibliographic agency normally selects one of those names as the uniform heading for purposes of consistency in naming and referencing the person. The other names or forms of name may be treated as variant names for the person. In some cases (e.g., in the case of a person who writes under more than one pseudonym, or a person who writes both in an official capacity and as an individual) the bibliographic agency may establish more than one uniform heading for the person. Variant names will be treated as relationships.

  GIVEN_NAME
    The name given to a person by their parents, themselves, or otherwise.

  FAMILY_NAME
    The name inherited by a person from themselves, their parents, or otherwise.

  LANGUAGE
    The language code associated with a person's name: en, fr, de, etc.

  TITLE
    A word or phrase indicative of rank, office, nobility, honour, etc. (e.g., Major, Premier, Duke, etc.), or a term of address (e.g., Sir, Mrs., etc.).

  DESIGNATION
    A numeral, word, or abbreviation indicating succession within a family or dynasty (e.g., III, Jr., etc.), or an epithet or other word or phrase associated with the person (e.g., the Brave, Professional Engineer, etc.).

Of the above, only NAME is required. The LibDB project will be aggregating a lot of names, some of which haven't been parsed knowingly (a MARC record is often parsed as "family, given", but IMDB seems to assume "given family", thus causing programmatic parsing problems on "Mary Kate Olson" or "Baron von Zychowski"). The breakup of NAME into given and family will allow a manual cataloguer to separate things accordingly.

Romanization vs. character sets I've not dealt with. From what I can tell, I'll always be receiving romanized names (due to what I've seen on IMDb, as well as existing MARC records). Again, this is not set in stone, nor have I specifically recognized the problem.

Since the data represented in LibDB is from a cataloguing perspective and not a "user's choice" perspective, the comments in 1.7 of NamesInFoaf do not apply:

  Most people have a single prefered (sic) form of their name, that they use to present themselves with. This is the name that most likely occurs on the persons homepage or in citations, and the name that fits well within the current definition of the foaf:name property.

The example shows:

  <foaf:name>Mr. John Allan van Doe, Jr.</foaf:name>

which, as per the previous database definition, could certainly be possible in a display (based on the interface's coding), but would never be stored as such in the database.

Concerning middle names, the LibDB database doesn't specifically support them, under the impression that if the person doesn't use it in their day to day output, it's not knowledge worth recording (neither, for example, would be their shoe size). If people do use their middle name in their day-to-day life, it'd be considered part of "given", and represented there and as part of "name".

The first version of the LibDB database design has not specifically addressed the comments of the Dublin Core, as I just became aware of them: http://dublincore.org/documents/name-representation/. Concerning this though:

 * is there a non-obsolete report concerning it?

 * has anyone tried to match up each example in Appendix A to an actual xml:lang attribute? Is xml:lang even appropriate to help determine styling rules?

-- 
Morbus Iff ( they should rename controlled chaos to morbus droppings )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
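To illustrate why only NAME is required and why the given/family split is left to a cataloguer when the source is ambiguous, here is a minimal Python sketch. It is an assumption-laden illustration and not LibDB code: the `PersonName` class and `from_inverted` function are hypothetical, the field names merely echo the column names above, and rebuilding the display name as "given family" is an English-centric assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    name: str                           # NAME: the only required field
    given_name: Optional[str] = None    # GIVEN_NAME
    family_name: Optional[str] = None   # FAMILY_NAME
    language: Optional[str] = None      # LANGUAGE, e.g. "en"


def from_inverted(heading: str, language: Optional[str] = None) -> PersonName:
    """Parse a "family, given" style heading.

    When there is no comma the split is ambiguous, so only NAME is filled
    in and the parts are left for a human cataloguer.
    """
    family, sep, given = heading.partition(",")
    if not sep:
        return PersonName(name=heading.strip(), language=language)
    family, given = family.strip(), given.strip()
    return PersonName(name=f"{given} {family}", given_name=given,
                      family_name=family, language=language)


print(from_inverted("Lewis, C.S."))
print(from_inverted("Baron von Zychowski"))
```

The first call yields a fully split record; the second keeps "Baron von Zychowski" intact in NAME and guesses nothing.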
From: Morbus I. <mo...@di...> - 2004-01-20 15:00:25
>Re: the schema, this may not be terribly relevant to movies (though it
>could be depending on one's needs), but one thing I don't see (am I
>just missing it?) is a way to represent parts.

Concerning parts within movies, you may want to check out ECHO:

  http://pc-erato2.iei.pi.cnr.it/echo/public/deliv/D3-1-1%20ECHO%20Metadata%20Modelling.pdf

-- 
Morbus Iff ( oh, i wish i was a hoggle )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
From: <ar...@uw...> - 2004-01-16 03:18:26
Skipping the OpenURL debate and how it might lead to generic identifiers, I can see great value in having XPath expressions in URLs. For example:

  http://somewhere.org/author[.='Morbus Iff']
  http://somewhere.org/poems/*/author/@techlevel[.='advanced']

AxKit and Cocoon hint at how protocols like SQL can get wrapped in XPath expressions, and I suspect that URLs will become even more overloaded with identification information. One aspect of XPath that I think gives it an edge is that there are engines out there to work out XPath expressions, including one for JEdit that I have become quite addicted to. Whatever happens, URLs always seem to grow so having some extra space for this is probably a good idea.

art
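For readers who want to try expressions of this kind, Python's standard `xml.etree.ElementTree` module evaluates a limited subset of XPath. The sketch below is only an illustration: the sample document is invented, its element and attribute names simply echo the example URLs, and because ElementTree cannot select attribute nodes, the `@techlevel[.='advanced']` step is rewritten as an attribute predicate on `author`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names echo the URLs above.
doc = ET.fromstring(
    "<poems>"
    "<poem><author techlevel='advanced'>Morbus Iff</author></poem>"
    "<poem><author techlevel='novice'>Someone Else</author></poem>"
    "</poems>"
)

# Counterpart of author[.='Morbus Iff'] (text predicate, Python 3.7+).
by_name = doc.findall(".//author[.='Morbus Iff']")

# Counterpart of poems/*/author/@techlevel[.='advanced'], expressed as a
# predicate on the element instead of a selection of the attribute itself.
advanced = doc.findall("./*/author[@techlevel='advanced']")

print([a.text for a in by_name])   # ['Morbus Iff']
print([a.text for a in advanced])  # ['Morbus Iff']
```

A full XPath engine would accept the original expressions unchanged; the rewrite is needed only because of the subset supported here.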
From: <joh...@no...> - 2004-01-16 00:21:02
The problem is that you are trying to reify (nice word, Dan!) a work when, in fact, there is no such thing. A "work" has no independent existence. You cannot see a "work", you cannot touch a "work", you cannot experience a "work". At best a "work" can be seen as a convenient handle for a set of expressions that purport to be the "same thing".

That said, "Swimfan" is of course a "work", just as "A clockwork orange" is a "work". How well-known the work is or how many expressions/manifestations of it there are have NOTHING to do with the determination that it is a work - it is a work because it is distinct from other works.

A work does not - cannot - exist independently of an expression of that work (and an expression cannot exist independently of a manifestation of that expression). There was no movie "A clockwork orange" before Kubrick and his associates made it. There was no novel "A clockwork orange" before Anthony Burgess wrote it. And the act of writing created an item, a manifestation, an expression and a work. All rolled up in one. Each draft would be a different expression (and manifestation and item).

So a manifestation is an abstraction of features common to a set of items; an expression is an abstraction of features common to a set of manifestations; a work is an abstraction of features common to a set of expressions. Each of these sets can, of course, be a set of one. In every case, however, the properties of the manifestation flow from the properties of the item(s), the properties of the expression flow from the properties of the manifestation(s) and the properties of the work flow from the properties of the expression(s). In the simplest case the item, manifestation, expression and work are coterminous.

My own view of movies in the FRBR world is that the work and expression are largely co-terminous, and the various ways of publishing (film reels, VHS cassette, DVD, NTSC vs PAL, director's cuts, etc.) are different at the manifestation level. Dubbing a film into a foreign language almost certainly makes a new expression; I'm not so sure about adding subtitles.

This also implies that the novel "A clockwork orange" and the movie "A clockwork orange" are different (but related) works. It implies that a "remake" of a movie (e.g. Sabrina or King Kong or Planet of the Apes) is a different but related work. A director's commentary would (might?) be a different, but related work; it is not a different expression of the work itself.

The intellectual property folks, however, take a rather different position as can be seen from the fact that movie producers pay publishers money for the right to base a movie on a book. It might be reasonable for the library community to follow this approach, but so far this has not been the view taken in the FRBR discussions.

Note that it is not only expressions that are "difficult". The boundary conditions for all these sets are difficult. We do, however, have more experience with some (e.g. manifestations) than others. But - suppose a paperback novel is issued with the words "Now a major motion picture" on the cover and later the words are changed to "Winner of Oscar for Best Picture 2003", but everything remains the same - is that a new manifestation? Or just a minor difference between subsets of items forming the manifestation? In the handpress era, typographical corrections could be and were made in the middle of printing a run of sheets. Does such a change establish a new manifestation? a new expression? The boundary conditions are subtle and require judgement calls that we can't make algorithmically with any great accuracy.

OCLC's problem was that the intellectual decisions on what constitutes a manifestation were already reflected in the data (each bibliographic record represents a manifestation), and the decisions on what constitutes a work were also largely reflected in the data (in the form of titles proper and uniform titles). The intellectual decisions on what constitutes an expression were not to be found in the data, with the exception of certain subfields in uniform titles, so machine processing to identify expressions was difficult and unsatisfactory. The problem isn't so much that the concept of expression is difficult, but that we don't have the data to allow the machine to make reliable determinations.

Johan Zeeman
RLG

Morbus Iff <morbus@disobey.com>
To: fr...@in...
cc: lib...@li...
Subject: Still Fighting with Movie Expressions
01/15/2004 01:58 PM

I'm still having conceptual problems breaking apart a movie into WEMI. It's relatively "easy" when the movie is "known", like with a Kubrick or Spielberg or similar. But, stuff like SWIMFAN, MOTEL HELL, STACY, etc., I simply can't figure it out.

I originally suspected it was purely out of ignorance and the fact that no one has published their own approaches to movies under FRBR, but the more I read, the more I see the same sort of thing over and over again: "expressions are difficult". Most recently (in my reading, not publication-wise) is the Humphry Clinker examination by the OCLC:

  "While it was possible to identify works and manifestations, identifying expressions was problematic ... Enhanced manifestation records where the roles of editors, illustrators, translators, and other contributors are explicitly identified may be a viable alternative to expressions ... With the enhanced manifestation record ... the FRBR model provides a powerful means to improve bibliographic organization and navigation."

How evil and destructive is it, for the time being, to not support expressions, at all, within an FRBRized application?

If I have a movie called SWIMFAN, is that a work? It is a "distinct intellectual or artistic creation", but it is also the REALIZATION: it's audio/visual committed to film. The only way I can think to get "one higher" than "realized through film" is the actual shooting script used. The audiovisual elements of a shooting script are REALIZED through the work of many people: the director, the cinematographer, etc., etc.

But, the shooting script won't work... because FRBR says:

  "By contrast, when the modification of a work involves a significant degree of independent intellectual or artistic effort, the result is viewed, for the purpose of this study, as a new work. Thus paraphrases, rewritings, adaptations for children, parodies, musical variations on a theme and free transcriptions of a musical composition are considered to represent new works."

Taking a script and turning it into film seems like a significant degree of work to me, so a film would have to be a work. Treating a film as a work is correct, because FRBR says/infers as much:

  "Translations from one language to another, musical transcriptions and arrangements, and dubbed or subtitled versions of a film are also considered simply as different expressions of the same original work."

An expression "exclude[s] aspects of physical form", so I can't treat the expression of a movie as a DVD release. If I had two different translations of the movie, I don't think there's a problem:

  W1: Swimfan
    E1: Swimfan (English language)
      M1: The DVD from Paramount Pictures.
    E2: Swimfan (German language dub)

The above seems sane to me, and seems like the answer to my problems. In fact, FRBR says the above is sane in its description of the work entity, which I've already snippetted above. But, I feel the model is "dirty" when I don't have multiple expressions. If SWIMFAN ONLY had an English translation, then it "feels" like there's absolutely no difference between work and expression:

  W1: Swimfan
    E1: Swimfan (English language)
      M1: The DVD from Paramount Pictures

In the above model, there's really no difference, whatsoever, between the Swimfan work and the Swimfan expression. Is there? Or am I being too granular? Should I treat WEMI as buckets, with an intended revisability and extensibility of "always"? Should I always assume (nay, hope!) that someone WILL translate SWIMFAN into another language? Should I, in the face of seeming duplicity, always consider "language" the shining difference between a work (where language is not defined) and expression (where it is)?

I feel like I'm running around in circles on this expression thing - toeing the line between "yes, that's how you do it!" and "noOOOo, you've got it alLLLl wrong, bucko!". Any tips are appreciated.

-- 
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
From: Bruce D'A. <bd...@fa...> - 2004-01-15 23:55:59
|
On Jan 15, 2004, at 4:58 PM, Morbus Iff wrote: > How evil and destructive is it, for the time being, to not > support expressions, at all, within an FRBRized application? I may be wrong, but based on my brief look into this, I think of expressions as relating to events; particularly performances. So in my previous example, a speech is a work, its performance is an expression, and anything derived from that is a manifestation. The same would apply for a play or a musical performance of a Beethoven Symphony, both of which could conceivably end up in a movie database. How's that? It's fair, I would assume, to have a record that consists of only a work and its manifestation. Am I wrong? Bruce |
From: Morbus I. <mo...@di...> - 2004-01-15 21:58:54
|
I'm still having conceptual problems breaking apart a movie into WEMI. It's relatively "easy" when the movie is "known", like with a Kubrick or Spielberg or similar. But, stuff like SWIMFAN, MOTEL HELL, STACY, etc., I simply can't figure it out. I originally suspected it was purely out of ignorance and the fact that no one has published their own approaches to movies under FRBR, but the more I read, the more I see the same sort of thing over and over again: "expressions are difficult". Most recently (in my reading, not publication-wise) is the Humphry Clinker examination by the OCLC:

"While it was possible to identify works and manifestations, identifying expressions was problematic ... Enhanced manifestation records where the roles of editors, illustrators, translators, and other contributors are explicitly identified may be a viable alternative to expressions ... With the enhanced manifestation record ... the FRBR model provides a powerful means to improve bibliographic organization and navigation."

How evil and destructive is it, for the time being, to not support expressions, at all, within an FRBRized application?

If I have a movie called SWIMFAN, is that a work? It is a "distinct intellectual or artistic creation", but it is also the REALIZATION: it's audio/visual committed to film. The only way I can think to get "one higher" than "realized through film" is the actual shooting script used. The audiovisual elements of a shooting script are REALIZED through the work of many people: the director, the cinematographer, etc., etc. But, the shooting script won't work... because FRBR says:

"By contrast, when the modification of a work involves a significant degree of independent intellectual or artistic effort, the result is viewed, for the purpose of this study, as a new work. Thus paraphrases, rewritings, adaptations for children, parodies, musical variations on a theme and free transcriptions of a musical composition are considered to represent new works."

Taking a script and turning it into film seems like a significant degree of work to me, so a film would have to be a work. Treating a film as a work is correct, because FRBR says/implies as much:

"Translations from one language to another, musical transcriptions and arrangements, and dubbed or subtitled versions of a film are also considered simply as different expressions of the same original work."

An expression "exclude[s] aspects of physical form", so I can't treat the expression of a movie as a DVD release. If I had two different translations of the movie, I don't think there's a problem:

W1: Swimfan
  E1: Swimfan (English language)
    M1: The DVD from Paramount Pictures.
  E2: Swimfan (German language dub)

The above seems sane to me, and seems like the answer to my problems. In fact, FRBR says the above is sane in its description of the work entity, which I've already quoted above. But, I feel the model is "dirty" when I don't have multiple expressions. If SWIMFAN ONLY had an English translation, then it "feels" like there's absolutely no difference between work and expression:

W1: Swimfan
  E1: Swimfan (English language)
    M1: The DVD from Paramount Pictures

In the above model, there's really no difference, whatsoever, between the Swimfan work and the Swimfan expression. Is there? Or am I being too granular? Should I treat WEMI as buckets, with an intended revisability and extensibility of "always"? Should I always assume (nay, hope!) that someone WILL translate SWIMFAN into another language? Should I, in the face of seeming duplicity, always consider "language" the shining difference between a work (where language is not defined) and expression (where it is)?

I feel like I'm running around in circles on this expression thing - toeing the line between "yes, that's how you do it!" and "noOOOo, you've got it alLLLl wrong, bucko!". Any tips are appreciated.
-- Morbus Iff ( i put the demon back in codemonkey ) Culture: http://www.disobey.com/ and http://www.gamegrene.com/ Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
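A minimal sketch of the W1/E1/M1 layout from the message above, in Python. The class and field names are hypothetical and do not come from LibDB's schema; the only point illustrated is that language sits on the expression, so a work with a single expression can later gain a dub without the work record changing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    label: str  # e.g. one specific DVD release


@dataclass
class Expression:
    label: str
    language: str  # language is recorded here, not on the work
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def languages(self) -> List[str]:
        """Languages in which this work has been realized, in insertion order."""
        return [e.language for e in self.expressions]


# W1 / E1 / M1 as listed in the message.
swimfan = Work("Swimfan")
english = Expression("Swimfan (English language)", "English")
english.manifestations.append(Manifestation("The DVD from Paramount Pictures"))
swimfan.expressions.append(english)

# A later dub (E2) is only a new expression; the work itself is untouched.
swimfan.expressions.append(Expression("Swimfan (German language dub)", "German"))
```

Under this reading, the "duplicate-looking" single expression is just the first entry in a list that is allowed to grow.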
From: Morbus I. <mo...@di...> - 2004-01-15 20:39:01
|
>common might be where there are clearly defined parts. For example,
>DVDs have titled and numbered scenes. Or say you have a compilation of

Well, the chapter titles would be placed under a "summarization":

>A summarization of the content of an expression is an abstract,
>summary, synopsis, etc., or a list of chapter headings, songs, parts,
>etc. included in the expression.

To make things more annoying, however, summarizations are only defined on an EXPRESSION of a movie. Rarely does a movie have chapters in the filmic visual/audio expression (the only one I can think of recently is KILL BILL VOLUME 1); they only show up once it gets to the manifestation stage.

In LibDB, I've worked around this by making "summarization" an "annotation". An annotation can be defined against any entity (including people, corporations, and bodies).

-- Morbus Iff ( i put the demon back in codemonkey ) Culture: http://www.disobey.com/ and http://www.gamegrene.com/ Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
From: Bruce D'A. <bd...@fa...> - 2004-01-15 20:26:05
|
On Jan 15, 2004, at 3:09 PM, Morbus Iff wrote:

> The main problem, as I see it, though, is that an article in a serial
> CAN be considered a work all by itself - it doesn't take a huge leap
> of faith to make that association. But, I rarely find myself thinking of
> scenes in a movie as individual, stand-alone entities: it's difficult
> to just pick 35 seconds of a movie and say "this stands alone".
>
> "this stands alone" seems to be a required assumption of FRBR, though
> they do mention "aggregate works", and that may be key to what we're
> trying to solve. If a movie is considered an aggregate work of images
> (it is, technically, frame by frame), then we've suddenly got logic on
> our side to take a 35 second piece of scenery and call it a work (or a
> "part").

I'm too tired and busy to think hard about the details of your example just now (I didn't think a book or journal could be an expression, though am not sure), but in general:

Yes, librarians aren't used to thinking much about parts. MODS didn't have that structure until I pointed out the problem and managed to convince a few people why they should care. For my needs they're essential: chapters, articles, and legal cases are all parts.

But with respect to movies, obviously there could be some key scenes that might be catalogued as a part. That'd be rare, I imagine. More common might be where there are clearly defined parts. For example, DVDs have titled and numbered scenes. Or say you have a compilation of television episodes on a single tape or DVD. In my example, I was imagining a video that had collected speeches.

Bruce |
From: Morbus I. <mo...@di...> - 2004-01-15 20:09:31
|
>OK, I'm trying to wrap my head around FRBR and the parts stuff.
>Schematically, is this right, using an example of a speech?

I'm still new with FRBR too, so take this with a grain of salt.

----------------------------------------------------
work1 = a speech
  title and creator
expression1 = performance
  place and date
manifestation1 = text
Relationship: manifestation1 [isPartOf] work2 (parts details = volume, issue, pages)
----------------------------------------------------
work2 = academic journal
expression2 = academic journal volume, issue
manifestation2 = text (pages, etc.)
Relationship: work2 containsPart work1
----------------------------------------------------

That's the same thing you said (I think), just more verbose. To me, that looks right.

The main problem, as I see it, though, is that an article in a serial CAN be considered a work all by itself - it doesn't take a huge leap of faith to make that association. But, I rarely find myself thinking of scenes in a movie as individual, stand-alone entities: it's difficult to just pick 35 seconds of a movie and say "this stands alone".

"this stands alone" seems to be a required assumption of FRBR, though they do mention "aggregate works", and that may be key to what we're trying to solve. If a movie is considered an aggregate work of images (it is, technically, frame by frame), then we've suddenly got logic on our side to take a 35 second piece of scenery and call it a work (or a "part").

----------------------------------------------------
work1 = swimming scene with Kari Wuhrer
expression1 = 15 seconds of filmed footage.
manifestation1 = film
Relationship: manifestation1 [isPartOf] work2 (parts details = duration)
----------------------------------------------------
work2 = Final Examination (movie)
expression2 = Final Examination (YYYY; movie)
manifestation2 = Final Examination (DVD)
Relationship: work2 containsPart work1
----------------------------------------------------
work3 = Poison (movie)
expression3 = Poison (YYYY; movie)
manifestation3 = Poison (DVD)
Relationship: work3 containsPart work1
Relationship: work3 sharesPart work2
----------------------------------------------------

Is that what you're getting at? I had no responses to my similar query about parts in FRBR, but I think your mapping is a lot clearer than whatever gobbledygook I spouted off to them. If the above is where you were heading, let me know and I'll pop it over to the FRBR list and face the silence again.

-- Morbus Iff ( i put the demon back in codemonkey ) Culture: http://www.disobey.com/ and http://www.gamegrene.com/ Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
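One possible reading of the schematic above, sketched in Python: `sharesPart` may not need to be stored at all, since it can be derived from the `containsPart` rows. The data layout and the function name `shared_parts` are assumptions made for illustration, not anything defined by FRBR or LibDB.

```python
from collections import defaultdict

# (container work, part work) pairs, one per "containsPart" relationship.
contains_part = [
    ("Final Examination", "swimming scene"),
    ("Poison", "swimming scene"),
]


def shared_parts(pairs):
    """Return {part: sorted containers} for parts held by two or more works."""
    containers = defaultdict(set)
    for container, part in pairs:
        containers[part].add(container)
    return {part: sorted(works) for part, works in containers.items() if len(works) > 1}


shared = shared_parts(contains_part)
```

Here `shared` maps the swimming scene to both movies, which is the same information the explicit `work3 sharesPart work2` line records.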
From: Bruce D'A. <bd...@fa...> - 2004-01-15 19:13:11
|
OK, I'm trying to wrap my head around FRBR and the parts stuff. Schematically, is this right, using an example of a speech?

work = a speech
  title and creator
expression = performance
  place and date
manifestations =
  text [isPartOf] academic journal (parts details = volume, issue, pages)
  moving image [isPartOf] video compilation (parts details = ______ )

Bruce |
From: Morbus I. <mo...@di...> - 2004-01-15 18:44:51
|
In a lot of the tables within LibDB, I use a 20-character alphanumeric ID to uniquely identify an item. One of the prime reasons for this was to create unique URLs, something like:

/person/129387123kj1h23/
/expression/129387123987/
/concept/1237192387193/

and so forth. Going to that URL would give you information about the data being described, as you'd expect. I've been using these 20chars on everything that I felt people would want to look at: a list of identifiers ("create a list based on Artisan's cataloging ID"), people, etc., etc.

I've NOT been using 20chars when it came to what I felt were database-only associations, namely the relationships. The 23rd relationship would have id#23 and so forth.

So, here's the question: should I use 20chars on everything? What if, programmatically, a script needed to have a relationship (and only one relationship) described in XML or SQL or whatever else?

/relationship/1237192387193/sql
/relationship/1237192387193/mods
/relationship/1237192387193/n3

Would that ever be useful to people? Would that ever be useful to programming or a web service? Should I just bite the bullet now and use char20's for every table in the database? Thoughts appreciated.

-- Morbus Iff ( i put the demon back in codemonkey ) Culture: http://www.disobey.com/ and http://www.gamegrene.com/ Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
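A rough sketch of the ID-plus-URL idea in the message above, in Python. How LibDB actually generates its IDs is not stated, so the alphabet, the use of `secrets`, and the helper names `new_id` and `entity_url` are all assumptions; only the 20-character length and the URL shapes come from the message.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits  # assumed character set
ID_LENGTH = 20


def new_id() -> str:
    """Generate a random 20-character alphanumeric identifier."""
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


def entity_url(kind: str, ident: str, fmt: str = "") -> str:
    """Build /kind/id/ or /kind/id/format, mirroring the URLs in the message."""
    base = f"/{kind}/{ident}/"
    return base + fmt if fmt else base


relationship_sql = entity_url("relationship", "1237192387193", "sql")
```

If relationships share the same ID scheme as every other table, one URL builder covers them too, which is the convenience the question is weighing against the extra storage.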
From: Morbus I. <mo...@di...> - 2004-01-15 01:45:56
|
Hey there. I've recently committed the first "proper" draft of Group 1 entities, and some sample data to go along with it, to CVS. To recap, FRBR Group 1 covers works, expressions, manifestations, and items. I need comments. I'll be starting Group 2 (person and corporate body, probably the most time consuming of this process) sometime tomorrow.

If you don't have CVS set up on your machine, you can view the two files through the following URL:

http://cvs.sourceforge.net/viewcvs.py/libdb/LibDB/databases/

You should be looking at Rev 1.5 (which should be there shortly, compensating for the lag of the web cvs servers). mysql_sample.sql is a "real world" mapping of piss-poor movie THE POOL, and mysql_schema.sql is the database schema that describes it.

For now, ignore the ?DatabaseSchema documentation that is on the wiki. It is a day out of date, which is a lot more than you think. I'll be updating it once I get these two files complete with Group 2 and Group 3 entities.

Some specific areas to look at are the relationships defined. In the current schema, there are (way at the bottom) six relationships defined; three come from FRBR and are well-defined:

$work is realized through $expression
$expression is a realization of $work

$expression is embodied in $manifestation
$manifestation is an embodiment of $expression

$manifestation is exemplified by $item
$item is an exemplar of $manifestation

You can tell this is a hierarchy. However, three relationships were defined explicitly by me to cover some of the attributes of the various entities above. That's what I need sanity checking on (both in the terms used to link the relationship, and in the actual SQL implementation and layout):

$entity is also named $name
$name is a variant name of $entity

$entity is summarized with $annotation
$annotation is a summarization of $entity

$annotation is summarized by $entity
$entity is a summarization from $entity

The last two relationships can be used validly to say:

"THE POOL is summarized with [this annotation]";
"[This annotation] is summarized by [this author]";

but, in the sample data, the summarization actually comes from the back of the DVD. So, in essence, another example is:

"THE POOL is summarized with [this annotation]";
"[This annotation] is summarized by [the DVD manifestation]";

Thoughts on that? Don't hesitate to ask questions. I'm also posting this to the FRBR list, but that list is incredibly quiet, so I'm not sure I'll get concrete responses back.

-- Morbus Iff ( in japan, i'm known as a puchi-iede. ) Technical: http://www.oreillynet.com/pub/au/779 Culture: http://www.disobey.com/ and http://www.gamegrene.com/ icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
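The six relationship pairs above can be treated as a forward/inverse lookup. The Python sketch below is not the SQL layout in mysql_schema.sql (which is not shown here); the dictionary structure and the `invert` helper are assumptions, and only the phrase pairs are taken from the message.

```python
# Forward phrase -> inverse phrase, as listed in the message.
RELATIONSHIPS = {
    "is realized through": "is a realization of",
    "is embodied in": "is an embodiment of",
    "is exemplified by": "is an exemplar of",
    "is also named": "is a variant name of",
    "is summarized with": "is a summarization of",
    "is summarized by": "is a summarization from",
}

# Lookup that works in both directions.
INVERSE = {**RELATIONSHIPS, **{inv: fwd for fwd, inv in RELATIONSHIPS.items()}}


def invert(subject: str, predicate: str, obj: str):
    """Return the same statement read from the other end."""
    return (obj, INVERSE[predicate], subject)


statement = ("THE POOL", "is summarized with", "[this annotation]")
flipped = invert(*statement)
```

Storing one direction and deriving the other is one way to keep the two phrasings from drifting apart; whether the schema stores both is a separate design choice.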
From: Morbus I. <mo...@di...> - 2004-01-14 18:28:42
|
>I'm with Ed. I prefer not to constrain date >representation to simply year anywhere. YYYY-MM-DD it is! -- Morbus Iff ( i put the demon back in codemonkey ) Culture: http://www.disobey.com/ and http://www.gamegrene.com/ Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
From: Bruce D'A. <bd...@fa...> - 2004-01-14 18:06:43
|
On Jan 14, 2004, at 12:23 PM, Morbus Iff wrote: > >I guess I don't see it as particuarly evil to have 1999-00-00 in the > db, if > >the display code knows that it needs to reformat the date. Especially > >since 1999-07-04 may have to be reformatted as well :) > > Fair enough. Anyone else want to weigh in? Bruce? I'm with Ed. I prefer not to constrain date representation to simply year anywhere. Bruce |
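The thread settles on storing full YYYY-MM-DD values and letting display code reformat them, with zeros standing in for unknown parts (as in 1999-00-00). A small Python sketch of that display step follows; the function name and the output style are assumptions. Python's `datetime.date` rejects a month or day of 0, so the string is parsed by hand.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def display_date(stored: str) -> str:
    """Format a stored YYYY-MM-DD value, dropping parts recorded as zero."""
    year, month, day = (int(part) for part in stored.split("-"))
    if month == 0:
        return f"{year:04d}"  # only the year is known
    if day == 0:
        return f"{MONTHS[month - 1]} {year:04d}"  # year and month known
    return f"{MONTHS[month - 1]} {day}, {year:04d}"
```

For example, `display_date("1999-00-00")` gives `"1999"` and `display_date("1999-07-04")` gives `"July 4, 1999"`.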