libdb-develop Mailing List for LibDB (Page 8)

Status: Inactive

Brought to you by: morbus

libdb-develop — Developer discussions and bickerings.

[libdb-develop] couple things (character, roles, and the FRBR)

From: Bruce D'A. <bd...@fa...> - 2004-01-28 15:46:06

Morbus, while I got your announcement, it seems I'm having stuff bounce 
back from the sourceforge lists (cc-ing here).  A message I sent to the 
libdb list was trying to understand how to represent the FRBR for my 
needs.  I've gotten farther with it, now pasted again below.  Do I have 
it right?

Also, a question: why a separate "character" table?  Are not characters 
(virtual) people with perhaps a different role?

> They'll be a number of pre-built ... roles (Director, Writer, Special 
> Effects, etc.), and annotations (Chapters, Review, Summary, etc.) as 
> well.

On the first, have you looked at the marc relator list?  It's quite 
extensive, and covers most of what I need (though is missing a few 
things too).

I'm big on annotation (my own) myself, but don't you need a way to 
track *who* is doing the annotating?

Bruce

<work ID="one">
   <isCreatedBy role="speaker">
     <person ID="doej">
       <name>
	<given>John</given>
	<other abbrev="yes">Q</other>
	<family>Doe</family>
       </name>
     </person>
   </isCreatedBy>
   <hasTitle>
     <titleMain>Title</titleMain>
     <titleSub>Subtitle</titleSub>
   </hasTitle>
   <isRealizedThrough ID="one-A">
     <event>
       <hasTitle>
	<titleMain>A Conference</titleMain>
       </hasTitle>
       <date>2002-10-10</date>
       <place>New York</place>
     </event>
     <isEmbodiedIn status="published">
       <text>
	<isPartOf>
	  <monograph>
	    <hasTitle>
	      <titleMain>A Book</titleMain>
	    </hasTitle>
	    <hasOrigin>
	      <publisher>
		<organization>
		  <name>
		    <full>ABC Publishers</full>
		  </name>
		  <place>New York</place>
		</organization>
	      </publisher>
	      <dateIssued>2003</dateIssued>
	    </hasOrigin>
	    <hasNumbers>
	      <range unit="page">
		<start>21</start>
		<end>34</end>
	      </range>
	    </hasNumbers>
	  </monograph>
	</isPartOf>
       </text>
       <isExemplifiedIn>
	<location>archive</location>
       </isExemplifiedIn>
     </isEmbodiedIn>
   </isRealizedThrough>
</work>

[libdb-develop] title and names (mods)

From: Bruce D'A. <bd...@fa...> - 2004-01-26 13:33:32

There's been an interesting discussion going on at the mods list about 
titles, and how to handle sort order coding.  I think this suggestion 
is a good solution myself.

<titleInfo>
     <title>A shield in space?</title>
     <titleSub>technology, politics, and the strategic defense 
initiative</titleSub>
     <titleSort>shield in space?</titleSort>
     <titleAbbrev>A shield in space?</titleAbbrev>
</titleInfo>

This is in contrast to the current solution:

<titleInfo>
     <nonSort>A</nonSort>
     <title>shield in space?</title>
     <subTitle>technology, politics, and the strategic defense 
initiative</subTitle>
</titleInfo>

An alternate suggestion was to rename title to titleMain, which is also 
good, but breaks existing practice in MODS, something not too important 
for libdb.

The question of names is an even bigger issue, and my earlier post 
reflects my thinking on this (need for articular, for abbreviation 
coding, and an other name element, etc. influenced by Morten's doc).

Bruce

[libdb-develop] Re: couple things (character, roles, and the FRBR)

From: Morbus I. <mo...@di...> - 2004-01-26 12:42:10

>Morbus, while I got your announcement, it seems I'm having stuff bounce
>back from the sourceforge lists (cc-ing here).  A message I sent to the

Hmm. Do you have the bounce message still?

>Also, a question: why a separate "character" table?  Are not
>characters (virtual) people with perhaps a different role?

They are, but things get muddled a bit more when you consider that
characters can be based off real people. If I have "Morbus Iff" as
a real life person, and then you play me in a movie, there'd also
be a "Morbus Iff" character. I fear the confusion that can arise:
which of these (now) two "person" entries are the character, and
which is the real person?

This can be solved by adding a new column like "isCharacter", but
then we run into issues (?) of database cleanliness: very rarely
do characters in a movie (or even a book) have dates for when
they were born and died - all those fields would go empty,
resulting in a table that is only half being used.

I also worried about it from a table length point of view: a
million records in one table will certainly slow down queries,
and merging characters into persons could easily create 50+
rows for *one* movie in *one* table. I dunno. Seemed like
the inevitable "redesign the schema" event would come a lot
sooner with something like that.

As for the whole "role" thing and characters,
I'm basing it on inference, really:

 * if movie has character entry...
 * and character has person entry...
 * then person "has role Cast" of movie.

A similar relationship can be made from book author to character, though
"Cast" wouldn't be the right term. I've not thought of what term it would
be, solely because a) I've found no source for raw metadata about
characters in a book, b) the current goals focus on movies, c) I don't
think anyone, at this point in the game, is going to be too concerned with
modeling characters in a book.
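
The inference described above can be sketched in a few lines. This is only an illustration, not LibDB's actual code: the CharacterEntry class, the infer_cast function, and the sample names are all hypothetical, and the real schema stores these links in database tables rather than Python objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CharacterEntry:
    """Hypothetical row linking a movie, a character, and (maybe) a person."""
    movie: str
    character: str
    person: Optional[str] = None  # None when no performer is recorded


def infer_cast(entries):
    """Return sorted (person, "Cast", movie) triples implied by the entries.

    Rule: movie has character, character has person => person has role Cast.
    """
    return sorted(
        {(e.person, "Cast", e.movie) for e in entries if e.person is not None}
    )


# Made-up sample data, for illustration only.
entries = [
    CharacterEntry("Example Movie", "Character A", "Actor One"),
    CharacterEntry("Example Movie", "Character B", "Actor Two"),
    CharacterEntry("Example Movie", "Unplayed Character"),
]

for triple in infer_cast(entries):
    print(triple)
```

Characters with no person attached produce no inferred role, which matches the "if ... and ... then" wording of the rule.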

>On the first, have you looked at the marc relator list?  It's quite
>extensive, and covers most of what I need (though is missing a few

I don't recall that one - link?

>I'm big on annotation (my own) myself, but don't you
>need a way to track *who* is doing the annotating?

Yup, this is possible in the current database. Annotations have
referers (the entity doing the annotation) and referents (the
entity receiving the annotation). So, in the current sample
data, we've got these relationships:

 * the DVD manifestation has summary of the movie expression.
 * the DVD manifestation has tagline of the movie expression.
 * the DVD manifestation has chapters of the movie expression.
 * the person "Morbus Iff" has reviewed the movie expression.

Likewise, you could add:

 * the person "Bruce D'Arcus" has noted the DVD manifestation.
 * the person "Bruce D'Arcus" left testimonial of the person "Morbus Iff".

And so on and so forth. The referer/referent ideals are used throughout the
entire database schema, to support the idea that every row of every table
should be able to relate, somehow, to every other row of every other table.
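
As a rough sketch of the referer/referent idea, the relationships listed above can be held as plain (referer, kind, referent) triples and queried from either side. The Annotation type and the two helper functions are hypothetical names chosen for this example; the real schema presumably uses table rows and keys rather than strings.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    referer: str   # the entity doing the annotating
    kind: str      # e.g. "summary", "review"
    referent: str  # the entity receiving the annotation


# The relationships from the message, written as triples.
annotations = [
    Annotation("DVD manifestation", "summary", "movie expression"),
    Annotation("DVD manifestation", "tagline", "movie expression"),
    Annotation("DVD manifestation", "chapters", "movie expression"),
    Annotation('person "Morbus Iff"', "review", "movie expression"),
    Annotation('person "Bruce D\'Arcus"', "note", "DVD manifestation"),
]


def annotations_by(referer, rows):
    """Everything a given entity has annotated (answers *who* annotated)."""
    return [a for a in rows if a.referer == referer]


def annotations_of(referent, rows):
    """Every annotation a given entity has received."""
    return [a for a in rows if a.referent == referent]


print(annotations_by('person "Morbus Iff"', annotations))
print(len(annotations_of("movie expression", annotations)))
```

Because both ends of a triple are just entity references, the same structure covers person-to-person links such as a testimonial.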

><work ID="one">

This seems good to me, yes.

-- 
Morbus Iff ( small pieces of morbus loosely joined )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

[libdb-develop] FRBR in XML

From: Bruce D'A. <bd...@fa...> - 2004-01-25 14:13:26

Was just playing around trying to understand how to represent FRBR in 
XML.  Is this more-or-less right?  It's meant to be a speech, performed 
at a conference, and then published as, say, a book chapter that is in 
electronic form.  Among other things, should these be nested levels, as 
I have them here?

<record ID="one">
   <work>
     <creator role="speaker">
       <person ID="doej">
	<name>
	  <given>John</given>
	  <other abbrev="yes">Q</other>
	  <family>Doe</family>
	</name>
       </person>
     </creator>
     <title>
       <titleMain>Title</titleMain>
       <titleSub>Subtitle</titleSub>
     </title>
   </work>
   <expression ID="one-A">
     <isPartOf>
       <event>
	<title>
	  <titleMain>A Conference</titleMain>
	</title>
	<date>2002-10-10</date>
	<place>New York</place>
       </event>
     </isPartOf>
     <manifestation ID="one-B" status="published">
       <isPartOf>
	<origin>
	  <publisher>
	    <organization>
	      <name>
		<full>ABC Publishers</full>
	      </name>
	      <place>New York</place>
	    </organization>
	  </publisher>
	  <dateIssued>2003</dateIssued>
	</origin>
	<partDesc>
	   <range unit="page">
	      <start>21</start>
	      <end>33</end>
	   </range>
	</partDesc>
       </isPartOf>
       <item>
	<location>http://www.example.com/one.pdf</location>
       </item>
     </manifestation>
   </expression>
</record>

[libdb-develop] Database First Draft Complete

From: Morbus I. <mo...@di...> - 2004-01-22 02:14:01

Hey all. The first iteration of the database has been finalized and
modeled with data from a rather unremarkable movie I own (THE POOL).
The latest schema and sample data is heading up to CVS now.

Here's your TODO list, if you're interested:

 * sanity check my SQL. ask questions. poke, prod.

 * sanity check my data modeling. ask questions. poke. prod.

 * help me define the following roles. "roles", in the database,
   are "occupations" or "jobs" that a person or corporate body can
   have. naturally, for a movie, there are an awful lot of roles
   (read: an awful lot of different types of crew members). I've
   defined (either myself, or from IMDb) about 75% of the roles
   found in THE POOL, but I need some help on the following,
   either from discovery on Google, or what have you:

     2D Artist, Insurance, Lighting, Music Coordinator,
     Orchestra Contractor, Payroll Accountant,
     Post-Production Accountant, Score Engineer,
     Score Mixer, Score Recording, Score Performance,
     Set Directory, Sound Post-Production,
     Sound Re-Recording Mixer, Special Effects Coordinator,
     Special Effects Technician, Stereo Sound Consultant,
     Titles (company), Video Playback Engineer, Visual
     Effects Producer

Over the next day or so, I'll be fiddling with the wiki to get that
in a more presentable form, as well as creating a press release sort
of thing for the dozen or so blogs I know waiting to announce it.
Then, I'll be coding. Yup. Either way, the Database Schema on the
wiki is updated to the latest/final:

  http://disobey.com/noos/LibDB/?DatabaseSchema

-- 
Morbus Iff ( i still fail to see what this has to do with morocco )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

Re: [libdb-develop] xISBN

From: Morbus I. <mo...@di...> - 2004-01-22 00:02:32

>Nice list!

Thanks!

>I'd like to throw in a suggestion for the DOI here:

No problem-o. The following three identifiers have been added:

  "UPC Type A"
    "The basic UPC code, referred to as Type A, is composed of twelve
    digits. These twelve digits are broken up into four groups: Number
    System Character, Manufacturer's Code, Product Code, and Check Digit."

  "UPC Type E"
    "All Type-E UPC codes are eight digits long, the first being
    the Number System Character and the last being the check digit.
    The number system character is ALWAYS ZERO. The check digit is
    calculated from the digits in the Type-A UPC."

  "DOI"
    "DOIs are names \(characters and/or digits\) assigned to objects of
    intellectual property \(physical, digital or abstract\) such as
    electronic journal articles, images, learning objects, ebooks,
    images, any kind of content. They are used to provide current
    information, including where they \(or information about them\)
    can be found on the Internet. Information about a digital object
    may change over time, including where to find it, but its
    DOI will not change."
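
For the "UPC Type A" entry, the check digit mentioned in the description can be computed with the standard UPC-A rule (three times the sum of the odd-position digits, plus the sum of the even-position digits, rounded up to the next multiple of ten). The function names below are hypothetical, and nothing here is taken from LibDB itself; it is only a sketch of how a stored identifier could be sanity-checked.

```python
def upc_a_check_digit(first_eleven: str) -> int:
    """Compute the 12th (check) digit from the first 11 digits of a UPC-A."""
    if len(first_eleven) != 11 or not (
        first_eleven.isascii() and first_eleven.isdigit()
    ):
        raise ValueError("expected exactly 11 ASCII digits")
    digits = [int(c) for c in first_eleven]
    odd_sum = sum(digits[0::2])   # positions 1, 3, ..., 11
    even_sum = sum(digits[1::2])  # positions 2, 4, ..., 10
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


def is_valid_upc_a(code: str) -> bool:
    """True when a 12-digit string carries the correct check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_a_check_digit(code[:11]) == int(code[11])


print(is_valid_upc_a("036000291452"))  # a commonly cited sample code
```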

-- 
Morbus Iff ( if i could change the future, i'd change the past instead )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

Re: [libdb-develop] xISBN

From: Bruce D'A. <bd...@fa...> - 2004-01-21 17:05:34

On Jan 21, 2004, at 12:02 PM, Morbus Iff wrote:

> LibDB can be configured with an unlimited
> number of identifiers per entity, and will ship with at least the
> following (from the schema):

Nice list!

I'd like to throw in a suggestion for the DOI here:

http://www.doi.org/

Bruce

Re: [libdb-develop] xISBN

From: Morbus I. <mo...@di...> - 2004-01-21 17:01:29

>I thought this was kind of a neat service:
>http://www.oclc.org/research/projects/xisbn/

Yeah, I had first read of this from Udell a month or so back:

  http://weblog.infoworld.com/udell/2003/11/13.html

It is useful to LibDB, but I'm not entirely sure how just yet. The
ISBN's are certainly helpful: LibDB can be configured with an unlimited
number of identifiers per entity, and will ship with at least the
following (from the schema):

  "Artisan Home Entertainment Cat. No."
   "The Artisan Home Entertainment Catalog Number
   is displayed in the UPC box of their DVD purchases."

  "ASIN",
    "ASIN stands for Amazon Standard Identification Number. Almost
    every product on our site has its own ASIN -- a unique code we use
    to identify it. For books, the ASIN is the same as the ISBN number,
    but for all other products a new ASIN is created when the item
    is uploaded to our catalogue."

  "IMDb"
    "The Internet Movie Database uses a nine-digit string to
    identify its resources; the first two characters determine
    whether it's a movie title or person's name."

  "ISSN"
    "The ISSN \(International Standard Serial Number\) is an
    eight-digit number which identifies periodical publications
    as such, including electronic serials. More than one million
    ISSN numbers have so far been assigned."

  "ISBN"
    "The ISBN is a unique machine-readable identification number,
    which marks any book unmistakably. For 30 years the ISBN has
    revolutionized the international book-trade. 159 countries
    and territories are officially ISBN members."

  "LCCN"
    "The Library of Congress began to print catalog cards in 1898
    and began to distribute them in 1901. The Library of Congress
    Card Number was the number used to identify and control
    catalog cards."

And here's an example (shortened for brevity) of THE POOL's identifiers:

 THE POOL (expression), IMDB: "tt0283027"
 THE POOL (dvd manifestation), ASIN: "B00006FD94"
 THE POOL (dvd manifestation), Artisan: "12997"

I've also been meaning to add UPC identifiers as a default as well,
since http://www.upcdatabase.com/ is a handy collection of them
(which does contain THE POOL's DVD UPC, although my book SPIDERING
HACKS had a different UPC than what I expected. Haven't investigated
that one yet).

As for the XISBN utility, it will certainly allow us to suck down
more information about other manifestations (and possibly, expressions),
but a human would have to be around to manually say "yeah, this ISBN
is an expression, and oop, this one is a manifestation of this
European printing", etc.

Programmatically, at most I'd be able to say "all these matching
ISBN numbers from XISBN are related, somehow, to the FRBR work".
The relationships would be revised more stringently when more
information became available to the human cataloger.

-- 
Morbus Iff ( if i could change the future, i'd change the past instead )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

[libdb-develop] xISBN

From: Ed S. <eh...@po...> - 2004-01-21 16:46:01

I thought this was kind of a neat service:

    http://www.oclc.org/research/projects/xisbn/

That has some parallels to libdb. 

//Ed

Re: [libdb-develop] parts

From: Morbus I. <mo...@di...> - 2004-01-21 02:40:35

At 3:15 PM -0800 1/16/04, Andrea Leigh wrote:
>You may want to take a look at the Moving Image Collections (MIC) project,
>which contains a MIC/ViDe Application Profile database in Microsoft Access
>to create records in MPEG-7 and Dublin Core for digital video, audio and
>images, and it is available for download
>http://gondolin.rutgers.edu/MIC/text/how/cataloging_utility.htm.  I'm far
>from an expert on MPEG-7, but have been informed that: 1)  MPEG-7 uses XML
>as the language of choice for the textual representation of content, 2)
>supports creation of descriptions of dynamic and permanent segments,
>3)supports textual and non-textual data, and can marry both in indexing, 4)
>can reside native on an MPEG-4 stream, and 5) is inherently "FRBR-ized,"
>meaning descriptions can be structured in terms of work (e.g. Luhrman's
>Romeo and Juliet), expression (director's cut of Luhrman's Romeo and
>Juliet), and manifestation (VHS instantiation of director's cut of Luhrman's
>Romeo and Juliet).
>
>Some other brief notes:
>I was intrigued by the concept that Morbus brought up of contextualizing
>works by bringing together the film along with ancillary materials used in
>the making of the film, such as a script (conceptually more an archival
>descriptive model than a bibliographic model of access through related
>works). Howard Besser identifies this as a paradigm shift in user
>expectations (See "Digital Preservation of Moving Image Material?" <
>http://www.gseis.ucla.edu/~howard/Papers/amia-longevity.html>). A project
>attempting to bring that type of contextualization through virtual means is
>the British Film Institute's screenonline (http://www.screenonline.org.uk/).

-- 
Morbus Iff ( oh, i wish i was a hoggle )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

[libdb-develop] Re: Movies/NTSC Equivalent To Books/Large Print?

From: Morbus I. <mo...@di...> - 2004-01-21 01:57:42

Hey all, just catching up on old messages.
My original email on this subject was:

 >Assuming, for the sake of conversational brevity:
 >
 >  * the normal print book is letter-for-letter
 >    the same as the large print book.
 >
 >  * a PAL movie is frame-for-frame the same
 >    as an American (NTSC) version, but shorter.
 >
 >Should they be split as different expressions or manifestations?

For the archives and other readers, some
clarification concerning NTSC and PAL movies:

 FPS - Frames Per Second. All NTSC video (the standard in North America
 and Japan) unreels at 24 fps, the same speed at which motion picture film
 is projected. PAL video (the European standard), on the other hand, plays
 at a slightly faster speed of 25 fps speed. Consequently, the same film
 will play shorter in PAL than in NTSC, even if it is absolutely uncut.

                     -- http://www.videowatchdog.com/home/Glossary.htm

"same film" is the key term here. An 89 minute movie filmed in Germany
plays out as a 92 minute movie in the US, with absolutely no modification
save for frame speed.
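
The arithmetic behind those running times can be sketched as follows, taking the glossary's 24 fps and 25 fps figures at face value (real-world transfers involve further details that this ignores). The constant and function names are made up for the example.

```python
FILM_FPS = 24  # projection speed quoted in the glossary entry
PAL_FPS = 25   # PAL playback speed quoted in the glossary entry


def pal_runtime_minutes(film_minutes: float) -> float:
    """Running time when 24 fps material is played back at 25 fps."""
    return film_minutes * FILM_FPS / PAL_FPS


def film_runtime_minutes(pal_minutes: float) -> float:
    """Inverse conversion: the 24 fps running time of a PAL running time."""
    return pal_minutes * PAL_FPS / FILM_FPS


print(round(pal_runtime_minutes(92), 2))   # 92 minutes at 24 fps, played at 25
print(round(film_runtime_minutes(89), 2))  # 89 minutes of PAL, played at 24
```

Under these assumptions the same frames run roughly 4% shorter in PAL, which is in line with the 89-versus-92-minute example: the duration changes while the content does not.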

Since a movie's frame speed is equivalent to a book's print size
(i.e. faster or slower frames in a movie have no effect on its
content, just as enlarging the print size of a book has no effect
on its content), my determination is that:

 * one single movie, released in NTSC and PAL formats, is equivalent
   to one single book, released in normal and large print. Thus,
   they are two different manifestations of the same expression,
   even though the extent (page count, film duration) is different.

Note that this is the *film* itself, not counting such ephemera as
video company logos (the distributor of an NTSC film is invariably
different from the distributor of the PAL version), FBI warnings,
MPAA ratings certificates (or similar), etc.

Thanks to Barbara B. Tillett and Martin Doerr for responding.

-- 
Morbus Iff ( for safety's sake, don't humiliate me )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

[libdb-develop] On Naming: NamesInFoaf and LibDB

From: Morbus I. <mo...@di...> - 2004-01-21 01:44:28

Re: http://rdfweb.org/topic/NamesInFoaf

I don't profess to have any answers, but I do profess to know how I've
dealt with the naming issue in my newest project (which has yet to be
officially announced):

  http://disobey.com/noos/LibDB/
  http://disobey.com/noos/LibDB/?ProjectGoals

Currently, I'm doing things very similarly to FOAF now:

 * have a master "name" element, containing the "full" username.

 * have more specific name elements, for splitting/sorting.

 * have a "language" (xml:lang) that determines display rulings.

 * variant names ("Jack Berger", "John Berger", "Morbus Iff",
   "Kevin Hemenway", "C.S. Lewis", "Clive Staples Lewis") are
   handled via a variant table that defines relationships back
   to the "authority record". Variant names would also
   cover the foaf:nick property.

Myself, I know very little about names in other countries, and how
they all work, besides the fact that "given+family" is not typical
the world over. I continually read reviews and documentation about
movies that have incorrectly parsed and displayed a foreign actor's
name, blah, blah, blah. As such, I've attempted to make some effort
to support the parsing of names, but it's a novice approach.

This is what I'm currently using, based on the comments
and description of the "Person" entity in FRBR. For the sake
of archival, I'll inline the relevant parts below.

 http://disobey.com/noos/LibDB/?DatabaseSchema#h-table__libdb_person

 NAME
 The name by which the person is known. A name may include one or more
 forenames (or given names), matronymics, patronymics, family names (or
 surnames), sobriquets, dynastic names, etc. A person may be known by more
 than one name, or by more than one form of the same name. A
 bibliographic agency normally selects one of those names as the uniform
 heading for purposes of consistency in naming and referencing the person.
 The other names or forms of name may be treated as variant names for the
 person. In some cases (e.g., in the case of a person who writes under
 more than one pseudonym, or a person who writes both in an official
 capacity and as an individual) the bibliographic agency may establish
 more than one uniform heading for the person. Variant names will
 be treated as relationships.

 GIVEN_NAME
 The name given to a person by their parents, themselves, or otherwise.

 FAMILY_NAME
 The name inherited to a person from themselves,
 their parents, or otherwise.

 LANGUAGE
 The language code associated with a person's name: en, fr, de, etc.

 TITLE
 A word or phrase indicative of rank, office, nobility, honour, etc. (e.g.,
 Major, Premier, Duke, etc.), or a term of address (e.g., Sir, Mrs., etc.).

 DESIGNATION
 A numeral, word, or abbreviation indicating succession within a family
 or dynasty (e.g., III, Jr., etc.), or an epithet or other word or phrase
 associated with the person (e.g., the Brave, Professional Engineer, etc.).

Of the above, only NAME is required. The LibDB project will be
aggregating a lot of names, some of which haven't been parsed
knowingly (a MARC record is often parsed as "family, given",
but IMDB seems to assume "given family", thus causing programmatic
parsing problems on "Mary Kate Olson" or "Baron von Zychowski").
The breakup of NAME into given and family will allow a manual
cataloguer to separate things accordingly.
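
To make the parsing problem concrete, here is a small sketch of three naive splitting rules applied to the names mentioned above. None of this is LibDB code; the function names are hypothetical, and the point is only that no single mechanical rule gets both examples right.

```python
from typing import Tuple


def split_family_comma_given(name: str) -> Tuple[str, str]:
    """Split a "family, given" string into (given, family)."""
    family, _, given = name.partition(",")
    return given.strip(), family.strip()


def split_first_word_given(name: str) -> Tuple[str, str]:
    """Naive "given family" rule: the first word is the given name."""
    given, _, family = name.strip().partition(" ")
    return given, family


def split_last_word_family(name: str) -> Tuple[str, str]:
    """Naive "given family" rule: the last word is the family name."""
    given, _, family = name.strip().rpartition(" ")
    return given, family


for name in ("Mary Kate Olson", "Baron von Zychowski"):
    print(name, split_first_word_given(name), split_last_word_family(name))

print(split_family_comma_given("Olson, Mary Kate"))
```

The first-word rule cuts "Mary Kate" in half, while the last-word rule pushes "von" into the given name, which is why the separate given and family fields are left for a human cataloguer to correct.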

Romanization vs. character sets I've not dealt with. From what
I can tell, I'll always be receiving romanized names (due to what
I've seen on IMDb, as well as existing MARC records). Again, this
is not set in stone, nor have I specifically recognized the problem.

Since the data represented in LibDB is from a cataloguing
perspective and not a "user's choice" perspective, the comments
in 1.7 of NamesInFoaf do not apply:

  Most people have a single prefered (sic) form of their name, that
  they use to present themselves with. This is the name that most
  likely occurs on the persons homepage or in citations, and the name
  that fits well within the current definition of the foaf:name property.

The example shows:

  <foaf:name>Mr. John Allan van Doe, Jr.</foaf:name>

which, as per the previous database definition could certainly
be possible in a display (based on the interface's coding), but
would never be stored as such in the database.

Concerning middle names, the LibDB database doesn't specifically
support them, under the impression that if the person doesn't use
it in their day to day output, it's not knowledge worth recording
(neither, for example, would be their shoe size). If people do use
their middle name in their day-to-day life, it'd be considered part
of "given", and represented there and as part of "name".

The first version of the LibDB database design has not specifically
addressed the comments of the Dublin Core, as I just became aware:
http://dublincore.org/documents/name-representation/. Concerning
this though:

 * is there a non-obsolete report concerning it?

 * has anyone tried to match up each example in Appendix
   A to an actual xml:lang attribute? Is xml:lang even
   appropriate to help determine styling rules?

-- 
Morbus Iff ( they should rename controlled chaos to morbus droppings )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

Re: [libdb-develop] parts

From: Morbus I. <mo...@di...> - 2004-01-20 15:00:25

>Re: the schema, this may not be terribly relevant to movies (though it
>could be depending on one's needs), but one thing I don't see (am I
>just missing it?) is a way to represent parts.

Concerning parts within movies,
you may want to check out ECHO:

  http://pc-erato2.iei.pi.cnr.it/echo/public/
  deliv/D3-1-1%20ECHO%20Metadata%20Modelling.pdf

-- 
Morbus Iff ( oh, i wish i was a hoggle )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

Re: [libdb-develop] Hashes or Auto Increment IDs?

From: <ar...@uw...> - 2004-01-16 03:18:26





Skipping the OpenURL debate and how it might lead to generic identifiers, I
can see great value in having XPath expressions in URLs. For example:

http://somewhere.org/author[.='Morbus Iff']
http://somewhere.org/poems/*/author/@techlevel[.='advanced']

AxKit and Cocoon hint at how protocols like SQL can get wrapped in XPath
expressions, and I suspect that URLs will become even more overloaded with
identification information. One aspect of XPath that I think gives it an
edge is that there are engines out there to work out XPath expressions,
including one for JEdit that I have become quite addicted to. Whatever
happens, URLs always seem to grow so having some extra space for this is
probably a good idea.
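
As a small illustration of the kind of engine mentioned above, Python's standard xml.etree.ElementTree module evaluates a limited XPath subset. The sample document below is invented for the example, and because ElementTree cannot select attribute nodes directly, the second expression is rewritten as a predicate on the author element rather than copied from the URL form.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the element names follow the example URLs.
doc = ET.fromstring("""
<library>
  <poems>
    <poem><author techlevel="advanced">Morbus Iff</author></poem>
    <poem><author techlevel="beginner">Someone Else</author></poem>
  </poems>
</library>
""")

# Authors whose text is exactly 'Morbus Iff'
# (the [.='text'] predicate needs Python 3.7 or later).
by_name = doc.findall(".//author[.='Morbus Iff']")

# Authors of any poem whose techlevel attribute is 'advanced'.
advanced = doc.findall("./poems/*/author[@techlevel='advanced']")

print([a.text for a in by_name])
print([a.text for a in advanced])
```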

art

[libdb-develop] Re: Still Fighting with Movie Expressions

From: <joh...@no...> - 2004-01-16 00:21:02

The problem is that you are trying to reify (nice word, Dan!) a work when,
in fact, there is no such thing. A "work" has no independent existence.
You cannot see a "work", you cannot touch a "work", you cannot experience a
"work". At best a "work" can be seen as a convenient handle for a set of
expressions that purport to be the "same thing".

That said, "Swimfan" is of course a "work", just as "A clockwork orange"
is a "work". How well-known the work is or how many
expressions/manifestations of it there are have NOTHING to do with the
determination that it is a work - it is a work because it is distinct from
other works.

A work does not - cannot - exist independently of an expression of that
work (and an expression cannot exist independently of a manifestation of
that expression). There was no movie "A clockwork orange" before Kubrick
and his associates made it. There was no novel "A clockwork orange" before
Anthony Burgess wrote it. And the act of writing created an item, a
manifestation, an expression and a work. All rolled up in one. Each draft
would be a different expression (and manifestation and item).

So a manifestation is an abstraction of features common to a set of items;
an expression is an abstraction of features common to a set of
manifestations; a work is an abstraction of features common to a set of
expressions. Each of these sets can, of course, be a set of one. In every
case, however, the properties of the manifestation flow from the properties
of the item(s), the properties of the expression flow from the properties
of the manifestation(s) and the properties of the work flow from the
properties of the expression(s). In the simplest case the item,
manifestation, expression and work are coterminous.
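
One way to picture this "set of sets" view is as nested containers, each level grouping the level below it. The class and field names here are hypothetical and the sketch is far simpler than the FRBR model or the LibDB schema; it only shows the simplest case described above, where every set has one member.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    location: str  # where this single exemplar lives


@dataclass
class Manifestation:
    label: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self) -> List[Item]:
        """Flatten the hierarchy: a work is only reachable through its items."""
        return [
            item
            for expression in self.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items
        ]


# Simplest case: one draft, so every level is a set of one.
draft = Item(location="author's desk")
work = Work(
    "Example Title",
    [Expression("first draft", [Manifestation("manuscript", [draft])])],
)
print(work.all_items())
```

A work with no expressions yields no items at all, which mirrors the point that a work has no existence independent of its expressions.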

My own view of movies in the FRBR world is that the work and expression are
largely co-terminous, and the various ways of publishing (film reels, VHS
cassette, DVD, NTSC vs PAL, director's cuts, etc.) are different at the
manifestation level. Dubbing a film into a foreign language almost
certainly makes a new expression; I'm not so sure about adding subtitles.
This also implies that the novel "A clockwork orange" and the movie "A
clockwork orange" are different (but related) works. It implies that a
"remake" of a movie (e.g. Sabrina or King Kong or Planet of the Apes) is a
different but related work. A director's commentary would (might?) be a
different, but related work; it is not a different expression of the work
itself. The intellectual property folks, however, take a rather different
position as can be seen from the fact that movie producers pay publishers
money for the right to base a movie on a book. It might be reasonable for
the library community to follow this approach, but so far this has not been
the view taken in the FRBR discussions.

Note that it is not only expressions that are "difficult". The boundary
conditions for all these sets are difficult. We do, however, have more
experience with some (e.g. manifestations) than others. But - suppose a
paperback novel is issued with the words "Now a major motion picture" on
the cover and later the words are changed to "Winner of Oscar for Best
Picture 2003", but everything remains the same - is that a new
manifestation? Or just a minor difference between subsets of items forming
the manifestation? In the handpress era, typographical corrections could
be and were made in the middle of printing a run of sheets. Does such a
change establish a new manifestation? a new expression? The boundary
conditions are subtle and require judgement calls that we can't make
algorithmically with any great accuracy. OCLC's problem was that the
intellectual decisions on what constitutes a manifestation were already
reflected in the data (each bibliographic record represents a
manifestation), and the decisions on what constitutes a work were also
largely reflected in the data (in the form of titles proper and uniform
titles). The intellectual decisions on what constitutes an expression were
not to be found in the data, with the exception of certain subfields in
uniform titles, so machine processing to identify expressions was difficult
and unsatisfactory. The problem isn't so much that the concept of
expression is difficult, but that we don't have the data to allow the
machine to make reliable determinations.

Johan Zeeman
RLG

Morbus Iff <morbus@disobey.com>
01/15/2004 01:58 PM

To: fr...@in...
cc: lib...@li...
Subject: Still Fighting with Movie Expressions

I'm still having conceptual problems breaking apart a movie
into WEMI. It's relatively "easy" when the movie is "known",
like with a Kubrick or Spielberg or similar. But, stuff like
SWIMFAN, MOTEL HELL, STACY, etc., I simply can't figure it out.

I originally suspected it was purely out of ignorance and the fact
that no one has published their own approaches to movies under
FRBR, but the more I read, the more I see the same sort of thing
over and over again: "expressions are difficult".

Most recently (in my reading, not publication-wise)
is the Humphry Clinker examination by the OCLC:

"While it was possible to identify works and manifestations,
identifying expressions was problematic ... Enhanced manifestation
records where the roles of editors, illustrators, translators, and
other contributors are explicitly identified may be a viable
alternative to expressions ... With the enhanced manifestation
record ... the FRBR model provides a powerful means to improve
bibliographic organization and navigation."

How evil and destructive is it, for the time being, to not
support expressions, at all, within an FRBRized application?

If I have a movie called SWIMFAN, is that a work? It is
a "distinct intellectual or artistic creation", but it is
also the REALIZATION: it's audio/visual committed to film.
The only way I can think to get "one higher" than "realized
through film" is the actual shooting script used. The
audiovisual elements of a shooting script are REALIZED
through the work of many people: the director, the
cinematographer, etc., etc.

But, the shooting script won't work... because FRBR says:

"By contrast, when the modification of a work involves a
significant degree of independent intellectual or artistic
effort, the result is viewed, for the purpose of this study,
as a new work. Thus paraphrases, rewritings, adaptations for
children, parodies, musical variations on a theme and free
transcriptions of a musical composition are considered to
represent new works."

Taking a script and turning it into film seems like a significant
degree of work to me, so a film would have to be a work. Treating
a film as a work is correct, because FRBR says/implies as much:

"Translations from one language to another, musical
transcriptions and arrangements, and dubbed or subtitled
versions of a film are also considered simply as different
expressions of the same original work."

An expression "exclude[s] aspects of physical form", so I can't
treat the expression of a movie as a DVD release. If I had two
different translations of the movie, I don't think there's a problem:

W1: Swimfan
E1: Swimfan (English language)
M1: The DVD from Paramount Pictures.
E2: Swimfan (German language dub)

The above seems sane to me, and seems like the answer to my
problems. In fact, FRBR says the above is sane in its description
of the work entity, which I've already snippetted above.

But, I feel the model is "dirty" when I don't have multiple
expressions. If SWIMFAN ONLY had an English translation, then it
"feels" like there's absolutely no difference between work and
expression:

W1: Swimfan
E1: Swimfan (English language)
M1: The DVD from Paramount Pictures

In the above model, there's really no difference, whatsoever, between
the Swimfan work and the Swimfan expression. Is there? Or am I being
too granular? Should I treat WEMI as buckets, with an intended
revisability and extensibility of "always"? Should I always assume
(nay, hope!) that someone WILL translate SWIMFAN into another
language? Should I, in the face of seeming duplicity, always
consider "language" the shining difference between a work (where
language is not defined) and expression (where it is)?

I feel like I'm running around in circles on this expression
thing - toeing the line between "yes, that's how you do it!"
and "noOOOo, you've got it alLLLl wrong, bucko!".

Any tips are appreciated.

--
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

Re: [libdb-develop] Still Fighting with Movie Expressions

From: Bruce D'A. <bd...@fa...> - 2004-01-15 23:55:59

On Jan 15, 2004, at 4:58 PM, Morbus Iff wrote:

> How evil and destructive is it, for the time being, to not
> support expressions, at all, within an FRBRized application?

I may be wrong, but based on my brief look into this, I think of 
expressions as relating to events; particularly performances.  So in my 
previous example, a speech is a work, its performance is an expression, 
and anything derived from that is a manifestation.  The same would 
apply for a play or a musical performance of a Beethoven Symphony, both 
of which could conceivably end up in a movie database.

How's that?

It's fair, I would assume, to have a record that consists of only a 
work and its manifestation.  Am I wrong?

Bruce

[libdb-develop] Still Fighting with Movie Expressions

From: Morbus I. <mo...@di...> - 2004-01-15 21:58:54

I'm still having conceptual problems breaking apart a movie
into WEMI. It's relatively "easy" when the movie is "known",
like with a Kubrick or Spielberg or similar. But, stuff like
SWIMFAN, MOTEL HELL, STACY, etc., I simply can't figure it out.

I originally suspected it was purely out of ignorance and the fact
that no one has published their own approaches to movies under
FRBR, but the more I read, the more I see the same sort of thing
over and over again: "expressions are difficult".

Most recently (in my reading, not publication-wise)
is the Humphry Clinker examination by the OCLC:

  "While it was possible to identify works and manifestations,
   identifying expressions was problematic ... Enhanced manifestation
   records where the roles of editors, illustrators, translators, and
   other contributors are explicitly identified may be a viable
   alternative to expressions ... With the enhanced manifestation
   record ... the FRBR model provides a powerful means to improve
   bibliographic organization and navigation."

How evil and destructive is it, for the time being, to not
support expressions, at all, within an FRBRized application?

If I have a movie called SWIMFAN, is that a work? It is
a "distinct intellectual or artistic creation", but it is
also the REALIZATION: it's audio/visual committed to film.
The only way I can think to get "one higher" than "realized
through film" is the actual shooting script used. The
audiovisual elements of a shooting script are REALIZED
through the work of many people: the director, the
cinematographer, etc., etc.

But, the shooting script won't work... because FRBR says:

  "By contrast, when the modification of a work involves a
   significant degree of independent intellectual or artistic
   effort, the result is viewed, for the purpose of this study,
   as a new work. Thus paraphrases, rewritings, adaptations for
   children, parodies, musical variations on a theme and free
   transcriptions of a musical composition are considered to
   represent new works."

Taking a script and turning it into film seems like a significant
degree of work to me, so a film would have to be a work. Treating
a film as a work is correct, because FRBR says/implies as much:

  "Translations from one language to another, musical
   transcriptions and arrangements, and dubbed or subtitled
   versions of a film are also considered simply as different
   expressions of the same original work."

An expression "exclude[s] aspects of physical form", so I can't
treat the expression of a movie as a DVD release. If I had two
different translations of the movie, I don't think there's a problem:

  W1: Swimfan
   E1: Swimfan (English language)
    M1: The DVD from Paramount Pictures.
   E2: Swimfan (German language dub)

The above seems sane to me, and seems like the answer to my
problems. In fact, FRBR says the above is sane in its description
of the work entity, which I've already snippetted above.

But, I feel the model is "dirty" when I don't have multiple
expressions. If SWIMFAN ONLY had an English translation, then it
"feels" like there's absolutely no difference between work and
expression:

   W1: Swimfan
    E1: Swimfan (English language)
     M1: The DVD from Paramount Pictures

In the above model, there's really no difference, whatsoever, between
the Swimfan work and the Swimfan expression. Is there? Or am I being
too granular? Should I treat WEMI as buckets, with an intended
revisability and extensibility of "always"? Should I always assume
(nay, hope!) that someone WILL translate SWIMFAN into another
language? Should I, in the face of seeming duplicity, always
consider "language" the shining difference between a work (where
language is not defined) and expression (where it is)?
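One way to picture the "buckets" idea is a small data-structure sketch. The class and field names below are assumptions for illustration only and are not LibDB's schema. A work with a single expression is still modelled with an explicit expression node, so a later dub can be attached without restructuring anything:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Manifestation:
    label: str

@dataclass
class Expression:
    label: str
    language: Optional[str] = None  # defined on the expression, not on the work
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    title: str  # no language attribute at this level
    expressions: List[Expression] = field(default_factory=list)

def languages(work: Work) -> List[Optional[str]]:
    """List the language of every expression realizing the work."""
    return [expression.language for expression in work.expressions]

# The single-expression case: W1 -> E1 -> M1.
swimfan = Work("Swimfan", [
    Expression("Swimfan (English language)", "en",
               [Manifestation("The DVD from Paramount Pictures")]),
])

# Later, a second expression is added without touching W1, E1, or M1.
swimfan.expressions.append(Expression("Swimfan (German language dub)", "de"))
print(languages(swimfan))
```

Under this sketch the work and its lone expression look redundant only until the second expression arrives.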

I feel like I'm running around in circles on this expression
thing - toeing the line between "yes, that's how you do it!"
and "noOOOo, you've got it alLLLl wrong, bucko!".

Any tips are appreciated.



-- 
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

Re: [libdb-develop] parts example

From: Morbus I. <mo...@di...> - 2004-01-15 20:39:01

 >common might be where there are clearly defined parts.  For example,
 >DVDs have titled and numbered scenes.  Or say you have a compilation of

Well, the chapter titles would be placed under a "summarization":

  >A summarization of the content of an expression is an abstract,
  >summary, synopsis, etc., or a list of chapter headings, songs, parts,
  >etc. included in the expression.

To make things more annoying, however, summarizations are only defined
on an EXPRESSION of a movie. Rarely does a movie have chapters in the
filmic visual/audio expression (the only one I can think of recently
is KILL BILL VOLUME 1); they only show up once it gets to the
manifestation stage.

In LibDB, I've worked around this by making "summarization" an
"annotation". An annotation can be defined against any entity
(including people, corporations, and bodies).
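As a rough sketch of that workaround (the field names, entity IDs, and annotation text are hypothetical and not LibDB's actual tables), an annotation can carry the type and ID of whatever entity it describes, so a chapter list can hang off a manifestation as easily as off an expression:

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Annotation:
    entity_type: str  # e.g. "expression", "manifestation", "person"
    entity_id: str    # hypothetical identifier of the annotated entity
    kind: str         # e.g. "summarization"
    body: str

def annotations_for(entity_type: str, entity_id: str,
                    annotations: List[Annotation]) -> List[Annotation]:
    """Return every annotation attached to one specific entity."""
    return [a for a in annotations
            if a.entity_type == entity_type and a.entity_id == entity_id]

# Placeholder data: a chapter list attached to a DVD, a note attached to a person.
store = [
    Annotation("manifestation", "dvd-1", "summarization", "Chapter 1; Chapter 2"),
    Annotation("person", "p-7", "summarization", "Placeholder note"),
]

for annotation in annotations_for("manifestation", "dvd-1", store):
    print(annotation.kind, "->", annotation.body)
```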


-- 
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

Re: [libdb-develop] parts example

From: Bruce D'A. <bd...@fa...> - 2004-01-15 20:26:05

On Jan 15, 2004, at 3:09 PM, Morbus Iff wrote:

> The main problem as I see it, though, is that an article in a serial 
> CAN be considered a work all by itself - it doesn't take a huge leap 
> of faith to make that association. But, I rarely find myself thinking of 
> scenes in a movie as individual, stand-alone entities: it's difficult 
> to just pick 35 seconds of a movie and say "this stands alone".
>
> "this stands alone" seems to be a required assumption of FRBR, though 
> they do mention "aggregate works", and that may be key to what we're 
> trying to solve. If a movie is considered an aggregate work of images 
> (it is, technically, frame by frame), then we've suddenly got logic on 
> our side to take a 35 second piece of scenery and call it a work (or a 
> "part").

I'm too tired and busy to think hard about the details of your example 
just now (I didn't think a book or journal could be an expression, 
though am not sure), but in general:

Yes, librarians aren't used to thinking much about parts.  MODS didn't 
have that structure until I pointed out the problem and managed to 
convince a few people why they should care.  For my needs they're 
essential: chapters, articles, and legal cases are all parts.

But with respect to movies, obviously there could be some key scenes 
that might be catalogued as a part.  That'd be rare I imagine.  More 
common might be where there are clearly defined parts.  For example, 
DVDs have titled and numbered scenes.  Or say you have a compilation of 
television episodes on a single tape or DVD. In my example, I was 
imagining a video that had collected speeches.

Bruce

Re: [libdb-develop] parts example

From: Morbus I. <mo...@di...> - 2004-01-15 20:09:31

 >OK, I'm trying to wrap my head around FRBR and the parts stuff.
 >Schematically, is this right, using an example of a speech?

I'm still new with FRBR too, so take this with a grain of salt.

  ----------------------------------------------------
  work1           = a speech title and creator
  expression1     = performance place and date
  manifestation1  = text

  Relationship: manifestation1 [isPartOf] work2
                (parts details = volume, issue, pages)
  ----------------------------------------------------
  work2           = academic journal
  expression2     = academic journal volume, issue
  manifestation2  = text (pages, etc.)

  Relationship: work2 containsPart work1
  ----------------------------------------------------

That's the same thing you said (I think),
just more verbose. To me, that looks right.

The main problem as I see it, though, is that an article in a serial CAN 
be considered a work all by itself - it doesn't take a huge leap of faith 
to make that association. But, I rarely find myself thinking of scenes in a 
movie as individual, stand-alone entities: it's difficult to just pick 35 
seconds of a movie and say "this stands alone".

"this stands alone" seems to be a required assumption of FRBR, though they 
do mention "aggregate works", and that may be key to what we're trying to 
solve. If a movie is considered an aggregate work of images (it is, 
technically, frame by frame), then we've suddenly got logic on our side to 
take a 35 second piece of scenery and call it a work (or a "part").

  ----------------------------------------------------
  work1           = swimming scene with Kari Wuhrer
  expression1     = 15 seconds of filmed footage.
  manifestation1  = film

  Relationship: manifestation1 [isPartOf] work2
                (parts details = duration)
  ----------------------------------------------------
  work2           = Final Examination (movie)
  expression2     = Final Examination (YYYY; movie)
  manifestation2  = Final Examination (DVD)

  Relationship: work2 containsPart work1
  ----------------------------------------------------
  work3           = Poison (movie)
  expression3     = Poison (YYYY; movie)
  manifestation3  = Poison (DVD)

  Relationship: work3 containsPart work1
  Relationship: work3 sharesPart work2
  ----------------------------------------------------

Is that what you're getting at?

I had no responses to my similar query about parts in FRBR, but I think 
your mapping is a lot clearer than whatever gobbledygook I spouted off to 
them. If the above is where you were heading, let me know and I'll pop it 
over to the FRBR list and face the silence again.
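For what it's worth, the sharesPart relationship in the movie mapping above doesn't have to be stored by hand. It can be derived from the containsPart statements. A minimal sketch, assuming relationships are kept as plain (whole, part) pairs using the work labels from the mapping:

```python
from itertools import combinations

# containsPart statements from the mapping: (whole, part).
contains_part = [
    ("work2", "work1"),  # Final Examination containsPart the swimming scene
    ("work3", "work1"),  # Poison containsPart the swimming scene
]

def shared_parts(contains_part):
    """Derive (whole_a, 'sharesPart', whole_b, part) for wholes with a common part."""
    wholes_by_part = {}
    for whole, part in contains_part:
        wholes_by_part.setdefault(part, []).append(whole)
    derived = []
    for part, wholes in wholes_by_part.items():
        for a, b in combinations(sorted(wholes), 2):
            derived.append((a, "sharesPart", b, part))
    return derived

print(shared_parts(contains_part))
```

The sketch emits each pair once. sharesPart reads the same in both directions, so "work3 sharesPart work2" is the same statement.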


-- 
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

[libdb-develop] parts example

From: Bruce D'A. <bd...@fa...> - 2004-01-15 19:13:11

OK, I'm trying to wrap my head around FRBR and the parts stuff.  
Schematically, is this right, using an example of a speech?

work           = a speech title and creator
expression     = performance place and date
manifestations = text [isPartOf] academic journal (parts details = volume, issue, pages)
                 moving image [isPartOf] video compilation (parts details = ______ )

Bruce

[libdb-develop] Hashes or Auto Increment IDs?

From: Morbus I. <mo...@di...> - 2004-01-15 18:44:51

In a lot of the tables within LibDB, I use a 20 character alphanumeric ID 
to uniquely identify an item. One of the prime reasons for this was to 
create unique URLs, something like:

    /person/129387123kj1h23/
    /expression/129387123987/
    /concept/1237192387193/

and so forth. Going to that URL would give you information about the data 
being described, as you'd expect. I've been using these 20chars on
everything that I felt people would want to look at: a list of
identifiers ("create a list based on Artisan's cataloging ID"),
people, etc., etc.

I've NOT been using 20chars when it came to, what I felt, were
database only associations, namely the relationships. The 23rd
relationship would have id#23 and so forth.

So, here's the question: should I use 20chars on everything? What
if, programmatically, a script needed to have a relationship (and
only one relationship) described in XML or SQL or whatever else?

    /relationship/1237192387193/sql
    /relationship/1237192387193/mods
    /relationship/1237192387193/n3

Would that ever be useful to people? Would that ever be useful to
programming or a web service? Should I just bite the bullet now
and use char20's for every table in the database?
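A small sketch of what "char20 everywhere" could look like. The alphabet and helper names are assumptions, and this is not how LibDB generates its IDs. Every table, relationships included, gets the same kind of identifier, and the URL shape stays uniform:

```python
import secrets
import string

# Assumed alphabet: lowercase letters and digits.
ALPHABET = string.ascii_lowercase + string.digits

def new_id(length=20):
    """Generate a random alphanumeric identifier of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def url_for(entity_type, entity_id, fmt=None):
    """Build /<type>/<id>/ with an optional trailing format such as 'sql' or 'mods'."""
    path = f"/{entity_type}/{entity_id}/"
    return path + fmt if fmt else path

relationship_id = new_id()
print(url_for("relationship", relationship_id, "mods"))
print(url_for("person", new_id()))
```

The trade-off is wider keys and joins in exchange for every row being addressable from outside the database.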

Thoughts appreciated.




-- 
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

[libdb-develop] Group 1 Entities Mapped To SQL; Comments Please

From: Morbus I. <mo...@di...> - 2004-01-15 01:45:56

Hey there. I've recently committed the first "proper" draft of Group 1
entities and some sample data to go along with it, to CVS. To recap, FRBR
Group 1 covers works, expressions, manifestations, and items.

I need comments. I'll be starting Group 2 (person and corporate body,
probably the most time consuming of this process) sometime tomorrow.
If you don't have CVS set up on your machine, you can view the two
files through the following URL:

 http://cvs.sourceforge.net/viewcvs.py/libdb/LibDB/databases/

You should be looking at Rev 1.5 (which should be there shortly,
compensating for the lag of the web cvs servers). mysql_sample.sql
is a "real world" mapping of piss-poor movie THE POOL, and
mysql_schema.sql is the database schema that describes it.

For now, ignore the ?DatabaseSchema documentation that is on the
wiki. It is a day out of date, which is a lot more than you think.
I'll be updating it once I get these two files complete with
Group 2 and Group 3 entities.

Some specific areas to look at are the relationships defined.
In the current schema, there are (way at the bottom), six
relationships defined; three come from FRBR and are
well-defined:

  $work           is realized through    $expression
  $expression     is a realization of    $work

  $expression     is embodied in         $manifestation
  $manifestation  is an embodiment of    $expression

  $manifestation  is exemplified by      $item
  $item           is an exemplar of      $manifestation

You can tell this is a hierarchy. However, three relationships
were defined explicitly by me to cover some of the attributes
of the various entities above. That's what I need sanity
checking on (both in the terms used to link the relationship,
and in the actual SQL implementation and layout):

  $entity         is also named          $name
  $name           is a variant name of   $entity

  $entity         is summarized with     $annotation
  $annotation     is a summarization of  $entity

  $annotation     is summarized by         $entity
  $entity         is a summarization from  $entity

The last two relationships can be used validly to say:

 "THE POOL is summarized with [this annotation]";
 "[This annotation] is summarized by [this author]";

but, in the sample data, the summarization actually comes
from the back of the DVD. So, in essence, another example is:

 "THE POOL is summarized with [this annotation]";
 "[This annotation] is summarized by [the DVD manifestation]";

Thoughts on that? Don't hesitate to ask questions. I'm also
posting this to the FRBR list, but that list is incredibly
quiet, so I'm not sure I'll get concrete responses back.
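To make the pairing of relationship terms concrete, here is a minimal sketch, separate from the SQL in CVS, that treats each relationship as a forward phrase plus its inverse. The phrases are the six pairs listed above; the lookup table and the `invert` helper are illustrative assumptions:

```python
# (forward phrase, inverse phrase): three pairs from FRBR, three custom ones.
PAIRS = [
    ("is realized through", "is a realization of"),
    ("is embodied in", "is an embodiment of"),
    ("is exemplified by", "is an exemplar of"),
    ("is also named", "is a variant name of"),
    ("is summarized with", "is a summarization of"),
    ("is summarized by", "is a summarization from"),
]

# Build a lookup that works in both directions.
INVERSE = {}
for forward, backward in PAIRS:
    INVERSE[forward] = backward
    INVERSE[backward] = forward

def invert(statement):
    """Turn (subject, phrase, object) into the equivalent inverse statement."""
    subject, phrase, obj = statement
    return (obj, INVERSE[phrase], subject)

print(invert(("THE POOL", "is summarized with", "[this annotation]")))
print(invert(("[this annotation]", "is summarized by", "[the DVD manifestation]")))
```

Storing one direction and deriving the other keeps the two phrasings from drifting apart.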

-- 
Morbus Iff ( in japan, i'm known as a puchi-iede. )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

Re: [libdb-develop] Dates In An RDBMS - Years Only?

From: Morbus I. <mo...@di...> - 2004-01-14 18:28:42

 >I'm with Ed.  I prefer not to constrain date
 >representation to simply year anywhere.

YYYY-MM-DD it is!
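A sketch of the display side, assuming unknown parts are stored as 00 as discussed in the thread. The function name and the output style are hypothetical:

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def display_date(stored):
    """Format a stored YYYY-MM-DD string, treating 00 as 'unknown'."""
    year, month, day = stored.split("-")
    if month == "00":
        return year                          # year only
    month_name = MONTHS[int(month) - 1]
    if day == "00":
        return f"{month_name} {year}"        # month and year
    return f"{month_name} {int(day)}, {year}"

print(display_date("1999-00-00"))
print(display_date("1999-07-00"))
print(display_date("1999-07-04"))
```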


-- 
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

Re: [libdb-develop] Dates In An RDBMS - Years Only?

From: Bruce D'A. <bd...@fa...> - 2004-01-14 18:06:43

On Jan 14, 2004, at 12:23 PM, Morbus Iff wrote:

> >I guess I don't see it as particularly evil to have 1999-00-00 in the
> >db, if the display code knows that it needs to reformat the date.
> >Especially since 1999-07-04 may have to be reformatted as well :)
>
> Fair enough. Anyone else want to weigh in? Bruce?

I'm with Ed.  I prefer not to constrain date representation to simply 
year anywhere.

Bruce
