[libdb-develop] Re: libdb and FRBR

I'll be CCing this to the libdb-develop list at SourceForge
sans any identification. If you want to jump on board and
introduce yourself, responding to my comments, have a blast.

 >My understanding of FRBR, as a spec and meme, is that they are just what
 >they claim -- functional requirements, but for users, not systems.

Exactly. It's a concept model, not a data model.

 >Seeing that you've modelled its core concepts as literal tables, I'm
 >wondering what your processing model will be, and how you're planning to
 >do basic object storage.

Heh, heh. Why don't you talk to uh, my um, marketing department <g>.

 >Hmm, let me try that again:  How do you plan to populate the FRBR
 >primitives tables (work, expression, etc.)?  Are you sucking in
 >MARC/MODS/etc records and running a local implementation of OCLC's FRBR
 >algorithm on them?  If so, are you discarding the original source
 >metadata?  Will you count on users to identify the relationships, or will
 >it be semi-/fully-automated?

Aaaahh. Much better! <g>

Yes, there will be aggregated data. This shouldn't be too surprising,
since my latest book has been O'Reilly's SPIDERING HACKS. The first
type of data I'm *specifically* attacking for LibDB is movies, but
if you looked at the database tables without that knowledge, it
may not be immediately obvious (i.e., the database tables are not
dependent on any one medium). As such, movie data would be sucked
down from IMDb, but books and other standard librarian
stuff would be sucked in through whatever formats LibDB
supports, which could be MARC, MODS, etc.

The original source metadata would be discarded. LibDB will
support export formats (in a RESTian URL structure), such that
you'd be able to get data as RDF, MARC, FOAF, etc., etc.
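
To give a feel for what a RESTian export URL might look like, here's a
rough sketch in Python. Nothing about the URL layout is decided: the
"/entity/id/format" pattern, the function name, and the format list are
all assumptions made up for illustration.

```python
# Hypothetical export formats; the real list is whatever LibDB ends up supporting.
SUPPORTED_FORMATS = {"rdf", "marc", "foaf"}


def parse_export_path(path):
    """Split a path like '/work/42/rdf' into (entity, id, format)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) != 3:
        raise ValueError("expected /entity/id/format, got %r" % path)
    entity, raw_id, fmt = parts
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unsupported export format: %r" % fmt)
    # int() raises ValueError itself if the id is not numeric.
    return entity, int(raw_id), fmt
```

Something like parse_export_path("/work/42/rdf") would then hand the
entity, its id, and the wanted format to whichever exporter applies.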

With that in mind, the planned workflow for movies:

  1) User types in movie name and year.
  2) User gets back either:

     a) the matching movie from IMDb, split up in a
        giant form that doesn't mention any FRBR terms.

     b) a list of matching movies, from which they'd choose
        the right one, and then be faced with a), above.

  3) User verifies all information.
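
Steps 1 and 2 could be sketched roughly like this in Python. It is only
an illustration: the in-memory "catalog" stands in for whatever actually
gets fetched from IMDb, and the function and field names are made up.

```python
def lookup(candidates, title, year):
    """Return ('match', record) for exactly one hit, else ('choices', list)."""
    wanted = title.strip().lower()
    exact = [c for c in candidates
             if c["title"].lower() == wanted and c["year"] == year]
    if len(exact) == 1:
        return "match", exact[0]
    # No single exact hit: fall back to a loose title match so the
    # user can pick the right movie from a list (step 2b).
    loose = [c for c in candidates if wanted in c["title"].lower()]
    return "choices", loose


# Stand-in data; a real lookup would query the remote source.
catalog = [
    {"title": "Erin Brockovich", "year": 2000},
    {"title": "Runaway Bride", "year": 1999},
]

kind, result = lookup(catalog, "Runaway Bride", 1999)
```

A "match" result goes straight to the giant form; a "choices" result
gets shown as a list first.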

There's a heckuva lot missing between 2a and 3, and that's mainly all 
interface/forms. I don't have any plans to mention the term "relationships",
whatsoever. LibDB will handle all the core relationships implicitly:
it will create the work/expression relationship based on the data
sucked down, and the expression/manifestation/item relationships
based on user data ("i own the dvd, it's in the third box, and
I thought the movie sucked").
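
To make the implicit relationships a bit more concrete, here's a minimal
Python sketch. It is not LibDB code or its schema: every class, field,
and function name is made up, and it assumes one expression per fetched
record just to keep things short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    location: str            # e.g. "third box"
    note: str = ""           # e.g. the owner's opinion


@dataclass
class Manifestation:
    carrier: str             # e.g. "DVD"
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    year: int
    expressions: List[Expression] = field(default_factory=list)


def build_from_fetched(record):
    """Create the work/expression link from the fetched data alone."""
    work = Work(title=record["title"], year=record["year"])
    label = record.get("version", "original release")  # hypothetical field
    work.expressions.append(Expression(label=label))
    return work


def add_user_copy(work, carrier, location, note=""):
    """Create the expression/manifestation/item links from user data."""
    expression = work.expressions[0]
    manifestation = next(
        (m for m in expression.manifestations if m.carrier == carrier), None)
    if manifestation is None:
        manifestation = Manifestation(carrier=carrier)
        expression.manifestations.append(manifestation)
    item = Item(location=location, note=note)
    manifestation.items.append(item)
    return item


work = build_from_fetched({"title": "Runaway Bride", "year": 1999})
add_user_copy(work, "DVD", "third box", "I thought the movie sucked")
```

The user never sees "expression" or "manifestation"; they only answer
"what do you own, and where is it?".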

Relationships between Group 1 and Group 2 entities (for movies: cast,
crew, and companies) are handled automatically within the code.
The user will merely see a list of all the people who starred in
the movie, all the people/companies who worked on the movie,
and they'll have the option of choosing which info they want
to save into the database (though, I'm up in the air on that
one), as well as the ability to override any of the "roles"
relationships.

Now again, I won't mention "roles" at all.
The interface would look something like:

  "Julia Roberts"         Cast Member  "CharacterName"
  "Something Someone"     Crew Member  [ "2nd Post Production Assistant" ]
  "Artisan Entertainment"              [ "Distributor" ]

In this example, the [] indicates a select/popup box, and "2nd Post 
Production Assistant" is the data received from IMDb. The user would
be able to (as I would) pick the more generic "Post Production
Assistant" from that select box. The select box is populated
with all the roles the database knows of (in a future version of
the database, roles will be associated with an authority/form,
so that if you were adding a "book", you wouldn't see "Post
Production Assistant", and if you were adding a "film", you'd
see "Titles" instead of "Typesetter").
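
A rough Python sketch of how that select box might be populated once
roles are tied to a form. The table below is made up: "Titles",
"Typesetter", "Distributor", and "Post Production Assistant" come from
the examples above, while "Author" and "Publisher" are just filler, and
the function name is hypothetical.

```python
# Hypothetical role lists, keyed by the kind of thing being added.
KNOWN_ROLES = {
    "film": ["Distributor", "Post Production Assistant", "Titles"],
    "book": ["Author", "Publisher", "Typesetter"],
}


def role_choices(form, fetched_role):
    """Options for the select box, with the fetched role listed first."""
    known = KNOWN_ROLES.get(form, [])
    # The fetched role stays available (and preselected) even if the
    # database has never seen it; known roles follow, without repeats.
    return [fetched_role] + [r for r in known if r != fetched_role]


choices = role_choices("film", "2nd Post Production Assistant")
```

The user could then keep the fetched role as-is or swap it for the more
generic "Post Production Assistant" further down the list.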

Likewise, Group 3 entities would be defined as relationships, but
to the end user, they'd just see a big text box that says "Enter
Concepts, one per line", "Enter Events, one per line". I'm still
debating having popups of known Concepts and Events, and just
having the user pick from a dozen possible popups (along with
write-ins).

Once the user has gone through all the data, making changes where
they see fit, data verification would occur. This is largely a
grey area at the moment, but stuff like this would happen:

   "The concept 'Murder' exists, and has been assigned."
   "The event 'Sherwood Forest' did not exist, so it has been created and assigned."

   "You already have a person in the database named 'Julia Roberts'.
    Is this the same 'Julia Roberts' that was involved with:

         * Work 1 (i.e., movie 'Erin Brockovich')
         * Work 2 (i.e., movie 'Runaway Bride')"

and so on.
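
Since verification is still a grey area, take this Python sketch as one
possible shape for it, not a plan. The function names and the plain
set/dict storage are assumptions; the real thing would sit on top of the
database tables.

```python
def verify_terms(kind, known, entered):
    """Assign each entered term, creating it first if it is unknown."""
    messages = []
    for term in entered:
        if term in known:
            messages.append(
                "The %s '%s' exists, and has been assigned." % (kind, term))
        else:
            known.add(term)  # create the missing term before assigning it
            messages.append(
                "The %s '%s' did not exist; it has been created and assigned."
                % (kind, term))
    return messages


def same_person_prompt(people, name):
    """Build the 'is this the same person?' question, or None if no clash.

    `people` maps a person's name to the titles they are already linked to.
    """
    works = people.get(name)
    if not works:
        return None
    lines = ["You already have a person in the database named '%s'." % name,
             "Is this the same '%s' that was involved with:" % name]
    lines += ["  * %s" % title for title in works]
    return "\n".join(lines)
```

Matching on the bare name is deliberately naive here; the point is only
that the user gets asked instead of the code guessing.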

Of course, at some point, users will want to more granularly define
relationships. They may want to "create a relationship type" called
"Sister", and then "make a relationship" between "person Mary Kate" and
"person Ashley". Those sorts of relationships can't be implied easily
from any data that currently exists. However, once they're created, the
relationship becomes usable by other applications (i.e., when a user exports
either of those sisters as RDF or FOAF data).
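
Here's a small Python sketch of user-defined relationships, under the
assumption that they get stored as simple (subject, type, object)
triples. The class and method names are made up, and mapping the triples
onto actual RDF or FOAF vocabulary is left out entirely.

```python
class RelationshipStore:
    """Holds user-created relationship types and the links made with them."""

    def __init__(self):
        self.types = set()
        self.triples = []

    def create_type(self, name):
        """'Create a relationship type', e.g. "Sister"."""
        self.types.add(name)

    def relate(self, subject, rel_type, obj):
        """'Make a relationship' between two entities of a known type."""
        if rel_type not in self.types:
            raise KeyError("unknown relationship type: %s" % rel_type)
        self.triples.append((subject, rel_type, obj))

    def export_for(self, person):
        """Every triple that mentions the person, on either side."""
        return [t for t in self.triples if person in (t[0], t[2])]


store = RelationshipStore()
store.create_type("Sister")
store.relate("Mary Kate", "Sister", "Ashley")
```

Exporting either sister via store.export_for(...) returns the same
triple, which is the piece an RDF or FOAF exporter would build on.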

Does this answer your questions?

-- 
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus