libdb-develop Mailing List for LibDB (Page 9)
Status: Inactive
Brought to you by:
morbus
From: Morbus I. <mo...@di...> - 2004-01-14 17:24:35
|
>Is it foreseeable that you might have a date assigned to an item, like
>Jimi Hendrix Live At Woodstock? If not, then I guess this distinction could

That'd be a date assigned to an expression (since, regardless of CD or cassette, it's the same date at Woodstock), and in that regard, it'd just be the year itself.

>I guess I don't see it as particularly evil to have 1999-00-00 in the db, if
>the display code knows that it needs to reformat the date. Especially
>since 1999-07-04 may have to be reformatted as well :)

Fair enough. Anyone else want to weigh in? Bruce?

--
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
From: Ed S. <eh...@po...> - 2004-01-14 16:58:25
|
On Wed, Jan 14, 2004 at 11:28:58AM -0500, Morbus Iff wrote:
> * all dates not related to a person or corporate body are years.
> * all dates related to a person or corporate body must be yyyy-mm-dd.
>
> In MySQL, at least, there's no "proper" way to represent a YYYY or
> YYYY-MM-DD in *one* column: we'd end up with stuff like "1999-00-00", which
> seems rather evil to me (or is it? let me know your thoughts.)

Is it foreseeable that you might have a date assigned to an item, like Jimi Hendrix Live At Woodstock? If not, then I guess this distinction could work.

I guess I don't see it as particularly evil to have 1999-00-00 in the db, if the display code knows that it needs to reformat the date. Especially since 1999-07-04 may have to be reformatted as well :)

//Ed |
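The "display code reformats the date" idea can be sketched roughly as below. This is only an illustration, assuming the column hands back a `YYYY-MM-DD` string in which `00` marks an unknown month or day; the function name `format_partial_date` and the output style are made up for the example and are not part of LibDB.

```python
# Hypothetical display helper: "00" in the month or day slot means "unknown".
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_partial_date(stored: str) -> str:
    """Render a stored 'YYYY-MM-DD' value, dropping the parts that are zero."""
    year, month, day = (int(part) for part in stored.split("-"))
    if month == 0:
        # Only the year is known, e.g. "1999-00-00".
        return str(year)
    if day == 0:
        # Year and month are known, e.g. "1999-07-00".
        return f"{MONTHS[month - 1]} {year}"
    return f"{day} {MONTHS[month - 1]} {year}"
```

With this, a year-only row and a full date both go through the same code path, which is the point Ed makes about 1999-07-04 needing reformatting anyway.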
From: Morbus I. <mo...@di...> - 2004-01-14 16:29:12
|
For purposes explained below:

* all dates not related to a person or corporate body are years.
* all dates related to a person or corporate body must be yyyy-mm-dd.

In MySQL, at least, there's no "proper" way to represent a YYYY or YYYY-MM-DD in *one* column: we'd end up with stuff like "1999-00-00", which seems rather evil to me (or is it? let me know your thoughts.)

--
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
From: Morbus I. <mo...@di...> - 2004-01-14 15:17:21
|
I init'd and imported the LibDB CVS today:

Modulename: LibDB
CVSROOT: cvs.sourceforge.net:/cvsroot/libdb

Currently, there are only two blank holder files:

/databases/mysql_schema.sql
/databases/mysql_sample.sql

I'll be working on these throughout the day, translating one of my horror movies into the layout (representative because I've watched it recently and have written a review for it). These files will become representative once they're finalized, and will be linked from the wiki to the CVS web output (which lags behind a day or so).

--
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
From: Bruce D'A. <bd...@fa...> - 2004-01-14 13:00:25
|
On Jan 14, 2004, at 7:43 AM, Morbus Iff wrote:

>> BTW, I posted this example of parts handling in MODS on my blog. It's
>> generic enough to handle movie scenes and journal articles (in part
>> because of the non-controlled attribute values on the part
>
> Know Perl at all?

No, I can't program at all, save for a bit of XSLT. One of these days (probably after I get tenure!), I'll learn to script though.

> I'll probably be calling on you to write an output
> handler for MODS. Or, at the very least, with a set of data, marking
> it up and then me writing the output handler.

That I can do. I've been working with this guy in this way, who is rewriting his conversion tools based on MODS:

http://www.scripps.edu/~cdputnam/software/bibutils.html

When some time opens up, I'll try to post an archive of example records for people to work with.

>> <relatedItem type="host">
>> <titleInfo>
>> <title>Journal of Interdisciplinary History
>
> Missing end-tag.

Ah, crap. The original source is correct; it's just that the HTML from which I pasted the example is (incorrectly) generated by vim and I missed a few details cleaning it up. I'm sure you get the idea though ;-)

Bruce |
From: Morbus I. <mo...@di...> - 2004-01-14 12:44:49
|
>> So, something like:
>>
>> E1: Final examination (directed by some guy)
>> EN1: 23:45 - 24:15, swimming pool dive by Kari Wuhrer
>> Relationship: EN1 PartAppearsIn E2.
>
>To keep the language generic, maybe call that "isPartOf" a la DC?

Incidentally, in thinking of this last night, these shouldn't be based around the expression ("movie"), but rather manifestations ("dvd", "vhs"). A different cut of the movie often appears on the DVD compared to the tape, so the timestamps will be different.

--
Morbus Iff ( relax have a happy meal )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
From: Morbus I. <mo...@di...> - 2004-01-14 12:43:20
|
>> E1: Final examination (directed by some guy)
>> EN1: 23:45 - 24:15, swimming pool dive by Kari Wuhrer
>> Relationship: EN1 PartAppearsIn E2.
>
>To keep the language generic, maybe call that "isPartOf" a la DC?

So, EN1 IsPartOf E2; E2 HasPart EN1.

>BTW, I posted this example of parts handling in MODS on my blog. It's
>generic enough to handle movie scenes and journal articles (in part
>because of the non-controlled attribute values on the part

Know Perl at all? I'll probably be calling on you to write an output handler for MODS. Or, at the very least, with a set of data, marking it up and then me writing the output handler.

> <relatedItem type="host">
> <titleInfo>
> <title>Journal of Interdisciplinary History

Missing end-tag.

> <detail type="volume"><number>31</detail>

Missing end-tag.

> <detail type="issue"><number>2
> <extent unit="page">
> <start>259
> <end>260

Ok, now you're just getting lazy ;)

--
Morbus Iff ( relax have a happy meal )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
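The IsPartOf / HasPart pairing in this message can be illustrated with a small sketch. It assumes relationships are kept as (subject, predicate, object) triples and that each predicate has exactly one inverse; the `INVERSE` table and `with_inverses` function are hypothetical names for the example, not anything from the LibDB schema.

```python
# Hypothetical inverse table: recording one direction implies the other.
INVERSE = {"isPartOf": "hasPart", "hasPart": "isPartOf"}


def with_inverses(triples):
    """Return the given triples plus the inverse statement for each one."""
    result = set(triples)
    for subject, predicate, obj in triples:
        result.add((obj, INVERSE[predicate], subject))
    return result
```

Storing only one direction and deriving the other keeps the two statements from drifting apart, at the cost of a lookup when reading.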
From: Bruce D'A. <bd...@fa...> - 2004-01-14 11:36:04
|
On Jan 14, 2004, at 12:39 AM, Morbus Iff wrote:

> So, something like:
>
> E1: Final examination (directed by some guy)
> EN1: 23:45 - 24:15, swimming pool dive by Kari Wuhrer
> Relationship: EN1 PartAppearsIn E2.

To keep the language generic, maybe call that "isPartOf" a la DC?

BTW, I posted this example of parts handling in MODS on my blog. It's generic enough to handle movie scenes and journal articles (in part because of the non-controlled attribute values on the part subelements).

<relatedItem type="host">
  <titleInfo>
    <title>Journal of Interdisciplinary History
  </titleInfo>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <dateIssued>2000</dateIssued>
    <issuance>continuing</issuance>
  </originInfo>
  <genre>periodical</genre>
  <part>
    <detail type="volume"><number>31</detail>
    <detail type="issue"><number>2
    <extent unit="page">
      <start>259
      <end>260
    </extent>
    <extent unit="paragraph">
      <list>21, 24</list>
    </extent>
  </part>
</relatedItem>

Bruce |
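For anyone wanting to try the fragment, here is a rough sketch that restores the end tags lost in the paste and reads the part details back out. The restored tags are a guess at what the original source looked like, the MODS namespace is left out as it is in the pasted fragment, and `describe_part` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# The pasted fragment with its missing end tags restored (an assumption).
MODS_PART = """
<relatedItem type="host">
  <titleInfo>
    <title>Journal of Interdisciplinary History</title>
  </titleInfo>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <dateIssued>2000</dateIssued>
    <issuance>continuing</issuance>
  </originInfo>
  <genre>periodical</genre>
  <part>
    <detail type="volume"><number>31</number></detail>
    <detail type="issue"><number>2</number></detail>
    <extent unit="page">
      <start>259</start>
      <end>260</end>
    </extent>
    <extent unit="paragraph">
      <list>21, 24</list>
    </extent>
  </part>
</relatedItem>
"""


def describe_part(xml_text):
    """Collect the title, the typed details, and the extents of a part."""
    root = ET.fromstring(xml_text)
    part = root.find("part")
    details = {d.get("type"): d.findtext("number") for d in part.findall("detail")}
    extents = {}
    for extent in part.findall("extent"):
        unit = extent.get("unit")
        if extent.find("start") is not None:
            # A range such as pages 259 to 260.
            extents[unit] = (extent.findtext("start"), extent.findtext("end"))
        else:
            # A plain list such as paragraphs "21, 24".
            extents[unit] = extent.findtext("list")
    return {
        "title": root.findtext("titleInfo/title"),
        "details": details,
        "extents": extents,
    }
```

Because the `type` and `unit` attribute values are not controlled, the same reader would cope with something like a scene with a time extent, which is the generality Bruce is pointing at.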
From: Morbus I. <mo...@di...> - 2004-01-14 05:39:30
|
CCing to the FRBR list...

>Re: the schema, this may not be terribly relevant to movies (though it
>could be depending on one's needs), but one thing I don't see (am I
>just missing it?) is a way to represent parts.

I can certainly see this being relevant from an FRBR point of view. One of the big things of FRBR is relationships: how an expression (a movie) is related to its manifestation (a DVD), how an expression (a movie) is related to its work (the shooting script) and so forth.

But, I can rattle off, and I'm sure a lot of other people can too, movies (expressions) that contain scenes of other expressions (movies - am I flipflopping too much?). For instance, there's one entire scene ripped from POISON that is shown in FINAL EXAMINATION. Having something more than just a generic "note" would be helpful, especially when the same scene is repeated a number of times (most notably, car explosions used in Lloyd Kaufman and Roger Corman flicks; "show me all movies that contain this expression[ette]").

I'm not sure how to properly represent this within FRBR - at most, portions of an expression (a paragraph, an article, a slice of time) would be related to another expression (E1 containsPartOf E2; E2 hasPartAppearingIn E1), but that still (at least, in the current schema) doesn't give us extent and other helpful info.

From a schema point of view, I could see a new "expressionettes" mini-entity (/database table) that could segregate a unique expression (note, that an expressionette would NOT break apart things that are already separate expressions, like a DVD and its liner notes, or an English book and its study guide).

So, something like:

E1: Final examination (directed by some guy)
EN1: 23:45 - 24:15, swimming pool dive by Kari Wuhrer
Relationship: EN1 PartAppearsIn E2.

E2: Poison (directed by some guy)
Relationship: EN2 containsPart EN1.

--
Morbus Iff ( i think the "good book" is missing some pages )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
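One way the "expressionettes" table idea might look, as a sketch only. It uses SQLite from Python so it runs on its own (the project files are MySQL), and every table and column name here is an assumption. The extent is stored per appearance, on the guess that a reused scene sits at a different timestamp in each film; only the 23:45 - 24:15 extent comes from the thread, the other is left unknown.

```python
import sqlite3

# Hypothetical schema: a reusable slice plus one row per film it shows up in.
SCHEMA = """
CREATE TABLE expression (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE expressionette (
    id INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE appearance (
    expressionette_id INTEGER NOT NULL REFERENCES expressionette(id),
    expression_id INTEGER NOT NULL REFERENCES expression(id),
    starts_at TEXT,
    ends_at TEXT
);
"""


def build_sample():
    """Create an in-memory database holding the example from the thread."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO expression (id, title) VALUES (?, ?)",
        [(1, "Final Examination"), (2, "Poison")],
    )
    db.execute(
        "INSERT INTO expressionette (id, description) VALUES (?, ?)",
        (1, "swimming pool dive by Kari Wuhrer"),
    )
    db.executemany(
        "INSERT INTO appearance VALUES (?, ?, ?, ?)",
        [(1, 1, "23:45", "24:15"), (1, 2, None, None)],
    )
    return db


def movies_containing(db, expressionette_id):
    """Answer 'show me all movies that contain this expressionette'."""
    rows = db.execute(
        "SELECT e.title FROM appearance a "
        "JOIN expression e ON e.id = a.expression_id "
        "WHERE a.expressionette_id = ? ORDER BY e.title",
        (expressionette_id,),
    )
    return [title for (title,) in rows]
```

Keeping the extent on the appearance row rather than on the expressionette is a design guess; it would also leave room to hang the extent off a manifestation later if cuts differ between releases.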
From: Bruce D'A. <bd...@fa...> - 2004-01-14 04:06:33
|
Re: the schema, this may not be terribly relevant to movies (though it could be depending on one's needs), but one thing I don't see (am I just missing it?) is a way to represent parts. Example: I want to represent a scene in a movie, or an article in a periodical, complete with extent identifiers (a time range for the first, and maybe standard volume, issue, date, page numbers for the second). There's been a bit of discussion on my blog about this... Bruce |
From: Bruce D'A. <bd...@fa...> - 2004-01-13 03:15:26
|
On Jan 12, 2004, at 6:59 PM, Morten Frederiksen wrote:

> In the FOAF project we'll likely soon(tm) be revisiting the subject of
> names, and also how to model the various levels of ownership and
> "levels" of a work.

Hmm...this sounds intriguing. Can you explain the last bit? Sounds like FOAF is treading into the realm of bibliographic metadata ;-)

> ... perhaps a long-term project goal could be to have users
> world wide contribute to getting all movie-people's names "marked up"
> within a model like the above proposes...

Also interesting. One issue is that of what the library community refers to as authorized data, of which there are two aspects:

1) How to ensure names are properly marked up in the distributed world of something like FOAF?

2) I wonder if there needs to be some communication with the library world on this? There is a lot of effort (and money) invested in the controlled name data that the library community refers to as "authority records." The LoC is even working on an XML Schema for these data. I wonder if there's an argument to be made not only that such data ultimately ought to be served on the web, but that perhaps it might be designed to be RDF compatible?

Bruce |
From: Morten F. <mo...@mf...> - 2004-01-12 23:58:03
|
Hi all,

This is just a quick note to introduce myself: Morten Frederiksen, Denmark: Freelance developer etc., mostly working with RDF in general and FOAF...

On Monday 12 January 2004 21:00, Bruce D'Arcus wrote:
> See this for an RDF-specific discussion:
> http://rdfweb.org/topic/NamesInFoaf

... where the issues of names came up (again) last year, resulting in the above draft proposal.

Bruce pointed me to this project from a post of his, but it turns out I'd already heard a bit about it, and have run into Morbus before.

In the FOAF project we'll likely soon(tm) be revisiting the subject of names, and also how to model the various levels of ownership and "levels" of a work. I shall soon be reading up on FRBR...

On the names issue, I think it's best in the long run to get something "right", not just stick to first name and last name, western idiosyncrasies. However, I do realize that it's not at the top of the priority list of this project, and that getting initial data into it may be impossible without simplification, but perhaps a long-term project goal could be to have users world wide contribute to getting all movie-people's names "marked up" within a model like the above proposes...

Anyway, I'm currently busy with other stuff, but will follow this list and project (there's obviously some overlap with the FOAF project), hopefully being able to contribute something sometime.

Regards,
Morten |
From: Morbus I. <mo...@di...> - 2004-01-12 22:46:55
|
> W1: Texas chainsaw massacre (directed by Tobe Hooper)
>   E1: Texas chainsaw massacre (directed by Tobe Hooper)
>   E2: Texas chainsaw massacre (remake; directed by some other guy)

Actually, a thought. In the realm of movies, perhaps, is the writer/director a decent schism for authority? That'd give some ability to distinguish W/E:

W1: Texas chainsaw massacre (written by PERSON1)
  E1: Texas chainsaw massacre (original, directed by PERSON2)
  E2: Texas chainsaw massacre (remake, directed by PERSON3)

Granted, some writers are their own directors, but this does bring a clearer definition of the evolution from "intellectual or artistic creation" (ie., the words that make up a movie script) to the realization of this work (the final movie itself).

I'll let it bounce in my head some more, but this seems even clearer.

--
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
From: Morbus I. <mo...@di...> - 2004-01-12 22:42:11
|
At 05:14 PM 1/12/2004, Morbus Iff wrote:

>> ECHO:
>>
>> W: 2001: A space odyssey
>>   E1: The film "2001: A space odyssey" by Stanley Kubrick
>>     M1: The 35 mm format
>>     M2: The DVD from Paramount (or wherever).
>>       I1: The one I own, located in box 17.
>>
>> Martha M. Yee:
>>
>> If you look at FRBR itself, the work (W) would be the film 2001: a space
>> odyssey; Arthur C. Clarke's original short story (Sentinel) would be a
>> related work, as would Clarke's 1999 science fiction novel that came out
>> after the film. An example of an expression of the film might be a DVD
>> version with an audio commentary by Kubrick or some such...
>>
>> My meandering:
>>
>> W1: 2001: A space odyssey (film)
>>   E1: 2001: A space odyssey (directed by Stanley Kubrick)
>>     M1: The DVD from Paramount (or wherever).
>>       I1: The one I own, located in box 17.

I think I'm going to end up going with my approach:

W1: Texas chainsaw massacre (film)
  E1: Texas chainsaw massacre (original; directed by Tobe Hooper)
    M1: The VHS from Paramount (or wherever).
    M2: The special edition DVD from Paramount.
  E2: Texas chainsaw massacre (remake; directed by some other guy)
    M1: The DVD from New Line.
W1: Texas chainsaw massacre (comics)
  E1: Texas chainsaw massacre (written by person).

etc., etc. Any thoughts on this? The one downside, as mentioned previously, is the lack of authority for the Work, but I can't think of any immediate way that would remove the seeming duplication of the Expression:

W1: Texas chainsaw massacre (directed by Tobe Hooper)
  E1: Texas chainsaw massacre (directed by Tobe Hooper)
  E2: Texas chainsaw massacre (remake; directed by some other guy)

--
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
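The Work / Expression / Manifestation / Item nesting used in these outlines could be modelled along the following lines. This is a sketch with invented class and function names, assuming a plain one-to-many containment at each level (which FRBR itself does not strictly require); the labels are shortened from the example in the message.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    label: str


@dataclass
class Manifestation:
    label: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    label: str
    expressions: List[Expression] = field(default_factory=list)


def outline(work: Work) -> List[str]:
    """Render a work as the indented W/E/M/I outline used on the list."""
    lines = [f"W: {work.label}"]
    for e_no, expression in enumerate(work.expressions, start=1):
        lines.append(f"  E{e_no}: {expression.label}")
        for m_no, manifestation in enumerate(expression.manifestations, start=1):
            lines.append(f"    M{m_no}: {manifestation.label}")
            for i_no, item in enumerate(manifestation.items, start=1):
                lines.append(f"      I{i_no}: {item.label}")
    return lines


def sample_work() -> Work:
    """The film example from the message, with shortened labels."""
    return Work("Texas chainsaw massacre (film)", [
        Expression("original; directed by Tobe Hooper", [
            Manifestation("VHS"),
            Manifestation("special edition DVD"),
        ]),
        Expression("remake; directed by some other guy", [
            Manifestation("DVD"),
        ]),
    ])
```

Numbering restarts under each parent here, which matches the way M1 is reused under E2 in the outline above.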
From: Morbus I. <mo...@di...> - 2004-01-12 22:14:19
|
For those who know FRBR, what approach should we take?

> ECHO:
>
> W: 2001: A space odyssey
>   E1: The film "2001: A space odyssey" by Stanley Kubrick
>     M1: The 35 mm format
>     M2: The DVD from Paramount (or wherever).
>       I1: The one I own, located in box 17.
>
> Martha M. Yee:
>
> If you look at FRBR itself, the work (W) would be the film 2001: a space
> odyssey; Arthur C. Clarke's original short story (Sentinel) would be a
> related work, as would Clarke's 1999 science fiction novel that came out
> after the film. An example of an expression of the film might be a DVD
> version with an audio commentary by Kubrick or some such...
>
> My meandering:
>
> W1: 2001: A space odyssey (film)
>   E1: 2001: A space odyssey (directed by Stanley Kubrick)
>     M1: The DVD from Paramount (or wherever).
>       I1: The one I own, located in box 17.

--
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
From: Morbus I. <mo...@di...> - 2004-01-12 20:28:47
|
>http://rdfweb.org/topic/NamesInFoaf

Indeed. Although the site seems to be down now (danbri mentioned he was changing servers recently - may be just a side effect of that), he and I have had discussions about it in the past. Once this site comes back up, I'll see what I can see, but the current plan is to:

Define person with the following fields:

name = full name, unparsed, undetermined.
given_name = as you'd expect, "Morbus".
family_name = as you'd expect, "Iff".
lang_name = corresponds to xml:lang.

This gives us:

* the flexibility for aggregators to do their work with unparsed name without user intervention.
* manual labor to specify (or modify) the given and family's during manual addition or standard editing practices.
* support for eventual serialization and specific cultural understanding with xml:lang.

Thoughts?

Incidentally, I swear to God, I'm gonna start working on those pages again tonight. Been very hectic with article writing and a bunch of other jazz, so I didn't get much (ok, ok, any) stuff done last week.

--
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
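A sketch of how the four proposed fields might behave together. The field names follow the message; the `FAMILY_FIRST` set, the two helper functions, and the rule "fall back to the unparsed name when the parts are missing" are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: language tags whose usual display order is family name first.
FAMILY_FIRST = {"ja", "ko", "zh"}


@dataclass
class Person:
    name: str                          # full name, unparsed
    given_name: Optional[str] = None   # filled in by hand when known
    family_name: Optional[str] = None  # filled in by hand when known
    lang_name: Optional[str] = None    # corresponds to xml:lang


def display_name(person: Person) -> str:
    """Use the parsed parts when both exist, otherwise the unparsed name."""
    if not (person.given_name and person.family_name):
        return person.name
    if person.lang_name in FAMILY_FIRST:
        return f"{person.family_name} {person.given_name}"
    return f"{person.given_name} {person.family_name}"


def sort_key(person: Person) -> str:
    """Sort by 'family, given' when parsed, else by the raw name."""
    if person.given_name and person.family_name:
        return f"{person.family_name}, {person.given_name}"
    return person.name
```

An aggregated record can start with only `name` filled in and still display and sort, and gets better behaviour once someone supplies the parts.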
From: Bruce D'A. <bd...@fa...> - 2004-01-12 20:00:47
|
Ed asked:

> Probably one of the main reasons why Terry Rawlings is not in a 700 is
> because someone was going to have to look up an authority record for
> them and couldn't be bothered! I think that library cataloging standards
> are often counter productive, and that using Amazon and IMDB as models
> even though all we have is their interface, and we will have to guess at
> the underlying framework, a reverse engineering process of sorts.
>
> So what was the question again? :)

Answer: how to properly model people and their names? It's a big issue -- which is exactly why there are authority records to begin with -- particularly if you're interested in bibliographic metadata for my (scholarly) needs. Yes, I know, this is starting as a movie DB; but why not be ambitious?

See this for an RDF-specific discussion:

http://rdfweb.org/topic/NamesInFoaf

Bruce |
From: Morbus I. <mo...@di...> - 2004-01-07 21:13:25
|
[Note, this is a copy of a message I've sent to an FRBR mailing list. I'll copy any relevant replies back to this mailing list for archival.]

Good day all. I'm a new subscriber, working on an as-yet-unannounced open source project based on FRBR. I'm new to FRBR, and I've some questions concerning your feelings, thoughts, and (possibly) in-use techniques. I want to FRBRize movies, where carrier doesn't matter. The amount of data I'd like to catalog is analogous to the IMDB: as much as possible, with a minimum of five (top-billed or not) cast members of the movie.

----

Question: what do you call a "movie"? To say I want to "index movies" immediately restricts me to merely movies - it leaves out documentaries, television series, cartoons, broadcasts of events (the Oscars, etc.). Similarly, "film" is not perfect, because you have "digital video" elements that have never seen celluloid at all. "Video", perhaps, is the most applicable, though it seems to give off a taint of the ancients: "video is dead, long live DVDs". So, what do you collectively call movies, documentaries, cartoons, and TV shows?

----

Question: is anyone actually indexing movies? Is anyone then providing that data for public use? My initial reason for joining the list was to scour the archives for answers, but the helpful moderators/owners pointed me to the following document concerning the ECHO Metadata implementation:

http://pc-erato2.iei.pi.cnr.it/echo/public/
deliv/D3-1-1%20ECHO%20Metadata%20Modelling.pdf

where it breaks down (loosely transcribed from p13):

W: 2001: A space odyssey
  E1: The film "2001: A space odyssey" by Stanley Kubrick
    M1: The 35 mm format
    M2: The DVD from Paramount (or wherever).
      I1: The one I own, located in box 17.

Should expressions always contain the director? Should "the film" be replaced with "the tv series", "the documentary", "the cartoon", etc.?

----

Question: is anyone, movie or not, cataloging Character, which presumably have the same information as a Person? If you are, how are you distinguishing Corporate Bodies from fictional bodies? How are you distinguishing a Person who is playing a Character that is based on a real Person? I'd love to be able to say "show me all the items I have that haveCharacter Sherlock Holmes".

For now, that's it. Don't want to overstay my welcome with a 15k email. ;)

--
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
From: <ed-...@in...> - 2004-01-07 19:41:12
|
Hi folks:

On Wed, Jan 07, 2004 at 02:25:20PM -0500, Bruce D'Arcus wrote:
> I just looked in my library catalog for the Blade Runner video, which
> returned this MARC record. Seems key names are parsed (see the 700
> fields), and other names (e.g. cast) are not:
>
> 001 15494415
> 005 19971103080721.0
> 007 vfucbaho-
> 008 870406s1986 cau122 g vleng dcgmIa
> 040 IJC|cIJC|dDLM
> 049 MIBN
> 090 PN1997|b.B592 1986
> 245 00 Blade runner|h[videorecording] /|cThe Ladd Company
> 260 Los Angeles :|bEmbassy Home Entertainment,|cc1986
> 300 1 videocassette (122 min.) :|bsd., col. ;|c1/2 in
> 500 VHS format
> 500 "VHS hi-fi stereo 1380."
> 500 "Mono compatible". --- container
> 500 Based on the novel Do androids dream of electric sheep? by
> Philip K. Dick
> 500 Winner of: Los Angeles film critic's award, RIAA and ITA
> video awards, 3 British academy awards, 2 academy award
> nominations
> 500 Videocassette release of the 1982 motion picture by The
> Ladd Company, originally rated R
> 500 Miami University's MCIS video collection
> 508 Producer, Michael Deeley : director, Ridley Scott ;
> screenplay, Hampton Fancher and David Peoples ; visual
> effects, Douglas Trumbull ; original music, Vangelis ;
> photography, Jordan Cronenweth ; editor, Terry Rawlings
> 511 1 Harrison Ford, Rutger Hauer, Sean Young, Edward James
> Olmos, Daryl Hannah
> 520 A futuristic tale set in the Los Angeles of 2020
> 520 Directed by Ridley Scott with Harrison Ford, Rutger Hauer,
> Sean Young and Edward James Olmos. Harrison Ford is at his
> suspenseful best as he takes you on a frightening,
> futuristic detective mission to track down and eliminate
> four renegade "replicants", genetically engineered humans
> of superior strength and intelligence. Riveting visual
> effects reflect the bleakness of a world winding down.
> It's like nothing you've ever seen. (Tamarelle) 1982 R
> 650 0 Science fiction films
> 650 0 Feature films
> 700 1 Ford, Harrison,|d1942-
> 700 1 Hauer, Rutger,|d1944-
> 700 1 Dick, Philip K.|tDo androids dream of electric sheep?
> |h[videorecording]
> 700 10 Scott, Ridley
> 710 2 Ladd Company
> 710 2 Embassy Home Entertainment (Firm)
> 830 0 MCIS video collection ;|vFS-9
> 947 upd df

Which means that if you are looking for films which Terry Rawlings is credited in you better hope you have a key word index on the 508 :) Whereas in IMDB you have a much nicer granularity:

http://www.imdb.com/name/nm0712625/

Probably one of the main reasons why Terry Rawlings is not in a 700 is because someone was going to have to look up an authority record for them and couldn't be bothered! I think that library cataloging standards are often counter productive, and that using Amazon and IMDB as models even though all we have is their interface, and we will have to guess at the underlying framework, a reverse engineering process of sorts.

So what was the question again? :)

//Ed

--
Ed Summers
aim: inkdroid
web: http://www.inkdroid.org |
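As a rough illustration of getting past a keyword index, the 508 credits note in this particular record happens to be regular enough to split into role/name pairs. The sketch below assumes statements are separated by " : " or " ; " and that each one reads "role, name[ and name]"; real 508 fields are free text and will often break this, and `parse_credits` is a made-up name.

```python
import re

# The 508 field from the Blade Runner record, unwrapped onto one line.
FIELD_508 = (
    "Producer, Michael Deeley : director, Ridley Scott ; "
    "screenplay, Hampton Fancher and David Peoples ; "
    "visual effects, Douglas Trumbull ; original music, Vangelis ; "
    "photography, Jordan Cronenweth ; editor, Terry Rawlings"
)


def parse_credits(field_508):
    """Split a credits note into {role: [names]} under the assumptions above."""
    credits = {}
    for statement in re.split(r"\s+[:;]\s+", field_508.strip()):
        # Everything before the first ", " is the role; the rest is names.
        role, _, names = statement.partition(", ")
        credits[role.lower()] = [name.strip() for name in names.split(" and ")]
    return credits
```

Looking up `parse_credits(FIELD_508)["editor"]` then finds Terry Rawlings without a 700 entry, though nothing here ties the name to an authority record.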
From: Morbus I. <mo...@di...> - 2004-01-07 19:32:11
|
>OK, fair enough; it's not exactly "easy" ;-)
>
>Part of what you're observing, though, is not a data issue, but an
>interface issue. Of course they're related, but not so closely that we
>can't disentangle them (perhaps?).

A stop-gap measure would be to support "name, given, family, lang" all at once in the database: with "name" being useful for aggregated, foreign, or unknown parses, and "given" and "family" for instances when they're known.

>I know nothing about the formats of either of these. What's the
>problem that makes it "impossible"? Maybe there are library sources of
>MARC records you could source?

I've yet to find a great selection of MARC records for movies (which source did your library pull its MARC from?), much less movies full of foreign names (perhaps you could try poking around for CITY OF LOST CHILDREN? STACY? AUDITION?). As for IMDB, there is no parsing: http://imdb.com/title/tt0368296/.

>I just looked in my library catalog for the Blade Runner video, which
>returned this MARC record. Seems key names are parsed (see the 700
>fields), and other names (e.g. cast) are not:

Annoyingly, the client will want to include cast lists in the database <g>.

--
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |
From: Bruce D'A. <bd...@fa...> - 2004-01-07 19:23:40
|
On Jan 7, 2004, at 1:54 PM, Morbus Iff wrote:

> >This isn't a problem though. Just attach an xml:lang attribute to such
> >names. This is part of the reason to use family and given rather than
>
> But, still: in my head, it's not that easy. If my client (a movie
> rental store) is adding in cast information, it assumes a great deal
> of knowledge for him: that he's going to know what nationality a name
> is coming from (he has roughly 2000 foreign films), as well as how
> to properly split the given and family depending on that nationality
> (as well as being able to spot errors from people who don't and have
> placed it on the box cover wrong, in IMDB wrong, etc., etc., which
> happens very, very frequently).

OK, fair enough; it's not exactly "easy" ;-)

Part of what you're observing, though, is not a data issue, but an interface issue. Of course they're related, but not so closely that we can't disentangle them (perhaps?).

> Likewise, it also assumes that extra information is out there
> somewhere for me to aggregate: basing some data entry from IMDB or
> Amazon would be impossible.

I know nothing about the formats of either of these. What's the problem that makes it "impossible"? Maybe there are library sources of MARC records you could source?

I just looked in my library catalog for the Blade Runner video, which returned this MARC record. Seems key names are parsed (see the 700 fields), and other names (e.g. cast) are not:

001 15494415
005 19971103080721.0
007 vfucbaho-
008 870406s1986 cau122 g vleng dcgmIa
040 IJC|cIJC|dDLM
049 MIBN
090 PN1997|b.B592 1986
245 00 Blade runner|h[videorecording] /|cThe Ladd Company
260 Los Angeles :|bEmbassy Home Entertainment,|cc1986
300 1 videocassette (122 min.) :|bsd., col. ;|c1/2 in
500 VHS format
500 "VHS hi-fi stereo 1380."
500 "Mono compatible". --- container
500 Based on the novel Do androids dream of electric sheep? by Philip K. Dick
500 Winner of: Los Angeles film critic's award, RIAA and ITA video awards, 3 British academy awards, 2 academy award nominations
500 Videocassette release of the 1982 motion picture by The Ladd Company, originally rated R
500 Miami University's MCIS video collection
508 Producer, Michael Deeley : director, Ridley Scott ; screenplay, Hampton Fancher and David Peoples ; visual effects, Douglas Trumbull ; original music, Vangelis ; photography, Jordan Cronenweth ; editor, Terry Rawlings
511 1 Harrison Ford, Rutger Hauer, Sean Young, Edward James Olmos, Daryl Hannah
520 A futuristic tale set in the Los Angeles of 2020
520 Directed by Ridley Scott with Harrison Ford, Rutger Hauer, Sean Young and Edward James Olmos. Harrison Ford is at his suspenseful best as he takes you on a frightening, futuristic detective mission to track down and eliminate four renegade "replicants", genetically engineered humans of superior strength and intelligence. Riveting visual effects reflect the bleakness of a world winding down. It's like nothing you've ever seen. (Tamarelle) 1982 R
650 0 Science fiction films
650 0 Feature films
700 1 Ford, Harrison,|d1942-
700 1 Hauer, Rutger,|d1944-
700 1 Dick, Philip K.|tDo androids dream of electric sheep? |h[videorecording]
700 10 Scott, Ridley
710 2 Ladd Company
710 2 Embassy Home Entertainment (Firm)
830 0 MCIS video collection ;|vFS-9
947 upd df

Bruce |
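To show what "parsed" means for the 700 fields, here is a small sketch that splits a field value from this catalog display into subfields. It assumes the display convention visible in the record: "|" starts a subfield, the next character is its code, and the leading text is an implicit subfield "a". That convention belongs to this catalog's display, not to MARC itself, and `parse_subfields` is a hypothetical helper.

```python
def parse_subfields(value):
    """Split 'Ford, Harrison,|d1942-' into [(code, text), ...] pairs."""
    parts = value.split("|")
    # Text before the first "|" is treated as subfield "a" (an assumption).
    subfields = [("a", parts[0].strip())]
    for part in parts[1:]:
        if part:
            # First character is the subfield code, the rest is its content.
            subfields.append((part[0], part[1:].strip()))
    return subfields
```

For the personal-name entries in the record above, the "a" subfield carries the inverted "family, given" form and "d" the dates, which is the kind of structure the cast list in the 511 field lacks.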
From: Morbus I. <mo...@di...> - 2004-01-07 18:54:50
|
[Bruce - starting a new topic on this at libdb-discuss @ SF]

The conversation, in progress, is:

* instead of just person.name, you should have person.given_name and person.family_name.

My initial complaint was that this assumes healthy knowledge of what nationality the name is from: how they culturally display their names (given/family, family/given, etc.). Think Korean, Chinese, Japanese. Bruce's response is valid:

>This isn't a problem though. Just attach an xml:lang attribute to such
>names. This is part of the reason to use family and given rather than

But, still: in my head, it's not that easy. If my client (a movie rental store) is adding in cast information, it assumes a great deal of knowledge for him: that he's going to know what nationality a name is coming from (he has roughly 2000 foreign films), as well as how to properly split the given and family depending on that nationality (as well as being able to spot errors from people who don't and have placed it on the box cover wrong, in IMDB wrong, etc., etc., which happens very, very frequently).

Likewise, it also assumes that extra information is out there somewhere for me to aggregate: basing some data entry from IMDB or Amazon would be impossible.

--
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus |