Thread: [libdb-develop] On Naming A Person and The Cultural And Familiar Divide
From: Morbus I. <mo...@di...> - 2004-01-07 18:54:50

[Bruce - starting a new topic on this at libdb-discuss @ SF]

The conversation, in progress, is:

 * instead of just person.name, you should have
   person.given_name and person.family_name.

My initial complaint was that this assumes healthy knowledge of what nationality the name is from: how they culturally display their names (given/family, family/given, etc.). Think Korean, Chinese, Japanese. Bruce's response is valid:

 >This isn't a problem though. Just attach an xml:lang attribute to such
 >names. This is part of the reason to use family and given rather than

But, still: in my head, it's not that easy. If my client (a movie rental store) is adding in cast information, it assumes a great deal of knowledge for him: that he's going to know what nationality a name is coming from (he has roughly 2000 foreign films), as well as how to properly split the given and family depending on that nationality (as well as being able to spot errors from people who don't and have placed it on the box cover wrong, in IMDB wrong, etc., etc., which happens very, very frequently).

Likewise, it also assumes that extra information is out there somewhere for me to aggregate: basing some data entry from IMDB or Amazon would be impossible.

-- 
Morbus Iff ( i put the demon back in codemonkey )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

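A minimal sketch of the ordering problem raised in this message, assuming a name has already been split into given and family parts and tagged with a language code in the style of xml:lang. The function `display_name` and the set `FAMILY_FIRST_LANGS` are hypothetical names chosen for illustration, and the set is deliberately incomplete: it only lists the three languages mentioned above.

```python
# Languages where the family name conventionally comes first.
# Illustrative assumption only, not an exhaustive or authoritative list.
FAMILY_FIRST_LANGS = {"ko", "zh", "ja"}


def display_name(given_name: str, family_name: str, lang: str = "") -> str:
    """Join the two name parts in the order suggested by the language tag."""
    # Compare only the primary subtag, so "zh-Hant" is treated like "zh".
    primary = lang.split("-")[0].lower()
    if primary in FAMILY_FIRST_LANGS:
        parts = [family_name, given_name]
    else:
        parts = [given_name, family_name]
    # Skip empty parts so a single-name entry gets no stray spaces.
    return " ".join(part for part in parts if part)


print(display_name("Harrison", "Ford", "en"))   # Harrison Ford
print(display_name("Akira", "Kurosawa", "ja"))  # Kurosawa Akira
```

The sketch only works once somebody has supplied both the split and the language tag, which is exactly the knowledge the message says a data-entry clerk may not have.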
From: Bruce D'A. <bd...@fa...> - 2004-01-07 19:23:40

On Jan 7, 2004, at 1:54 PM, Morbus Iff wrote:

> >This isn't a problem though. Just attach an xml:lang attribute to
> such
> >names. This is part of the reason to use family and given rather
> than
>
> But, still: in my head, it's not that easy. If my client (a movie
> rental store) is adding in cast information, it assumes a great deal
> of knowledge for him: that he's going to know what nationality a name
> is coming from (he has roughly 2000 foreign films), as well as how
> to properly split the given and family depending on that nationality
> (as well as being able to spot errors from people who don't and have
> placed it on the box cover wrong, in IMDB wrong, etc., etc., which
> happens very, very frequently).

OK, fair enough; it's not exactly "easy" ;-)

Part of what you're observing, though, is not a data issue, but an interface issue. Of course they're related, but not so closely that we can't disentangle them (perhaps?).

> Likewise, it also assumes that extra information is out there
> somewhere for me to aggregate: basing some data entry from IMDB or
> Amazon would be impossible.

I know nothing about the formats of either of these. What's the problem that makes it "impossible"? Maybe there are library sources of MARC records you could source?

I just looked in my library catalog for the Blade Runner video, which returned this MARC record. Seems key names are parsed (see the 700 fields), and other names (e.g. cast) are not:

001 15494415
005 19971103080721.0
007 vfucbaho-
008 870406s1986 cau122 g vleng dcgmIa
040 IJC|cIJC|dDLM
049 MIBN
090 PN1997|b.B592 1986
245 00 Blade runner|h[videorecording] /|cThe Ladd Company
260 Los Angeles :|bEmbassy Home Entertainment,|cc1986
300 1 videocassette (122 min.) :|bsd., col. ;|c1/2 in
500 VHS format
500 "VHS hi-fi stereo 1380."
500 "Mono compatible". --- container
500 Based on the novel Do androids dream of electric sheep? by
    Philip K. Dick
500 Winner of: Los Angeles film critic's award, RIAA and ITA
    video awards, 3 British academy awards, 2 academy award
    nominations
500 Videocassette release of the 1982 motion picture by The
    Ladd Company, originally rated R
500 Miami University's MCIS video collection
508 Producer, Michael Deeley : director, Ridley Scott ;
    screenplay, Hampton Fancher and David Peoples ; visual
    effects, Douglas Trumbull ; original music, Vangelis ;
    photography, Jordan Cronenweth ; editor, Terry Rawlings
511 1 Harrison Ford, Rutger Hauer, Sean Young, Edward James
    Olmos, Daryl Hannah
520 A futuristic tale set in the Los Angeles of 2020
520 Directed by Ridley Scott with Harrison Ford, Rutger Hauer,
    Sean Young and Edward James Olmos. Harrison Ford is at his
    suspenseful best as he takes you on a frightening,
    futuristic detective mission to track down and eliminate
    four renegade "replicants", genetically engineered humans
    of superior strength and intelligence. Riveting visual
    effects reflect the bleakness of a world winding down.
    It's like nothing you've ever seen. (Tamarelle) 1982 R
650 0 Science fiction films
650 0 Feature films
700 1 Ford, Harrison,|d1942-
700 1 Hauer, Rutger,|d1944-
700 1 Dick, Philip K.|tDo androids dream of electric sheep?
    |h[videorecording]
700 10 Scott, Ridley
710 2 Ladd Company
710 2 Embassy Home Entertainment (Firm)
830 0 MCIS video collection ;|vFS-9
947 upd df

Bruce

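To make the "key names are parsed" observation concrete, here is a rough sketch that splits the 700 entries from the record above into family and given parts. It assumes the catalog's display convention seen in this record, where subfields are separated by "|" followed by a one-character code; the function name `parse_marc_700` and the returned dictionary keys are hypothetical, and the input is the field content only, without the tag and indicators.

```python
def parse_marc_700(field: str) -> dict:
    """Split displayed 700 content such as 'Ford, Harrison,|d1942-'."""
    # The first chunk is the unlabelled name; every later chunk starts
    # with a one-character subfield code (d, t and h in this record).
    name_part, *subfields = field.split("|")
    # Drop the trailing comma that precedes a following subfield, then
    # split "Family, Given" at the first comma.
    family, _, given = name_part.strip().rstrip(",").partition(",")
    result = {"family_name": family.strip(), "given_name": given.strip()}
    for sub in subfields:
        if sub:
            result[sub[0]] = sub[1:].strip()
    return result


for content in ("Ford, Harrison,|d1942-", "Scott, Ridley"):
    print(parse_marc_700(content))
```

The cast in the 511 field has no such structure: "Harrison Ford, Rutger Hauer, ..." is a plain comma-separated note, so the same split cannot be applied to it without guessing.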
From: <ed-...@in...> - 2004-01-07 19:41:12

Hi folks:

On Wed, Jan 07, 2004 at 02:25:20PM -0500, Bruce D'Arcus wrote:
> I just looked in my library catalog for the Blade Runner video, which
> returned this MARC record. Seems key names are parsed (see the 700
> fields), and other names (e.g. cast) are not:
>
> 001 15494415
> 005 19971103080721.0
> 007 vfucbaho-
> 008 870406s1986 cau122 g vleng dcgmIa
> 040 IJC|cIJC|dDLM
> 049 MIBN
> 090 PN1997|b.B592 1986
> 245 00 Blade runner|h[videorecording] /|cThe Ladd Company
> 260 Los Angeles :|bEmbassy Home Entertainment,|cc1986
> 300 1 videocassette (122 min.) :|bsd., col. ;|c1/2 in
> 500 VHS format
> 500 "VHS hi-fi stereo 1380."
> 500 "Mono compatible". --- container
> 500 Based on the novel Do androids dream of electric sheep? by
>     Philip K. Dick
> 500 Winner of: Los Angeles film critic's award, RIAA and ITA
>     video awards, 3 British academy awards, 2 academy award
>     nominations
> 500 Videocassette release of the 1982 motion picture by The
>     Ladd Company, originally rated R
> 500 Miami University's MCIS video collection
> 508 Producer, Michael Deeley : director, Ridley Scott ;
>     screenplay, Hampton Fancher and David Peoples ; visual
>     effects, Douglas Trumbull ; original music, Vangelis ;
>     photography, Jordan Cronenweth ; editor, Terry Rawlings
> 511 1 Harrison Ford, Rutger Hauer, Sean Young, Edward James
>     Olmos, Daryl Hannah
> 520 A futuristic tale set in the Los Angeles of 2020
> 520 Directed by Ridley Scott with Harrison Ford, Rutger Hauer,
>     Sean Young and Edward James Olmos. Harrison Ford is at his
>     suspenseful best as he takes you on a frightening,
>     futuristic detective mission to track down and eliminate
>     four renegade "replicants", genetically engineered humans
>     of superior strength and intelligence. Riveting visual
>     effects reflect the bleakness of a world winding down.
>     It's like nothing you've ever seen. (Tamarelle) 1982 R
> 650 0 Science fiction films
> 650 0 Feature films
> 700 1 Ford, Harrison,|d1942-
> 700 1 Hauer, Rutger,|d1944-
> 700 1 Dick, Philip K.|tDo androids dream of electric sheep?
>     |h[videorecording]
> 700 10 Scott, Ridley
> 710 2 Ladd Company
> 710 2 Embassy Home Entertainment (Firm)
> 830 0 MCIS video collection ;|vFS-9
> 947 upd df

Which means that if you are looking for films in which Terry Rawlings is credited, you had better hope you have a keyword index on the 508 :)

Whereas in IMDB you have a much nicer granularity:

  http://www.imdb.com/name/nm0712625/

Probably one of the main reasons why Terry Rawlings is not in a 700 is because someone was going to have to look up an authority record for them and couldn't be bothered! I think that library cataloging standards are often counterproductive, and that we should use Amazon and IMDB as models, even though all we have is their interface and we will have to guess at the underlying framework: a reverse engineering process of sorts.

So what was the question again? :)

//Ed

-- 
Ed Summers
aim: inkdroid
web: http://www.inkdroid.org

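The point about needing a keyword index on the 508 can be illustrated with a small sketch that pulls role and name pairs out of the free-text credits note in the record quoted above. It is a heuristic under stated assumptions: statements are separated by " ; " or " : ", each statement reads "role, name", and joint credits are joined with " and ". The function name `index_credits` is hypothetical, and other catalogs may phrase their 508 notes differently.

```python
import re
from collections import defaultdict

# The 508 note from the Blade Runner record quoted above.
CREDITS_508 = (
    "Producer, Michael Deeley : director, Ridley Scott ; "
    "screenplay, Hampton Fancher and David Peoples ; "
    "visual effects, Douglas Trumbull ; original music, Vangelis ; "
    "photography, Jordan Cronenweth ; editor, Terry Rawlings"
)


def index_credits(note: str) -> dict:
    """Map each credited name to the list of roles found in a 508 note."""
    index = defaultdict(list)
    # Statements are separated by ' ; ' or ' : ' in this record.
    for statement in re.split(r"\s+[;:]\s+", note.strip()):
        role, sep, names = statement.partition(", ")
        if not sep:
            continue  # not in 'role, name' shape; leave it unparsed
        for name in names.split(" and "):
            index[name.strip()].append(role.strip().lower())
    return dict(index)


credits = index_credits(CREDITS_508)
print(credits["Terry Rawlings"])  # ['editor']
```

Even with this index the names stay unparsed strings ("Terry Rawlings", not family plus given), which is the gap between the 508 note and the 700 entries.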
From: Morbus I. <mo...@di...> - 2004-01-07 19:32:11

>OK, fair enough; it's not exactly "easy" ;-)
>
>Part of what you're observing, though, is not a data issue, but an
>interface issue. Of course they're related, but not so closely that we
>can't disentangle them (perhaps?).

A stop-gap measure would be to support "name, given, family, lang" all at once in the database: with "name" being useful for aggregated, foreign, or unknown parses, and "given" and "family" for instances when they're known.

>I know nothing about the formats of either of these. What's the
>problem that makes it "impossible"? Maybe there are library sources of
>MARC records you could source?

I've yet to find a great selection of MARC records for movies (which source did your library pull its MARC from?), much less movies full of foreign names (perhaps you could try poking around for CITY OF LOST CHILDREN? STACY? AUDITION?). As for IMDB, there is no parsing: http://imdb.com/title/tt0368296/.

>I just looked in my library catalog for the Blade Runner video, which
>returned this MARC record. Seems key names are parsed (see the 700
>fields), and other names (e.g. cast) are not:

Annoyingly, the client will want to include cast lists in the database <g>.

-- 
Morbus Iff

From: Bruce D'A. <bd...@fa...> - 2004-01-12 20:00:47

Ed asked:

> Probably one of the main reasons why Terry Rawlings is not in a 700 is
> because someone was going to have to look up an authority record for
> them
> and couldn't be bothered! I think that library cataloging standards
> are often
> counterproductive, and that we should use Amazon and IMDB as models,
> even though all we
> have is their interface and we will have to guess at the underlying
> framework:
> a reverse engineering process of sorts.
>
> So what was the question again? :)

Answer: how to properly model people and their names?

It's a big issue -- which is exactly why there are authority records to begin with -- particularly if you're interested in bibliographic metadata for (scholarly) needs like mine. Yes, I know, this is starting as a movie DB; but why not be ambitious?

See this for an RDF-specific discussion:

http://rdfweb.org/topic/NamesInFoaf

Bruce

From: Morbus I. <mo...@di...> - 2004-01-12 20:28:47

>http://rdfweb.org/topic/NamesInFoaf

Indeed. Although the site seems to be down now (danbri mentioned he was changing servers recently - may be just a side effect of that), he and I have had discussions about it in the past. Once this site comes back up, I'll see what I can see, but the current plan is to:

Define person with the following fields:

  name        = full name, unparsed, undetermined.
  given_name  = as you'd expect, "Morbus".
  family_name = as you'd expect, "Iff".
  lang_name   = corresponds to xml:lang.

This gives us:

 * the flexibility for aggregators to do their work with
   an unparsed name, without user intervention.

 * manual labor to specify (or modify) the given and family
   names during manual addition or standard editing practices.

 * support for eventual serialization and specific cultural
   understanding with xml:lang.

Thoughts?

Incidentally, I swear to God, I'm gonna start working on those pages again tonight. Been very hectic with article writing and a bunch of other jazz, so I didn't get much (ok, ok, any) stuff done last week.

-- 
Morbus Iff

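The four-field plan above could be sketched roughly as follows. The field names come from the message; everything else is an assumption made for illustration, including the `Person` class itself, the `is_parsed` helper, and the element names used in the XML output, none of which are part of any agreed libdb schema.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Qualified attribute name that ElementTree serializes as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


@dataclass
class Person:
    name: str = ""         # full name, unparsed, undetermined
    given_name: str = ""   # filled in only when known
    family_name: str = ""  # filled in only when known
    lang_name: str = ""    # corresponds to xml:lang

    def is_parsed(self) -> bool:
        """True once someone has supplied a given or family name."""
        return bool(self.given_name or self.family_name)

    def to_element(self) -> ET.Element:
        """Serialize to a hypothetical <person> element."""
        elem = ET.Element("person")
        if self.lang_name:
            elem.set(XML_LANG, self.lang_name)
        for tag in ("name", "given_name", "family_name"):
            value = getattr(self, tag)
            if value:
                ET.SubElement(elem, tag).text = value
        return elem


# An aggregator can store just the raw string...
aggregated = Person(name="Rutger Hauer")
# ...and a later manual edit can add the split and the language.
edited = Person(name="Morbus Iff", given_name="Morbus",
                family_name="Iff", lang_name="en")
print(ET.tostring(edited.to_element(), encoding="unicode"))
```

Keeping the unparsed name alongside the split fields means aggregated records are usable immediately, while the split and language tag can be added whenever someone knows them.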
From: Morten F. <mo...@mf...> - 2004-01-12 23:58:03

Hi all,

This is just a quick note to introduce myself: Morten Frederiksen, Denmark: Freelance developer etc., mostly working with RDF in general and FOAF...

On Monday 12 January 2004 21:00, Bruce D'Arcus wrote:
> See this for an RDF-specific discussion:
> http://rdfweb.org/topic/NamesInFoaf

... where the issues of names came up (again) last year, resulting in the above draft proposal.

Bruce pointed me to this project from a post of his, but it turns out I'd already heard a bit about it, and have run into Morbus before.

In the FOAF project we'll likely soon(tm) be revisiting the subject of names, and also how to model the various levels of ownership and "levels" of a work. I shall soon be reading up on FRBR...

On the names issue, I think it's best in the long run to get something "right", not just stick to first name and last name, western idiosyncrasies. However, I do realize that it's not at the top of the priority list of this project, and that getting initial data into it may be impossible without simplification, but perhaps a long-term project goal could be to have users world wide contribute to getting all movie-people's names "marked up" within a model like the above proposes...

Anyway, I'm currently busy with other stuff, but will follow this list and project (there's obviously some overlap with the FOAF project), hopefully being able to contribute something sometime.

Regards,
Morten

From: Bruce D'A. <bd...@fa...> - 2004-01-13 03:15:26

On Jan 12, 2004, at 6:59 PM, Morten Frederiksen wrote:

> In the FOAF project we'll likely soon(tm) be revisiting the subject of
> names,
> and also how to model the various levels of ownership and "levels" of
> a work.

Hmm...this sounds intriguing. Can you explain the last bit? Sounds like FOAF is treading into the realm of bibliographic metadata ;-)

> ... perhaps a long-term project goal could be to have users
> world wide contribute to getting all movie-people's names "marked up"
> within
> a model like the above proposes...

Also interesting. One issue is that of what the library community refers to as authorized data, of which there are two aspects:

1) How to ensure names are properly marked up in the distributed world of something like FOAF?

2) I wonder if there needs to be some communication with the library world on this? There is a lot of effort (and money) invested in the controlled name data that the library community refers to as "authority records." The LoC is even working on an XML Schema for these data. I wonder if there's an argument to be made not only that such data ultimately ought to be served on the web, but that perhaps it might be designed to be RDF compatible?

Bruce