From: Doran, M. D <do...@ut...> - 2011-10-18 21:54:22
Hi Jonathan,

> It is an unholy mess.

Yeah, I purposely didn't want to get into all that. For instance, MARC Holding (MFHD) records *also* have a 007 Physical Description fixed field. So in the case of bibs with multiple MFHDs attached (for items with different formats), in *theory* you could distinguish between them. In *practice* though, only 0.1% of our library's MFHD records have a 007 field (and I bet most of yours have a similar percentage). Some institutions *require* that all formats be included on one bibliographic record.

> I tell the catalogers, I can only get out what you put
> in. They say "we're just following the rules."

Yes, there's a direct relationship:

    cataloging practice (process) => cataloging records (product) => cataloging system functionality (technology)

I find, though, when talking to catalogers, that many don't understand the significance of the fixed fields.

And when I finish the presentation on "Format Designation in MARC Records: Impact on yadda yadda," I'll start on "Date Designation in MARC Records". An equally unholy mess.

-- Michael

> -----Original Message-----
> From: Jonathan Rochkind [mailto:roc...@jh...]
> Sent: Tuesday, October 18, 2011 4:34 PM
> To: sol...@go...
> Cc: Doran, Michael D; Tuan Nguyen; vuf...@li... Tech
> Subject: Re: [solrmarc-tech] RE: [VuFind-Tech] well this is strange
>
> I haven't completely gotten to the bottom of this either; I use a somewhat complicated heuristic to determine what format facet values to map to in my solr indexing (custom code I wrote here).
>
> But one additional kink to the approach outlined here:
>
> We have some records that have _both_ a print item _and_ a non-print item attached to them. The non-print is often a microfilm or electronic version of the same content; but I think sometimes can instead be a CD or CD-ROM that accompanied the book.
>
> Guess what the 007's look like in this case?
> An 007/008 only for the non-print version, with nothing for the print version, so no way to tell something that is _only_ microfilm from something that is microfilm+print.
>
> It is an unholy mess. I tell the catalogers, I can only get out what you put in. They say "we're just following the rules." If anyone's driving this bus, I don't know who.
>
> > > I think the issue here w/ the print count is that getMediaTypes() looks for explicitly declared media types in the 007 or the "form of item" entry in the 008/006.
> >
> > I may be able to add to David Walker's assessment. I've recently been investigating how format designation in MARC records impacts system functionality, so I've had a chance to take a look at our collection in that regard [1].
> >
> > Any format designation based on the MARC 007 Physical Description fixed field (which appears to be the case in the "Media Types" being discussed [2]) is going to run into limitations. For example, our Library's catalog contains 1,543,293 total bib records, of which 501,885 have a 007 field. Doing the math:
> >
> >     501,885/1,543,293 x 100 = 32.5%.
> >
> > We see that from the *start*, only a third of the bib records even *have* the data used to assign a Media Type. In the screen shot below, we can see the distribution based on the character position 00 value of the 007 field (again this is our collection, although probably typical for an institution dependent on copy cataloging):
> >
> > [screen shot: distribution of 007/00 values in the collection]
> >
> > We see that electronic resources, microforms, maps, and sound and video recordings seem to be fairly well represented, but text (e.g. books) not so much. One explanation for this is that the 007 field is "mandatory if applicable" for National Level Full records if the Media Type is Electronic Resource or Microform, but not for any of the other media types [3].
> >
> > > In which case, we might update the code to default to print if no value is found.
> >
> > I think we can assume that *most* bib records for print items are not 007 encoded. However, I would feel uneasy assuming the converse, that all non-007-encoded bib records are for print items. I will try looking at our collection to get a better sense of how true this is.
> >
> > By implementing next-gen catalog systems such as VuFind, Primo, AquaBrowser, etc., we have escaped some of the limitations of the traditional ILS catalog. However, as long as MARC records are a primary raw material, we are still dependent on the *information* encoded in those records, regardless of how we slice, dice, xml-ify, and index. ;-)
> >
> > -- Michael
> >
> > [1] I'm working on a presentation titled "Format Designation in MARC Records: Impact on Search, Retrieval, Limits and Facets in Voyager and AquaBrowser". Note that I tend to use the word "format" while realizing that "type" and "medium" are also used in the MARC specification and there can be greater or lesser degrees of overlapping meanings.
> >
> > [2] I'm basing that assumption on the level of format detail, e.g. "VideoDVD" and "VideoVHS". Drilling down to the "DVD" and "VHS" level from videorecording requires looking at the 007 character position 04. See: http://www.loc.gov/marc/bibliographic/bd007v.html
> >
> > [3] MARC 21 Format for Bibliographic Data > National Level Full and Minimal Requirements
> >     http://www.loc.gov/marc/bibliographic/nlr/
> >
> > # Michael Doran, Systems Librarian
> > # University of Texas at Arlington
> > # 817-272-5326 office
> > # 817-688-1926 mobile
> > # do...@ut...
> > # http://rocky.uta.edu/doran/
> >
> >
> > > -----Original Message-----
> > > From: sol...@go... [mailto:solrmarc-te...@go...]
> > > On Behalf Of Walker, David
> > > Sent: Thursday, October 13, 2011 2:15 PM
> > > To: sol...@go...; Tuan Nguyen; vufind-te...@li... Tech
> > > Subject: [solrmarc-tech] RE: [VuFind-Tech] well this is strange
> > >
> > > Hi Tuan, et alles,
> > >
> > > I think the issue here w/ the print count is that getMediaTypes() looks for explicitly declared media types in the 007 or the "form of item" entry in the 008/006.
> > >
> > > I suspect -- although I'll have to look at this more closely -- that most MARC records for print items simply don't have any explicit media type declared, under the assumption that the record is for an item in print unless it states otherwise.
> > >
> > > In which case, we might update the code to default to print if no value is found.
> > >
> > > I need to download and try out the latest (Mixin) version of the code myself. I notice Bob made a few (albeit relatively minor) changes.
> > >
> > > --Dave
> > >
> > > -----------------
> > > David Walker
> > > Library Web Services Manager
> > > California State University
> > >
> > >
> > > -----Original Message-----
> > > From: sol...@go... [mailto:sol...@go...]
> > > On Behalf Of Demian Katz
> > > Sent: Thursday, October 13, 2011 11:27 AM
> > > To: Tuan Nguyen; vuf...@li... Tech
> > > Cc: sol...@go...
> > > Subject: [solrmarc-tech] RE: [VuFind-Tech] well this is strange
> > >
> > > I'm copying this to the solrmarc-tech list in case anyone over there has comments. I haven't tried this myself yet, but I'll see if I can find some time in the next week or two to see how our collection breaks down.
> > >
> > > - Demian
> > >
> > > > -----Original Message-----
> > > > From: Tuan Nguyen [mailto:tu...@yo...]
> > > > Sent: Thursday, October 13, 2011 2:02 PM
> > > > To: vuf...@li... Tech
> > > > Subject: [VuFind-Tech] well this is strange
> > > >
> > > > Well, I thought I'd try out the new methods to see how our collection looks in terms of media types and content types. I indexed 2205853 MARC records. The content types look reasonable, but the media types look suspicious, particularly the Print (3666). I'm sure we have more than 3666 books in print. Has anyone tried this out?
> > > >
> > > > This is what I get:
> > > >
> > > >
> > > > Content Types:
> > > > -------------------------
> > > > Book (2043180)
> > > > ComputerFile (378935)
> > > > MusicRecording (39292)
> > > > Periodical (37856)
> > > > Thesis (32405)
> > > > Serial (30109)
> > > > Video (21007)
> > > > MusicalScore (11341)
> > > > Map (4555)
> > > > MotionPicture (3987)
> > > > MapSingle (3722)
> > > > BookSubunit (2657)
> > > > ProjectedMedium (1769)
> > > > SoundRecording (1756)
> > > > BookSeries (513)
> > > > BookComponentPart (497)
> > > > MixedMaterial (421)
> > > > Newspaper (371)
> > > > FlashCard (337)
> > > > Website (324)
> > > > ComputerCombination (289)
> > > > ComputerInteractiveMultimedia (277)
> > > > Atlas (256)
> > > > MapSeries (240)
> > > > Kit (209)
> > > > ComputerDocument (176)
> > > > Realia (159)
> > > > BookCollection (124)
> > > > Database (121)
> > > > ComputerBibliographicData (93)
> > > > ComputerProgram (92)
> > > > SerialIntegratingResource (73)
> > > > MapSerial (72)
> > > > ArtReproduction (38)
> > > > MusicalScoreManuscript (30)
> > > > ComputerNumericData (22)
> > > > ComputerRepresentational (21)
> > > > Image (18)
> > > > Slide (15)
> > > > LooseLeaf (14)
> > > > Model (14)
> > > > Chart (13)
> > > > SerialComponentPart (10)
> > > > ComputerOnlineSystem (9)
> > > > Filmstrip (9)
> > > > PhysicalObject (7)
> > > > Picture (6)
> > > > Toy (6)
> > > > Game (5)
> > > > MapManuscript (4)
> > > >
> > > >
> > > > Media Types:
> > > > --------------------------------
> > > > Electronic (388049)
> > > > Online (357551)
> > > > Microfiche (124710)
> > > > SoundDisc (27719)
> > > > SoundDiscCD (14120)
> > > > SoundDiscLP (13140)
> > > > Microfilm (10085)
> > > > VideoDVD (8635)
> > > > VideoVHS (8515)
> > > > Map (7123)
> > > > SoundRecordingOther (7042)
> > > > Print (3666)
> > > > Filmstrip (1337)
> > > > ComputerOpticalDisc (1329)
> > > > VideoOther (988)
> > > > MapOther (906)
> > > > ComputerOther (733)
> > > > SoundCassette (355)
> > > > Atlas (347)
> > > > VideoLaserdisc (303)
> > > > SensorImage (136)
> > > > MicrofilmReel (91)
> > > > PrintLarge (85)
> > > > VideoUMatic (85)
> > > > Microform (53)
> > > > Slide (51)
> > > > Microopaque (47)
> > > > SoundTapeReel (43)
> > > > PhotomechanicalPrint (39)
> > > > ComputerFloppyDisk (32)
> > > > Braille (24)
> > > > VideoBeta (22)
> > > > MapView (20)
> > > > Picture (17)
> > > > MapSection (14)
> > > > Chart (11)
> > > > ComputerOpticalDiscCartridge (10)
> > > > MapDiagram (10)
> > > > VideoBluRay (10)
> > > > ElectronicDirect (9)
> > > > ImageOther (9)
> > > > ComputerMagnetoOpticalDisc (7)
> > > > FilmOther (7)
> > > > ComputerDisk (4)
> > > > VideoMII (4)
> > > > FlashCard (3)
> > > > GlobeOther (3)
> > > > VideoEIAJ (3)
> > > > FilmCassette (2)
> > > > ComputerTapeCartridge (1)
> > > >
> > > > ------------------------------------------------------------------------------
> > > > All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity and more. Splunk takes this data and makes sense of it. Business sense. IT sense. Common sense.
> > > > http://p.sf.net/sfu/splunk-d2d-oct
> > > > _______________________________________________
> > > > Vufind-tech mailing list
> > > > Vuf...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/vufind-tech
> > >
> > > --
> > > You received this message because you are subscribed to the Google Groups "solrmarc-tech" group.
> > > To post to this group, send email to sol...@go...
> > > To unsubscribe from this group, send email to solrmarc-tec...@go...
> > > For more options, visit this group at http://groups.google.com/group/solrmarc-tech?hl=en