From: David E. <dec...@gm...> - 2005-04-21 03:39:43
On Wed, Apr 20, 2005 at 08:41:02PM +0200, Wolfgang Pichler wrote:
> David Everly wrote:
> > On Wed, Apr 20, 2005 at 12:17:34PM +0200, Wolfgang Pichler wrote:
> >> Question:
> >>
> >> Will there be an option to merge bibliographic records (i.e. MARC
> >> data, as in book_add, where this data is constituted) from z3950?
> >>
> >> This would mean I could refine MARC data (make it more appropriate
> >> or precise) with a subsequent book_edit from z3950, substituting
> >> (adding) fields via the GUI. I could start with "rather bad"
> >> bibliographic data in 9XX fields and refine it step by step later
> >> from selected z3950 servers.
>
> The problem is:
>
> I have really bad, unstructured data from the old system:
> a) title related
> b) author related
> c) imprint related
> d) some supplemental info (language, perhaps, etc.)

Yes, this is what I had imagined you were facing.

> I will put these into some 9XX fields and configure search in "native"
> zebra, indexed accordingly (word/phrase).
>
> Replacement does not mean associating a different MARC record, but
> editing the existing one appropriately (offline or within Emilda).

I can see a use for both abilities.

> > Wouldn't this be somewhat problematic for you, as you would have to
> > hack the search and display fields to search and display both the
> > 9XX fields and the normal fields within the MARC record?
>
> Yes. But no hack: the user must specify where to search, in the "old
> fuzz" region or in the "new and pretty" one.

Yes, this would be easy to do, I think. One could even use usmarc.abs so
that a single search (for instance, Author) returns records in which
either 100, 700, or your 9XX Author-related field produces a match.

However, the problem I see is one of which fields are then displayed
when one wants to see the list of Titles: does one display 245$a, or
what local authorities have determined to be the 9XX equivalent? And of
course this question recurs for all the other "duplicate" fields one
might set up.
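For what it's worth, here is a rough sketch of what that combined
Author index could look like in Zebra's usmarc.abs. The 984 tag is
purely a hypothetical placeholder for your local 9XX author field, and
the exact syntax should be checked against the .abs files shipped with
your Zebra; this is a sketch from memory, not a tested profile:

```
# usmarc.abs fragment (sketch): index the standard author fields and
# a hypothetical local 984 field under the same Author use attribute,
# both as word (w) and phrase (p).
melm 100    Author:w,Author:p
melm 700    Author:w,Author:p
melm 984    Author:w,Author:p
```

A single Author search would then match a record regardless of which of
the three fields holds the name.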
> > If I understand what you want to do correctly, would a "replace" of
> > the relationship between the book(s) in the mysql database with a
> > different (better) MARC record (with the book_add type search
> > mechanism) address your needs? If so, then it would seem to be a
> > much easier task that could be solved several ways:
> >
> > 1. UPDATE books SET book_control_number = "$new_number"
> >      WHERE book_control_number = "$old_number";
> >
> > 2. UPDATE books SET book_control_number = "$new_number"
> >      WHERE book_id IN ($1, $2, $3 ...);
> >
> > 3. Overwrite the existing MARC record (keeping the same control
> >    number).
>
> 3. is what I intend to do.
>
> > If this DOES meet your needs and only one of the above three methods
> > can be done, I would select changing the book_control_number for
> > selected book_ids, since this offers the greatest flexibility: it
> > provides additional functionality in the case that copies were added
> > to an existing MARC record when in fact a new/different MARC record
> > should have been used. My users have had this situation before, and
> > the solution was to delete and re-add (which seems a little drastic
> > to me).
> >
> > What do you think? Did I understand what you want to do?
>
> Exactly. The re-edit task is the one people do not understand quite
> well :-)
> There is a Perl/Tk megawidget package on CPAN to edit MARC records.
>
> I think I will have to use it if 1.3 has no provisions to merge
> existing records with newly retrieved ones. But I hope at least there
> will be the possibility to remap/filter individual fields for
> individual z3950 sources (and to have other features like
> authentication, etc.) ...
>
> I fear the concept of bibliographic "data quality" in Emilda is
> "acquire once and forget" :-)
> Real life is not quite that neat: there is one person available for
> 14000 items and no way to establish things smoothly.
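As an aside, option 2 above could be sketched like this (here against a
throwaway sqlite database standing in for Emilda's mysql "books" table;
only the book_id and book_control_number columns come from the thread,
the rest of the setup is my invention for illustration):

```python
import sqlite3

def remap_control_number(conn, new_number, book_ids):
    """Attach the given copies (book_ids) to the MARC record
    identified by new_number (option 2 from the thread)."""
    placeholders = ", ".join("?" for _ in book_ids)
    conn.execute(
        f"UPDATE books SET book_control_number = ? "
        f"WHERE book_id IN ({placeholders})",
        [new_number, *book_ids],
    )
    conn.commit()

# Toy stand-in for the Emilda books table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, "
             "book_control_number TEXT)")
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [(1, "old-001"), (2, "old-001"), (3, "old-002")])

# Copies 1 and 2 were attached to the wrong MARC record; re-point them
# at a better one without touching copy 3.
remap_control_number(conn, "new-100", [1, 2])

rows = conn.execute(
    "SELECT book_id, book_control_number FROM books ORDER BY book_id"
).fetchall()
print(rows)  # [(1, 'new-100'), (2, 'new-100'), (3, 'old-002')]
```

This is exactly why I prefer option 2: it also covers the
"copies were added to the wrong record" case without a delete/re-add.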
> It took more than a year now to put intermediary barcodes on all
> items, make an inventory, and print and affix the "new" spine labels
> on all of them. Who could afford another half year to re-acquire
> bibliographic data from scratch?

I understand your situation is difficult (and it is probably not very
unlike that of others of us). Certainly for hand editing, "Edit item
information" is a good start. Perhaps a replace/edit with
search/retrieve/edit capabilities (similar to "add new") may also be
useful.

But to understand the relative importance of adding a merge function, I
would want to understand just a little more. Is it fair and correct to
say that the logic is as follows:

1. Import existing data into 9XX fields.
2. Have beautiful corrected/new data in the regular fields.
3. If only 9XX fields exist, then use them.
4. If the record has been "refined", then non-9XX fields will exist and
   thus deprecate and supersede the 9XX fields.

_If_ you can say that all points are true (especially #4), then you can
probably simplify the local process: import the existing data directly
into the regular places and refine it there (either with the existing
"Edit item information" function, or by finding a better "upstream"
record and editing that as a replacement for the existing record). This
simplified approach would need no 9XX fields (although, as a running
tally for your staff, you may wish to keep one 9XX field to indicate
whether a record has been "cleaned").

If not, it would be interesting to find all possible states of these
records, how they transition from one state to the next, and how a
person in charge of cataloguing would determine what is needed (and
when) for each of these records. For instance, would your vision be to
one day eliminate the need for 9XX fields at your site?

Thanks,
Dave.
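P.S. The precedence rule in points 3-4 could be sketched like this
(fields modeled as a plain tag-to-value dict; the 984/985 tags are
hypothetical local fields, not anything Emilda actually defines):

```python
# Prefer the "refined" standard MARC field; fall back to the rough
# 9XX import only when no refined value exists (points 3-4 above).
FALLBACKS = {
    "245": "984",  # title: hypothetical local 9XX title field
    "100": "985",  # author: hypothetical local 9XX author field
}

def display_value(record, tag):
    """Return the refined field if present, else the legacy 9XX one
    (or None if the record has neither)."""
    if tag in record:
        return record[tag]  # refined data supersedes the 9XX copy
    return record.get(FALLBACKS.get(tag))

rough = {"984": "old messy title", "985": "old messy author"}
refined = {"245": "A Proper Title", "985": "old messy author"}

print(display_value(rough, "245"))    # old messy title
print(display_value(refined, "245"))  # A Proper Title
```

If point #4 really holds, the whole display question from earlier in
the thread reduces to this one lookup per field.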
-- 
Encrypted Mail Preferred:
  Key ID: 8527B9AF
  Key Fingerprint: E1B6 40B6 B73F 695E 0D3B 644E 6427 DD74 8527 B9AF
  Information: http://www.gnupg.org/

ASCII ribbon campaign:
  () against HTML email
  /\ against Microsoft attachments
  Information: http://www.expita.com/nomime.html