From: Gilles D. <gr...@sc...> - 2001-11-21 00:28:29
According to Olivier Korn:
> Some time ago, Gilles advised me to do the following:
>
> LC_COLLATE=C htmerge -c site1.conf
> LC_COLLATE=C htmerge -c site2.conf
> LC_COLLATE=C htmerge -c site3.conf
> *then*
> LC_COLLATE=C htmerge -c site1.conf -m site2.conf
> LC_COLLATE=C htmerge -c site1.conf -m site3.conf
>
> Neither of us was sure whether it was necessary to do the first pass or not,
> but I still do this nowadays and it is working perfectly.
>
> Note: "LC_COLLATE=C" is there because I use another locale than C (or
> en_US).

On many systems, the en_US locale uses a collating sequence that treats
accented characters as unaccented, just like fr_FR or other iso-8859-1
based locales. Use "LC_COLLATE=C" if there's any chance you have accented
characters in your documents. This is done in the 3.1.6 snapshot.

As it turns out, the first htmerge pass, after htdig, is needed on each
database before you run htmerge -m. The code that handles the merging of
two databases expects that the wordlist has already been purged of the
control records that htdig uses to tell htmerge about documents to update
or delete.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax:    (204)789-3930
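
The two-pass sequence described above can be sketched as a small shell helper. This is only an illustration: `merge_plan` is a hypothetical name, not part of ht://Dig, and the sketch assumes htdig has already been run for every config file. It prints the htmerge commands rather than running them, so the order can be checked first.

```shell
#!/bin/sh
# Sketch only: print the htmerge commands for merging several databases.
# merge_plan is a hypothetical helper, not an ht://Dig tool.
# Assumes htdig has already been run for each config file given.
merge_plan() {
    target=$1

    # Pass 1: a plain htmerge on every database, so that each wordlist
    # is purged of htdig's control records before any merging happens.
    for conf in "$@"; do
        echo "LC_COLLATE=C htmerge -c $conf"
    done

    # Pass 2: merge each remaining database into the first one.
    shift
    for conf in "$@"; do
        echo "LC_COLLATE=C htmerge -c $target -m $conf"
    done
}

merge_plan site1.conf site2.conf site3.conf
```

Once the printed list looks right, piping the output to `sh` would execute it. Config file names containing spaces are not handled in this sketch.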