From: Andreas K. <ak...@ma...> - 2002-07-30 09:44:01
Hi all,

I have multiple directories with PDF files. Every directory has its own database (for example: sport, news, etc.). If I move some files from one directory to another, the index of the old directory still contains the links to the old files.

How can I remove these old "links" from the old database?

Example:
the directory news contains sport789.pdf and sport790.pdf
I index this directory with htdig
after indexing I notice the error and move the files to the directory sports
I index sports

In the news database I can still find the files sport789.pdf and sport790.pdf, but the links are wrong because the files have been moved.

Complete reindexing is not possible because there are over 30,000 files in every directory.

Any ideas?

Thanks!
Andreas
From: Gilles D. <gr...@sc...> - 2002-07-31 22:12:49
According to Andreas Kunert:
> i have multiple dirs with pdf-files.
> Every dir has an own db. (example: sport, news etc...)
> If i move some files from one dir to another the index of the old dir
> contains the link to the old file.
>
> How i can remove this old "links" from old db.
>
> Example:
> dir news contains sport789.pdf and sport790.pdf
> indexing this thir with htdig
> after indexing i see the error and move the files to dir sports
> indexing sports
>
> in db news i can find the files sport789.pdf and sport790.pdf. but the
> links are wrong, because had moved files.
>
> Complete reindexing is not possible because there are over 30.000 files
> in every dir.

You shouldn't need to completely reindex. An update dig should be
sufficient to find out which URLs are no longer on the server and remove
them from the database. If you can take advantage of the great speedup
provided by the local_urls attribute, an update dig of thirty thousand
files shouldn't take all that long.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
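
For readers following the advice above, here is a rough sketch of what a per-directory configuration using local_urls might look like. It is not taken from the thread: the file name, host name, and all paths are hypothetical placeholders, and the exact behaviour (including which file types can be read locally) depends on the ht://Dig version in use, so treat it as a starting point and check it against the attribute documentation.

```
# news.conf -- hypothetical example; host name and paths are assumptions
database_dir:    /var/lib/htdig/news
start_url:       http://www.example.com/news/
limit_urls_to:   ${start_url}

# Map the URL prefix to the local filesystem so htdig can read the
# files directly instead of fetching each one over HTTP.
local_urls:      http://www.example.com/news/=/var/www/html/news/

# Normally the default already; lets htmerge drop entries whose URLs
# were not found during the dig (e.g. files that have been moved).
remove_bad_urls: true
```

With a configuration along these lines, an update dig would be run as htdig without the -i (initial dig) option, followed by htmerge, both pointed at the same file with -c. Each database (sport, news, and so on) would need its own configuration file and its own update run.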