From: Roger F. <rog...@ho...> - 2002-02-28 19:39:38
|
Hi,

My apologies if this is in the archives or docs somewhere, but I've looked for ages with no results. I have a database with about 50,000 documents in it which never change, and I want to add about 6 new documents to it (which I'm going to have to do quite often). How do I add them without the htdig program reindexing everything in there? I have included them in the start.url file, but when I run htdig with the -v option, it shows that it's loading up all the previous servers from the index (the documents are spread across about 1000 different subdomains, so you can see that happening easily).

The other command line options I've found, like -i, don't seem to prevent the database from reindexing old sites. Any clues?

Rog.
From: Gabriele B. <an...@ti...> - 2002-02-28 20:44:26
|
> The other command line options I've found, like -i, don't seem to prevent
> the database from reindexing old sites. Any clues?

Ciao Roger,

I don't know if this will help you, because I've never done it myself, but you could probably index just these 6 new documents and merge the new db with the previous one. Just an idea... The others can certainly give you better answers.

Bye,
-Gabriele

--
Gabriele Bartolini - Web Programmer
Current Location: Prato, Tuscany, Italy
an...@ti... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
From: Gilles D. <gr...@sc...> - 2002-02-28 21:59:28
|
According to Gabriele Bartolini:
> > The other command line options I've found, like -i, don't seem to prevent
> > the database from reindexing old sites. Any clues?
>
> Ciao Roger,
>
> I don't know if this will help you, because I've never done it myself, but
> you could probably index just these 6 new documents and merge the new db
> with the previous one. Just an idea... The others can certainly give
> you better answers.

Well, you certainly don't want to use -i, because that removes the existing database and starts over from scratch. Gabriele, your suggestion had been the standard advice we gave for this in the past.

However, if you're running 3.1.6 or a recent 3.2.0b4 snapshot, you can use the -m option to htdig to do "minimal" digging. You need to provide a file name argument after the -m, and that file must contain a list of URLs to be indexed. htdig will index only those URLs, using a hop count of 0 so it doesn't follow links. That would be the quickest way to update the index.

In 3.1.6, you must run htmerge after running htdig, which will still take a while for a sizable index, but overall it's still faster than doing a full update run.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
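
For reference, the -m workflow described in this message could be sketched roughly as follows. This is only an illustration under stated assumptions: the two URLs, the list file name, and the DRY_RUN variable with its run helper are hypothetical and not part of ht://Dig, and the default configuration file is assumed (no -c given). By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: add a few new documents with htdig's -m ("minimal") option
# instead of doing a full update run.

# Hypothetical location for the URL list; any readable file will do.
url_list="${TMPDIR:-/tmp}/new_urls.txt"

# One URL per line. These URLs are placeholders, not real sites.
cat > "$url_list" <<'EOF'
http://sub1.example.com/new-page-1.html
http://sub2.example.com/new-page-2.html
EOF

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Index only the listed URLs (per the message, hop count 0: no links followed).
run htdig -v -m "$url_list"

# Per the message, 3.1.6 needs an htmerge run after htdig.
run htmerge -v
```

Run it with DRY_RUN=0 to actually invoke htdig and htmerge. Note that -i is deliberately absent, since it would discard the existing database.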
From: System A. <ad...@tg...> - 2002-03-01 09:32:11
|
On Thu, 28 Feb 2002, Gilles Detillieux wrote:
> htdig to do "minimal" digging. You need to provide a file name argument
> after the -m, and that file must contain a list of URLs to be indexed.
> htdig will index only those URLs, using a hop count of 0 so it doesn't
> follow links.

Can htdig be set to index files which are NOT URLs?

Specifically, I have a directory with about 30,000 files, each identified by a 3-8 digit number, that I need to index by name (the number is the name). These are main text bodies, primarily without markup, used to generate HTML pages. But I do need to index them for quick searching and for presentation through a script as HTML documents.

I looked at this a year ago, but it seemed that filename extensions were needed.

Jim

--
Jim Britain                                              ad...@tg...
Tabor Griffin Communications System Operations           ji...@tg...
8445 Camino Santa Fe Suite 202                           (858) 625-0070
San Diego Ca, 92121