From: Gilles D. <gr...@sc...> - 2002-06-11 19:51:53
According to Neal Richter:
> Is there a method somewhere to remove a document from an index?
>
> I thought I found one but can't locate it again.

htpurge is the tool that does this. In recent 3.2.0b4 snapshots, anyway, there is a -u option to htpurge that takes a single URL argument. You can also use "htpurge -" and feed it a list of URLs via stdin.

> This will hopefully be a method added to the libhtdig API, and it is
> useful in the context of large indexes, where rebuilding a huge index to
> remove 1 document is inefficient.

htpurge has two main functions, purgeDocs and purgeWords. The former is called first, given an optional Dictionary object of URLs to be purged. It traverses db.docdb looking for records to be purged for a number of reasons (not indexed, obsolete entry, not found, no excerpt, not retrieved), as well as those explicitly requested for purging, if any. It then produces a list of DocIDs, which purgeWords uses to get rid of any words in the word database that are associated with these document IDs.

It should be reasonably easy to extract from htpurge.cc the code you need for the API function for this operation. purgeDocs is pretty easy to follow, and the actual work of deleting records is pretty straightforward. Rather than traversing the whole database looking for a URL, though, it may be more expedient to look up the ID for that URL in the index. purgeWords is harder to follow, because it uses the rather cryptic mifluz database "walking" API, but you can likely use that code as-is.

> As a note, I will try to submit a patch and set of makefiles to do
> a native WIN32 port of HtDig & libhtdig this week.
>
> The way this will work is that, using a WIN32 system with cygwin
> and MSVC installed, a separate set of makefiles is used to build a fully
> native WIN32 set of binaries (no cygwin dll needed).
>
> Cygwin is needed since these makefiles use GNU make instead of Windows
> make, but MSVC's command line compiler is used.
>
> A ./configure is not necessary, but a setup script will
> replace/modify any of the existing db/db_config.h & include/htconfig.h.
>
> If you have any feedback on how you would like this to work better, let me
> know!
>
> There are free native Windows compilers out there. Borland &
> Watcom have them for download. Watcom's is Open Source as well.
>
> http://www.openwatcom.org

Thanks for all your work on that!

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
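The two-pass purge described in the reply (purgeDocs collects the DocIDs to drop, then purgeWords removes the word entries that point at them), along with the suggested direct URL-to-ID lookup, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not ht://Dig code: plain dicts stand in for db.docdb and the mifluz word database, and DocRecord, purge_docs, purge_words, purge_one_url and url_index are all hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for ht://Dig's databases (NOT the real
# classes or file formats): doc_db maps DocID -> DocRecord, word_db maps
# word -> set of DocIDs, url_index maps URL -> DocID.


@dataclass
class DocRecord:
    url: str
    indexed: bool = True  # stand-in for the various "should be purged" states


def purge_docs(doc_db, urls_to_purge=frozenset()):
    """Pass 1 (cf. purgeDocs): scan every document record, drop those that
    are unwanted or explicitly requested, and return the discarded DocIDs."""
    discarded = set()
    for doc_id, rec in list(doc_db.items()):
        if not rec.indexed or rec.url in urls_to_purge:
            discarded.add(doc_id)
            del doc_db[doc_id]
    return discarded


def purge_words(word_db, discarded):
    """Pass 2 (cf. purgeWords): remove every word posting that refers to a
    discarded DocID, dropping words left with no postings.
    Returns the number of postings removed."""
    removed = 0
    for word in list(word_db):
        postings = word_db[word]
        hits = postings & discarded
        removed += len(hits)
        postings -= hits  # in-place update of the set stored in word_db
        if not postings:
            del word_db[word]
    return removed


def purge_one_url(doc_db, url_index, url):
    """Single-URL variant: look the DocID up in a URL -> DocID index instead
    of scanning the whole document database. Returns the discarded DocIDs."""
    doc_id = url_index.pop(url, None)
    if doc_id is None:
        return set()
    doc_db.pop(doc_id, None)
    return {doc_id}


if __name__ == "__main__":
    doc_db = {1: DocRecord("http://example.com/a.html"),
              2: DocRecord("http://example.com/b.html"),
              3: DocRecord("http://example.com/c.html", indexed=False)}
    word_db = {"apple": {1, 2}, "banana": {2}, "cherry": {3}}
    gone = purge_docs(doc_db, {"http://example.com/b.html"})
    print(sorted(gone), purge_words(word_db, gone), word_db)
    # → [2, 3] 3 {'apple': {1}}
```

Whichever way the DocIDs are found (full scan or indexed lookup), the result feeds the same word-purging pass, which is why the single-URL case only needs a cheaper first step.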