From: <mic...@bt...> - 2005-04-28 08:44:29
I recall doing something very similar to this as a College project, as a
DOS batch file, so I know that this is also possible if you are not
running a *NIX system.

Mike

> -----Original Message-----
> From: Douglas Kline [mailto:kl...@he...]
> Sent: 27 April 2005 19:11
> To: Brockington,MJ,Mike,IQ D
> Cc: htd...@li...
> Subject: Re: [htdig] indexing a directory tree in the file system
>
> > If you want all files indexed within that tree, then you could use some
> > sort of script to dump out a recursive directory listing to a file, then
> > use that file as the source for your Start_URL.
> >
> > If you only want a subset then that technique might not be practical.
> >
> > Mike
> >
> > > -----Original Message-----
> > > From: htd...@li...
> > > [mailto:htd...@li...] On Behalf
> > >
> > > Hi,
> > >
> > > is there a simple incantation of htdig that would allow me to index a
> > > file tree (not via the web server)?
> > > I have a hodgepodge of files, all text, some with and some without
> > > extensions that I would like to have full text search capabilities on.
> > > But I can not figure out how to get them all indexed. Some do, some
> > > don't. (I am using a file:// URL as the starting point).
> > > I am using 3.2.0b6.
> > > Looks like htdig is geared mainly towards web site indexing and I am
> > > trying to bend it too much...
>
> If this is under Unix, you could use the "find" command to write out all the
> files in a tree. If you want to select some of them, you could use options to
> the "find" command which are more plentiful with the Gnu version or you could
> pipe the output to a command like a grep or sed to select some files. The
> argument to ht-Dig should be a URL or list of URL's. So somewhere a URL has to
> be used to address the files. Once you have a URL you can use, the various
> files can be addressed by pathnames starting from the URL. You could convert
> the list of files into a list of URL's with substitutions which could be
> executed in the same pipe from the find command.
>
> If you're not running Unix, there might be some parallel operations you could
> use.
>
> Douglas
>
> ========
> Douglas Kline
> kl...@he...
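
The approach discussed in this thread (dump a recursive listing of the tree, optionally filter it, and turn each path into a URL) can be sketched in a platform-neutral way. The following is only an illustration in Python, not something posted by either author. The function names `file_urls` and `write_url_list` and the `keep` filter are hypothetical, and how the resulting list is handed to ht://Dig as the start URL is left to the local configuration.

```python
import os
from pathlib import Path


def file_urls(root):
    """Yield a file:// URL for every regular file under root.

    Hypothetical helper: it plays the role of "find" plus the
    path-to-URL substitution described in the thread.
    """
    root_path = Path(root).resolve()  # as_uri() needs an absolute path
    for dirpath, dirnames, filenames in os.walk(root_path):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.is_file():
                # as_uri() also percent-encodes spaces and similar characters
                yield path.as_uri()


def write_url_list(root, out_file, keep=None):
    """Write one URL per line to out_file and return how many were written.

    keep is an optional predicate on the URL string; it stands in for
    piping the listing through grep or sed to select a subset.
    """
    urls = [u for u in file_urls(root) if keep is None or keep(u)]
    Path(out_file).write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return len(urls)
```

A call such as `write_url_list("/data/docs", "start_urls.txt")` (both paths are placeholders) would list every file in the tree, with or without an extension. Passing something like `keep=lambda u: not u.endswith(".bak")` selects a subset. Writing the output file outside the indexed tree keeps it from appearing in its own listing on later runs.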