From: Gilles D. <gr...@sc...> - 2002-11-07 19:01:37
According to Dave Parfitt:
> I want people to be able to create pdf files and save them to a
> directory on our intranet.
> I don't want to make a hyperlink to each new pdf file on a webpage in
> order for that new pdf file to be indexed. Can this be done?

It can be done pretty easily if you have shell access to the web server
and can run a "find" command to get a list of all your PDF files, as
explained in http://www.htdig.org/FAQ.html#q5.25

Here is a more concrete example for PDF files. On my system, Apache's
DocumentRoot is set to /home/httpd/html, so I can use this command:

  find /home/httpd/html -type f -name '*.[Pp][Dd][Ff]' -print |
    sed -e 's|/home/httpd/html/|http://www.scrc.umanitoba.ca/|' \
    > /etc/htdig/pdflist.txt

to build the list of URLs of all PDF files on my server (*.pdf, *.PDF,
or any mix of upper and lower case). I can then put this attribute
setting in my htdig.conf to use that list (the backquotes tell htdig to
use the contents of the named file as the attribute's value):

  start_url: `/etc/htdig/pdflist.txt`

Just change the directory name in the find and sed commands, and the URL
in the sed command, to match your server. You can use any file name you
like for the PDF list.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
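The find/sed pipeline above can be wrapped in a small script so that the document root and base URL become arguments, and the list can be rebuilt (for example, just before each htdig run) so newly saved PDFs are picked up without anyone adding links. This is a minimal sketch, not something shipped with ht://Dig: the script name mkpdflist.sh and the function make_pdf_list are hypothetical, a trailing slash on the document root is stripped, and the paths and URL are assumed to contain no sed special characters such as '|', '&' or '['.

```shell
#!/bin/sh
# Hypothetical helper (mkpdflist.sh): wraps the find/sed pipeline so the
# document root and base URL are parameters instead of being hard-coded.
# Assumption: neither argument contains sed special characters such as
# '|' (the delimiter used below), '&' or '['.

make_pdf_list() {
    docroot=${1%/}   # filesystem path of the web root, one trailing slash removed
    baseurl=$2       # URL matching that root, ending in '/'
    # List regular files ending in .pdf in any mix of case, then replace the
    # leading document-root path with the base URL to turn paths into URLs.
    find "$docroot" -type f -name '*.[Pp][Dd][Ff]' -print |
        sed -e "s|^$docroot/|$baseurl|"
}

# Usage: sh mkpdflist.sh DOCROOT BASEURL OUTFILE
# The list is only written when all three arguments are given.
if [ $# -eq 3 ]; then
    make_pdf_list "$1" "$2" > "$3"
fi
```

For the setup described above, it would be run as

  sh mkpdflist.sh /home/httpd/html http://www.scrc.umanitoba.ca/ /etc/htdig/pdflist.txt

leaving the start_url setting in htdig.conf unchanged.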