From: Gilles D. <gr...@sc...> - 2002-10-01 22:35:39
According to Olli Aro:
> I just installed Htdig 3.1.6 on my Redhat 7.2. I have successfully indexed
> some static sites and now I am trying to index my dynamic OpenCMS site.
> htdig indexes fine the static parts of the site, but dynamic parts are not
> indexed. Would anyone know, what I am doing wrong? Below is the debug output
> from htdig with -vvv option (htmerge did not produce any output). The site
> is running on Apache (port 80) and all the dynamic pages are requested from
> a Tomcat server over warp.
...
> 1:3:1:http://localhost/opencms/opencms/: Making HTTP request on
> http://localhost/opencms/opencms/
> Header line: HTTP/1.1 302 Moved Temporarily
> Header line: Date: Sat, 28 Sep 2002 06:07:10 GMT
> Header line: Server: Apache/1.3.26 (Unix) mod_webapp/1.2.0-dev PHP/4.2.3
> mod_ssl/2.8.9 OpenSSL/0.9.6b
> Header line: Content-Type: text/html
> Header line: Location: http://localhost/opencms/opencms/index.html
> Header line: Transfer-Encoding: chunked
> Request time: 0 secs
> redirect
> redirect: http://localhost/opencms/opencms/

Here's the problem. OpenCMS handles a "directory URL" with a trailing
slash by providing a redirect to the same URL with index.html appended.
But, htdig treats these two URLs as the same, because with your average
HTTP server they are. Apache and most other HTTP servers, when handling
a request for a directory URL on their own, would return the contents of
index.html or some other default document without doing a redirect.

To prevent indexing the same document twice, htdig strips off the usually
redundant default document name for the directory. This is handled by
http://www.htdig.org/attrs.html#remove_default_doc

In your case, you will need to take index.html, and any other default
name to which OpenCMS would issue a redirect, out of the remove_default_doc
list when indexing OpenCMS content. Either that or figure out a way to
get OpenCMS not to issue redirects in this way.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
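
For illustration, the remove_default_doc change described above might look
something like the fragment below. This is only a sketch: it assumes the
OpenCMS content is indexed with its own configuration file, that index.html
is the only default name OpenCMS redirects to, and that an empty value is
accepted as "strip nothing". Check the attribute documentation linked above
for your htdig version before relying on it.

```
# Sketch of a config fragment used only when indexing the OpenCMS site.
# Assumption: an empty list disables default-document stripping, so
# http://localhost/opencms/opencms/ and .../opencms/index.html are
# treated as two distinct URLs and the redirect target gets fetched.
remove_default_doc:
```

If other default names are still redundant on the same server, list only
those and leave index.html out, rather than emptying the attribute.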