From: Paul M. R. <pa...@go...> - 2004-03-06 21:46:13
Greetings,

Has anyone else run across this? Somebody somewhere on our website made a link like...

  http://www.goshen.edu./foo

(Notice the period after the domain name.)

Now I'm getting duplicates of many pages in my htdig (3.2...) index. E.g.

  http://www.goshen.edu./foo
and
  http://www.goshen.edu/foo

look like two different pages.

I was able to axe the duplicates with this configuration file directive (yeah, it would be more elegant to do a URL rewrite...):

  exclude_urls: goshen.edu./

But this got me thinking.... I thought I should/could really get rid of these with (Apache) webserver rewrite rules, but after a bit of trying, I could not get Apache's mod_rewrite to treat goshen.edu./ and goshen.edu/ in any distinguishable way.

Now I'm wondering if this is a deeper problem. This URL:

  http://httpd.apache.org/docs-2.0/mod/mod_proxy.html

seems to indicate that a domain with a trailing period might in principle be considered equivalent to one without. (Search the page for "trailing period".)

If so, it might be nice if htdig could automagically treat URLs like mydomain.com. and mydomain.com as the same.

Those other guys, Google et al., must surely compensate for this sort of thing?

But perhaps someone who knows more about DNS than I do could straighten me out?

-Paul

--
Paul Meyer Reimer    paulmr at goshen.edu
Goshen College
Goshen, IN 46526
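
For illustration only, the kind of normalization wished for above (treating mydomain.com. and mydomain.com as the same host) could be sketched like this in Python. This is not htdig code and says nothing about how htdig or Apache handle such URLs; the function name strip_trailing_dot is hypothetical, and the sketch assumes that a single trailing dot on the hostname is the only difference between the duplicate URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_dot(url):
    """Return url with one trailing dot removed from its hostname.

    Hypothetical helper for illustration; URLs whose hostname has no
    trailing dot are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname  # lowercased host without port/userinfo, or None
    if not host or not host.endswith("."):
        return url

    # Rebuild the network location, keeping any userinfo and port.
    netloc = host[:-1]
    if parts.port is not None:
        netloc += ":%d" % parts.port
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = userinfo + "@" + netloc

    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # Both spellings from the message collapse to the same URL.
    for u in ("http://www.goshen.edu./foo", "http://www.goshen.edu/foo"):
        print(strip_trailing_dot(u))
```

Run before indexing (or over a list of crawled URLs), something like this would map both spellings to one key, so a duplicate check only has to compare the normalized strings.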