From: <ian...@di...> - 2004-08-28 00:52:47
|
Just did my first run of rundig and noticed that the search results were bringing up duplicates like: http://www.digitalhit.com/cr/reneezellweger http://www.digitalhit.com/cr/reneezellweger/ and http://www.digitalhit.com/academy/73/index.shtml http://www.digitalhit.com/academy/73/ Any way to eliminate or weed those out? Thanks. |
From: Jim <li...@yg...> - 2004-08-28 03:43:38
|
On Fri, 27 Aug 2004 ian...@di... wrote: > Just did my first run of rundig noticed that the search results were bring > up duplicates like: > > http://www.digitalhit.com/cr/reneezellweger > http://www.digitalhit.com/cr/reneezellweger/ > > and > > http://www.digitalhit.com/academy/73/index.shtml > http://www.digitalhit.com/academy/73/ > > Anyway to eliminate or weed those out? For the second case, take a look at the following. http://www.htdig.org/attrs.html#remove_default_doc This attribute allows you to specify that index.shtml is to be treated as a default document. Once you do that (and reindex) the index.shtml should be stripped before making the request. That should eliminate the duplication. For the first case, I am not certain what is happening. I suspect there is an issue with the way the web server is configured. Typically a web server will respond with some sort of "moved" status code (e.g. 301) and a pointer to a new location when a URL ending with a directory name is provided without a trailing slash. For example, a request for http://www.digitalhit.com/cr/reneezellweger should result in a moved status code and a new location of http://www.digitalhit.com/cr/reneezellweger/ htdig will drop the first due to the returned status code and then try to request the second. If in your case both are being indexed, the most likely cause is that the web server is configured in a non-standard way (e.g. special rewrite rules) and is returning the same document for both cases. Jim |
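A minimal sketch of the idea behind remove_default_doc, for the second case above. This is not htdig's actual code; the function name strip_default_doc and its default_docs parameter are made up for illustration. It only shows why treating index.shtml as a default document makes the two URLs collapse into one.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_default_doc(url, default_docs=("index.shtml",)):
    """Drop a trailing default-document name so both URL forms compare equal.

    Hypothetical helper: it mimics the effect described for
    remove_default_doc, not the way htdig implements it.
    """
    parts = urlsplit(url)
    # Split the path into its directory part and the final file name.
    directory, _, filename = parts.path.rpartition("/")
    if filename in default_docs:
        parts = parts._replace(path=directory + "/")
    return urlunsplit(parts)
```

With this, http://www.digitalhit.com/academy/73/index.shtml and http://www.digitalhit.com/academy/73/ normalize to the same string. The /cr/reneezellweger pair is left untouched, which matches the point that the first case needs a different fix.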
From: <ian...@di...> - 2004-08-28 05:40:45
|
Jim said: > For the second case, take a look at the following. > http://www.htdig.org/attrs.html#remove_default_doc Done. I'll see how that works. > For the first case, I am not certain what is happening. I suspect there is > an issue with the way the web server is configured. Typically a web > server will respond with some sort of "moved" status code (e.g. 301) and a > pointer to a new location when a URL ending with a directory name is > provided without a trailing slash. For example, a request for > > http://www.digitalhit.com/cr/reneezellweger I don't know if this helps, but a) we're not using mod_rewrite and b) 'cr' is actually a php file that's taking 'reneezellweger' as a database variable. I'll try another run and see how it goes. Right now I'm trying to solve why I'm getting so many "no server running" errors. Fun and games... |
From: Jim <li...@yg...> - 2004-08-28 06:11:07
|
On Sat, 28 Aug 2004 ian...@di... wrote: >> For the first case, I am not certain what is happening. I suspect there is >> an issue with the way the web server is configured. Typically a web >> server will respond with some sort of "moved" status code (e.g. 301) and a >> pointer to a new location when a URL ending with a directory name is >> provided without a trailing slash. For example, a request for >> >> http://www.digitalhit.com/cr/reneezellweger > > I don't know if this helps, but a) we're not using mod_rewrite and b) 'cr' > is actually a php file that's taking 'reneezellweger' as a database > variable. If this is the case, you might need to consider using mod_rewrite to solve the problem. In the above case, I think a typical web server would look for a file named reneezellweger in a /cr/ directory and then send a moved status if the file doesn't exist, all before even considering the PHP side of things. Perhaps there is some more clever way to handle this case, but rewriting the URLs with the 'cr' might solve the problem with duplication. > I'll try another run and see how it goes. Right now I'm trying to solve > why I'm getting so many "no server running" errors. You might want to consider turning off the ignore_dead_servers attribute. By default, after htdig fails to contact a server a single time, it assumes the server is dead and skips any further URLs associated with that server. For more on this attribute see the following. http://www.htdig.org/attrs.html#ignore_dead_servers This won't solve the problem of the underlying failure, but it might at least eliminate some of the noise and let you focus on the cases where htdig truly fails to find the server. Jim |
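The dead-server behaviour described above can be sketched roughly as follows. This is only an illustration of the logic, not htdig's code: crawl, fetch, and the three result lists are hypothetical names, and fetch is assumed to raise ConnectionError when a server cannot be reached.

```python
from urllib.parse import urlsplit


def crawl(urls, fetch, ignore_dead_servers=True):
    """Return (fetched, skipped, failed) lists of URLs.

    `fetch(url)` is a hypothetical callable that raises ConnectionError
    when the server cannot be contacted.
    """
    dead_hosts = set()
    fetched, skipped, failed = [], [], []
    for url in urls:
        host = urlsplit(url).netloc
        # With the option on, one failure writes off the whole server.
        if ignore_dead_servers and host in dead_hosts:
            skipped.append(url)
            continue
        try:
            fetch(url)
            fetched.append(url)
        except ConnectionError:
            failed.append(url)
            dead_hosts.add(host)
    return fetched, skipped, failed
```

In this sketch, turning the option off means every URL on a failing server is attempted again instead of being skipped after the first failure, so a server that was only briefly unreachable still gets indexed.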
From: Duke H. <dx...@lo...> - 2004-08-30 15:55:33
|
You might use global search and replace in your site creation software to replace instances of "http://.../cr/reneezellweger" with "http://.../cr/reneezellweger/". If the site is dynamic, you might be able to change entries in your DB in a similar way. HTH, -- Duke Jim wrote: > On Sat, 28 Aug 2004 ian...@di... wrote: > >>> For the first case, I am not certain what is happening. I suspect >>> there is >>> an issue with the way the web server is configured. Typically a web >>> server will respond with some sort of "moved" status code (e.g. 301) >>> and a >>> pointer to a new location when a URL ending with a directory name is >>> provided without a trailing slash. For example, a request for >>> >>> http://www.digitalhit.com/cr/reneezellweger >> >> >> I don't know if this helps, but a) we're not using mod_rewrite and b) >> 'cr' >> is actually a php file that's taking 'reneezellweger' as a database >> variable. > > > If this is the case, you might need to consider using mod_rewrite to > solve > the problem. In the above case, I think a typical web server would look > for a file named reneezellweger in a /cr/ directory and then send a moved > status if the file doesn't exist, all before even considering the PHP > side > of things. Perhaps there is some more clever way to handle this case, but > rewriting the URLs with the 'cr' might solve the problem with > duplication. > >> I'll try another run and see how it goes. Right now I'm trying to solve >> why I'm getting so many "no server running" errors. > > > You might want to consider turning off the ignore_dead_servers attribute. > By default, once htdig fails to contact a server once, it assumes the > server is dead and skips any further URLs associated with that server. > For > more on this attribute see the following. 
> > http://www.htdig.org/attrs.html#ignore_dead_servers > > This won't solve the problem of the underlying failure, but it might > at least eliminate some of the noise and let you focus on the cases > where htdig truly fails to find the server. > > Jim > > > _______________________________________________ > ht://Dig general mailing list: <htd...@li...> > ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html > List information (subscribe/unsubscribe, etc.) > https://lists.sourceforge.net/lists/listinfo/htdig-general > > |
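Duke's search-and-replace suggestion could look something like the sketch below, assuming the links live in HTML as href attributes and that names under /cr/ consist only of letters, digits, underscores, and hyphens. Both assumptions, and the names CR_LINK and add_trailing_slash, are for illustration only.

```python
import re

# Matches href="...(/cr/<name>)" where the name is NOT already followed by a
# slash. The /cr/ prefix and the allowed name characters are assumptions.
CR_LINK = re.compile(r'(href="(?:https?://[^/"]+)?/cr/[\w-]+)(?=")')


def add_trailing_slash(html):
    """Append a trailing slash to /cr/<name> links that lack one."""
    return CR_LINK.sub(r"\1/", html)
```

Links that already end in a slash are left alone, so running the replacement twice is harmless.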
From: Jennifer Z. <we...@ca...> - 2004-08-30 19:22:22
|
Hello. I have a need to have 2 different conf files and 2 separate databases with my ht://Dig 3.1.6 install to keep my public and private sections of my web site separate. In my first conf file, I have defined the database_dir: as /Volumes/www/htdig/db/public. I then run rundig and define the public conf file: /Volumes/www/htdig/bin/rundig -c /Volumes/www/htdig/conf/public_htdig.conf. This works as expected, all the db files go into the public folder defined in my public.conf file. When I go to run rundig to build the databases for the private side of the web site, with the database_dir: /Volumes/www/htdig/db/private, all the db files go into /Volumes/www/htdig/db/ and not /Volumes/www/htdig/db/private like specified in my conf file. Any thoughts why this would work with the first and not the second? Thanks, Jen |
From: Jim <li...@yg...> - 2004-08-31 04:47:31
|
On Mon, 30 Aug 2004, Jennifer Zelazny wrote: > Hello. I have a need to have 2 different conf files and 2 separate databases > with my ht://Dig 3.1.6 install to keep my public and private sections of my > web site separate. In my first conf file, I have defined the database_dir: > as /Volumes/www/htdig/db/public. > > I then run rundig and define the public conf file: > /Volumes/www/htdig/bin/rundig -c /Volumes/www/htdig/conf/public_htdig.conf. > This works as expected, all the db files go into the public folder defined in > my public.conf file. > > When I go to run rundig to build the databases for the private side of the > web site, with the database_dir: /Volumes/www/htdig/db/private, all the db > files go into /Volumes/www/htdig/db/ and not /Volumes/www/htdig/db/private > like specified in my conf file. > > Any thoughts why this would work with the first and not the second? I would start by removing the config file you are using for the private part of the site and replacing it with a copy of the public_htdig.conf file that appears to be working. Then carefully edit the new copy, making just those changes necessary for the private site. I suggest this because it sounds like your config file for the private site might have somehow been corrupted (e.g. a non-printing character or syntax error that is causing problems with the parsing of the file). Also double and triple check your command line to ensure that you are passing the correct config file. Are you using the same copy of rundig for both cases? If not, carefully check the scripts to make sure that they are doing what you expect. Jim |
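One way to look for the kind of non-printing character mentioned above is a small scan like the following. It is a sketch under the assumption that a healthy config file contains only printable ASCII and tabs; the function name find_suspect_chars is made up, and it knows nothing about ht://Dig's config syntax.

```python
def find_suspect_chars(text):
    """Return (line_number, column, repr(char)) for each suspicious character.

    Anything outside printable ASCII (other than a tab) is reported.
    Lines are split on newline only, so a stray carriage return is reported too.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch != "\t" and not (32 <= ord(ch) <= 126):
                hits.append((line_no, col, repr(ch)))
    return hits
```

To check a file, read it with open(path, encoding="latin-1", newline="") so that every byte decodes and line endings are left as they are on disk, then pass the text to the function. An empty list means nothing suspicious was found by this particular test, not that the file is free of syntax errors.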
From: Jennifer Z. <we...@ca...> - 2004-08-31 17:36:06
|
Sure enough, the conf file (second one) was corrupted. Thanks! Jen On Aug 30, 2004, at 11:47 PM, Jim wrote: > On Mon, 30 Aug 2004, Jennifer Zelazny wrote: > >> Hello. I have a need to have 2 different conf files and 2 separate >> databases with my ht://Dig 3.1.6 install to keep my public and >> private sections of my web site separate. In my first conf file, I >> have defined the database_dir: as /Volumes/www/htdig/db/public. >> >> I then run rundig and define the public conf file: >> /Volumes/www/htdig/bin/rundig -c >> /Volumes/www/htdig/conf/public_htdig.conf. This works as expected, >> all the db files go into the public folder defined in my public.conf >> file. >> >> When I go to run rundig to build the databases for the private side >> of the web site, with the database_dir: >> /Volumes/www/htdig/db/private, all the db files go into >> /Volumes/www/htdig/db/ and not /Volumes/www/htdig/db/private like >> specified in my conf file. >> >> Any thoughts why this would work with the first and not the second? > > I would start by removing the config file you are using for the > private part of the site and replacing it with the public_htdig.conf > file that appears to be working. Then carefully edit the new copy, > making just those changes necessary for the private site. I suggest > this because it sounds like your config file for the private site > might have somehow been corrupted (e.g. a non-printing character or > syntax error that is causing > problems with the parsing of the file). > > Also double and triple check your command line to ensure that you are > passing the correct config file. > > Are you using the same copy of rundig for both cases? If not, carefully > check the scripts to make sure that they are doing what you expect. > > Jim |
From: <ian...@di...> - 2004-08-31 06:08:14
|
Just finishing up the search before I roll it out. Changed all my templates to make them xhtml compliant and laid out with CSS. Just noticed that there's a <font size="-1"> before the modified date and the file size. Checked my templates and it's not there. Is there a <font size="-1"> hard-coded into htdig somewhere? |
From: Jim <li...@yg...> - 2004-08-31 07:51:32
|
On Tue, 31 Aug 2004 ian...@di... wrote: > Change all my templates to make them xhtml compliant and laid out with CSS. > > Just noticed that there's a <font size="-1"> before the modified date and > the file size. Checked my templates and it's not there. Is there a <font > size="-1"> hard-coded into htdig somewhere? This font tag is in the long.html template. However that template is not used unless you enable its use in the configuration file. Look for the template_map and template_name attributes in your config file. There is a little blurb just above the attributes that explains their use. In short, you need to uncomment them and then change the template files according to your needs. Jim |
From: <ian...@di...> - 2004-09-01 17:06:10
|
Jim said: > This font tag is in the long.html template. However that template is not Thanks again for the info. I should stop working on this stuff when I'm tired. :-) |
From: <ian...@di...> - 2004-09-02 15:18:46
|
I'm trying to make my search results xhtml 1.0 compliant and one of the errors is that the option tags aren't closed with '/>'. Same thing for the stars that go at the end of the document titles. I looked at the templates and I see that these are generated by $(METHOD), $(FORMAT), $(SORT) and $(STARSLEFT). a) how/where do I edit these macros? b) future versions may want to have an xhtml ./configure flag. Thanks. |
From: Jim <li...@yg...> - 2004-09-04 00:20:28
|
On Thu, 2 Sep 2004 ian...@di... wrote: > I'm trying to make my search results xhtml 1.0 compliant and one of the > errors is that the options tags aren't closed with '/>' Same thing for the > stars that go to the end of the document titles. > > I looked at the templates and I see that these are generated by $(METHOD), > $(FORMAT), $(SORT) and $(STARSLEFT). > > a) how/where do I edit these macros? To the best of my knowledge, there is no easy way to modify this part of the output. You would need to make the changes at the source code level and recompile. If you want to give this a try, take a look at Display.cc in the htsearch directory. If you are not up to hacking C++ code, but still want to try the change, I might be able to throw together a patch sometime in the next few days. > b) future versions may want to have an xhtml ./configure flag. You might consider filing a bug report requesting XHTML compliance. I wouldn't mind seeing it added, and I recall others asking for the same in the past. Jim |
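Short of patching Display.cc, a different workaround would be to post-process the generated HTML before it reaches the browser. The sketch below is not what Jim proposes and is not part of ht://Dig. It assumes the stars are emitted as plain <img ...> tags, and it only handles those, not the option tags. The names IMG_TAG and self_close_img_tags are hypothetical.

```python
import re

# Matches an <img ...> tag whether or not it is already self-closed.
# Assumes no attribute value contains a literal '>' character.
IMG_TAG = re.compile(r"<img\b([^>]*?)\s*/?>", re.IGNORECASE)


def self_close_img_tags(html):
    """Rewrite every <img ...> tag in XHTML self-closing form."""
    return IMG_TAG.sub(r"<img\1 />", html)
```

Tags that are already self-closed come out unchanged, so the filter can be applied more than once without harm. Where it would be hooked in (for example a wrapper script around htsearch) is left open here.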