From: Franck C. <fra...@rd...> - 2003-05-22 12:11:12
|
Greetings,

I use htdig 3.1.6 on Mandrake 9.0. When I launch rundig manually (rundig -v) it works, and I can run a search in a browser. But since I put rundig in a cron job, I can no longer run a search in a browser. Here is the error message:

ht://Dig error
htsearch detected an error. Please report this to the webmaster of this site. The error message is:
Unable to read word database file
Did you run htmerge?

Can you help me solve the problem?

Thanks
Franck |
From: Franck C. <fra...@rd...> - 2003-05-22 07:43:55
|
Greetings,

I use htdig 3.1.6 on Mandrake 9.0. When I launch rundig manually (rundig -v) it works, and I can run a search in a browser. But I put rundig in a cron job, and now I can't run a search in a browser. Here is the error message:

ht://Dig error
htsearch detected an error. Please report this to the webmaster of this site. The error message is:
Unable to read word database file
Did you run htmerge?

Can you help solve the problem?

Thanks
Franck |
From: Robert I. <rob...@vo...> - 2005-03-23 15:44:52
|
I have htdig 3.1.6 on a Sun Cobalt RaQ550.

I ran ./rundig after some web page changes, and got this:

>1159:1159:3:http://www.volvoclub.org.uk/workshop/parts/PartCatalogue1962-71-Index.pdf:
>! CORE DUMPED
> size = 101502

I have never seen CORE DUMPED before. Can you explain why this happened?

Thanks
Bob

Robert Isaac
Director & Internet Manager, Volvo Owners Club
All email messages are virus scanned before being sent
PLEASE INCLUDE ALL PREVIOUS MESSAGE TEXT WITH REPLY
Club web site: www.volvoclub.org.uk
Also visit: www.trisaac.com for John Wayne Collectors Plates
Roil Products
Neways International |
From: Jim <li...@yg...> - 2005-03-24 06:20:41
|
On Wed, 23 Mar 2005, Robert Isaac wrote:

> I have htdig 3.1.6 on a Sun Cobalt RaQ550.
>
> I ran ./rundig after some web page changes, and got this:
>
>> 1159:1159:3:http://www.volvoclub.org.uk/workshop/parts/PartCatalogue1962-71-Index.pdf:
>> ! CORE DUMPED
>> size = 101502
>
> I have never seen CORE DUMPED before, can you explain why this happened.

In short, "core dumped" means the program crashed. Most likely a 'core' file was written somewhere, containing details about the program state at the time of the crash. This is not a human-readable file. You would need an appropriate debugger, and some experience interpreting the debugger output, to find out anything useful.

As for the reason for the crash, there is no way to know based on the available information. What is your current max_doc_size attribute set to? PDFs frequently cause problems if their size exceeds the limit set by max_doc_size. Are there any possible issues with the amount of RAM and free disk space available during indexing? Is the problem reproducible? If not, you might save yourself some grief by not worrying about it unless it happens again. If it is reproducible, you might try building a fresh set of databases to rule out any existing corruption.

Jim |
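Jim's max_doc_size question can be checked mechanically. The following is a minimal sketch, assuming the configuration file uses plain "name: value" lines with "#" comments; it ignores continuation lines and includes. The function names and the sample value are hypothetical and are not taken from Robert's configuration.

```python
def read_attribute(conf_text, name):
    """Return the value of attribute `name` from 'name: value' lines, or None."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, sep, value = line.partition(":")
        if sep and key.strip() == name:
            return value.strip()
    return None


def exceeds_max_doc_size(conf_text, doc_size):
    """True or False if max_doc_size is set, None if the attribute is absent."""
    value = read_attribute(conf_text, "max_doc_size")
    if value is None:
        return None
    return doc_size > int(value)


# Hypothetical excerpt; the value below is illustrative only.
SAMPLE_CONF = """\
# illustrative values only
max_doc_size: 100000
"""

# 101502 is the size reported next to the crashing PDF in the rundig output.
print(exceeds_max_doc_size(SAMPLE_CONF, 101502))
```

If the check reports that the document is larger than the configured limit, that only makes the PDF a suspect; as Jim says, it does not by itself explain the crash.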
From: Robert I. <rob...@vo...> - 2005-03-24 12:08:52
|
At 06:20 24/03/2005, you wrote:
>Is the problem reproducible? If not you might save yourself some grief
>by just not worrying about it unless it happens again.

Thanks Jim. I ran ./rundig again and all was OK. It must have been a temporary glitch.

Does anyone have any idea where an error file might be located and what it would be named?

Bob |
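On Bob's question about locating the file: a core file is conventionally written to the crashed program's working directory under the name "core", sometimes with a suffix such as the process ID, though the exact name and location depend on the system's settings. A minimal sketch, assuming that naming convention, which searches a directory tree for candidates (the function name is hypothetical):

```python
import os
import tempfile


def find_core_files(root):
    """Return sorted paths under `root` named 'core' or starting with 'core.'."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name == "core" or name.startswith("core."):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Demonstration on a throwaway directory rather than a real server tree.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("core", "core.1234", "score.txt"):
        open(os.path.join(tmp, name), "w").close()
    print([os.path.basename(p) for p in find_core_files(tmp)])
```

A sensible starting point for `root` would be the directory from which ./rundig was started. If core dumps are disabled on the system, no file will exist at all.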
From: Gabriele B. <bar...@in...> - 2003-05-22 21:25:15
|
Ciao Franck,

At 14.10 22/05/2003 +0200, Franck Collineau wrote:
>But if i have put rundig in a cron job and now i can't launch a search in a
>browser;
>here is the error message:

Is the cron job run by the same user that you use when you launch the script manually? What does the cron job line look like? Does the web server's error log file tell you anything about it?

-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check maintainer
Current Location: Prato, Tuscany, Italy
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The Inferno |
From: Franck C. <fra...@rd...> - 2003-05-23 07:15:53
|
Hi Gabriele!

Thank you for your help. See the error log message below.

> Is it performed by the same user that you use when you launch the script
> manually?

Yes.

> What's the cronjob line like?

0 3 * * 1-5 /usr/bin/rundig > /tmp/rundig.log 2>&1

> Does the webserver's error log file tell you anything about it?

Here is the message in the error log file:

DB2 problem...: /var/lib/htdig/common/synonyms.db: No such file or directory

Franck |
From: Gabriele B. <g.b...@co...> - 2003-05-23 09:36:42
|
Ciao Franck,

> Here the message in the error log file:
> DB2 problem...: /var/lib/htdig/common/synonyms.db: No such file or directory

There seems to be a problem with the synonyms database. Make sure rundig ran:

htfuzzy synonyms

For some reason it may not have been run, and we should find out why. :-)

Ciao
-Gabriele
--
Gabriele Bartolini - Web Programmer
Comune di Prato - Prato - Tuscany - Italy
g.b...@co... | http://www.comune.prato.it
> find bin/laden -name osama -exec rm {} ; |
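Before digging into why htfuzzy was skipped, it can help to confirm which database files are actually missing or unreadable for the account in question. A minimal sketch, assuming it is run as the same user as the cron job or the web server; the only path used is the one from Franck's error log, and the function name is hypothetical.

```python
import os


def missing_or_unreadable(paths):
    """Return the paths that are not regular files readable by the current user."""
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.R_OK))]


# Path taken from the web server error log quoted above.
print(missing_or_unreadable(["/var/lib/htdig/common/synonyms.db"]))
```

An empty list means the file exists and is readable for that user; a non-empty list points back at the indexing run that should have created it.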
From: <pal...@un...> - 2003-06-04 16:17:59
|
Dear friends,

I have a big problem that I have not been able to solve. I have a site of about 6000 pages. In each page I generate a menu of the site in PHP.

I indexed the whole site but, unfortunately, when I searched for some of the words contained in the menu I got 6000 or more results! Of course only a few of them were real; most of them were useless because each contained the link to the page to which the menu belonged and the whole menu as the abstract.

This is a big problem for me because the words in the menu are obviously the keywords of the corresponding sections. Does anyone have a good idea for solving this problem?

Thank you in advance for your help
Pietro

-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/ |
From: Gabriele B. <g.b...@co...> - 2003-06-05 06:32:38
|
Ciao Pietro!

On Wed, 2003-06-04 at 18:00, pal...@un... wrote:
> Dear friends,
> I've a big problem that I don't succeed in solving.
> I've a site of about 6000 pages. In each page I generate in PHP a menu of the
> site.

I suggest you take a look at FAQ 4.15.

http://www.htdig.org/FAQ.html#q4.15

You can decide not to index the menu (and maybe just follow the links); take a look at the noindex tag (which is not DTD compliant, though, so if you want to validate your HTML code you must skip it).

My suggestion is to put all the links in the head section (giving them structure information; see the HTML specification), and to use the 'htdig_noindex' comments to enclose the menu. For instance:

  <html>
  <head>
  ....
  <link href="sect1.htm" rel="next">
  <...>
  </head>
  <body>
  blah blah ...
  <!--htdig_noindex-->
  <map>
  <div>
  <strong>Menu</strong><br>
  <ul>
  <li><a href="sect1.htm">Section 1</a></li>
  </ul>
  </div>
  </map>
  <!--/htdig_noindex-->
  etc.

> This is a big problem for me 'cause the words in the menu are obviously the
> keywords of the corresponding sections ... so, has anyone a good idea to solve
> this problem?

Have you tried the 'backlink_factor' attribute for htsearch? It should give more importance to the resources pointed to by those links.

http://www.htdig.org/attrs.html#backlink_factor

Ciao, and I hope this helps! :-)

-Gabriele
--
Gabriele Bartolini - Web Programmer
Comune di Prato - Prato - Tuscany - Italy
g.b...@co... | http://www.comune.prato.it |
From: <pal...@un...> - 2003-06-13 10:37:54
|
Hi friends,

I want to thank all the people who gave me ideas for solving my problem. I found the solution with the "noindex follow" tags very useful, and I adopted it with success.

Thank you again :-)

Gabriele Bartolini <g.b...@co...> writes:
> I suggest you to give a look at FAQ 4.15.
>
> http://www.htdig.org/FAQ.html#q4.15
>
> You can decide not to index the menu (maybe just follow the links); give
> a look at the noindex tag (which is not DTD compliant though - so if you
> wanna validate your HTML code this must be skipped). |
From: M. B. <sb...@on...> - 2003-06-13 12:32:58
|
Sorry, can you tell me how you did it? Or better, can you show me an example? I would like to do the same thing.

Thanks in advance

On Fri, 2003-06-13 at 10:59, pal...@un... wrote:
> Hi friends,
> I want to thank all the people who gave me some ideas to solve my problem.
> I found very usefull the solution with the "noindex follow" tags and I adopted
> it with success.
>
> Thank you again :-)

--
Sérgio Basto
Technology Project Manager
onevision design studios
TECMAIA - Parque de Ciência e Tecnologia da Maia
Rua Frederico Ulrich, 2650
4470-605 MOREIRA DA MAIA
tel. + 351 22 091 5410
fax. + 351 22 091 5419
email: sb...@on...
web: http://www.onevision-design.com |
From: Emma J. H. <emm...@xt...> - 2003-06-13 18:08:31
|
On Fri, Jun 13, 2003 at 01:22:22PM +0100, Sérgio Monteiro Basto wrote:
> Sorry, Can you tell me, how you do it? or better can you show me an
> example? ,
> I just like to do the same thing.

This was already at the bottom of the email:

> > > You can decide not to index the menu (maybe just follow the links); give
> > > a look at the noindex tag (which is not DTD compliant though - so if you
> > > wanna validate your HTML code this must be skipped).
> > >
> > > My suggestion is to put all the links in the head section (giving them
> > > structure information - see HTML specification), and to use the
> > > 'htdig_noindex' comment to enclose the menu. For instance
> > >
> > > <html>
> > > <head>
> > > ....
> > > <link href="sect1.htm" rel="next">
> > > <...>
> > > </head>
> > > <body>
> > > blah blah ...
> > > <!--htdig_noindex-->
> > > <map>
> > > <div>
> > > <strong>Menu</strong><br>
> > > <ul>
> > > <li><a href="sect1.htm">Section 1</a></li>
> > > </ul>
> > > </div>
> > > </map>
> > > <!--/htdig_noindex-->

Or use the following:

http://htdig.org/FAQ.html#q4.15

4.15. Can I use meta tags to prevent htdig from indexing certain files?

Yes, in each HTML file you want to exclude, add the following between the <HEAD> and </HEAD> tags:

  <META NAME="robots" CONTENT="noindex, follow">

Doing so will allow htdig to still follow links to other documents, but will prevent this document from being put into the index itself. You can also use "nofollow" to prevent following of links. See the section on Recognized META information for more details. For documents produced automatically by MhonArc, you can have that line inserted automatically by putting it in the MhonArc resource file, in the sections IDXPGBEGIN and TIDXPGBEGIN.

You can also use the noindex_start and noindex_end attributes to define one set of tags which will mark sections to be stripped out of documents, so they don't get indexed, or you can mark sections with the non-DTD <noindex> and </noindex> tags. The noindex_start and noindex_end attributes can also be used to suppress in-line JavaScript code that wasn't properly enclosed in HTML comment tags (see question 4.26). In 3.1.6, you can also put a section between <noindex follow> and </noindex> tags to turn off indexing of text but still allow htdig to follow links.

--
Emma Jane Hogbin
[[ 416 417 2868 ][ www.xtrinsic.com ]] |
From: Emma J. H. <emm...@xt...> - 2003-06-04 18:13:43
|
On Wed, Jun 04, 2003 at 06:00:19PM +0200, pal...@un... wrote:
> of them were real; a very big part of them was useless 'cause it contained
> the link to the page to which the menu belonged and the whole menu as abstract.

If I understand your problem correctly you can solve this with the following:

http://htdig.org/FAQ.html#q4.15

4.15. Can I use meta tags to prevent htdig from indexing certain files?

Yes, in each HTML file you want to exclude, add the following between the <HEAD> and </HEAD> tags:

  <META NAME="robots" CONTENT="noindex, follow">

Doing so will allow htdig to still follow links to other documents, but will prevent this document from being put into the index itself. You can also use "nofollow" to prevent following of links. See the section on Recognized META information for more details. For documents produced automatically by MhonArc, you can have that line inserted automatically by putting it in the MhonArc resource file, in the sections IDXPGBEGIN and TIDXPGBEGIN.

You can also use the noindex_start and noindex_end attributes to define one set of tags which will mark sections to be stripped out of documents, so they don't get indexed, or you can mark sections with the non-DTD <noindex> and </noindex> tags. The noindex_start and noindex_end attributes can also be used to suppress in-line JavaScript code that wasn't properly enclosed in HTML comment tags (see question 4.26). In 3.1.6, you can also put a section between <noindex follow> and </noindex> tags to turn off indexing of text but still allow htdig to follow links.

--
Emma Jane Hogbin
[[ 416 417 2868 ][ www.xtrinsic.com ]] |
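To make the effect of a noindex section concrete, here is a minimal illustration of stripping text between a start and an end marker, as the FAQ describes for noindex_start and noindex_end. This is not ht://Dig's parser: the function is hypothetical, the marker strings are assumed values for those two attributes, and the sketch only shows text removal, not how links inside the section are treated.

```python
def strip_noindex(html, start="<!--htdig_noindex-->", end="<!--/htdig_noindex-->"):
    """Return `html` with every start..end section removed (markers included)."""
    kept = []
    pos = 0
    while True:
        begin = html.find(start, pos)
        if begin == -1:
            kept.append(html[pos:])  # no further sections: keep the rest
            break
        kept.append(html[pos:begin])
        finish = html.find(end, begin + len(start))
        if finish == -1:
            break  # unterminated section: drop everything after the start marker
        pos = finish + len(end)
    return "".join(kept)


page = (
    "<p>Article text</p>"
    "<!--htdig_noindex--><ul><li>Menu entry</li></ul><!--/htdig_noindex-->"
    "<p>More text</p>"
)
print(strip_noindex(page))
```

With the menu wrapped this way on every page, menu words no longer contribute to each page's index entry, which is what removes the thousands of spurious hits Pietro described.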
From: Tony G. <to...@tg...> - 2003-06-04 19:22:04
|
Hello,

I upgraded to Apache 2. htdig works on the domain that uses the htdig.conf file (except for the style sheet on wrapper.html) but not on the two other domains that use their own .conf files.

It is clearly a path issue, but after beating up on the server all day I am too blind to see where I am going wrong.

Apache 2 is in /usr/local/apache2
The document root is still /var/www/html

Help... please!!!

Cheers
Tony Grant

PS I'm going to bed and will read all your wonderful solutions tomorrow morning. |
From: Jim C. <li...@yg...> - 2003-06-05 02:54:43
|
On Wednesday, June 4, 2003, at 01:21 PM, Tony Grant wrote:

> I upgraded to apache 2.
>
> htdig works on the domain that uses the htdig.conf file (except for
> style sheet on wrapper.html) but not on the two other domains that use
> their own .conf files.

Is the new Apache install perhaps set up to run as a different user? If so, it might be that the permissions/ownership set for the other config files are too restrictive for the new user.

Jim |
From: Tony G. <to...@tg...> - 2003-06-05 06:24:04
|
On Thu, 2003-06-05 at 04:54, Jim Cole wrote:
> On Wednesday, June 4, 2003, at 01:21 PM, Tony Grant wrote:
>
> > I upgraded to apache 2.
> >
> > htdig works on the domain that uses the htdig.conf file (except for
> > style sheet on wrapper.html) but not on the two other domains that use
> > their own .conf files.
>
> Is the new Apache install perhaps setup to run as a different user? If
> so, it might be that the permissions/ownership set for the other config
> files are too restrictive for the new user.

I have checked from the command line - works just fine for all three .conf files.

The web server isn't serving the results for the pages that have includes from external domains...

Cheers
Tony Grant |
From: Jim C. <li...@yg...> - 2003-06-05 15:34:27
|
On Thursday, June 5, 2003, at 12:23 AM, Tony Grant wrote:

>> Is the new Apache install perhaps setup to run as a different user? If
>> so, it might be that the permissions/ownership set for the other
>> config
>> files are too restrictive for the new user.
>
> I have checked from the command line - works just fine for all three
> .conf files.

The user you are running as from the command line doesn't necessarily have anything to do with the user the web server is running as; the latter is usually defined as part of the web server configuration. If you haven't already done so, you should do a quick 'ls -l' on all of your config files and make sure they all have the same owner, group, and permissions as the one config file that is working. If they are the same, then you can cross one possible problem off the list (I am assuming all of the config files are in the same directory and being used by the same copy of htsearch).

Jim |
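The 'ls -l' comparison Jim suggests can also be scripted. A minimal sketch, assuming a POSIX system: it compares owner, group, and permission bits of each config file against the one that works. The function names and the file name other.conf are hypothetical, and the demonstration runs on throwaway files rather than a real config directory.

```python
import os
import stat
import tempfile


def ownership_and_mode(path):
    """Return (uid, gid, permission bits) for `path`."""
    info = os.stat(path)
    return info.st_uid, info.st_gid, stat.S_IMODE(info.st_mode)


def files_differing_from(reference, others):
    """Return the paths whose owner, group, or mode differ from `reference`."""
    expected = ownership_and_mode(reference)
    return [p for p in others if ownership_and_mode(p) != expected]


# Demonstration: one file is world-readable, the other is owner-only.
with tempfile.TemporaryDirectory() as tmp:
    working = os.path.join(tmp, "htdig.conf")
    suspect = os.path.join(tmp, "other.conf")
    for path, mode in ((working, 0o644), (suspect, 0o600)):
        open(path, "w").close()
        os.chmod(path, mode)
    print([os.path.basename(p) for p in files_differing_from(working, [suspect])])
```

Any file the function returns is worth a closer look, since a difference from the working file is exactly what Jim's check is meant to surface.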
From: Tony G. <to...@tg...> - 2003-06-05 14:21:09
|
On Thu, 2003-06-05 at 04:54, Jim Cole wrote:
> Is the new Apache install perhaps setup to run as a different user? If
> so, it might be that the permissions/ownership set for the other config
> files are too restrictive for the new user.

I am running as the same user and group.

htdig.conf works; client1.conf and client2.conf don't.

I believe that htsearch can't find the config files, and I still can't figure out why.

Everything else works on the server (Tomcat, Webalizer), which leaves me with a much lower opinion of htdig now...

Cheers
Tony Grant
--
www.tgds.net Library management software toolkit, redhat linux on Sony Vaio C1XD, Dreamweaver MX with Tomcat and PostgreSQL |
From: Jim C. <li...@yg...> - 2003-06-05 19:51:18
|
On Thursday, June 5, 2003, at 07:50 AM, Tony Grant wrote:

> htdig.conf works, client1.conf and client2.conf don't.
>
> I believe that htsearch can't find the config files. And I still can't
> figure why.

The path to the directory containing the configuration files is hard-coded into htsearch. If it can find one of them, then it should be able to find all of them, assuming of course that the ownership and permissions are the same for each. Are you using just one copy of htsearch, with all three config files located in the same directory?

Have you double-checked the search pages for the other domains to ensure that the 'config' attributes are set correctly (i.e. value="client1" and value="client2")?

Is there any chance that you have more than one copy of htsearch on the server, and that a different version is being picked up for the other domains?

Those are the only things that I can think of to check at the moment. The htsearch code is simply trying to read a file with a full name that consists of the compiled-in path, plus the search page's 'config' attribute, plus '.conf'. If it fails to read the file, either the path is wrong (e.g. more than one instance of htsearch, a misplaced config file, an incorrect 'config' attribute, etc.) or the file is unreadable (e.g. permission and/or ownership problems).

> Everything else works on the server (tomcat, webaliser) which leaves me
> with a much lower opinion of htdig now...

Though I won't go as far as ruling out an ht://Dig problem, the fact that things work from the command line implies that the problem is 99% likely to be due to server configuration issues rather than the ht://Dig code. This is especially true if you are using 3.1.6, which has received extensive testing. I certainly have no problems using that version with Apache 2 on the servers that I maintain. |
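The lookup Jim describes (compiled-in directory, plus the 'config' value, plus '.conf') can be mirrored in a few lines to see which file a given search form would request and whether the current user can read it. This is a sketch of that description, not htsearch's actual code; the function names are hypothetical and the directory is a placeholder for whatever path was compiled into the local htsearch binary.

```python
import os


def config_file_for(config_dir, config_name):
    """Build the file name as described: directory + 'config' value + '.conf'."""
    return os.path.join(config_dir, config_name + ".conf")


def diagnose(config_dir, config_name):
    """Return (path, status), where status is 'missing', 'unreadable', or 'ok'."""
    path = config_file_for(config_dir, config_name)
    if not os.path.isfile(path):
        return path, "missing"
    if not os.access(path, os.R_OK):
        return path, "unreadable"
    return path, "ok"


# "/path/to/config/dir" is a placeholder, not a real ht://Dig default.
for name in ("htdig", "client1", "client2"):
    print(diagnose("/path/to/config/dir", name))
```

Run as the web server's user, a "missing" result points at the path side of Jim's two cases (wrong directory or wrong 'config' value) and an "unreadable" result at the permissions side.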
From: Tony G. <to...@tg...> - 2003-06-06 06:30:24
|
On Thu, 2003-06-05 at 17:51, Jim Cole wrote:
> > htdig.conf works, client1.conf and client2.conf don't.
> >
> > I believe that htsearch can't find the config files. And I still can't
> > figure why.
>
> The path to the directory containing the configuration files is
> hard-coded into htsearch. If it can find one of them, then it should be
> able to find all of them, assuming of course that the ownership and
> permissions are the same for each. You are using just one copy of
> htsearch and have all three config files located in the same directory?

Yes, they are all in /etc and they all belong to root.

> Have you double checked the search pages for the other domains to
> ensure that the 'config' attributes are set correctly (i.e.
> value="client1" and value="client2").

  <input type="hidden" name="config" value="blah">
  <input type="hidden" name="restrict" value="http://www.blah/">

> Is there any chance that you have more than one copy of htsearch on the
> server, and that a different version is being picked up for the other
> domains.?

There is one in cgi-bin and one in /usr/bin/, which I believe is the command line one...

> Though I won't go as far as ruling out an ht://Dig problem, the fact
> that things work from the command line implies that the problem is 99%
> likely to be due server configuration issues rather than the ht://Dig
> code. This is especially true if your are using 3.1.6, which has
> received extensive testing. I certainly have no problems using that
> version with Apache 2 on the servers that I maintain.

I have been using it for a very long time. This happened once before, years ago, and the only way to fix it was to run "rpm -Uvh", reinstalling the same version over the top of the one that had stopped working.

I will phase out htdig soon and replace it with a pure Java search engine using JSP for the result pages.

Cheers
Tony Grant
--
www.tgds.net Library management software toolkit, redhat linux on Sony Vaio C1XD, Dreamweaver MX with Tomcat and PostgreSQL |