From: Lachlan A. <lh...@us...> - 2004-06-23 12:45:41
|
Greetings all,

The code used to use popen, but apparently there is a security problem with it -- something to do with shell escapes, judging from the CVS entry. It changed between 3.1.5 and 3.1.6. Putting popen back just for WIN32 seems the best way to go.

Cheers,
Lachlan

On Mon, 21 Jun 2004 08:32 pm, ka...@ga... wrote:
> While compiling htdig 3.2.0b5 on win 32 I found that method 'parse'
> of class 'ExternalParser' contains line:
>
> // NEAL - ENABLE/REWRITE THIS ASAP FOR WIN32
> #ifndef _MSC_VER //_WIN32
>
> I've made some changes to ExternalParser, to make it work under
> win32. There is no need to create another process or thread to run
> external parser - you can call:
> FILE *input = _popen((char *)cmdline, "rb" );
> that opens the pipe to read from.
>
> I've compiled that code successfully and run htdig with some
> external parsers: antiword, xpdf and openoffice (under win2000).

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org) |
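As background to the "shell escapes" remark: popen hands its command string to the shell, so metacharacters in a file name are interpreted instead of being passed through. The sketch below only mimics the two behaviours in plain sh. The file name is hypothetical and this is not the ExternalParser code itself.

```shell
#!/bin/sh
# Hypothetical document name containing a shell metacharacter.
fname='doc.pdf; echo INJECTED'

# popen-style: the whole command line is one string parsed by "sh -c",
# so the text after ";" runs as a second command.
popen_style=$(sh -c "echo parsing $fname")

# exec-style: the program is run directly and the name stays one argument
# that is never re-parsed by a shell.
exec_style=$(env echo parsing "$fname")

printf 'popen-style output:\n%s\n\n' "$popen_style"
printf 'exec-style output:\n%s\n' "$exec_style"
```

In the popen-style case the word INJECTED appears on a line of its own, because the shell ran it as a separate command. In the exec-style case the semicolon is just part of the name.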
From: Ted Stresen-R. <ted...@ma...> - 2004-06-23 11:30:05
|
Hi,

I was wondering if someone could lend a hand modifying rundig.sh as follows. I would like the email that is sent at the end of the run to include the following information:

- the hostname and IP of the machine that the script is running on
- the name (and path) of the script itself (in cases where multiple rundigs might be running)

I've managed to get the host name (`hostname`) and the pwd but I cannot figure out how to get the script to print its own name (so that people can name it whatever they want, for portability).

What is the environment variable for the name and/or path of the currently executing script? $SELF doesn't seem to work (borrowing from PHP here...) and $SCRIPTNAME doesn't seem to work either.

Any ideas?

Ted Stresen-Reuter |
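On the question above: in a Bourne-style shell the path a script was invoked with is available as `$0`. A minimal sketch, assuming a POSIX sh; the variable names are illustrative and not taken from rundig.sh, and the IP lookup is left out because the command for it differs between systems.

```shell
#!/bin/sh
# Sketch only: variable names are illustrative, not from rundig.sh.

script_path=$0                               # path used to invoke the script
script_name=$(basename "$0")                 # file name without directories
script_dir=$(cd "$(dirname "$0")" && pwd)    # absolute directory of the script
host=$(hostname 2>/dev/null || uname -n)     # fall back to uname if hostname is missing

# Lines that could be appended to the report email.
report="Host:       $host
Invoked as: $script_path
Script:     $script_dir/$script_name"
echo "$report"
```

Because the name is read from `$0` at run time, the script can be renamed or copied without editing it.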
From: Joe R. J. <jj...@cl...> - 2004-06-21 23:35:23
|
On Sun, 20 Jun 2004, Lachlan Andrew wrote: > Date: Sun, 20 Jun 2004 21:41:47 +1000 > From: Lachlan Andrew <lh...@us...> > To: Joe R. Jah <jj...@cl...> > Cc: htd...@li... > Subject: Re: Make check and htdig warnings > > On Sat, 19 Jun 2004 01:08 pm, Joe R. Jah wrote: > > > > running htdig: expected > > file:///tmp/htdig-3.2.0b6/test/htdocs/set1/site4.html > > http://localhost:7400/set1/ > > http://localhost:7400/set1/bad_local.htm > > http://localhost:7400/set1/script.html > > http://localhost:7400/set1/site%201.html > > http://localhost:7400/set1/site2.html > > http://localhost:7400/set1/site3.html > > http://localhost:7400/set1/sub%2520dir/ > > http://localhost:7400/set1/sub%2520dir/empty%20file.html > > http://localhost:7400/set1/title.html > > but got > > http://localhost:7400/set1/ > > http://localhost:7400/set1/bad_local.htm > > http://localhost:7400/set1/script.html > > http://localhost:7400/set1/site%201.html > > http://localhost:7400/set1/site2.html > > http://localhost:7400/set1/site3.html > > http://localhost:7400/set1/sub%2520dir/ > > http://localhost:7400/set1/sub%2520dir/empty%20file.html > > http://localhost:7400/set1/title.html > > FAIL: t_htdig > > > > It is different than last time: > > Thanks. Could you please do a > make TESTS=t_htdig check > and then send me .../test/conf/htdig.conf.tmp ? This test might be > pushing sed to its limits of compatibility... Here it is: ---8<--- # # Example config file for ht://Dig. # # This configuration file is used by all the programs that make up ht://Dig. # Please refer to the attribute reference manual for more details on what # can be put into this file. (http://www.htdig.org/confindex.html) # Note that most attributes have very reasonable default values so you # really only have to add attributes here if you want to change the defaults. # # What follows are some of the common attributes you might want to change. 
# # Specifies the directory for files that will or can be # shared among different search databases. The default # value for this attribute is defined at compile time. common_dir: ./../installdir # # Specify where the database files need to go. Make sure that there is # plenty of free disk space available for the databases. They can get # pretty big. # database_dir: /tmp/htdig-3.2.0b6/test/var/htdig # # This specifies the URL where the robot (htdig) will start. You can specify # multiple URLs here. Just separate them by some whitespace. # The example here will cause the ht://Dig homepage and related pages to be # indexed. # You could also index all the URLs in a file like so: # start_url: `${common_dir}/start.url` # start_url: HTTP://LocalHost:7400/Set1/ # # This attribute limits the scope of the indexing process. The default is to # set it to the same as the start_url above. This way only pages that are on # the sites specified in the start_url attribute will be indexed and it will # reject any URLs that go outside of those sites. # # Keep in mind that the value for this attribute is just a list of string # patterns. As long as URLs contain at least one of the patterns it will be # seen as part of the scope of the index. # limit_urls_to: ${start_url} site4.html # # If there are particular pages that you definately do NOT want to index, you # can use the exclude_urls attribute. The value is a list of string patterns. # If a URL matches any of the patterns, it will NOT be indexed. This is # useful to exclude things like virtual web trees or database accesses. By # default, all CGI URLs will be excluded. (Note that the /cgi-bin/ convention # may not work on your web server. Check the path prefix used on your web # server.) # exclude_urls: /cgi-bin/ .cgi # # The string htdig will send in every request to identify the robot. Change # this to your email address. 
# maintainer: jjah # # The excerpts that are displayed in long results rely on stored information # in the index databases. The compiled default only stores 512 characters of # text from each document (this excludes any HTML markup...) If you plan on # using the excerpts you probably want to make this larger. The only concern # here is that more disk space is going to be needed to store the additional # information. Since disk space is cheap (! :-)) you might want to set this # to a value so that a large percentage of the documents that you are going # to be indexing are stored completely in the database. At SDSU we found # that by setting this value to about 50k the index would get 97% of all # documents completely and only 3% was cut off at 50k. You probably want to # experiment with this value. # Note that if you want to set this value low, you probably want to set the # excerpt_show_top attribute to false so that the top excerpt_length characters # of the document are always shown. # max_head_length: 100000 # # To limit network connections, ht://Dig will only pull up to a certain limit # of bytes. This prevents the indexing from dying because the server keeps # sending information. However, several FAQs happen because people have files # bigger than the default limit of 100KB. This sets the default a bit higher. # (see <http://www.htdig.org/FAQ.html> for more) # max_doc_size: 200000 # This sets the maximum length of words that will be # indexed. Words longer than this value will be silently # truncated when put into the index, or searched in the # index. maximum_word_length: 50 # # Most people expect some sort of excerpt in results. By default, if the # search words aren't found in context in the stored excerpt, htsearch shows # the text defined in the no_excerpt_text attribute: # (None of the search words were found in the top of this document.) # This attribute instead will show the top of the excerpt. 
# no_excerpt_show_top: true # # Depending on your needs, you might want to enable some of the fuzzy search # algorithms. There are several to choose from and you can use them in any # combination you feel comfortable with. Each algorithm will get a weight # assigned to it so that in combinations of algorithms, certain algorithms get # preference over others. Note that the weights only affect the ranking of # the results, not the actual searching. # The available algorithms are: # exact # endings # metaphone # prefix # regex # soundex # synonyms # By default only the "exact" algorithm is used with weight 1. # Note that if you are going to use the endings, metaphone, soundex, # or synonyms algorithms, you will need to run htfuzzy to generate # the databases they use. # search_algorithm: exact:1 # # The following are the templates used in the builtin search results # The default is to use compiled versions of these files, which produces # slightly faster results. However, uncommenting these lines makes it # very easy to change the format of search results. # See <http://www.htdig.org/hts_templates.html for more details. # # template_map: Long long ${common_dir}/long.html \ # Short short ${common_dir}/short.html # template_name: long # # Enable extended features of WordList # wordlist_extend: true # # The following are used to change the text for the page index. # The defaults are just boring text numbers. These images spice # up the result pages quite a bit. 
(Feel free to do whatever, though) # next_page_text: <img src=/htdig/buttonr.gif border=0 align=middle width=30 height=30 alt=next> no_next_page_text: prev_page_text: <img src=/htdig/buttonl.gif border=0 align=middle width=30 height=30 alt=prev> no_prev_page_text: page_number_text: "<img src=/htdig/button1.gif border=0 align=middle width=30 height=30 alt=1>" \ "<img src=/htdig/button2.gif border=0 align=middle width=30 height=30 alt=2>" \ "<img src=/htdig/button3.gif border=0 align=middle width=30 height=30 alt=3>" \ "<img src=/htdig/button4.gif border=0 align=middle width=30 height=30 alt=4>" \ "<img src=/htdig/button5.gif border=0 align=middle width=30 height=30 alt=5>" \ "<img src=/htdig/button6.gif border=0 align=middle width=30 height=30 alt=6>" \ "<img src=/htdig/button7.gif border=0 align=middle width=30 height=30 alt=7>" \ "<img src=/htdig/button8.gif border=0 align=middle width=30 height=30 alt=8>" \ "<img src=/htdig/button9.gif border=0 align=middle width=30 height=30 alt=9>" \ "<img src=/htdig/button10.gif border=0 align=middle width=30 height=30 alt=10>" # # To make the current page stand out, we will put a border arround the # image for that page. 
# no_page_number_text: "<img src=/htdig/button1.gif border=2 align=middle width=30 height=30 alt=1>" \ "<img src=/htdig/button2.gif border=2 align=middle width=30 height=30 alt=2>" \ "<img src=/htdig/button3.gif border=2 align=middle width=30 height=30 alt=3>" \ "<img src=/htdig/button4.gif border=2 align=middle width=30 height=30 alt=4>" \ "<img src=/htdig/button5.gif border=2 align=middle width=30 height=30 alt=5>" \ "<img src=/htdig/button6.gif border=2 align=middle width=30 height=30 alt=6>" \ "<img src=/htdig/button7.gif border=2 align=middle width=30 height=30 alt=7>" \ "<img src=/htdig/button8.gif border=2 align=middle width=30 height=30 alt=8>" \ "<img src=/htdig/button9.gif border=2 align=middle width=30 height=30 alt=9>" \ "<img src=/htdig/button10.gif border=2 align=middle width=30 height=30 alt=10>" # local variables: # mode: text # eval: (if (eq window-system 'x) (progn (setq font-lock-keywords (list '("^#.*" . font-lock-keyword-face) '("^[a-zA-Z][^ :]+" . font-lock-function-name-face) '("[+$]*:" . font-lock-comment-face) )) (font-lock-mode))) # end: url_part_aliases: bar foo common_url_parts: http:// http://local HTTP://LocalHost 7400/set1 robotstxt_name: htdig case_sensitive: false url_rewrite_rules: (.*)si[a-z]*[4-9]*\.([a-z]*)tml file:////tmp/htdig-3.2.0b6/test/htdocs/set1/site4.\\2tml ---8<--- > > By the way, by accident I found out that every time I rung make > > check, it leaves four files in /tmp: > > > > -rw-r--r-- 1 jjah wheel 3089 Jun 18 19:46 t_htsearch22185 > > -rw-r--r-- 1 jjah wheel 3079 Jun 18 19:46 t_htsearch22410 > > -rw-r--r-- 1 jjah wheel 6818 Jun 18 19:46 t_htsearch22765 > > -rw-r--r-- 1 jjah wheel 1803 Jun 18 19:46 t_htsearch23147 > > Thanks. They should be deleted except when an error occurs. I'll fix > that. > > Cheers, > Lachlan > > -- > lh...@us... > ht://Dig developer DownUnder (http://www.htdig.org) Regrds, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. 
_/_/ ah jj...@cl... |
From: Neal R. <ne...@ri...> - 2004-06-21 20:44:17
|
On Mon, 21 Jun 2004 ka...@ga... wrote:
> Hello,
>
> While compiling htdig 3.2.0b5 on win 32 I found that method 'parse' of
> class 'ExternalParser' contains line:
>
> // NEAL - ENABLE/REWRITE THIS ASAP FOR WIN32
> #ifndef _MSC_VER //_WIN32
>
> I've made some changes to ExternalParser, to make it work under win32.
> There is no need to create another process or thread to run external
> parser - you can call:
> FILE *input = _popen((char *)cmdline, "rb" );
> that opens the pipe to read from.
>
> I've compiled that code successfully and run htdig with some external
> parsers: antiword, xpdf and openoffice (under win2000).

Hmmm.. interesting. I will test that. Please send us a patch.

Thanks!

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
From: Neal R. <ne...@ri...> - 2004-06-21 20:42:35
|
Hey all,

Do you think a different factor for the 'htdig-keywords' META tag would be useful? It would allow users to make their htdig-specific keywords much more important than regular meta keywords... and these days that might be recommended, since most major search engines either ignore or punish pages with excessive meta keywords. Or for users who want htdig to treat pages slightly differently than an Internet search engine.

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
From: Gabriele B. <an...@ti...> - 2004-06-21 18:33:22
|
Greetings Karol,

At 12.32 21/06/2004, ka...@ga... wrote:
>While compiling htdig 3.2.0b5 on win 32 I found that method 'parse' of
>class 'ExternalParser' contains line:

Have you tried htdig 3.2.0b6? That's the latest version and various bugs have been fixed since 3.2.0b5. You can download it from sourceforge.net or www.htdig.org/files

Ciao,
-Gabriele

--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check maintainer
Current Location: Prato, Toscana, Italia
an...@ti... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The Inferno |
From: Rzepa, H. <h....@im...> - 2004-06-21 16:44:20
|
Using gcc 3.3 on Irix, I get the following compile errors. Can anyone help? gmake[1]: Entering directory `/var/www/htdig/htdig-3.2.0b6/htlib' /bin/sh ../libtool --mode=compile g++ -DHAVE_CONFIG_H -I. -I. -I../include -DDEFAULT_CONFIG_FILE=\"/var/www/htdig32/conf/htdig.conf\" -I../include -I../htlib -I../htnet -I../htcommon -I../htword -I../db -I../db -g -O2 -Wall -fno-rtti -fno-exceptions -c -o Configuration.lo `test -f 'Configuration.cc' || echo './'`Configuration.cc mkdir .libs g++ -DHAVE_CONFIG_H -I. -I. -I../include "-DDEFAULT_CONFIG_FILE=\"/var/www/htdig32/conf/htdig.conf\"" -I../include -I../htlib -I../htnet -I../htcommon -I../htword -I../db -I../db -g -O2 -Wall -fno-rtti -fno-exceptions -c Configuration.cc -DPIC -o .libs/Configuration.o In file included from /usr/freeware/lib/gcc-lib/mips-sgi-irix6.5/3.3/include/c++/bits/ios_base.h:45, from /usr/freeware/lib/gcc-lib/mips-sgi-irix6.5/3.3/include/c++/ios:49, from /usr/freeware/lib/gcc-lib/mips-sgi-irix6.5/3.3/include/c++/ostream:45, from /usr/freeware/lib/gcc-lib/mips-sgi-irix6.5/3.3/include/c++/iostream:45, from htString.h:23, from Dictionary.h:22, from Configuration.h:107, from Configuration.cc:24: /usr/freeware/lib/gcc-lib/mips-sgi-irix6.5/3.3/include/c++/mips-sgi-irix6.5/bits/atomicity.h: In function `_Atomic_word __exchange_and_add(_Atomic_word*, int)': /usr/freeware/lib/gcc-lib/mips-sgi-irix6.5/3.3/include/c++/mips-sgi-irix6.5/bits/atomicity.h:40: error: ` test_then_add' undeclared (first use this function) /usr/freeware/lib/gcc-lib/mips-sgi-irix6.5/3.3/include/c++/mips-sgi-irix6.5/bits/atomicity.h:40: error: (Each undeclared identifier is reported only once for each function it appears in.) gmake[1]: *** [Configuration.lo] Error 1 gmake[1]: Leaving directory `/var/www/htdig/htdig-3.2.0b6/htlib' gmake: *** [all-recursive] Error 1 -- Henry Rzepa. +44 (020) 7594 5774 (Voice); +44 (0870) 132 3747 (eFax) http://www.ch.ic.ac.uk/rzepa/ Dept. Chemistry, Imperial College London, SW7 2AZ, UK. 
(Voracious anti-spam filter in operation for received email. If expected reply not received, please phone/fax). |
From: <ka...@ga...> - 2004-06-21 10:30:42
|
Hello,

While compiling htdig 3.2.0b5 on win 32 I found that method 'parse' of class 'ExternalParser' contains line:

// NEAL - ENABLE/REWRITE THIS ASAP FOR WIN32
#ifndef _MSC_VER //_WIN32

I've made some changes to ExternalParser, to make it work under win32. There is no need to create another process or thread to run external parser - you can call:

FILE *input = _popen((char *)cmdline, "rb" );

that opens the pipe to read from.

I've compiled that code successfully and run htdig with some external parsers: antiword, xpdf and openoffice (under win2000).

--
Best regards,
Karol Przybyszewski |
From: Lachlan A. <lh...@us...> - 2004-06-20 11:43:12
|
On Sat, 19 Jun 2004 01:08 pm, Joe R. Jah wrote:
>
> running htdig: expected
> file:///tmp/htdig-3.2.0b6/test/htdocs/set1/site4.html
> http://localhost:7400/set1/
> http://localhost:7400/set1/bad_local.htm
> http://localhost:7400/set1/script.html
> http://localhost:7400/set1/site%201.html
> http://localhost:7400/set1/site2.html
> http://localhost:7400/set1/site3.html
> http://localhost:7400/set1/sub%2520dir/
> http://localhost:7400/set1/sub%2520dir/empty%20file.html
> http://localhost:7400/set1/title.html
> but got
> http://localhost:7400/set1/
> http://localhost:7400/set1/bad_local.htm
> http://localhost:7400/set1/script.html
> http://localhost:7400/set1/site%201.html
> http://localhost:7400/set1/site2.html
> http://localhost:7400/set1/site3.html
> http://localhost:7400/set1/sub%2520dir/
> http://localhost:7400/set1/sub%2520dir/empty%20file.html
> http://localhost:7400/set1/title.html
> FAIL: t_htdig
>
> It is different than last time:

Thanks. Could you please do a
    make TESTS=t_htdig check
and then send me .../test/conf/htdig.conf.tmp ? This test might be pushing sed to its limits of compatibility...

> By the way, by accident I found out that every time I run make
> check, it leaves four files in /tmp:
>
> -rw-r--r-- 1 jjah wheel 3089 Jun 18 19:46 t_htsearch22185
> -rw-r--r-- 1 jjah wheel 3079 Jun 18 19:46 t_htsearch22410
> -rw-r--r-- 1 jjah wheel 6818 Jun 18 19:46 t_htsearch22765
> -rw-r--r-- 1 jjah wheel 1803 Jun 18 19:46 t_htsearch23147

Thanks. They should be deleted except when an error occurs. I'll fix that.

Cheers,
Lachlan

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org) |
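The two URL lists in the quoted report are long enough that the difference is easy to miss by eye. A small sketch that compares them mechanically, using the URLs exactly as quoted; the temporary files are arbitrary and this is not part of the ht://Dig test suite.

```shell
#!/bin/sh
expected=$(mktemp)
got=$(mktemp)

# "expected" list from the failing t_htdig run.
cat > "$expected" <<'EOF'
file:///tmp/htdig-3.2.0b6/test/htdocs/set1/site4.html
http://localhost:7400/set1/
http://localhost:7400/set1/bad_local.htm
http://localhost:7400/set1/script.html
http://localhost:7400/set1/site%201.html
http://localhost:7400/set1/site2.html
http://localhost:7400/set1/site3.html
http://localhost:7400/set1/sub%2520dir/
http://localhost:7400/set1/sub%2520dir/empty%20file.html
http://localhost:7400/set1/title.html
EOF

# "but got" list from the same run.
cat > "$got" <<'EOF'
http://localhost:7400/set1/
http://localhost:7400/set1/bad_local.htm
http://localhost:7400/set1/script.html
http://localhost:7400/set1/site%201.html
http://localhost:7400/set1/site2.html
http://localhost:7400/set1/site3.html
http://localhost:7400/set1/sub%2520dir/
http://localhost:7400/set1/sub%2520dir/empty%20file.html
http://localhost:7400/set1/title.html
EOF

# Print expected lines that have no exact, literal match in the actual list.
missing=$(grep -vxF -f "$got" "$expected")
echo "missing: $missing"

rm -f "$expected" "$got"
```

The only line reported as missing is the file:/// URL for site4.html; every http:// URL is present in both lists.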
From: Geoff H. <ghu...@us...> - 2004-06-20 07:16:20
|
STATUS of ht://Dig branch 3-2-x

RELEASES:
    3.2.0b6: Scheduled: 31 May 2004.
    3.2.0b5: Released: 10 Nov 2003.
    3.2.0b4: Cancelled.
    3.2.0b3: Released: 22 Feb 2001.
    3.2.0b2: Released: 11 Apr 2000.
    3.2.0b1: Released: 4 Feb 2000.

(Please note that everything added here should have a tracker PR# so we can be sure they're fixed. Geoff is currently trying to add PR#s for what's currently here.)

SHOWSTOPPERS:

KNOWN BUGS:
    (none serious. See <http://sourceforge.net/tracker/?atid=104593&group_id=4593&func=browse>.)

PENDING PATCHES (available but need work):
    * Gilles's configuration parsing patches need testing before committing.
    * Memory improvements to htmerge. (Backed out b/c htword API changed.)
    * Mifluz merge. (Is this still pending??)

NEEDED FEATURES:
    * Quim's new htsearch/qtest query parser framework.
    * File/Database locking. PR#405764.

TESTING:
    * httools programs: (htload a test file, check a few characteristics, htdump and compare)
    * Tests for new config file parser
    * Duplicate document detection while indexing
    * Major revisions to ExternalParser.cc, including fork/exec instead of popen, argument handling for parser/converter, allowing binary output from an external converter.
    * ExternalTransport needs testing of changes similar to ExternalParser.

DOCUMENTATION:
    * List of supported platforms/compilers is ancient. (PR#405279)
    * Document all of htsearch's mappings of input parameters to config attributes to template variables. (Relates to PR#405278.) Should we make sure these config attributes are all documented in defaults.cc, even if they're only set by input parameters and never in the config file?
    * Split attrs.html into categories for faster loading.
    * Turn defaults.cc into an XML file for generating documentation and defaults.cc.
    * require.html is not updated to list new features and disk space requirements of 3.2.x (e.g. regex matching, database compression.) PRs# 405280 #405281.
    * Htfuzzy could use more documentation on what each fuzzy algorithm does. PR#405714.
    * Document the list of all installed files and default locations. PR#405715.

OTHER ISSUES:
    * Can htsearch actually search while an index is being created?
    * The code needs a security audit, esp. htsearch. PR#405765. |
From: Joe R. J. <jj...@cl...> - 2004-06-19 03:08:40
|
On Sat, 19 Jun 2004, Lachlan Andrew wrote: > Date: Sat, 19 Jun 2004 10:12:44 +1000 > From: Lachlan Andrew <lh...@us...> > To: Joe R. Jah <jj...@cl...> > Cc: htd...@li... > Subject: Re: Make check and htdig warnings > > Gretings Joe, Greetings Lachlan, > Yes, that patch turned out to be more complicated that I thought... > "tests/Makefile" is generated from (indirectly) from > "tests/Makefile.am" and "./Makefile.config". Since this last file is > shared by all programs, it would affect the whole build process. I > thought that, so close to the release, I wouldn't jeopardise the main > program for the sake of the tests. Good point. > Regarding t_htdig, could you please post the errors that it outputs? > These will be before the FAIL: t_htdig line. Is it the same as > previously? running htdig: expected file:///tmp/htdig-3.2.0b6/test/htdocs/set1/site4.html http://localhost:7400/set1/ http://localhost:7400/set1/bad_local.htm http://localhost:7400/set1/script.html http://localhost:7400/set1/site%201.html http://localhost:7400/set1/site2.html http://localhost:7400/set1/site3.html http://localhost:7400/set1/sub%2520dir/ http://localhost:7400/set1/sub%2520dir/empty%20file.html http://localhost:7400/set1/title.html but got http://localhost:7400/set1/ http://localhost:7400/set1/bad_local.htm http://localhost:7400/set1/script.html http://localhost:7400/set1/site%201.html http://localhost:7400/set1/site2.html http://localhost:7400/set1/site3.html http://localhost:7400/set1/sub%2520dir/ http://localhost:7400/set1/sub%2520dir/empty%20file.html http://localhost:7400/set1/title.html FAIL: t_htdig It is different than last time: dodoc: cannot open running htdig: expected http://localhost:7400/set1/ http://localhost:7400/set1/bad_local.htm http://localhost:7400/set1/script.html http://localhost:7400/set1/site%201.html http://localhost:7400/set1/site2.html http://localhost:7400/set1/site3.html http://localhost:7400/set1/site4.html http://localhost:7400/set1/sub%2520dir/ 
http://localhost:7400/set1/sub%2520dir/empty%20file.html http://localhost:7400/set1/title.html but got htpurge: Database is empty! FAIL: t_htdig By the way, by accident I found out that every time I rung make check, it leaves four files in /tmp: -rw-r--r-- 1 jjah wheel 3089 Jun 18 19:46 t_htsearch22185 -rw-r--r-- 1 jjah wheel 3079 Jun 18 19:46 t_htsearch22410 -rw-r--r-- 1 jjah wheel 6818 Jun 18 19:46 t_htsearch22765 -rw-r--r-- 1 jjah wheel 1803 Jun 18 19:46 t_htsearch23147 Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... |
From: Lachlan A. <lh...@us...> - 2004-06-19 00:14:13
|
Greetings Joe,

Yes, that patch turned out to be more complicated than I thought... "tests/Makefile" is generated (indirectly) from "tests/Makefile.am" and "./Makefile.config". Since this last file is shared by all programs, it would affect the whole build process. I thought that, so close to the release, I wouldn't jeopardise the main program for the sake of the tests.

Regarding t_htdig, could you please post the errors that it outputs? These will be before the FAIL: t_htdig line. Is it the same as previously?

Thanks,
Lachlan

On Fri, 18 Jun 2004 04:18 pm, Joe R. Jah wrote:
>
> Patch 3 has not been committed to 3.2.0b6:
>
> --- test/Makefile.orig Thu Jun 17 22:56:56 2004
> +++ test/Makefile Thu Jun 17 22:54:30 2004
> @@ -188,10 +188,10 @@
>  $(top_builddir)/htcommon/libcommon.la \
>  $(top_builddir)/htword/libhtword.la \
>  $(top_builddir)/htlib/libht.la \
> - $(top_builddir)/htcommon/libcommon.la \
> - $(top_builddir)/htword/libhtword.la \
> - $(top_builddir)/db/libhtdb.la \
> - $(top_builddir)/htlib/libht.la
> + $(top_builddir)/./htcommon/libcommon.la \
> + $(top_builddir)/./htword/libhtword.la \
> + $(top_builddir)/./db/libhtdb.la \
> + $(top_builddir)/./htlib/libht.la
>
> Also t_htdig still fails:
>
> FAIL: t_htdig
> PASS: t_htsearch
> PASS: t_htmerge
> PASS: t_htnet
> PASS: t_htdig_local
> PASS: t_factors
> PASS: t_fuzzy
> PASS: t_parsing
> PASS: t_templates
> PASS: t_validwords
> ====================
> 1 of 19 tests failed
> ====================
>
> Regards,
>
> Joe

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org) |
From: Gilles D. <gr...@sc...> - 2004-06-18 18:21:04
|
Besides, htsearch already allows you to override the compiled-in value of the config directory, in a wrapper script, via the CONFIG_DIR environment variable. So, right now you get the best of both worlds, as you can use either approach. See http://www.htdig.org/FAQ.html#q4.20 and http://www.htdig.org/FAQ.html#q5.30

According to Rupert Jones:
> I'm sorry, this isn't making any sense to me.
>
> As I understand it, the path to the config file is compiled in to htsearch
> as a security precaution, so that when you specify which config file to use
> as a parameter when invoking htsearch from the webpage you are not exposing
> the directory location of ht://dig.
>
> Why do you feel that compiling in the location of the conf files is a
> drawback?
...
> -----Original Message-----
> From: htd...@li...
> [mailto:htd...@li...] On Behalf Of Ted
> Stresen-Reuter
> Sent: 03 June 2004 23:55
> To: //Dig - Dev
> Subject: [htdig-dev] htsearch => wrapper script
>
> You know, in setting up my package, I realized that it seems like a
> real drawback to have to compile in the location of the conf files into
> htsearch (rather than having it look in its own directory, for
> example, for cases in which the location is not compiled in).
>
> Maybe, instead of putting htsearch in cgi-bin, we could put htsearch in
> --prefix/bin and drop a perl script into cgi-bin that simply passes the
> request on to htsearch (and htsearch sends the response). That way, we
> could keep all the htdig binaries together in one place.
>
> Does this make sense to anyone else or just to me?

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada) |
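A wrapper of the kind described in the message above might look like the following sketch. Both default paths and the HTDIG_CONF and HTSEARCH variable names are assumptions for illustration; only the CONFIG_DIR variable comes from the message, and the FAQ entries it links remain the authoritative reference.

```shell
#!/bin/sh
# Sketch of a CGI wrapper for htsearch; both default paths are assumptions.
HTDIG_CONF=${HTDIG_CONF:-/usr/local/htdig/conf}
HTSEARCH=${HTSEARCH:-/usr/local/htdig/bin/htsearch}

run_htsearch() {
    # CONFIG_DIR is placed only in the environment of the htsearch process.
    CONFIG_DIR=$HTDIG_CONF "$HTSEARCH" "$@"
}

# In an installed wrapper the last line would be:
#   run_htsearch "$@"
```

With this arrangement the htsearch binary can stay outside cgi-bin, and only the small wrapper needs to know where the configuration lives.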
From: Gilles D. <gr...@sc...> - 2004-06-18 17:28:36
|
IIS is Microsoft's web server. If you can get ht://Dig running under Windows, using either Cygwin or the native Win32 build, then supporting IIS should work. To my knowledge, the only issue with IIS was that CGI scripts need to run in NPH (non-parsed headers) mode under IIS. At least, that was the case a few years back - I don't know if that's changed. You can do this with htsearch by setting nph: true in your htdig.conf or other config file. See http://www.htdig.org/attrs.html#nph (for 3.1.6) or http://www.htdig.org/dev/htdig-3.2/attrs.html#nph (for 3.2.0b6).

According to Lachlan Andrew:
> Forgive my ignorance, but I don't know what IIS is. However, ht://Dig
> does run on Windows. Most versions work under Cygwin, which provides
> a unix-like environment. Neal Richter also recently ported it to run
> "natively" under Windows, but the documentation on how to install that
> has not yet been written.
>
> I hope this helps,
> Lachlan
>
> On Sat, 12 Jun 2004 02:13 am, Moh...@co... wrote:
> > Does Htdig work on an IIS box with a Windows environment?
> > Please advise me on this matter.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada) |
From: Gilles D. <gr...@sc...> - 2004-06-18 17:21:00
|
According to Lachlan Andrew:
> The htmerge.html documentation currently says
>
> Note: You must run htmerge separately on each of the databases
> created by htdig before merging them together with this option.
> This is because merging the two wordlists together requires
> wordlists that have already been cleaned up by htmerge.
>
> Am I correct in thinking that this is a carry-over from 3.1.x, and is
> no longer true?

Yup! Now, what I'm not sure of is whether you'd need to run htpurge on each of the databases before merging. I'd be inclined to think it wouldn't be strictly necessary, but maybe helpful if one db has obsolete entries that would conflict with valid entries in the other. I don't know if Geoff can give more insight in this, as he's the one who did a lot of the design of htpurge and htmerge for 3.2.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada) |
From: Gilles D. <gr...@sc...> - 2004-06-18 16:29:36
|
All right, after about 4 months and a few news updates, I'm satisfied that my script for regenerating main.html is working satisfactorily, so I've switched the index.html and contents.html files over to using this, rather than the main.shtml file. This should help out our mirrors who don't support SSI. The next step, as I see it, is to automate updates to dev/htdig-3.2/main.html the same way. I can see 3 ways of doing this. The main difference between the maindocs/dev/htdig-3.2 version and the maindocs version, is the latter has been using the maindocs/css/htdig.css style sheet for quite some time, but the former hasn't been. So we can... 1) make a copy of css/htdig.css in dev/htdig-3.2, and then the script that updates main.html can just copy it to dev/htdig-3.2, and check in both copies as needed, or 2) copy and check in the 2nd main.html as above, but use sed to strip out the css-dependent stuff, or 3) fully regenerate dev/htdig-3.2/main.html, independently of the one in maindocs (though using the same technique). What do you folks think? If we plan to introduce the style sheet into the 3.2 on-line docs soon, as well as into the htdoc subdirectory of the source tree, then maybe option 1 is the way to go. It's easiest. If we don't plan to do this soon, then option 2 might be better, so the look of main.html is consistent with the other 3.2 on-line docs. However, both 1 and 2 assume that the information in main.html (i.e. the Introduction and Recent News frame when you first go to the site) will remain the same for both. If we want the option to have different content (e.g. a different intro for 3.2), then option 3 is the only real viable one, whether we add style sheets soon or not. I'm planning for now to copy the main.html (minus the css stuff) to dev/htdig-3.2 manually and check it in, for now, but I wouldn't mind some feedback on this before I change the update script. 
The style sheet was added to maindocs around the time 3.1.6 went out, as an experiment, but we haven't really followed up in terms of deciding if we want to stick to this format or not. Cheers, Gilles Back in February, I wrote... > According to me: > > Another way might be to do away with main.shtml and news.txt from the > > maindocs tree altogether, and just have a main.html file, as before the > > SourceForge days. Only difference is this time, the news section in > > main.html would be between clear delimiters, and the news-get.sh script > > would use these to automatically strip out and reinsert updated news > > items from the one file. It would recommit it only if it was different > > from yesterday's file. The script should also have proper tests to > > ensure the newly generated file is indeed complete, to prevent the whole > > thing from getting clobbered in the event of a disk space crunch, but > > with that, it may be the best option from a maintenance point of view. > > I'm leaning towards this latter approach. > > OK, I've implemented this on a trial basis. index.html and contents.html > still point to main.shtml, which still includes news.txt. However, I've > updated the news-get.sh script to maintain both news.txt and main.html, > so we'll see how it goes in the coming days/weeks. If it seems solid, > we can switch index.html and contents.html over to using main.html, > and we can then get rid of main.shtml and news.txt (and take out the > part in news-get.sh that maintains the latter). > > Please have a look and comment. I'd appreciate the extra eyeballs. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
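For readers following the delimiter approach described in the message above, here is a minimal sketch of the idea in Python. It is not the actual news-get.sh script, which is not shown in this thread; the marker strings and the function names are hypothetical, and the completeness test (markers present, page ends with `</html>`) is only one plausible way to guard against a truncated file.

```python
# Hypothetical delimiters; the real main.html may use different markers.
BEGIN = "<!-- NEWS-BEGIN -->"
END = "<!-- NEWS-END -->"


def splice_news(page: str, news: str) -> str:
    """Return the page with the text between the markers replaced by news."""
    # Refuse to touch a page whose markers are missing or duplicated.
    if page.count(BEGIN) != 1 or page.count(END) != 1:
        raise ValueError("expected exactly one begin and one end marker")
    start = page.index(BEGIN) + len(BEGIN)
    stop = page.index(END)
    if stop < start:
        raise ValueError("end marker precedes begin marker")
    return page[:start] + "\n" + news.strip() + "\n" + page[stop:]


def looks_complete(page: str) -> bool:
    """Cheap sanity check that the regenerated page was not truncated."""
    return BEGIN in page and END in page and page.rstrip().endswith("</html>")


def regenerate(old_page: str, news: str):
    """Return the new page, or None when there is nothing to commit."""
    new_page = splice_news(old_page, news)
    if not looks_complete(new_page):
        raise RuntimeError("regenerated page looks truncated; keep the old file")
    return None if new_page == old_page else new_page
```

A caller would write the returned text to a temporary file, rename it over main.html, and commit only when `regenerate` returns something other than `None`, which matches the "recommit only if different" rule in the quoted proposal.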
From: Joe R. J. <jj...@cl...> - 2004-06-18 06:19:02
|
On Fri, 11 Jun 2004 lac...@ip... wrote: > Date: Fri, 11 Jun 2004 10:32:32 +1000 > From: lac...@ip... > To: Joe R. Jah <jj...@cl...>, Lachlan Andrew <lh...@us...> > Cc: htd...@li... > Subject: Re: [htdig-dev] Make check and htdig warnings > > Thanks Joe. I'll apply patch 3 to CVS when I next get to my PC. The problem > with t_htdig has already been fixed. (I'm really not sure why it worked > under gnu/linux -- the script definitely had a bug.) Patch 3 has not been committed to 3.2.0b6: --- test/Makefile.orig Thu Jun 17 22:56:56 2004 +++ test/Makefile Thu Jun 17 22:54:30 2004 @@ -188,10 +188,10 @@ $(top_builddir)/htcommon/libcommon.la \ $(top_builddir)/htword/libhtword.la \ $(top_builddir)/htlib/libht.la \ - $(top_builddir)/htcommon/libcommon.la \ - $(top_builddir)/htword/libhtword.la \ - $(top_builddir)/db/libhtdb.la \ - $(top_builddir)/htlib/libht.la + $(top_builddir)/./htcommon/libcommon.la \ + $(top_builddir)/./htword/libhtword.la \ + $(top_builddir)/./db/libhtdb.la \ + $(top_builddir)/./htlib/libht.la Also t_htdig still fails: FAIL: t_htdig PASS: t_htsearch PASS: t_htmerge PASS: t_htnet PASS: t_htdig_local PASS: t_factors PASS: t_fuzzy PASS: t_parsing PASS: t_templates PASS: t_validwords ==================== 1 of 19 tests failed ==================== Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... > >On Wed, 9 Jun 2004, Lachlan Andrew wrote: > > > >> Date: Wed, 9 Jun 2004 20:49:47 +1000 > >> From: Lachlan Andrew <lh...@us...> > >> To: Joe R. Jah <jj...@cl...>, htd...@li... > >> Subject: Re: [htdig-dev] Make check and htdig warnings > >> > >> On Wed, 9 Jun 2004 03:41 pm, Joe R. 
Jah wrote: > >> > Hi Folks, > >> > > >> > Make check errors on BSD/OS 4.3.1: > >> > > >> > ../htlib/.libs/libht.a(HtWordType.o): In function > >> > `HtStripPunctuation(String &)': > >> > /tmp/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined > >> > reference to `WordType::instance' gmake[2]: *** [testnet] Error 1 > >> > >> Greetings Joe, > >> > >> This is the same problem as Jesse was getting on HP-UX... To hunt > >> this problem down, could you please > >> 1. Try the explicit g++ command I suggested in > >> <http://www.mail-archive.com/htd...@li.../msg02078.html> > > > > cd test > > g++ -g -O2 -Wall -fno-rtti -fno-exceptions -o testnet testnet.o \ > > -L/opt/htdig/lib/zlib/lib ../htnet/.libs/libhtnet.a \ > > ../htcommon/.libs/libcommon.a ../htword/.libs/libhtword.a \ > > ../db/.libs/libhtdb.a ../htlib/.libs/libht.a \ > > ../htword/.libs/libhtword.a -lz > > gmake check > > > >../htlib/.libs/libht.a(HtWordType.o): In function `HtIsWordChar(char)': > >/usr/src/WWW/htdig/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined > >reference to `WordType::instance' > >../htlib/.libs/libht.a(HtWordType.o): In function `HtIsStrictWordChar(char)': > >/usr/src/WWW/htdig/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined > >reference to `WordType::instance' > >../htlib/.libs/libht.a(HtWordType.o): In function `HtWordNormalize(String > >&)': > >/usr/src/WWW/htdig/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined > >reference to `WordType::instance' > >../htlib/.libs/libht.a(HtWordType.o): In function `HtStripPunctuation(String > >&)': > >/usr/src/WWW/htdig/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined > >reference to `WordType::instance' > >gmake[1]: *** [url] Error 1 > >gmake[1]: Leaving directory `/usr/src/WWW/htdig/htdig-3.2.0b6/test' > >gmake: *** [check-am] Error 2 > > > >> 2. Replace '--mode=link' by '--mode=link --preserve-dup-deps' in > >> line 324 of test/Makefile and then try make check again. 
> > > >../htlib/.libs/libht.a(HtWordType.o): In function `HtIsWordChar(char)': > >/usr/src/WWW/htdig/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined > >reference to `WordType::instance' > >../htlib/.libs/libht.a(HtWordType.o): In function `HtIsStrictWordChar(char)': > >/usr/src/WWW/htdig/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined > >reference to `WordType::instance' > >../htlib/.libs/libht.a(HtWordType.o): In function `HtWordNormalize(String > >&)': > >/usr/src/WWW/htdig/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined > >reference to `WordType::instance' > >../htlib/.libs/libht.a(HtWordType.o): In function `HtStripPunctuation(String > >&)': > >/usr/src/WWW/htdig/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined > >reference to `WordType::instance' > >*** Error code 1 > > > >Stop. > >*** Error code 1 > > > >Stop. > > > >> 3. Replace the line something like > >> HTLIBS = $(top_builddir)/htnet/libhtnet.la \ > >> $(top_builddir)/htcommon/libcommon.la \ > >> $(top_builddir)/htword/libhtword.la \ > >> $(top_builddir)/htlib/libht.la \ > >> $(top_builddir)/htcommon/libcommon.la \ > >> $(top_builddir)/htword/libhtword.la \ > >> $(top_builddir)/db/libhtdb.la \ > >> $(top_builddir)/htlib/libht.la > >> in test/Makefile, by a line like > >> HTLIBS = $(top_builddir)/htnet/libhtnet.la \ > >> $(top_builddir)/htcommon/libcommon.la \ > >> $(top_builddir)/htword/libhtword.la \ > >> $(top_builddir)/htlib/libht.la \ > >> $(top_builddir)/./htcommon/libcommon.la \ > >> $(top_builddir)/./htword/libhtword.la \ > >> $(top_builddir)/./db/libhtdb.la \ > >> $(top_builddir)/./htlib/libht.la > >> (that is, for the repeated libraries, add a './' to the path) and > >> then rerun make check. 
> > > > > >PASS: t_wordkey > >PASS: t_wordlist > >PASS: t_wordskip > >PASS: t_wordbitstream > >PASS: t_search > >PASS: t_htdb > >PASS: t_rdonly > >PASS: t_trunc > >PASS: t_url > > > >dodoc: cannot open > >running htdig: expected > >http://localhost:7400/set1/ > >http://localhost:7400/set1/bad_local.htm > >http://localhost:7400/set1/script.html > >http://localhost:7400/set1/site%201.html > >http://localhost:7400/set1/site2.html > >http://localhost:7400/set1/site3.html > >http://localhost:7400/set1/site4.html > >http://localhost:7400/set1/sub%2520dir/ > >http://localhost:7400/set1/sub%2520dir/empty%20file.html > >http://localhost:7400/set1/title.html > >but got > > > >htpurge: Database is empty! > > > >FAIL: t_htdig > >PASS: t_htsearch > >PASS: t_htmerge > >PASS: t_htnet > >PASS: t_htdig_local > >PASS: t_factors > >PASS: t_fuzzy > >PASS: t_parsing > >PASS: t_templates > >PASS: t_validwords > >==================== > >1 of 19 tests failed > >*** Error code 1 > > > >Stop. > >*** Error code 1 > > > >Stop. > > > >> 4. Type > >> nm htword/.libs/libhtword.a | grep instance > > > > U _10WordDBInfo.instance > > U _11WordKeyInfo.instance > > U _11WordMonitor.instance > > U _14WordRecordInfo.instance > > U _8WordType.instance > > U _11WordKeyInfo.instance > > U _14WordRecordInfo.instance > > U _10WordDBInfo.instance > > U _11WordKeyInfo.instance > >0000039c D _10WordDBInfo.instance > > U _11WordKeyInfo.instance > > U _14WordRecordInfo.instance > > U _11WordKeyInfo.instance > >000008f8 D _11WordKeyInfo.instance > > U _10WordDBInfo.instance > > U _11WordKeyInfo.instance > > U _14WordRecordInfo.instance > >00000870 D _11WordMonitor.instance > >00000160 D _14WordRecordInfo.instance > > U _11WordKeyInfo.instance > > U _14WordRecordInfo.instance > >00000988 D _8WordType.instance > > > > > >> nm test/testnet.o | grep instance > > > >> 5. 
Type > >> cp /bin/true test/testnet > > > >It was actually cp /usr/bin/true test/testnet > > > >> make check > > > >PASS: t_wordkey > >PASS: t_wordlist > >PASS: t_wordskip > >PASS: t_wordbitstream > >PASS: t_search > >PASS: t_htdb > >PASS: t_rdonly > >PASS: t_trunc > >PASS: t_url > >running htdig: expected > >http://localhost:7400/set1/ > >http://localhost:7400/set1/bad_local.htm > >http://localhost:7400/set1/script.html > >http://localhost:7400/set1/site%201.html > >http://localhost:7400/set1/site2.html > >http://localhost:7400/set1/site3.html > >http://localhost:7400/set1/site4.html > >http://localhost:7400/set1/sub%2520dir/ > >http://localhost:7400/set1/sub%2520dir/empty%20file.html > >http://localhost:7400/set1/title.html > >but got > > > >FAIL: t_htdig > >PASS: t_htsearch > >PASS: t_htmerge > >Could not fetch URL > >FAIL: t_htnet > >PASS: t_htdig_local > >PASS: t_factors > >PASS: t_fuzzy > >PASS: t_parsing > >PASS: t_templates > >PASS: t_validwords > >==================== > >2 of 19 tests failed > >==================== > >gmake[1]: *** [check-TESTS] Error 1 > >gmake[1]: Leaving directory `/usr/src/WWW/htdig/htdig-3.2.0b6/test' > >gmake: *** [check-am] Error 2 > > > >> > Warnings from htdig: > >> > > >> > Warning: Configuration option heading_factor_1 is no longer supported > >> > Warning: Configuration option heading_factor_2 is no longer supported > >> > Warning: Configuration option heading_factor_3 is no longer supported > >> > Warning: Configuration option heading_factor_4 is no longer supported > >> > Warning: Configuration option heading_factor_5 is no longer supported > >> > Warning: Configuration option heading_factor_6 is no longer supported > >> > Warning: Configuration option modification_time_is_now is no longer > supported > >> > Warning: Configuration option pdf_parser is no longer supported > >> > Warning: Configuration option translate_amp is no longer supported > >> > Warning: Configuration option translate_lt_gt is no longer supported > >> > 
Warning: Configuration option translate_quot is no longer supported > >> > > >> > Huh? > >> > >> Because people were confused by pdf_parser no longer working in > >> ht://Dig, it now checks for old 3.1.x configuration attributes which > >> are in the htdig.conf file but not supported by ht://Dig 3.2.x > >> Are any of these options specified in your htdig.conf? If not, this > > >> is a bug... > > > >Thanks Lachlan; yes I had all those attributes from 3.1.x days left in my > >htdig.conf file. > > > >Regards, |
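The check for leftover 3.1.x attributes described at the end of the message above can be approximated outside htdig as well. The following Python sketch is not htdig's own code: the attribute names are copied from the warnings quoted in this thread, the function name is hypothetical, and the parser assumes the plain `name: value` layout with `#` comments, ignoring continuation lines and included files.

```python
import re
from typing import List

# Attribute names taken from the warnings quoted in this thread.
OBSOLETE = {"heading_factor_%d" % n for n in range(1, 7)} | {
    "modification_time_is_now",
    "pdf_parser",
    "translate_amp",
    "translate_lt_gt",
    "translate_quot",
}

# Assumption: one "name: value" pair per line, '#' starts a comment line.
ATTR_LINE = re.compile(r"^\s*([A-Za-z0-9_]+)\s*:")


def obsolete_attributes(conf_text: str) -> List[str]:
    """List obsolete attribute names in the order they first appear."""
    found = []
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out attributes do not trigger a warning here
        match = ATTR_LINE.match(line)
        if match and match.group(1) in OBSOLETE and match.group(1) not in found:
            found.append(match.group(1))
    return found
```

Running this over an old htdig.conf before upgrading gives a list of attributes to remove or replace, without having to start a dig first.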
From: Lachlan A. <lh...@us...> - 2004-06-16 12:24:18
|
Greetings, Unfortunately, it is not possible to dig through the JavaScript of a page. Sorry. Lachlan On Wed, 16 Jun 2004 11:17 am, wu_zeng wrote: > Hello, when I use htdig to index a bbs site,which can be visited by > web method, it does not work. When I read the source of the web > page, it has php code or javascript code. How can I solve it? -- lh...@us... ht://Dig developer DownUnder (http://www.htdig.org) |
From: <SV...@gr...> - 2004-06-16 11:27:34
|
Hi, I'm having problems compiling this for Win2000. Has anyone done this? If so, could you please send it to me? Thanks! Stoyan V. |
From: wu_zeng <wuz...@is...> - 2004-06-16 01:13:39
|
Hello, when I use htdig to index a BBS site, which can be accessed over the web, it does not work. When I read the source of the web page, it has PHP code or JavaScript code. How can I solve it? Thank you with best regards! Zeng Wu |
From: Gabriele B. <g.b...@co...> - 2004-06-15 05:14:13
|
> http://www.tedmasterweb.com/htdig-3.2.0b6.dmg.zip (12.6 MB) I just put a link to your package on freshmeat. Thank you, -Gabriele |
From: Ted Stresen-R. <ted...@ma...> - 2004-06-14 22:20:35
|
Hi, Good job to everyone for getting this out (htdig 3.2.0b6). I'm looking forward to using it. I rebuilt the 3.2.0b6 binary package for Mac OS X (10.2 - Jaguar or higher) and posted it to my web site. Only one person tried out the package (and it worked fine) previously so I don't really know if it's worth publishing on a wide scale, but if you want, feel free to dump it into the binaries directory. You can pick it up here: http://www.tedmasterweb.com/htdig-3.2.0b6.dmg.zip (12.6 MB) This binary includes a pre-configured version of htdig including an index for htdig.org (the databases). After the installer is finished running, an AppleScript launches your browser and points you to the search page so you can see immediately whether it's working or not. Please let me know if you test the package and if you experience any problems with it. Sincerely, Ted Stresen-Reuter PS: the package built just fine on OS X (10.3.4) and if someone knows how to decrease the size of this download (by disabling dynamic libraries, for example), let me know. |
From: Christopher M. <chr...@mc...> - 2004-06-14 16:58:11
|
On Mon, 2004-06-14 at 06:29, Gabriele Bartolini wrote: > The ht://Dig group is very happy to announce the release of ht://Dig > version 3.2.0b6. That's great. Thanks for that. <shameless self promotion> We went live with our new search tool built around htDig and Postgres last night. From what I can tell, it is working beautifully. You can see the main interface to it from here: http://www.mcgill.ca/search/ but you get a more interesting view if you come from a page within www.mcgill.ca. For example, if you go to: http://www.mcgill.ca/eflc/ and put the word 'english' in the search box (top right corner of the page) and click 'find' you'll see what I mean. </shameless self promotion> So, thanks very much to the htDig team for making this possible. Your work is very much appreciated and we couldn't have built this tool without your help. Cheers, Chris -- Christopher Murtagh Enterprise Systems Administrator ISR / Web Communications Group McGill University Montreal, Quebec Canada Tel.: (514) 398-3122 Fax: (514) 398-2017 |
From: <Moh...@co...> - 2004-06-14 13:54:37
|
Thank you Robert for your rapid response. Please send me the installation documentation for the Windows environment. Mohamed.

Robert Ribnitz <ribnitz@linuxbourg.ch>, 06/12/2004 04:36 PM
To: htd...@li..., Moh...@co...
cc:
Subject: Re: htdig-dev digest, Vol 1 #900 - 1 msg

>Message: 1
>To: htd...@li...
>From: Moh...@co...
>Date: Fri, 11 Jun 2004 11:13:34 -0500
>Subject: [htdig-dev] ISS box
>
>HI,
>Does Htdig works on an IIS box with windows environment?
>Please advice me on this matter.
>Thank you in advance.
>Mohamed.

Hello Mohamed,

htdig uses the HTTP protocol to access files. For this reason it will work with any server that can serve HTTP requests, including Microsoft Internet Information Server. If I am informed correctly, there's a port of htdig to the Win32 platform; please see the website.

yours
Robert Ribnitz
ht://Dig debian maintainer
|
From: Gabriele B. <g.b...@co...> - 2004-06-14 10:29:53
|
The ht://Dig group is very happy to announce the release of ht://Dig version 3.2.0b6. It fixes several bugs from 3.2.0b5, and runs somewhat faster, although still much slower than 3.1.6 (no significant speed improvements are expected in the near future, although we are working on it). Calling this release a "beta" simply means that exhaustive testing, especially on non-Linux platforms, is not yet complete. However, we consider it stable enough for most production use. Reports of bugs and performance problems are quite welcome. Please try to provide as much information as possible regarding OS, configuration, hardware used, etc. Feedback should be sent to the htdig-dev mailing list at htd...@li.... To download 3.2.0b6, see http://www.htdig.org/where.html For the upgrade guide, see http://www.htdig.org/dev/htdig-3.2/upgrade.html For the Release notes, see http://www.htdig.org/dev/htdig-3.2/RELEASE.html For the ChangeLog, see http://www.htdig.org/dev/htdig-3.2/ChangeLog Thanks to the many people who contributed to this release in the form of code, feedback and bug reports!
-- the ht://Dig Group

Release notes for htdig-3.2.0b6 14 Jun 2004

Bug fixes:
* Correctly handle empty disallow entries in robots.txt
* No longer compile regular expressions for every URL (improves performance)
* Allow compressed databases on Cygwin
* Fixed bugs in phrase searching
* Improved parsing of the configuration file
* bin/rundig -a handles multiple database directories
* Ellipsis displayed correctly by htsearch
* Allow '-' argument to '-m' ('minimal') runtime option to htdig
* Check validity of first URL from each server
* No longer ignore empty configuration attributes
* Fixed bug in handling 'http_proxy', 'http_proxy_authorization', 'authorization' attributes
* Remove stale md5_db if '-i' specified
* Make 'server_alias' case insensitive
* Fixed bugs with zlib
* Allow € HTML entity
* Fixed other minor bugs

New features:
* Added allow_space_in_url attribute: if set to true, htdig will handle URLs that contain embedded spaces
* Added store_phrases attribute: if it is false, htdig only stores the first occurrence of each word in a document
* Added an improved version of RTF2HTML into the contrib section
* Added OpenOffice.org support to doc2html in contrib section
* Improved date factor formula
* Improved tests
* Improved documentation
* Added man pages
|
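To make the store_phrases note above concrete, here is a small Python illustration of the stated behaviour: with the attribute false, only the first occurrence of each word in a document is kept. This is a sketch of the idea only, not htdig's word database code; the function name is hypothetical, and lower-casing words before comparing them is an assumption.

```python
def word_entries(words, store_phrases=True):
    """Yield (word, position) pairs, mimicking the release-note description."""
    seen = set()
    for position, word in enumerate(words):
        key = word.lower()  # assumption: words are compared case-insensitively
        if not store_phrases and key in seen:
            continue  # later occurrences of a word are dropped
        seen.add(key)
        yield key, position


if __name__ == "__main__":
    text = "the cat saw the other cat".split()
    print(list(word_entries(text, store_phrases=False)))
```

With every occurrence stored, the positions of adjacent words can be compared, which is presumably what phrase matching relies on; with only first occurrences stored the index is smaller, but that positional information is incomplete.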