From: Andy A. <and...@ya...> - 2017-11-08 00:52:26
|
Hi, I recently mirrored the htdig repo on GitHub. The address is https://github.com/andy5995/htdig. I plan to promote it and work on it from time to time. I'm not an expert coder, so I have no expectations that I'll be able to develop htdig as well as the original developers, or that I'll be able to make any significant improvements. For the moment, my plan is simply to gradually build up a community through GitHub and a Slack workspace. The more members who join, the better idea we'll have about the level of support for the continued development of htdig. To create the possibility and opportunity for development to continue, I'll review pull requests in a timely manner, create tickets, and make minor patches myself. I don't feel a need to manage or control the project for an indefinite period of time, and would happily hand it over to a team of three or four capable devs who could make sure that htdig remains a highly reputable and stable open source project. If you are interested in passively monitoring the project, there is a link to join the Slack workspace (chat) in the revised README.md file. I also ask that you submit any known bugs on the GitHub issue tracker. The version I'm starting with is 3.2.0b6 (using a patch from https://packages.debian.org/stable/web/htdig), with some modifications from Robert Klein (https://github.com/roklein/htdig), whose repo is the one I mirrored. Thank you, -- -Andy |
From: Gary L. <fig...@gm...> - 2015-04-24 07:07:02
|
On my Ubuntu machine, htdig is already installed: "which htdig" gives me /usr/bin/htdig. I want to install htdig under /var/www/my_web_site, i.e. /var/www/my_web_site/htdig. Extra info: gcc version 4.9.1 (Ubuntu 4.9.1-16ubuntu6), GNU Make 4.0. For htdig-3.1.6: when I run "./configure", I get: configure: error: To compile ht://Dig, you will need a C++ library. Try installing libstdc++. Running "/sbin/ldconfig -p | grep stdc++" shows that I have: libstdc++.so.6 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 libstdc++.so.6 (libc6) => /usr/lib/i386-linux-gnu/libstdc++.so.6 I also tried htdig-3.2.0b6. "./configure" seems to run fine and ends with something like "Now you must run 'make' followed by 'make install'". When I run "make", I get quite a few errors like: ..... Making all in htsearch make[1]: Entering directory '/var/www/test/testme/sounddesign/htdig-3.2.0b6/htsearch' g++ -DHAVE_CONFIG_H -I. -I. -I../include -DDEFAULT_CONFIG_FILE=\"/opt/www/conf/htdig.conf\" -I../include -I../htlib -I../htnet -I../htcommon -I../htword -I../db -I../db -DCONFIG_DIR=\"/opt/www/conf\" -I../htfuzzy -g -O2 -Wall -fno-rtti -fno-exceptions -c -o Display.o `test -f 'Display.cc' || echo './'`Display.cc In file included from Display.cc:30:0: Collection.h:39:10: error: extra qualification ‘Collection::’ on member ‘Open’ [-fpermissive] void Collection::Open(); .... .... .... Display.cc:830:32: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings] if (input->exists("endyear")) ^ Any idea what I should do? |
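For readers hitting the same "extra qualification" error: g++ is objecting to the class name being repeated in front of a member that is declared inside the class body, so `void Collection::Open();` needs to become `void Open();`. The following is only a sketch, assuming the offending declarations are all `Collection::`-qualified members in `Collection.h`; the function name `strip_extra_qualification` and the `.orig` backup suffix are illustrative, not part of htdig.

```shell
#!/bin/sh
# Sketch only: remove the redundant "Collection::" qualifier from member
# declarations in a header file. A backup copy is kept next to the file.
# The function name and the ".orig" suffix are hypothetical choices.
strip_extra_qualification() {
    header="$1"
    # Keep the untouched original so the change can be reverted.
    cp "$header" "$header.orig" || return 1
    # Drop the first "Collection::" on each line; inside the class body
    # this turns "void Collection::Open();" into "void Open();".
    sed 's/Collection:://' "$header.orig" > "$header"
}
```

It would be called as `strip_extra_qualification htsearch/Collection.h` before re-running make. The compiler message itself names the other way out, the `-fpermissive` flag, which downgrades this error to a warning; editing the header is the narrower change.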
From: jon g. <han...@ya...> - 2011-03-30 18:37:18
|
I have been using a dedicated server leased from a major provider for almost nine years. The server runs FreeBSD and the Apache web server. We installed ht://Dig (version 3.2.0b6) about 8 years ago and it has been working like a charm since then. We have done no maintenance on ht://Dig nor changed any of its configuration since then -- so we are a little (read: a lot!) rusty on this... Recently my provider decided that it was necessary to upgrade the OS to FreeBSD 7.2. Since then, the htsearch function called from the web interface has been able to access the search data, but ht://Dig has not been updating. ht://Dig was originally set up to update by calling rundig from a crontab twice a day. I have since reset the crontab to run rundig, but this did not seem to help. So, as a test, I ran rundig from the command line. This returned a strange error: "File /usr/local/etc/htdig/htdig.conf not found." When I checked, sure enough the htdig folder was there but the htdig.conf file wasn't. (There is a file called "htdig.conf.sample" in the folder.) The htdig.conf file is actually in the /usr/home/username/htdig/conf/ directory. So question #1: where did the htdig.conf file in the etc/ directory go -- or why is rundig looking for it there? Since at this point this seemed like kind of an academic question and I really wanted to know if I could get an update going, I decided to run rundig from the command line using the -c option, pointing it to the htdig.conf in the /home directory. 
This seemed to run the update but resulted in the following: mv: rename /home/username/htdig/db/root2word.db to /usr/local/share/htdig/common/root2word.db: Permission denied mv: rename /home/username/htdig/db/word2root.db to /usr/local/share/htdig/common/word2root.db: Permission denied mv: rename /home/username/htdig/db/synonyms.db to /usr/local/share/htdig/common/synonyms.db: Permission denied and resulted in wiping out the old htsearch data (..searches now result in an Internal Server Error) I checked the permissions on these directories using ls -l: /usr/local/share/htdig/common/ : drwxr-xr-x 2 root wheel 512 Mar 19 2009 common /home/username/htdig/db/ : drwxr-xr-x 2 putka users 1024 Mar 29 20:49 db Attempts to chmod these directories have failed with a "permission denied". Question #2: Is this just a directory permissions problem (i.e. did permissions and root access change when the OS was updated?) and if I can resolve this can I just continue to use ht/Dig calling rundig with the -c option -- OR is this all a waste of time and I need to reinstall, reconfigure, recompile... (which obviously I don't really want to do.) Many thanks for your time in reviewing this and for any expert input I can get. Jon |
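The `mv ... Permission denied` lines above suggest that the account running rundig cannot write to `/usr/local/share/htdig/common/`, which the listing shows as owned by root. A small pre-flight check can make that visible before an update is attempted. This is a sketch only; the function name `check_writable` is hypothetical and the directories passed to it are the ones quoted in the message.

```shell
#!/bin/sh
# Sketch only: report which of the given directories the current user
# can write to. Returns non-zero if any of them is missing or read-only.
check_writable() {
    status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "not writable: $dir"
            status=1
        fi
    done
    return $status
}
```

For example, `check_writable /usr/local/share/htdig/common /home/username/htdig/db && rundig -c /usr/home/username/htdig/conf/htdig.conf` would only start the update when both directories are writable by the calling user.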
From: John D. <joh...@gm...> - 2011-03-24 16:05:11
|
Hi, I mailed the list yesterday but have not received any replies yet. I am not sure my mail was delivered, as I have not received any mail from the group, so I am re-posting it. Thank you again. I am trying to make htdig work on Windows 7 for PDF indexing. All the database files are being generated, but I see the following issues: 1) The db.docsdb is generated with the PDF id but not with the title. 2) The excerpts (H) attribute is missing from the db.docs file. 3) The db.worddump is generated with junk characters. I tried using the db.docs and db.worddump files generated on Linux, which worked fine, but the db.docsdb and db.docs.index files did not. Is there a way of making the db files generated on Linux work on Windows? Please let me know what options I have. I tested running the Perl scripts doc2html and pdf2html, and they parse my PDFs, but only the local ones. They do not parse when I pass the URL of the PDF. pdftotext and pdfinfo are working fine. Also, how can I index the PDFs in a local directory? I tried these options but they didn't work: start_url: http://localhost/pdf/ #local_urls: http://localhost/pdf/ = C:/cygwin/var/www/htdocs/pdf/ #local_urls_only: true Thanks for your help. John |
From: John D. <joh...@gm...> - 2011-03-24 03:22:17
|
Hi, I am trying to make htdig work on Windows 7 for PDF indexing. All the database files are being generated, but I see the following issues: 1) The db.docsdb is generated with the PDF id but not with the title. 2) The excerpts (H) attribute is missing from the db.docs file. 3) The db.worddump is generated with junk characters. I tried using the db.docs and db.worddump files generated on Linux, which worked fine, but the db.docsdb and db.docs.index files did not. Please let me know what options I have. I tested running the Perl scripts doc2html and pdf2html, and they parse my PDFs, but only the local ones. They do not parse when I pass the URL of the PDF. pdftotext and pdfinfo are working fine. Also, how can I index the PDFs in a local directory? I tried these options but they didn't work: start_url: http://localhost/pdf/ #local_urls: http://localhost/pdf/ = C:/cygwin/var/www/htdocs/pdf/ #local_urls_only: true Thanks for your help. John |
From: mark ---- <mar...@ho...> - 2011-02-24 16:56:16
|
Hi, This is a shell script I wrote to start and stop htdig when it is running on a Fedora, CentOS or Red Hat Linux machine. This was not an easy script to write: htdig is a very noisy program to kill from a bash script, which made it very difficult to get it to work with the Fedora service script system. The issue solved was "kill -9 pid" printing "Killed" or "Terminated". You cannot trap a SIGKILL signal (i.e. kill -9), whereas you can trap the SIGTERM signal sent by pkill. Please include this script in future releases of htdig and install it into /etc/init.d from the make install process. Best wishes Mark Adam Tkac <atkac@redhat com> Here is the script #!/bin/sh # # chkconfig: 345 81 35 # description: Starts and stops the htdig search engine # Source function library. . /etc/rc.d/init.d/functions ### This library is called from # See how we were called. case "$1" in start) echo -n "Starting htdig search engine : " rundig=$(pgrep rundig) # find all the processes if [ -z "$rundig" ] # check that another instance of rundig is not already running then # before running rundig rundig& # run rundig as a background task. 
fi echo "OK" ;; stop) echo -n "Stopping htdig search engine : " # Notes # ===== # this checks to see if there is a pid (process number) for a process, say "rundig" # if there is a process it kills it off # send the annoying "killed" or "terminated" messages to the junk bin: { command } 2>/dev/null rundig=$(pgrep rundig) # find all the processes htdig=$(pgrep htdig) htmerge=$(pgrep htmerge) htnotify=$(pgrep htnotify) htfuzzy=$(pgrep htfuzzy) # check the process has a PID job number and if it does kill it quietly if [ "$rundig" ] ;then trap " $(pkill rundig | 2>/dev/null )" SIGTERM ;fi if [ "$htdig" ]; then trap " $(pkill htdig | 2>/dev/null )" SIGTERM ;fi if [ "$htmerge" ]; then trap " $(pkill htmerge | 2>/dev/null )" SIGTERM ;fi if [ "$htnotify" ]; then trap " $(pkill htnotify | 2>/dev/null)" SIGTERM ;fi if [ "$htfuzzy" ]; then trap " $(pkill htfuzzy | 2>/dev/null)" SIGTERM ;fi echo "OK" ;; #status) # echo "" #;; restart) echo -n "Restarting htdig search engine : " echo " " echo -n "Stopping htdig search engine : " rundig=$(pgrep rundig) # find all the processes htdig=$(pgrep htdig) htmerge=$(pgrep htmerge) htnotify=$(pgrep htnotify) htfuzzy=$(pgrep htfuzzy) # check the process has a PID job number and if it does kill it quietly if [ "$rundig" ] ;then trap " $(pkill rundig | 2>/dev/null )" SIGTERM ;fi if [ "$htdig" ]; then trap " $(pkill htdig | 2>/dev/null )" SIGTERM ;fi if [ "$htmerge" ]; then trap " $(pkill htmerge | 2>/dev/null )" SIGTERM ;fi if [ "$htnotify" ]; then trap " $(pkill htnotify | 2>/dev/null)" SIGTERM ;fi if [ "$htfuzzy" ]; then trap " $(pkill htfuzzy | 2>/dev/null)" SIGTERM ;fi echo "OK" echo -n "Starting htdig search engine : " rundig& echo "OK" echo -n " Restart complete done." ;; *) echo "Usage: htdig search engine {start|stop|restart}" exit 1 esac |
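The stop logic in the script above repeats the same kill step for every program. A possible simplification, offered only as a sketch and not as a drop-in replacement for the posted script, is one helper that sends SIGTERM (signal 15, which a process can trap, unlike SIGKILL, signal 9) to a list of PIDs and discards any error output. The helper name `terminate_pids` is hypothetical.

```shell
#!/bin/sh
# Sketch only: send SIGTERM to every PID passed as an argument.
# Errors (for example, a PID that has already exited) are discarded,
# so the caller's output stays clean.
terminate_pids() {
    for pid in "$@"; do
        kill -TERM "$pid" 2>/dev/null
    done
    return 0
}
```

In the stop branch this could be used as `terminate_pids $(pgrep -x rundig) $(pgrep -x htdig)`, where `-x` asks pgrep for an exact name match. Working from PIDs also sidesteps name-based matching on the service script itself, whose own file name may contain "htdig".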
From: Klaus S. <sys...@dk...> - 2010-11-05 16:56:38
|
Hi guys! I seem to be missing a few things, as I cannot find a complete description of the switches for the htsearch binary. I am trying to search a mailman archive with restricted, i.e. password-protected, archives. As far as I can see from the URL that htsearch builds, there are far more switches than documented in the man pages, and I believe that I need the "restrict" switch. Thanks in advance and yours, firebug -- _________________________________________ Klaus Schinkinger System Administration Doktoratskolleg "Computational Mathematics" Johannes Kepler Universität Mail: kla...@dk... Tel.:0043/(0)732/2468-7176 Web: www.dk-compmath.jku.at Altenberger Straße 69 A-4040 Linz RoomNr.:HF234, Hochschulfonds Gebäude _________________________________________ |
From: Kevin K. <kr...@ce...> - 2010-10-14 18:25:22
|
I'm running Ubuntu 10.04 and htdig version 3.2.0b6-9.1 from aptitude. Everything is working great except my exclude_urls list. Here it is: exclude_urls: /n/ /~*/ /cgi-bin/ /documentation/ /images/ /public_html/ /photos/ The most important one is /n/. Our intranet allows a whole slew of soft links to be followed, and I want these to be excluded. What am I missing to get this to work? |
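One way to reason about a list like this is to test candidate URLs against it outside htdig. The sketch below assumes that each exclude_urls entry is treated as a plain substring of the URL, which is how the attribute is usually described; if that assumption holds, an entry such as `/~*/` would only match a URL containing a literal asterisk. The function name `is_excluded` and the sample URLs are hypothetical.

```shell
#!/bin/sh
# Sketch only: succeed if the URL contains any of the given patterns as a
# literal substring. This mimics plain substring matching and is not
# htdig's own code.
is_excluded() {
    url="$1"
    shift
    for pat in "$@"; do
        case "$url" in
            # The quoted expansion is matched literally, so "*" in a
            # pattern is an ordinary character here.
            *"$pat"*) return 0 ;;
        esac
    done
    return 1
}
```

For instance, `is_excluded "http://intranet/n/page.html" /n/ /cgi-bin/` succeeds, while a URL reached through a differently named soft link does not contain `/n/` and so would not be caught by that entry.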
From: Sorrel, A. <and...@hp...> - 2010-08-27 17:13:29
|
Hi everyone, We are trying to make HTDig work for double-byte languages (Japanese, Chinese etc.). According to the HTDig FAQ, HTDig does not support double-byte languages. However, we found this piece of code on the web (see from line 70): http://www.sfr-fresh.com/unix/www/htdig-3.2.0b6.tar.gz:a/htdig-3.2.0b6/htlib/regex.c 70 /* This is for multi byte string support. */ 71 #ifdef MBS_SUPPORT ... This is already in the code we have; we have enabled it, but there is still no change. Does anyone have an idea of what could be wrong? Thanks in advance for your help! Best Regards. Andre Sorrel and...@hp... |
From: Malcolm A. <mal...@ou...> - 2010-04-26 08:03:57
|
> A number of the files receive the following error: > Not found: http://pdx/pd5/documents/CompanyLicensing/COMPANY%20PERMANENT%20FILINGS/1888.pdf%0a Ref: > If I search the directories, the files are there sans the "%0a" Odds-on the problem is that someone is generating links with a line break just inside the href, i.e. ... href="URL " - and some code is escaping the line break into %0a. Malcolm. |
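Since `%0a` is the URL-encoded form of a line feed, the suspect links can be hunted for in the referring HTML files by looking for an `href="` whose closing quote is not on the same line. The following is a rough sketch under that assumption; the function name `find_unclosed_hrefs` is hypothetical, and it only catches double-quoted attributes.

```shell
#!/bin/sh
# Sketch only: print (with line numbers) every line on which an href="...
# value is opened but not closed before the end of the line.
find_unclosed_hrefs() {
    grep -n 'href="[^"]*$' "$@"
}
```

Run against the page named in the "Ref:" part of the error, for example `find_unclosed_hrefs index.html`, it lists the lines where the link target is followed by a line break instead of a closing quote.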
From: Vorländer, M. <MV...@pd...> - 2010-04-23 17:34:26
|
Ken, have a look at the file mentioned in the "Ref" for a link pointing to the file not found. That link contains the %0a. HTH, Martin Martin Vorländer Software Development & VMS Support PDV-Systeme GmbH Dörntener Straße 2 A 38644 Goslar Tel +49 (0) 5321 3703–33 Fax +49 (0) 5321 8924 E-Mail mv...@pd... www.pdv-systeme.de Managing Director Dr. Dietmar Kipping, Amtsgericht Braunschweig HRB 110209 Our general terms and conditions can be found at: www.pdv-systeme.de/unternehmen/agb.htm Advice: This e-mail is confidential. If you are not the intended recipient, please do not disclose or use the contents of the mail. If you have erroneously received this e-mail, please inform us immediately by return e-mail and delete the document. ________________________________ From: Ken Harris <KH...@md...> To: htd...@li... <htd...@li...> Sent: Tue Apr 20 15:58:15 2010 Subject: [htdig] HT Dig issue with file list I have inherited a program that uses HTDig to build searchable databases out of PDF files. The process works fairly well, after some cleanup, but I have an error I cannot seem to figure out. A number of the files receive the following error: Not found: http://pdx/pd5/documents/CompanyLicensing/COMPANY%20PERMANENT%20FILINGS/1888.pdf%0a Ref: If I search the directories, the files are there sans the "%0a" - I have read all the documentation, searched the archives and googled the heck out of this but cannot find an answer. Any help would be appreciated and if you need more information let me know what is needed as I would like to get these files into the search database. 
Thanks Ken Harris Programmer / Analyst MIS/Maryland Insurance Administration 410-468-2311 KH...@md...<mailto:KH...@md...> ------------------------------------------------------------------------------------------------------------- The information contained in this e-mail, and attachment(s) thereto, is intended for use by the named addressee only, and may be confidential or legally privileged. If you have received this e-mail in error, please notify the sender immediately by reply e-mail or by telephone at the number listed above and permanently delete this e-mail message and any accompanying attachment(s). Please also be advised that any dissemination, retention, distribution, copying or unauthorized review of this communication is strictly prohibited. ------------------------------------------------------------------------------------------------------------- |
From: Ken H. <KH...@md...> - 2010-04-21 10:35:31
|
I sent this yesterday after I joined the list and got a notification back that I was not a member. Hoping that this is just a timing issue I am trying again this morning. I have inherited a program that uses HTDig to build searchable databases out of PDF files. The process works fairly well, after some cleanup, but I have an error I cannot seem to figure out. A number of the files receive the following error: Not found: http://pdx/pd5/documents/CompanyLicensing/COMPANY%20PERMANENT%20FILINGS/1888.pdf%0a Ref: If I search the directories, the files are there sans the "%0a" - I have read the FAQ and all the documentation I could find on the site, searched the archives and googled the heck out of this but cannot find an answer. Any help would be appreciated and if you need more information let me know what is needed as I would like to get these files into the search database. Thanks Ken Harris Programmer / Analyst MIS/Maryland Insurance Administration 410-468-2311 KH...@md... |
From: Ken H. <KH...@md...> - 2010-04-20 14:19:01
|
I have inherited a program that uses HTDig to build searchable databases out of PDF files. The process works fairly well, after some cleanup, but I have an error I cannot seem to figure out. A number of the files receive the following error: Not found: http://pdx/pd5/documents/CompanyLicensing/COMPANY%20PERMANENT%20FILINGS/1888.pdf%0a Ref: If I search the directories, the files are there sans the "%0a" - I have read all the documentation, searched the archives and googled the heck out of this but cannot find an answer. Any help would be appreciated and if you need more information let me know what is needed as I would like to get these files into the search database. Thanks Ken Harris Programmer / Analyst MIS/Maryland Insurance Administration 410-468-2311 KH...@md... |
From: Marco <br...@gm...> - 2009-11-30 12:37:43
|
No hints at all? On Fri, Nov 27, 2009 at 2:34 AM, Marco <br...@gm...> wrote: > Hi, htdig users!! > > I'm using ht://Dig 3.2.0b6 on an Ubuntu 8.10 server (i386) with a > recompiled htsearch binary (to set a default config file), indexing an > Italian website. > > The problem is that, in my search results from htsearch, all the documents > (which are in the right number) have "[no title]" and no excerpt. > > Locally, everything works fine (with a recompiled htsearch package for > amd64 architecture, on a Debian box). > > What could be the cause? > > thanks, > marco > |
From: Marco <br...@gm...> - 2009-11-27 02:02:12
|
Hi, htdig users!! I'm using ht://Dig 3.2.0b6 on an Ubuntu 8.10 server (i386) with a recompiled htsearch binary (to set a default config file), indexing an Italian website. The problem is that, in my search results from htsearch, all the documents (which are in the right number) have "[no title]" and no excerpt. Locally, everything works fine (with a recompiled htsearch package for amd64 architecture, on a Debian box). What could be the cause? thanks, marco |
From: Alfredo M. <alf...@ho...> - 2009-10-17 21:45:47
|
Hello, Can anybody guide or help me, or install Htdig on my site for me? I had it installed automatically and very easily with Verio's admin tools, but now I have another site which does not have Htdig as its search engine, but another one I don't like. I swear I have read the instructions on how to install Htdig, but I was not able to do it. The instructions call for the execution of some commands, but they don't say where, and I don't know. Any help will be appreciated, thanks. Alfredo _________________________________________________________________ |
From: Ivan K. <ka...@ny...> - 2009-10-12 15:44:36
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Rekha Ravi Pai <re...@so...> writes: > How do I index [.pdf files] so that they can be searched through > htsearch. On a Debian platform, these lines in htdig.conf were found to help: # The following was found by groping to parse pdf docs. max_doc_size: 5000000 debian_pdf_parser: xpdf external_parsers: application/msword /usr/share/htdig/parse_doc.pl \ application/postscript /usr/share/htdig/parse_doc.pl \ application/pdf /usr/share/htdig/parse_doc.pl Finger for key -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Processed by Mailcrypt 3.5.8+ <http://mailcrypt.sourceforge.net/> iD8DBQFK0ws+YpflG4Qs+dMRAtCVAJ9swfsnSZzq3P27uxTycal5rpxUvACeJZ3n +W6/1GmjFKvelnr5u17nTsw= =87XG -----END PGP SIGNATURE----- |
From: Nomen N. <no...@di...> - 2009-10-09 10:58:11
|
Rekha Ravi Pai <re...@so...> writes: > How do I index [.pdf files] so that they can be searched through > htsearch. On a Debian platform, these lines in htdig.conf were found to help: # The following was found by groping to parse pdf docs. max_doc_size: 5000000 debian_pdf_parser: xpdf external_parsers: application/msword /usr/share/htdig/parse_doc.pl \ application/postscript /usr/share/htdig/parse_doc.pl \ application/pdf /usr/share/htdig/parse_doc.pl |
From: Rekha R. P. <re...@so...> - 2009-10-06 04:31:10
|
Hi, I have downloaded htdig-3.2.0b6. I have configured, installed and run the rundig successfully. In my start_url's index.html, I have included all .js files which have links to other html and pdf files. As rundig indexes only html links, I am unable to get these files indexed. How do I index them so that they can be searched through htsearch. Regards, Rekha. -- -------------------------------------------------------------------------------- Rekha Pai Senior Software Consultant SoftJin Technologies Pvt. Ltd. #102, Mobius Tower, SJR I-Park, EPIP, Whitefield, Bangalore 560066 Phone: +91-80-41779999 Fax: +91-80-41157070 Business Disclaimer ____________________________________________________________ This e-mail message and any files transmitted with it are intended solely for the use of the individual or entity to which they are addressed. It may contain confidential, proprietary or legally privileged information. If you are not the intended recipient please be advised that you have received this message in error and any use is strictly prohibited. Please immediately delete it and all copies of it from your system, destroy any hard copies of it and notify the sender by return mail. You must not, directly or indirectly, use, disclose, distribute, print, or copy any part of this message if you are not the intended recipient. ___________________________________________________________ |
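Because the crawler follows links it finds in HTML and does not execute JavaScript, one commonly suggested workaround is to generate a plain HTML page that links to every document and to list that page in start_url. The following is a minimal sketch of that idea, not something taken from the htdig distribution: the function name `make_link_index`, the base URL and the document root are all assumptions, and file names containing spaces or other special characters would still need URL-encoding.

```shell
#!/bin/sh
# Sketch only: print an HTML page with one plain link per .html or .pdf
# file found under a document root.
#   $1 = base URL the document root is served from (assumed)
#   $2 = document root on disk (assumed)
make_link_index() {
    base_url="$1"
    doc_root="$2"
    echo '<html><body>'
    ( cd "$doc_root" &&
      find . -type f \( -name '*.html' -o -name '*.pdf' \) | sort ) |
    while IFS= read -r path; do
        rel=${path#./}    # strip the leading "./" that find adds
        printf '<a href="%s/%s">%s</a><br>\n' "${base_url%/}" "$rel" "$rel"
    done
    echo '</body></html>'
}
```

A call such as `make_link_index http://www.example.com /var/www/html > /var/www/html/all-links.html` would produce the page, and `http://www.example.com/all-links.html` could then be added to start_url.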
From: G. T. Stresen-R. <ted...@gm...> - 2009-09-15 08:22:30
|
Just wondering what the status was of this project. I'm in need of a new indexing and search system. I've been looking for alternatives as easy to install and setup as htdig but have been unable to find any. Nutch seems like a good candidate, but requires a Java environment and I don't seem to have any Debian packages for it available from my repositories. Any help is greatly appreciated. Ted Stresen-Reuter http://tedmasterweb.com |
From: jason c. <jca...@ne...> - 2009-07-02 15:19:55
|
Hello, Ran rundig overnight; had this message in the morning: WordDB: /var/lib/htdig/db.words.db: file size not a multiple of the pagesize WordDB: DB->cursor: method meaningless before open Running htpurge manually results in the same message. It looks like it shut down at exactly 2 GB; I created a 3 GB test file (filetest3gb) using dd to check that the file system can hold larger files... Trying to figure out where the 2 GB file size limit is hiding. Using Fedora 11 on an HP DL585 G2; Linux cae2.netezza.com 2.6.29.4-167.fc11.i686.PAE #1 SMP Wed May 27 17:28:22 EDT 2009 i686 athlon i386 GNU/Linux ext4 file system /dev/cciss/c0d0p3 on / type ext4 (rw) proc on /proc type proc (rw) sysfs on /sys type sysfs (rw) devpts on /dev/pts type devpts (rw) /dev/cciss/c0d0p1 on /boot type ext3 (rw) /dev/cciss/c0d0p5 on /home type ext4 (rw) /dev/cciss/c0d0p2 on /usr type ext4 (rw) tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0") none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw) sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) drwxrwxrwx. 3 root root 4.0K 2009-07-02 06:00 . drwxr-xr-x. 42 root root 4.0K 2009-07-01 13:55 .. -rwxrwxrwx. 1 root root 82M 2009-07-02 04:28 db.docdb -rwxrwxrwx. 1 root root 41M 2009-07-01 23:44 db.docs.index -rwxrwxrwx. 1 root root 432M 2009-07-01 23:44 db.excerpts -rwxrwxrwx. 1 root root 2.0G 2009-07-01 23:44 db.words.db -rwxrwxrwx. 1 root root 16K 2009-07-01 14:12 db.words.db_weakcmpr -rw-r--r--. 1 root root 3.0G 2009-07-02 06:06 filetest3gb drwxrwxrwx. 2 root root 4.0K 2009-07-01 13:59 scripts [root@cae2 htdig]# uname -a Thanks! -jc |
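Since dd could write a 3 GB file to the same file system, the 2 GB ceiling is more likely to sit in the program than on the disk. On a 32-bit system, one plausible cause (an assumption, not something confirmed in this thread) is a binary built without large file support, where file offsets are signed 32-bit values and top out at 2^31 - 1 = 2147483647 bytes. The sketch below checks whether a file has reached that boundary; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helpers for spotting a file stuck at the signed 32-bit
# offset limit.
LIMIT_2GB=2147483647    # 2^31 - 1, the largest signed 32-bit file offset

# Print the size of a file in bytes.
file_size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Succeed if the given byte count has reached the 2 GB boundary.
hits_2gb_limit() {
    [ "$1" -ge "$LIMIT_2GB" ]
}
```

For example, `hits_2gb_limit "$(file_size_bytes /var/lib/htdig/db.words.db)" && echo "at the 2 GB boundary"`. On glibc systems, `getconf LFS_CFLAGS` prints the compiler flags needed for large file support, which may be worth comparing against the flags htdig was built with.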
From: Neal R. <nri...@gm...> - 2009-04-23 16:42:30
|
Gabriele, Good to hear from you again! HtDig 4.0 is fully converted to CLucene and committed to the CVS tree below. RightNow Tech has been using it in production for 3+ years now and it's very stable and well tested. Unicode support etc. http://htdig.cvs.sourceforge.net/viewvc/htdig/htdig/?pathrev=htdig_4_0 HtDig as a project is pretty much a zombie. I think you would be better off expending energy within the Solr and Nutch communities. There is inherent risk in committing yourself to an open source project that is in a zombie state. Even CLucene is in a bit of an unhealthy state: as far as I know it's not keeping up with mainline Lucene and is relatively quiet on its developer mailing list. Neal Richter On Fri, Apr 10, 2009 at 11:21 AM, Gabriele Bartolini <gab...@de...> wrote: > Dear list members, > > I am writing because I would love to get ht://Dig development start > again, for a total refactoring of the source code and a new version: > ht://Dig 4.0. A long time has passed since last 3.2 beta release, and a > lot of technologies advances have come in the meanwhile. My feeling is > that most of the code is now outdated. > > My company is interested in making available analysts and programmers > for this project. The reason I am writing is to propose a call for > sponsorship for version 4.0. If I get positive feedback, I'd be very > pleased to prepare a project outline. > > If you are interested in sponsoring, please either reply to this > message on the list or, if you prefer to keep it confidential, write to > me directly. Please write back also if you would like to directly > contribute to coding the project. > > In any case, if we get enough interest, it is our intention to give > high priority to the project (a desirable deadline is to have version > alpha out in one year). > > Thank you very much. > > Ciao, > Gabriele > > -- > Gabriele Bartolini: Data miner at Devise.IT > gab...@de... 
| www.devise.it | www.gamera-wet.com > > "Lasciate ogne speranza, voi ch'intrate" > Dante Alighieri, Divina Commedia, Inferno > > _______________________________________________ > ht://Dig general mailing list: <htd...@li...> > ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html > List information (subscribe/unsubscribe, etc.) > https://lists.sourceforge.net/lists/listinfo/htdig-general > |
From: Gabriele B. <gab...@de...> - 2009-04-10 17:51:36
|
Dear list members, I am writing because I would love to get ht://Dig development started again, with a total refactoring of the source code and a new version: ht://Dig 4.0. A long time has passed since the last 3.2 beta release, and many technological advances have been made in the meantime. My feeling is that most of the code is now outdated. My company is interested in making analysts and programmers available for this project. The reason I am writing is to propose a call for sponsorship for version 4.0. If I get positive feedback, I'd be very pleased to prepare a project outline. If you are interested in sponsoring, please either reply to this message on the list or, if you prefer to keep it confidential, write to me directly. Please also write back if you would like to contribute directly to coding the project. In any case, if we get enough interest, it is our intention to give high priority to the project (a desirable deadline is to have an alpha version out in one year). Thank you very much. Ciao, Gabriele -- Gabriele Bartolini: Data miner at Devise.IT gab...@de... | www.devise.it | www.gamera-wet.com "Lasciate ogne speranza, voi ch'intrate" Dante Alighieri, Divina Commedia, Inferno |
From: Andreas J. <and...@ru...> - 2009-03-13 19:27:55
|
On Sun, Mar 08, 2009 at 04:57:39PM +0100, Bernd Heim wrote: > Hi, Hi, > I'm running a Debian Server with following locales > > LANG=de_DE.UTF-8 [...] > and > > ht://Dig 3.2.0b6 > > My htdig.conf begins with > > locale: de_DE > locale: de_DE.UTF-8 > > htdig runs fine and finds all words without german 'Umlaute' (äüö...). > But if I try to find any word with a german 'Umlaut' I get no result! > > Isn't htdig supporting utf8?? No, htdig doesn't support UTF-8. But you may find ftp://ftp.ccsf.org/htdig-patches/3.2.0b6/UTF8.patch.0 useful (though you will have to rebuild htdig). Regards, Andreas -- ! Andreas Jobs Network Operation Center ! ! Ruhr-Universitaet Bochum ! ! The only way to clean a compromised system is to flatten and rebuild. ! |
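A quick way to see why umlauts are a problem for software without UTF-8 support: in UTF-8 the letter "ä" is stored as two bytes (0xC3 0xA4), while in ISO-8859-1 it is the single byte 0xE4. Assuming the indexer treats every byte as one character, a UTF-8 "ä" never looks like a single letter to it. The helper name `byte_count` below is hypothetical.

```shell
#!/bin/sh
# Sketch only: count the bytes arriving on standard input.
byte_count() {
    wc -c | tr -d ' '
}

# "ä" written out byte by byte in each encoding (octal escapes).
utf8_len=$(printf '\303\244' | byte_count)    # UTF-8: 0xC3 0xA4
latin1_len=$(printf '\344' | byte_count)      # ISO-8859-1: 0xE4
echo "UTF-8: $utf8_len bytes, ISO-8859-1: $latin1_len byte"
```

Pages served in a single-byte encoding, together with a matching non-UTF-8 locale, avoid the mismatch without rebuilding; the patch mentioned above is the route for UTF-8 content.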
From: Bernd H. <com...@fr...> - 2009-03-08 16:24:39
|
Hi, I'm running a Debian server with the following locales: LANG=de_DE.UTF-8 LANGUAGE=de_DE:en_US:de_LU:de_CH:de_BE:de_AT LC_CTYPE="de_DE.UTF-8" LC_NUMERIC="de_DE.UTF-8" LC_TIME="de_DE.UTF-8" LC_COLLATE="de_DE.UTF-8" LC_MONETARY="de_DE.UTF-8" LC_MESSAGES="de_DE.UTF-8" LC_PAPER="de_DE.UTF-8" LC_NAME="de_DE.UTF-8" LC_ADDRESS="de_DE.UTF-8" LC_TELEPHONE="de_DE.UTF-8" LC_MEASUREMENT="de_DE.UTF-8" LC_IDENTIFICATION="de_DE.UTF-8" LC_ALL= and ht://Dig 3.2.0b6. My htdig.conf begins with locale: de_DE locale: de_DE.UTF-8 htdig runs fine and finds all words without German 'Umlaute' (äüö...). But if I try to find any word with a German 'Umlaut' I get no result! Doesn't htdig support UTF-8? Best regards Bernd Heim |