From: Neal R. <ne...@ri...> - 2006-05-25 17:54:04
|
Hey all,

Anthony Arnone and I are nearing the completion of the first version of htDig 4.0. At the moment we're fixing up the native Win32 build of it. The next few steps are as follows.

1) Polish new htdig.exe
2) Final code/directory cleanup (perhaps a reason to move htdig to subversion??)
3) Polish build process
4) Formal QA process

When #4 is begun we'll formally announce the new version as beta and update the htdig website. We'll also be tracking & including the current changes to CLucene 0.9 until we reach a point of pain to be determined later. After that we'll lock the version of CLucene used in htDig 4.0. The tarball used to build HtDig 4.0 will include versions of CLucene, HTML Tidy and Berkeley DB (already included in htDig 3.2).

-------

I would like some feedback on htsearch.exe. I actually think we should cease providing a compiled cgi for searching. We should ship a polished PHP script for searching and possibly Perl & Python scripts as well. There are various bindings for CLucene, so all of these can be made into htdig bindings by adding a bit of glue code that looks at the htdig.conf file etc.

Why? Nearly every system this will be installed on will have PHP, Perl or Python. Any of these 3 scripts would allow much better customization of the output by users. Continuing to maintain cgi code in C/C++ is not very attractive when these scripting languages do all the work for you.

What do you think? Any strong objections?

Note that one of the most used contributions to htDig 3.1.6 & 3.2 is a PHP script which calls the htsearch cgi/exe and parses the output to allow customization...

Thanks

--
Neal Richter
Sr. Researcher and Machine Learning Lead
Software Development
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
From: Neal R. <ne...@ri...> - 2006-05-25 17:38:37
|
OK. Let's do it. Sourceforge is also bothering us about disk usage... Arg. On Fri, 12 May 2006, G. T. Stresen-Reuter wrote: > On May 12, 2006, at 1:49 AM, Geoffrey Hutchison wrote: > >> >> On May 11, 2006, at 8:33 PM, Neal Richter wrote: >> >>> FYI: >> >> For what it's worth, I'd suggest any ht://Dig development move wholesale to >> the Subversion service from SourceForge. There's very little pain -- mostly >> convincing your fingers to type svn instead of cvs. (Or setting cvs to be >> an alias for svn in your shell.) >> >> There are quite a few advantages beyond avoiding the current CVS pains at >> SourceForge. For example, finally being able to move files and directories >> without killing the repository history... >> > I agree. I find Subversion to be an improvement over CVS. > > Ted Stresen-Reuter > > -- Neal Richter Sr. Researcher and Machine Learning Lead Software Development RightNow Technologies, Inc. Customer Service for Every Web Site Office: 406-522-1485 |
From: ariel m. <ari...@ya...> - 2006-05-22 15:21:57
|
Hello,

Please, I have the following problem compiling on AIX:

> gcc -v
Reading specs from /usr/local/lib/gcc-lib/powerpc-ibm-aix5.2.0.0/3.3.2/specs
Configured with: ../gcc-3.3.2/configure : (reconfigured) ../gcc-3.3.2/configure --disable-nls : (reconfigured) ../gcc-3.3.2/configure --disable-nls
Thread model: aix
gcc version 3.3.2

(Version htdig 3.1.6 works great.)

> ./configure
> make
(...)
Making all in htlib
/bin/sh ../libtool --mode=compile g++ -DHAVE_CONFIG_H -I. -I. -I../include -DDEFAULT_CONFIG_FILE=\"/opt/www/conf/htdig.conf\" -I../include -I../htlib -I../htnet -I../htcommon -I../htword -I../db -I../db -I/home/webusr/include -g -O2 -Wall -fno-rtti -fno-exceptions -c -o Configuration.lo `test -f 'Configuration.cc' || echo './'`Configuration.cc
g++ -DHAVE_CONFIG_H -I. -I. -I../include -DDEFAULT_CONFIG_FILE=\"/opt/www/conf/htdig.conf\" -I../include -I../htlib -I../htnet -I../htcommon -I../htword -I../db -I../db -I/home/webusr/include -g -O2 -Wall -fno-rtti -fno-exceptions -c Configuration.cc -o Configuration.o
In file included from /usr/local/include/c++/3.3.2/backward/iostream.h:31,
 from htString.h:28,
 from Dictionary.h:22,
 from Configuration.h:107,
 from Configuration.cc:24:
/usr/local/include/c++/3.3.2/backward/backward_warning.h:32:2: warning: #warning This file includes at least one deprecated or antiquated header. Please consider using one of the 32 headers found in section 17.4.1.2 of the C++ standard. Examples include substituting the <X> header for the <X.h> header for C++ includes, or <sstream> instead of the deprecated header <strstream.h>. To disable this warning use -Wno-deprecated.
In file included from /usr/local/include/c++/3.3.2/bits/stl_algobase.h:67,
 from /usr/local/include/c++/3.3.2/memory:54,
 from /usr/local/include/c++/3.3.2/string:48,
 from /usr/local/include/c++/3.3.2/bits/locale_classes.h:47,
 from /usr/local/include/c++/3.3.2/bits/ios_base.h:47,
 from /usr/local/include/c++/3.3.2/ios:49,
 from /usr/local/include/c++/3.3.2/ostream:45,
 from /usr/local/include/c++/3.3.2/iostream:45,
 from /usr/local/include/c++/3.3.2/backward/iostream.h:32,
 from htString.h:28,
 from Dictionary.h:22,
 from Configuration.h:107,
 from Configuration.cc:24:
/usr/local/include/c++/3.3.2/cstdlib:103: error: `malloc' not declared
/usr/local/include/c++/3.3.2/cstdlib:109: error: `realloc' not declared
make: The error code from the last command is 1. Stop.
make: The error code from the last command is 1. Stop.

Any suggestions? Thanks in advance.

hugs |
From: G. T. Stresen-R. <ted...@ma...> - 2006-05-12 06:42:08
|
On May 12, 2006, at 1:49 AM, Geoffrey Hutchison wrote: > > On May 11, 2006, at 8:33 PM, Neal Richter wrote: > >> FYI: > > For what it's worth, I'd suggest any ht://Dig development move > wholesale to the Subversion service from SourceForge. There's very > little pain -- mostly convincing your fingers to type svn instead of > cvs. (Or setting cvs to be an alias for svn in your shell.) > > There are quite a few advantages beyond avoiding the current CVS pains > at SourceForge. For example, finally being able to move files and > directories without killing the repository history... > I agree. I find Subversion to be an improvement over CVS. Ted Stresen-Reuter |
From: Arnone, A. <aa...@ri...> - 2006-05-12 01:02:53
|
I second the motion...

Here's what SourceForge says about converting:
http://sourceforge.net/docman/display_doc.php?docid=31070&group_id=1#import

Looks relatively painless, but it'll take an admin to do it.

Anthony

-----Original Message-----
From: htd...@li... [mailto:htd...@li...] On Behalf Of Geoffrey Hutchison
Sent: Thursday, May 11, 2006 6:50 PM
To: Richter, Neal
Cc: htd...@li...
Subject: Re: [htdig-dev] SUBJECT: SourceForge.net: CVS service offering changes (fwd)

On May 11, 2006, at 8:33 PM, Neal Richter wrote:

> FYI:

For what it's worth, I'd suggest any ht://Dig development move wholesale to the Subversion service from SourceForge. There's very little pain -- mostly convincing your fingers to type svn instead of cvs. (Or setting cvs to be an alias for svn in your shell.)

There are quite a few advantages beyond avoiding the current CVS pains at SourceForge. For example, finally being able to move files and directories without killing the repository history...

Cheers,
-Geoff

_______________________________________________
ht://Dig Developer mailing list: htd...@li...
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev |
From: Geoffrey H. <ge...@ge...> - 2006-05-12 00:50:22
|
On May 11, 2006, at 8:33 PM, Neal Richter wrote: > FYI: For what it's worth, I'd suggest any ht://Dig development move wholesale to the Subversion service from SourceForge. There's very little pain -- mostly convincing your fingers to type svn instead of cvs. (Or setting cvs to be an alias for svn in your shell.) There are quite a few advantages beyond avoiding the current CVS pains at SourceForge. For example, finally being able to move files and directories without killing the repository history... Cheers, -Geoff |
From: Neal R. <ne...@ri...> - 2006-05-12 00:34:53
|
FYI:

---------- Forwarded message ----------
Date: Thu, 11 May 2006 17:22:33 -0700 (PDT)
From: SourceForge.net Team <no...@so...>
To: ne...@ri...
Subject: SUBJECT: SourceForge.net: CVS service offering changes

Greetings,

You are receiving this mail because you are a project admin for a SourceForge.net-hosted project. One of our primary services, CVS, suffered a series of interrelated, critical hardware failures in recent weeks. We understand how frustrating this CVS outage must be to you and your users; however, our top priority remains preservation of the integrity of your data.

The series of CVS hardware failures prompted us to expedite the deployment of planned improvements to our CVS infrastructure, drawing upon much of the knowledge that we gained from our Subversion deployment. Our improved CVS service architecture, which we plan to deploy tomorrow afternoon (2006-05-12), will offer greater performance and stability and will eliminate several single points of failure.

The Site Status page (https://www.sf.net/docs/A04) will be updated as soon as the new infrastructure is rolled out. In the interim, please read the important information provided below to learn about how these changes will affect your project.

Summary of changes, effective 2006-05-12:

1. Hostname for CVS service

Old: cvs.sourceforge.net
New: PROJECT_UNIX_NAME.cvs.sourceforge.net

This change will require new working copies to be checked out of all repositories (so control files in the working copy will point to the right place). We will be updating the instructions we supply, but instructions that your team has written within documentation, etc. will need to be updated.

cvs -d:pserver:ano...@cv...:/cvsroot/gaim co gaim

would be changed to

cvs -d:pserver:ano...@ga...:/cvsroot/gaim co gaim

2. ViewCVS

We are moving from ViewCVS to its successor, ViewVC. ViewVC is currently in use for our Subversion service.

3. Sync delay

Old: CVS pserver, tarballs and ViewCVS provided against a separate server which is a minimum of three hours behind developer CVS.
New: ViewVC will be provided against developer CVS (it will be current). CVS pserver will be provided against a secondary server (not developer server) with a maximum expected delay of two hours.

Follow-up work is planned (this infrastructure takes us 80% of the way) to essentially eliminate the sync delay.

4. Read-only rsync service

As a new service offering, we are now providing read-only rsync access against developer CVS. This allows projects to efficiently make on-demand backups of their entire CVS repository. All projects should be making regular backups of their CVS repository contents using this service.

5. Nightly tarball service

Nightly tarball service is being dropped in lieu of read-only rsync service. Projects which currently depend on nightly tarballs for repository backups will need to begin using rsync to make a backup copy of their repository contents.

We see this as a major functional improvement. For a number of reasons, tarballs have fallen out of sync with the data in the repository at times in the past few years. Tarballs required a substantial amount of additional disk, and I/O to generate. The move to read-only rsync allows backups to be produced on-demand, with an update frequency chosen by the project.

6. Points of failure

In the past, developer CVS service for all projects was provided from a single host. CVS pserver service was provided from individual backend heads based on a split of the data.

Under our new design, developer CVS and most of our CVS-related services are provided from one of ten CVS hosts (count subject to increase with growth). Each host is independent, and makes a backup copy of the repository data of another host (which is used to provide the pserver CVS service). Failure of a single host will impact only the availability of data on that host. Since the data is split among a larger number of hosts, the size of data impacted by an individual host outage is substantially smaller, and the time required for us to restore service will be substantially shorter.

This rapid architecture change has been made possible specifically using the research we performed for our recent launch of Subversion service. We've applied our best practices, produced a substantial amount of internal documentation, and kept an eye toward maintainability. This effort has allowed us to deploy this new architecture quickly once hardware was received, and will permit us to quickly scale this service horizontally as growth and demand requires.

Many other minor improvements have also been made to improve the service offering and make it less trouble-prone. The most important of which are listed above. For a full description of the new service offering, and for information on how to use the services described above, please refer to the site documentation for the CVS service after the service has been launched: https://www.sf.net/docs/E04

Thank you,
The SourceForge.net Team |
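For anyone updating scripts or documentation after the hostname change described in item 1 of the announcement, here is a minimal Python sketch of the rewrite from the shared host to the per-project host. The function name `migrate_cvsroot` and the `USER` placeholder are illustrative assumptions (the archive obscures the real account name), and the announcement itself says working copies need a fresh checkout, so this only helps with stored CVSROOT strings.

```python
import re

# Old shared host named in the announcement.
OLD_HOST = "cvs.sourceforge.net"


def migrate_cvsroot(cvsroot: str, project: str) -> str:
    """Rewrite a CVSROOT string from the old shared host to the
    per-project host (PROJECT_UNIX_NAME.cvs.sourceforge.net).

    Hypothetical helper, not a SourceForge tool.
    """
    new_host = f"{project}.{OLD_HOST}"
    # Only match the old host when it is the complete hostname:
    # preceded by '@' or ':' and followed by ':' or '/'.
    pattern = r"(?<=[@:])" + re.escape(OLD_HOST) + r"(?=[:/])"
    return re.sub(pattern, new_host, cvsroot)


if __name__ == "__main__":
    old = ":pserver:USER@cvs.sourceforge.net:/cvsroot/gaim"
    print(migrate_cvsroot(old, "gaim"))
```

Because the pattern requires `@` or `:` directly before the old hostname, a root that already uses the per-project host is left unchanged, so the rewrite can be applied more than once without harm.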
From: Vision W. H. <vis...@ya...> - 2006-04-16 23:29:09
|
Dear Htdig Team,

We are a web hosting company and we recently placed a mirror on our server in Serbia&Montenegro. These are our mirror details:

Organisation: Vision Web Hosting
URL: http://www.visionwebhosting.net
Country: Serbia&Montenegro
Main Site: http://htdig.visionwebhosting.net
Developer Site: http://htdig.visionwebhosting.net/dev/
Files available via HTTP

Please let us know if you need anything else.

Slobodan Cvetic
Vision Web Hosting Inc. |
From: SOHOMINT - K. K. <kr...@so...> - 2006-03-29 10:25:28
|
Hello ht://Dig Team,

we're a German mirror project (http://www.mirroarrr.de) and we have mirrored your project. Here is some information:

organisation: C-KN (http://www.c-kn.de/)
country: Germany
main site: http://www.htdig.mirroarrr.de/
developer site: http://www.htdig.mirroarrr.de/dev/index.html
files(http): http://www.htdig.mirroarrr.de/files/
patch archive: -
update: once a day
connection: 100 MBit

Our contact mail address is ko...@mi...

We've tested the mirror for a few weeks and it's working fine. Please add us to your mirror list. Thank you.

Kind regards
Karsten Krienke

--
Dipl.-Ing. Karsten Krienke
Geschäftsführer
fon: 040/ 41 00 428 - 12
mail: kr...@so...

SOHOMINT - Kirsch Krienke Milz Nolte GbR.
Mörkenstraße 7
22767 Hamburg
fon: 040/ 41 00 428 - 40
fax: 040/ 41 00 428 - 49
mail: ko...@so... |
From: Wim K. <wi...@ib...> - 2006-03-14 09:35:52
|
Good morning,

I have quite a weird problem with indexing about 8000 PDFs. The files are indexed through a local_urls= setting which works perfectly (all files are found as the local equivalent of the URL version), but htdig always treats every file as changed.

For indexing the PDFs I use an executable PHP script which in turn uses pdfinfo / pdftotext (both version 3.xx) and queries a database to retrieve some additional meta info (like the correct title etc). All gathered info is rendered into HTML which is indexed by htdig. The script also adds 3 meta items: "Last-Modified", "Date" and "DC.Date" to force the modification date. In conjunction with use_doc_date it should be clear to htdig whether the document was changed or not.

I can't figure out why the PDFs are considered changed every day (they're not), but I suspect that htdig takes the file time of the tmpfile as last-modified.

Any clues?

Regards,
Wim

--
Wim Kosten <wi...@ib...>
ibuildings.nl BV - information technology
http://www.ibuildings.nl - 0118 42 95 50 |
From: Ahmon D. <da...@fr...> - 2006-03-02 00:10:21
|
>> > #1: bug #1123810 seems like an important bug with an easy fix. What
>> > are the chances of getting it in? I had a look at rundig in the CVS
>> > repository and it looks like it's still not done.
>>
>> Please submit the fix.

(diff against the trunk)

Index: installdir/rundig
===================================================================
RCS file: /cvsroot/htdig/htdig/installdir/rundig,v
retrieving revision 1.9
diff -u -r1.9 rundig
--- installdir/rundig 29 Dec 2003 08:49:05 -0000 1.9
+++ installdir/rundig 1 Mar 2006 23:44:16 -0000
@@ -30,7 +30,6 @@
 done

 # If -a specified, note the database directory to move the temp files correctly
-# TODO: Should also check for files relative to COMMONDIR.
 if [ -f "$conffile" ]
 then
 new_db_dir=`awk '/^[^#a-zA-Z]*database_dir/ { print $NF }' < $conffile`
@@ -38,6 +37,11 @@
 then
 DBDIR=$new_db_dir
 fi
+ new_dir=`awk '/^[^#a-zA-Z]*common_dir/ { print $NF }' < $conffile`
+ if [ "$new_dir" != "" ]
+ then
+ COMMONDIR=$new_dir
+ fi
 else
 echo "Config file $conffile cannot be found"
 exit 1

>>
>> > #2: How come rundig calls htdig with the -i flag by default? Doing
>> > so makes it impossible (as far as I can tell) to use rundig to do
>> > incremental indexing.
>> >
>>
>> Correct! I don't use rundig myself..

So... what do we do about it? I think rundig would be fine with the COMMONDIR fix above and the removal of the default -i flag.

>> FYI: A couple of us are in the process of reworking HtDig's code for
>> HtDig 4.0. You can see the progress reports here
>>
>> http://htdig.blogspot.org & http://opensource.rightnow.com/htdig.php

I checked it out. It looks promising!

>> I think the current version in the trunk of CVS is more or less
>> DEAD.

Alright. I'm using the htdig rpm from Fedora Core 4. I can just submit patches to whoever is maintaining the package at Redhat. However, I do want to make sure that the patches are agreeable to you first. |
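The lookup that the rundig patch performs with awk (take the last whitespace-separated field of an uncommented `database_dir` or `common_dir` line, and keep the default when nothing is found) can be sketched in Python as follows. This is an illustration only, not part of rundig: the helper names are hypothetical, and unlike the awk command, which prints every matching line, this sketch keeps only the last match.

```python
import re


def read_conf_attr(conf_text: str, name: str) -> str:
    """Return the last field of the last line that sets `name`, or ''.

    Mirrors the awk pattern /^[^#a-zA-Z]*NAME/ { print $NF } from the
    patch: only non-letter, non-'#' characters may precede the
    attribute name, so commented-out lines are skipped.
    """
    pattern = re.compile(r"^[^#a-zA-Z]*" + re.escape(name))
    value = ""
    for line in conf_text.splitlines():
        if pattern.search(line):
            fields = line.split()
            if fields:
                value = fields[-1]
    return value


def resolve_dirs(conf_text: str, dbdir: str, commondir: str) -> tuple:
    """Override the default DBDIR and COMMONDIR only when the
    configuration sets them, as the patched script does."""
    new_db = read_conf_attr(conf_text, "database_dir")
    new_common = read_conf_attr(conf_text, "common_dir")
    return (new_db or dbdir, new_common or commondir)


if __name__ == "__main__":
    sample = "database_dir: /var/lib/htdig\ncommon_dir: /usr/share/htdig\n"
    print(resolve_dirs(sample, "/default/db", "/default/common"))
```

Like the awk version, this takes the last field literally, so a value containing spaces or a reference to another attribute would not be handled.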
From: Neal R. <ne...@ri...> - 2006-03-01 23:24:36
|
On Mon, 27 Feb 2006, Ahmon Dancy wrote:

> Hello htdig developers.
>
> Two questions/comments:
>
> #1: bug #1123810 seems like an important bug with an easy fix. What
> are the chances of getting it in? I had a look at rundig in the CVS
> repository and it looks like it's still not done.

Please submit the fix.

> #2: How come rundig calls htdig with the -i flag by default? Doing
> so makes it impossible (as far as I can tell) to use rundig to do
> incremental indexing.
>

Correct! I don't use rundig myself..

FYI: A couple of us are in the process of reworking HtDig's code for HtDig 4.0. You can see the progress reports here

http://htdig.blogspot.org & http://opensource.rightnow.com/htdig.php

I think the current version in the trunk of CVS is more or less DEAD. Feel free to participate in HtDig 4.0, see the CVS branch for 4.0.

At this point we could use help reworking the htsearch CGI. Basically we need to first gut it so that it does little except respond as a CGI should.. then we'll add the calls to the new search engine and display the results.

A cool set of other-language bindings for search would be awesome as well.. along the lines of what has been done for CLucene.

Thanks

--
Neal Richter
Sr. Researcher and Machine Learning Lead
Software Development
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
From: Ahmon D. <da...@fr...> - 2006-02-27 23:56:44
|
Hello htdig developers. Two questions/comments: #1: bug #1123810 seems like an important bug with an easy fix. What are the chances of getting it in? I had a look at rundig in the CVS repository and it looks like it's still not done. #2: How come rundig calls htdig with the -i flag by default? Doing so makes it impossible (as far as I can tell) to use rundig to do incremental indexing. |
From: engine o. <eng...@gm...> - 2006-02-19 13:17:26
|
Hi,

We at QualiSpace are a web hosting provider in India. We found the Mirrors option <http://www.htdig.org/mirrors.html> on your website, and we are interested in providing a mirror for your valuable project.

I contacted you before (last year, in 2005) about mirroring. At that time we tried to set up a mirror of your site, but we were unable to download some files; errors appeared during the download.

It would help if you could guide us or give some additional information.

Best Regards
Webmaster
http://www.qualispace.com |
From: Paracoda <ad...@pa...> - 2006-02-19 05:40:09
|
Greetings, We have installed a local mirror for htdig (main site and devel site) in Canada with the following specifications. - URL: htdig.paracoda.com - Speed: 100 mbps - Update: Daily - Location: Montreal, Quebec, Canada - Sponsor: www.paracoda.com - Contact: preferrably via www.paracoda.com but if necessary ad...@pa... Please list it as an official mirror. Thank you, Hossam Hossny Paracoda.com |
From: Ralf U. <ra...@re...> - 2006-02-09 14:42:19
|
I have set up two mirrors for the htdig project:

Organisation: oss-mirror http://www.oss-mirror.org
Name........: htdig.oss-mirror.org
IP..........: 82.195.155.78
Download....: http://htdig.oss-mirror.org/files
Patches.....: http://htdig.oss-mirror.org/ftp.ccsf.org/htdig-patches
Location....: Dublin - Ireland
Connection..: 100mbit
Update......: daily
Maintainer..: ad...@re...

Organisation: Linux Mirror http://www.linux-mirror.org
Name........: htdig.linux-mirror.org
IP..........: 80.237.211.23
Download....: http://htdig.linux-mirror.org/files
Patches.....: http://htdig.linux-mirror.org/ftp.ccsf.org/htdig-patches
Location....: Cologne/Koeln - Germany
Connection..: 100mbit
Update......: daily
Maintainer..: ad...@re...

Please add the sites to your mirror list. Both mirrors are also official mirrors for: apache, openssh, webmin, opensource, python ......

Thx
Ralf Uhlemann |
From: Ralf U. <ra...@re...> - 2006-01-30 09:07:05
|
Hi, I wanted to setup two new htdig mirrors but the first CVS login hangs: ----------------snip------------------ server /home/htdocs/htdig $ cvs -d:pserver:ano...@cv...:/cvsroot/htdig login Logging in to :pserver:ano...@cv...:2401/cvsroot/htdig CVS password: ----------------snap------------------- I did it in the way described on the web site: ----------------snip------------------- When asked for a password, leave it blank (i.e. press the enter key). You only need to do this once since cvs(1) will create a CVS password file .cvspass in your home directory that will be used in future invocations. ----------------snap------------------- Any idea ? Thx Ralf |
From: G. T. Stresen-R. <ted...@ma...> - 2006-01-03 13:31:49
|
Remember that htdig tries to find "best matches" so it is sometimes difficult to tie keywords to specific pages, but if you have the ability to modify the contents of the pages for which you are seeking a match, then you can tweak the system quite a bit. In other words, I don't think it's possible to do exactly what you are looking for, but here is what I would do:

Employ the htdig keyword meta tags (<meta name="htdig-keywords" content="comma,separated,keyword,list" />) in the head of your documents. Then, increase the value of the keywords_factor so that matches including the radio button value will be ranked higher in the search results.

Where I work we do something similar. We have a firm directory and want people to be able to find matches on employees' names (first or last, or parts thereof). Thus, in the TITLE element we put their complete name first. Then, in the htdig-keywords meta tag we put the same name, but include any "aliases". Then, near the top of the document we enclose the name in H1 tags. This then gives us three values we can manipulate to improve search results:

http://www.htdig.org/attrs.html#keywords_factor
http://www.htdig.org/attrs.html#heading_factor
http://www.htdig.org/attrs.html#title_factor

We're satisfied by the results of this approach because it also finds matches for this person in other parts of the intranet (like documents in the "best practices" section that this person has authored).

Hope this helps.

Ted Stresen-Reuter
(Hi Neal ;-)

On Dec 30, 2005, at 12:00 PM, Eduardo B Domanski wrote:

> Hi!
>
> My name is Eduardo, and I work in a company that uses htdig, but now,
> I'd like to add a 'functionality' to this. I'd like to add a
> radiogroup with 4 buttons, like:
>
> o Title
> o Color
> o Model
> o All
>
> Is it possible to search a specific word, filtering with these radios?
>
> My idea is to add more metas, as in the following example:
>
> <meta name="title" content="Title here">
> <meta name="Color" content="Color here">
> <meta name="Model" content="Model here">
>
> Is there any other kind of possibility to search this way?
>
> Thanks for help,
>
> Eduardo Bortoleto Domanski |
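The page layout described in the reply above (full name in TITLE, name plus aliases in the htdig-keywords meta tag, name again in an H1) can be sketched as a small Python generator. The function name `directory_page` and the sample names are hypothetical; only the `htdig-keywords` meta name and the three `*_factor` attributes come from the message, and the factors themselves are set in the htdig configuration rather than in the page.

```python
from html import escape


def directory_page(name: str, aliases: list, body_html: str = "") -> str:
    """Build a page whose TITLE, htdig-keywords meta tag and H1 all
    carry the person's name, so that title_factor, keywords_factor and
    heading_factor each have something to weight.

    Hypothetical helper for illustration; body_html is inserted as-is.
    """
    keywords = ",".join([name, *aliases])
    return (
        "<html>\n<head>\n"
        f"<title>{escape(name)}</title>\n"
        f'<meta name="htdig-keywords" content="{escape(keywords, quote=True)}" />\n'
        "</head>\n<body>\n"
        f"<h1>{escape(name)}</h1>\n"
        f"{body_html}\n"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(directory_page("Jane Doe", ["J. Doe", "Janie"]))
```

Generating all three places from one source value keeps them consistent, which matters because each one feeds a different ranking factor.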
From: Eduardo B D. <edu...@pu...> - 2005-12-30 12:30:13
|
Hi!

My name is Eduardo, and I work in a company that uses htdig, but now, I'd like to add a 'functionality' to this. I'd like to add a radiogroup with 4 buttons, like:

o Title
o Color
o Model
o All

Is it possible to search a specific word, filtering with these radios?

My idea is to add more metas, as in the following example:

<meta name="title" content="Title here">
<meta name="Color" content="Color here">
<meta name="Model" content="Model here">

Is there any other kind of possibility to search this way?

Thanks for help,

Eduardo Bortoleto Domanski |
From: G. T. Stresen-R. <ted...@ma...> - 2005-12-19 11:47:08
|
Hi, I see that real progress has been made on 4.0. I'd like to try building it on Mac OS X. Is the branch available via Sourceforge CVS and if so, does it have a special branch name? Is it visible via the CVS browser? Ted |
From: Neal R. <ne...@ri...> - 2005-12-09 21:11:07
|
On Fri, 9 Dec 2005, Gustave Stresen-Reuter wrote:

> Neal,
>
> I've been reading, with interest, the posts on the blog. I have a few
> of questions so far.
>
> - Is htdig a competitor to Nutch? If not, could you take a few minutes
> to clarify the differences between the two?

No! They will be complementary. I believe that HtDig is much easier to manage and offers clearer, more flexible configuration than Nutch.

Nutch is very powerful in terms of its scalability in both documents and simultaneous searches. Nutch uses an apache/tomcat server to service requests. It can (and has) scaled to 200 million documents. It's written in 100% Java as a full application built on Java Lucene. It's great.

However I do believe that getting a tomcat server up and running, as well as having a JVM and other associated infrastructure, is a bit beyond the capabilities of a lot of our users. It's not quite as simple as compiling and installing the binaries or installing a package. I may be underestimating our users, but I base this assessment on reading the htdig-general list.

HtDig 4.0 will be easy to configure and install, including via RPM or another package manager. It won't require a user to keep a server daemon running. And it will continue to provide a massive variety of flexible configuration options. The addition of the CLucene library underneath will enable HtDig to achieve good scalability in documents.

The way I see it, HtDig 4.0 is for the classic use of a site-specific search engine for modestly sized websites that don't have tons of search hits per second. Nutch is for people who have large document sets and/or lots of search hits per unit time and need a multi-threaded server daemon to handle the load.

FYI: Doug Cutting, the leader of Nutch & Lucene, was one of the original authors of Excite and has been doing IR for 15+ years. Doug's aims are much higher in terms of what Nutch is.

> - What, if any, modifications to the ranking engine will be made in 4.0
> (saw the note about back-links and anchor texts - what about incoming
> links from other domains)?
>
> - It seems the goal is to create a library that can be included in
> other programs. Will the library include all the code for spidering,
> creating the indexes, and searching or just the database creation
> stuff, or something else...?

HtDig is an application for users. We are architecting 4.0 in such a way that it can be used as a library in other applications. For a while KDE used a wrapper for the htdig binaries to enable document searching. That was a big ugly hack. I'd like to be able to have something that anyone, including other open source projects, can use to spider/index and search documents.

> - Are there any security considerations that should be addressed at
> this early stage (sanitizing of URL parameters, for example)

HtDig currently has a flexible AWK rule method for doing any URL manipulation you can think up. I hope to provide a quick wrapper config for that which will output an AWK rule to specifically strip a URL parameter (it's already done in some PHP code I wrote).

--
Neal Richter
Sr. Researcher and Machine Learning Lead
Software Development
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
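To illustrate the kind of URL-parameter stripping mentioned at the end of the message above, here is a minimal Python sketch using the standard library. It is not the AWK rule mechanism or the PHP code referred to in the message; the function name `strip_param` and the sample parameter are assumptions chosen for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_param(url: str, param: str) -> str:
    """Return `url` with every occurrence of query parameter `param`
    removed and all other parameters kept in their original order.

    Hypothetical helper; note that urlencode re-encodes the remaining
    values, so their percent-encoding may be normalised.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_param("http://example.com/page.php?id=7&PHPSESSID=abc123", "PHPSESSID"))
```

Stripping session-style parameters before indexing keeps the crawler from treating the same page as many distinct URLs, which is the usual motivation for such a rule.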
From: Arnone, A. <aa...@ri...> - 2005-12-09 18:23:05
|
I'll try to answer (or dodge) some these questions. - Is htdig a competitor to Nutch? If not, could you take a few minutes=20 to clarify the differences between the two? This is a good one for Neal to answer. I can tell you that I'm expecting the new ht://Dig to epitomize a fast, lightweight and scalable domain-specific search engine. Nutch, Omega and similar projects all have their strengths (again, maybe Neal can talk about that), but one of the big strengths of ht://Dig is the vast array of options and settings that are available to the user. While some of these are going away because they are no longer applicable, we are committed to keeping as many of the nice bells and whistles as we can. - What, if any, modifications to the ranking engine will be made in 4.0=20 (saw the note about back-links and anchor texts - what about incoming=20 links from other domains)? The ranking engine will be moved over to CLucene. Right now, the CLucene database contains anything we want (the API is highly extensible), and we're working on making things like backlink counts and link descriptions work efficiently. As for external domain links, that is really outside the scope of ht://Dig, since it is primarily a single-site (or small group of sites) crawler. - It seems the goal is to create a library that can be included in=20 other programs. Will the library include all the code for spidering,=20 creating the indexes, and searching or just the database creation=20 stuff, or something else...? Creating a library is exactly what we're shooting for. It will contain the ability to spider and push documents into a CLucene database. For searching, we essentially want to be able to stick any appropriate wrapper on top of ht://Dig and be able to do searches. I've written about this on the blog, but what I'd like to do is separate the htsearch options from the htdisplay options. 
Search options can be sent down to the library, and search results can be returned in some kind of XML format to the wrapper. The wrapper can do whatever it wants with the results as far as cgi and pretty print. Since we're still in beta (or alpha since I keep writing stupid bugs), we're using Luke to verify index creation and validity. Luke (http://www.getopt.org/luke/) is a toolbox designed to interact with Java Lucene indexes, but since CLucene follows the standard, we use it for our own purposes. - Are there any security considerations that should be addressed at this early stage (sanitizing of URL parameters, for example) Uhh... Neal? Anyway, I'm planning on making a tag in CVS that everyone can download and try soon. There is a htdig_4_0 branch right now, but it is lacking certain parts - namely the CLucene back end. We're working on adding CLucene to the make scripts; right now we're doing builds the hard way. I hope this answered some of your questions, and I hope that Neal can step in and answer a few more. I've been bad about updating the blog on a regular basis, but hopefully I can get myself in gear and let everyone know the day-to-day progress. Feel free to leave comments on my posts, too. Anthony -----Original Message----- From: htd...@li... [mailto:htd...@li...] On Behalf Of Gustave Stresen-Reuter Sent: Friday, December 09, 2005 5:08 AM To: Richter, Neal Cc: htd...@li... Subject: Re: [htdig-dev] htdig 4.0 updates Neal, I've been reading, with interest, the posts on the blog. I have a few questions so far. - Is htdig a competitor to Nutch? If not, could you take a few minutes to clarify the differences between the two? - What, if any, modifications to the ranking engine will be made in 4.0 (saw the note about back-links and anchor texts - what about incoming links from other domains)? - It seems the goal is to create a library that can be included in other programs. 
Will the library include all the code for spidering, creating the indexes, and searching or just the database creation stuff, or something else...? - Are there any security considerations that should be addressed at this early stage (sanitizing of URL parameters, for example) I'm not a C developer, but I'm more than happy to try building the project on Linux and Mac OS X (10.3). Is there a 4.0 branch in CVS or will we have to wait for you to tag it? Thanks for the work. Gustave (Ted) Stresen-Reuter On Dec 8, 2005, at 6:05 PM, Neal Richter wrote: > Hey all, > > We've been making good progress on HtDig 4.0 > > You can see the progress updates on this blog. > > http://htdig.blogspot.com/ > > Thanks. > > -- > Neal Richter > Sr. Researcher and Machine Learning Lead > Software Development > RightNow Technologies, Inc. > Customer Service for Every Web Site > Office: 406-522-1485 > > > > ------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. Do you grep through log > files > for problems? Stop! Download the new AJAX search engine that makes > searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! > http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click > _______________________________________________ > ht://Dig Developer mailing list: > htd...@li... > List information (subscribe/unsubscribe, etc.) > https://lists.sourceforge.net/lists/listinfo/htdig-dev |
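The split described above (search options go down to the library, results come back as XML, and a wrapper handles presentation) can be sketched roughly as follows. This is only an illustration in Python: the message says "some kind of XML format" without defining one, so the `results`, `hit`, `url`, `title` and `score` names, the sample data, and both function names are assumptions rather than anything ht://Dig defines.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical result document; the real format was undecided at the time.
SAMPLE = """<results query="mirror">
  <hit score="0.92"><url>http://www.example.org/a.html</url><title>Mirrors &amp; downloads</title></hit>
  <hit score="0.40"><url>http://www.example.org/b.html</url><title>FAQ</title></hit>
</results>"""


def parse_results(xml_text):
    """Turn the (assumed) XML result document into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    return [
        {
            "url": hit.findtext("url", default=""),
            "title": hit.findtext("title", default=""),
            "score": float(hit.get("score", "0")),
        }
        for hit in root.iter("hit")
    ]


def render_html(hits):
    """Presentation lives entirely in the wrapper; this one emits an HTML list."""
    items = [
        '<li><a href="%s">%s</a> (%.2f)</li>'
        % (escape(h["url"], quote=True), escape(h["title"]), h["score"])
        for h in hits
    ]
    return "<ol>\n" + "\n".join(items) + "\n</ol>"


if __name__ == "__main__":
    print(render_html(parse_results(SAMPLE)))
```

Because the wrapper only sees structured results, a PHP, Perl or CGI front end could replace `render_html` without touching the search side.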
From: Gustave Stresen-R. <ted...@ma...> - 2005-12-09 12:08:40
|
Neal, I've been reading, with interest, the posts on the blog. I have a few questions so far. - Is htdig a competitor to Nutch? If not, could you take a few minutes to clarify the differences between the two? - What, if any, modifications to the ranking engine will be made in 4.0 (saw the note about back-links and anchor texts - what about incoming links from other domains)? - It seems the goal is to create a library that can be included in other programs. Will the library include all the code for spidering, creating the indexes, and searching or just the database creation stuff, or something else...? - Are there any security considerations that should be addressed at this early stage (sanitizing of URL parameters, for example) I'm not a C developer, but I'm more than happy to try building the project on Linux and Mac OS X (10.3). Is there a 4.0 branch in CVS or will we have to wait for you to tag it? Thanks for the work. Gustave (Ted) Stresen-Reuter On Dec 8, 2005, at 6:05 PM, Neal Richter wrote: > Hey all, > > We've been making good progress on HtDig 4.0 > > You can see the progress updates on this blog. > > http://htdig.blogspot.com/ > > Thanks. > > -- > Neal Richter > Sr. Researcher and Machine Learning Lead > Software Development > RightNow Technologies, Inc. > Customer Service for Every Web Site > Office: 406-522-1485 > > > > ------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. Do you grep through log > files > for problems? Stop! Download the new AJAX search engine that makes > searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! > http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click > _______________________________________________ > ht://Dig Developer mailing list: > htd...@li... > List information (subscribe/unsubscribe, etc.) > https://lists.sourceforge.net/lists/listinfo/htdig-dev |
From: Neal R. <ne...@ri...> - 2005-12-08 18:06:13
|
Hey all, We've been making good progress on HtDig 4.0. You can see the progress updates on this blog: http://htdig.blogspot.com/ Thanks. -- Neal Richter Sr. Researcher and Machine Learning Lead Software Development RightNow Technologies, Inc. Customer Service for Every Web Site Office: 406-522-1485 |
From: Pointer - I. a. S. S. <in...@po...> - 2005-11-04 18:29:56
|
Hello, We would like to become an official Greek mirror site for www.htdig.org. Our servers are in Greece, at the datacenter of Forthnet Hellas. We are already an official mirror site for php.net and iptables.org. Please inform us of the procedure for becoming a mirror site for you... Thanks in advance, George Psaltakis www.pointer.gr in...@po... |