From: Arnone, A. <aa...@ri...> - 2005-10-24 16:48:05
|
Very sorry about that. That was the old CLucene API. I've now rewritten the API to accept an STL hash. The offending line

    "The document object will contain a structure that the CLucene API expects"

has been changed to

    "The document object will contain an STL hash that the CLucene API expects"

Anyway, the way this works is that the CLucene API will now take a generic hash of all the fields in a document (body, title, meta_desc, etc...) and fill its internal document with those fields. The hash looks something like this:

    std::map<std::basic_string<char>, std::pair<std::basic_string<char>, std::basic_string<char> > >

Those chars will eventually be replaced with wchar_t's. What this means to the API is this:

    <field_name, <field_contents, field_type> >

field_name is the name of the field (body, title, etc.), field_contents is the actual words/data in the field, and field_type is the CLucene type, which can be one of four different types. Take a look at DocumentRef::initialize() in htcommon/DocumentRef.cc to see what kind of fields are already defined, along with their types.

Hope this clears it up somewhat,

Anthony |
From: <mc...@ci...> - 2005-10-23 14:32:44
|
In-Reply-To: <200...@sc...>

"Hairong Li" <ha...@ms...> asked on Wed, 12 Oct 2005 15:58:59 -0400:

> I'm trying to use the "wrapper.html" file. I created the file and also
> modified the .conf file to enable the wrapper.html. But htdig still uses
> the old header.html and footer.html. Any advice? Many thanks.

I don't see a response to this. I am sorry to say that all I can offer is that I gave up, having decided that using header.html and footer.html was less trouble for me than finding out how not to use them. |
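For anyone who lands on this thread later: in the 3.2 series the wrapper template is normally enabled via the search_results_wrapper attribute, which (when the file is readable) is used in place of the header/footer templates. A sketch of the relevant htdig.conf line follows; the path is an assumption, and you should check the attribute documentation (attrs.html) for the exact semantics on your version:

```
# Path is an example -- point this at wherever your wrapper.html lives.
# If htsearch cannot read this file it falls back to
# search_results_header / search_results_footer, which matches the
# symptom described above.
search_results_wrapper: ${common_dir}/wrapper.html
```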
From: Gustave Stresen-R. <te...@cl...> - 2005-10-22 17:46:38
|
Anthony and Neal: I'm reading the design document (http://opensource.rightnow.com/htdig4_refactor_design.pdf) and I see several references to custom "structures". My understanding of a structure is that it is a labeled array (whose elements may also be labeled arrays). Is that what you are referring to? If not, if you could point me to what you are talking about, that would be great. Thanks in advance. Ted Stresen-Reuter |
From: Jeff B. <bre...@gm...> - 2005-10-21 04:01:00
|
> > Neal, are you tracking the Java Lucene dev lists? There's
> > some recent discussion with respect to index interoperability
> > that may be relevant.
>
> Not yet... just the Clucene list. I'll have a look.

Here's some starting points maybe worth half an eyeball:

The UTF-8 interoperability thread
http://www.mail-archive.com/jav...@lu.../msg01970.html

Interoperability with Perl Lucene
http://www.mail-archive.com/jav...@lu.../msg02187.html

Features in the approaching Java Lucene 1.9
http://www.mail-archive.com/jav...@lu.../msg02284.html

Debian & Kaffe, Redhat & GCJ
http://www.mail-archive.com/jav...@lu.../msg02092.html

> We have been able to verify that the Java Lucene tool 'luke' is able to
> read and query the indexes produced by CLucene.

Very cool.

> The names of the searchable-fields we are using at this point is likely
> different than nutch. Might be worth a look to see how different.

As of Nutch 0.7.1, the crawler + indexer is getting close. If it had an easy-to-configure equivalent to HtDig's "local_urls" and "<!--htdig_noindex-->" features I think it would probably be good enough. Running Java for these operations does not feel like such a big deal, and maybe there would be GCJ magic to ease the pain.

The search portion is a different story, and requiring Tomcat is kind of a pain in the butt. If some miracle occurred and htdig 4.0 and nutch were super-compatible, I could imagine wanting to use htsearch against a nutch-built index. Dropping a search program into cgi-bin is really convenient.

> If you look at the 4.0 cvs branch, we've devised a pretty cool method of
> using an STL map container to hold the fieldname & fieldtext pairs with
> index/noindex and store/nostore flags. These are filled per document
> during htdig's parsing.
>
> It makes the htdig<->clucene interface very elegant.

I'm a straight C guy, so STL is a little beyond me. But I like the sound of elegant and am tracking the blog. |
From: Neal R. <ne...@ri...> - 2005-10-20 21:21:51
|
> > After having looked at many commercial implementations of search engines
> > over the past few years and following Nutch a bit.. I am still convinced
> > that HtDig has plenty of legs.
>
> I know what you mean. Every time I look at Nutch I decide
> to stick with htdig 3.1.6 a little longer. However, UTF-8 support
> is getting super critical and some time in 2006 I'm going to have
> to bite the bullet and do something.

Exactly the impetus for the 4.0 development. I need Unicode in 2006 as well.

> Neal, are you tracking the Java Lucene dev lists? There's
> some recent discussion with respect to index interoperability
> that may be relevant.

Not yet... just the Clucene list. I'll have a look.

We have been able to verify that the Java Lucene tool 'luke' is able to read and query the indexes produced by CLucene. Very cool.

The names of the searchable-fields we are using at this point are likely different than nutch's. Might be worth a look to see how different.

If you look at the 4.0 cvs branch, we've devised a pretty cool method of using an STL map container to hold the fieldname & fieldtext pairs with index/noindex and store/nostore flags. These are filled per document during htdig's parsing.

It makes the htdig<->clucene interface very elegant.

Thanks

--
Neal Richter
Sr. Researcher and Machine Learning Lead
Software Development
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
From: Neal R. <ne...@ri...> - 2005-10-20 18:26:04
|
Great! I would say we need help getting the current htdig 4.0 devel branch in shape as things are progressing.. ie configure and makefiles. At this point we're building Clucene in a separate directory and using a shell script to build the new version of htdig.exe. I have some ideas on how we could organize things for better cross-platform development.

Could you check out the current version of the 4.0 branch and read through the htdig directory and see where it leads you?

We also are going to nearly abandon htsearch. We've been using a tool called 'luke' http://www.getopt.org/luke/ to query and examine the Clucene index.

Anthony has been working on a document listing each htdig config verb and what action we'll need to take on it {delete, reimplement, modify-behavior}

Thanks.

On Thu, 20 Oct 2005, Christopher Murtagh wrote:

> On Thu, 2005-10-20 at 10:24 -0600, Neal Richter wrote:
> > I wanted to touch base with both of you to gauge your interest in
> > htdig 4.0 on Mac OS X.
> >
> > Ted: I know you've said you pitch in when you get time
> >
> > Chris: I still have the Mac you sent me.
> >
> > Would you be interested in having a discussion about what you have
> > done with htdig + postgresql? I want to make sure we design the API
> > appropriately for your type of usage (it's likely very similar to
> > RightNow's potential usage)
>
> Yeah, actually this sounds interesting, not exclusively from the Mac
> perspective, but as a more general discussion, yes.
>
> We (Eric Dorland, a colleague of mine, and myself) would also like to
> help out with development and take on a more active role. We've gotten
> the ok to officially devote a 1/2 day per two weeks to htdig development
> from our boss. We don't want to come in and take control or steer things
> away from what you might have intended, but since we do benefit from
> htdig quite a bit, we would like to contribute back and also, if this
> would help get utf-8 support sooner, it would benefit us greatly.
>
> So, if there's stuff that needs doing, and you want to delegate, or a
> cvs repo that we can check out and start looking at, please let us know.
>
> Cheers,
>
> Chris

--
Neal Richter
Sr. Researcher and Machine Learning Lead
Software Development
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
From: Neal R. <ne...@ri...> - 2005-10-20 16:00:34
|
An updated design doc is here: http://opensource.rightnow.com/htdig4_refactor_design.pdf

Thanks

On Tue, 18 Oct 2005, Neal Richter wrote:

> I've been lax in checking-in myself.
>
> Anthony Arnone and I have started work on HtDig 4.0
>
> Here is a blog that Anthony has been keeping on Htdig 4.0 development.
>
> http://htdig.blogspot.com/
>
> There is a new branch in CVS.
>
> http://cvs.sourceforge.net/viewcvs.py/htdig/htdig/?only_with_tag=htdig_4_0
>
> This is an older design document.. I'll get an updated one put on the blog
> ASAP.
> http://opensource.rightnow.com/htdig4_refactor_design.pdf
>
> Basically the idea is to rip out the existing word-index and searching
> code and replace it with CLucene while preserving as much of htdig's
> configurability as possible. The function of the spider will be nearly
> unchanged. The db.doc.index will still exist, but that's the only thing
> Berkeley DB will be used for.
>
> I've removed the hacked version of BDB in 4.0 CVS.
>
> What do we do about 3.2? My vote is to call it 'final', update the
> website and move forward. I could do this, and have posted this thought
> in the past.. no consensus emerged and I have no desire to be
> heavy-handed.
>
> After having looked at many commercial implementations of search engines
> over the past few years and following Nutch a bit.. I am still convinced
> that HtDig has plenty of legs.
>
> 3.2 has become a road-block to progress. We know it has issues, and
> various people have made valiant efforts to address them. From working
> with the 'general' list some, plenty of users try moving to 3.2 then move
> back to 3.1.6.
>
> On the other hand people, like Christopher Murtagh and myself, have used it
> as a cog in a larger application.
>
> My thought process for 4.0 is to get the htdig developers to concentrate
> on building an application for web-servers rather than trying to do it all
> and maintain the inverted index code... the Lucene community has already
> cracked that nut.
>
> Maybe this will get development kick-started again, since it's 100%
> obvious that we're all not interested in furthering the current 3.2 code
> for whatever reason.
>
> Thanks.
>
> On Sat, 15 Oct 2005, Gustave Stresen-Reuter wrote:
>
> > It's been pretty quiet on the list lately. Is the party over?
> >
> > Ted

--
Neal Richter
Sr. Researcher and Machine Learning Lead
Software Development
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
From: Robert R. <ri...@li...> - 2005-10-19 11:55:30
|
Just to let you guys know. Robert |
From: Jeff B. <bre...@gm...> - 2005-10-19 06:13:07
|
> What do we do about 3.2? My vote is to call it 'final', update the
> website and move forward. I could do this, and have posted this thought
> in the past.. no consensus emerged and I have no desire to be
> heavy-handed.

+1

> After having looked at many commercial implementations of search engines
> over the past few years and following Nutch a bit.. I am still convinced
> that HtDig has plenty of legs.

I know what you mean. Every time I look at Nutch I decide to stick with htdig 3.1.6 a little longer. However, UTF-8 support is getting super critical and some time in 2006 I'm going to have to bite the bullet and do something.

> My thought process for 4.0 is to get the htdig developers to concentrate
> on building an application for web-servers rather than trying to do it all
> and maintain the inverted index code... the Lucene community has already
> cracked that nut.

Neal, are you tracking the Java Lucene dev lists? There's some recent discussion with respect to index interoperability that may be relevant.

-Jeff |
From: Neal R. <ne...@ri...> - 2005-10-18 17:23:45
|
I've been lax in checking-in myself.

Anthony Arnone and I have started work on HtDig 4.0

Here is a blog that Anthony has been keeping on Htdig 4.0 development.

http://htdig.blogspot.com/

There is a new branch in CVS.

http://cvs.sourceforge.net/viewcvs.py/htdig/htdig/?only_with_tag=htdig_4_0

This is an older design document.. I'll get an updated one put on the blog ASAP.

http://opensource.rightnow.com/htdig4_refactor_design.pdf

Basically the idea is to rip out the existing word-index and searching code and replace it with CLucene while preserving as much of htdig's configurability as possible. The function of the spider will be nearly unchanged. The db.doc.index will still exist, but that's the only thing Berkeley DB will be used for.

I've removed the hacked version of BDB in 4.0 CVS.

What do we do about 3.2? My vote is to call it 'final', update the website and move forward. I could do this, and have posted this thought in the past.. no consensus emerged and I have no desire to be heavy-handed.

After having looked at many commercial implementations of search engines over the past few years and following Nutch a bit.. I am still convinced that HtDig has plenty of legs.

3.2 has become a road-block to progress. We know it has issues, and various people have made valiant efforts to address them. From working with the 'general' list some, plenty of users try moving to 3.2 then move back to 3.1.6.

On the other hand people, like Christopher Murtagh and myself, have used it as a cog in a larger application.

My thought process for 4.0 is to get the htdig developers to concentrate on building an application for web-servers rather than trying to do it all and maintain the inverted index code... the Lucene community has already cracked that nut.

Maybe this will get development kick-started again, since it's 100% obvious that we're all not interested in furthering the current 3.2 code for whatever reason.

Thanks.

On Sat, 15 Oct 2005, Gustave Stresen-Reuter wrote:

> It's been pretty quiet on the list lately. Is the party over?
>
> Ted

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
From: Gustave Stresen-R. <ted...@ma...> - 2005-10-15 18:25:22
|
It's been pretty quiet on the list lately. Is the party over? Ted |
From: Hairong L. <ha...@ms...> - 2005-10-12 19:59:23
|
Hello, I'm trying to use the "wrapper.html" file. I created the file and also modified the .conf file to enable the wrapper.html. But htdig still uses the old header.html and footer.html. Any advice? Many thanks. Hairong Li |
From: Wessel S. <we...@jn...> - 2005-10-10 08:41:21
|
Hello, I've set up a new mirror of ht://Dig located in Amsterdam, The Netherlands. It is hosted in a state-of-the-art multi-homed datacentre just under Schiphol and has a 10Mbps connection. As a contact email you can use mi...@ne.... The website mirror is updated daily and can be found at http://htdig.nedmirror.nl/. With kind regards, Wessel Sandkuyl |
From: Hossam H. <ho...@tr...> - 2005-09-03 03:17:30
|
Please update our mirror listing (trexle) in Dallas, Texas as follows:

Organization: www.trexle.com ==> www.trexle.net
Main Site: htdig.trexle.com ==> htdig.trexle.net
Dev Site: htdig.trexle.com/dev ==> htdig.trexle.net/dev

Best Regards, Hossam Hossny |
From: Geoffrey H. <ge...@ge...> - 2005-08-16 15:26:50
|
Begin forwarded message:

> From: Harald Koenig <H.K...@sc...>
> Date: August 16, 2005 11:19:50 AM EDT
> To: ghu...@ws...
> Cc: Harald Koenig <H.K...@sc...>
> Subject: htdig-3.2.0b6: getpeername_length_t
>
> Hi Geoff,
>
> trying to compile htdig-3.2.0b6 on IRIX 6.5.13m I had two problems:
>
> 1) autoconf test for GETPEERNAME_LENGTH_T is broken for sock_t == void
>
> IRIX has the following definition for getpeername
>
>     getpeername(int, void*, int*)
>
> and the "correct" test using gcc-3.3.3 fails like this
>
>     configure:27494: g++ -Wa,-mips3 -c -g -O2 -Wall -fno-rtti -fno-exceptions conftest.cc >&5
>     conftest.cc:123: error: variable or field `s' declared void
>     configure:27500: $? = 1
>     configure: failed program was:
>     ...
>     | #include <sys/types.h>
>     | #include <sys/socket.h>
>     | extern "C" int getpeername(int, void *, int *);
> ==> | void s; int l;
>     | int
>     | main ()
>     | {
>     | getpeername(0, &s, &l);
>     | ;
>     | return 0;
>     | }
>
> a possible patch would be:
>
> -------------------------------------------------------------------------------
> --- htdig-3.2.0b6/configure.in~ 2004-06-14 10:25:30.000000000 +0200
> +++ htdig-3.2.0b6/configure.in  2005-08-16 15:24:39.000000000 +0200
> @@ -209,7 +209,7 @@
>  AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[#include <sys/types.h>
>  #include <sys/socket.h>
>  extern "C" int getpeername(int, $sock_t *, $getpeername_length_t *);
> - $sock_t s; $getpeername_length_t l; ]], [[ getpeername(0, &s, &l); ]])],[ac_found=yes ; break 2],[ac_found=no])
> + struct sockaddr s; $getpeername_length_t l; ]], [[ getpeername(0, &s, &l); ]])],[ac_found=yes ; break 2],[ac_found=no])
>  done
>  done
> -------------------------------------------------------------------------------
>
> 2) db/mutex.h isn't compatible with /usr/include/mutex.h
>
> error message:
>
>     g++ -Wa,-mips3 -DHAVE_CONFIG_H -I. -I/soft/htdig/htdig-3.2.0b6/htdig-3.2.0b6/htlib -I../include -DDEFAULT_CONFIG_FILE=\"/usr/local/htdig/conf/htdig.conf\" -I/soft/htdig/htdig-3.2.0b6/htdig-3.2.0b6/include -I/soft/htdig/htdig-3.2.0b6/htdig-3.2.0b6/htlib -I/soft/htdig/htdig-3.2.0b6/htdig-3.2.0b6/htnet -I/soft/htdig/htdig-3.2.0b6/htdig-3.2.0b6/htcommon -I/soft/htdig/htdig-3.2.0b6/htdig-3.2.0b6/htword -I/soft/htdig/htdig-3.2.0b6/htdig-3.2.0b6/db -I../db -g -O2 -Wall -fno-rtti -fno-exceptions -c /soft/htdig/htdig-3.2.0b6/htdig-3.2.0b6/htlib/Configuration.cc -DPIC -o .libs/Configuration.o
>     In file included from /usr/local/gcc-3.3.3/include/c++/3.3.3/bits/ios_base.h:45,
>                      from /usr/local/gcc-3.3.3/include/c++/3.3.3/ios:49,
>                      from /usr/local/gcc-3.3.3/include/c++/3.3.3/ostream:45,
>                      from /usr/local/gcc-3.3.3/include/c++/3.3.3/iostream:45,
>                      from /soft/htdig/htdig-3.2.0b6/htdig-3.2.0b6/htlib/htString.h:23,
>                      from /soft/htdig/htdig-3.2.0b6/htdig-3.2.0b6/htlib/Dictionary.h:22,
>                      from /soft/htdig/htdig-3.2.0b6/htdig-3.2.0b6/htlib/Configuration.h:107,
>                      from /soft/htdig/htdig-3.2.0b6/htdig-3.2.0b6/htlib/Configuration.cc:24:
>     /usr/local/gcc-3.3.3/include/c++/3.3.3/mips-sgi-irix6.5/bits/atomicity.h: In function `_Atomic_word __exchange_and_add(_Atomic_word*, int)':
>     /usr/local/gcc-3.3.3/include/c++/3.3.3/mips-sgi-irix6.5/bits/atomicity.h:40: error: `test_then_add' undeclared (first use this function)
>     /usr/local/gcc-3.3.3/include/c++/3.3.3/mips-sgi-irix6.5/bits/atomicity.h:40: error: (Each undeclared identifier is reported only once for each function it appears in.)
>     gmake[1]: *** [Configuration.lo] Error 1
>     gmake[1]: Leaving directory `/net/jazz/fs2/scr/jazz/koenig/htdig-3.2.0b6/ARENA/32/htlib'
>
> I've renamed db/mutex.h to db/db_mutex.h and applied the patch below
> which avoids this problem:
>
> -------------------------------------------------------------------------------
> diff -ur orig/htdig-3.2.0b6/db/Makefile.am htdig-3.2.0b6/db/Makefile.am
> --- orig/htdig-3.2.0b6/db/Makefile.am 2002-02-02 19:18:05.000000000 +0100
> +++ htdig-3.2.0b6/db/Makefile.am 2005-08-16 14:36:15.000000000 +0200
> @@ -168,7 +168,7 @@
>   log_ext.h \
>   mp.h \
>   mp_ext.h \
> - mutex.h \
> + db_mutex.h \
>   mutex_ext.h \
>   os.h \
>   os_ext.h \
> diff -ur orig/htdig-3.2.0b6/db/Makefile.in htdig-3.2.0b6/db/Makefile.in
> --- orig/htdig-3.2.0b6/db/Makefile.in 2004-06-14 10:25:30.000000000 +0200
> +++ htdig-3.2.0b6/db/Makefile.in 2005-08-16 14:36:04.000000000 +0200
> @@ -317,7 +317,7 @@
>   log_ext.h \
>   mp.h \
>   mp_ext.h \
> - mutex.h \
> + db_mutex.h \
>   mutex_ext.h \
>   os.h \
>   os_ext.h \
> diff -ur orig/htdig-3.2.0b6/db/db_int.h htdig-3.2.0b6/db/db_int.h
> --- orig/htdig-3.2.0b6/db/db_int.h 2004-01-12 13:48:23.000000000 +0100
> +++ htdig-3.2.0b6/db/db_int.h 2005-08-16 14:37:09.000000000 +0200
> @@ -260,7 +260,7 @@
>   * More general includes.
>   *******************************************************/
>  #include "debug.h"
> -#include "mutex.h"
> +#include "db_mutex.h"
>  #include "mutex_ext.h"
>  #include "region.h"
>  #include "env_ext.h"
> -------------------------------------------------------------------------------
>
> "I hope to die before I *have* to use Microsoft Word."
>                  -- Donald E. Knuth, 02-Oct-2001 in Tuebingen.
>
> Harald Koenig
> science+computing ag
> ko...@sc... |
From: Scott B. <sc...@ho...> - 2005-08-12 00:36:04
|
Dear Webmasters,

We are a web hosting company located in the United States, more exactly in Dallas, Texas. We have set up a new mirror for htdig. Here are the details:

URL: http://htdig.hostingzero.com/maindocs/
Files URL: http://htdig.hostingzero.com/maindocs/files/
Update Frequency: Every Morning at 2 AM
Organisation: HostingZero - http://www.hostingzero.com

Hope everything is okay. Please let me know if I need to add/modify something. Many thanks, Scott Braynard |
From: Paracoda <ad...@pa...> - 2005-08-09 19:23:16
|
Greetings,

We have installed a local mirror for htdig (main site and devel site) in Canada with the following specifications.

- URL: htdig.paracoda.com
- Speed: 100 mbps
- Update: Daily
- Location: Montreal, Quebec, Canada
- Sponsor: www.paracoda.com
- Contact: preferably via www.paracoda.com but if necessary ad...@pa...

Please list it as an official mirror. Thank you, Hossam Hossny Paracoda.com |
From: Christopher M. <chr...@mc...> - 2005-07-26 17:55:27
|
On Fri, 2005-07-22 at 01:08 -0300, Manuel Lemos wrote:

> on 07/19/2005 08:10 PM Christopher Murtagh said the following:
> > To do an incremental index:
> >
> > echo URL_list.txt | htdig -m foo -c conf_file.conf -
> >
> > (notice the trailing '-'). Making this work wasn't obvious, but I had a
> > bit of help from the list, and it's all working for me now.
>
> hummm... I had the impression from a message posted in this list that
> when you do incremental indexing, HtDig will still traverse all pages
> but just performs HEAD requests to verify whether other pages were
> updated. Is this what happens, or did I misunderstand the point of this?
>
> Another thing that confuses me about the example above is the parameter
> that follows the -m switch. If it is supposed to read from STDIN, why
> foo and not just - ?

Yeah, I can't remember exactly why, other than it didn't work if I didn't do it. Sorry, it was a while ago when I set things up. A smarter person would have documented what I did, but I was swamped and didn't. :-)

> Other than that, if I want to update existing index database files,
> letting the users search the current databases while htdig is finishing,
> will adding the -a switch to the htdig command line work ok when just
> updating a few URLs as you suggest?

I use htdig for several things, including indexing results of PostgreSQL queries and joins. For example, if you go to:

http://www.mcgill.ca/classified/

The search tool uses htdig, embedded inside PostgreSQL (via stored procedures that call htdig). Same goes for:

http://www.mcgill.ca/search/

Just about everything there uses htdig, inside PostgreSQL and with a PHP wrapper.

Cheers,

Chris |
From: Gustave T. Stresen-R. <ted...@ma...> - 2005-07-26 17:54:41
|
Very well said... Thanks for the clear thought. You probably saved all of us a lot of time and effort for almost nothing... I was just ruminating on what could be done to speed up htdig on the Mac and that was the only thing I could think of... but you're really right: why put time into something that is ultimately nearing the end of its life...

Ted

On Jul 26, 2005, at 6:49 PM, Christopher Murtagh wrote:

> On Tue, 2005-07-26 at 14:59 +0100, Gustave T. Stresen-Reuter wrote:
> > One way to improve the speed of htdig on Mac OS X is to take advantage
> > of the AltiVec processor. I don't know anything about how to do this,
> > but in reading some documentation I ran across a document that says that
> > "Existing C code written for serial execution" can take advantage of
> > the AltiVec processor.
>
> Although that has a pretty short life span (since Apple is moving to
> Intel), and writing proper vectorized code is not simple. So, basically
> a lot of work and code for probably a very small audience and short term.
> It would have to be a project that someone would do out of love rather
> than need. :-)
>
> Cheers,
>
> Chris |
From: Christopher M. <ch...@sa...> - 2005-07-26 17:49:21
|
On Tue, 2005-07-26 at 14:59 +0100, Gustave T. Stresen-Reuter wrote:

> One way to improve the speed of htdig on Mac OS X is to take advantage
> of the AltiVec processor. I don't know anything about how to do this,
> but in reading some documentation I ran across a document that says that
> "Existing C code written for serial execution" can take advantage of
> the AltiVec processor.

Although that has a pretty short life span (since Apple is moving to Intel), and writing proper vectorized code is not simple. So, basically a lot of work and code for probably a very small audience and short term. It would have to be a project that someone would do out of love rather than need. :-)

Cheers,

Chris |
From: Neal R. <ne...@ri...> - 2005-07-26 16:46:35
|
On Tue, 26 Jul 2005, Gustave T. Stresen-Reuter wrote:

> One way to improve the speed of htdig on Mac OS X is to take advantage
> of the AltiVec processor. I don't know anything about how to do this,
> but in reading some documentation I ran across a document that says that
> "Existing C code written for serial execution" can take advantage of
> the AltiVec processor.
>
> If someone on the list knows whether or not the htdig source contains
> "C code written for serial execution" and can point me to it, I can
> explore what would be needed to get the code to take advantage of the
> processor.

I'll look into this a bit more. HtDig does essentially nothing in parallel.. My initial look is that the code has to be instrumented to do much with AltiVec:

http://developer.apple.com/documentation/DeveloperTools/gcc-4.0.0/gcc/PowerPC-AltiVec-Built_002din-Functions.html

There are definitely a few vector operations in htdig w.r.t. word vectors, however they are not really the type of thing that AltiVec would help with at first glance. I would think some speedup could be had in htsearch during the index-row-hit summarization.

Thanks

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
From: Neal R. <ne...@ri...> - 2005-07-26 16:32:38
|
On Tue, 26 Jul 2005, Geoffrey Hutchison wrote:

> On Jul 26, 2005, at 9:59 AM, Gustave T. Stresen-Reuter wrote:
>
> > One way to improve the speed of htdig on mac os x is to take
> > advantage of the AltiVec processor. I don't know anything about how
> > to do this, but in reading some documentation I ran across a
> > document says that "Existing C code written for serial execution"
> > can take advantage of the AltiVec processor.
>
> I'm not sure there's really a lot that can happen here. Neal can
> contradict me, but the last time I did benchmarking/profiling on the
> code, most of the slowdown during indexing was in the database code,
> and a large part of that was I/O bound.

What is your wordlist_cache_size set to? Make sure that it is about 2-3% of the expected index size.

I would also disable all index compression and run htdig again to see if you notice a speed up. Setting wordlist_compress & wordlist_compress_zlib to 'false' is the way to test that.

> Furthermore, AltiVec really shines doing floating-point processing,
> e.g. matrix multiplication. There's very, very little of that in ht://Dig.
>
> I think for future development (i.e., 4.0) multi-threading the
> indexing would probably help *MUCH* more than adding AltiVec or SSE
> processor-specific optimizations which mostly help with floating-point ops.
>
> And of course, moving to a different database backend would probably
> help immensely too.

Yep.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
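For readers trying the tuning Neal suggests, the relevant htdig.conf lines would look roughly like this. The cache-size value is an arbitrary example (pick roughly 2-3% of your expected index size); consult the attribute documentation for the defaults on your version:

```
# Example value only -- size the cache at roughly 2-3% of the
# expected index size (value in bytes).
wordlist_cache_size: 25000000

# Temporarily disable index compression to see whether it is
# the bottleneck, as suggested above.
wordlist_compress: false
wordlist_compress_zlib: false
```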
From: Geoffrey H. <ge...@ge...> - 2005-07-26 14:32:07
|
On Jul 26, 2005, at 9:59 AM, Gustave T. Stresen-Reuter wrote:

> One way to improve the speed of htdig on Mac OS X is to take advantage
> of the AltiVec processor. I don't know anything about how to do this,
> but in reading some documentation I ran across a document that says that
> "Existing C code written for serial execution" can take advantage of
> the AltiVec processor.

I'm not sure there's really a lot that can happen here. Neal can contradict me, but the last time I did benchmarking/profiling on the code, most of the slowdown during indexing was in the database code, and a large part of that was I/O bound.

Furthermore, AltiVec really shines doing floating-point processing, e.g. matrix multiplication. There's very, very little of that in ht://Dig.

I think for future development (i.e., 4.0) multi-threading the indexing would probably help *MUCH* more than adding AltiVec or SSE processor-specific optimizations which mostly help with floating-point ops.

And of course, moving to a different database backend would probably help immensely too.

Cheers,

-Geoff |
From: Gustave T. Stresen-R. <ted...@ma...> - 2005-07-26 13:59:52
|
One way to improve the speed of htdig on Mac OS X is to take advantage of the AltiVec processor. I don't know anything about how to do this, but in reading some documentation I ran across a document that says that "Existing C code written for serial execution" can take advantage of the AltiVec processor.

If someone on the list knows whether or not the htdig source contains "C code written for serial execution" and can point me to it, I can explore what would be needed to get the code to take advantage of the processor.

Ted Stresen-Reuter |
From: Gustave T. Stresen-R. <ted...@ma...> - 2005-07-25 17:26:49
|
>> Thanks in advance... And please count on me providing a final Mac OS X
>> package when this release is ready to go.
>
> How has it been working for you? Any issues?

Actually, it's been working fine, but I'm not using it in production (and I'm only testing it on the OS X client version). However, I did start a dig that dug thousands of pages (over the course of a couple of days) and it went just fine, no problems at all. It's only a little slower than a standard Linux machine of equal horsepower (but that's more a reflection of the Mac subsystems than the htdig source).

I built it on Panther (10.3), but it is my understanding that Tiger (10.4) ships with a new version of the gcc compiler (not sure which), so I don't know if the new compiler might be causing problems for others or not; my guess is not. I haven't updated yet and don't have any plans to, so the final package will be built on 10.3.9.

Thanks for the replies to the other inquiries. I'll probably release binaries of these packages as part of the main package as well.

Ted |