From: Mick W. <mic...@gm...> - 2005-07-18 06:35:16

I'm just wondering what the status is on the development of htdig. I
haven't seen much going on on the list for a while.

Best Regards,

- Mick

From: Neal R. <ne...@ri...> - 2005-07-19 22:11:45

3.2b6 is either done or dead. I suppose that either Geoff or I need to
finish it off, call it 3.2, and update the website. I've called for a
pseudo-vote on this several times, with silence being the general
response.

On a more positive note, Anthony Arnone (Montana State Univ. grad
student) and I have started active development of HtDig 4.0. It will be
a merge of HtDig + CLucene, with a significant amount of code for the
existing Berkeley DB based WordDb being flushed.

The main impetus for this is Unicode support and a speed and index size
improvement.

We expect to produce a decently detailed refactoring document next week
and create a 4.0 CVS branch then.

Thanks

On Mon, 18 Jul 2005, Mick Weiss wrote:
> I'm just wondering what the status is on the development of htdig. I
> haven't seen much going on on the list for a while.
>
> Best Regards,
>
> - Mick

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485

From: Manuel L. <ml...@ac...> - 2005-07-19 22:45:36

Hello,

on 07/19/2005 07:14 PM Neal Richter said the following:
> 3.2b6 is either done or dead. I suppose that either Geoff or I need to
> finish it off, call it 3.2, and update the website. I've called for a
> pseudo-vote on this several times, with silence being the general
> response.

The lack of visible activity, and the fact that many of us use htdig as
is without problems, made people believe that voting would be
irrelevant. Anyway, if you still think you need votes to go ahead, here
is a +1 on my behalf.

> On a more positive note, Anthony Arnone (Montana State Univ. grad
> student) and I have started active development of HtDig 4.0. It will
> be a merge of HtDig + CLucene, with a significant amount of code for
> the existing Berkeley DB based WordDb being flushed.
>
> The main impetus for this is Unicode support and a speed and index
> size improvement.
>
> We expect to produce a decently detailed refactoring document next
> week and create a 4.0 CVS branch then.

Great. I hope that will allow us to do things like making HtDig crawl
individual pages and only update their entries in the index. That is
what I miss most in the current HtDig version.

I make htdig crawl the static version of my site every day, but that is
not very efficient and often it is too late.

I can keep track of all pages that change and need to be reindexed, but
it is odd to make HtDig crawl the whole site just because a few pages
changed. I would be more satisfied if I could just tell htdig once an
hour to reindex a limited list of pages that changed.

--
Regards,
Manuel Lemos

PHP Classes - Free ready-to-use OOP components written in PHP
http://www.phpclasses.org/

PHP Reviews - Reviews of PHP books and other products
http://www.phpclasses.org/reviews/

Metastorage - Data object relational mapping layer generator
http://www.meta-language.net/metastorage.html

From: Christopher M. <chr...@mc...> - 2005-07-19 23:10:27

On Tue, 2005-07-19 at 19:45 -0300, Manuel Lemos wrote:
> Great. I hope that will allow us to do things like making HtDig crawl
> individual pages and only update their entries in the index. That is
> what I miss most in the current HtDig version.

I'm using htdig 3.2 for doing incremental indexing right now and it
seems to be working fine. What sort of problems are you having?

To remove a list of URLs:

    htpurge -c conf_file.conf -u list_of_urls.txt

To do an incremental index:

    echo URL_list.txt | htdig -m foo -c conf_file.conf -

(notice the trailing '-'). Making this work wasn't obvious, but I had a
bit of help from the list, and it's all working for me now.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017

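Chris's two commands can be combined into a small wrapper for the hourly
reindex Manuel asked about. The sketch below is illustrative only: the
function name `reindex_changed` and its arguments are made-up names, and
the `-m foo` argument and trailing `-` are copied verbatim from the
command quoted above rather than from any htdig documentation.

```shell
# Hypothetical hourly incremental-reindex wrapper (all names are
# placeholders; adjust the config and URL-list paths to your install).
reindex_changed() {
    conf=$1
    url_list=$2

    # Drop the changed URLs from the existing index first.
    htpurge -c "$conf" -u "$url_list"

    # Re-dig only the listed URLs. -m keeps htdig from re-queueing the
    # full db.docs list; the '-m foo' argument and trailing '-' mirror
    # the command quoted above.
    echo "$url_list" | htdig -m foo -c "$conf" -
}
```

A cron entry along the lines of `0 * * * * /path/to/reindex_changed.sh
conf_file.conf url_list.txt` (path assumed) would then run it against
the list of changed pages once an hour.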
From: Neal R. <ne...@ri...> - 2005-07-21 16:32:46

On Tue, 19 Jul 2005, Manuel Lemos wrote:
> > We expect to produce a decently detailed refactoring document next
> > week and create a 4.0 CVS branch then.
>
> Great. I hope that will allow us to do things like making HtDig crawl
> individual pages and only update their entries in the index. That is
> what I miss most in the current HtDig version.
>
> I make htdig crawl the static version of my site every day, but that
> is not very efficient and often it is too late.
>
> I can keep track of all pages that change and need to be reindexed,
> but it is odd to make HtDig crawl the whole site just because a few
> pages changed. I would be more satisfied if I could just tell htdig
> once an hour to reindex a limited list of pages that changed.

This should be exactly what Chris Murtagh's command does:

    echo URL_list.txt | htdig -m foo -c conf_file.conf -

-m suppresses the addition of the full list of URLs in db.docs to the
'to be requested' queue in the spider.

Please reply back if this is not addressing your needs...

Thanks

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485

From: Manuel L. <ml...@ac...> - 2005-07-22 04:08:02

Hello,

on 07/19/2005 08:10 PM Christopher Murtagh said the following:
> I'm using htdig 3.2 for doing incremental indexing right now and it
> seems to be working fine. What sort of problems are you having?
>
> To remove a list of URLs:
>
>     htpurge -c conf_file.conf -u list_of_urls.txt
>
> To do an incremental index:
>
>     echo URL_list.txt | htdig -m foo -c conf_file.conf -
>
> (notice the trailing '-'). Making this work wasn't obvious, but I had
> a bit of help from the list, and it's all working for me now.

Hmmm... I had the impression, from a message posted to this list, that
when you do incremental indexing HtDig will still traverse all pages
but just performs HEAD requests to verify whether other pages were
updated. Is that what happens, or did I misunderstand the point of
this?

Another thing that confuses me about the example above is the parameter
that follows the -m switch. If it is supposed to read from STDIN, why
foo and not just - ?

Other than that, if I want to update existing index database files,
letting users search the current databases while htdig finishes, will
adding the -a switch to the htdig command line work OK when just
updating a few URLs as you suggest?

Should I follow the htdig command with the usual htmerge and htfuzzy
command calls, as in a full reindex?

If this works, that will solve my problem. If so, I plan to update my
HTDIG PHP interface class and release a new version soon, for the
benefit of all who use HTDIG with PHP.

http://www.phpclasses.org/htdiginterface

--
Regards,
Manuel Lemos

PHP Classes - Free ready-to-use OOP components written in PHP
http://www.phpclasses.org/

PHP Reviews - Reviews of PHP books and other products
http://www.phpclasses.org/reviews/

Metastorage - Data object relational mapping layer generator
http://www.meta-language.net/metastorage.html

From: Neal R. <ne...@ri...> - 2005-07-25 04:15:53

On Fri, 22 Jul 2005, Manuel Lemos wrote:
> > (notice the trailing '-'). Making this work wasn't obvious, but I
> > had a bit of help from the list, and it's all working for me now.
>
> Hmmm... I had the impression, from a message posted to this list,
> that when you do incremental indexing HtDig will still traverse all
> pages but just performs HEAD requests to verify whether other pages
> were updated. Is that what happens, or did I misunderstand the point
> of this?

It does exactly that unless you give it the -m switch, at which point
it just adds/updates the pages you specify.

> Another thing that confuses me about the example above is the
> parameter that follows the -m switch. If it is supposed to read from
> STDIN, why foo and not just - ?

I'll have to read the code to see why; it looks like a command-line
parsing problem.

> Other than that, if I want to update existing index database files,
> letting users search the current databases while htdig finishes, will
> adding the -a switch to the htdig command line work OK when just
> updating a few URLs as you suggest?

The 'alt work files' command-line option (-a) is another way to do what
you described above (with the -m switch).

> Should I follow the htdig command with the usual htmerge and htfuzzy
> command calls, as in a full reindex?

Htmerge is only necessary when you are merging two databases... the
above scenario doesn't need it.

> If this works, that will solve my problem. If so, I plan to update my
> HTDIG PHP interface class and release a new version soon, for the
> benefit of all who use HTDIG with PHP.
>
> http://www.phpclasses.org/htdiginterface

There are 'libhtdig' & 'libhtdigphp' directories in the 3.2b6 snapshot
which would enable you to directly call the htdig search functions
without a PHP wrapper.

Thanks

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485

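Neal's 'alt work files' suggestion can be sketched as follows. This is
an assumption-heavy sketch: the function name `update_live`, the
rotation of `db.*.work` files, and the idea that `-a` composes with the
`-m foo`/trailing-`-` invocation quoted earlier are all inferred from
this thread, not verified against the htdig source, so check the
behavior on your own install before relying on it.

```shell
# Sketch: update a few URLs with -a (alternate work files) so searches
# against the live databases keep working while the dig runs.
# update_live and all file names here are placeholders/assumptions.
update_live() {
    conf=$1
    url_list=$2
    db_dir=$3    # your database_dir setting

    # -a is assumed to write db.*.work files instead of touching the
    # live databases; the rest mirrors the incremental command above.
    echo "$url_list" | htdig -a -m foo -c "$conf" -

    # Once the dig finishes, rotate the .work copies into place.
    for f in "$db_dir"/db.*.work; do
        [ -e "$f" ] && mv "$f" "${f%.work}"
    done
}
```

The design point is simply that searches never see a half-written
database: htsearch reads the live files until the atomic rename swaps
in the finished copies.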
From: Gustave T. Stresen-R. <ted...@ma...> - 2005-07-25 06:44:46

On Jul 25, 2005, at 5:17 AM, Neal Richter wrote:
> There are 'libhtdig' & 'libhtdigphp' directories in the 3.2b6
> snapshot which would enable you to directly call the htdig search
> functions without a PHP wrapper.

How are these libraries built? Do I just navigate to the directory and
run ./configure; make; make install? Doing just the basic ./configure;
make; make install from the root directory doesn't seem to build the
"binaries".

Also, does PHP need to be built with these libraries, or can they be
accessed by adding lines to php.ini (or at runtime using dl())?
Finally, do these libraries come with any documentation?

Thanks in advance... And please count on me providing a final Mac OS X
package when this release is ready to go.

Ted Stresen-Reuter

From: Neal R. <ne...@ri...> - 2005-07-25 17:16:46

On Mon, 25 Jul 2005, Gustave T. Stresen-Reuter wrote:
> On Jul 25, 2005, at 5:17 AM, Neal Richter wrote:
> > There are 'libhtdig' & 'libhtdigphp' directories in the 3.2b6
> > snapshot which would enable you to directly call the htdig search
> > functions without a PHP wrapper.
>
> How are these libraries built? Do I just navigate to the directory
> and run ./configure; make; make install? Doing just the basic
> ./configure; make; make install from the root directory doesn't seem
> to build the "binaries".

A 'make' in libhtdig (after ./configure in the main directory) should
give you a libhtdig.so.

In libhtdigphp you will have to ./configure (with the appropriate PHP
system config flags passed to configure), then run 'make' (which will
fail to link), and then run 'relink.sh'.

Since these are auxiliary libraries, I have not bothered to integrate
them into the base htdig make system. htdig 4.0 will probably be
structured a bit differently....

> Also, does PHP need to be built with these libraries, or can they be
> accessed by adding lines to php.ini (or at runtime using dl())?
> Finally, do these libraries come with any documentation?

At runtime, using dl().

Not much documentation, but libhtdig_api.h is fairly instructive. There
is an example PHP file in libhtdigphp.

> Thanks in advance... And please count on me providing a final Mac OS
> X package when this release is ready to go.

How has it been working for you? Any issues?

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485

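Neal's build steps, collected into one sequence. This is a sketch of
what he describes, not a tested recipe: the function name, the source
path argument, and the PHP configure flags are placeholders you will
need to adjust for your system.

```shell
# Sketch of the libhtdig/libhtdigphp build sequence described above.
# build_libhtdig and the source path are hypothetical names.
build_libhtdig() {
    src=$1                  # path to the unpacked 3.2b6 snapshot
    cd "$src" || return 1

    ./configure             # top-level configure first
    make -C libhtdig        # should produce libhtdig.so

    cd libhtdigphp || return 1
    ./configure             # add your PHP system config flags here
    make                    # per the thread, this fails at the link step
    sh relink.sh            # finishes the link by hand
}
```

The resulting extension is then loaded at runtime with PHP's dl(), as
Neal notes, rather than being compiled into PHP itself.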
From: Gustave T. Stresen-R. <ted...@ma...> - 2005-07-25 17:26:49

> > Thanks in advance... And please count on me providing a final Mac
> > OS X package when this release is ready to go.
>
> How has it been working for you? Any issues?

Actually, it's been working fine, but I'm not using it in production
(and I'm only testing it on the OS X client version). However, I did
start a dig that dug thousands of pages (over the course of a couple of
days) and it went just fine, no problems at all. It's only a little
slower than a standard Linux machine of equal horsepower (but that's
more a reflection of the Mac subsystems than of the htdig source).

I built it on Panther (10.3), but it is my understanding that Tiger
(10.4) ships with a new version of the gcc compiler (not sure which),
so I don't know whether the new compiler might be causing problems for
others; my guess is not. I haven't updated yet and don't have any plans
to, so the final package will be built on 10.3.9.

Thanks for the replies to the other inquiries. I'll probably release
binaries of these packages as part of the main package as well.

Ted

From: Christopher M. <chr...@mc...> - 2005-07-26 17:55:27

On Fri, 2005-07-22 at 01:08 -0300, Manuel Lemos wrote:
> on 07/19/2005 08:10 PM Christopher Murtagh said the following:
> > To do an incremental index:
> >
> >     echo URL_list.txt | htdig -m foo -c conf_file.conf -
> >
> > (notice the trailing '-'). Making this work wasn't obvious, but I
> > had a bit of help from the list, and it's all working for me now.
>
> Hmmm... I had the impression, from a message posted to this list,
> that when you do incremental indexing HtDig will still traverse all
> pages but just performs HEAD requests to verify whether other pages
> were updated. Is that what happens, or did I misunderstand the point
> of this?
>
> Another thing that confuses me about the example above is the
> parameter that follows the -m switch. If it is supposed to read from
> STDIN, why foo and not just - ?

Yeah, I can't remember exactly why, other than that it didn't work if I
didn't do it. Sorry, it was a while ago when I set things up. A smarter
person would have documented what I did, but I was swamped and didn't.
:-)

> Other than that, if I want to update existing index database files,
> letting users search the current databases while htdig finishes, will
> adding the -a switch to the htdig command line work OK when just
> updating a few URLs as you suggest?

I use htdig for several things, including indexing the results of
PostgreSQL queries and joins. For example, if you go to:

    http://www.mcgill.ca/classified/

the search tool uses htdig, embedded inside PostgreSQL (via stored
procedures that call htdig). The same goes for:

    http://www.mcgill.ca/search/

Just about everything there uses htdig, inside PostgreSQL and with a
PHP wrapper.

Cheers,

Chris
