From: Bill C. <wca...@vh...> - 2002-02-26 22:54:53
|
Hey all,

I ran across a logical problem when handling <META name="robots" content="noindex"> on a page. The expected behavior is that the page itself is not indexed, but links on the page are still followed and indexed. This works fine on the initial index.

Let's call the page that shouldn't be indexed the TOC (Table of Contents, a typical application); the pages linked from the TOC are the content. If the only link to a content page is on the TOC, a later incremental index will not index that page, because the bridging TOC is dropped from the list of documents (this assumes any pages linking to the TOC have not been modified since the last run and hence are not re-fetched). The content page therefore drops from the database; it is only picked up again on the next full index, and dropped again on the next partial index.

I didn't see that this issue had been discussed before. Would this still be an issue for 3.2.x?

Later,

Bill Carlson
--
Systems Programmer   wca...@vh...            | Anything is possible,
Virtual Hospital     http://www.vh.org/      | given time and money.
University of Iowa Hospitals and Clinics     |
Opinions are mine, not my employer's.        |
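To make the scenario concrete, here is a minimal sketch of such a TOC page (the file names are made up for illustration); the meta tag tells the indexer not to index this page itself, while its links should still be followed:

    <!-- Hypothetical TOC page: excluded from the index itself, but the
         chapter pages it links to should still be indexed. If chapter1.html
         is reachable only through this page, the re-indexing problem
         described above appears. -->
    <html>
      <head>
        <meta name="robots" content="noindex">
        <title>Table of Contents</title>
      </head>
      <body>
        <a href="chapter1.html">Chapter 1</a>
        <a href="chapter2.html">Chapter 2</a>
      </body>
    </html>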
|
From: Jessica B. <jes...@ya...> - 2002-02-25 21:33:13
|
Does anyone here compile with static linking rather than dynamic? I was curious to know whether there are any advantages or disadvantages. Right now I'm using one of the snapshots. |
|
From: Geoff H. <ghu...@us...> - 2002-02-24 08:13:58
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores: they do not work with
wordlist_compress set, but work fine without wordlist_compress.
(The date is definitely stored correctly, even with compression on,
so this must be some sort of weird htsearch bug.)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* MySQL patches to 3.1.x to be forward-ported and cleaned up.
(Should really only attempt to use SQL for doc_db and related, not word_db)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists (see the config sketch after this report).
* Handle local_urls through file:// handler, for mime.types support.
* Handle directory redirects in RetrieveLocal.
* Merge with mifluz
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
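As a reference point for the noindex_start/noindex_end item above, here is a minimal htdig.conf sketch of the current single-string form; the marker strings shown are the usual documented defaults, but treat the exact values as an assumption and check defaults.cc/attrs.html:

    # Current single-string form; the TODO item above would accept a list
    # of start/end markers here instead. Marker values are assumed defaults.
    noindex_start: <!--htdig_noindex-->
    noindex_end:   <!--/htdig_noindex-->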
|
|
From: Joshua G. <jg...@pt...> - 2002-02-22 16:49:55
|
Hi Harri,

> > On Fri, 22 Feb 2002, Harri Pasanen wrote:
> > > I downloaded htdig-3.2.0b2, but I did not see anything to support https
> > > indexing. What is the status on this front?
> >
> > Well, it's certainly not in 3.2.0b2. Keep in mind that there's a 3.2.0b3
> > release and we recommend snapshots of 3.2.0b4, which fix a large number of
> > bugs in 3.2.0b3.
> >
> > Where did you get 3.2.0b2 that didn't have 3.2.0b3?
>
> Hmm, looks like I clicked on 3.2.0b2 by accident.
>
> I got 3.2.0b3 now, but that doesn't seem to have SSL support either.

Although my memory fails me a bit, I think I added the SSL stuff to a 3.2.0b4 snapshot, so it should be in there.

Cheers,
Joshua
|
From: Harri P. <har...@tr...> - 2002-02-22 15:54:56
|
On Fri, 22 Feb 2002 10:30:58 -0500 (EST) Geoff Hutchison <ghu...@ws...> wrote:

> On Fri, 22 Feb 2002, Harri Pasanen wrote:
> > I downloaded htdig-3.2.0b2, but I did not see anything to support https
> > indexing. What is the status on this front?
>
> Well, it's certainly not in 3.2.0b2. Keep in mind that there's a 3.2.0b3
> release and we recommend snapshots of 3.2.0b4, which fix a large number of
> bugs in 3.2.0b3.
>
> Where did you get 3.2.0b2 that didn't have 3.2.0b3?

The CVS version seems to have some SSL support in it, I'll check that out.

-Harri
|
From: Harri P. <har...@tr...> - 2002-02-22 15:48:02
|
On Fri, 22 Feb 2002 10:30:58 -0500 (EST) Geoff Hutchison <ghu...@ws...> wrote:

> On Fri, 22 Feb 2002, Harri Pasanen wrote:
> > I downloaded htdig-3.2.0b2, but I did not see anything to support https
> > indexing. What is the status on this front?
>
> Well, it's certainly not in 3.2.0b2. Keep in mind that there's a 3.2.0b3
> release and we recommend snapshots of 3.2.0b4, which fix a large number of
> bugs in 3.2.0b3.
>
> Where did you get 3.2.0b2 that didn't have 3.2.0b3?

Hmm, looks like I clicked on 3.2.0b2 by accident.

I got 3.2.0b3 now, but that doesn't seem to have SSL support either.

-Harri
|
From: Geoff H. <ghu...@ws...> - 2002-02-22 15:31:28
|
On Fri, 22 Feb 2002, Harri Pasanen wrote:

> I downloaded htdig-3.2.0b2, but I did not see anything to support https
> indexing. What is the status on this front?

Well, it's certainly not in 3.2.0b2. Keep in mind that there's a 3.2.0b3 release and we recommend snapshots of 3.2.0b4, which fix a large number of bugs in 3.2.0b3.

Where did you get 3.2.0b2 that didn't have 3.2.0b3?

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
|
From: W. S. <wi...@im...> - 2002-02-22 11:46:31
|
Hello,

I want to create a database of our projects, but the results should be displayed as file://P:\Project A\... instead of http://Project A/.... In addition, the read permissions of the different departments of our institute must be integrated: we have some secret projects which should not be seen by everybody. People should still see that there is a match, but then contact the project's manager for the document.

Where should I change the code?

Thank you for your help,

Sven Willer
Fraunhofer IML
Joseph-von-Fraunhofer Str. 2-4
D-44227 Dortmund
|
From: Harri P. <har...@tr...> - 2002-02-22 11:19:51
|
I'm new to this list, so pardon my ignorance. I downloaded htdig-3.2.0b2, but I did not see anything to support https indexing. What is the status on this front? Thanks, -Harri |
|
From: Gabriele B. <an...@ti...> - 2002-02-18 21:02:54
|
Ciao dear ht://Dig friends!
Sorry if I bug you, but this is going to be a very important day for me
and I am sure for the ht://Dig Group as well. I am so proud to announce
that after almost 3 years of work, ht://Check has finally come to its first
stable release!
Probably the Group members, Geoff and Gilles particularly (and also
Loic), know how much I have stressed them in this period, especially at the
beginning! And I can't believe that what was once nothing more than an idea
has come true, through my efforts, those of the people working with me, and
those of this wonderful group of people at ht://Dig.
ht://Check is probably one of the most widely used link checkers and Web
site management tools for GNU/Linux systems. Thanks to my friend Marco,
there is a Debian package for it, and hopefully soon an RPM (thanks Gilles
for your help).
Its relationship with ht://Dig is the core library, which is heavily
used by ht://Check, and the network library (HTTP/1.1 especially), which I
developed for both ht://Check and ht://Dig.
For those who want to know more about it, you can take a look at this
URL: http://htcheck.sourceforge.net/. You can download the tar.gz file
containing the sources and the documentation (in html and ps formats).
Well, I think that's enough for now. Sorry again if this e-mail
disturbed you somehow, please know that I didn't mean it, and understand
that ... well ... I am so happy and I can't keep it inside of me!
Ciao and thanks again
-Gabriele
--
Gabriele Bartolini - Web Programmer
Current Location: Prato, Tuscany, Italy
an...@ti... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> find bin/laden -name osama -exec rm {} \;
-
Important:
--------------
I've experienced problems when receiving e-mail sent to the
address: an...@us.... I think I lost much of it.
So if you sent me a message, and I never replied to you,
that's probably the reason. Please update your address book to
this one: an...@ti.... Sorry and thank you!
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-16 22:00:53
|
Malcolm,

I'm not sure what the current status of mifluz is, as I honestly have never really been an active developer of mifluz. From what I can tell, Loic has moved on to other work and has not been developing mifluz either. Perhaps he or someone else on the mifluz-dev list can give you more information.

On the other hand, the ht://Dig project is still active and is working, in part, on resyncing with the current mifluz and updating the String and word-parsing code to handle multi-byte strings. (I'm not entirely sure how well mifluz currently works in this regard, since the shared String class in both projects seems to assume char == byte in places.)

If Loic or others in mifluz can give you some suggestions on areas needing work with mifluz, then I'm sure your help would be greatly appreciated. Otherwise, many of us in the ht://Dig project can offer suggestions that may help with the applications you mention.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

On Saturday, February 16, 2002, at 09:14 AM, Malcolm Melville wrote:

> Geoff
>
> I have used mifluz on and off over the past couple of years in an
> experimental way - looking at building news databases which are
> searchable within seconds of a story appearing on a wire. Currently I am
> in a state of change and am interested in knowing whether you guys are
> interested in any development effort on mifluz. I have about 10 years of
> C and C++ in the text database area behind me and another 8 years
> working on market data and other business real-time applications. Prior
> to that I worked on various AC power systems simulations and bits and
> bobs.
>
> Over the last 4 years, I have been looking at hardware speedups -
> compiling searches to hardware for execution using arrays of processors,
> and most recently FPGAs.
>
> I have enjoyed using mifluz but have always been slightly puzzled as to
> why it has never got to a version 1.0 and what criteria would be used to
> say it had arrived.
>
> While I am able, for a few months and probably more, I would like to
> contribute rather than use, if there is anything useful I can do.
>
> regards
> malcolm
|
From: Geoff H. <ghu...@ws...> - 2002-02-16 16:45:00
|
Hi Donald,

Sorry to take so long to get back to you, but the recent release and follow-up have certainly taken my focus for a bit. We'd definitely love to have an FTP mirror at ibiblio, either as a virtual host of ftp.htdig.org (if that's something you can do) or as a simple mirror to add to our growing mirror list. We have mirroring instructions using wget for the files at <http://www.htdig.org/howto-mirror.html>. Or of course if you'd like me to handle the mirror, I'd be glad to do that.

So if you'd like me to handle the mirror, let me know what I need to do to get an account, otherwise let me know if I can help in another way to get the mirror from SF going.

Thanks very much!
-Geoff

On Tuesday, January 29, 2002, at 09:06 AM, Don Sizemore wrote:

> Hi Geoff,
>
> I know it's old news that Sourceforge has stopped its FTP services,
> but I'd still like to offer you FTP space on ibiblio for htdig, either
> mirrored from sourceforge or managed by you or another volunteer. We
> already host a sourceforge mirror, so this would be easy and brainless
> for us. Your users would benefit from FTP services once more.
>
> If you need ibiblio accounts, CVS, mysql, or mailman down the road
> these are also no problem. Just let me know?
>
> Thanks,
> Donald
> ibiblio.org
> formerly known as SunSITE
> 919.843.8215 and stoof.
|
From: Gilles D. <gr...@sc...> - 2002-02-14 17:41:47
|
According to IMRAN SABIR:
> The thing I want to know is which operating system is necessary to run
> HTDIG, or whether it may run on Windows operating systems like Windows
> 2000 or NT.

See http://www.htdig.org/require.html and http://www.htdig.org/FAQ.html#q2.6

--
Gilles R. Detillieux     E-mail: <gr...@sc...>
Spinal Cord Research Centre        WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba   Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)    Fax:    (204)789-3930
|
From: Gabriele B. <g.b...@co...> - 2002-02-14 14:22:59
|
> Looks reasonable to me. What about doing anything with $(MODIFIED)
> and $(SIZE) in $_Results? I know, it's just my bias, but I like
> seeing those in search results. I also noticed you don't use $(WORD)
> in $_Head, nor any of the other template variables commonly used for the
> followup search form (e.g. RESTRICT, EXCLUDE, CONFIG, SELECTED_FORMAT,
> SELECTED_METHOD and SELECTED_SORT), so I'm assuming that you don't have
> a followup form. Maybe I'm wrong, though, and you simply propagate the
> user input to the followup form directly in PHP, without the need for
> anything from htsearch. Is that right?

Ciao Gilles,

yes, I know, I was just giving it a try! I didn't put in the variables I wanted, I just picked the templates from the PHP guide. I haven't yet thought about the propagation of the variables; I guess I am gonna manage it all through the PHP script. I'll let you know anyway about my progress.

Ciao and thanks
-Gabriele

--
Gabriele Bartolini - Computer Programmer
U.O. Rete Civica - Comune di Prato - Prato - Italia - Europa
g.b...@co... | http://www.po-net.prato.it/

The nice thing about Windows is - It does not just crash, it displays a
dialog box and lets you press 'OK' first.
|
From: Dan C. <dan...@so...> - 2002-02-13 23:36:10
|
Thanks Geoff,

I gave the latest snapshot a go (3.2.0b4-20020210), but the phrase searching seems equally flaky. I'm going to try to narrow it down to a very specific example to aid the group's debugging efforts. Since I have to move quickly on my decision for a production search engine, however (and don't have time to get up to speed on the 3.2 source), I'm probably going to go with 3.1.6, which is a real shame, as phrase searching would be brilliant!

Cheers,
Dan

-----Original Message-----
From: Geoff Hutchison [mailto:ghu...@ws...]
Sent: Wednesday, 13 February 2002 2:39 AM
To: Dan Cutting
Cc: 'htd...@li...'
Subject: Re: [htdig-dev] 3.2.0b3 phrase searches

<snip>
Yes, it's a known bug. I would use the 3.2.0b4 snapshots which are certainly
more stable than 3.2.0b3 and fix the security hole in 3.2.0b3. As to whether
it's ready for a production environment, I can't say. Certainly if you find
bugs, we'll try to fix them as fast as possible.
<snip>
|
From: Gilles D. <gr...@sc...> - 2002-02-13 16:52:08
|
According to Gabriele Bartolini:
> > opened up any security holes right in htsearch. What did you have
> > to change directly in htsearch, or did you manage everything by using
> > template files to spit out the PHP code? I think the more you do with
>
> Sorry Gilles, I didn't explain it very well!
>
> Yes, everything is made by using template files. No change to the internal
> code. For instance, this is the header template file content:
>
>   \$_Head['matches'] = $(MATCHES);
>   \$_Head['firstdisplayed'] = $(FIRSTDISPLAYED);
>   \$_Head['lastdisplayed'] = $(LASTDISPLAYED);
>   \$_Head['logical_words'] = '$%(LOGICAL_WORDS)';
>   \$i = 0;
>
> whereas here is the code for the results template:
>
>   \$_Results[\$i]['title'] = '$%(TITLE)';
>   \$_Results[\$i]['url'] = '$%(URL)';
>   \$_Results[\$i]['percent'] = $(PERCENT);
>   \$_Results[\$i]['excerpt'] = '$%(EXCERPT)';
>   \$i += 1;
>
> As you can see, I am not using all of the variables, just a few. Anyway, I
> just need to evaluate the code resulting from htsearch and ... that's it. I
> have 2 associative arrays, one called $_Head and one called $_Results.
>
> Let me know what you think about it!

Looks reasonable to me. What about doing anything with $(MODIFIED) and $(SIZE) in $_Results? I know, it's just my bias, but I like seeing those in search results. I also noticed you don't use $(WORD) in $_Head, nor any of the other template variables commonly used for the followup search form (e.g. RESTRICT, EXCLUDE, CONFIG, SELECTED_FORMAT, SELECTED_METHOD and SELECTED_SORT), so I'm assuming that you don't have a followup form. Maybe I'm wrong, though, and you simply propagate the user input to the followup form directly in PHP, without the need for anything from htsearch. Is that right?

--
Gilles R. Detillieux     E-mail: <gr...@sc...>
Spinal Cord Research Centre        WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba   Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)    Fax:    (204)789-3930
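As a sketch of how Gilles' suggestion could be folded into the results template quoted above: $(MODIFIED) and $(SIZE) are the variables he names, the quoting convention simply follows Gabriele's existing lines, and whether SIZE needs quoting is an assumption.

    \$_Results[\$i]['title']    = '$%(TITLE)';
    \$_Results[\$i]['url']      = '$%(URL)';
    \$_Results[\$i]['percent']  = $(PERCENT);
    \$_Results[\$i]['modified'] = '$%(MODIFIED)';
    \$_Results[\$i]['size']     = $(SIZE);
    \$_Results[\$i]['excerpt']  = '$%(EXCERPT)';
    \$i += 1;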
|
From: Geoff H. <ghu...@ws...> - 2002-02-13 16:49:25
|
Jessica,

I'm sorry, but I'm afraid I cannot help you as I simply do not have enough time. Right now, there are only 24 hours in the day, and I think for the last few months I've had about 25 hours spoken for. :-(

I forwarded your previous request to the htdig-dev mailing list (and CC'ed this reply) since there are a number of people who could do this work. It's not all that difficult but does take time to test, especially for memory leaks.

Regards,
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

On Wed, 13 Feb 2002, Jessica Biola wrote:

> Geoff, I was wondering if you would be interested in taking on a project
> to modify htsearch (3.2.0 beta) so that it properly runs in fastcgi? I'd
> be willing to pay a reasonable amount for such development. Please let
> me know if this is something you'd be interested in or that you have the
> time to develop.
>
> Sincerely,
> Jes
|
From: Gabriele B. <g.b...@co...> - 2002-02-13 16:31:06
|
> opened up any security holes right in htsearch. What did you have
> to change directly in htsearch, or did you manage everything by using
> template files to spit out the PHP code? I think the more you do with

Sorry Gilles, I didn't explain it very well!

Yes, everything is made by using template files. No change to the internal code. For instance, this is the header template file content:

  \$_Head['matches'] = $(MATCHES);
  \$_Head['firstdisplayed'] = $(FIRSTDISPLAYED);
  \$_Head['lastdisplayed'] = $(LASTDISPLAYED);
  \$_Head['logical_words'] = '$%(LOGICAL_WORDS)';
  \$i = 0;

whereas here is the code for the results template:

  \$_Results[\$i]['title'] = '$%(TITLE)';
  \$_Results[\$i]['url'] = '$%(URL)';
  \$_Results[\$i]['percent'] = $(PERCENT);
  \$_Results[\$i]['excerpt'] = '$%(EXCERPT)';
  \$i += 1;

As you can see, I am not using all of the variables, just a few. Anyway, I just need to evaluate the code resulting from htsearch and ... that's it. I have 2 associative arrays, one called $_Head and one called $_Results.

Let me know what you think about it!

Ciao and thanks again
-Gabriele

--
Gabriele Bartolini - Computer Programmer
U.O. Rete Civica - Comune di Prato - Prato - Italia - Europa
g.b...@co... | http://www.po-net.prato.it/

The nice thing about Windows is - It does not just crash, it displays a
dialog box and lets you press 'OK' first.
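For readers following the thread, here is a minimal PHP sketch of the wrapper side that these templates imply. The htsearch path, the config file name, and the way the query string is passed on the command line are all assumptions, and, as the thread stresses, a real wrapper would need much stricter checking of anything it eval()s:

    <?php
    // Minimal sketch only: paths and the command-line invocation are assumptions.
    $query = 'words=' . urlencode($_GET['words']);
    $cmd   = '/usr/local/bin/htsearch -c /etc/htdig/php-output.conf '
           . escapeshellarg($query);
    $code  = shell_exec($cmd);     // htsearch emits PHP via the templates above
    $_Head = array();
    $_Results = array();
    eval($code);                   // populates $_Head and $_Results
    echo $_Head['matches'], " matches\n";
    foreach ($_Results as $r) {
        echo $r['title'], ' (', $r['percent'], "%) ", $r['url'], "\n";
    }
    ?>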
|
From: Gilles D. <gr...@sc...> - 2002-02-13 15:41:01
|
According to Gabriele Bartolini:
> Basically this is my idea. I tried it and it works pretty fast and it
> is very flexible. I make htsearch write PHP code itself, by generating code
> to be evaluated inside the wrapper script. It's needless to say that we have
> to be extremely careful about checking the code.
>
> I am testing it. If you are interested I can share it with you guys and
> discuss it. I'm waiting for your opinion, especially as far as the
> security is concerned.

I wouldn't be able to comment on the security of the PHP code itself, but I could certainly look at the htsearch changes to see if you've opened up any security holes right in htsearch. What did you have to change directly in htsearch, or did you manage everything by using template files to spit out the PHP code? I think the more you do with templates, rather than direct code changes, the better. It keeps the htsearch code clean that way, as well as keeping it general, and the template facility is flexible enough that you should be able to do most of what you need as far as custom output that way. If there are some things that you can't do in template files, that you'd need to do, we can address these limitations on a case by case basis.

--
Gilles R. Detillieux     E-mail: <gr...@sc...>
Spinal Cord Research Centre        WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba   Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)    Fax:    (204)789-3930
|
From: Gabriele B. <g.b...@co...> - 2002-02-13 12:54:45
|
>As Neal suggested, check out the xmlsearch code, which is also bundled
>in the contrib directory of 3.1.6. You can avoid any HTML code in
>the excerpts by turning off the add_anchors_to_excerpt attribute, and
>changing start_ellipses, end_ellipses, start_highlight, and end_highlight.
>I suppose these attribute definitions should be added to xml.conf in
>contrib/xmlsearch. I think there will also be a problem with the 3.2
>betas, not just with excerpts but with all $&(var) expansions, in that
>all accented characters are mapped back to ISO-8859-1 character entities,
>which, if I understand correctly, are invalid in XML.
Ciao Gilles and Neal.
Thanks for your postings. I tried the XML output, but I also found
another way of managing the output with PHP. Indeed, creating a wrapper
PHP script which handles the XML output generated by htsearch is kinda slow.
Basically this is my idea. I tried it and it works pretty fast and it
is very flexible. I make htsearch write PHP code itself, by generating code
to be evaluated inside the wrapper script. It's needless to say that we have
to be extremely careful about checking the code.
I am testing it. If you are interested I can share it with you guys and
discuss it. I'm waiting for your opinion, especially as far as the
security is concerned.
Ciao and thanks
-Gabriele
--
Gabriele Bartolini - Computer Programmer
U.O. Rete Civica - Comune di Prato - Prato - Italia - Europa
g.b...@co... | http://www.po-net.prato.it/
The nice thing about Windows is - It does not just crash,
it displays a dialog box and lets you press 'OK' first.
|
|
From: IMRAN S. <imr...@ho...> - 2002-02-13 11:08:50
|
Dear Geoff Hutchison,

I have some queries regarding the use of ht://Dig. I want to implement full-text search in HTML and text-based documents, and I know ht://Dig is capable of doing this; I have thoroughly studied all the material. What I want to know is which operating system is necessary to run HTDIG, or whether it may run on Windows operating systems like Windows 2000 or NT. Please also point me to other helpful material regarding keyword-based search in HTML documents.

With very best regards,
Imran
|
From: Gilles D. <gr...@sc...> - 2002-02-12 20:54:52
|
According to Gabriele Bartolini:
> I am working on a PHP wrapper project for ht://Dig. I read an
> interesting guide in the contributed work, but I think it is kinda old now,
> especially keeping in mind new versions of PHP.
>
> Basically, I would like to create an XML file as output of the htsearch
> program, then use an XML parser from the PHP script. The PHP script opens a
> pipe to the htsearch program and the XML parser reads its pointer.
>
> I've got some problems as far as the excerpt is concerned. I was just
> wondering if somebody of you is interested in it, and of course has some
> ideas and opinions!

As Neal suggested, check out the xmlsearch code, which is also bundled in the contrib directory of 3.1.6. You can avoid any HTML code in the excerpts by turning off the add_anchors_to_excerpt attribute, and changing start_ellipses, end_ellipses, start_highlight, and end_highlight. I suppose these attribute definitions should be added to xml.conf in contrib/xmlsearch. I think there will also be a problem with the 3.2 betas, not just with excerpts but with all $&(var) expansions, in that all accented characters are mapped back to ISO-8859-1 character entities, which, if I understand correctly, are invalid in XML.

--
Gilles R. Detillieux     E-mail: <gr...@sc...>
Spinal Cord Research Centre        WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba   Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)    Fax:    (204)789-3930
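To make Gilles' suggestion concrete, here is a rough sketch of the kind of lines one might add to contrib/xmlsearch/xml.conf; the attribute names are the ones he lists, while the replacement values are only illustrative assumptions:

    # Keep raw HTML out of the excerpts so they stay well-formed inside XML.
    # Replacement values below are assumptions, not tested settings.
    add_anchors_to_excerpt: false
    start_highlight: **
    end_highlight: **
    start_ellipses: ...
    end_ellipses: ...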
|
From: Neal R. <ne...@ri...> - 2002-02-12 19:48:02
|
On Tue, 12 Feb 2002, Gabriele Bartolini wrote:

> Ciao guys,
>
> I am working on a PHP wrapper project for ht://Dig. I read an
> interesting guide in the contributed work, but I think it is kinda old now,
> especially keeping in mind new versions of PHP.

You might look at http://www.htdig.org/files/contrib/wrappers/xmlsearch.tar.gz; they have used the header/footer/wrapper files to put the search results into an XML format. Also, http://www.htdig.org/files/contrib/wrappers/htsearch-php3.0.1.1.tar.gz is a start on a PHP page that calls the CGI. Check out XSLT for use as a nice XML parser in PHP.

> Basically, I would like to create an XML file as output of the htsearch
> program, then use an XML parser from the PHP script. The PHP script opens a
> pipe to the htsearch program and the XML parser reads its pointer.

I'm working on a project to compile htdig with a set of API functions into 'libhtdig'. Part of this will be writing a second, very small library, 'libhtdigphp', that contains PHP wrappers for libhtdig. Ultimately this could be a preferable approach to forking an htsearch process and processing the output. There is a definite performance hit for calling a separate CGI from PHP vs. calling functions within PHP: context switching, memory paging, the overhead of parsing the output, etc.

My ETA on a serviceable first version is the end of the week.

Thanks
--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
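Until libhtdig/libhtdigphp exists, the XML route discussed above can be sketched with PHP's built-in expat functions; the htsearch path, config name, and query handling are assumptions, and the xmlsearch contrib package defines the actual element names:

    <?php
    // Rough sketch of parsing XML-formatted htsearch output in PHP
    // (paths and config name are assumptions).
    $query = 'words=' . urlencode($_GET['words']);
    $xml = shell_exec('/usr/local/bin/htsearch -c /etc/htdig/xml.conf '
                      . escapeshellarg($query));
    $parser = xml_parser_create();
    // Flatten the document into arrays of element values and an index by tag.
    xml_parse_into_struct($parser, $xml, $values, $index);
    xml_parser_free($parser);
    // $values / $index now hold the result elements defined by xmlsearch's
    // templates and can be turned into whatever page markup is needed.
    ?>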