From: Geoff H. <ghu...@ws...> - 2003-04-15 04:06:58

On Saturday, April 12, 2003, at 04:00 AM, J. op den Brouw wrote:
> Shall I make a list of candidate docs for sourceforge?
> Neal takes care of the literature list?

Yes, if you could take a look at the docs and compare them with the SF site, that would be great. As I said, I think I put the /dev/ site into that section, but it's been quite a while.

-Geoff

From: Simon G. <big...@ho...> - 2003-04-14 15:35:45

Hi, I would like to know whether we can do exact phrase searches with htdig, and if so, in which version?

Thanks a lot,
Simon Gauthier
big...@ho...

From: Jim C. <li...@yg...> - 2003-04-14 09:34:52

On Thursday, April 10, 2003, at 08:30 PM, Pritesh Sharma wrote:
> I am interested in knowing whether the htdig search process is
> multithreaded ie. can the server respond to multiple requests
> simultaneously, assuming that the server is running on a
> multiprocessor.

The htsearch executable is not multithreaded. It is a CGI program rather than a server, so the web server starts a new htsearch process for each request. These processes share resources just as any other processes would, so separate searches can run concurrently on different processors.

Jim

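To illustrate Jim's point, here is a minimal sketch of the CGI model in C++. It is an illustration only, not htsearch's code; `build_response` and `handle_one_request` are hypothetical names. The web server starts a fresh process for each request and passes the query in the `QUERY_STRING` environment variable; the program writes a header block and a body to stdout and exits. Concurrent searches are therefore just concurrent processes, which the operating system is free to schedule on different CPUs.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical helper: builds the complete CGI response for one request.
// Header block first (a "Content-type" line plus an empty line), then the body.
// The query itself is not echoed back, to sidestep HTML-escaping in this sketch.
std::string build_response(const std::string& query) {
    std::string body = "<html><body>Received " + std::to_string(query.size()) +
                       " bytes of query string</body></html>\n";
    return "Content-type: text/html\r\n\r\n" + body;
}

// One process per request: no threads and no state shared with other
// requests. The web server sets QUERY_STRING before starting the program.
int handle_one_request() {
    const char* qs = std::getenv("QUERY_STRING");
    std::cout << build_response(qs ? qs : "");
    std::cout.flush();  // hand everything to the web server before exiting
    return 0;
}
```

A real `main()` would simply `return handle_one_request();`; the web server handles parallelism by running several such processes at once.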
From: Geoff H. <ghu...@us...> - 2003-04-13 07:17:47
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b5: Next release, First quarter 2003???
3.2.0b4: "In progress" -- snapshots called "3.2.0b4" until prerelease.
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
(Please note that everything added here should have a tracker PR# so
we can be sure they're fixed. Geoff is currently trying to add PR#s for
what's currently here.)
SHOWSTOPPERS:
* Mifluz database errors are a severe problem (PR#428295)
-- Does Neal's new zlib patch solve this for now?
KNOWN BUGS:
* Odd behavior: $(MODIFIED) and scores do not work with
wordlist_compress set, but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug) PR#618737.
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#618738)
Can anyone reproduce this? I can't! -- Lachlan
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* Mifluz merge.
NEEDED FEATURES:
* Quim's new htsearch/qtest query parser framework.
* File/Database locking. PR#405764.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient. (PR#405279)
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#405278.)
Should we make sure these config attributes are all documented in
defaults.cc, even if they're only set by input parameters and never
in the config file?
* Split attrs.html into categories for faster loading.
* Turn defaults.cc into an XML file for generating documentation and
defaults.cc.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. regex matching, database compression.)
PR#405280, PR#405281.
* TODO.html has not been updated for current TODO list and
completions.
I've tried. Someone "official" please check and remove this -- Lachlan
* Htfuzzy could use more documentation on what each fuzzy algorithm
does. PR#405714.
* Document the list of all installed files and default
locations. PR#405715.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
* The code needs a security audit, esp. htsearch. PR#405765.
From: J. op d. B. <ht...@op...> - 2003-04-12 08:59:24

Geoff Hutchison wrote:
>On Fri, 11 Apr 2003, Neal Richter wrote:
>
>> I'd be happy to take responsibility for keeping up the 'literature'
>> page!
>
>Oof, it's probably horribly out of date, so this would be great.
>
>> Is there a place to link to misc pages within the SourceForge project
>> pages?
>
>Yes. There's a "project documentation section" that should have most, if
>not all, stuff from the /dev/ section of the website:
>
>http://sourceforge.net/docman/?group_id=4593

Okay, so many internal and/or development documents can be put on the SourceForge site. This means we can tidy up the /dev site, put the relevant docs on SourceForge, and eventually remove it. Note that it's not a big deal for mirrors to update their copies of /dev, as it is included in the CVS tree. Only the link table on the web site will need to be updated...

Shall I make a list of candidate docs for SourceForge? Neal takes care of the literature list?

Greetz,
--Jesse

From: Geoff H. <ghu...@ws...> - 2003-04-11 23:55:33

On Fri, 11 Apr 2003, Neal Richter wrote:
> I'd be happy to take responsibility for keeping up the 'literature'
> page!

Oof, it's probably horribly out of date, so this would be great.

> Is there a place to link to misc pages within the SourceForge project
> pages?

Yes. There's a "project documentation section" that should have most, if not all, stuff from the /dev/ section of the website:

http://sourceforge.net/docman/?group_id=4593

Cheers,
-Geoff

From: Neal R. <ne...@ri...> - 2003-04-11 23:14:40

On Fri, 11 Apr 2003, Gilles Detillieux wrote:
> My only concern at the time was that there seemed to be a lot of useful
> information under /dev that I never saw on the SourceForge site. If this
> info is no longer useful (which I doubt), or if this has all now been put
> on the SourceForge site, then by all means feel free to get rid of /dev.

The stuff in the 'other' section is nice.

I'd be happy to take responsibility for keeping up the 'literature' page!

Is there a place to link to misc pages within the SourceForge project pages?

Thanks

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485

From: Gilles D. <gr...@sc...> - 2003-04-11 20:16:19

According to Geoff Hutchison:
> > However, in the mirror list, there are entries to all
> > the /dev sites. Maybe this is confusing. So:
> >
> > If /dev isn't used anymore, because the SourceForge
> > site is used now, why not delete them and remove
> > the entries from the mirror list.
>
> This has been my feeling too, but IIRC Gilles thought it would be better
> to keep the /dev/ section. Honestly, I don't have much opinion one way or
> the other on this, but it could cut down on the requirements for mirror
> sites.

My only concern at the time was that there seemed to be a lot of useful information under /dev that I never saw on the SourceForge site. If this info is no longer useful (which I doubt), or if this has all now been put on the SourceForge site, then by all means feel free to get rid of /dev. The problem right now is that a lot of the stuff under /dev isn't all that accessible, because there's no longer a link to it from the main site. My recommendation would be to put the /dev info elsewhere on the main site, or else on SourceForge, and then do away with /dev. I won't be the one to do it, though.

As far as the testing of 3.2.0b4/b5 is concerned, I regrettably have to let you know that my contribution to the effort will be minimal. Things have been too busy for me at work since I got back, and I don't see things letting up before the summer. Then it'll be vacation time and the cycle will start again. I don't even have time now to follow the mailing list traffic, other than peeking at the occasional message that seems important.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

From: Geoff H. <ghu...@ws...> - 2003-04-11 18:52:13

> However, in the mirror list, there are entries to all
> the /dev sites. Maybe this is confusing. So:
>
> If /dev isn't used anymore, because the SourceForge
> site is used now, why not delete them and remove
> the entries from the mirror list.

This has been my feeling too, but IIRC Gilles thought it would be better to keep the /dev/ section. Honestly, I don't have much opinion one way or the other on this, but it could cut down on the requirements for mirror sites.

-Geoff

From: Geoff H. <ghu...@ws...> - 2003-04-11 16:42:26

> 2. *Everyone* puts time into trying to locate/fix the compression bug

I've certainly been putting in what time I have, but as I'm not having much luck reliably reproducing it, it's a bit difficult for me. Further pointers from you and/or Neal would really help in my search.

-Geoff

From: Pritesh S. <prs...@cs...> - 2003-04-11 02:30:42

Hi,

I am interested in knowing whether the htdig search process is multithreaded, i.e., whether the server can respond to multiple requests simultaneously, assuming that the server is running on a multiprocessor.

Thanks,
Pritesh

From: Neal R. <ne...@ri...> - 2003-04-10 18:25:57

I certainly don't want you to quit, Lachlan! You've been a help finding compression bugs, even if I can't duplicate them yet ;-).

I think a bit more time could be spent on the compression bug, and then we can make the call for a new beta release.

I committed the 'libhtdig' directory to CVS and will commit libhtdigphp as soon as I make the build process more generic.

It would also be nice to do a kind of code audit soon, where we look at the overall design of the various classes and evaluate them for correctness and efficiency.

I'd also like to see the new Search API soon. Any word on that?

Thanks.

On Sun, 6 Apr 2003, Lachlan Andrew wrote:
> Greetings all,
>
> Can I suggest that releasing 3.2.0b5 is becoming rather urgent? It
> was about five months ago that there was a decision that something
> had to be released in about a month...
>
> I see three options. If anyone sees another one, please suggest it!
>
> 1. Disable database compression by default, and release in two weeks
> 2. *Everyone* puts time into trying to locate/fix the compression bug
> 3. We conclude no new beta will ever be released, and those of us
>    interested in releasing software find other projects to work on...
>
> Yours in hope,
> Lachlan

Neal Richter

From: J. op d. B. <ht...@op...> - 2003-04-09 07:04:31

Hi all,

I was wondering whether the htdig group does anything with the developer site (/dev on the main htdig site). My best guess is that developers go to the SourceForge site and that /dev is never updated or consulted by anyone.

However, the mirror list has entries for all the /dev sites. Maybe this is confusing. So: if /dev isn't used anymore, because the SourceForge site is used now, why not delete those sites and remove the entries from the mirror list?

Always trying to keep the discussion going....

--Jesse op den Brouw.

From: Aaron B. <ab...@mi...> - 2003-04-07 13:59:53

Lachlan Andrew wrote:
> There *shouldn't* be a need to flush stdout unless the program exits
> abnormally (e.g. segfaults). Is the resulting file a multiple of
> 4096 bytes long? What happens if you simply pipe the output through
> more/less? Is it still truncated?

Here are two more straces, set to watch read()s on FD 5 and
write()s on FD 1. I have found that when the output is sent to the
screen (line buffered), the final write() before the footer is opened
is different from the final write() when the output goes to a file or program.
output to screen (line buffered) strace.screen line # 1114:
----------------------------------
write(1, "</dd></dl>\n", 11) = 11
| 00000 3c 2f 64 64 3e 3c 2f 64 6c 3e 0a </dd></d l>.|
open("/web/pspec/htdocs/Templates/footer.html", O_RDONLY) = 5
output to file/program (4K block buffered) strace.file line # 865:
------------------------------------------
| 00fe0 20 32 36 20 4e 65 77 65 73 74 3a 20 30 34 2d 32  26 Newe st: 04-2 |
| 00ff0 31 2d 39 38 20 4f 6c 64 65 73 74 3a 20 30 35 2d 1-98 Old est: 05- |
open("/web/pspec/htdocs/Templates/footer.html", O_RDONLY) = 5
----
The line buffered write() data is the final result of the search (#9 of
9). This is correct since the output of results is complete and the
next item to read/write is the footer.
The block buffered write() data is not correct since the output of
results is not complete (this output is actually part of result # 7 of
9). Here the program does not complete the output of results before it
goes on to read() in the footer template. However, the write() after the
read of the footer template picks up where it left off in result # 7 of
9. Is this correct behavior?
Does this help pinpoint where the bug may be? I'm not the best when it
comes to C++ so any help tracking this down would be appreciated.
Thanks,
-ab
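Aaron's question ("Is this correct behavior?") can be explored with a toy model. The C++ sketch below is a hypothetical simulation (the class name `ToyStdout` and the record sizes are made up, and it is not how the C library is actually implemented) of the two modes Lachlan describes: line-buffered output issues a write() at every '\n', block-buffered output only once 4096 bytes have accumulated. Under these assumptions a block boundary can land in the middle of a result, so a write() that ends partway through result 7 is not by itself a bug. The model also shows that whatever is still sitting in the buffer never reaches the descriptor unless it is flushed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a stdio-style output buffer (an assumption-laden simulation,
// not libc's code). Each entry in writes_ stands for one write() system call.
class ToyStdout {
public:
    explicit ToyStdout(bool line_buffered, std::size_t capacity = 4096)
        : line_buffered_(line_buffered), capacity_(capacity) {}

    void put(const std::string& s) {
        for (char c : s) {
            buf_ += c;
            // Full buffer: flush in either mode. Newline: flush if line-buffered.
            if (buf_.size() == capacity_ || (line_buffered_ && c == '\n'))
                flush();
        }
    }

    // Emits the pending bytes as one simulated write() call.
    void flush() {
        if (!buf_.empty()) {
            writes_.push_back(buf_);
            buf_.clear();
        }
    }

    // Everything that has actually reached the file descriptor so far.
    std::string written() const {
        std::string all;
        for (const auto& w : writes_) all += w;
        return all;
    }

    std::size_t write_count() const { return writes_.size(); }

private:
    bool line_buffered_;
    std::size_t capacity_;
    std::string buf_;                  // bytes not yet written
    std::vector<std::string> writes_;  // simulated write() calls
};
```

With nine 1518-byte result records, the terminal case produces nine write()s, while the file case produces three 4096-byte write()s (12288 bytes) that end in the middle of the ninth record; the remaining 1374 bytes only appear after `flush()`.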
From: Jim C. <li...@yg...> - 2003-04-06 19:21:34

Hi - I am willing to contribute some time to tracking down problems and general testing, but it will be at least another week and a half before I get back to the point of having any free time.

Jim

On Saturday, April 5, 2003, at 11:05 PM, Lachlan Andrew wrote:
> Greetings all,
>
> Can I suggest that releasing 3.2.0b5 is becoming rather urgent? It
> was about five months ago that there was a decision that something
> had to be released in about a month...
>
> I see three options. If anyone sees another one, please suggest it!
>
> 1. Disable database compression by default, and release in two weeks
> 2. *Everyone* puts time into trying to locate/fix the compression bug
> 3. We conclude no new beta will ever be released, and those of us
>    interested in releasing software find other projects to work on...
>
> Yours in hope,
> Lachlan

From: Geoff H. <ghu...@us...> - 2003-04-06 08:27:41
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b5: Next release, First quarter 2003???
3.2.0b4: "In progress" -- snapshots called "3.2.0b4" until prerelease.
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
(Please note that everything added here should have a tracker PR# so
we can be sure they're fixed. Geoff is currently trying to add PR#s for
what's currently here.)
SHOWSTOPPERS:
* Mifluz database errors are a severe problem (PR#428295)
-- Does Neal's new zlib patch solve this for now?
KNOWN BUGS:
* Odd behavior: $(MODIFIED) and scores do not work with
wordlist_compress set, but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug) PR#618737.
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#618738)
Can anyone reproduce this? I can't! -- Lachlan
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* Mifluz merge.
NEEDED FEATURES:
* Quim's new htsearch/qtest query parser framework.
* File/Database locking. PR#405764.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient. (PR#405279)
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#405278.)
Should we make sure these config attributes are all documented in
defaults.cc, even if they're only set by input parameters and never
in the config file?
* Split attrs.html into categories for faster loading.
* Turn defaults.cc into an XML file for generating documentation and
defaults.cc.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. regex matching, database compression.)
PR#405280, PR#405281.
* TODO.html has not been updated for current TODO list and
completions.
I've tried. Someone "official" please check and remove this -- Lachlan
* Htfuzzy could use more documentation on what each fuzzy algorithm
does. PR#405714.
* Document the list of all installed files and default
locations. PR#405715.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
* The code needs a security audit, esp. htsearch. PR#405765.
From: Lachlan A. <lh...@us...> - 2003-04-06 05:05:43

Greetings all,

Can I suggest that releasing 3.2.0b5 is becoming rather urgent? It was about five months ago that there was a decision that something had to be released in about a month...

I see three options. If anyone sees another one, please suggest it!

1. Disable database compression by default, and release in two weeks
2. *Everyone* puts time into trying to locate/fix the compression bug
3. We conclude no new beta will ever be released, and those of us interested in releasing software find other projects to work on...

Yours in hope,
Lachlan

From: Aaron B. <ab...@mi...> - 2003-04-05 00:56:05

> Sorry Aaron, I don't know what could be causing the problem. However,
> the fact that there are fewer write()s when the output is a file is
> to be expected. Output is "block-buffered" if the output is not a
> terminal, which means it saves up 4096 bytes before writing. If the
> output is a tty, the data is "line-buffered", which means it is
> written after every '\n'.

This explains all the small writes to the terminal and the 4K writes to the file.

> There *shouldn't* be a need to flush stdout unless the program exits
> abnormally (e.g. segfaults). Is the resulting file a multiple of
> 4096 bytes long? What happens if you simply pipe the output through
> more/less? Is it still truncated?

The entire stdout written to a file is 12315 bytes. If you remove the header:

Content-type: text/html^M
^M

the file becomes 12288 bytes (4096 * 3). The same problem is seen when I pipe the output to less.

Thanks,
-ab

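Aaron's byte counts can be checked by hand, taking each ^M as a carriage return so that the header is the 23-character line `Content-type: text/html` followed by two CR LF pairs:

```latex
\underbrace{12315}_{\text{total}} - \underbrace{(23 + 2 + 2)}_{\text{header}} = 12288 = 3 \times 4096
```

So the body that reached the file is exactly three full 4096-byte blocks, which is consistent with the last, partially filled buffer never being flushed.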
From: Lachlan A. <lh...@us...> - 2003-04-04 23:41:22

Sorry Aaron, I don't know what could be causing the problem. However, the fact that there are fewer write()s when the output is a file is to be expected. Output is "block-buffered" if the output is not a terminal, which means it saves up 4096 bytes before writing. If the output is a tty, the data is "line-buffered", which means it is written after every '\n'.

There *shouldn't* be a need to flush stdout unless the program exits abnormally (e.g. segfaults). Is the resulting file a multiple of 4096 bytes long? What happens if you simply pipe the output through more/less? Is it still truncated?

Good luck!
Lachlan

On Saturday 05 April 2003 07:21, Aaron Bush wrote:
> If i run htsearch from the command line and allow the output to go
> to the screen the results appear complete and proper. If i run
> htsearch from the command line and redirect the output to a file
> via ">/tmp/pg" or from a browser via apache then the results will
> be truncated. I have ran strace against htsearch with output to
> the screen and to a file and there are clearly some missing
> write()'s.

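Lachlan's explanation can be checked against the real C stdio library. The following C++ sketch assumes a POSIX system (for `fstat` and `fileno`); `bytes_on_disk`, `FlushDemo` and `run_flush_demo` are hypothetical names. It opens a temporary file, makes it fully buffered with a 4096-byte buffer via `setvbuf`, writes 5000 bytes one character at a time, and compares how much the file holds before and after `fflush`.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/stat.h>  // POSIX: fstat, struct stat

// Bytes that have actually reached the underlying file (i.e. been write()n).
long bytes_on_disk(std::FILE* f) {
    struct stat st;
    if (fstat(fileno(f), &st) != 0) return -1;
    return static_cast<long>(st.st_size);
}

struct FlushDemo {
    long before_flush;
    long after_flush;
};

// Writes n bytes to a fully buffered stream with a 4096-byte buffer and
// reports what is visible in the file before and after an explicit fflush.
FlushDemo run_flush_demo(int n) {
    FlushDemo d{-1, -1};
    std::FILE* f = std::tmpfile();
    if (!f) return d;
    static char buf[4096];                     // must outlive the stream
    std::setvbuf(f, buf, _IOFBF, sizeof buf);  // must precede any other I/O
    for (int i = 0; i < n; ++i) std::fputc('x', f);
    d.before_flush = bytes_on_disk(f);  // only whole buffers written so far
    std::fflush(f);
    d.after_flush = bytes_on_disk(f);   // the buffered tail is now written
    std::fclose(f);
    return d;
}
```

On a typical implementation the first figure is one buffer's worth (about 4096 bytes) and the second is 5000. Returning from `main()` or calling `exit()` flushes stdio streams, whereas a crash or `_exit()` does not, which is why Lachlan's question about abnormal exits is the natural next check.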
From: Aaron B. <ab...@mi...> - 2003-04-04 21:58:00

The previous post had the same file attached twice. Here is the correct strace.file.

Maybe a problem with not flushing stdout? Is this necessary?

Thanks,
-ab

From: Aaron B. <ab...@mi...> - 2003-04-04 21:22:14

I posted this as a question to the users list, but I have more detailed info that suggests this may be a bug with the system libraries... These tests were run using v3.1.6.

If I run htsearch from the command line and allow the output to go to the screen, the results appear complete and proper. If I run htsearch from the command line and redirect the output to a file via ">/tmp/pg", or from a browser via Apache, then the results are truncated.

I have run strace against htsearch with output to the screen and to a file, and there are clearly some missing write()s. In the strace with output redirected to a file, there are far fewer write()s, even though the read()s from the templates are the same.

Can someone with more intimate knowledge of htsearch help out and see what the problem might be?

Thanks,
-ab

From: Will B. <wi...@ch...> - 2003-04-03 17:08:23

I've noticed a periodic pause when retrieving content from an SSL site. The exact same site with the same configuration under 3.1.x does not have the problem, nor does the content show the problem if SSL is not used. The information is still retrieved correctly, but the pause of 10 or so seconds every few pages makes it impossible to index the entire site. Again, the exact same configuration against a non-SSL version with the same content does not exhibit the pauses.

There are no delays configured in either htdig or the web site. The web site is local (same machine), so there are no network delays, and there is no activity during the pause (the CPU is idle).

I added some debug statements, and the pause occurs within the Connection::Read_Partial call in Connection.cc. It seems to occur when the read call is made at line 663 (using the snapshot from 20030330).

Anyone else run into this problem?

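One way to narrow down a pause like this is to check whether the socket really has nothing to read while the process sits idle. The C++ sketch below is a hypothetical diagnostic helper (`wait_readable` is not part of htdig) built on POSIX `poll()`; logging timestamps around such a call would show whether the 10-second stall is spent waiting for data. With SSL there is one extra case worth ruling out, offered as an assumption rather than a diagnosis: OpenSSL can already hold decrypted bytes internally (see `SSL_pending()`), so a wait on the raw socket may block even though data is available at the SSL layer.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Hypothetical diagnostic helper: waits up to timeout_ms for fd to become
// readable. Returns 1 if data (or EOF) is ready, 0 on timeout, -1 on error.
// Note: for an SSL connection this only sees the raw socket, not any bytes
// already buffered inside the SSL library.
int wait_readable(int fd, int timeout_ms) {
    struct pollfd p;
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    int r = poll(&p, 1, timeout_ms);
    if (r < 0) return -1;
    return r == 0 ? 0 : 1;
}
```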
From: Geoff H. <ghu...@us...> - 2003-03-30 08:15:20
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b5: Next release, First quarter 2003???
3.2.0b4: "In progress" -- snapshots called "3.2.0b4" until prerelease.
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
(Please note that everything added here should have a tracker PR# so
we can be sure they're fixed. Geoff is currently trying to add PR#s for
what's currently here.)
SHOWSTOPPERS:
* Mifluz database errors are a severe problem (PR#428295)
-- Does Neal's new zlib patch solve this for now?
KNOWN BUGS:
* Odd behavior: $(MODIFIED) and scores do not work with
wordlist_compress set, but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug) PR#618737.
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#618738)
Can anyone reproduce this? I can't! -- Lachlan
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* Mifluz merge.
NEEDED FEATURES:
* Quim's new htsearch/qtest query parser framework.
* File/Database locking. PR#405764.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient. (PR#405279)
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#405278.)
Should we make sure these config attributes are all documented in
defaults.cc, even if they're only set by input parameters and never
in the config file?
* Split attrs.html into categories for faster loading.
* Turn defaults.cc into an XML file for generating documentation and
defaults.cc.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. regex matching, database compression.)
PR#405280, PR#405281.
* TODO.html has not been updated for current TODO list and
completions.
I've tried. Someone "official" please check and remove this -- Lachlan
* Htfuzzy could use more documentation on what each fuzzy algorithm
does. PR#405714.
* Document the list of all installed files and default
locations. PR#405715.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
* The code needs a security audit, esp. htsearch. PR#405765.
From: Neal R. <ne...@ri...> - 2003-03-27 23:21:49

I was trying to do a better job of supplying a 'document icon' (other than guessing based on the file extension) in a PHP search interface (which uses libhtdig). Since we have the information at the time we call the external parsers, it would seem the only barrier is a place to put it in the documentDB...

Ideas on implementations?

Thanks.

On Wed, 26 Mar 2003, Geoff Hutchison wrote:
> > Has there been any thought to storing the MIME-type of a document
> > in the index? An integer would do just fine for this.. I can't seem
> > to find anything in classes like DocumentRef to indicate that anyone
> > has done this before.
>
> No, it hasn't been done yet. Probably not a bad idea as it could allow
> per-search restriction of MIME types (e.g. image-only searching or PDFs
> or ... well anything).
>
> I can think of a variety of good uses for such information (MIME and/or
> DTD info).
>
> -Geoff

Neal Richter

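As a sketch of what Neal and Geoff describe, assume a hypothetical one-byte field per document (per Geoff, DocumentRef has no such field today). The MIME type could be mapped to a small integer when the document is parsed, and then used both to pick an icon and to restrict a search. All names, codes, and icon file names below are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical encoding, not an existing htdig field.
enum class MimeCode : std::uint8_t {
    Unknown = 0, Html = 1, PlainText = 2, Pdf = 3, PostScript = 4, Image = 5
};

// Maps a Content-Type value to a code. Simplified: drops parameters such as
// "; charset=...", then does an exact lowercase match.
MimeCode mime_code(const std::string& mime) {
    std::string base = mime.substr(0, mime.find(';'));
    if (base == "text/html") return MimeCode::Html;
    if (base == "text/plain") return MimeCode::PlainText;
    if (base == "application/pdf") return MimeCode::Pdf;
    if (base == "application/postscript") return MimeCode::PostScript;
    if (base.rfind("image/", 0) == 0) return MimeCode::Image;
    return MimeCode::Unknown;
}

// Icon choice for a search front end, driven by the stored code
// instead of the file extension.
const char* icon_for(MimeCode c) {
    switch (c) {
        case MimeCode::Html: return "html.gif";
        case MimeCode::PlainText: return "text.gif";
        case MimeCode::Pdf: return "pdf.gif";
        case MimeCode::PostScript: return "ps.gif";
        case MimeCode::Image: return "image.gif";
        default: return "unknown.gif";
    }
}

// Per-search restriction: keep a document only if its code's bit is set
// in the caller-supplied mask.
bool allowed(MimeCode c, std::uint32_t mask) {
    return (mask >> static_cast<unsigned>(c)) & 1u;
}
```

A PDF-only search would then pass a mask of `1u << static_cast<unsigned>(MimeCode::Pdf)`.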
From: Geoff H. <ghu...@ws...> - 2003-03-27 05:16:19

> Has there been any thought to storing the MIME-type of a document
> in the index? An integer would do just fine for this.. I can't seem
> to find anything in classes like DocumentRef to indicate that anyone
> has done this before.

No, it hasn't been done yet. Probably not a bad idea as it could allow per-search restriction of MIME types (e.g. image-only searching or PDFs or ... well anything).

I can think of a variety of good uses for such information (MIME and/or DTD info).

-Geoff