From: Gilles D. <gr...@sc...> - 2004-06-09 20:10:42
|
According to Gabriele Bartolini:
> > Yes, probably another conclusion I jumped to, based on the recollection
> > that Neal had done the Win32 port. After reviewing cvs diffs yesterday,
> > I realized that Gabriele had committed the conf_lexer.cxx that had those
> > ifdefs.
>
> I am sorry, but I did not think I had done it. I had a better look and I
> just committed the changes made by Marco Nenciarini. At that time, I
> tested everything, it worked and I applied the changes.
>
> However, I just had a look at your patch, Gilles, and manually applied
> it on the current CVS code; I then ran:

Just curious, but why did you need to manually apply my patch? Did it not
work with the patch command? The manual changes to conf_lexer.lxx were OK,
despite lots of changes to the indentation which now make it harder to read
(remember standard tab stops are 8 spaces), but you missed one of my changes
to conf_parser.yxx, which is likely to break things.

So, I've rebuilt a patch, attached, that includes my changes to both of
these files, with a few cosmetic changes, plus the changes to
conf_parser.cxx (which I rebuilt on a system with bison 1.875c). The only
missing piece is conf_lexer.cxx, which I'm reluctant to rebuild because the
most recent flex program I have installed on any of my systems is
flex-2.5.4a-31.1 (on Red Hat FC2). I've gzipped the patch so it won't get
messed up by any e-mail program. Could you try this patch, then rebuild
conf_lexer.cxx with your flex program?

> flex -oconf_lexer.cxx conf_lexer.lxx
>
> and
>
> bison -o conf_parser.cxx conf_parser.yxx
>
> Here are my flex and bison versions:
> - flex 2.5.31
> - bison (GNU Bison) 1.875a
>
> In attach, you find the patch I just made. I'd suggest to wait after
> 3.2.0b6 is out to commit it into the CVS, unless you guys tell me to do
> differently. My lack of faith comes from my ignorance with lex/yacc,
> even though I built the package and I got no errors.
>
> I wait your orders guys for the release. Please vote:
>
> 1) apply this patch on 3.2.0b6 and test it
> 2) release 3.2.0b6 and apply it straightly after
>
> Please tell me what you think and eventually have a try.

Well, remember what this patch is supposed to fix:

 - improved error handling, gives file name and correct line number,
   even if using include files
 - allows space before comment, because otherwise it would just complain
   about the "#" character and go on to parse the text after it as
   a definition
 - allows config file with an unterminated line at end of file, by
   pushing an extra newline token to the parser at EOF
 - parser correctly handles extra newline tokens, by moving this
   handling out of simple_expression, and into simple_expression_list
   and block, as simple_expression must return a new ConfigDefaults
   object and a newline token doesn't cut it (caused segfaults when
   dealing with fix above)

It would be a shame to release yet another beta release and still not have
these problems fixed! Proper error handling has been lacking for a long
time, and many users have been confused by the config parser's cryptic
error messages. This will at least tell them what file to look at, and
the proper line number when using includes. The problem with files that
end without a newline has caught many people too.

3.2.0b6 is likely to be the last beta before 3.2.0 final, so if you're at
all worried about the impact of this change, isn't it better to put it in
the hands of beta testers before final release? I'd really like this patch
to go into b6, but it would be a good idea if a few other developers tested
it first, as I asked back in April. All I can say is it works for me (but
I tested it using flex 2.5.4a, and only on RH 9).

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
From: Joe R. J. <jj...@cl...> - 2004-06-09 19:28:40
|
On Wed, 9 Jun 2004, Christopher Murtagh wrote: > Date: Wed, 09 Jun 2004 14:37:33 -0400 > From: Christopher Murtagh <chr...@mc...> > To: htd...@li... > Subject: Re: [htdig-dev] boolean terms in phrase - bug? > > Has no one else been able to reproduce this? Should I just try a later > snapshot? Sorry to be a pest about this, I've been working on a major > search tool that I want to go live with either today or tomorrow and > would like to know if I need to solve this in the wrapper or not. > > Cheers, > > Chris It's a serious BUG;( Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... > On Tue, 2004-06-08 at 10:39, Christopher Murtagh wrote: > > When I try to perform a boolean search with a phrase that contains a > > boolean keyword, htsearch seems to spin indefinitely. For example, a > > boolean search for: > > > > '"black and white" and color not grey' > > > > will cause htsearch to consume 100% CPU resources and not seem to end > > until it is sent a sighup. > > > > However, if I remove the 'and' from the phrase: > > > > '"black white" and color not grey' > > > > this works without any problems. > > > > This problem happens with any boolean term (and, not & or) and will > > cause a problem even if the boolean logic is broken within the phrase: > > > > '"black white and" and color not grey' > > > > also spins indefinitely (without a boolean error message). > > > > Is this a known problem? I'm currently running a 3.2.0b5 snapshot, has > > it been fixed in 3.2.0b6? > > > > > > Cheers, > > > > Chris > -- > Christopher Murtagh > Enterprise Systems Administrator > ISR / Web Communications Group > McGill University > Montreal, Quebec > Canada > > Tel.: (514) 398-3122 > Fax: (514) 398-2017 |
From: Christopher M. <chr...@mc...> - 2004-06-09 18:39:52
|
Has no one else been able to reproduce this? Should I just try a later snapshot? Sorry to be a pest about this, I've been working on a major search tool that I want to go live with either today or tomorrow and would like to know if I need to solve this in the wrapper or not. Cheers, Chris On Tue, 2004-06-08 at 10:39, Christopher Murtagh wrote: > When I try to perform a boolean search with a phrase that contains a > boolean keyword, htsearch seems to spin indefinitely. For example, a > boolean search for: > > '"black and white" and color not grey' > > will cause htsearch to consume 100% CPU resources and not seem to end > until it is sent a sighup. > > However, if I remove the 'and' from the phrase: > > '"black white" and color not grey' > > this works without any problems. > > This problem happens with any boolean term (and, not & or) and will > cause a problem even if the boolean logic is broken within the phrase: > > '"black white and" and color not grey' > > also spins indefinitely (without a boolean error message). > > Is this a known problem? I'm currently running a 3.2.0b5 snapshot, has > it been fixed in 3.2.0b6? > > > Cheers, > > Chris -- Christopher Murtagh Enterprise Systems Administrator ISR / Web Communications Group McGill University Montreal, Quebec Canada Tel.: (514) 398-3122 Fax: (514) 398-2017 |
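The thread does not say what causes the loop inside htsearch, so the following is only a sketch of the distinction the report relies on: a boolean keyword inside a quoted phrase is part of the phrase, not an operator. It is written in Python rather than taken from ht://Dig, and `tokenize_query`, `phrase_has_keyword` and the token kinds are hypothetical names. A wrapper script of the kind Christopher mentions could use something like this to detect the problematic queries before passing them on.

```python
import re
from typing import List, Tuple

BOOLEAN_KEYWORDS = {"and", "or", "not"}

# Either a double-quoted phrase or a run of non-space characters.
_TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')


def tokenize_query(query: str) -> List[Tuple[str, str]]:
    """Split a boolean query into PHRASE, OP and WORD tokens."""
    tokens = []
    for match in _TOKEN_RE.finditer(query):
        phrase, word = match.group(1), match.group(2)
        if phrase is not None:
            # Keywords inside the quotes stay inside the phrase.
            tokens.append(("PHRASE", phrase))
        elif word.lower() in BOOLEAN_KEYWORDS:
            tokens.append(("OP", word.lower()))
        else:
            tokens.append(("WORD", word))
    return tokens


def phrase_has_keyword(phrase: str) -> bool:
    """Return True if a phrase contains a boolean keyword as a word."""
    return any(word.lower() in BOOLEAN_KEYWORDS for word in phrase.split())
```

For the query from the report, `tokenize_query('"black and white" and color not grey')` yields one PHRASE token followed by OP, WORD, OP, WORD, and `phrase_has_keyword("black and white")` flags the phrase that triggers the problem.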
From: Gilles D. <gr...@sc...> - 2004-06-09 17:02:02
|
According to Lachlan Andrew: > Even though Gilles has solved Dominique's problem, the issue of > chaining fuzzy rules remains. One possible solution would be to make > the user (administrator) specify explicit chainings. For example > > match_method: accents:0.9 endings:0.9 accents->endings:0.5 > > That overcomes the problem of possible looping, and undesirable > orderings of rules creating non-words, and also gives more control > over the weights. Yes, this was the approach Geoff suggested back in 2001, and I think that's the way it should be done, rather than automatically chaining them. See http://www.mail-archive.com/htd...@li.../msg02598.html BTW, that should be search_algorithm, not match_method. I suppose we might even want to allow even more complex chaining, e.g.: synonyms->endings->accents:0.4, or even double-dipping to further expand things in case things were missed the first time around, e.g.: synonyms->endings->accents->synonyms->endings:0.4 Or maybe we could add an iteration count to the syntax, to get it to repeat the whole chain. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
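The explicit chaining syntax discussed above is a proposal, not something ht://Dig accepts today, so the following Python sketch only shows how such a value might be parsed. The function name `parse_search_algorithm` and the returned structure are assumptions; the example string is the one from Lachlan's message.

```python
from typing import List, Tuple


def parse_search_algorithm(spec: str) -> List[Tuple[List[str], float]]:
    """Parse entries such as 'accents->endings:0.5' into (chain, weight)."""
    rules = []
    for entry in spec.split():
        # The weight follows the last colon; the chain precedes it.
        chain_text, sep, weight_text = entry.rpartition(":")
        if not sep or not chain_text:
            raise ValueError(f"missing weight in {entry!r}")
        chain = chain_text.split("->")
        if any(not step for step in chain):
            raise ValueError(f"empty step in {entry!r}")
        rules.append((chain, float(weight_text)))
    return rules
```

Since every chain is spelled out by the administrator, the parser never has to decide an ordering itself, which is how the proposal avoids looping; a chain that repeats a method, as in Gilles's "double-dipping" example, is simply a longer list.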
From: Dominique A. <dom...@es...> - 2004-06-09 13:57:39
|
Hi,

Thanks for the help, it works perfectly. I just need to complete the
french.0 file with the new words.

> According to Dominique Arpin:
>> I will install htdig 3.2 beta6 and I will try this patch.
>
> Your french.aff file only defines altstringchar entries for tex, not
> for latin1, so you shouldn't need the patch I mentioned. As far as
> I can tell, the patch won't make any difference for your affix file.
>
>> here my config:
>>
>> endings_affix_file: ${lang_dir}/french.aff
>> endings_dictionary: ${lang_dir}/french.0
>> endings_root2word_db: ${common_dir}/root2wordfr.db
>> endings_word2root_db: ${common_dir}/word2rootfr.db
>>
>> You can see a copy of the files on:
>>
>> http://darwin.espacecourbe.com/~dominique/
>
> The problem is in the french.0 file. You'll need to add an entry:
>
> acouphène/X
>
> in the appropriate spot, and then run "htfuzzy endings". Do the same
> for any other word that doesn't pluralize properly, if the word doesn't
> appear in french.0.
>
> --
> Gilles R. Detillieux E-mail: <gr...@sc...>
> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
> Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
>

--
Dominique Arpin_______________________[ Espace
administrateur réseau Courbe ]

http://www.espacecourbe.com/
téléphone 514.933.9861
télécopieur 514.933.9546
From: Lachlan A. <lh...@us...> - 2004-06-09 11:43:44
|
Greetings, Sorry, I didn't get to read my email yesterday. I've just applied a patch to support OpenOffice.org documents. I don't mind if it misses out on 3.2.0b6, but I hope it doesn't cause you any problems. Thanks for doing the release!! Lachlan On Tue, 8 Jun 2004 07:14 am, Gabriele Bartolini wrote: > Hi guys, > > I invite you to point the last changes you would like to be > applied before 3.2.0b6 is out. I intend to release it tomorrow > indeed. Sounds good? > > Ciao, > -Gabriele -- lh...@us... ht://Dig developer DownUnder (http://www.htdig.org) |
From: Lachlan A. <lh...@us...> - 2004-06-09 10:57:45
|
Greetings all, Even though Gilles has solved Dominique's problem, the issue of chaining fuzzy rules remains. One possible solution would be to make the user (administrator) specify explicit chainings. For example match_method: accents:0.9 endings:0.9 accents->endings:0.5 That overcomes the problem of possible looping, and undesirable orderings of rules creating non-words, and also gives more control over the weights. $0.02 Lachlan On Wed, 9 Jun 2004 06:41 am, Gilles Detillieux wrote: > if we "silently > augment" the search query as a fuzzy match method, then we still > run into the need for chaining. -- lh...@us... ht://Dig developer DownUnder (http://www.htdig.org) |
From: Lachlan A. <lh...@us...> - 2004-06-09 10:51:02
|
On Wed, 9 Jun 2004 03:41 pm, Joe R. Jah wrote:
> Hi Folks,
>
> Make check errors on BSD/OS 4.3.1:
>
> ../htlib/.libs/libht.a(HtWordType.o): In function
> `HtStripPunctuation(String &)':
> /tmp/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined
> reference to `WordType::instance' gmake[2]: *** [testnet] Error 1

Greetings Joe,

This is the same problem as Jesse was getting on HP-UX... To hunt this
problem down, could you please

1. Try the explicit g++ command I suggested in
   <http://www.mail-archive.com/htd...@li.../msg02078.html>

2. Replace '--mode=link' by '--mode=link --preserve-dup-deps' in
   line 324 of test/Makefile and then try make check again.

3. Replace the line something like

     HTLIBS = $(top_builddir)/htnet/libhtnet.la \
              $(top_builddir)/htcommon/libcommon.la \
              $(top_builddir)/htword/libhtword.la \
              $(top_builddir)/htlib/libht.la \
              $(top_builddir)/htcommon/libcommon.la \
              $(top_builddir)/htword/libhtword.la \
              $(top_builddir)/db/libhtdb.la \
              $(top_builddir)/htlib/libht.la

   in test/Makefile, by a line like

     HTLIBS = $(top_builddir)/htnet/libhtnet.la \
              $(top_builddir)/htcommon/libcommon.la \
              $(top_builddir)/htword/libhtword.la \
              $(top_builddir)/htlib/libht.la \
              $(top_builddir)/./htcommon/libcommon.la \
              $(top_builddir)/./htword/libhtword.la \
              $(top_builddir)/./db/libhtdb.la \
              $(top_builddir)/./htlib/libht.la

   (that is, for the repeated libraries, add a './' to the path) and
   then rerun make check.

4. Type

     nm htword/.libs/libhtword.a | grep instance
     nm test/testnet.o | grep instance

5. Type

     cp /bin/true test/testnet
     make check

> Warnings from htdig:
>
> Warning: Configuration option heading_factor_1 is no longer
> supported
> Warning: Configuration option heading_factor_2 is no longer
> supported
> Warning: Configuration option heading_factor_3 is no longer
> supported
> Warning: Configuration option heading_factor_4 is no longer
> supported
> Warning: Configuration option heading_factor_5 is no longer
> supported
> Warning: Configuration option heading_factor_6 is no longer
> supported
> Warning: Configuration option modification_time_is_now is no longer
> supported
> Warning: Configuration option pdf_parser is no longer
> supported
> Warning: Configuration option translate_amp is no longer
> supported
> Warning: Configuration option translate_lt_gt is no longer supported
> Warning: Configuration option translate_quot is no longer supported
>
> Huh?

Because people were confused by pdf_parser no longer working in ht://Dig,
it now checks for old 3.1.x configuration attributes which are in the
htdig.conf file but not supported by ht://Dig 3.2.x.

Are any of these options specified in your htdig.conf? If not, this is a
bug...

Thanks for the testing,
Lachlan

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
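The check Lachlan describes, warning only about obsolete attributes that actually appear in htdig.conf, can be sketched as follows. This is a Python illustration and not ht://Dig's C++ code; `obsolete_warnings` is a hypothetical name, and the attribute set is just the list from the warnings quoted in this thread, so the real list may differ.

```python
from typing import List

# Names taken from the warnings quoted in this thread (an assumption that
# this is the complete set).
OBSOLETE_ATTRIBUTES = {
    "heading_factor_1", "heading_factor_2", "heading_factor_3",
    "heading_factor_4", "heading_factor_5", "heading_factor_6",
    "modification_time_is_now", "pdf_parser",
    "translate_amp", "translate_lt_gt", "translate_quot",
}


def obsolete_warnings(config_text: str) -> List[str]:
    """Return one warning per obsolete attribute set in the config text."""
    warnings = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments define nothing
        name, sep, _value = stripped.partition(":")
        name = name.strip()
        if sep and name in OBSOLETE_ATTRIBUTES:
            warnings.append(
                f"Warning: Configuration option {name} is no longer supported"
            )
    return warnings
```

Under this logic a configuration file that mentions none of the old attributes produces no warnings at all, which is why warnings for attributes that are absent from htdig.conf would point to a bug.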
From: Gabriele B. <g.b...@co...> - 2004-06-09 09:28:17
|
Hi guys,

> Yes, probably another conclusion I jumped to, based on the recollection
> that Neal had done the Win32 port. After reviewing cvs diffs yesterday,
> I realized that Gabriele had committed the conf_lexer.cxx that had those
> ifdefs.

I am sorry, but I did not think I had done it. I had a better look and I
just committed the changes made by Marco Nenciarini. At that time, I tested
everything, it worked and I applied the changes.

However, I just had a look at your patch, Gilles, and manually applied it
on the current CVS code; I then ran:

flex -oconf_lexer.cxx conf_lexer.lxx

and

bison -o conf_parser.cxx conf_parser.yxx

Here are my flex and bison versions:
- flex 2.5.31
- bison (GNU Bison) 1.875a

In attach, you find the patch I just made. I'd suggest to wait after
3.2.0b6 is out to commit it into the CVS, unless you guys tell me to do
differently. My lack of faith comes from my ignorance with lex/yacc, even
though I built the package and I got no errors.

I wait your orders guys for the release. Please vote:

1) apply this patch on 3.2.0b6 and test it
2) release 3.2.0b6 and apply it straightly after

Please tell me what you think and eventually have a try.

Ciao and thanks,
-Gabriele
From: Robert R. <ri...@li...> - 2004-06-09 07:38:56
|
Hello list,

In German (and probably Dutch and the Scandinavian languages, though I have
no idea there), accents (or 'umlauts') are widely used. There's been a
problem though with uppercase accented characters: since the Swiss keyboard
(and typewriter keyboard) supports both German and French, uppercase
'umlauts' (Ö) have been sacrificed to support the French 'é'. Hence,
uppercase accented characters need to be composed.

What I am hinting at is: in German, the 'ö' can also be written 'oe', for
cases where I can't get hold of the umlauted character. Similarly, '€' (the
Euro sign) can be written EURO. AFAIK, no such transliteration exists for
French.

That way we could try to develop a translator class (Mediator, Adapter
Pattern) with the following features:

- Handle transliteration ('ö'->'oe'). We could also transliterate the html
  way ('ö'->'&ouml;')
- Store the respective words in a transliterated form.

Please note that we'd need a token to represent the language of the word as
well, else htdig will stupidly transliterate 'oeuf' to 'öuf' and 'Soeur' to
'söur'.

However, I suggest we do that for the next htdig release so that 3.2.0b6
gets out.

Robert
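A rough sketch of the translator idea above, written in Python rather than as an ht://Dig class. The table contents, the language codes and the names `transliterate` and `index_forms` are assumptions for illustration; only the 'ö'->'oe' mapping and the need for a language tag come from the message.

```python
from typing import Dict, Set, Tuple

# Per-language transliteration tables (hypothetical and incomplete).
TRANSLITERATIONS: Dict[str, Dict[str, str]] = {
    "de": {
        "ä": "ae", "ö": "oe", "ü": "ue",
        "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
        "ß": "ss",
    },
}


def transliterate(word: str, lang: str) -> str:
    """Replace accented letters using the table for `lang`, if any."""
    table = TRANSLITERATIONS.get(lang, {})
    return "".join(table.get(ch, ch) for ch in word)


def index_forms(word: str, lang: str) -> Set[Tuple[str, str]]:
    """Return the (language, form) pairs to store for one word."""
    return {(lang, word), (lang, transliterate(word, lang))}
```

Keeping the language next to each stored form is what prevents the 'oeuf' problem: the German table is only ever applied to words tagged as German, and the mapping runs from the accented form to the plain one, never the other way round.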
From: Rustem U. <ru...@ma...> - 2004-06-09 07:13:57
|
Good day, I created a mirror to your site on the address: http://htdig.rin.ru/ Could you, please, include it in the list of your mirrors. Updates: daily Bandwidth: 180 Mbps Looking forward to your reply, Kind regards, Rustem Yusupov |
From: Joe R. J. <jj...@cl...> - 2004-06-09 05:41:37
|
Hi Folks,

Make check errors on BSD/OS 4.3.1:

../htlib/.libs/libht.a(HtWordType.o): In function `HtIsWordChar(char)':
/tmp/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined reference to `WordType::instance'
../htlib/.libs/libht.a(HtWordType.o): In function `HtIsStrictWordChar(char)':
/tmp/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined reference to `WordType::instance'
../htlib/.libs/libht.a(HtWordType.o): In function `HtWordNormalize(String &)':
/tmp/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined reference to `WordType::instance'
../htlib/.libs/libht.a(HtWordType.o): In function `HtStripPunctuation(String &)':
/tmp/htdig-3.2.0b6/htlib/../htword/WordType.h:66: undefined reference to `WordType::instance'
gmake[2]: *** [testnet] Error 1
gmake[2]: Leaving directory `/tmp/htdig-3.2.0b6/test'
gmake[1]: *** [check-am] Error 2
gmake[1]: Leaving directory `/tmp/htdig-3.2.0b6/test'
gmake: *** [check-recursive] Error 1

Warnings from htdig:

Warning: Configuration option heading_factor_1 is no longer supported
Warning: Configuration option heading_factor_2 is no longer supported
Warning: Configuration option heading_factor_3 is no longer supported
Warning: Configuration option heading_factor_4 is no longer supported
Warning: Configuration option heading_factor_5 is no longer supported
Warning: Configuration option heading_factor_6 is no longer supported
Warning: Configuration option modification_time_is_now is no longer supported
Warning: Configuration option pdf_parser is no longer supported
Warning: Configuration option translate_amp is no longer supported
Warning: Configuration option translate_lt_gt is no longer supported
Warning: Configuration option translate_quot is no longer supported

Huh?

Regards

Joe

-- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl...
From: Gilles D. <gr...@sc...> - 2004-06-08 21:23:46
|
According to Dominique Arpin:
> I will install htdig 3.2 beta6 and I will try this patch.

Your french.aff file only defines altstringchar entries for tex, not
for latin1, so you shouldn't need the patch I mentioned. As far as
I can tell, the patch won't make any difference for your affix file.

> here my config:
>
> endings_affix_file: ${lang_dir}/french.aff
> endings_dictionary: ${lang_dir}/french.0
> endings_root2word_db: ${common_dir}/root2wordfr.db
> endings_word2root_db: ${common_dir}/word2rootfr.db
>
> You can see a copy of the files on:
>
> http://darwin.espacecourbe.com/~dominique/

The problem is in the french.0 file. You'll need to add an entry:

acouphène/X

in the appropriate spot, and then run "htfuzzy endings". Do the same
for any other word that doesn't pluralize properly, if the word doesn't
appear in french.0.

> thanks
>
>
> > According to Lachlan Andrew:
> >> Greetings Dominique,
> >>
> >> I have tried to reproduce your problem (as I understood it), but
> >> can't. Several possibilities come to mind:
> >> 1. You are (as Gilles suggested) relying on the fuzzy rule "accents"
> >> rather than explicitly entering the accent into the query. In
> >> this case, you are out of luck.
> >> 2. Your endings_dictionary file doesn't contain the words with
> >> actual accents.
> >> 3. Your endings_dictionary has the accents, but encoded as
> >> multi-byte unicode sequences. Currently, ht://Dig doesn't
> >> support unicode.
> >> In either case 2 or case 3, the solution is to replace the entries in
> >> your endings_dictionary file with the single-byte latin1 (not
> >> unicode) accents.
> >>
> >> Do any of these cases apply?
> >
> > Your 3rd possibility brings to mind a 4th one I heard about a few years
> > ago. Some ispell affix files make use of "altstringchar" to define a
> > sequence of ASCII characters that can be used in the dictionary file to
> > represent an accented character. If Dominique's francais.0 dictionary
> > uses these, that could be the problem.
> >
> > There was a patch posted to the mailing list back in June of 2000, which
> > added a hack to the endings algorithm to support these, for latin1 only.
> > The patch was for 3.1.5, so I don't know how well it'll work for 3.1.6 or
> > the 3.2 betas. For some reason, it never made it into the patch archive,
> > but it's available here:
> >
> > http://www.mail-archive.com/ht...@ht.../msg05248.html
> >
> > The only way to know for sure which of the 4 possibilities is the correct
> > one would be to look at the dictionary and affix file Dominique used to
> > generate the endings database.
> >
> > --
> > Gilles R. Detillieux E-mail: <gr...@sc...>
> > Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
> > Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
> >
>
>
> --
> Dominique Arpin_______________________[ Espace
> administrateur réseau Courbe ]
>
> http://www.espacecourbe.com/
> téléphone 514.933.9861
> télécopieur 514.933.9546
>

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
From: Joe R. J. <jj...@cl...> - 2004-06-08 21:14:39
|
On Tue, 8 Jun 2004, Gilles Detillieux wrote: > Date: Tue, 8 Jun 2004 15:11:32 -0500 (CDT) > From: Gilles Detillieux <gr...@sc...> > To: lh...@us... > Cc: Dominique Arpin <dom...@es...>, htd...@li... > Subject: Re: [htdig-dev] Re: Accents, endings and chaining > > According to Lachlan Andrew: > > Greetings Dominiqu, > > > > I have tried to reproduce your problem (as I understood it), but > > can't. Several possibilities come to mind: > > 1. You are (as Gilles suggested) relying on the fuzzy rule "accents" > > rather than explicitly entering the accent into the query. In > > this case, you are out of luck. > > 2. Your endings_dictionary file doesn't contain the words with > > actual accents. > > 3. Your endings_dictionary has the accents, but encoded as > > multi-byte unicode sequences. Currently, ht://Dig doesn't > > support unicode. > > In either case 2 or case 3, the solution is to replace the entries in > > your endings_dictionary file with the single-byte latin1 (not > > unicode) accents. > > > > Do any of these cases apply? > > Your 3rd possibility brings to mind a 4th one I heard about a few years > ago. Some ispell affix files make use of "altstringchar" to define a > sequence of ASCII characters that can be used in the dictionary file to > represent an accented character. If Dominique's francais.0 dictionary > uses these, that could be the problem. > > There was a patch posted to the mailing list back in June of 2000, which > added a hack to the endings algorithm to support these, for latin1 only. > The patch was for 3.1.5, so I don't know how well it'll work for 3.1.6 or > the 3.2 betas. For some reason, it never made it into the patch archive, > but it's available here: > > http://www.mail-archive.com/ht...@ht.../msg05248.html > > The only way to know for sure which of the 4 possibilities is the correct > one would be to look at the dictionary and affix file Dominique used to > generate the endings database. 
Thank you Gilles; it just made it to the patch archive: ftp://ftp.ccsf.org/htdig-patches/3.1.5/latin1_patch.0 Better late than never;) Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... |
From: Gilles D. <gr...@sc...> - 2004-06-08 20:42:23
|
According to Neal Richter:
> Here are two possible approaches:
>
> 1) Strip accents from all stored words & queries. This is a fairly common
> practice in search engines & NLP systems. The obvious disadvantage is
> that a user can't restrict results to contain that specific accent... they
> get back results with all of the different accents for a 'base letter'.

For some languages, the accents are more than just mere pronunciation
cues, though, and can be quite significant. I think if you stripped
accents unconditionally it could lead to a lot of false matches in some
languages. I think this approach would probably work fine for languages
like Spanish and Italian, but I think there might be some problems for
some French words where a user might want to make a distinction between
the two. In Scandinavian countries, where for example ö (or ø) is a
completely different letter from o, I'd expect accent stripping would
generate some pretty bad results.

The advantage of treating accents via a fuzzy match method is you have
search-time control over whether you will treat accented and unaccented
letters as equivalent, and if so, how much weight the variants will have
in the search results.

> 2) Store BOTH the accented word & unaccented/stripped word in the
> db.words.db. Silently augment each search query with the stripped version
> of each word.
> This steps around the disadvantage of #1 and still get the
> 'generalization' of stripped accents.

I'm not sure exactly how this gets around the problem with the first
approach. By putting the stripped words into the same database as the
original ones, you lose some ability to make the distinction between
the two at search time. Also, if we "silently augment" the search query
as a fuzzy match method, then we still run into the need for chaining.
If it's via another mechanism, how is that mechanism to be controlled?

Also, it may well be that none of these changes will help Dominique, if
the source of his problem is indeed that the words he wants to be
automatically capitalized aren't even getting into his endings database
to begin with.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
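The difference between stripping accents unconditionally and treating the stripped form as a weighted variant at search time can be sketched in Python. This is not how ht://Dig's accents method is implemented; `strip_accents`, `expand_query_word` and the weight parameter are hypothetical, and the stripping relies on Unicode decomposition from the standard `unicodedata` module.

```python
import unicodedata
from typing import Dict


def strip_accents(word: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_query_word(word: str, accent_weight: float) -> Dict[str, float]:
    """Return the query word plus, optionally, its unaccented variant.

    The original word keeps weight 1.0; the stripped form is added with
    `accent_weight`, or left out entirely when that weight is 0.
    """
    variants = {word: 1.0}
    stripped = strip_accents(word)
    if stripped != word and accent_weight > 0:
        variants[stripped] = accent_weight
    return variants
```

With the weight applied at query time, an administrator can set it to 0 for languages where the accented and unaccented letters are distinct, which is the search-time control described above. Note that this particular stripping method leaves 'ø' untouched, because that letter has no canonical decomposition.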
From: Dominique A. <dom...@es...> - 2004-06-08 20:31:52
|
Hi,

I will install htdig 3.2 beta6 and I will try this patch.

here my config:

endings_affix_file: ${lang_dir}/french.aff
endings_dictionary: ${lang_dir}/french.0
endings_root2word_db: ${common_dir}/root2wordfr.db
endings_word2root_db: ${common_dir}/word2rootfr.db

You can see a copy of the files on:

http://darwin.espacecourbe.com/~dominique/

thanks

> According to Lachlan Andrew:
>> Greetings Dominique,
>>
>> I have tried to reproduce your problem (as I understood it), but
>> can't. Several possibilities come to mind:
>> 1. You are (as Gilles suggested) relying on the fuzzy rule "accents"
>> rather than explicitly entering the accent into the query. In
>> this case, you are out of luck.
>> 2. Your endings_dictionary file doesn't contain the words with
>> actual accents.
>> 3. Your endings_dictionary has the accents, but encoded as
>> multi-byte unicode sequences. Currently, ht://Dig doesn't
>> support unicode.
>> In either case 2 or case 3, the solution is to replace the entries in
>> your endings_dictionary file with the single-byte latin1 (not
>> unicode) accents.
>>
>> Do any of these cases apply?
>
> Your 3rd possibility brings to mind a 4th one I heard about a few years
> ago. Some ispell affix files make use of "altstringchar" to define a
> sequence of ASCII characters that can be used in the dictionary file to
> represent an accented character. If Dominique's francais.0 dictionary
> uses these, that could be the problem.
>
> There was a patch posted to the mailing list back in June of 2000, which
> added a hack to the endings algorithm to support these, for latin1 only.
> The patch was for 3.1.5, so I don't know how well it'll work for 3.1.6 or
> the 3.2 betas. For some reason, it never made it into the patch archive,
> but it's available here:
>
> http://www.mail-archive.com/ht...@ht.../msg05248.html
>
> The only way to know for sure which of the 4 possibilities is the correct
> one would be to look at the dictionary and affix file Dominique used to
> generate the endings database.
>
> --
> Gilles R. Detillieux E-mail: <gr...@sc...>
> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
> Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
>

--
Dominique Arpin_______________________[ Espace
administrateur réseau Courbe ]

http://www.espacecourbe.com/
téléphone 514.933.9861
télécopieur 514.933.9546
From: Gilles D. <gr...@sc...> - 2004-06-08 20:12:07
|
According to Lachlan Andrew:
> Greetings Dominique,
>
> I have tried to reproduce your problem (as I understood it), but
> can't. Several possibilities come to mind:
> 1. You are (as Gilles suggested) relying on the fuzzy rule "accents"
> rather than explicitly entering the accent into the query. In
> this case, you are out of luck.
> 2. Your endings_dictionary file doesn't contain the words with
> actual accents.
> 3. Your endings_dictionary has the accents, but encoded as
> multi-byte unicode sequences. Currently, ht://Dig doesn't
> support unicode.
> In either case 2 or case 3, the solution is to replace the entries in
> your endings_dictionary file with the single-byte latin1 (not
> unicode) accents.
>
> Do any of these cases apply?

Your 3rd possibility brings to mind a 4th one I heard about a few years
ago. Some ispell affix files make use of "altstringchar" to define a
sequence of ASCII characters that can be used in the dictionary file to
represent an accented character. If Dominique's francais.0 dictionary
uses these, that could be the problem.

There was a patch posted to the mailing list back in June of 2000, which
added a hack to the endings algorithm to support these, for latin1 only.
The patch was for 3.1.5, so I don't know how well it'll work for 3.1.6 or
the 3.2 betas. For some reason, it never made it into the patch archive,
but it's available here:

http://www.mail-archive.com/ht...@ht.../msg05248.html

The only way to know for sure which of the 4 possibilities is the correct
one would be to look at the dictionary and affix file Dominique used to
generate the endings database.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
From: Ted Stresen-R. <ted...@ma...> - 2004-06-08 20:02:07
|
When confronted with this sort of thing in my own work I usually create a "translator" class as a "stop-gap" measure while making the transition. The "translator" class simply maps old methods to new methods, making any necessary adjustments to input along the way... kind of like an "abstraction layer" (but that sounds like a highfalutin word to me). Might that make the transition simpler? Ted On Jun 8, 2004, at 8:32 PM, Neal Richter wrote: > A unicode version of HtDig will require tons of work converting to > a decent string class! |
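The "translator" idea above can be sketched roughly as follows. This is a hedged illustration only: `NewText`, `LegacyStringTranslator`, and every method name here are hypothetical and are not taken from the ht://Dig sources. The point is just that callers keep using old-style method names and byte strings while the work is forwarded to a new Unicode-aware type.

```python
class NewText:
    """Hypothetical replacement string type that stores Unicode text."""

    def __init__(self, text: str = "") -> None:
        self._text = text

    def char_count(self) -> int:
        return len(self._text)

    def lowered(self) -> "NewText":
        return NewText(self._text.lower())

    def __str__(self) -> str:
        return self._text


class LegacyStringTranslator:
    """Maps assumed old-style methods onto NewText, converting input on the way."""

    def __init__(self, raw: bytes = b"", encoding: str = "latin-1") -> None:
        self._encoding = encoding
        # Adjust the input: old callers pass bytes, the new type wants text.
        self._impl = NewText(raw.decode(encoding))

    def length(self) -> int:
        # Old name, new implementation.
        return self._impl.char_count()

    def lowercase(self) -> None:
        # The assumed old API modified in place; the new one returns a copy.
        self._impl = self._impl.lowered()

    def get(self) -> bytes:
        # Convert back so existing callers still receive bytes.
        return str(self._impl).encode(self._encoding)
```

Once every caller has been moved to the new type directly, a translator like this can be deleted, which is what makes it a stop-gap rather than a permanent layer.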
From: Neal R. <ne...@ri...> - 2004-06-08 19:33:54
|
> The big problem with the current endings algorithm is its dependence on > a static dictionary which may be incomplete. The stemming algorithms > which Neal talked about, and wants to add to ht://Dig, adapt automatically > to whatever words get indexed, based on a set of rules for stemming words. Adding stemming to htdig will be my main task next week, full-time. I'll keep you posted. Neal Richter Knowledgebase Developer RightNow Technologies, Inc. Customer Service for Every Web Site Office: 406-522-1485 |
From: Neal R. <ne...@ri...> - 2004-06-08 19:32:15
|
On Mon, 7 Jun 2004, Gabriele Bartolini wrote: > At 20.55 07/06/2004, Neal Richter wrote: > > I have written accent stripping code that I'd be happy to add.... > > > > We'll also have a Unicode version soon ;-) Oops... I meant a Unicode version of the accent stripper will be available! A unicode version of HtDig will require tons of work converting to a decent string class! Neal Richter Knowledgebase Developer RightNow Technologies, Inc. Customer Service for Every Web Site Office: 406-522-1485 |
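For readers unfamiliar with what a Unicode-aware accent stripper does, here is a minimal sketch. It is not Neal's code, which is not shown in this thread; it assumes the common approach of decomposing each character (NFD) and dropping the combining marks, using Python's standard `unicodedata` module. The function name `strip_accents` is made up for this example.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining accent marks removed (illustrative only)."""
    # NFD splits "è" into a base letter "e" plus a combining grave accent.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks; keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Characters that have no decomposition are left untouched by this approach, so a real implementation would probably need extra mapping rules for those.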
From: Gilles D. <gr...@sc...> - 2004-06-08 18:43:42
|
According to Lachlan Andrew: > Greetings Gilles, > > Are you sure that Neal committed the last *.cxx builds? There's > hardly a sign of him when browsing CVS. > > In particular, the #ifdef _WIN32 lines in conf_lexer.cxx appeared on > July 21 when Gabriele applied the patch by Marco Nenciarini. Yes, probably another conclusion I jumped to, based on the recollection that Neal had done the Win32 port. After reviewing cvs diffs yesterday, I realized that Gabriele had committed the conf_lexer.cxx that had those ifdefs. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
From: Gilles D. <gr...@sc...> - 2004-06-08 18:32:51
|
According to Lachlan Andrew: > Greetings Gilles, > > As I understand Dominique's problem, it is not an issue of chaining > the "accents" fuzzy rule with the "endings" fuzzy rule. He is > searching for a word with an accent. If it matches a word in the > database with an accent, then it should not need a fuzzy algorithm to > register a match. I thought that "accents" causes "acouphène" to > match "acouphene" (without the accent). I think you're right. I likely jumped to the wrong conclusion, as it resembles a problem that's come up many times before. More likely, the problem is that "herbe" exists in Dominique's francais.0 dictionary, or whatever he used to generate the endings database, but "acouphène" isn't. He'd need to add it, with the appropriate suffixes for pluralization (usually "/S"), and then do "htfuzzy endings" again. The big problem with the current endings algorithm is its dependence on a static dictionary which may be incomplete. The stemming algorithms which Neal talked about, and wants to add to ht://Dig, adapt automatically to whatever words get indexed, based on a set of rules for stemming words. > Dominique, could you confirm that both the document and the query have > an accent? If that is the case, then we *may* be able to fix this > problem without needing to chain fuzzy rules. Also, please check the dictionary file used by your endings_dictionary attribute in htdig.conf, to make sure it contains the proper pluralization rules for any words for which you have a problem, as for acouphène in this case. > On Thu, 3 Jun 2004 02:07 am, Gilles Detillieux wrote: > > Dominique had written: > > > > 3- I have a problem with accents and plurals. > > > > > > > > If I search for "herbe" or "herbes", no problems. But, if the > > > > words have an accent, like "acouphène", htdig has a problem > > > > finding the plural.
> > > > > > > > herbe: 136 results > > > > herbes: 136 results > > > > > > > > acouphène: 6 results > > > > acouphènes: 25 results > > > > > > > > > > > > search_algorithm: exact:1 endings:1 prefix:1 accent:1 > > > > synonyms:0,5 > > > > htsearch does not yet support chaining of fuzzy match algorithms, > > so the results of the accents algorithm don't have the endings > > algorithm applied to them (nor vice-versa). -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
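The interaction described in this message can be pictured with a toy model. The sketch below is an assumption-laden illustration, not htsearch's implementation: the function names, the plural-by-adding-"s" rule, and the small accent table are all invented for the example. It shows each fuzzy expansion being applied to the original query word only (no chaining), and an endings expansion that yields nothing when the word is missing from the static dictionary.

```python
# Toy latin1-style accent folding table (assumed for illustration only).
ACCENT_TABLE = str.maketrans("àâäéèêëîïôöùûüç", "aaaeeeeiioouuuc")


def fold_accents(word: str) -> str:
    return word.translate(ACCENT_TABLE)


def endings_variants(word: str, roots: set) -> set:
    """Plural/singular forms, but only for roots listed in the dictionary."""
    candidates = [word]
    if word.endswith("s"):
        candidates.append(word[:-1])
    for root in candidates:
        if root in roots:
            return {root, root + "s"}
    return set()  # word unknown to the static dictionary: no expansion


def accent_variants(word: str, indexed_words: set) -> set:
    """Indexed words that match once accents are folded away."""
    key = fold_accents(word)
    return {w for w in indexed_words if fold_accents(w) == key}


def expand_unchained(word: str, roots: set, indexed_words: set) -> set:
    # Each algorithm sees only the original word; results are merged.
    return ({word}
            | endings_variants(word, roots)
            | accent_variants(word, indexed_words))
```

In this model, with only "herbe" in the dictionary, a query for "acouphène" never picks up "acouphènes", which would match the differing result counts reported above. Adding "acouphène" as a root makes both forms expand to each other, while the unaccented plural would still need chaining to be found.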
From: Christopher M. <chr...@mc...> - 2004-06-08 14:42:44
|
When I try to perform a boolean search with a phrase that contains a boolean keyword, htsearch seems to spin indefinitely. For example, a boolean search for: '"black and white" and color not grey' will cause htsearch to consume 100% of the CPU, and it does not seem to finish until it is sent a SIGHUP. However, if I remove the 'and' from the phrase: '"black white" and color not grey' this works without any problems. This problem happens with any boolean term (and, not & or) and will cause a problem even if the boolean logic is broken within the phrase: '"black white and" and color not grey' also spins indefinitely (without a boolean error message). Is this a known problem? I'm currently running a 3.2.0b5 snapshot; has it been fixed in 3.2.0b6? Cheers, Chris -- Christopher Murtagh Enterprise Systems Administrator ISR / Web Communications Group McGill University Montreal, Quebec Canada Tel.: (514) 398-3122 Fax: (514) 398-2017 |
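The behaviour one would expect for the queries above is that a boolean keyword inside double quotes is treated as part of the phrase, not as an operator. The sketch below illustrates that expectation only; it is not htsearch's parser and says nothing about where the actual loop occurs. The names `tokenize`, `BOOLEAN_KEYWORDS`, and the token labels are assumptions made for the example.

```python
import re

BOOLEAN_KEYWORDS = {"and", "or", "not"}

# Either a double-quoted phrase (group 1) or a run of non-space characters (group 2).
TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')


def tokenize(query: str) -> list:
    """Split a boolean query into (kind, text) tokens, keeping phrases whole."""
    tokens = []
    for match in TOKEN_RE.finditer(query):
        phrase, word = match.group(1), match.group(2)
        if phrase is not None:
            # Keywords inside the quotes stay part of the phrase.
            tokens.append(("PHRASE", phrase))
        elif word.lower() in BOOLEAN_KEYWORDS:
            tokens.append(("OP", word.lower()))
        else:
            tokens.append(("WORD", word))
    return tokens
```

With this kind of tokenizer, `"black white and" and color not grey` produces one phrase token followed by the same operator and word tokens as the query that works, so all three example queries would reach the boolean evaluator with the same structure.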
From: Christopher M. <chr...@mc...> - 2004-06-08 13:15:35
|
On Mon, 2004-06-07 at 14:55, Neal Richter wrote: > I have written accent stripping code that I'd be happy to add.... > > We'll also have a Unicode version soon ;-) This sounds awesome. Is this because of new code in htdig, or the adoption of CLucene? Either way, it sounds great to me. Cheers, Chris -- Christopher Murtagh Enterprise Systems Administrator ISR / Web Communications Group McGill University Montreal, Quebec Canada Tel.: (514) 398-3122 Fax: (514) 398-2017 |
From: Gabriele B. <an...@ti...> - 2004-06-07 21:20:30
|
At 20.55 07/06/2004, Neal Richter wrote: > I have written accent stripping code that I'd be happy to add.... > > We'll also have a Unicode version soon ;-) That's great news, Neal! -Gabriele -- Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check maintainer Current Location: Prato, Toscana, Italia an...@ti... | http://www.prato.linux.it/~gbartolini | ICQ#129221447 > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The Inferno |