From: Lachlan A. <lh...@us...> - 2003-10-13 13:35:07
Greetings all,
I have a question about the interpretation of allow_numbers.
If allow_numbers is false, should digits be considered separators?
Looking at the code, it seems someone wanted to say that "3G", "Y2K"
and "X11" would be words, even if allow_numbers is false, because
they contain at least one letter:
    int alpha = 0;
    for(const unsigned char *p =
            (const unsigned char*)(const char*)(char *)word; *p; p++) {
        if(IsStrictChar(*p) || (allow_numbers && IsDigit(*p))) {
            alpha = 1;
        } else if(IsControl(*p)) {
            return status | WORD_NORMALIZE_CONTROL;
        }
    }
    //
    // Reject if contains no alpha characters
    //
    if(!alpha) return status | WORD_NORMALIZE_NOALPHA;
Current behaviour is to *ignore* allow_numbers and to default to
treating digits as letters [since WORD_TYPE_DIGIT is included in
IsChar() and IsStrictChar()].
I propose the following behaviour:
1. If allow_numbers is true, then digits are treated the same as
extra_word_characters.
2. If allow_numbers is false, then digits are treated as ("invalid")
punctuation.
3. The default should be changed to allow_numbers=true (which is
compatible with the current buggy default behaviour).
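To make the three points concrete, here is a rough, standalone sketch of the proposed classification. It is not the ht://Dig code: isWordChar() and splitWords() are hypothetical names invented for illustration, std::isalpha() stands in for the real character-type tables, and "invalid punctuation" is assumed to mean "acts as a word separator".

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, not an ht://Dig function: decides whether a
// character may appear inside a word under the proposed rules.
static bool isWordChar(unsigned char c, bool allowNumbers,
                       const std::string &extraWordChars)
{
    if (std::isalpha(c)) return true;
    // Proposal 1 and 2: digits count as word characters only when
    // allow_numbers is true; otherwise they fall through as separators.
    if (std::isdigit(c)) return allowNumbers;
    return extraWordChars.find(static_cast<char>(c)) != std::string::npos;
}

// Hypothetical helper: splits text into words, treating every
// non-word character (including digits when allowNumbers is false)
// as a separator.
static std::vector<std::string> splitWords(const std::string &text,
                                           bool allowNumbers,
                                           const std::string &extraWordChars = "")
{
    std::vector<std::string> words;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (isWordChar(c, allowNumbers, extraWordChars)) {
            current += static_cast<char>(c);
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Small usage check of the sketch above.
static void checkExamples()
{
    assert(splitWords("Y2K", true).size() == 1);   // "Y2K"
    assert(splitWords("Y2K", false).size() == 2);  // "Y" and "K"
}
```

Under these assumptions, "Y2K" stays one word when allow_numbers is true and splits into "Y" and "K" when it is false, which is the visible consequence of point 2.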
Any objections?
Lachlan
On Sat, 11 Oct 2003 05:56, Neal Richter wrote:
> Everyone: Please let me know what kind of time you'd be willing to
> put in to get this stuff tested??!!
-- 
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)