From: Erik v. d. P. <er...@go...> - 2009-04-02 18:03:42
That's cool. We should try this thing out on some real-world URLs, and
see which ones are flagged.

Erik

On Thu, Apr 2, 2009 at 10:35 AM, Andy Heninger <and...@gm...> wrote:
> On Thu, Apr 2, 2009 at 10:17 AM, Erik van der Poel <er...@go...> wrote:
>> I see. Yeah, it makes sense to use "are" when there are two parameters
>> s1 and s2. It just looked funny, that's all. I don't feel strongly
>> about this.
>>
>> By the way, if we have a single domain name that we want to check for
>> confusability, what do we check it against? I.e. you have s1, but
>> where do you get s2 from?
>>
>> I suppose you have another API that checks a single string for mixed
>> scripts and other issues?
>
> Yes. It's called uspoof_check(), again with variants for strings of
> different types. The set of checks to be performed is a property of
> the spoof checker object. For this function, the "position" output
> parameter does provide useful information.
>
> -- Andy
>>
>> Erik
>>
>> On Thu, Apr 2, 2009 at 9:56 AM, Andy Heninger <and...@gm...> wrote:
>>> On Thu, Apr 2, 2009 at 7:20 AM, Erik van der Poel <er...@go...> wrote:
>>>> You spoof are confusable UTF-8?
>>>>
>>>> I can has cheezburger?
>>>>
>>>> The English just sounds really funny.
>>>
>>> Well, it's only sort-of English.
>>> The "are confusable" refers to the two string parameters, s1 and s2,
>>> not to the "UTF-8" which is an adjective, the type of the string
>>> parameters. "UTF8" is in the function name only because this is a
>>> plain C API which can't do overloaded functions.
>>>
>>> "isXXX" is common in API function names, but it is generally referring
>>> to a property of a single item, not of a pair of items.
>>>
>>> I agree that the function name reads oddly, but changing to "is" doesn't fix it.
>>>
>>> areConfusableUTF8Identifiers(s1, s2, ...) might be clearer.
>>>
>>> But overall, I favor leaving it as it is.
>>>
>>> -- Andy
>>>
>>>
>>>>
>>>> For the UTF-8 it might be debatable, but
>>>> uspoof_areConfusableUnicodeString() should probably have the "are"
>>>> changed to "is". I'd change it to "is" for the UTF-8 APIs too. It's
>>>> very common to have the word "is" in APIs, but not so common to use
>>>> the word "are", I believe.
>>>>
>>>> Thanks for working on this,
>>>>
>>>> Erik
>>>>
>>>> On Wed, Apr 1, 2009 at 10:35 PM, Andy Heninger <and...@gm...> wrote:
>>>>> I am proposing a small change to the USpoofChecker API.
>>>>>
>>>>> In the function
>>>>>
>>>>> U_DRAFT int32_t U_EXPORT2
>>>>> uspoof_areConfusableUTF8(const USpoofChecker *sc,
>>>>>                          const char *s1, int32_t length1,
>>>>>                          const char *s2, int32_t length2,
>>>>>                          int32_t *position,
>>>>>                          UErrorCode *status);
>>>>>
>>>>>
>>>>> I propose eliminating the "position" parameter.
>>>>>
>>>>> This parameter turns out to serve no useful purpose. In other spoof
>>>>> checking functions, the corresponding "position" parameter returns the
>>>>> position of a detected problem with the identifier being checked. For
>>>>> this function, we are testing whether two complete identifiers are
>>>>> potentially visually confusable. There is no specific position in
>>>>> them that causes them to be confusable - they must be confusable at
>>>>> all positions for there to be a problem.
>>>>>
>>>>> Since this is a new API, there are no compatibility issues with
>>>>> removing the parameter.
>>>>>
>>>>> The same change is needed in uspoof_areConfusableUTF8() and
>>>>> uspoof_areConfusableUnicodeString()
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> _______________________________________________
>>>>> icu-design mailing list
>>>>> icu...@li...
>>>>> To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-design
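
For readers following the thread, here is a rough, self-contained sketch of the distinction Andy draws between the pair check and the single-string check. It does not use ICU at all: toy_prototype(), toy_are_confusable() and toy_check() are hypothetical names invented for this illustration, and the tiny ASCII-only mapping is made-up data, not the real confusables table. The toy also compares character by character, which is a simplification. The point is only that the pair check yields a yes/no answer about two whole strings, while the single-string check can report where the first problem is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a confusables table: map a character to the
 * "prototype" glyph it visually resembles. Made-up data, ASCII only. */
static char toy_prototype(char c) {
    switch (c) {
        case '1': case 'I': return 'l';  /* digit one, capital i -> small L */
        case '0':           return 'O';  /* digit zero -> capital o */
        default:            return c;
    }
}

/* Pair check: returns 1 if every position of s1 and s2 maps to the same
 * prototype, 0 otherwise. There is no single "position" to report on
 * success, because the strings must match at all positions. */
int toy_are_confusable(const char *s1, const char *s2) {
    assert(s1 != NULL && s2 != NULL);
    size_t n = strlen(s1);
    if (n != strlen(s2)) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (toy_prototype(s1[i]) != toy_prototype(s2[i])) {
            return 0;
        }
    }
    return 1;
}

/* Single-string check: returns 1 and stores the byte index of the first
 * non-ASCII byte in *position; returns 0 and stores -1 if none is found.
 * Here a position output is meaningful, since one spot causes the failure. */
int toy_check(const char *s, int *position) {
    assert(s != NULL && position != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        if ((unsigned char)s[i] > 0x7F) {
            *position = (int)i;
            return 1;
        }
    }
    *position = -1;
    return 0;
}
```

With this toy, toy_are_confusable("paypal", "paypa1") reports a match, while toy_check() on a string containing a non-ASCII byte reports the index of that byte.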