#3 tld() says it accepts URI but actually it expects just a domain name

1.0
open
nobody
None
2015-08-24
2015-08-24
Dzmitry
No

Both the documentation and the parameter name suggest that tld() accepts a full URI (the same as tld_check_uri()). However, it only works correctly if given a bare domain name.

To fix this you would need to change the level collection loop condition (line 563 in tld.c) to something like:

while(*end != '\0' && (level == 0 || (*end != '/' && *end != ':')))

(you might then want to store 'end' in the tld_info structure too)
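
To make the effect of that condition concrete, here is a minimal standalone sketch. It is not the code from tld.c: the function name scan_host_end() is hypothetical, and it assumes the loop counts one level per period, which is a guess about how the real loop works.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the level collection loop (NOT the tld.c code).
 * Assumption: every '.' seen starts a new level. While level == 0 the scan
 * runs through "scheme://"; once a period has been seen, it stops at the
 * first '/' or ':' (start of the path or of the port).
 */
static const char *scan_host_end(const char *uri, int *levels)
{
    const char *end = uri;
    int level = 0;

    assert(uri != NULL && levels != NULL);

    while(*end != '\0' && (level == 0 || (*end != '/' && *end != ':')))
    {
        if(*end == '.')
        {
            ++level;
        }
        ++end;
    }

    *levels = level;
    return end; /* points at '\0', '/' or ':' */
}
```

For "http://www.example.co.uk:8080/path" the scan stops at the ':' before the port with three levels counted. The condition relies on the first period belonging to the host: a period in the user name ("first.last:pw@...") or a host without any period ("http://localhost/a.b") would make it stop in the wrong place.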

... or fix the documentation (and make the tld() function less useful).

I am not actually interested in performing URI validation per se. All I need is to get the TLD, even if the URI contains some error (e.g. an empty password). So tld() would work just fine for me ... if it actually accepted URIs.

With that in mind, there is no easy way for me to use tld_domain_to_lowercase(): I would have to redo most of the work tld() does. So it would be great if tld() handled the case-insensitivity of the domain transparently.
(I cannot convert the whole URI to lowercase either, since the location and parameters are case sensitive.)

Discussion

  • Dzmitry

    Dzmitry - 2015-08-24

    (editor ate the asterisks before 'end' in the code line)

     
  • Alexis Wilke

    Alexis Wilke - 2015-08-24

    Fixed the code so it appears in a pre tag.

     
  • Alexis Wilke

    Alexis Wilke - 2015-08-24
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,7 +1,9 @@
     Both the documentation and the parameter name suggest that tld() would accept a full URI (same as tld_check_uri()). However it only works correctly if given a domain name.
    
     To fix this you would need to change the level collection loop condition (line 563 in tld.c) to something like:
    -while(*end != '\0' && (level == 0 || (*end != '/' && *end != ':')))
    +
    +    while(*end != '\0' && (level == 0 || (*end != '/' && *end != ':')))
    +
     (you might then want to store the 'end' to the tld_result structure too)
    
     ... or fix the documentation (and make the tld() function less useful).
    
     
  • Alexis Wilke

    Alexis Wilke - 2015-08-24

    The tld() function cannot allocate memory, which is why I could not implement the lowercasing in that function while keeping its speed: without lowercasing once ahead of time, you would have to do it on the fly in every cmp() call, and there can be quite a few of those.

    It cannot allocate a temporary buffer because the answer in the tld_info structure points to the input string. Also the input string is a constant so it cannot be modified (i.e. put a '\0' at the end of the TLD.)

    It seems to me that you can easily extract the domain from your string and put it in another string before making the call. What I could offer, I guess, is a way to limit the size of the string when calling the tld_domain_to_lowercase() function.

    Also, I can look into fixing the documentation to make sure one understands that only the domain name itself can be passed to the function (no protocol/user/password/port/path/query string/anchor...).
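
The "extract the domain into another string before the call" workaround could look roughly like the sketch below. The helper extract_host_lower() is hypothetical and not part of libtld. It lowercases ASCII letters only, so it sidesteps the Unicode case entirely, and it does not handle bracketed IPv6 hosts or percent-encoding.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a libtld function): copy the host part of `uri`
 * into `out` (capacity `out_size`), lowercasing ASCII letters on the way.
 * Returns 0 on success, -1 if the host is empty or does not fit in `out`.
 */
static int extract_host_lower(const char *uri, char *out, size_t out_size)
{
    const char *host = uri;
    size_t scheme_len, auth_len, host_len, i;

    assert(uri != NULL && out != NULL);

    /* Skip "scheme://" only when it appears before any '/', '?' or '#'. */
    scheme_len = strcspn(uri, ":/?#");
    if(strncmp(uri + scheme_len, "://", 3) == 0)
    {
        host = uri + scheme_len + 3;
    }

    /* The authority ends at the first '/', '?' or '#'. */
    auth_len = strcspn(host, "/?#");

    /* Skip "user:password@" by locating the last '@' in the authority. */
    for(i = auth_len; i > 0; --i)
    {
        if(host[i - 1] == '@')
        {
            host += i;
            auth_len -= i;
            break;
        }
    }

    /* Drop a trailing ":port". */
    host_len = 0;
    while(host_len < auth_len && host[host_len] != ':')
    {
        ++host_len;
    }

    if(host_len == 0 || host_len >= out_size)
    {
        return -1;
    }
    for(i = 0; i < host_len; ++i)
    {
        out[i] = (char) tolower((unsigned char) host[i]);
    }
    out[host_len] = '\0';
    return 0;
}
```

The resulting buffer can then be handed to tld(). Note that any pointer tld() stores in tld_info then refers to that buffer rather than to the original URI, so the buffer has to stay alive for as long as the result is used.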

     

    Last edit: Alexis Wilke 2015-08-24
  • Dzmitry

    Dzmitry - 2015-08-24

    But you don't need to allocate memory in tld(). You could do that in search().
    Yes, it would still happen multiple times, but not as many as in cmp(): only once per domain level (i.e. 5 times at most, and just once or twice in most cases).

    By the way you could use some fixed buffer to avoid memory allocations. Just give it the size of the longest TLD and make search() return -1 immediately if the (lowercased) domain is longer than that.
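
The fixed-buffer idea can be sketched as follows. Everything here is a toy: toy_search() and the four-entry table are hypothetical stand-ins for the library's search() and its compiled TLD data, and the lowercasing is ASCII only, so it does not address inputs whose Unicode lowercase form has a different length.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy table for illustration only, sorted in strcmp() order. */
static const char *const g_toy_tlds[] = { "co.uk", "com", "net", "org" };
#define TOY_TLD_COUNT   (sizeof g_toy_tlds / sizeof g_toy_tlds[0])
#define TOY_TLD_MAX_LEN 5 /* length of the longest entry above */

/*
 * Hypothetical search(): look up the `len` bytes starting at `domain`
 * without regard to ASCII case. The caller's string is only read; the
 * lowercase copy lives in a fixed stack buffer used during the search.
 * Returns the table index, or -1 when there is no match.
 */
static int toy_search(const char *domain, size_t len)
{
    char lower[TOY_TLD_MAX_LEN + 1];
    size_t i, lo, hi;

    assert(domain != NULL);

    /* Anything longer than the longest known TLD cannot match. */
    if(len > TOY_TLD_MAX_LEN)
    {
        return -1;
    }
    for(i = 0; i < len; ++i)
    {
        lower[i] = (char) tolower((unsigned char) domain[i]);
    }
    lower[len] = '\0';

    /* Plain binary search over the sorted table. */
    lo = 0;
    hi = TOY_TLD_COUNT;
    while(lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        int r = strcmp(lower, g_toy_tlds[mid]);
        if(r == 0)
        {
            return (int) mid;
        }
        if(r < 0)
        {
            hi = mid;
        }
        else
        {
            lo = mid + 1;
        }
    }
    return -1;
}
```

Because only an index leaves the function, the caller can keep building its result pointers from the original input string.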

     
  • Alexis Wilke

    Alexis Wilke - 2015-08-24

    No, the pointers I return in the tld_info structures cannot be to allocated data (and even less temporary data on the stack, obviously). It has to be to the string you give me as input. And Unicode lowercase() may not have the same length as the input. So it would not be that crystal clear...

     
  • Dzmitry

    Dzmitry - 2015-08-24

    But you don't return pointers to the user data from search(). So the transformed domain will only be used during the search and can be freed without any problem.

    So search() would receive a pointer to the domain in the user data, create a local lowercase copy of it, search for that copy in the TLD list, and return the index (freeing the lowercase copy before returning).

     
