libtld / Tickets / #3 tld() says it accepts URI but actually it expects just a domain name

Dzmitry - 2015-08-24

(editor ate the asterisks before 'end' in the code line)

Alexis Wilke - 2015-08-24

Fixed the code so it appears in a pre tag.

Description has changed:

Diff:

--- old
+++ new
@@ -1,7 +1,9 @@
 Both the documentation and the parameter name suggest that tld() would accept a full URI (same as tld_check_uri()). However it only works correctly if given a domain name.

 To fix this you would need to change the level collection loop condition (line 563 in tld.c) to something like:
-while(*end != '\0' && (level == 0 || (*end != '/' && *end != ':')))
+
+    while(*end != '\0' && (level == 0 || (*end != '/' && *end != ':')))
+
 (you might then want to store the 'end' to the tld_result structure too)

 ... or fix the documentation (and make the tld() function less useful).
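The loop condition proposed in the description can be tried in isolation. The sketch below is not the code from tld.c: `scan_domain_end()` is a hypothetical function, and counting one level per '.' is an assumption made only so the `level == 0` test has something to work with. It shows how the condition skips the "scheme://" part (no dot seen yet) and then stops at the first '/' or ':' after the domain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration of the proposed loop condition.
 * Returns a pointer to the first character after the domain and
 * stores the number of '.' separators seen in *levels. */
const char *scan_domain_end(const char *uri, int *levels)
{
    const char *end = uri;
    int level = 0;

    assert(uri != NULL && levels != NULL);

    /* While no dot has been seen (level == 0), '/' and ':' are ignored,
     * which lets the scan walk over a leading "http://". Once a dot has
     * been seen, the first '/' or ':' ends the domain. */
    while(*end != '\0' && (level == 0 || (*end != '/' && *end != ':')))
    {
        if(*end == '.')
        {
            ++level;
        }
        ++end;
    }

    *levels = level;
    return end;
}
```

With "http://www.example.com:8080/a.b" the returned pointer sits on ":8080/a.b". Note that this simple condition does not handle a "user:password@" prefix that contains a dot.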

Alexis Wilke - 2015-08-24

The tld() function must not allocate memory, which is why I could not implement the lowercasing inside it while keeping it fast: without lowercasing once ahead of time, you would have to do it on the fly in every cmp() call, and there can be quite a few of those.

It cannot allocate a temporary buffer because the answer in the tld_info structure points into the input string. Also, the input string is constant, so it cannot be modified (e.g. by putting a '\0' at the end of the TLD).

It seems to me that you can easily extract the domain from your string and put it in another string before making the call. What I could offer, I guess, is a way to limit the size of the string when calling the tld_domain_to_lowercase() function.

Also, I can look into fixing the documentation to make sure one understands that only the domain name itself can be passed to the function (no protocol, user, password, port, path, query string, anchor...)
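Extracting the domain before the call, as suggested above, could look roughly like the following. This is only a sketch: `uri_host()` and `uri_host_copy()` are hypothetical helpers, not part of libtld, and they ignore corner cases such as IPv6 literals in brackets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libtld): locate the host in a URI.
 * Returns a pointer into `uri` and stores the host length in *len. */
const char *uri_host(const char *uri, size_t *len)
{
    const char *host = uri;
    size_t k, authority, i, n;

    assert(uri != NULL && len != NULL);

    /* skip "scheme://" when the first delimiter is a ':' followed by "//" */
    k = strcspn(uri, "/?#:");
    if(uri[k] == ':' && uri[k + 1] == '/' && uri[k + 2] == '/')
    {
        host = uri + k + 3;
    }

    /* the authority ends at the first '/', '?' or '#' */
    authority = strcspn(host, "/?#");

    /* skip "user:password@" when present (use the last '@') */
    for(i = authority; i > 0; --i)
    {
        if(host[i - 1] == '@')
        {
            host += i;
            authority -= i;
            break;
        }
    }

    /* drop ":port" */
    for(n = 0; n < authority && host[n] != ':'; ++n);

    *len = n;
    return host;
}

/* Copy the host into a caller-supplied buffer.
 * Returns 0 on success, -1 if the buffer is too small. */
int uri_host_copy(const char *uri, char *out, size_t out_size)
{
    size_t len;
    const char *host = uri_host(uri, &len);

    if(len + 1 > out_size)
    {
        return -1;
    }
    memcpy(out, host, len);
    out[len] = '\0';
    return 0;
}
```

The buffer filled by `uri_host_copy()` holds just the domain name, which is the form tld() expects. The pointers tld() reports back would then refer to that buffer rather than to the original URI.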

Last edit: Alexis Wilke 2015-08-24

Dzmitry - 2015-08-24

But you don't need to allocate memory in tld(). You could do that in search().
Yes, it would still happen multiple times, but not as many as in cmp(): only once per domain level (i.e. 5 times at most, and just once or twice in most cases).

By the way, you could use a fixed buffer to avoid memory allocations. Just give it the size of the longest TLD and make search() return -1 immediately if the (lowercased) domain is longer than that.
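The fixed-buffer idea can be sketched as follows. Everything here is hypothetical: `demo_search()`, the `demo_tlds` table and `DEMO_MAX_TLD_LEN` stand in for libtld's real search() and its compiled TLD data, whose layout is not shown in this ticket. The lowercasing is ASCII-only, so it does not address Unicode input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the TLD table: lowercase, sorted for strcmp(). */
static const char *const demo_tlds[] = { "co.uk", "com", "org", "uk" };
enum { DEMO_TLD_COUNT = sizeof demo_tlds / sizeof demo_tlds[0] };
enum { DEMO_MAX_TLD_LEN = 5 }; /* length of the longest entry, "co.uk" */

/* Look up domain[0..len) case-insensitively (ASCII only).
 * Returns the table index, or -1 when there is no match. */
int demo_search(const char *domain, size_t len)
{
    char lower[DEMO_MAX_TLD_LEN + 1]; /* fixed buffer, no allocation */
    size_t i, lo = 0, hi = DEMO_TLD_COUNT;

    assert(domain != NULL);

    /* anything longer than the longest TLD cannot match */
    if(len > DEMO_MAX_TLD_LEN)
    {
        return -1;
    }

    /* lowercase once per call, into the local copy only */
    for(i = 0; i < len; ++i)
    {
        lower[i] = (char) tolower((unsigned char) domain[i]);
    }
    lower[len] = '\0';

    /* binary search over the sorted table */
    while(lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(lower, demo_tlds[mid]);
        if(c == 0)
        {
            return (int) mid;
        }
        if(c < 0)
        {
            hi = mid;
        }
        else
        {
            lo = mid + 1;
        }
    }
    return -1;
}
```

Only an index leaves the function, so the caller's string is never modified and no pointer into the temporary copy escapes.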

Alexis Wilke - 2015-08-24

No, the pointers I return in the tld_info structure cannot point to allocated data (and even less to temporary data on the stack, obviously). They have to point into the string you give me as input. And the Unicode lowercase of a string may not have the same length as the input, so it would not be that clear-cut...

Dzmitry - 2015-08-24

But you don't return pointers to the user data from search(). So the transformed domain will only be used during the search and can be freed without any problem.

So search() will indeed receive a pointer to the domain in user data, create a local lowercase copy of it, search for that copy in the TLD list and return the index (freeing the lowercase copy of the domain before returning).
