From: SourceForge.net <no...@so...> - 2004-11-24 01:21:53
|
Patches item #1011857, was opened at 2004-08-18 19:27 Message generated for change (Comment added) made by thekingant You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=300235&aid=1011857&group_id=235 Category: None Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: François Gagné (frenchfrog) >Assigned to: Tim Ringenbach (marv_sf) Summary: [patch] bug 929510 no log if name have invalid chars Initial Comment: [ 929510 ] no log if screen name have invalid file system characters the patch add a new function that filter invalid windows or linux characters and replace them with %xx where xx is the hex of the char also replace % with the the good %xx. ---------------------------------------------------------------------- >Comment By: Mark Doliner (thekingant) Date: 2004-11-23 20:21 Message: Logged In: YES user_id=20979 This is fixed in CVS, right? ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-09-11 20:46 Message: Logged In: YES user_id=1013807 checking util.c/1.325 Revision 1.325 Thu Sep 2 19:45:15 2004 UTC and here my conclusion: It escape anything that is not a-zA-Z0-9@-_.# If you got a filename in UTF-8 outside of the characters above all characters will get escaped even if they are valid in the system filename encoding charset (no use of g_filename_from_utf8). On a side not its not escaping the invalid windows files. It's works, but it's not really i18n friendly. ---------------------------------------------------------------------- Comment By: Ka-Hing Cheung (bsponline) Date: 2004-09-11 12:53 Message: Logged In: YES user_id=159910 I think datallah's patch didn't handle all those special win32 file names like "CON" and what not? ---------------------------------------------------------------------- Comment By: Ethan Blanton (eblanton) Date: 2004-09-11 11:58 Message: Logged In: YES user_id=298616 I think datallah fixed the problem, although not with this patch... The solution was, however, quite similar. Ethan ---------------------------------------------------------------------- Comment By: Luke Schierer (lschiere) Date: 2004-09-11 11:33 Message: Logged In: YES user_id=28833 where do we stand on this patch? ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-09-01 00:26 Message: Logged In: YES user_id=1013807 4) k you're right, I'm not use use to if(open || die) kind of thing but still need to change line 203: unsigned char *nonhex; to: unsigned char *nonhex = NULL; 5) not necessarily "superior solution" but more compliant. ---------------------------------------------------------------------- Comment By: Ethan Blanton (eblanton) Date: 2004-08-31 23:59 Message: Logged In: YES user_id=298616 1) The committed solution is fine. No byte in a UTF-8 string will look like an ASCII character unless it _is_ an ASCII character. 2) This should be addressed. 3) The old logs will never be escaped, so that line should _not_ be changed. 4) The logic prevents this from being a problem -- we know *c is a non-NULL character, and if *(c + 1) or *(c + 2) is NULL it will short-circuit the condition and fail correctly. 5) I suppose these should be fixed, but never again releasing a build for Windows would be a superior solution. 6) Agreed. ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-31 23:40 Message: Logged In: YES user_id=1013807 checking the patch applied to log.c/1.102/Tue Aug 31 18:28:35 2004. And here is what I have found: 1)doing a while(*c) on a UTF-8 string is probably bad (since one unicode char could be more then one byte), should use g_utf8_get_char, g_utf8_find_next_char and with a g_utf8_validate at the beginning 2)should convert the UTF8 string to filename encoding with g_filename_from_UTF8 by also paying attention to the fact that g_filename_from_UTF8 could fails if some UTF8 chars could not be converted to the filename encoding charset. 3)in function old_logger_list line 910: char *logfile = g_strdup_printf("%s.log", gaim_normalize(account, sn)); in function old_logger_total_size line 1031: char *logfile = g_strdup_printf("%s.log", gaim_normalize(account, name)); logfile is not escaped 4) possible segfault in unescape_filename line 182: "if (*(c + 1) && *(c + 2)) {" could poke char outside de string also possible g_free of a random address since gaim_base16_decode could return before changing the value of the return char. line 203: unsigned char *nonhex; by: unsigned char *nonhex = NULL; or add the following line in util.c : gaim_base16_decode line 91: *raw = NULL; 5) does not check for invalid filename on win32: CON, PRN, AUX, NUL, COM0-9, LPT0-9, CON.TXT (true for lower, upper and mixed case) 6) could be a good idea to escape any chars below 0x20 7) could be a idea to check the last file i have attached to this thread and perhaps do a mix of both solutions. ---------------------------------------------------------------------- Comment By: Luke Schierer (lschiere) Date: 2004-08-31 20:52 Message: Logged In: YES user_id=28833 was this fixed by the patch from datallah i accepted earlier today? ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-26 15:44 Message: Logged In: YES user_id=1013807 ok the patch against 0.82 ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-24 09:35 Message: Logged In: YES user_id=1013807 ok then, this is patch is fine. Would just be nicer a add a new function in gaim util.c/util.h to convert utf-8 to system filename charset _with_ fall back (unlike g_filename_from_utf8). ---------------------------------------------------------------------- Comment By: Ka-Hing Cheung (bsponline) Date: 2004-08-22 19:01 Message: Logged In: YES user_id=159910 Oh, I just found out that part of a utf-8 multi-byte characters cannot be ASCII, so one less thing to worry about. ---------------------------------------------------------------------- Comment By: Ka-Hing Cheung (bsponline) Date: 2004-08-21 03:25 Message: Logged In: YES user_id=159910 Someone should make our i18n guru to come here to say something, you know, that Blanton kid. ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-21 01:53 Message: Logged In: YES user_id=1013807 dunno but right now g_filename_from_utf8 fails, so i guess the right way would be to check each chars one by one and replace the invalid with %xxxxxxxx or another char. BTW extended ASCII is 8-bit wide chars but its not the same mapping to most os, (windows, mac OSX, linux, sun) so its pretty much a big mess. and its not utf8 compatible either... ---------------------------------------------------------------------- Comment By: Ka-Hing Cheung (bsponline) Date: 2004-08-21 00:48 Message: Logged In: YES user_id=159910 Isn't extended ASCII just 8-bit wide chars? What should happen if part of a multi-byte UTF-8 character is not a legal filesystem character? ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-20 23:08 Message: Logged In: YES user_id=1013807 ok further testing revealed: 1) g_filename_from_utf8 fails if its not capable of converting a uft8 char to the system filename charset. 2) if you give a utf8 string with characters wider than 1 to windows its simply take each byte one by one and take them as extended ascii. ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-20 22:25 Message: Logged In: YES user_id=1013807 ok fixed a minor issue and did the diff the good way around :P ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-20 21:11 Message: Logged In: YES user_id=1013807 ok 1) g_build_filename seem to only put separators between the strings you pass to it (replacing a big mess of strcpy, malloc, ex). 2) filenames in windows seem to be in extended ascii thus any characters outside ascii have to be converted or replaced (since gaim, glib, GTK are working with UTF-8 internaly). ex: 'é' (é). 3) it don't solve the whole issue about characters not accepted in filenames. ---------------------------------------------------------------------- Comment By: Ka-Hing Cheung (bsponline) Date: 2004-08-20 21:11 Message: Logged In: YES user_id=159910 I think you made a patch with the - become the + ;-) Switching the two would be cool. ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-20 21:01 Message: Logged In: YES user_id=1013807 New patch submitted with diff -u, UTF-8 compliant ---------------------------------------------------------------------- Comment By: Ka-Hing Cheung (bsponline) Date: 2004-08-20 20:54 Message: Logged In: YES user_id=159910 I don't claim to know much about i18n, but I would assume that g_build_filename already returns something that's suitable as a filename? Anyhow, I don't understand what g_filename_from_utf8 is for. On linux, a file name is just a bitstream, and there's no single encoding for file names. On OS X, I believe the "official" encoding is UTF8, but I don't know if you can work around it. Is Windows any different? If Windows does allow UTF8 encoded file name, then this is totally a non-issue. ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-20 16:09 Message: Logged In: YES user_id=1013807 s/g_build_filename/g_filename_from_utf8/g ;) ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-20 16:08 Message: Logged In: YES user_id=1013807 follow up on the previous post: ud in g_build_filename is given by gaim_user_dir which is ~given by g_get_home_dir which seem(not specified) to return in locale filename system charset and not in utf8. So the g_build_filename should be call on individual items I guess. ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-20 15:44 Message: Logged In: YES user_id=1013807 oh, we have a problem then because all the log functions are simply using g_build_filename to get the path with absolutely no call to convert the utf8 characters to system filename charset. Every log functions thus need to call g_filename_from_utf8. [code in html logger] dir = g_build_filename(ud, "logs", prpl, guy, name_clean, NULL); ... filename = g_build_filename(dir, date, NULL); ... data->file = fopen(filename, "a"); [/code in html logger] The only question, should g_filename_from_utf8 be call after g_build_filename or before on individual items? ---------------------------------------------------------------------- Comment By: Ka-Hing Cheung (bsponline) Date: 2004-08-20 11:59 Message: Logged In: YES user_id=159910 I don't think that's what you want, since if my reading of g_filename_from_utf8 is correct, you can still end up with a string with multi-byte characters. I think what you want to do is, instead of looping over the byte array, use g_utf8_next_char, g_utf8_get_char to check each characters. Since those 2 functions don't make sure the string is utf8, you should also make sure g_utf8_validate is called. ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-20 09:20 Message: Logged In: YES user_id=1013807 for the case of '/' I don't really know but since this log function is for any protocol there is not way to know if all possible protocols will always invalid the '/'. I have checked the glib for a possible "g_filename_convert_to_valid_filename" but it don't seem to exist. What I could do is to a g_filename_from_utf8 before cycling the filename. This way it should remove the problem with possible more than 1 char wide utf-8 characters. ---------------------------------------------------------------------- Comment By: Ka-Hing Cheung (bsponline) Date: 2004-08-19 12:16 Message: Logged In: YES user_id=159910 Is '/' even a valid screenname character? And something tells me that looping over the byte array this way isn't so friendly with multibyte characters, I think you should use glib's utf-8 functions for that. Also, please submut a unified diff (diff -u), it's much easier to read. ---------------------------------------------------------------------- Comment By: François Gagné (frenchfrog) Date: 2004-08-18 19:31 Message: Logged In: YES user_id=1013807 oops, didn't attach the diff ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=300235&aid=1011857&group_id=235 |