[Gaim-patches] [ gaim-Patches-1011857 ] [patch] bug 929510 no log if name have invalid chars

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Patches item #1011857, was opened at 2004-08-18 19:27
Message generated for change (Comment added) made by thekingant
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=300235&aid=1011857&group_id=235

Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: François Gagné (frenchfrog)
>Assigned to: Tim Ringenbach (marv_sf)
Summary: [patch] bug 929510 no log if name have invalid chars

Initial Comment:
[ 929510 ] no log if screen name have invalid file
system characters

the patch add a new function that filter invalid
windows or linux characters and replace them with %xx
where xx is the hex of the char also replace % with the
the good %xx.

----------------------------------------------------------------------

>Comment By: Mark Doliner (thekingant)
Date: 2004-11-23 20:21

Message:
Logged In: YES 
user_id=20979

This is fixed in CVS, right?

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-09-11 20:46

Message:
Logged In: YES 
user_id=1013807

checking util.c/1.325 Revision 1.325 Thu Sep 2 19:45:15 2004
UTC and here my conclusion:

It escape anything that is not a-zA-Z0-9@-_.#

If you got a filename in UTF-8 outside of the characters
above all characters will get escaped even if they are valid
in the system filename encoding charset (no use of
g_filename_from_utf8). On a side not its not escaping the
invalid windows files.

It's works, but it's not really i18n friendly.

----------------------------------------------------------------------

Comment By: Ka-Hing Cheung (bsponline)
Date: 2004-09-11 12:53

Message:
Logged In: YES 
user_id=159910

I think datallah's patch didn't handle all those special
win32 file names like "CON" and what not?

----------------------------------------------------------------------

Comment By: Ethan Blanton (eblanton)
Date: 2004-09-11 11:58

Message:
Logged In: YES 
user_id=298616

I think datallah fixed the problem, although not with this
patch...  The solution was, however, quite similar.

Ethan

----------------------------------------------------------------------

Comment By: Luke Schierer (lschiere)
Date: 2004-09-11 11:33

Message:
Logged In: YES 
user_id=28833

where do we stand on this patch? 

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-09-01 00:26

Message:
Logged In: YES 
user_id=1013807

4) k you're right, I'm not use use to if(open || die) kind
of thing
but still need to change line 203: unsigned char *nonhex;
to: unsigned char *nonhex = NULL;

5) not necessarily "superior solution" but more compliant.

----------------------------------------------------------------------

Comment By: Ethan Blanton (eblanton)
Date: 2004-08-31 23:59

Message:
Logged In: YES 
user_id=298616

1) The committed solution is fine.  No byte in a UTF-8
string will look like an ASCII character unless it _is_ an
ASCII character.

2) This should be addressed.

3) The old logs will never be escaped, so that line should
_not_ be changed.

4) The logic prevents this from being a problem -- we know
*c is a non-NULL character, and if *(c + 1) or *(c + 2) is
NULL it will short-circuit the condition and fail correctly.

5) I suppose these should be fixed, but never again
releasing a build for Windows would be a superior solution.

6) Agreed.

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-31 23:40

Message:
Logged In: YES 
user_id=1013807

checking the patch applied to log.c/1.102/Tue Aug 31
18:28:35 2004. And here is what I have found:

1)doing a while(*c) on a UTF-8 string is probably bad (since
one unicode char could be more then one byte), should use
g_utf8_get_char, g_utf8_find_next_char and with a
g_utf8_validate at the beginning

2)should convert the UTF8 string to filename encoding with
g_filename_from_UTF8 by also paying attention to the fact
that g_filename_from_UTF8 could fails if some UTF8 chars
could not be converted to the filename encoding charset.

3)in function old_logger_list line 910:
	char *logfile = g_strdup_printf("%s.log",
gaim_normalize(account, sn));

in function old_logger_total_size line 1031:
	char *logfile = g_strdup_printf("%s.log",
gaim_normalize(account, name));

logfile is not escaped

4) possible segfault in unescape_filename line 182:
"if (*(c + 1) && *(c + 2)) {" could poke char outside de string

also possible g_free of a random address since
gaim_base16_decode could return before changing the value of
the return char.

line 203: unsigned char *nonhex;
by: unsigned char *nonhex = NULL;

or add the following line in util.c : gaim_base16_decode
line 91: *raw = NULL;

5) does not check for invalid filename on win32: CON, PRN,
AUX, NUL, COM0-9, LPT0-9, CON.TXT (true for lower, upper and
mixed case)

6) could be a good idea to escape any chars below 0x20

7) could be a idea to check the last file i have attached to
this thread and perhaps do a mix of both solutions.

----------------------------------------------------------------------

Comment By: Luke Schierer (lschiere)
Date: 2004-08-31 20:52

Message:
Logged In: YES 
user_id=28833

was this fixed by the patch from datallah i accepted earlier
today? 

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-26 15:44

Message:
Logged In: YES 
user_id=1013807

ok the patch against 0.82

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-24 09:35

Message:
Logged In: YES 
user_id=1013807

ok then, this is patch is fine. Would just be nicer a add a
new function in gaim util.c/util.h to convert utf-8 to
system filename charset _with_ fall back (unlike
g_filename_from_utf8).

----------------------------------------------------------------------

Comment By: Ka-Hing Cheung (bsponline)
Date: 2004-08-22 19:01

Message:
Logged In: YES 
user_id=159910

Oh, I just found out that part of a utf-8 multi-byte
characters cannot be ASCII, so one less thing to worry about.

----------------------------------------------------------------------

Comment By: Ka-Hing Cheung (bsponline)
Date: 2004-08-21 03:25

Message:
Logged In: YES 
user_id=159910

Someone should make our i18n guru to come here to say
something, you know, that Blanton kid.

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-21 01:53

Message:
Logged In: YES 
user_id=1013807

dunno but right now g_filename_from_utf8 fails, so i guess
the right way would be to check each chars one by one and
replace the invalid with %xxxxxxxx or another char.

BTW extended ASCII is 8-bit wide chars but its not the same
mapping to most os, (windows, mac OSX, linux, sun) so its
pretty much a big mess. and its not utf8 compatible either...

----------------------------------------------------------------------

Comment By: Ka-Hing Cheung (bsponline)
Date: 2004-08-21 00:48

Message:
Logged In: YES 
user_id=159910

Isn't extended ASCII just 8-bit wide chars? What should
happen if part of a multi-byte UTF-8 character is not a
legal filesystem character?

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-20 23:08

Message:
Logged In: YES 
user_id=1013807

ok further testing revealed:

1) g_filename_from_utf8 fails if its not capable of
converting a uft8 char to the system filename charset.

2) if you give a utf8 string with characters wider than 1 to
windows its simply take each byte one by one and take them
as extended ascii.

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-20 22:25

Message:
Logged In: YES 
user_id=1013807

ok fixed a minor issue and did the diff the good way around :P

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-20 21:11

Message:
Logged In: YES 
user_id=1013807

ok 1) g_build_filename seem to only put separators between
the strings you pass to it (replacing a big mess of strcpy,
malloc, ex).

2) filenames in windows seem to be in extended ascii thus
any characters outside ascii have to be converted or
replaced (since gaim, glib, GTK are working with UTF-8
internaly). ex: 'é' (&eacute;).

3) it don't solve the whole issue about characters not
accepted in filenames.

----------------------------------------------------------------------

Comment By: Ka-Hing Cheung (bsponline)
Date: 2004-08-20 21:11

Message:
Logged In: YES 
user_id=159910

I think you made a patch with the - become the + ;-)
Switching the two would be cool.

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-20 21:01

Message:
Logged In: YES 
user_id=1013807

New patch submitted with diff -u, UTF-8 compliant

----------------------------------------------------------------------

Comment By: Ka-Hing Cheung (bsponline)
Date: 2004-08-20 20:54

Message:
Logged In: YES 
user_id=159910

I don't claim to know much about i18n, but I would assume
that g_build_filename already returns something that's
suitable as a filename?
Anyhow, I don't understand what g_filename_from_utf8 is for.
On linux, a file name is just a bitstream, and there's no
single encoding for file names. On OS X, I believe the
"official" encoding is UTF8, but I don't know if you can
work around it. Is Windows any different?
If Windows does allow UTF8 encoded file name, then this is
totally a non-issue.

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-20 16:09

Message:
Logged In: YES 
user_id=1013807

s/g_build_filename/g_filename_from_utf8/g ;)

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-20 16:08

Message:
Logged In: YES 
user_id=1013807

follow up on the previous post: ud in g_build_filename is
given by gaim_user_dir which is ~given by g_get_home_dir
which seem(not specified) to return in locale filename
system charset and not in utf8.

So the g_build_filename should be call on individual items I
guess.

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-20 15:44

Message:
Logged In: YES 
user_id=1013807

oh, we have a problem then because all the log functions are
simply using g_build_filename to get the path with
absolutely no call to convert the utf8 characters to system
filename charset. Every log functions thus need to call
g_filename_from_utf8.
[code in html logger]
dir = g_build_filename(ud, "logs",
	       prpl, guy, name_clean, NULL);

...
filename = g_build_filename(dir, date, NULL);
...	
data->file = fopen(filename, "a");
[/code in html logger]

The only question, should g_filename_from_utf8 be call after
g_build_filename or before on individual items?

----------------------------------------------------------------------

Comment By: Ka-Hing Cheung (bsponline)
Date: 2004-08-20 11:59

Message:
Logged In: YES 
user_id=159910

I don't think that's what you want, since if my reading of
g_filename_from_utf8 is correct, you can still end up with a
string with multi-byte characters.
I think what you want to do is, instead of looping over the
byte array, use g_utf8_next_char, g_utf8_get_char to check
each characters. Since those 2 functions don't make sure the
string is utf8, you should also make sure g_utf8_validate is
called.

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-20 09:20

Message:
Logged In: YES 
user_id=1013807

for the case of '/' I don't really know but since this log
function is for any protocol there is not way to know if all
possible protocols will always invalid the '/'. 

I have checked the glib for a possible
"g_filename_convert_to_valid_filename" but it don't seem to
exist. What I could do is to a g_filename_from_utf8 before
cycling the filename. This way it should remove the problem
with possible more than 1 char wide utf-8 characters.

----------------------------------------------------------------------

Comment By: Ka-Hing Cheung (bsponline)
Date: 2004-08-19 12:16

Message:
Logged In: YES 
user_id=159910

Is '/' even a valid screenname character? And something
tells me that looping over the byte array this way isn't so
friendly with multibyte characters, I think you should use
glib's utf-8 functions for that.
Also, please submut a unified diff (diff -u), it's much
easier to read.

----------------------------------------------------------------------

Comment By: François Gagné (frenchfrog)
Date: 2004-08-18 19:31

Message:
Logged In: YES 
user_id=1013807

oops, didn't attach the diff

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=300235&aid=1011857&group_id=235

[Gaim-patches] [ gaim-Patches-1011857 ] [patch] bug 929510 no log if name have invalid chars

A universal instant messaging (IM) program

[Gaim-patches] [ gaim-Patches-1011857 ] [patch] bug 929510 no log if name have invalid chars