From: Steven J. S. <sj...@Ju...> - 2002-08-06 17:19:26
This is a little function I wrote to escape any character not contained
in the variable ok_chars. Characters not represented in that variable are
escaped to the corresponding html entities. For example, a space character
(ASCII 32) is converted to &#32;

It's VBscript. This is an app running on IIS, and the function is used
to cleanse data going into a SQL Server 2000 database.

First, do the & # and semicolon have any special meaning to SQL2K or
any of the other popular database engines? I'm thinking not - but I'm not
the expert here.

Second, do you think it's ok, given the purpose of the function, to
include the % as a valid character that will not be escaped?

Thanks

------------------------------------------------------------------------
function sanitize(string1)

    ok_chars = "1234567890!@%&_=+:,./abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    for i = 1 to len(string1)
        char=mid(string1,i,1)
        if instr(ok_chars,char)=0 then
            temp=temp&"&#" & asc(char) & ";"
        else
            temp=temp&char
        end if
    next
    sanitize=temp

end function
------------------------------------------------------------------------

--
Steve Sobol, CTO JustThe.net LLC, Mentor On The Lake, OH
From: Alex R. <al...@se...> - 2002-08-06 17:57:55
On Tue, 6 Aug 2002, Steven J. Sobol wrote:

> This is a little function I wrote to escape any character not contained
> in the variable ok_chars. Characters not represented in that variable are
> escaped to the corresponding html entities. For example, a space character
> (ASCII 32) is converted to &#32;

There's no canonicalization. Use of the asc() function assumes that the
char can be represented in 1 byte. Is that a valid assumption at all
times? Will you ever have input from non-ASCII char sets? What does the
asc() function return in those cases? Is that output safe?

> It's VBscript. This is an app running on IIS, and the function is used
> to cleanse data going into a SQL Server 2000 database.

I think you should just drop any chars that aren't allowed.

> First, do the & # and semicolon have any special meaning to SQL2K or
> any of the other popular database engines? I'm thinking not - but I'm not
> the expert here.
>
> Second, do you think it's ok, given the purpose of the function, to
> include the % as a valid character that will not be escaped?

I would be wary of "%", "!", "/", and "=". The "/" char is used as a
division operator in some SQL dialects, while "!" is used as negation.
"=" is obviously part of the SQL BNF, and should be disallowed. As for
"%", I dunno, but it just kinda strikes me as dangerous somehow.

> ------------------------------------------------------------------------
> function sanitize(string1)
>
>     ok_chars = "1234567890!@%&_=+:,./abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
>
>     for i = 1 to len(string1)
>         char=mid(string1,i,1)
>         if instr(ok_chars,char)=0 then
>             temp=temp&"&#" & asc(char) & ";"
>         else
>             temp=temp&char
>         end if
>     next
>     sanitize=temp
>
> end function
> ------------------------------------------------------------------------

Good Luck.

--
Alex Russell
al...@Se...
al...@ne...
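A sketch of the two concerns raised above, canonicalization and characters that do not fit in one byte, written in Python rather than VBScript so that it is self-contained and runnable. It assumes the input has already been decoded to text, uses Unicode NFC normalization as one possible canonical form, and emits the full code point instead of a single-byte value. The whitelist `OK_CHARS` is hypothetical: it is the thread's `ok_chars` with `&`, `%`, `!`, `/` and `=` removed. `&` is dropped so that input which already contains an entity cannot pass through unchanged.

```python
import unicodedata

# Hypothetical whitelist: the thread's ok_chars minus "&", "%", "!", "/" and "=".
OK_CHARS = frozenset(
    "1234567890@_+:,."
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
)


def sanitize(text: str) -> str:
    """Replace every character outside OK_CHARS with a decimal character reference."""
    # Canonicalize first (an assumption: NFC), so that equivalent spellings of
    # the same character produce the same output.
    text = unicodedata.normalize("NFC", text)
    # ord() returns the full Unicode code point, not a one-byte value.
    return "".join(ch if ch in OK_CHARS else f"&#{ord(ch)};" for ch in text)
```

With this sketch, `sanitize("It's time")` returns `It&#39;s&#32;time`, so the apostrophe survives in escaped form instead of being dropped.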
From: Steven J. S. <sj...@Ju...> - 2002-08-07 15:04:57
On Tue, 6 Aug 2002, Alex Russell wrote:

> On Tue, 6 Aug 2002, Steven J. Sobol wrote:
>
> > This is a little function I wrote to escape any character not contained
> > in the variable ok_chars. Characters not represented in that variable are
> > escaped to the corresponding html entities. For example, a space character
> > (ASCII 32) is converted to &#32;
>
> There's no canonicalization. Use of the asc() function assumes that the
> char can be represented in 1 byte.

I acknowledge that that is an issue.

> Is that a valid assumption at all
> times? Will you ever have input from non-ASCII char sets? What does the
> asc() function return in those cases? Is that output safe?

Not sure yet. I have to do some more research on the subject.

> > It's VBscript. This is an app running on IIS, and the function is used
> > to cleanse data going into a SQL Server 2000 database.
>
> I think you should just drop any chars that aren't allowed.

In practice, this code is going to be used on a script that allows the
site owner to post news items. What happens if they post something like
"It's time" ("it is time", the apostrophe is appropriate) and it comes out
as "Its time" (grammatically incorrect)? Not good... I'd rather keep the
dangerous characters there and escape them.

> > First, do the & # and semicolon have any special meaning to SQL2K or
> > any of the other popular database engines? I'm thinking not - but I'm not
> > the expert here.
> >
> > Second, do you think it's ok, given the purpose of the function, to
> > include the % as a valid character that will not be escaped?
>
> I would be wary of "%", "!", "/", and "=". The "/" char is used as a
> division operator in some SQL dialects, while "!" is used as negation.
> "=" is obviously part of the SQL BNF, and should be disallowed. As for
> "%", I dunno, but it just kinda strikes me as dangerous somehow.

Perhaps the fact that % is the wildcard character in ANSI SQL and most SQL
dialects is what is concerning you. And I probably should take all four of
those characters out. Question is how to allow the client to make posts
using those characters without opening up holes. HTML entities can handle
multi-byte characters, can't they? That's not the problem here.

--
Steve Sobol, CTO JustThe.net LLC, Mentor On The Lake, OH
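One common way to let posts contain characters such as `'`, `%`, `=` and `/` without opening holes is to pass the text to the database as a bound parameter instead of splicing it into the SQL string. The sketch below uses Python's `sqlite3` module purely because it is self-contained; the thread's setup is ASP with SQL Server 2000, where ADO's `Command` object and its parameters play the same role. The table name `news`, the column `headline`, and the helper names are hypothetical. Note that `%` and `_` are still wildcards inside a `LIKE` pattern even when bound, so a literal search has to escape them.

```python
import sqlite3


def make_db(headlines):
    """Create an in-memory database and insert each headline as a bound parameter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE news (headline TEXT)")
    # The "?" placeholder keeps the data out of the SQL text entirely.
    conn.executemany("INSERT INTO news (headline) VALUES (?)",
                     [(h,) for h in headlines])
    return conn


def escape_like(term, escape="\\"):
    """Escape LIKE wildcards so that the term is matched literally."""
    return (term.replace(escape, escape + escape)
                .replace("%", escape + "%")
                .replace("_", escape + "_"))


def count_matches(conn, term):
    """Count headlines that contain the term as literal text."""
    pattern = "%" + escape_like(term) + "%"
    row = conn.execute(
        "SELECT COUNT(*) FROM news WHERE headline LIKE ? ESCAPE '\\'",
        (pattern,),
    ).fetchone()
    return row[0]
```

Under these assumptions the stored text comes back exactly as the site owner typed it, and any escaping for HTML can be applied later, when the text is written into a page.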
From: Nik C. <ni...@ni...> - 2002-08-07 15:27:36
> Perhaps the fact that % is the wildcard character in ANSI SQL and most SQL
> dialects is what is concerning you. And I probably should take all four of
> those characters out. Question is how to allow the client to make posts
> using those characters without opening up holes. HTML entities can handle
> multi-byte characters, can't they? That's not the problem here.
>

Convert allowed non A-Z, a-z, 0-9 characters to their UTF-8 equivalents
and store them in that format.

UTF-8 RFC:
<http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2279.html>

In code:
<http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/locale/utf2.c>

-Nik
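For readers who want to see what the conversion involves, here is a minimal Python sketch of encoding one code point as UTF-8 by hand. It follows the bit layout described in the UTF-8 RFC, limited to the four-byte range that the later RFC 3629 allows; the function name is hypothetical. ASCII characters, the apostrophe included, encode to themselves, so this step on its own does not change how a SQL parser sees them.

```python
def utf8_encode_codepoint(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp!r}")
    if cp < 0x80:
        # 0xxxxxxx: ASCII is passed through unchanged.
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

In everyday Python the built-in `str.encode("utf-8")` does the same job; the manual version is only there to make the byte layout visible.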
From: Steven J. S. <sj...@Ju...> - 2002-08-07 15:49:49
On Thu, 8 Aug 2002, Nik Cubrilovic wrote:

> Convert allowed non A-Z, a-z, 0-9 characters to their UTF-8 equivalents
> and store them in that format.

Hm.

That will work.

I don't feel comfortable writing PHP/ASP code, so I'm going to look
for UTF-8 conversion functions. The snippet of C you pointed to is a good
start. I think PHP has built-in UTF-8 functionality, but I'm not sure
about VBScript.

Thanks.

--
Steve Sobol, CTO JustThe.net LLC, Mentor On The Lake, OH
From: Matt W. <wi...@ce...> - 2002-08-07 15:56:30
On Wednesday 07 August 2002 10:49, Steven J. Sobol wrote:
> On Thu, 8 Aug 2002, Nik Cubrilovic wrote:
> > Convert allowed non A-Z, a-z, 0-9 characters to their UTF-8 equivalents
> > and store them in that format.
>
> Hm.
>
> That will work.
>
> I don't feel comfortable writing PHP/ASP code, so I'm going to look
> for UTF-8 conversion functions. The snippet of C you pointed to is a good
> start. I think PHP has built-in UTF-8 functionality, but I'm not sure
> about VBScript.
>
> Thanks.

<?php
	$data = utf8_encode($_SERVER['data']);
?>

www.php.net/utf8_encode

-matt
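PHP's `utf8_encode` is documented as converting an ISO-8859-1 string to UTF-8; it does not detect other source encodings. A rough Python equivalent, assuming the incoming bytes really are ISO-8859-1 (the function name is hypothetical):

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Reinterpret ISO-8859-1 bytes as text and re-encode them as UTF-8."""
    # Every byte value is valid ISO-8859-1, so the decode step cannot fail;
    # whether the result is meaningful depends on the assumption above.
    return raw.decode("iso-8859-1").encode("utf-8")
```

For example, the single byte `0xE9` ("é" in ISO-8859-1) becomes the two bytes `0xC3 0xA9`, while plain ASCII input is returned unchanged.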
From: Steven J. S. <sj...@Ju...> - 2002-08-07 17:00:15
On Wed, 7 Aug 2002, Matt Wirges wrote:

> >
> > Thanks.
>
> <?php
> 	$data = utf8_encode($_SERVER['data']);
> ?>

Looks like it requires XML support...

--
Steve Sobol, CTO JustThe.net LLC, Mentor On The Lake, OH
From: Steven J. S. <sj...@Ju...> - 2002-08-11 02:42:44
On Wed, 7 Aug 2002, Steven J. Sobol wrote:

> > > Thanks.
> >
> > <?php
> > 	$data = utf8_encode($_SERVER['data']);
> > ?>
>
> Looks like it requires XML support...

I do know enough about extending PHP to be able to yank those functions
and place them into a dynamically loaded PHP extension, though. What I
really want to find is a pointer to a COM object that'll do the same
for ASP.

If someone knows of one, lemme know. Otherwise I'll search for one, and
post here when I find it.

--
Steve Sobol, CTO JustThe.net LLC, Mentor On The Lake, OH