Re: [mod-security-users] Odd encoding of <script>
From: Achim H. <web...@se...> - 2010-07-21 16:32:00
On Wed, 21 Jul 2010, MARTIN, JASON (ATTSI) wrote:

!! Thank you, that exactly explains how this was able to affect the
!! browser. If we know that the expected charset is iso-8859-1,

yes, it's a mismatch between the charset in the HTTP header and the
charset used in the HTTP body (aka HTML content).

!! is there a way to detect it?

hmm, first problem is that you need to tell your developers to talk to
your server administrators: they *have to* agree on the same charset ;-)
Then you can search for improperly url-encoded chars according to that
agreement.
The biggest problem will be your developers, I guess ...

!! On Tue, 20 Jul 2010, Brian Rectanus wrote:
!!
!! !! On 07/20/2010 09:59 AM, MARTIN, JASON (ATTSI) wrote:
!! !! > Hello, I am seeing that %EF%BC%A2%EA%A8%BE%EF%BC%BCscript%EA%A8%BE is
!! !! > somehow translated to <script> when decoded by a browser. The
!! !! > characters all map to high-ascii, but I don't see how the browser would
!! !! > interpret that as a valid <script> tag yet it does. Has anyone seen
!! !! > that before?
!! !! >
!! !! > Thank you,
!! !! > -Jason Martin
!! !!
!! !! This looks like UTF-8, not ascii. There are 3 characters before
!! !! "script" and one after.
!! !!
!! !! 0xEFBCA2 = U+FF22: FULLWIDTH LATIN CAPITAL LETTER B (ascii B)
!! !! 0xEAA8BE = U+AA3E: UNKNOWN CHARACTER
!! !! 0xEFBCBC = U+FF3C: FULLWIDTH REVERSE SOLIDUS (ascii \)
!! !! script   = ASCII String
!! !! 0xEAA8BE = U+AA3E: UNKNOWN CHARACTER
!! !!
!! !! Not sure how that is interpreted by the browser as you did not say
!! !! which one on which platform :)
!!
!! if we do:
!!
!!   UTF-8-decode(url-decode("%EF%BC%A2%EA%A8%BE%EF%BC%BCscript%EA%A8%BE"))
!!
!! we get what Brian described. If you then look at the hex representation
!! of the result, you see:
!!   ff22 aa3e ff3c 73 63 72 69 70 74 aa3e
!!
!! Depending on the used filter/sanitation/transformation and/or best-fit
!! mapping we may finally get:
!!   22 3e 3c 73 63 72 69 70 74 3e
!! which is
!!   "><script>
!!
!! Any more doubts now?
!!
!! I guess it's the same as:
!!   %ea%88%a2%22%eb%b8%be%3e%eb%b0%bc%3c
!!
!! Will be a challenge to detect this without knowing what charsets are in
!! use.
!!
!! ;-) Achim
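
The decoding steps quoted above can be reproduced with a minimal Python
sketch. It assumes the lossy step is a conversion that keeps only the low
byte of each code point; the thread does not say which filter or best-fit
mapping was actually involved, so low_byte_fold is a hypothetical stand-in.
outside_latin1 is likewise only one illustrative way to flag such input
once iso-8859-1 has been agreed on as the charset.

```python
from urllib.parse import unquote_to_bytes

PAYLOAD = "%EF%BC%A2%EA%A8%BE%EF%BC%BCscript%EA%A8%BE"


def decode_utf8(url_encoded):
    """URL-decode to raw bytes, then interpret those bytes as UTF-8."""
    return unquote_to_bytes(url_encoded).decode("utf-8")


def low_byte_fold(text):
    """Keep only the low 8 bits of each code point.

    Assumption: this models a lossy charset conversion; real filters or
    best-fit tables may behave differently.
    """
    return "".join(chr(ord(ch) & 0xFF) for ch in text)


def outside_latin1(url_encoded):
    """List decoded code points that iso-8859-1 cannot represent."""
    return [f"U+{ord(ch):04X}" for ch in decode_utf8(url_encoded) if ord(ch) > 0xFF]


decoded = decode_utf8(PAYLOAD)
code_points = [f"{ord(ch):x}" for ch in decoded]  # hex view of the decoded string
folded = low_byte_fold(decoded)                   # what a lossy conversion could emit
suspicious = outside_latin1(PAYLOAD)              # candidates for a detection rule

print(" ".join(code_points))
print(folded)
print(suspicious)
```

With the payload from this thread, the hex view matches the
"ff22 aa3e ff3c ..." line quoted above, and the folded string is the
"><script> sequence. Any non-empty result from outside_latin1 would be a
reason to look more closely at a request when the application only expects
iso-8859-1.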