Re: [sqlmap-users] Back-end DBMS charset encoding

hi all.

as i was really interested in this issue, i set up a testing
environment to find out what's going on :)))

i've chosen the simplest (disposable) testing environment: XAMPP

two tables: users_utf8 & users_latin
two vulnerable GET pages: get_int_utf8.php & get_int_latin.php

well, the conclusion and my answer to the given question ("What should
be the general "consensus" for data retrieval"):

the encoding of the web page takes priority over all other charsets,
for three reasons:

1) the connection from the web server to the backend DBMS will almost
certainly be set to a charset "compatible" with the one of the page
itself - that means that all the data from the DBMS to the web server
will be automatically converted to the connection's charset
2) once the web server has replied with the data, in case the data is
not compatible with its current character set, it will in most cases
just do a simple replacement with '?' for problematic characters
(like in the latin1 -> utf8 case) - which means a big screw up for
our data in "error" and "union" techniques, as the data is
irreversibly lost
3) finding out the "proper" collation is futile in the sense that in
MySQL, for example, you can set a collation on everything (column,
table, connection, user, ...), and there is no "magic bullet" to know
the final collation of the retrieved data in a "time constrained" manner.
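
to illustrate point 2, here is a minimal python sketch of a lossy
charset conversion. it is only an analogy for what happens between the
DBMS and the web server, not the actual code of MySQL or sqlmap; the
helper name convert_for_connection and the sample value are made up
for the example.

```python
def convert_for_connection(text: str, connection_charset: str) -> bytes:
    """Hypothetical helper: convert text to the connection's charset.

    Characters that the target charset cannot represent are replaced
    with '?', which mimics the lossy replacement described above.
    """
    return text.encode(connection_charset, errors="replace")


original = "Петар"  # made-up Cyrillic value stored in some table

# Connection charset that cannot represent Cyrillic: every char becomes '?'
lossy = convert_for_connection(original, "latin-1")
recovered_lossy = lossy.decode("latin-1")

# Connection charset that can represent Cyrillic: the round trip is clean
clean = convert_for_connection(original, "cp1251")
recovered_clean = clean.decode("cp1251")

print(lossy)            # the original characters are gone
print(recovered_clean)  # identical to the original value
```

once the '?' bytes are on the wire, no choice of decoding on the client
side can bring the original characters back.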

an interesting thing to point out is that you'll most probably have
problems with the character sets of retrieved data here and there, for
one obvious reason:
the web page's connection to the backend DBMS dictates the character
set used for retrieved data. in sql injection attacks we "violently"
use that connection for different tables with different character
sets/collations, which were most probably not "meant" to be
"compatible" with the web page itself, hence you'll lose information
irreversibly during the conversion process.
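
a rough python sketch of the "use the page's encoding" approach could
look like this. it is only an illustration under my own assumptions,
not how sqlmap implements it: the function names page_charset and
decode_retrieved, the utf-8 fallback and the sample header are all
hypothetical.

```python
import re


def page_charset(content_type: str, body: bytes, default: str = "utf-8") -> str:
    """Guess the page charset from the Content-Type header, then from a
    <meta> tag in the body; fall back to `default` (an assumption)."""
    match = re.search(r"charset=([\w-]+)", content_type, re.I)
    if match:
        return match.group(1)
    meta = re.search(rb"<meta[^>]+charset=[\"']?([\w-]+)", body, re.I)
    if meta:
        return meta.group(1).decode("ascii")
    return default


def decode_retrieved(raw: bytes, content_type: str, body: bytes = b"") -> str:
    """Decode bytes retrieved through the injection using the page's
    charset; undecodable bytes are replaced instead of raising."""
    return raw.decode(page_charset(content_type, body), errors="replace")


# made-up example: a page announcing CP-1251, as on many Bulgarian sites
header = "text/html; charset=windows-1251"
raw_value = "Иван".encode("cp1251")
print(decode_retrieved(raw_value, header))
```

the header is checked before the <meta> tag here only to keep the
sketch short; which of the two should win when they disagree is a
separate question.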

kr

On Tue, Jan 18, 2011 at 12:13 PM, mitchell <mit...@tu...> wrote:
> Will do :)
>
> # mitchell
>
> On 18 Jan 2011 13:11, "Miroslav Stampar" <mir...@gm...> wrote:
>> hi mitchell.
>>
>> thank you for your answer. i thought that nobody would :)
>>
>> we've done some serious work these days in this field and would like
>> to have it "stabilized". plz report any "strange" behavior in this
>> field if you encounter it.
>>
>> kr
>>
>> On Tue, Jan 18, 2011 at 12:01 PM, mitchell <mit...@tu...> wrote:
>>> Hi Miroslav,
>>>
>>> In say 80% of the cases I dealt with Bulgarian sites, the data in the
>>> database used the same encoding as the encoding announced on the webpage,
>>> usually CP-1251. The rest use UTF.
>>>
>>> # mitchell
>>>
>>> On 17 Jan 2011 16:52, "Miroslav Stampar" <mir...@gm...>
>>> wrote:
>>>> Hi all.
>>>>
>>>> I have a general question to all those pentesters that are retrieving
>>>> data from sites with "funny" charset encodings (...russian, chinese...).
>>>>
>>>> What should be the general "consensus" for data retrieval:
>>>>
>>>> A) assume that the backend DBMS uses the "utf8" charset encoding
>>>> or
>>>> B) treat data retrieved with the same encoding as used in the page
>>>> or
>>>> C) find out the proper collation used and use that one? (i am not a
>>>> fan of this one :)
>>>> or
>>>> D) don't care (some people tend to use mixed collations which is
>>>> quite romantic)
>>>>
>>>> Also, I would like to ask you all to try out the latest revision with
>>>> cases that could be problematic and report impressions.
>>>>
>>>> Kind regards
>>>
>>
>>
>>
>> --
>> Miroslav Stampar
>>
>> E-mail / Jabber: miroslav.stampar (at) gmail.com
>> Mobile: +385921010204 (HR 0921010204)
>> PGP Key ID: 0xB5397B1B
>> Location: Zagreb, Croatia
>

-- 
Miroslav Stampar

E-mail / Jabber: miroslav.stampar (at) gmail.com
Mobile: +385921010204 (HR 0921010204)
PGP Key ID: 0xB5397B1B
Location: Zagreb, Croatia