Re: [sqlmap-users] Back-end DBMS charset encoding
From: Miroslav S. <mir...@gm...> - 2011-01-19 14:26:00
Hi all.

As I was really interested in this issue, I had to set up a testing environment to find out what's going on :)))

I chose the simplest (disposable) testing environment:

XAMPP
two tables: users_utf8 & users_latin
two vulnerable GET pages: get_int_utf8.php & get_int_latin.php

Well, here is the conclusion and my answer to the given question, "What should be the general "consensus" for data retrieval": the encoding of the web page takes priority over all other charsets, for three reasons:

1) The connection from the web server to the backend DBMS will almost certainly be set to a charset "compatible" with the one used by the page itself. That means all data going from the DBMS to the web server is automatically converted to the connection's charset.

2) When the web server replies with the data, characters that are not compatible with its current character set will in most cases simply be replaced with '?' (as in the latin1 -> utf8 case). That is a big screw-up for our data in the "error" and "union" techniques, because the data is irreversibly lost.

3) Finding out the "proper" collation is futile, in the sense that in MySQL, for example, you can set a collation on everything (column, table, connection, user, ...), and there is no "magic" bullet for learning the final collation of the retrieved data in a "time constrained" manner.

An interesting thing that should be pointed out: you'll most probably have problems with the character sets of retrieved data here and there, for one obvious reason. The web page's connection to the backend DBMS dictates the character set used for retrieved data, and in SQL injection attacks we "violently" use that connection against different tables with different character sets/collations, which were most probably not "meant" to be "compatible" with the web page itself. Hence you'll lose information irreversibly during the conversion process.
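To make reasons 1) and 2) concrete, here is a minimal Python sketch. It is only an illustration, not sqlmap code: the helper names `through_connection` and `decode_as_page` are hypothetical, and the DBMS-side conversion is modelled (as an assumption) with Python's `errors="replace"` handler, which substitutes '?' for characters the target charset cannot represent.

```python
def through_connection(text, connection_charset):
    """Simulate the DBMS converting stored text to the connection charset.

    Assumption: unrepresentable characters become '?', modelled here
    with Python's 'replace' error handler.
    """
    return text.encode(connection_charset, errors="replace")


def decode_as_page(raw, page_charset):
    """Decode bytes taken from the HTTP response using the page charset."""
    return raw.decode(page_charset, errors="replace")


stored = "Здравей"  # sample Cyrillic text, as if stored in a utf8 table

# Connection and page both use cp1251: the text survives the round trip.
ok = decode_as_page(through_connection(stored, "cp1251"), "cp1251")

# Connection charset cannot represent Cyrillic (latin1): every character
# is replaced with '?' before it ever reaches us, so nothing can be recovered.
lost = decode_as_page(through_connection(stored, "latin1"), "latin1")

# Blindly assuming utf8 (option A) when the page is really cp1251:
# the bytes are intact, but decoding them with the wrong charset garbles them.
guessed = decode_as_page(through_connection(stored, "cp1251"), "utf-8")

print(ok)                 # Здравей
print(lost)               # ???????
print(guessed == stored)  # False
```

The second case is the irreversible one: the '?' characters are already in the response, so no choice of decoding on the client side brings the original text back. The third case is recoverable, but only by decoding with the page's charset.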
kr

On Tue, Jan 18, 2011 at 12:13 PM, mitchell <mit...@tu...> wrote:
> Will do :)
>
> # mitchell
>
> On 18 Jan 2011 13:11, "Miroslav Stampar" <mir...@gm...> wrote:
>> hi mitchell.
>>
>> thank you for your answer. i thought that nobody would :)
>>
>> we've done some serious work these days in this field and would like
>> to have it "stabilized". plz report any "strange" behavior in this
>> field if you encounter it.
>>
>> kr
>>
>> On Tue, Jan 18, 2011 at 12:01 PM, mitchell <mit...@tu...> wrote:
>>> Hi Miroslav,
>>>
>>> In say 80% of the cases I dealt with Bulgarian sites, the data in the
>>> database used the same encoding as the encoding announced on the webpage,
>>> usually CP-1251. The rest use UTF.
>>>
>>> # mitchell
>>>
>>> On 17 Jan 2011 16:52, "Miroslav Stampar" <mir...@gm...> wrote:
>>>> Hi all.
>>>>
>>>> I have a general question to all those pentesters that are retrieving
>>>> data from sites with "funny" charset encodings (...russian, chinese...).
>>>>
>>>> What should be the general "consensus" for data retrieval:
>>>>
>>>> A) assume that the backend DBMS uses the "utf8" charset encoding
>>>> or
>>>> B) treat data retrieved with the same encoding as used in the page
>>>> or
>>>> C) find out the proper collation used and use that one? (i am not a fan
>>>> of this one :)
>>>> or
>>>> D) don't care (some people tend to use mixed collations which is quite
>>>> romantic)
>>>>
>>>> Also, I would like to ask you all to try out the latest revision with
>>>> cases that could be problematic and report impressions.
>>>>
>>>> Kind regards
>>>
>>
>> --
>> Miroslav Stampar
>>
>> E-mail / Jabber: miroslav.stampar (at) gmail.com
>> Mobile: +385921010204 (HR 0921010204)
>> PGP Key ID: 0xB5397B1B
>> Location: Zagreb, Croatia
>

--
Miroslav Stampar

E-mail / Jabber: miroslav.stampar (at) gmail.com
Mobile: +385921010204 (HR 0921010204)
PGP Key ID: 0xB5397B1B
Location: Zagreb, Croatia