Re: [sqlmap-users] Back-end DBMS charset encoding


i am sending you all an update regarding this field, together with some screenshots.

for example:
"latin for latin blind" means: blind inference was used to retrieve
latin data via a latin page (with a latin connection to the backend DBMS)
"latin for utf8 error" means: the error-based approach was used to retrieve
utf8 data via a latin page (with a latin connection to the backend DBMS)
"utf8 for latin error" means: the error-based approach was used to retrieve
latin data via a utf8 page (with a utf8 connection to the backend DBMS)

...

all data that you see as '???' was lost irreversibly. this happens when
utf8 data is retrieved via a latin connection/page, because the two are
inherently incompatible (there is nothing we can do in those cases, as
the connection charset is hard coded in the web page's code - like
"mysql_set_charset("latin1", $link)").
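to illustrate why the '???' data can't be recovered, here is a minimal sketch in Python. it is not sqlmap code: the helper name through_latin1_connection is hypothetical, and Python's "replace" error handler is only a stand-in for the conversion the DBMS performs when the connection charset can't represent a character.

```python
# -*- coding: utf-8 -*-
# Sketch only: simulates (does not reproduce) what happens to text that
# cannot be represented in a latin1 connection charset.

def through_latin1_connection(text):
    """Hypothetical helper: convert text to latin1 the lossy way.

    Characters that latin1 cannot represent are replaced with '?',
    which is used here as a stand-in for the server-side conversion.
    """
    return text.encode("latin-1", errors="replace").decode("latin-1")

cyrillic_a = u"\u0434\u0430\u043d\u043d\u044b\u0435"  # Cyrillic word, 6 letters
cyrillic_b = u"\u0441\u0442\u0440\u043e\u043a\u0430"  # a different Cyrillic word, 6 letters
western = u"caf\u00e9"                                # fits into latin1

lost_a = through_latin1_connection(cyrillic_a)
lost_b = through_latin1_connection(cyrillic_b)
kept = through_latin1_connection(western)

print(lost_a)            # ??????
print(lost_a == lost_b)  # True: two different values became identical
print(kept == western)   # True: latin1-compatible data survives intact
```

since two different original values end up as the same '??????' string, no amount of re-decoding on the client side can tell them apart afterwards.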

so, all in all sqlmap is doing a great job right now in this field :)

p.s. there was a "really nasty" problem when the -o switch was used
(the --null-connection part): the page encoding was simply reset to 'utf8',
which could lead to messy results. fixed in the last commit.

On Tue, Jan 18, 2011 at 12:11 PM, Miroslav Stampar
<mir...@gm...> wrote:
> hi mitchell.
>
> thank you for your answer. i thought that nobody would :)
>
> we've done some serious work these days in this field and would like
> to have it "stabilized". plz report any "strange" behavior in this
> field if you encounter it.
>
> kr
>
> On Tue, Jan 18, 2011 at 12:01 PM, mitchell <mit...@tu...> wrote:
>> Hi Miroslav,
>>
>> In, say, 80% of the cases where I dealt with Bulgarian sites, the data in the
>> database used the same encoding as the encoding announced on the webpage,
>> usually CP-1251. The rest use UTF.
>>
>> # mitchell
>>
>> On 17 Jan 2011 16:52, "Miroslav Stampar" <mir...@gm...> wrote:
>>> Hi all.
>>>
>>> I have a general question to all those pentesters that are retrieving data
>>> from sites with "funny" charset encodings (...russian, chinese...).
>>>
>>> What should be the general "consensus" for data retrieval:
>>>
>>> A) assume that the backend DBMS uses the "utf8" charset encoding
>>> or
>>> B) treat data retrieved with the same encoding as used in the page
>>> or
>>> C) find out the proper collation used and use that one? (i am not a fan of
>>> this one :)
>>> or
>>> D) don't care (some people tend to use mixed collations which is quite
>>> romantic)
>>>
>>> Also, I would like to ask you all to try out the latest revision with cases
>>> that could be problematic and report impressions.
>>>
>>> Kind regards
>>
>
>
>
> --
> Miroslav Stampar
>
> E-mail / Jabber: miroslav.stampar (at) gmail.com
> Mobile: +385921010204 (HR 0921010204)
> PGP Key ID: 0xB5397B1B
> Location: Zagreb, Croatia
>
