MySQL for Python / Discussion / Help: Unicode from Web to MySQL

I'm trying to grab a document off the Web and toss it
into a MySQL database, but I keep running into
various encoding problems with Unicode (which aren't
a problem for me with GB2312, Big5, etc.)

What I'd like is something as simple as:

CREATE TABLE junk (junklet VARCHAR(2500) CHARACTER SET utf8);

import MySQLdb, re, urllib

data = urllib.urlopen('http://localhost/test.html').read()

data2 = ???
...
c.execute(''' INSERT INTO junk ( junklet) VALUES ( '%s') ''' % data2 )

where data2 is somehow the UTF-8 converted version of the original Web page.
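A minimal sketch of one way to get data2, written in Python 3 (the code above is Python 2, where urllib.urlopen(...).read() returns a byte string): decode the raw bytes with the charset the server declares, and only then encode to UTF-8. The helper name page_to_utf8 and the fall-back to UTF-8 when no charset is declared are assumptions for illustration; pages that declare their encoding only in a <meta> tag are not handled here.

```python
from email.message import Message


def page_to_utf8(raw: bytes, headers: Message) -> bytes:
    """Hypothetical helper: re-encode a fetched page as UTF-8.

    Uses the charset from the Content-Type header if there is one,
    otherwise assumes UTF-8 (an assumption; <meta charset> is ignored).
    """
    charset = headers.get_content_charset() or "utf-8"
    text = raw.decode(charset, errors="replace")  # bytes -> str (Unicode)
    return text.encode("utf-8")                   # str -> UTF-8 bytes


# Usage: simulate a GB2312 page together with its response header.
hdrs = Message()
hdrs["Content-Type"] = "text/html; charset=gb2312"
raw_page = "\u4e2d\u6587".encode("gb2312")
data2 = page_to_utf8(raw_page, hdrs)
print(data2.decode("utf-8"))
```

With a live page, the response from urllib.request.urlopen(url) exposes its headers as an http.client.HTTPMessage, which subclasses email.message.Message, so resp.headers can be passed in directly.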

Additionally, I'd like to be able to do:

body_expr = re.compile('''<body>(.*)</body>''')

data = urllib.urlopen('http://localhost/test.html').read()

main_body = body_expr.search(data).group(1)

and insert that into the database, and most likely I need to ...
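Extracting the body is simpler once the page has been decoded, because the regex then runs on text rather than raw bytes. A sketch in Python 3, assuming the goal is the content between the <body> and </body> tags; the names BODY_EXPR and extract_body are illustrative. re.DOTALL lets "." cross newlines (without it, "(.*)" stops at the first line break), and re.IGNORECASE also matches <BODY>.

```python
import re

# In Python 3, str patterns are Unicode-aware by default, so (?u) is not needed.
# <body[^>]*> also accepts attributes such as <body class="...">.
BODY_EXPR = re.compile(r"<body[^>]*>(.*)</body>", re.DOTALL | re.IGNORECASE)


def extract_body(html: str) -> str:
    """Return the text between <body ...> and </body>, or '' if absent."""
    match = BODY_EXPR.search(html)
    return match.group(1) if match else ""


page = "<html>\n<BODY class='x'>\nMake \u0633\u0644\u0627\u0645, not war\n</body></html>"
print(repr(extract_body(page)))
```

Returning an empty string instead of calling .group(1) on None avoids an AttributeError on pages without a body tag; a real HTML parser is more robust than a regex for messy markup.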

I'm sitting with a dozen explanations from the Web of how to do this:
0) decode('utf-8', 'ignore') (or 'strict', or 'replace')
1) re.compile('''(?u)''', re.UNICODE | re.IGNORECASE | re.MULTILINE | re.DOTALL)
2) convert to unicode before converting to UTF-8
3) escape quotation marks within the SQL statement: data2.replace(u'"', u'\\"')

etc., etc., but after numerous attempts I still keep getting either SQL errors or
the dreaded "'ascii' codec can't decode byte ... in position ..." errors.
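The 'ascii' error means some bytes were decoded with a codec that cannot represent them; in Python 2 this happens implicitly whenever a byte string is mixed with a unicode string, because the default codec is ASCII. The error handlers from point 0 only decide what happens to bytes the chosen codec cannot decode; they do not choose the right codec. A small Python 3 illustration, using made-up sample bytes (Latin-1 text that is not valid UTF-8):

```python
raw = "caf\u00e9".encode("latin-1")  # b'caf\xe9': Latin-1 bytes, invalid as UTF-8

results = {}
for handler in ("ignore", "replace"):
    # Both are lossy: 'ignore' drops the bad byte, 'replace' inserts U+FFFD.
    results[handler] = raw.decode("utf-8", errors=handler)

try:
    raw.decode("utf-8")  # errors="strict" is the default and raises
except UnicodeDecodeError as exc:
    results["strict"] = "error: " + exc.reason

# Decoding with the codec the bytes were actually written in loses nothing.
results["latin-1"] = raw.decode("latin-1")

for name, value in results.items():
    print(name, repr(value))
```

In other words, 'ignore' and 'replace' make the exception go away by discarding data; the lossless fix is to find out the page's real encoding first.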

Can someone give me an explanation of how to do this easily? (A five-line example would be great.)

PS
Note that I am able to create Unicode data and insert it
with a carefully controlled Unicode string:

data = u"Make \u0633\u0644\u0627\u0645, not war"
c.execute(''' INSERT INTO junk (junklet) VALUES ('%s') ''' % data.encode('utf-8', 'ignore'))

but this won't work with what I find on the Web.
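Both the PS snippet and the earlier c.execute(... % data2) build the SQL by string formatting, so any quote inside a Web page ends the SQL string literal early, which is a likely source of the SQL errors; hand-escaping as in point 3 is fragile. DB-API drivers accept the value separately and quote it themselves. The sketch below uses Python's built-in sqlite3 purely as a stand-in so it runs without a MySQL server. With MySQLdb the placeholder is %s rather than ?, e.g. c.execute("INSERT INTO junk (junklet) VALUES (%s)", (data2,)), and the connection is typically opened with charset='utf8', use_unicode=True so the driver handles the encoding.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the DB-API pattern is the same.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE junk (junklet VARCHAR(2500))")

# Text with quotes and non-ASCII characters that would break a statement
# built with the % operator.
data = 'He said "Make \u0633\u0644\u0627\u0645, not war" (it\'s late)'

# The value is passed as a separate argument; the driver quotes and escapes it.
c.execute("INSERT INTO junk (junklet) VALUES (?)", (data,))
conn.commit()

stored = c.execute("SELECT junklet FROM junk").fetchone()[0]
print(stored == data)
```

Note the trailing comma in (data,): the parameters must be a sequence, even for a single value.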

Thanks,
Bill