From: Philippe C. <sql...@pa...> - 2008-03-28 12:56:49
|
sql...@li... wrote: > Philippe Chaintreuil wrote: >> > I get errors like the following about once a week: >> > >> > Dec 15 19:51:51 hobbes sqlgrey: dbaccess: warning: couldn't do >> > query: UPDATE from_awl SET last_seen = NOW(), first_seen = >> > first_seen WHERE sender_name = 'schuhb?ckoubut' AND >> > sender_domain = 'rocketmail.com' AND src = '124.106.4.237': >> > Illegal mix of collations (latin1_swedish_ci,IMPLICIT) >> > and (utf8_general_ci,COERCIBLE) for operation '=', >> > reconnecting to DB > Date: Tue, 18 Dec 2007 23:36:43 +0100 Wow. Digest subscription to this list is *slow*. I just got your response today March 28th, 2008! I had almost forgotten I wrote that original e-mail. List owner: You might want to turn on mailman's "digest_send_periodic" option (on the Admin->Digest Options page of the web interface). > Hum, is your database configured to use UTF-8? I think it is. I'm using MySQL, and I'm not that well versed in it's deep inner-workings. Probably about a year ago, I read something that said I should get that utf8_general_ci,COERCIBLE in there. I did that and it obviously didn't seem to do much. > If so you can get such errors, as email adresses are sent from Postfix > as pure ASCII which can (and does regularly) use invalid UTF-8 > sequences. You'll have to convert it to ASCII (latin-1 will do, it's > fine when fed with ASCII data). > > I should really add an entry related to non-ANSI characters > (accentuated mainly) to the FAQ :-) I second the FAQ suggestion on this front -- it should also probably have the answer to my next question as well: Ummm... How do I fix this in MySQL? It should also probably get added to the INSTALL instructions as well. Sorry if this has all been addressed since Lionel replied back in December. -- .----------. ,----------== Peep ==----------, / .-. .-. \ `-== (Philippe Chaintreuil) ==-` / | | | | \ \ `-' `-' _/ FidoNet Netmail Address: /\ .--. / | 1:2613/118 \ | / / / / InterNet Address: / | `--' /\ \ pe...@pa... /`-------' \ \ --PiCTuRe By aN uNKNoWN aSCii aRTiST-- "We're just two lost souls swimming in a fish bowl..." |
From: Lionel B. <lio...@bo...> - 2008-03-28 13:49:22
|
Philippe Chaintreuil wrote: > > Date: Tue, 18 Dec 2007 23:36:43 +0100 > > Wow. Digest subscription to this list is *slow*. I just got your > response today March 28th, 2008! I had almost forgotten I wrote that > original e-mail. > > List owner: You might want to turn on mailman's "digest_send_periodic" > option (on the Admin->Digest Options page of the web interface). > It was already set. I changed the period to daily instead of monthly. But I suspect a bug... Lionel |
From: Philippe C. <sql...@pa...> - 2008-03-28 13:51:54
|
Lionel Bouton wrote: > It was already set. I changed the period to daily instead of monthly. > But I suspect a bug... Hmmm.... Strange. Anyway, seems like the list isn't high-traffic, so I just dropped the digest mode for myself. That fixes it for me. -- .----------. ,----------== Peep ==----------, / .-. .-. \ `-== (Philippe Chaintreuil) ==-` / | | | | \ \ `-' `-' _/ FidoNet Netmail Address: /\ .--. / | 1:2613/118 \ | / / / / InterNet Address: / | `--' /\ \ pe...@pa... /`-------' \ \ --PiCTuRe By aN uNKNoWN aSCii aRTiST-- "We're just two lost souls swimming in a fish bowl..." |
From: Philippe C. <sql...@pa...> - 2008-04-02 18:01:10
|
Philippe Chaintreuil wrote: > sql...@li... wrote: > > Philippe Chaintreuil wrote: > >> > I get errors like the following about once a week: > >> > > >> > Dec 15 19:51:51 hobbes sqlgrey: dbaccess: warning: couldn't do > >> > query: UPDATE from_awl SET last_seen = NOW(), first_seen = > >> > first_seen WHERE sender_name = 'schuhb?ckoubut' AND > >> > sender_domain = 'rocketmail.com' AND src = '124.106.4.237': > >> > Illegal mix of collations (latin1_swedish_ci,IMPLICIT) > >> > and (utf8_general_ci,COERCIBLE) for operation '=', > >> > reconnecting to DB Okay, here's some more detail. Fresh log entry: ----------------------------------------------------------------------- Apr 2 12:48:42 hobbes sqlgrey: warning: Use of uninitialized value in concatenation (.) or string at /usr/sbin/sqlgrey line 1139. Apr 2 12:48:42 hobbes sqlgrey: dbaccess: error: couldn't access from_awl table: Apr 2 12:48:42 hobbes sqlgrey: grey: from awl match: updating 212.56.148.57(212.56.148.57), il<E4>nde...@fa...(il<E4>nde...@fa...) Apr 2 12:48:42 hobbes sqlgrey: dbaccess: warning: couldn't do query: UPDATE from_awl SET last_seen = NOW(), first_seen = first_seen WHERE sender_name = 'il<E4>nderkebym' AND sender_domain = 'fastmail.fm' AND src = '212.56.148.57': Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '=', reconnecting to DB ----------------------------------------------------------------------- MySQL information: ----------------------------------------------------------------------- mysql> SHOW VARIABLES LIKE '%character%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.00 sec) mysql> SHOW VARIABLES LIKE '%collation%'; +----------------------+-----------------+ | Variable_name | Value | +----------------------+-----------------+ | collation_connection | utf8_general_ci | | collation_database | utf8_general_ci | | collation_server | utf8_general_ci | +----------------------+-----------------+ 3 rows in set (0.00 sec) ----------------------------------------------------------------------- So in general, my system is setup to use UTF-8. Keep in mind that client and connection are referring to the mysql command line client here. ----------------------------------------------------------------------- mysql> SHOW CHARACTER SET LIKE 'latin1'; +---------+----------------------+-------------------+--------+ | Charset | Description | Default collation | Maxlen | +---------+----------------------+-------------------+--------+ | latin1 | cp1252 West European | latin1_swedish_ci | 1 | +---------+----------------------+-------------------+--------+ 1 row in set (0.00 sec) ----------------------------------------------------------------------- This shows that latin1 causes the latin1_swedish_ci collation to be set as default. ----------------------------------------------------------------------- mysql> SHOW FULL COLUMNS FROM from_awl ; +---------------------------------------------+ | Field: sender_name | | Type: varchar(64) | | Collation: latin1_swedish_ci | | Null: NO | | Key: PRI | | Default: NULL | | Extra: | | Privileges: select,insert,update,references | | Comment: | +---------------------------------------------+ | Field: sender_domain | | Type: varchar(255) | | Collation: latin1_swedish_ci | | Null: NO | | Key: PRI | | Default: NULL | | Extra: | | Privileges: select,insert,update,references | | Comment: | +---------------------------------------------+ | Field: src | | Type: varchar(39) | | Collation: latin1_swedish_ci | | Null: NO | | Key: PRI | | Default: NULL | | Extra: | | Privileges: select,insert,update,references | | Comment: | +---------------------------------------------+ | Field: first_seen | | Type: timestamp | | Collation: NULL | | Null: NO | | Key: | | Default: CURRENT_TIMESTAMP | | Extra: | | Privileges: select,insert,update,references | | Comment: | +---------------------------------------------+ | Field: last_seen | | Type: timestamp | | Collation: NULL | | Null: NO | | Key: MUL | | Default: 0000-00-00 00:00:00 | | Extra: | | Privileges: select,insert,update,references | | Comment: | +---------------------------------------------+ 5 rows in set (0.01 sec) ----------------------------------------------------------------------- [I reformatted that output so it wouldn't have to wrap.] So the Swedish collation is on all the text fields.... And here are a few of the relevant lines from an mysqldump of my sqlgrey database: ----------------------------------------------------------------------- -- -- Table structure for table `from_awl` -- DROP TABLE IF EXISTS `from_awl`; SET @saved_cs_client = @@character_set_client; SET character_set_client = utf8; CREATE TABLE `from_awl` ( `sender_name` varchar(64) NOT NULL, `sender_domain` varchar(255) NOT NULL, `src` varchar(39) NOT NULL, `first_seen` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP, `last_seen` timestamp NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (`src`,`sender_domain`,`sender_name`), KEY `from_awl_lseen` (`last_seen`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; SET character_set_client = @saved_cs_client; ----------------------------------------------------------------------- 2nd line from the bottom has "DEFAULT CHARSET=latin1;" which shows that I followed the HOWTO's instructions when creating the databases. (I checked another database and it has "DEFAULT CHARSET=utf8" like I'd expect if I didn't set something explicitly.) So I have two questions: 1.) What's wrong? 2.) How can I fix it? Right now, my leaning is that the connection from sqlgrey is UTF-8 since that's my system's default and that's causing the error about utf8 and swedish fighting. Now I know next to nothing about non-ASCII encodings, MySQL's deep dark secrets, and Perl. But I did find this page about setting the connection's encoding for mysql: http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html -- .----------. ,----------== Peep ==----------, / .-. .-. \ `-== (Philippe Chaintreuil) ==-` / | | | | \ \ `-' `-' _/ FidoNet Netmail Address: /\ .--. / | 1:2613/118 \ | / / / / InterNet Address: / | `--' /\ \ pe...@pa... /`-------' \ \ --PiCTuRe By aN uNKNoWN aSCii aRTiST-- "We're just two lost souls swimming in a fish bowl..." |
From: Lionel B. <lio...@bo...> - 2008-04-02 18:22:10
|
Philippe Chaintreuil wrote: > So I have two questions: > > 1.) What's wrong? > MySQL ? I'm wondering why it defaults to a charset different than the database one. It seems counterintuitive at best. If you set your database charset to a non-default value it's for a damn good reason... > 2.) How can I fix it? > > Without patching SQLgrey, you might have luck changing the my.cnf file to change the default client charset (assuming you don't have other clients needing to connect with a different charset to the same system). If you want to hack SQLgrey, look for the end of the connectdb method, it already has a special case for MySQL (auto-reconnect as MySQL routinely drops conections by default...). You can then send the SQL query setting the character set you need at this point : $self->{sqlgrey}{dbh}->do("SET CHARACTER SET LATIN1") The MySQL doc isn't clear about what happens when the client auto-reconnects... so your mileage may vary. It would be great if the DBD driver would allow to set the character set at connection-time, but I had no luck browsing it's documentation either. Lionel. |
From: Philippe C. <sql...@pa...> - 2008-05-31 11:12:55
|
Lionel Bouton wrote: > Without patching SQLgrey, you might have luck changing the my.cnf file > to change the default client charset (assuming you don't have other > clients needing to connect with a different charset to the same > system). If you want to hack SQLgrey, look for the end of the > connectdb method, it already has a special case for MySQL > (auto-reconnect as MySQL routinely drops conections by default...). > You can then send the SQL query setting the character set you need at > this point : > > $self->{sqlgrey}{dbh}->do("SET CHARACTER SET LATIN1") > > The MySQL doc isn't clear about what happens when the client > auto-reconnects... so your mileage may vary. It would be great if the > DBD driver would allow to set the character set at connection-time, > but I had no luck browsing it's documentation either. So I made this change back on April 3rd: ------------------------------------------------------------------------ # mysql drops the connection, we have some glue code # to reinit the connection, but better use mysql DBD code if ($self->MySQL()) { $self->{sqlgrey}{dbh}->do("SET CHARACTER SET LATIN1") or $self->mylog('dbaccess', 0, "can't set connection character set to LATIN1: $DBI::errstr"); $self->{sqlgrey}{dbh}->{mysql_auto_reconnect} = 1; } ------------------------------------------------------------------------ and yesterday I got this: ------------------------------------------------------------------------ May 30 14:23:24 hobbes postfix/smtpd[28381]: warning: 190.172.137.78: address not listed for hostname 190-172-137-78.speedy.com.ar May 30 14:23:24 hobbes postfix/smtpd[28381]: connect from unknown[190.172.137.78] May 30 14:23:26 hobbes postfix/pickup[28070]: 73E491AFEA9: uid=102 from=<sqlgrey> May 30 14:23:26 hobbes postfix/cleanup[28424]: 73E491AFEA9: message-id=<200...@ho...> May 30 14:23:26 hobbes postfix/qmgr[15197]: 73E491AFEA9: from=<sq...@ho...>, size=490, nrcpt=1 (queue active) May 30 14:23:26 hobbes sqlgrey: warning: Use of uninitialized value in concatenation (.) or string at /usr/sbin/sqlgrey line 1143. May 30 14:23:26 hobbes sqlgrey: dbaccess: error: couldn't access from_awl table: May 30 14:23:26 hobbes sqlgrey: grey: from awl match: updating 190.172.137.78(190.172.137.78), m<FC>ll...@ly...(m<FC>ll...@ly...) ------------------------------------------------------------------------ So I got a new warning about a bad concatenation/string when I got a non [A-Za-z] letter (the "<FC>"). The line referenced above is the mylog() line below from is_in_from_awl(): ------------------------------------------------------------------------ if (!defined $sth or !$sth->execute($sender_name, $sender_domain, $host)) { $self->db_unavailable(); $self->mylog('dbaccess', 0, "error: couldn't access $from_awl table: $DBI::errstr"); return 1; # in doubt, accept } else { ------------------------------------------------------------------------ I went back through my logs trying to see if I could find any other non [A-Za-z] characters in e-mail addresses in sqlgrey lines since April and I couldn't find any. (Note that does not mean that they're weren't any -- I had 26 megs of sqlgrey lines from that time and I don't quite know what to grep for to find 'em.) So the two questions I have at this point are: 1.) Lionel, any suggestions what to do to debug this warning so I can get the real error? 2.) Anybody know a grep regexp for pulling out e-mail addresses with those non [A-Za-z] characters? -- .----------. ,----------== Peep ==----------, / .-. .-. \ `-== (Philippe Chaintreuil) ==-` / | | | | \ On The Other Side Of The Wall! \ `-' `-' _/ FidoNet Netmail Address: /\ .--. / | 1:2613/118 \ | / / / / InterNet Address: / | `--' /\ \ pe...@pa... /`-------' \ \ --PiCTuRe By aN uNKNoWN aSCii aRTiST-- "We're just two lost souls swimming in a fish bowl..." |
From: Dan F. <da...@ha...> - 2008-05-31 11:48:27
|
Is it even legal to use international chars in an email-address? Well anyway. I assume its this line you want to grep out: May 30 14:23:26 hobbes sqlgrey: grey: from awl match: updating 190.172.137.78(190.172.137.78), m<FC>ll...@ly...(m<FC>ll...@ly...) If so, try this: $ grep "from awl match:" mail.log | grep -viE "\([[:graph:]]+@" If it fails, use this as a "less accurate" alternative: $ grep "from awl match:" mail.log | grep -viE "\([a-z?~*=.#%&+/_0-9-]+@" (replace "mail.log" with whatever file youre grepping) - Dan Philippe Chaintreuil wrote: > Lionel Bouton wrote: > > Without patching SQLgrey, you might have luck changing the my.cnf file > > to change the default client charset (assuming you don't have other > > clients needing to connect with a different charset to the same > > system). If you want to hack SQLgrey, look for the end of the > > connectdb method, it already has a special case for MySQL > > (auto-reconnect as MySQL routinely drops conections by default...). > > You can then send the SQL query setting the character set you need at > > this point : > > > > $self->{sqlgrey}{dbh}->do("SET CHARACTER SET LATIN1") > > > > The MySQL doc isn't clear about what happens when the client > > auto-reconnects... so your mileage may vary. It would be great if the > > DBD driver would allow to set the character set at connection-time, > > but I had no luck browsing it's documentation either. > > So I made this change back on April 3rd: > > ------------------------------------------------------------------------ > # mysql drops the connection, we have some glue code > # to reinit the connection, but better use mysql DBD code > if ($self->MySQL()) { > $self->{sqlgrey}{dbh}->do("SET CHARACTER SET LATIN1") > or $self->mylog('dbaccess', 0, > "can't set connection character set to LATIN1: $DBI::errstr"); > > $self->{sqlgrey}{dbh}->{mysql_auto_reconnect} = 1; > } > ------------------------------------------------------------------------ > > and yesterday I got this: > > ------------------------------------------------------------------------ > May 30 14:23:24 hobbes postfix/smtpd[28381]: warning: 190.172.137.78: > address not listed for hostname 190-172-137-78.speedy.com.ar > May 30 14:23:24 hobbes postfix/smtpd[28381]: connect from > unknown[190.172.137.78] > May 30 14:23:26 hobbes postfix/pickup[28070]: 73E491AFEA9: uid=102 > from=<sqlgrey> > May 30 14:23:26 hobbes postfix/cleanup[28424]: 73E491AFEA9: > message-id=<200...@ho...> > May 30 14:23:26 hobbes postfix/qmgr[15197]: 73E491AFEA9: > from=<sq...@ho...>, size=490, nrcpt=1 (queue active) > > May 30 14:23:26 hobbes sqlgrey: warning: Use of uninitialized value in > concatenation (.) or string at /usr/sbin/sqlgrey line 1143. > May 30 14:23:26 hobbes sqlgrey: dbaccess: error: couldn't access > from_awl table: > May 30 14:23:26 hobbes sqlgrey: grey: from awl match: updating > 190.172.137.78(190.172.137.78), > m<FC>ll...@ly...(m<FC>ll...@ly...) > ------------------------------------------------------------------------ > > So I got a new warning about a bad concatenation/string when I got a > non [A-Za-z] letter (the "<FC>"). The line referenced above is the > mylog() line below from is_in_from_awl(): > > ------------------------------------------------------------------------ > if (!defined $sth or !$sth->execute($sender_name, $sender_domain, > $host)) { > $self->db_unavailable(); > $self->mylog('dbaccess', 0, "error: couldn't access $from_awl > table: $DBI::errstr"); > return 1; # in doubt, accept > } else { > ------------------------------------------------------------------------ > > I went back through my logs trying to see if I could find any other non > [A-Za-z] characters in e-mail addresses in sqlgrey lines since April and > I couldn't find any. (Note that does not mean that they're weren't any > -- I had 26 megs of sqlgrey lines from that time and I don't quite know > what to grep for to find 'em.) > > So the two questions I have at this point are: > > 1.) Lionel, any suggestions what to do to debug this warning so I can > get the real error? > 2.) Anybody know a grep regexp for pulling out e-mail addresses with > those non [A-Za-z] characters? > > > |
From: Dan F. <da...@ha...> - 2008-05-31 13:35:50
Attachments:
dbloadreduction.patch
|
Hey all.. During "bounce storms" like backscatter and such, we startet getting timeouts between postfix and sqlgrey. After a little poking, i loosly concluded that sql hangs for a bit and thus sqlgrey hans for a bit. The sql-hang occurs, afaik, when the sql-master (we run in clustering mode) gets hammered with inserts and updates. Im gonna upgrade the master sql-server at some point, but i also looked a bit at what "write" statements that hammers the master. Of course inserts into the connect (new connection triplet) is by far the top hitter when it comes to writes. But during a bounce storm, UPDATES to From_AWL actually makes out around 20-25% of all writes. But its not necessary to update EVERY time, since in a bounce scenario, most sender adresses will be blank. (ie NULL senders) and many IPs will return often. So yesterday i whipped up a way of not updating every time for the same record. And it turns out it made a huge difference. From_AWL writings are now reduced by 97% What ive done: is_in_from_awl normally SELECT's "1" if record exists in AWL. I now select the "last_seen" timestamp. The function now also returns this timestamp, instead of 1 for all "true" scenarios. It will still return 0 on false, thus no logic needs changing elsewhere due to this change. The timestamp is compared to time() to see how long ago last update was done. If enough time has passed, it makes the UPDATE to the table. Right now the UPDATE delay is configured as $dflt{awl_age}(in seconds) * 0.004167. This makes ~6 hours, if awl_age=60 days and ~3 hours if awl_age=30 days and so forth. I dont really see a downside to this approach. All From_AWL entries will still be updated on the first hit within the time period. so worst case scenario, would be that a from_awl entry would be cleaned out 3 hours early. Compared to a period of 30 days, it doenst really matter. In the log it will say: "from awl match: updating" (as before) when actually updating the table and when NOT updating it will simply say "from awl match:" (without the word "updating") Oh yeah, and the feature can be enabled/disabled with the flag "db_load_reduction". So.. This is a "request for comments" to make sure i didnt miss any pitfalls and such. A patch for consideration is attached. - Dan Faerch |