From: John F. <jf...@ms...> - 2009-09-07 06:31:35
(Sorry for the rant.) This touches on a pet peeve of mine. I hate the way UTF-8 decoders and encoders throw incredibly aggressive errors in your face when they encounter some binary garbage. Python is especially bad about this. Why should a program crash because of one bad character in a file? If you really do care about preserving the garbage, there should be a nice way of dealing with it, but it should not be an error. At most it should be a warning, which one can choose to ignore or handle as one sees fit.

In cl-irregsexp (fast UTF-8 encode/decode) I simply output (code-char #xfffd), the Unicode replacement character, when something horrible happens. On CCL the implementation checks for invalid code points itself, but on SBCL the encoder lets them through. I agree that it is nice to be able to deal with invalid input, but the sensible default is not `error'.
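The policy argued for above (substitute U+FFFD and at most warn) can be sketched in Python, since Python's strict default is exactly what the complaint is about. This is a hypothetical illustration, not cl-irregsexp's code; `lenient_utf8_decode` and `Utf8DecodeWarning` are names made up for the example. Python's own `data.decode("utf-8", errors="replace")` already does the substitution silently; the sketch adds a warning that the caller can ignore or escalate. Each byte that cannot start or continue a valid sequence (including overlong forms and encoded surrogates) becomes one U+FFFD.

```python
import warnings

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


class Utf8DecodeWarning(UnicodeWarning):
    """Hypothetical warning category for malformed UTF-8 input."""


def lenient_utf8_decode(data: bytes) -> str:
    """Decode UTF-8, replacing each bad byte with U+FFFD.

    Instead of raising, emit at most one Utf8DecodeWarning per call.
    """
    out = []
    i, n, bad = 0, len(data), 0
    while i < n:
        b = data[i]
        if b < 0x80:  # ASCII fast path
            out.append(chr(b))
            i += 1
            continue
        # Sequence length, payload bits of the lead byte, and the smallest
        # code point that length may encode (to reject overlong forms).
        length, cp, lo = 0, 0, 0
        if 0xC2 <= b <= 0xDF:
            length, cp, lo = 2, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            length, cp, lo = 3, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF4:
            length, cp, lo = 4, b & 0x07, 0x10000
        ok = length > 0 and i + length <= n
        if ok:
            for k in range(1, length):
                c = data[i + k]
                if c & 0xC0 != 0x80:  # not a continuation byte
                    ok = False
                    break
                cp = (cp << 6) | (c & 0x3F)
        # Reject overlong encodings, out-of-range values and surrogates.
        if ok and (cp < lo or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF):
            ok = False
        if ok:
            out.append(chr(cp))
            i += length
        else:
            out.append(REPLACEMENT)  # skip one byte and resynchronize
            i += 1
            bad += 1
    if bad:
        warnings.warn(f"replaced {bad} malformed byte(s) with U+FFFD",
                      Utf8DecodeWarning, stacklevel=2)
    return "".join(out)
```

A caller who wants strict behavior can still get it with `warnings.simplefilter("error", Utf8DecodeWarning)`, and one who does not care can use `"ignore"`. The one-U+FFFD-per-byte policy is a simplification: Python's `errors="replace"` follows the Unicode recommendation of one U+FFFD per maximal ill-formed subpart, so a truncated sequence such as `b"\xe2\x82"` should yield one replacement character there and two here.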