Re: [PATCH] Encrypting and signing non-ascii characters with gpg and gnus


> First I found that the failure of the CS.latin1.s1v testcase is due to a
> trivial error in the file. A patch is attached below. With this the
> 'make onetest' as suggested by Brian a bit earlier in the thread should
> succeed. Unfortunately only without my charset.patch. So we are where we
> started.

Ah, thanks for the catch. I forgot that clearsigned messages must *always*
end with a newline, and if necessary GPG will add one to your message before
signing it. Fixed.

> The function which causes all the trouble is (standard-display-european).

Ahhhh. That helps immensely. The docstring says this:

  Semi-obsolete way to toggle display of ISO 8859 European characters.

  This function is semi-obsolete; if you want to do your editing with
  unibyte characters, it is better to `set-language-environment' coupled
  with either the `--unibyte' option or the EMACS_UNIBYTE environment
  variable, or else customize `enable-multibyte-characters'.

  With prefix argument, this command enables European character display
  if arg is positive, disables it otherwise.  Otherwise, it toggles
  European character display.

  When this mode is enabled, characters in the range of 160 to 255
  display not as octal escapes, but as accented characters.  Codes 146
  and 160 display as apostrophe and space, even though they are not the
  ASCII codes for apostrophe and space.

  Enabling European character display with this command noninteractively
  from Lisp code also selects Latin-1 as the language environment, and
  selects unibyte mode for all Emacs buffers (both existing buffers and
  those created subsequently).  This provides increased compatibility
  for users who call this function in `.emacs'.

I think it's the "selects unibyte mode for all Emacs buffers" part that's the
big difference between your environment and what the test cases were doing.
The \201 bytes are the leading codes that Emacs uses internally in multibyte
buffers to mark Latin-1 characters. If that same byte stream is interpreted
in unibyte mode, each marker shows up as a character of its own, which would
explain the spurious \201 bytes that you get.
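
To make that concrete, here is a rough Python model of the effect (not Emacs
code, just an illustration). It assumes the internal multibyte form puts a
\201 (0x81) leading byte in front of every Latin-1 character in the range 160
to 255; the function names are made up for this sketch.

```python
# Rough model, for illustration only: mimic how a multibyte Emacs buffer is
# assumed to store Latin-1 text, then view the same bytes as unibyte.

LEADING_CODE_LATIN1 = 0x81  # octal 201, the assumed Latin-1 marker byte


def multibyte_internal(text: str) -> bytes:
    """Return the assumed internal multibyte bytes for ASCII/Latin-1 text."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(code)                 # ASCII is stored unchanged
        elif 0xA0 <= code <= 0xFF:
            out.append(LEADING_CODE_LATIN1)  # marker byte first ...
            out.append(code)                 # ... then the Latin-1 byte
        else:
            raise ValueError("outside the range this sketch models")
    return bytes(out)


def show_as_unibyte(raw: bytes) -> str:
    """Render bytes the way a unibyte buffer shows them without a display
    table: printable ASCII as-is, everything else as an octal escape."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else "\\%03o" % b for b in raw
    )


word = "f\xfcr"  # "für", with u-umlaut = 0xFC in Latin-1

# Plain Latin-1 bytes, which is what GPG should be given:
print(show_as_unibyte(word.encode("latin-1")))

# Internal multibyte bytes misread as unibyte, with a stray \201:
print(show_as_unibyte(multibyte_internal(word)))
```

The second line printed carries an extra \201 in front of the accented
character, which matches the symptom described above.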

The "right" fix will probably involve handling unibyte buffers in some
special manner (perhaps when the language environment is set to something
like Latin-1), or maybe dealing specially with the conversion from unibyte
to multibyte and back. The temporary buffer used to read the output of GPG
is probably the critical point. I'll try to look into this over the next
week.

You might also see if there is a less-obsolete replacement for
standard-display-european that meets your needs. If you use 'C-x RET l'
(set-language-environment) to select Latin-1, which would leave Emacs in
multibyte mode, does Gnus blow up? Do your Latin-1 encoded documents look
correct? I have a feeling it will be easier to make everything work correctly
if we can keep the buffers in multibyte mode. Unibyte mode loses the metadata
that declares which character set is being used.

cheers,
 -Brian