From: Teemu L. <tli...@ik...> - 2010-07-05 19:56:59
What is the best way to detect the current character encoding in Linux
systems? Or maybe even portably across (POSIX) systems? I'm currently
reading the output of command "locale charmap" which works but doesn't
feel particularly elegant.

(defun get-current-encoding ()
  (with-open-stream
      (s (sb-ext:process-output
          (sb-ext:run-program "locale" '("charmap")
                              :input t :output :stream :search t)))
    (read-line s nil)))

From: Leslie P. P. <sk...@vi...> - 2010-07-05 20:30:52
Teemu Likonen wrote:

> What is the best way to detect the current character encoding in Linux
> systems? Or maybe even portably across (POSIX) systems? I'm currently
> reading the output of command "locale charmap" which works but doesn't
> feel particularly elegant.
>
>
> (defun get-current-encoding ()
>   (with-open-stream
>       (s (sb-ext:process-output
>           (sb-ext:run-program "locale" '("charmap")
>                               :input t :output :stream :search t)))
>     (read-line s nil)))

Per locale(1P):

(defun getenv (name)
  (sb-ext:posix-getenv name))

(defun get-current-encoding ()
  (let* ((value (or (getenv "LC_ALL")
                    (getenv "LC_CTYPE")
                    (getenv "LANG")
                    "C"))
         (has-charset (position #\. value)))
    (when has-charset
      (subseq value (1+ has-charset)))))

From: Cyrus H. <ch...@bo...> - 2010-07-06 00:55:00
Now the question is what encoding should we use for the posix dirent
tests that fail when folks have non-latin characters in filenames in
their root directories?

On Jul 5, 2010, at 1:30 PM, Leslie P. Polzer wrote:

>
> Teemu Likonen wrote:
>> What is the best way to detect the current character encoding in Linux
>> systems? Or maybe even portably across (POSIX) systems? I'm currently
>> reading the output of command "locale charmap" which works but doesn't
>> feel particularly elegant.
>>
>>
>> (defun get-current-encoding ()
>>   (with-open-stream
>>       (s (sb-ext:process-output
>>           (sb-ext:run-program "locale" '("charmap")
>>                               :input t :output :stream :search t)))
>>     (read-line s nil)))
>
> Per locale(1P):
>
> (defun getenv (name)
>   (sb-ext:posix-getenv name))
>
> (defun get-current-encoding ()
>   (let* ((value (or (getenv "LC_ALL")
>                     (getenv "LC_CTYPE")
>                     (getenv "LANG")
>                     "C"))
>          (has-charset (position #\. value)))
>     (when has-charset
>       (subseq value (1+ has-charset)))))

From: Teemu L. <tli...@ik...> - 2010-07-06 05:07:20
* 2010-07-05 22:30 (+0200), Leslie P. Polzer wrote:

> (defun getenv (name)
>   (sb-ext:posix-getenv name))
>
> (defun get-current-encoding ()
>   (let* ((value (or (getenv "LC_ALL")
>                     (getenv "LC_CTYPE")
>                     (getenv "LANG")
>                     "C"))
>          (has-charset (position #\. value)))
>     (when has-charset
>       (subseq value (1+ has-charset)))))

Obviously this can't work. For example, when LC_CTYPE=fi_FI the charset
is ISO-8859-1 and when LC_CTYPE=fi_FI@euro it's ISO-8859-15. In practice
only xx_XX.UTF-8 locales always have the charset in the locale name.

The "locale charmap" command gets the encoding right (even when the
locale is named using a locale alias) so I guess I'll just use that.

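To make the objection above concrete, here is a minimal sketch in portable Common Lisp. It is not code from the thread: the function name `locale-name-codeset` is hypothetical, and it assumes locale names of the usual POSIX shape language[_territory][.codeset][@modifier]. It only shows that the name alone gives no answer for locales such as fi_FI or fi_FI@euro.

```lisp
;; Hypothetical helper (not from the thread): pull the codeset part out of
;; a locale name shaped like language[_territory][.codeset][@modifier].
(defun locale-name-codeset (name)
  "Return the codeset substring of NAME, or NIL if NAME spells none out."
  (let ((dot (position #\. name)))
    (when dot
      ;; Stop at an @modifier if one follows the codeset; SUBSEQ accepts
      ;; NIL as the end index, meaning the end of the string.
      (let ((at (position #\@ name :start dot)))
        (subseq name (1+ dot) at)))))

;; The name only helps when the codeset is written in it:
(locale-name-codeset "fi_FI.UTF-8")   ; a codeset is present in the name
(locale-name-codeset "fi_FI")         ; NIL: the name says nothing
(locale-name-codeset "fi_FI@euro")    ; NIL: the name says nothing
```

A NIL result here does not mean the locale has no encoding, only that the variable's value does not name one, which is why asking the system (for example through "locale charmap") is the more reliable route.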
From: Nikodemus S. <nik...@ra...> - 2010-08-17 15:12:01
On 5 July 2010 22:56, Teemu Likonen <tli...@ik...> wrote:

> What is the best way to detect the current character encoding in Linux
> systems? Or maybe even portably across (POSIX) systems? I'm currently
> reading the output of command "locale charmap" which works but doesn't
> feel particularly elegant.

SBCL internally uses

  (alien-funcall
    (extern-alien "nl_langinfo"
                  (function (c-string :external-format :latin-1)
                            int))
    sb-unix:codeset)
   "LATIN-1")

to pick the default external format -- which is to say that you should
be able to just use :default as the external format and have the right
thing happen, unless someone has munged
SB-IMPL::*DEFAULT-EXTERNAL-FORMAT*.

Cheers,

 -- Nikodemus

From: Teemu L. <tli...@ik...> - 2010-08-21 07:10:38
* 2010-08-17 18:11 (+0300), Nikodemus Siivola wrote:

> SBCL internally uses
>
>   (alien-funcall
>     (extern-alien "nl_langinfo"
>                   (function (c-string :external-format :latin-1)
>                             int))
>     sb-unix:codeset)
>    "LATIN-1")
>
> to pick the default external format -- which is to say that you should
> be able to just use :default as the external format and have the right
> thing happen, unless someone has munged
> SB-IMPL::*DEFAULT-EXTERNAL-FORMAT*.

Thanks. I'm a bit confused, though. It seems that I can get the current
character encoding (as a string) with a function like this:

(defun current-character-encoding ()
  (alien-funcall
   (extern-alien "nl_langinfo"
                 (function (c-string :external-format :default)
                           int))
   sb-unix:codeset))

But what effect does "munging" the variable
SB-IMPL::*DEFAULT-EXTERNAL-FORMAT* have? Who might munge it? The
variable itself seems to point to the current encoding (as a keyword) so
I'm not sure why this (ALIEN-FUNCALL ...) stuff is needed. I would guess
that SB-IMPL::*DEFAULT-EXTERNAL-FORMAT* is not a public API and thus not
recommended. Care to clarify?

From: Nikodemus S. <nik...@ra...> - 2010-08-21 10:26:49
On 21 August 2010 10:10, Teemu Likonen <tli...@ik...> wrote:

> Thanks. I'm a bit confused, though. It seems that I can get the current
> character encoding (as a string) with a function like this:
>
> (defun current-character-encoding ()
>   (alien-funcall
>    (extern-alien "nl_langinfo"
>                  (function (c-string :external-format :default)
>                            int))
>    sb-unix:codeset))
>
> But what effect does "munging" the variable
> SB-IMPL::*DEFAULT-EXTERNAL-FORMAT* have? Who might munge it? The
> variable itself seems to point to the current encoding (as a keyword) so
> I'm not sure why this (ALIEN-FUNCALL ...) stuff is needed. I would guess
> that SB-IMPL::*DEFAULT-EXTERNAL-FORMAT* is not a public API and thus not
> recommended. Care to clarify?

SB-IMPL::*DEFAULT-EXTERNAL-FORMAT* is not a supported interface, and you
are not supposed to munge it.

SBCL sets it at startup so that it doesn't have to call nl_langinfo
after startup. Currently SBCL never munges it afterwards.

Unofficially, it specifies the meaning of the :DEFAULT external format
(which is specified to exist by the standard, and is unsurprisingly the
default).

Munging it works and isn't terribly dangerous, but it is definitely not
supported and likely to break sooner or later. Reading it isn't
supported either, and will break when the internal interface changes.

Reading *D-E-F* instead of using the alien call should work just fine,
but if e.g. some library does the nasty and munges it, then it will no
longer specify what the OS thinks is the default encoding -- whereas
calling nl_langinfo will always retrieve that information.

What you should do depends on what you want the external format for.

If you just want an external format argument that corresponds to the
OS's idea of the default encoding, you can just use :DEFAULT, unless you
have reason to believe that a library is misbehaving and munging
SB-IMPL::*DEFAULT-EXTERNAL-FORMAT*, in which case you may want to call
nl_langinfo.

Then again, if you want to know what the default encoding is, instead of
just passing it as an argument, currently your best bet is to read
SB-IMPL::*DEFAULT-EXTERNAL-FORMAT*, as that tells you what :DEFAULT
means -- or if you want to know what the OS's idea of the default
encoding is, then nl_langinfo, which tells you exactly that and is
future-proof to boot.

So, my order of preference is :DEFAULT, nl_langinfo, *D-E-F*.

By the by, don't use the SB-UNIX:CODESET constant if you want your code
to remain future-proof. Either grovel the local CODESET #define from
langinfo, or assume that it will always be zero everywhere -- which is
what SBCL currently does.

Cheers,

 -- Nikodemus
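
The first two options in that order of preference could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: the names `read-first-line` and `os-codeset-name` are hypothetical, the second function is SBCL-only and assumes nl_langinfo is available in the running image (a Unix-like system), and the CODESET value is taken as an argument so that the caller supplies whatever the local langinfo.h defines instead of relying on SB-UNIX:CODESET.

```lisp
;; Option 1: just pass :DEFAULT and let the implementation decide.
;; READ-FIRST-LINE is a hypothetical helper name.
(defun read-first-line (path)
  "Read the first line of PATH using the default external format."
  (with-open-file (in path :direction :input :external-format :default)
    (read-line in nil)))

;; Option 2 (SBCL only): ask the C library directly.
;; CODESET-ITEM is an assumption of this sketch: it must be the value of
;; the CODESET macro from the local <langinfo.h>, obtained by the caller
;; (for example by groveling), and is deliberately not hard-coded here.
#+sbcl
(defun os-codeset-name (codeset-item)
  "Return the codeset name reported by nl_langinfo for CODESET-ITEM."
  (sb-alien:alien-funcall
   (sb-alien:extern-alien "nl_langinfo"
                          (function (sb-alien:c-string
                                     :external-format :latin-1)
                                    sb-alien:int))
   codeset-item))
```

The alien call decodes the returned name as :latin-1, following the snippet quoted earlier in the thread, so the result does not depend on the very default that is being looked up.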