From: Bruno H. <br...@cl...> - 2011-04-11 22:24:00
|
Hi Sam,

> does libiconv/glibc have facilities to handle composed/decomposed
> unicode?

No, glibc and libiconv don't implement composition/decomposition of
Unicode strings. GNU libunistring does, *but*

1. the canonical decomposition done by MacOS X HFS+ uses a fixed, old
   version of Unicode, not the newest version of Unicode (currently 6.0).
2. this canonical decomposition is specific to that file system. I believe
   that it does not occur on NFS or SMB/CIFS mounts.
3. it is not needed for programs like clisp to perform this conversion.
   It's like the lowercase -> uppercase conversion done by some other file
   systems (like FAT): it is done by the file system silently. I.e. when
   you ask to create a file "readme" it creates "README" and reports
   success. When you then iterate through the directory entries you find
   "README" but not "readme". It's the same way with the
   UTF-8 -> UTF-8/NFD mapping done in HFS+.

> should we introduce macosx-specific pathname preprocessors?

You should not deal with the decomposition of file names in clisp.
But you can well ensure that the file names are all UTF-8. For example,
by forcing *PATHNAME-ENCODING* to be UTF-8 on MacOS X.

Bruno
--
In memoriam The victims of the Katyn massacre
<http://en.wikipedia.org/wiki/Katyn_massacre>
<http://www.solidarni.waw.pl/ssw/pdf/Katyn1940/Katyn-unresolved_genocide_in_Europe.pdf>
|
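[Editor's illustration, not part of the original message.] A minimal sketch of what canonical decomposition (NFD) does to a file name, using Python's standard unicodedata module. It only illustrates the general NFC/NFD mapping: Python's Unicode tables are newer than the fixed version HFS+ uses, so the exact HFS+ result may differ for some characters. The file name "München.txt" is a made-up example.

```python
import unicodedata

# Example file name with a precomposed character (U+00FC, u with diaeresis).
composed = "M\u00fcnchen.txt"

# NFD splits U+00FC into "u" (U+0075) followed by a combining diaeresis (U+0308).
decomposed = unicodedata.normalize("NFD", composed)

# The two strings render alike but differ in length and in their UTF-8 bytes.
print(len(composed), len(decomposed))
print(composed.encode("utf-8"))
print(decomposed.encode("utf-8"))

# A plain comparison fails; normalizing both sides to one form makes it succeed.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(composed == decomposed, same_after_nfc)
```

This is the same situation as the "readme" / "README" case above: the name handed to the file system and the name later read back from the directory can be different strings, even though both are valid UTF-8.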
From: Pascal J. B. <pj...@in...> - 2011-04-11 22:39:30
|
On 12/04/2011, at 00:32, Sam Steingold <sd...@gn...> wrote:

> Hi Bruno,
>
>> * Bruno Haible <oe...@py...t> [2011-04-12 00:23:49 +0200]:
>>
>> But you can well ensure that the file names are all UTF-8. For
>> example, by forcing *PATHNAME-ENCODING* to be UTF-8 on MacOS X.
>
> That solution sounds _very_ cool.
> Since any character can be represented in UTF-8, forcing
> *PATHNAME-ENCODING* to UTF-8 seems to be a win-win.
>
> MacOS users, do you agree?

of course. That's how I configure my clisp.

--
__Pascal Bourguignon__
(Sent from my iPad)
|
From: Sam S. <sd...@gn...> - 2011-04-11 22:50:17
|
> * Pascal J. Bourguignon <cwo@vasbezngvzntb.pbz> [2011-04-12 00:40:25 +0200]:
> On 12/04/2011, at 00:32, Sam Steingold <sd...@gn...> wrote:
>>> * Bruno Haible <oe...@py...t> [2011-04-12 00:23:49 +0200]:
>>>
>>> But you can well ensure that the file names are all UTF-8. For
>>> example, by forcing *PATHNAME-ENCODING* to be UTF-8 on MacOS X.
>>
>> That solution sounds _very_ cool.
>> Since any character can be represented in UTF-8, forcing
>> *PATHNAME-ENCODING* to UTF-8 seems to be a win-win.
>>
>> MacOS users, do you agree?
>
> of course. That's how I configure my clisp.

does this actually work on macos?

(setq *pathname-encoding* CHARSET:UTF-8)
(loop :for i :from 0 :to char-code-limit
  :for path = (string (code-char i))
  :nconc (handler-case
             (progn (delete-file (open path :direction :probe
                                            :if-does-not-exist :create))
                    nil)
           (error (c)
             (format t "~:D ~A: ~A~%" i (char-name (code-char i)) c)
             (list i))))
0 Null: PARSE-NAMESTRING: syntax error in filename "" at position 0
42 ASTERISK: OPEN: wildcards are not allowed here: #P"*"
46 FULL_STOP: OPEN: "/home2/sds/src/clisp/current/build-g/." names a directory, not a file
47 SOLIDUS: OPEN: No file name given: #P"/"
63 QUESTION_MARK: OPEN: wildcards are not allowed here: #P"?"
126 TILDE: OPEN: No file name given: #P"/home/sds/"
==> (0 42 47 63 126)

--
Sam Steingold (http://sds.podval.org/) on CentOS release 5.5 (Final) X
http://pmw.org.il http://truepeace.org http://openvotingconsortium.org
http://dhimmi.com http://mideasttruth.com http://iris.org.il
Lisp is a way of life. C is a way of death.
|
From: Sam S. <sd...@gn...> - 2011-04-11 22:33:04
|
Hi Bruno,

> * Bruno Haible <oe...@py...t> [2011-04-12 00:23:49 +0200]:
>
> But you can well ensure that the file names are all UTF-8. For
> example, by forcing *PATHNAME-ENCODING* to be UTF-8 on MacOS X.

That solution sounds _very_ cool.
Since any character can be represented in UTF-8, forcing
*PATHNAME-ENCODING* to UTF-8 seems to be a win-win.

MacOS users, do you agree?

--
Sam Steingold (http://sds.podval.org/) on CentOS release 5.5 (Final) X
http://openvotingconsortium.org http://honestreporting.com
http://thereligionofpeace.com http://camera.org http://ffii.org
Do not tell me what to do and I will not tell you where to go.
|
From: <Joe...@t-...> - 2011-04-12 14:37:40
|
Hi,

>You should not deal with the decomposition of file names in clisp.

The devil lies in the details. For instance, there's a bug which I've not
chased down in the area of ssh, fuse-ssh (on Linux), Emacs and readline.
Somewhere, when logging remotely from Linux into MacOS, or when copying
files via fuse-ssh (on Linux) to MacOS, bash's readline got confused by
Umlauts like äöü. It appears that for some files they were decomposed, for
others not. As a consequence, completion and backspace behaved oddly. The
Finder showed all of them alike, but IIRC Emacs noticed a difference (or
was it the terminal that did not display everything as expected?). I'm
sorry I don't remember the details. (MacOS 10.5.8)

Preliminary conclusion: decomposition is not transparent to the app...

>I believe that it does not occur on NFS or SMB/CIFS mounts.

The Mac documentation that Pascal recently linked to says
NFS: don't know, not specified [I guess it depends on the host];
SMB: no decomposition.

>For example, by forcing *PATHNAME-ENCODING* to be UTF-8 on MacOS X.

This sounds very reasonable.

Regards,
Jörg Höhle
|
From: Sam S. <sd...@gn...> - 2011-04-12 18:25:33
|
> * <Wbret-Plevy.Ubruyr@g-flfgrzf.pbz> [2011-04-12 16:37:27 +0200]:
>
>>For example, by forcing *PATHNAME-ENCODING* to be UTF-8 on MacOS X.
> This sounds very reasonable.

Implemented.
Please pull from hg, build and test (make check).
*pathname-encoding* should always be utf-8 now and you should not be
able to change it; this should be the only difference.

Thanks.

--
Sam Steingold (http://sds.podval.org/) on CentOS release 5.5 (Final) X 11.0.60900031
http://www.PetitionOnline.com/tap12009/ http://mideasttruth.com
http://honestreporting.com http://ffii.org http://iris.org.il http://pmw.org.il
Lisp: Serious empowerment.
|
From: Carlos U. <car...@gm...> - 2011-04-12 20:38:21
|
Hello,

On Tue, Apr 12, 2011 at 8:25 PM, Sam Steingold <sd...@gn...> wrote:

> Implemented.
> Please pull from hg, build and test (make check).
> *pathname-encoding* should always be utf-8 now and you should not be
> able to change it; this should be the only difference.

It compiles if I make a little modification (I think
Symbol_value(S(utf-8)) should be Symbol_value(S(utf_8)), see below),
but it fails later:

./lisp.run -B . -E UTF-8 -Epathname 1:1 -Emisc 1:1 -norc -m 2MW -lp ../src/ -x '(and (load "../src/init.lisp") (sys::%saveinitmem) (ext::exit)) (ext::exit t)'
GNU CLISP: invalid argument: '-Epathname'

Cheers,

Carlos

diff -r c8f10bb068a0 src/lispbibl.d
--- a/src/lispbibl.d Tue Apr 12 14:23:01 2011 -0400
+++ b/src/lispbibl.d Tue Apr 12 22:33:40 2011 +0200
@@ -290,7 +290,7 @@
   /* MacOSX pathnames are UTF-8 strings, not byte sequences
      http://thread.gmane.org/gmane.lisp.clisp.general/13725
      http://developer.apple.com/library/mac/#qa/qa2001/qa1173.html */
-  #define CONSTANT_PATHNAME_ENCODING Symbol_value(S(utf-8))
+  #define CONSTANT_PATHNAME_ENCODING Symbol_value(S(utf_8))
 #endif
 #ifdef AMIX
   #define UNIX_AMIX /* Amiga UNIX */
|
From: Sam S. <sd...@gn...> - 2011-04-12 21:22:54
|
> * Carlos Ungil <pneybf.hatvy@tznvy.pbz> [2011-04-12 22:38:13 +0200]:
> On Tue, Apr 12, 2011 at 8:25 PM, Sam Steingold <sd...@gn...> wrote:
>> Implemented.
>> Please pull from hg, build and test (make check).
>> *pathname-encoding* should always be utf-8 now and you should not be
>> able to change it; this should be the only difference.
>
> It compiles if I make a little modification (I think
> Symbol_value(S(utf-8)) should be Symbol_value(S(utf_8)), see below),

indeed. thanks.

> but it fails later:
>
> ./lisp.run -B . -E UTF-8 -Epathname 1:1 -Emisc 1:1 -norc -m 2MW -lp
> ../src/ -x '(and (load "../src/init.lisp") (sys::%saveinitmem)
> (ext::exit)) (ext::exit t)'
> GNU CLISP: invalid argument: '-Epathname'

hmmmm.....
could you please edit Makefile and add "-verbose" to the makemake
invocation in the Makefile: target and then regenerate Makefile and send
me the output?
thanks.

e.g.,

### msys/mingw
#
# host system:
# hostname = "stnt067"
# HSYS = "win32gcc"
# HSYSOS = "win32gcc"
# HOS = "win32"
# host_cpu = "i686"
# cpu = "i386"
# host_os = "mingw32"
# host = "i686-pc-mingw32"
#
# target system:
# TSYS = "win32gcc"
# TSYSOS = "win32gcc"
# TOS = "win32"

what does macosx print?
thanks!

--
Sam Steingold (http://sds.podval.org/) on CentOS release 5.5 (Final) X 11.0.60900031
http://honestreporting.com http://openvotingconsortium.org http://dhimmi.com
http://iris.org.il http://ffii.org http://camera.org
If you try to fail, and succeed, which have you done?
|
From: Carlos U. <car...@gm...> - 2011-04-12 21:43:12
|
Hello,

On Tue, Apr 12, 2011 at 11:22 PM, Sam Steingold <sd...@gn...> wrote:

> hmmmm.....
> could you please edit Makefile and add "-verbose" to the makemake
> invocation in the Makefile: target and then regenerate Makefile and send
> me the output?

I get the following (32 and 64 bits):

./makemake -verbose --with-dynamic-ffi --srcdir=../src > Makefile.tmp
with_dynamic_ffi=yes
inferred: fsstnd = gnu_ext
EXPORT_DYNAMIC_FLAG_SPEC=
module_configure_flags= '--disable-option-checking' '--cache-file=config.cache' 'CC=gcc -m32'
# host system:
hostname = "iMac.local"
HSYS = "i386"
HSYSOS = "darwin"
HOS = "unix"
host_cpu = "i386"
host_ABI = "i386"
cpu = "i386"
host_os = "darwin10.7.0"
host = "i386-apple-darwin10.7.0"
# target system:
TSYS = "i386"
TSYSOS = "darwin"
TOS = "unix"
XCC_CREATESHARED = ${CC} -dynamiclib -o $lib $libs ${CLFLAGS} ${CFLAGS} -install_name /$dll
BUILD_AUX = config.guess config.rpath config.sub depcomp arg-nonnull.h c++defs.h warn-on-use.h

./makemake -verbose --srcdir=../src > Makefile.tmp
inferred: fsstnd = gnu_ext
EXPORT_DYNAMIC_FLAG_SPEC=
module_configure_flags= '--disable-option-checking' '--cache-file=config.cache'
# host system:
hostname = "iMac.local"
HSYS = "i386"
HSYSOS = "darwin"
HOS = "unix"
host_cpu = "x86_64"
host_ABI = "x86_64"
cpu = "x86_64"
host_os = "darwin10.7.0"
host = "x86_64-apple-darwin10.7.0"
# target system:
TSYS = "i386"
TSYSOS = "darwin"
TOS = "unix"
XCC_CREATESHARED = ${CC} -dynamiclib -o $lib $libs ${CLFLAGS} ${CFLAGS} -install_name /$dll
BUILD_AUX = config.guess config.rpath config.sub depcomp arg-nonnull.h c++defs.h warn-on-use.h
|
From: Sam S. <sd...@gn...> - 2011-04-13 14:23:10
Attachments:
macosx-path-enc-15340.diff
|
Hi,

> * Carlos Ungil <pneybf.hatvy@tznvy.pbz> [2011-04-12 23:43:06 +0200]:
>
> TSYSOS = "darwin"

thanks!

please do "hg roll" to discard my broken patch and apply the attached one.
|
From: Carlos U. <car...@gm...> - 2011-04-13 19:12:11
|
Hi Sam,

On Wed, Apr 13, 2011 at 4:22 PM, Sam Steingold <sd...@gn...> wrote:

> please do "hg roll" to discard my broken patch and apply the attached one.

There is a problem with the check:

$ cat path.erg
Form: (T NIL T T)
CORRECT: (1234 T)
CLISP : ERROR

I think instead of

--- a/tests/path.tst Tue Apr 12 10:49:33 2011 -0400
+++ b/tests/path.tst Wed Apr 13 10:21:05 2011 -0400
@@ -1309,7 +1309,7 @@ NIL
 (ext:default-directory)))))
 #+(and clisp win32) T
-#+(and clisp unicode)
+#+(and clisp unicode (not macos))
 (block test-weird-pathnames
   (handler-bind ((parse-error
                   ;; http://article.gmane.org/gmane.lisp.clisp.devel:18772
@@ -1338,7 +1338,7 @@ NIL
 ;; DOS attack: bad pathnames in search can break LOAD
 ;; http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=443520
 ;; http://thread.gmane.org/gmane.lisp.clisp.devel/18532
-#+(and clisp unicode)
+#+(and clisp unicode (not macos))
 (letf* ((custom:*pathname-encoding* charset:iso-8859-1) ; 1:1
         (weird (concatenate 'string "weird" (string (code-char 160))))
         (good "path-tst-good-file") (dir "path-tst-load-weird-dir/")

you meant

--- a/tests/path.tst Tue Apr 12 10:49:33 2011 -0400
+++ b/tests/path.tst Wed Apr 13 21:05:01 2011 +0200
@@ -1309,7 +1309,7 @@
 (ext:default-directory)))))
 #+(and clisp win32) T
-#+(and clisp unicode)
+#+(and clisp unicode (not macos))
 (block test-weird-pathnames
   (handler-bind ((parse-error
                   ;; http://article.gmane.org/gmane.lisp.clisp.devel:18772
@@ -1333,12 +1333,12 @@
 (equal (directory "weird*") dir))
 (eq custom:*pathname-encoding* charset:iso-8859-1))))
 (delete-file weird)))))
-#+(and clisp unicode) (T NIL T T)
+#+(and clisp unicode (not macos)) (T NIL T T)
 ;; DOS attack: bad pathnames in search can break LOAD
 ;; http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=443520
 ;; http://thread.gmane.org/gmane.lisp.clisp.devel/18532
-#+(and clisp unicode)
+#+(and clisp unicode (not macos))
 (letf* ((custom:*pathname-encoding* charset:iso-8859-1) ; 1:1
         (weird (concatenate 'string "weird" (string (code-char 160))))
         (good "path-tst-good-file") (dir "path-tst-load-weird-dir/")
@@ -1356,7 +1356,7 @@
 *load-var*)
 (eq custom:*pathname-encoding* charset:iso-8859-1))
 (rmrf dir)))
-#+(and clisp unicode) (1234 T)
+#+(and clisp unicode (not macos)) (1234 T)
 #+clisp
 ;; bug#3124200
 (let* ((dir "tmp-dir/")
|
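[Editor's illustration, not part of the original message.] The tests being disabled here bind custom:*pathname-encoding* to charset:iso-8859-1, annotated "; 1:1" in the test file, which cannot work once the encoding is fixed to UTF-8. A rough Python sketch of what "1:1" means for the "weird" name built from (code-char 160) follows. It is only an analogy using Python's own codecs and makes no claim about how CLISP converts pathnames internally.

```python
# The "weird" name from the test: "weird" plus code point 160 (NO-BREAK SPACE).
weird = "weird" + chr(160)

# ISO-8859-1 maps code points 0..255 one-to-one onto byte values,
# so code point 160 becomes the single byte 0xA0.
latin1_bytes = weird.encode("iso-8859-1")

# Any byte sequence decodes under ISO-8859-1, so the round trip is lossless.
roundtrip = latin1_bytes.decode("iso-8859-1")

# A lone 0xA0 byte is a UTF-8 continuation byte with no lead byte,
# so the same bytes are rejected when read as UTF-8.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(latin1_bytes, roundtrip == weird, valid_utf8)
```

With a 1:1 encoding every byte string is a legal name, which is what these tests rely on; with UTF-8 some byte strings are simply invalid.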
From: Sam S. <sd...@gn...> - 2011-04-13 19:20:24
|
Hi,

> * Carlos Ungil <pneybf.hatvy@tznvy.pbz> [2011-04-13 21:12:04 +0200]:
> On Wed, Apr 13, 2011 at 4:22 PM, Sam Steingold <sd...@gn...> wrote:
>> please do "hg roll" to discard my broken patch and apply the attached one.
>
> There is a problem with the check:
> $ cat path.erg
> Form: (T NIL T T)
> CORRECT: (1234 T)
> CLISP : ERROR
>
> I think instead of ....

yes, of course.
does it work with your change?

--
Sam Steingold (http://sds.podval.org/) on CentOS release 5.5 (Final) X 11.0.60900031
http://truepeace.org http://dhimmi.com http://iris.org.il http://camera.org
http://www.PetitionOnline.com/tap12009/ http://mideasttruth.com
Whom computers would destroy, they must first drive mad.
|
From: Carlos U. <car...@gm...> - 2011-04-13 19:53:05
|
On Wed, Apr 13, 2011 at 9:20 PM, Sam Steingold <sd...@gn...> wrote:

> yes, of course.
> does it work with your change?

Yes, with those tests disabled everything is ok.

Carlos
|