From: Richard M K. <kr...@pr...> - 2007-12-03 04:23:00
Hello,

In February there was some discussion of realpath(3) and truenames [1]. It doesn't seem that any determinate decision was taken at the time. As I'm currently in the process of getting SBCL's innards to agree with SB-POSIX by replacing UNIX-NAMESTRING with NATIVE-NAMESTRING, I have to touch TRUENAME and PROBE-FILE, so I'd like to proceed to rewrite PROBE-FILE and TRUENAME in terms of realpath(3).

If you don't care about this detail, feel free to skip the rest of this message.

Here are the bullet points:

* There are some flaws in how realpath(3) is specified in SUSv3, but these are unlikely to affect SBCL in reality.

* SUSv3's realpath(3)'s notion of canonicality is better than our current implementation's (see [2]). For absolute filenames that don't involve symlinks, realpath(3) and our current implementation agree, but:

  ** SBCL's current PROBE-FILE and TRUENAME return different results from realpath(3) in both pedestrian and pathological cases involving symlinks, and

  ** there are cases where realpath(3) is required to return results where our PROBE-FILE and TRUENAME lose, but some of our other file system functions don't.

* I don't have access to a Solaris, OSF/1, OpenBSD, or Windows host at the moment to check whether realpath(3) is available and usable there. I'm only seriously concerned about Windows, however.

AFAICT, the chief concerns with realpath(3) were raised by Christophe Rhodes, citing a realpath manual page on Linux:

> [T]here are a couple of mismatches [between realpath(3) and TRUENAME],
> of which probably the most serious is (from my man page):
>
>   Avoid using this function. It is broken by design since (unless
>   using the non-standard resolved_path == NULL feature) it is
>   impossible to determine a suitable size for the output buffer,
>   resolved_path. According to POSIX a buffer of size PATH_MAX
>   suffices, but PATH_MAX need not be a defined constant, and may have
>   to be obtained using pathconf(). And asking pathconf() does not
>   really help, since on the one hand POSIX warns that the result of
>   pathconf() may be huge and unsuitable for mallocing memory. And on
>   the other hand pathconf() may return -1 to signify that PATH_MAX is
>   not bounded.
>
> so we have to audit every platform's implementation of realpath().

ISTM that there are three issues here: (1) whether and how the spec for realpath(3) is broken, (2) whether we might use realpath(3) anyway, (3) whether and how it differs from what TRUENAME does now and what it ought to do. I'll address these in order.

(1) First the brokenness. The caveat above is a misleading way to describe some bugs in the SUSv3 specification, rather than a real issue; and it's almost completely irrelevant for SBCL. The bugs in SUSv3 can be summarized as follows: SUSv3 implementations aren't required to define PATH_MAX (see [3]), but the specification for realpath(3) says

| The generated pathname shall be stored as a null-terminated string,
| up to a maximum of {PATH_MAX} bytes, in the buffer pointed to by
| resolved_name.

So it's unclear what a SUSv3 implementation that doesn't define PATH_MAX is supposed to do to implement realpath(3), and also what a programmer is supposed to do to allocate a buffer to pass to realpath(3) on such an implementation. GNU/Hurd is such an implementation (AFAIK, the only one), and I suspect that the cited text originated with a zealous GNU libc documenter. However, on a SUSv3 implementation that defines PATH_MAX and implements realpath(3) correctly, these problems don't exist either for the OS implementor or the programmer. Now, (i) SBCL already depends on PATH_MAX on FreeBSD, Darwin, Linux, and Windows, so we know PATH_MAX is defined there; (ii) I've checked that PATH_MAX is defined on NetBSD; (iii) if SBCL is ever ported to GNU/Hurd, we can take advantage of the documented extension to realpath there.

(2) Now, do we have to audit realpath(3) everywhere?
I don't know, but it's not infeasible: create a file whose truename would be longer than PATH_MAX (in general you'll have to do this with several directories, because NAME_MAX is smaller than PATH_MAX everywhere), try to get its name with realpath(3), and ensure that that fails. I've run this on Linux/ppc, Linux/x86-64, NetBSD/x86, and Darwin/ppc, and have found that the realpath(3) implementations do correctly stop filling the buffer at PATH_MAX bytes (or fewer: Darwin/ppc doesn't seem to resolve a pathname with more than 256 levels of directory).

(3) There are mismatches between realpath and SBCL's current implementation of TRUENAME.

(a) realpath(3) returns the canonical name of a directory without a trailing slash. I've already introduced a feature to let PARSE-NATIVE-NAMESTRING parse such filenames as directory names.

(b) SBCL defines the truename of a dangling or self-referring symlink to be the pathname that denotes the symlink file name, whereas realpath(3) returns NULL, sets errno to ELOOP, and leaves the contents of the buffer undefined. This is easy to code around, in fact.

(c) realpath(3) is required to resolve any symlink in a directory segment, so that no segment of the returned filename names a symlink. SBCL's implementation of TRUENAME doesn't try to resolve symlinks in directories, though it happens to do so in some cases:

  ;; In general, SBCL does not resolve symlinks in the directory part
  ;; of a filename.
  * (sb-posix:symlink "/bin" "/tmp/bin")
  0
  * (probe-file "/tmp/bin/ls")
  #P"/tmp/bin/ls"
  * (sb-unix::unix-realpath "/tmp/bin/ls")
  "/bin/ls"
  ;; But if a filename names a symlink, SBCL does resolve and merge
  ;; the target of the symlink. Sometimes, this means that SBCL's
  ;; TRUENAME agrees with realpath, sometimes it doesn't.
  * (ensure-directories-exist "/tmp/proper-directory/")
  "/tmp/proper-directory/"
  T
  * (sb-posix:chdir "/tmp/proper-directory")
  0
  * (sb-posix:symlink "/bin/ls" "proper-symlink")
  0
  * (probe-file #P"/tmp/proper-directory/proper-symlink")
  #P"/bin/ls"
  * (sb-unix::unix-realpath "/tmp/proper-directory/proper-symlink")
  "/bin/ls"
  * (sb-posix:symlink "." "symlink-to-dot")
  0
  * (probe-file "/tmp/symlink-to-proper-directory")
  #P"/tmp/proper-directory/"
  * (sb-unix::unix-realpath "/tmp/proper-directory/symlink-to-dot")
  "/tmp/proper-directory"
  * (sb-posix:chdir "/tmp")
  0
  * (sb-posix:symlink "proper-directory" "symlink-to-proper-directory")
  0
  * (probe-file "/tmp/symlink-to-proper-directory/symlink-to-dot")
  #P"/tmp/symlink-to-proper-directory/"
  * (sb-unix::unix-realpath "/tmp/symlink-to-proper-directory/symlink-to-dot")
  "/tmp/proper-directory"
  * (probe-file "/tmp/symlink-to-proper-directory/symlink-to-dot/symlink-to-dot")
  #P"/tmp/symlink-to-proper-directory/symlink-to-dot/"
  * (sb-unix::unix-realpath "/tmp/symlink-to-proper-directory/symlink-to-dot/symlink-to-dot")
  "/tmp/proper-directory"

As you can see, the result of realpath is significantly more canonical than that of SBCL's current implementation, and so switching to it would be a win. (But notice that if you are willing to believe that a user might manage to wrap his mind around what SBCL does at present, then you might suppose that such a user could depend on the interaction between these corners of SBCL's TRUENAME and pathnames whose directory components start with '(:RELATIVE :BACK). I'm not convinced that such a user can exist, however.)
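A minimal C sketch of how the mismatch in (3)(b) might be coded around, under stated assumptions. The name truename_sketch is hypothetical and is not anything in SBCL; the fallback value of PATH_MAX exists only so the sketch compiles on a host that doesn't define it. When realpath(3) fails, the sketch does not inspect errno. It asks lstat(2) whether the final component is itself a symlink and, if so, canonicalizes only the directory part and re-attaches the link's own name, which mirrors the behaviour described for SBCL's current TRUENAME.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* request realpath(), lstat(), mkdtemp() declarations */
#endif
#include <assert.h>
#include <errno.h>
#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096       /* assumption for this sketch only */
#endif

/* Hypothetical helper: write a truename-like result for `name` into `out`,
 * which must hold at least PATH_MAX bytes.
 * Returns 0 on success, -1 on failure with errno set. */
int truename_sketch(const char *name, char *out)
{
    struct stat st;
    char dir_copy[PATH_MAX], base_copy[PATH_MAX], dir_real[PATH_MAX];
    const char *base;
    int saved_errno, n;

    assert(name != NULL && out != NULL);

    /* Ordinary case: realpath() resolves every symlink in the name. */
    if (realpath(name, out) != NULL)
        return 0;
    saved_errno = errno;

    /* Fall back only if the final component is itself a symlink,
     * i.e. a dangling or self-referring link. */
    if (lstat(name, &st) != 0 || !S_ISLNK(st.st_mode)) {
        errno = saved_errno;
        return -1;
    }
    if (strlen(name) >= PATH_MAX) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* dirname() and basename() may modify their argument, so use copies. */
    strcpy(dir_copy, name);
    strcpy(base_copy, name);

    /* Canonicalize the directory part, then re-attach the link's name. */
    if (realpath(dirname(dir_copy), dir_real) == NULL)
        return -1;
    base = basename(base_copy);
    n = snprintf(out, PATH_MAX, "%s/%s",
                 strcmp(dir_real, "/") == 0 ? "" : dir_real, base);
    if (n < 0 || n >= PATH_MAX) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return 0;
}
```

Called on a dangling link /tmp/proper-directory/dangling, this would be expected to yield the canonical name of /tmp/proper-directory followed by /dangling; a name that doesn't exist at all still fails, as with plain realpath(3).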
(4) When *DEFAULT-PATHNAME-DEFAULTS* is a relative pathname, PROBE-FILE and TRUENAME lose when supplied a relative pathname, but other functions of SBCL's file system interface and realpath(3) win:

  * (sb-posix:chdir "/tmp")
  0
  * (setf *default-pathname-defaults* (make-pathname :directory '(:relative)))
  #P""
  NIL
  * (open "proper-directory")
  #<SB-SYS:FD-STREAM for "file proper-directory" {5186A8C1}>
  * (close *)
  T
  * (file-author "proper-directory")
  "kreuter"
  * (file-write-date "proper-directory")
  3405619811
  * (probe-file "proper-directory")
  WARNING: *DEFAULT-PATHNAME-DEFAULTS* is a relative pathname. (But we'll try using it anyway.)
  failed AVER: "(NOT (RELATIVE-UNIX-PATHNAME? PATHNAME))"
  This is probably a bug in SBCL itself. (Alternatively, SBCL might have been corrupted by bad user code, e.g. by an undefined Lisp operation like (FMAKUNBOUND 'COMPILE), or by stray pointers from alien code or from unsafe Lisp code; or there might be a bug in the OS or hardware that SBCL is running on.) If it seems to be a bug in SBCL itself, the maintainers would like to know about it. Bug reports are welcome on the SBCL mailing lists, which you can find at <http://sbcl.sourceforge.net/>.
  [Condition of type SB-INT:BUG]
  * (sb-unix::unix-realpath "proper-directory")
  "/tmp/proper-directory"

So taking advantage of realpath(3) gets us a slightly more consistent file system interface.

* The following text occurs in the realpath(3) man page on NetBSD and Darwin:

| CAVEATS
|      This implementation of realpath() differs slightly from the
|      Solaris implementation. The 4.4BSD version always returns
|      absolute pathnames, whereas the Solaris implementation will,
|      under certain circumstances, return a relative resolved_path
|      when given a relative pathname.

I don't have access to a Solaris host to test on, so I can't say whether this is correct. It's probable that this comment is wrong, however: the publicly available Solaris documentation for realpath(3) is identical to the text from the SUSv3 [4].
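The BSD caveat quoted above could be checked on any given host with a few lines of C. The following is only a sketch: relative_realpath_is_absolute is a hypothetical helper, and the fallback PATH_MAX is an assumption made so the code compiles where the constant is missing. It changes to a directory, resolves a relative name with realpath(3), and reports whether the result starts with a slash.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* request realpath() and mkdtemp() declarations */
#endif
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096       /* assumption for this sketch only */
#endif

/* Hypothetical helper: chdir to `dir`, resolve the relative name `name`
 * into `resolved` (at least PATH_MAX bytes), and report the outcome.
 * Returns 1 if the resolved name is absolute, 0 if it is relative,
 * and -1 if chdir() or realpath() failed. */
int relative_realpath_is_absolute(const char *dir, const char *name,
                                  char *resolved)
{
    assert(dir != NULL && name != NULL && resolved != NULL);
    assert(name[0] != '/');   /* only relative input is interesting here */

    if (chdir(dir) != 0)
        return -1;
    if (realpath(name, resolved) == NULL)
        return -1;
    return resolved[0] == '/';
}
```

A result of 0 on some platform would confirm the caveat there; note that the helper changes the process's working directory as a side effect.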
Any remaining comments or objections are welcome.

Thanks,
RmK

[1] http://thread.gmane.org/gmane.lisp.steel-bank.devel/8555/focus=8561
[2] http://www.opengroup.org/onlinepubs/000095399/functions/realpath.html
[3] http://www.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html
[4] http://docs.sun.com/app/docs/doc/816-5168/realpath-3c?a=view