From: Richard M K. <kr...@pr...> - 2022-08-28 19:57:03
|
Thanks, Doug. Douglas Katzman <do...@go...> wrote: > I take it that the code in upp.lisp is logic you employed to produce the > experiments and results, mixed with actual patches? Do you have the patches > separated out also? There are no patches per se: everything in those files are structured for the purpose of encapsulations around MAKE-PATHNAME, PARSE-NAMESTRING, etc., for people to "try things out". > Are you willing to shepherd these through after the release or are > you asking that one of the frequent committers take some ownership? Sure, I'd be happy to implement any/all of the proposals within SBCL, but didn't want to dive in and do it before getting a steer from maintainers on feasibility. These are incompatible, though maybe marginal, changes, after all. Please let me know if some/all changes ought to be dependent on a new build-time feature, or otherwise be made backwardly compatible? I think all the stuff I'm talking about shouldn't be too disruptive, but I've no idea what users do out in the world. > I'm slightly curious as to what drives your passion for hammering out > pathname-related bugs - do you hit snags around them often? What drives me? Probably I'm just peculiar. But also, I write a lot of "boring" programs in Lisp. I'm fairly productive at reading and writing files once I've got them open, but I do hit snags at the pathname level. I would say that that most snags start with me doing something wrong, but eventually result in a pathname that something else, often in SBCL, handles weirdly. I'm also pretty bad at not getting worked up about edge cases in code I'm able to understand; and there are loads of edge cases in pathname routines due to the representational issues. [*] Anyhow, I don't believe there's a uniquely correct way to do pathnames & files in a CL implementation; there's inherent arbitrariness to some decisions. However, I do believe that the system gets easier to work with when there are fewer possible filename representations to cope with. Regards, Richard [*] A couple messages back you wrote: > SB-COVER has what I presume is the right idea in its function > pathname-as-directory. I'm sorry to belabor this, but AFAICT there are zero correct implementations of this function around anywhere. Here's the routine from SB-COVER: (defun pathname-as-directory (pathname &optional (errorp t)) (let ((pathname (merge-pathnames pathname))) (if (and (member (pathname-name pathname) '(nil :unspecific)) (member (pathname-type pathname) '(nil :unspecific))) pathname (if errorp (error "~S does not designate a directory" pathname) (make-pathname :directory (append (or (pathname-directory pathname) (list :relative)) (list (file-namestring pathname))) :name nil :type nil :version nil :defaults pathname))))) Here's a list of problems I can spot: 1. By treating NIL and :UNSPECIFIC as equivalent for the name and type, for any given directory component, there are 4 pathnames this can return, and the 4 work differently as arguments to MERGE-PATHNAMES, or to PATHNAME-MATCH-P, etc. IMO this routine should return a normalized result whose name, type, and version are all NIL. (Oddly enough, if it didn't test the name and type components at all, it would have that result for those 4 combinations, since they all have a FILE-NAMESTRING of "", which SBCL canonicalizes out.) 2. If the result is only meant for use in (make-pathname :name <string> :type <string> :defaults <directory-pathname>) then I don't know why it's only testing for 4 cases: there are 9 ways to be a pathname that are equivalent for that; here are the 5 it's missing: name type ---- ---- "" nil "" :unspecific "." nil "." :unspecific "" "" And, actually, the routine would be better off if it avoided the conversion for the cases where the name is "", because... 3. FILE-NAMESTRING always errors for pathnames having "" for the name. (Every dotfile can be represented, albeit abnormally, as a pathname whose name is "" and whose type is a string.. OPEN will open them, but FILE-NAMESTRING will error for all of them; so will this routine.) 4. But FILE-NAMESTRING is the wrong thing here even when it doesn't error. In CMUCL-descended pathname libraries, strings in pathname components are in "native" syntax, not namestring syntax. These are incorrect to mix. For instance * (sb-cover::pathname-as-directory (make-pathname :name "a*" :type "txt" :defaults #P"/tmp/") nil) #P"/tmp/a\\\\\\*.txt/" * (sb-cover::pathname-as-directory (make-pathname :name :wild :type :wild :defaults #P"/tmp/") nil) #P"/tmp/\\*.\\*/" The input in first case isn't a wild pathname, but does contain a wildard character as a literal. So FILE-NAMESTRING escapes the asterisk, and PATHNAME-AS-DIRECTORY's result is a pathname that won't name the same file as its input (and its namestring has an extra round of escaping, of course). The input in second case is a wild pathname. It's unclear what "pathname-as-directory" should mean for wild pathnames (a pathname that matches all directories whose as-file names would match, maybe?), but there's no semantics-preserving way to to convert all of SBCL's wild pathnames between "as file" and "as directory" forms (I'll spare you that analysis). Most routines that purport to convert from "as file" form into "as directory" just error when the arugment is wild; that seems as good as anything else until somebody squares that semantic circle. 5. In one case where FILE-NAMESTRING gives an acceptable string to use as a directory level, the result turns out to be one of the broken cases for other operators: * (pathname-directory (sb-cover::pathname-as-directory "/tmp/.." nil)) (:ABSOLUTE "tmp" "..") The ".." ends up meaning something different in DIRECTORY than :UP does; possibly a problem for some callers. Arguably that's not this routine's fault, exactly, but this kind of thing is one way for snags to get introduced into a program. 6. So let's say that we don't care about pathnames with "" for the name, filenames with characters that are wildcards or escape characters in SBCL's namestring syntax, "dotdot", or using the resulting pathname for anything other than the DEFAULTS argument to MAKE-PATHNAME. Is this routine correct under those constraints? Nope. FILE-NAMESTRING is still the wrong thing. For any filename containing N dots, there are N+2 distinct name/type pairs that will map to that filename, and when N>0, most of them have (or should have?) namestrings containing escape characters: ;; All of these map to "the filename a.b.c" * (make-pathname :name "a.b" :type "c") #P"a.b.c" * (make-pathname :name "a.b.c" :type nil) #P"a\\.b\\.c" * (make-pathname :name "a.b.c" :type :unspecific) #P"a.b.c" ;; this is probably missing backslashes, I guess * (make-pathname :name "a" :type "b.c") #P"a.b\\.c" All of these will end up converting to the wrong thing in SB-COVER's PATHNAME-AS-DIRECTORY... and also Osicat's, CL-FAD's, and UIOP's ENSURE-DIRECTORY-PATHNAME all for the sorts of reasons; and I'd guess they'll all have these result on SBCL, CMUCL, Clasp, and SCL if that's still around. Anyhow, I'm sorry to have gone on this long. Here's a rendition of what I think would be a more correct implementation for SBCL today; under my proposed changes it'll continue to work, but some bits will be unnecessary. (FILE-NAMESTRING remains the wrong thing to use however, because of the native vs. namestring syntax issue.) IMO it still needs a lot of comments because there are a ton of orthogonal details to keep in mind: (defun pathname-as-directory (pathname) "If PATHNAME maps to a filename in OPEN, return a pathname representing a variant of that filename that can only resolve to a directory. The return value's name, type and version components are all NIL." ;; SB-COVER's version of PATHNAME-AS-DIRECTORY merges is argument. ;; I'm not sure if this operation ought to do that: I think of this ;; as a purely "syntactic" transformation, orthogonal to filling in ;; defaults. But it is worthwhile to coerce, to avoid repeated ;; re-parsing if PATHNAME is a namestring. (setq pathname (pathname pathname)) ;; SBCL's wildcard matching for the name & type components under ;; DIRECTORY work differently than for directory level elements, so ;; we can't convert to a directory level in a meaningfully ;; semantics-preserving way. We don't actually care about whatever's ;; in the directory, though, so we check only the "basename" ;; components. (The version checking is just pedantry, however.) (assert (not (or (wild-pathname-p pathname :name) (wild-pathname-p pathname :type) (wild-pathname-p pathname :version))) (pathname) "Can't convert wild pathname ~S into as-directory form." pathname) (let* ((basename ;; The only robust way to try to combine the name & type ;; into a string for the directory is "native" unparsing: ;; everything else, including trying to "reason" about what ;; NATIVE-NAMESTRING will do, will inevitably disagree with ;; the rest of SBCL now or as it evolves. But there's no ;; FILE-NAMESTRING counterpart for NATIVE-NAMESTRING. In ;; this routine we don't want to error in case there's ;; something un-unparseable in the device or directory, so ;; we can't just call NATIVE-NAMESTRING and parse the ;; basename out. We need to make a pathname having just the ;; name & type, and convert that into a string. (This is ;; all to say, there really ought to be more "official" ;; support for the conversions among portions of pathnames, ;; namestrings, and filenames) (let ((basename-pathname (make-pathname :device nil :directory nil :defaults pathname))) ;; NATIVE-NAMESTRING can error; that's SBCL's de facto ;; definition of which pathnames "do not map to ;; filenames" in 19.1.2 terminology. If we exposed N-N's ;; errors, though, the user might be confused whether ;; that's a caller error or a bug in this routine. (It's ;; not a bug; you might consider it a defect, but all the ;; same it's deliberate.) Rather than forcing them to read ;; this comment, let's just signal our own error ;; saying why we're stopping.. (handler-case (sb-ext:native-namestring basename-pathname) (sb-kernel:no-native-namestring-error (e) (declare (ignore error)) (error 'sb-kernel:no-native-namestring-error basename-pathname "Can't convert the name and type of ~S into a directory level:~ they don't map to a basename." pathname))))) ;; Canonicalize or canonicalize away a few possible basename ;; strings, to make the result less dangerous to use. (tail (cond ((member basename '("" ".") :test #'equal) ()) ((equal basename "..") (list :up)) (t (list basename))))) (make-pathname :defaults pathname :directory ;; Don't use (:RELATIVE) when NIL would do. (if tail (append (or (pathname-directory pathname) '(:relative)) tail) (pathname-directory pathname)) :name nil :type nil :version nil))) |