|
From: Sam S. <sd...@gn...> - 2005-05-27 19:01:43
|
Bruno,
we have a problem with get-charset-range:
on mingw during bootstrap people have been complaining of the error:
(ERROR_INVALID_HANDLE): The handle is invalid.
during load of subtypep.lisp.
this has been traced down to iconv_range().
this patch fixes the problem:
--- stream.d 19 May 2005 19:05:33 -0400 1.519
+++ stream.d 27 May 2005 14:26:56 -0400
@@ -4335,7 +4335,11 @@
var size_t res = iconv(cd,&inptr,&insize,&outptr,&outsize);
if (res == (size_t)(-1)) {
var int my_errno = OS_errno;
- if (my_errno == EILSEQ) { /* invalid input? */
+ if (my_errno == EILSEQ
+ #ifdef WIN32_NATIVE
+ || my_errno == ERROR_INVALID_HANDLE
+ #endif
+ ) { /* invalid input? */
end_system_call();
# ch not encodable -> finish the interval
if (have_i1_i2) {
(apparently that's what is set by iconv() on failure: I got it for
MULELAO-1 encoding, start=0=i1, i=161)
OTOH, the whole notion of precomputing these ranges is suspect.
the only way they are used is when you call (SUBTYPEP ENC1 ENC2) which
is an extremely rare operation anyway.
there appears to be no real reason to pre-compute the cache during
bootstrap:
con:
1. longer bootstrap
2. larger memory image (16K - 1%)
3. risk of error for no gain: most people do not need this cache.
thus I propose the additional patch:
--- subtypep.lisp 09 Feb 2005 09:39:53 -0500 1.12
+++ subtypep.lisp 27 May 2005 14:50:13 -0400
@@ -1170,9 +1170,6 @@
type)))
;; Conversion of an encoding to a list of intervals.
#+UNICODE
-(let ((table (make-hash-table :key-type '(or string symbol) :value-type 'simple-string
- :test 'stablehash-equal :warn-if-needs-rehash-after-gc t)))
- ;; cache: charset name -> list of intervals #(start1 end1 ... startm endm)
#| ; Now in C and much more efficient.
(defun charset-range (encoding start end)
(setq start (char-code start))
@@ -1192,19 +1189,16 @@
;; Return the definition range of a character set. If necessary, compute it
;; and store it in the cache.
(defun get-charset-range (charset &optional maxintervals)
+ (let ((table #.(make-hash-table :key-type '(or string symbol)
+ :value-type 'simple-string
+ :test 'stablehash-equal
+ :warn-if-needs-rehash-after-gc t)))
+ ;; cache: charset name -> list of intervals #(start1 end1 ... startm endm)
(or (gethash charset table)
(setf (gethash charset table)
(charset-range (make-encoding :charset charset)
(code-char 0) (code-char (1- char-code-limit))
- maxintervals))))
- ;; Fill the cache, but cache only the results with small lists of intervals.
- ;; Some iconv based encodings have large lists of intervals (up to 5844
- ;; intervals for ISO-2022-JP-2) which are rarely used and not worth caching.
- (do-external-symbols (sym (find-package "CHARSET"))
- (let* ((charset (encoding-charset (symbol-value sym)))
- (computed-range (get-charset-range charset 100))
- (intervals (/ (length computed-range) 2)))
- (when (>= intervals 100) (remhash charset table)))))
+ maxintervals)))))
#| ;; Older code for a special case.
;; Test whether all characters encodable in encoding1 are also encodable in
;; encoding2.
--
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://pmw.org.il/> <http://www.iris.org.il> <http://www.camera.org>
<http://www.jihadwatch.org/> <http://www.palestinefacts.org/>
A slave dreams not of Freedom, but of owning his own slaves.
|
|
From: Yaroslav K. <kav...@je...> - 2005-05-30 10:52:09
|
Sam Steingold:
<skip>
> --- stream.d 19 May 2005 19:05:33 -0400 1.519
> +++ stream.d 27 May 2005 14:26:56 -0400
> @@ -4335,7 +4335,11 @@
> var size_t res = iconv(cd,&inptr,&insize,&outptr,&outsize);
> if (res == (size_t)(-1)) {
> var int my_errno = OS_errno;
> - if (my_errno == EILSEQ) { /* invalid input? */
> + if (my_errno == EILSEQ
> + #ifdef WIN32_NATIVE
> + || my_errno == ERROR_INVALID_HANDLE
> + #endif
> + ) { /* invalid input? */
> end_system_call();
> # ch not encodable -> finish the interval
> if (have_i1_i2) {
>
<skip>
> --- subtypep.lisp 09 Feb 2005 09:39:53 -0500 1.12
> +++ subtypep.lisp 27 May 2005 14:50:13 -0400
> @@ -1170,9 +1170,6 @@
> type)))
> ;; Conversion of an encoding to a list of intervals.
> #+UNICODE
> -(let ((table (make-hash-table :key-type '(or string symbol) :value-type
> 'simple-string
> - :test 'stablehash-equal
> :warn-if-needs-rehash-after-gc t)))
> - ;; cache: charset name -> list of intervals #(start1 end1 ... startm
> endm)
> #| ; Now in C and much more efficient.
> (defun charset-range (encoding start end)
> (setq start (char-code start))
> @@ -1192,19 +1189,16 @@
> ;; Return the definition range of a character set. If necessary,
> compute it
> ;; and store it in the cache.
> (defun get-charset-range (charset &optional maxintervals)
> + (let ((table #.(make-hash-table :key-type '(or string symbol)
> + :value-type 'simple-string
> + :test 'stablehash-equal
> + :warn-if-needs-rehash-after-gc t)))
> + ;; cache: charset name -> list of intervals #(start1 end1 ...
> startm endm)
> (or (gethash charset table)
> (setf (gethash charset table)
> (charset-range (make-encoding :charset charset)
> (code-char 0) (code-char (1-
> char-code-limit))
> - maxintervals))))
> - ;; Fill the cache, but cache only the results with small lists of
> intervals.
> - ;; Some iconv based encodings have large lists of intervals (up to
> 5844
> - ;; intervals for ISO-2022-JP-2) which are rarely used and not worth
> caching.
> - (do-external-symbols (sym (find-package "CHARSET"))
> - (let* ((charset (encoding-charset (symbol-value sym)))
> - (computed-range (get-charset-range charset 100))
> - (intervals (/ (length computed-range) 2)))
> - (when (>= intervals 100) (remhash charset table)))))
> + maxintervals)))))
> #| ;; Older code for a special case.
> ;; Test whether all characters encodable in encoding1 are also
> encodable in
> ;; encoding2.
RUN-ALL-TESTS: grand total: 1 error out of 10,422 tests
$ cat /home/src/clisp/clisp/build-full/tests/*.erg
Form: (IF *NO-ICONV-P* "AZ" (CONVERT-STRING-FROM-BYTES '#(255 254 65 0
13) (MAKE-ENCODING :CHARSET "utf-16" :INPUT-ERROR-ACTION #\Z)))
CORRECT: "AZ"
CLISP : ERROR
Win32 Error 6 (ERROR_INVALID_HANDLE): The handle is invalid.
Thanks.
--
WBR, Yaroslav Kavenchuk.
|
|
From: Bruno H. <br...@cl...> - 2005-05-30 12:57:06
|
Sam wrote:
> 1. longer bootstrap
get-charset-range is not the bottleneck regarding the bootstrap times.
If you had done a profiling of the production of interpreted.mem and
halfcompiled.mem, you would have noticed that
- it spends 1/4 of its time expanding the code of
%expand-special-declarations in init.lisp,
- while loading condition.lisp, it spends most of its time expanding
CALL-NEXT-METHOD and NEXT-METHOD-P macros.
Lesson of the day: Concentrate on the important things, not on the unimportant
ones.
Bruno
|
|
From: Sam S. <sd...@gn...> - 2005-06-06 18:48:14
|
> * Bruno Haible <oe...@py...> [2005-05-30 14:55:13 +0200]: > > Sam wrote: >> 1. longer bootstrap > > get-charset-range is not the bottleneck regarding the bootstrap times. the only file that loads visibly slowly is subtype.lisp. it might not be the real bottleneck, but it surely is a noticeably waste of time and space. (and its importance will only go up as you speed up other places) -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://pmw.org.il/> <http://www.dhimmi.com/> <http://ffii.org/> <http://www.iris.org.il> <http://www.jihadwatch.org/> Lisp suffers from being twenty or thirty years ahead of time. |
|
From: Bruno H. <br...@cl...> - 2005-06-06 19:52:01
|
Sam wrote: > the only file that loads visibly slowly is subtype.lisp. Yes. And in some configurations, gcc needs 5 minutes to compile spvw.d. > but it surely is a noticeably waste of time and space. The time is not wasted. As I explained, during this time it detects whether the iconv() implementation is safe for use on all the advertised encodings. Bruno |
|
From: Sam S. <sd...@gn...> - 2005-06-06 20:56:07
|
> * Bruno Haible <oe...@py...> [2005-06-06 21:50:50 +0200]: > >> but it surely is a noticeably waste of time and space. > > The time is not wasted. As I explained, during this time it detects > whether the iconv() implementation is safe for use on all the > advertised encodings. it is a waste of boot time. this should be in test, not boot. -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://pmw.org.il/> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.jihadwatch.org/> If it has syntax, it isn't user friendly. |
|
From: Bruno H. <br...@cl...> - 2005-05-30 12:57:13
|
Sam wrote:
> we have a problem with get-charset-range:
> on mingw during bootstrap people have been complaining of the error:
> (ERROR_INVALID_HANDLE): The handle is invalid.
> during load of subtypep.lisp.
> this has been traced down to iconv_range().
> this patch fixes the problem:
>
> --- stream.d 19 May 2005 19:05:33 -0400 1.519
> +++ stream.d 27 May 2005 14:26:56 -0400
> @@ -4335,7 +4335,11 @@
> var size_t res = iconv(cd,&inptr,&insize,&outptr,&outsize);
> if (res == (size_t)(-1)) {
> var int my_errno = OS_errno;
> - if (my_errno == EILSEQ) { /* invalid input? */
> + if (my_errno == EILSEQ
> + #ifdef WIN32_NATIVE
> + || my_errno == ERROR_INVALID_HANDLE
> + #endif
> + ) { /* invalid input? */
> end_system_call();
> # ch not encodable -> finish the interval
> if (have_i1_i2) {
This patch is nonsense. It replaces a broken code with another broken code.
I explained twice already: iconv() is a function specified in POSIX, and
sets its error value in 'errno', not in 'GetLastError()'. It's broken in
two ways currently:
1. It uses OS_errno where it should use errno,
2. It uses OS_error() for signalling the error, where is should use
ANSIC_error() or POSIX_error().
I'm now committing a patch that fixes both of these.
Now, if it still doesn't work for you or Yaroslav Kavenchuk, you must be
using an iconv() library that is not POSIX compliant or that was built with
incompatible flags. After more than two month of hassles, I don't want to
hear any more about it. It's not worth the time, since clisp supports most
important encodings already natively without iconv(). So I'm adding a
"#undef HAVE_ICONV" in win32.d.
> OTOH, the whole notion of precomputing these ranges is suspect.
?! It's the principle of ahead-of-time compilation.
> there appears to be no real reason to pre-compute the cache during
> bootstrap:
> con:
> 3. risk of error for no gain: most people do not need this cache.
On the contrary: this is a PRO. It allows us to guarantee that the binding
with the iconv() doesn't lead to a clisp that crashes. It allows us to add
the right #ifdefs with __GLIBC_MINOR__ without waiting for 10 users to
report crashes.
> thus I propose the additional patch:
>
> --- subtypep.lisp 09 Feb 2005 09:39:53 -0500 1.12
> +++ subtypep.lisp 27 May 2005 14:50:13 -0400
> @@ -1170,9 +1170,6 @@
> type)))
> ;; Conversion of an encoding to a list of intervals.
> #+UNICODE
> -(let ((table (make-hash-table :key-type '(or string symbol) :value-type 'simple-string
> - :test 'stablehash-equal :warn-if-needs-rehash-after-gc t)))
> - ;; cache: charset name -> list of intervals #(start1 end1 ... startm endm)
> #| ; Now in C and much more efficient.
> (defun charset-range (encoding start end)
> (setq start (char-code start))
> @@ -1192,19 +1189,16 @@
> ;; Return the definition range of a character set. If necessary, compute it
> ;; and store it in the cache.
> (defun get-charset-range (charset &optional maxintervals)
> + (let ((table #.(make-hash-table :key-type '(or string symbol)
> + :value-type 'simple-string
> + :test 'stablehash-equal
> + :warn-if-needs-rehash-after-gc t)))
> + ;; cache: charset name -> list of intervals #(start1 end1 ... startm endm)
> (or (gethash charset table)
> (setf (gethash charset table)
> (charset-range (make-encoding :charset charset)
> (code-char 0) (code-char (1- char-code-limit))
> - maxintervals))))
> - ;; Fill the cache, but cache only the results with small lists of intervals.
> - ;; Some iconv based encodings have large lists of intervals (up to 5844
> - ;; intervals for ISO-2022-JP-2) which are rarely used and not worth caching.
> - (do-external-symbols (sym (find-package "CHARSET"))
> - (let* ((charset (encoding-charset (symbol-value sym)))
> - (computed-range (get-charset-range charset 100))
> - (intervals (/ (length computed-range) 2)))
> - (when (>= intervals 100) (remhash charset table)))))
> + maxintervals)))))
> #| ;; Older code for a special case.
Don't do this. This is bad.
1. It removes a build-time verification, leading to possible crashes at
runtime.
2. #. and #, are BAD. Never use them if you can avoid them.
Bruno
|
|
From: Sam S. <sd...@gn...> - 2005-06-06 18:53:40
|
> * Bruno Haible <oe...@py...> [2005-05-30 14:54:30 +0200]: > > This patch is nonsense. It replaces a broken code with another broken code. > I'm now committing a patch that fixes both of these. OK, so I now I know: to get you to fix something, I have to try to fix it twice :-) >> OTOH, the whole notion of precomputing these ranges is suspect. > ?! It's the principle of ahead-of-time compilation. it makes sense only if you are computing something useful. these ranges are useless. how about about storing a table of first 1000 primes or the first 1000 digits of pi in every memory image? they are just as useful. >> 3. risk of error for no gain: most people do not need this cache. > > On the contrary: this is a PRO. It allows us to guarantee that the > binding with the iconv() doesn't lead to a clisp that crashes. It > allows us to add the right #ifdefs with __GLIBC_MINOR__ without > waiting for 10 users to report crashes. Why not move this to the test suite then? > Don't do this. This is bad. > 1. It removes a build-time verification, leading to possible crashes > at runtime. we already have a test suite, and we should definitely expand it. > 2. #. and #, are BAD. Never use them if you can avoid them. why? this just puts a literal into the fas file. -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.mideasttruth.com/> <http://www.jihadwatch.org/> <http://pmw.org.il/> <http://www.palestinefacts.org/> <http://www.memri.org/> Of course, I haven't tried it. But it will work. - Isaak Asimov |
|
From: Bruno H. <br...@cl...> - 2005-06-06 19:59:19
|
Sam wrote: > how about about storing a table of first 1000 primes or the first 1000 > digits of pi in every memory image? they are just as useful. The first 600 digits of pi are already stored there. Because otherwise, the first time a user tries setting the LONG-FLOAT-DIGITS to 100 or 500 digits, the recomputation of pi causes some noticeable delay. Bruno |