Maxima -- GPL CAS based on DOE-MACSYMA / Bugs / #4692 maxima-index.lisp provides wrong lengths

Here is a replacement for read-info-text in cl-info.lisp that appears to work for me with gcl. "? build_info" isn't truncated.

(defun read-info-text (dir-name parameters)
  (let*
    ((value (cdr parameters))
     (filename (car value))
     (byte-offset (cadr value))
     (char-count (caddr value))
     ;; A character can occupy up to four octets, so the buffer must
     ;; be adjustable for VECTOR-PUSH-EXTEND to be able to grow it.
     (octet-buffer (make-array char-count
                               :fill-pointer 0
                               :adjustable t
                               :element-type '(unsigned-byte 8)))
     (path+filename (merge-pathnames (make-pathname :name filename) dir-name)))
    (handler-case
        (with-open-file (in path+filename :direction :input :element-type '(unsigned-byte 8))
          (unless (plusp byte-offset)
            ;; If byte-offset isn't positive there must be some error in
            ;; the index.  Return nil and let the caller deal with it.
            (return-from read-info-text nil))
          (file-position in byte-offset)
          (flet ((read-utf8-sequence (stream)
                   ;; Read one UTF-8 byte sequence from STREAM,
                   ;; pushing each byte onto OCTET-BUFFER (an
                   ;; adjustable vector with a fill pointer). Returns
                   ;; T on success, or NIL at end of stream.
                   (flet ((next-byte ()
                            (let ((b (read-byte stream nil nil)))
                              (when b
                                (vector-push-extend b octet-buffer))
                              b))
                          (read-continuation ()
                            (let ((b (read-byte stream nil nil)))
                              (when (null b)
                                (error "UTF-8: unexpected end of stream in multi-byte sequence"))
                              (unless (= (logand b #xC0) #x80)
                                (error "UTF-8: invalid continuation byte #x~2,'0X" b))
                              (vector-push-extend b octet-buffer))))

                     (let ((b0 (next-byte)))
                       (when (null b0)
                         (return-from read-utf8-sequence nil))

                       (cond
                         ((< b0 #x80)) ; 1-byte sequence, nothing more to read
                         ((< b0 #xC0)
                          (error "UTF-8: unexpected continuation byte #x~2,'0X" b0))
                         ((< b0 #xE0) (read-continuation)) ; 2-byte
                         ((< b0 #xF0) (read-continuation) (read-continuation)) ; 3-byte
                         ((< b0 #xF8) (read-continuation) (read-continuation) ; 4-byte
                          (read-continuation))
                         (t
                          (error "UTF-8: invalid leading byte #x~2,'0X" b0)))

                       t))))
            ;; Read the requested number of characters.
            (handler-case
                (unless (= char-count
                           (loop with read-count = 0
                                 for count from 0 below char-count
                                 while (read-utf8-sequence in)
                                 do (incf read-count)
                                 finally (return read-count)))
                  (maxima::merror "End of file reading info file"))
              (error ()
                (maxima::merror "Bad utf-8 encoding")))
            ;; Got all the code points.  Convert the octets into characters.
            (map 'string #'code-char octet-buffer)))
      (error () (maxima::merror "Cannot find documentation for `~M': missing info file ~M~%"
                                (car parameters) (namestring path+filename))))))

This needs more work, but seems ok. Of course, this assumes the info files are utf-8 encoded. We need to make sure all the info files are utf-8 encoded.
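
To make the character-versus-octet mismatch concrete, here is a minimal sketch. It is not part of cl-info.lisp, and the names utf8-sequence-length, count-utf8-characters and *quoted-a* are made up for illustration. It counts the characters in a vector of UTF-8 octets by looking only at leading bytes, with the same thresholds as read-utf8-sequence above, and it assumes well-formed input. The example text assumes the quotation marks are U+2018 and U+2019.

```lisp
(defun utf8-sequence-length (lead-byte)
  "Number of octets in the UTF-8 sequence that starts with LEAD-BYTE.
Returns NIL for a continuation byte or an invalid leading byte."
  (cond ((< lead-byte #x80) 1)
        ((< lead-byte #xC0) nil)        ; continuation byte, not a lead
        ((< lead-byte #xE0) 2)
        ((< lead-byte #xF0) 3)
        ((< lead-byte #xF8) 4)
        (t nil)))

(defun count-utf8-characters (octets)
  "Count the characters encoded in OCTETS, a vector of UTF-8 octets.
Assumes well-formed input; signals an error on a bad leading byte."
  (let ((i 0) (n 0))
    (loop while (< i (length octets))
          do (let ((len (utf8-sequence-length (aref octets i))))
               (unless len
                 (error "Bad leading byte #x~2,'0X at offset ~D"
                        (aref octets i) i))
               (incf i len)
               (incf n)))
    n))

;; U+2018 (left single quotation mark) is E2 80 98 in UTF-8 and
;; U+2019 (right single quotation mark) is E2 80 99.
(defparameter *quoted-a*
  (vector #xE2 #x80 #x98 (char-code #\a) #xE2 #x80 #x99))

(length *quoted-a*)                 ; 7 octets
(count-utf8-characters *quoted-a*)  ; 3 characters
```

A Lisp with 8-bit characters that is told to read 3 characters from this text stops after the first quotation mark, which is consistent with the truncated output of "? build_info".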

Robert Dodier - 2026-04-08

@rtoy Thanks for working on it, but I dunno, reworking read-info-text looks like a bridge too far for me. If there is some desire to make Unicode stuff work for GCL, maybe you can invest the same time and energy in GCL. Or maybe not, as you wish, either way is A-OK by me.

@rtoy @jgmbenoit @villate The Interwebs claim that makeinfo has a command line flag --no-utf-8. If someone wants to package Maxima with GCL, that flag could be enabled. The examples in the Texinfo documentation are already using ASCII art pretty printing (i.e., using the old ASCII art pretty printer instead of the current Unicode-enabled pretty printer). There might be a few Unicode characters here and there, but not many, I believe.

This is, of course, ignoring languages which use Unicode characters for diacritics. That's just not going to work for GCL. If someone feels strongly about it, maybe they can work on GCL. I agree it's undesirable but under the circumstances I don't believe it is the Maxima project's responsibility to fix it.
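
To get a feel for how many stray non-ASCII characters there actually are, something like the following sketch could be run over the generated info files. The function names are hypothetical and nothing like this exists in Maxima as far as I know. It reads the file as raw octets, so it should also work on a Lisp without Unicode support.

```lisp
(defun non-ascii-offsets (octets)
  "Return the list of offsets in OCTETS (a vector of octets) whose
value is #x80 or above, i.e. octets that are not plain ASCII."
  (loop for octet across octets
        for offset from 0
        when (>= octet #x80)
          collect offset))

(defun file-non-ascii-offsets (path)
  "Read the file at PATH as raw octets and return the offsets of all
non-ASCII octets.  No Unicode support is needed in the Lisp."
  (with-open-file (in path :direction :input
                           :element-type '(unsigned-byte 8))
    (let ((octets (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      (read-sequence octets in)
      (non-ascii-offsets octets))))

;; "a", then U+2500 (box drawings light horizontal, E2 94 80), then "b".
(non-ascii-offsets (vector 97 #xE2 #x94 #x80 98))  ; => (1 2 3)
```

An empty list from file-non-ascii-offsets would mean the file is pure ASCII and the lengths in the index are safe for GCL.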

- Raymond Toy - 2026-04-08
  
  Are you going to leave gcl broken then? FWIW, I have offered to help Camm with unicode, but he wants strings to be utf-8 and I don't see how that can possibly work when you can do arbitrary writes to anywhere in the string. Maybe I lack imagination.
  
  - Robert Dodier - 2026-04-08
    
    About what Maxima should do for GCL, it makes sense to me to enable --no-utf-8 for makeinfo (and also inspect the Texinfo files to replace any stray Unicode characters) so that GCL can handle the resulting output.
    
    About GCL's Unicode support, my advice is just go ahead and try to do it the way Camm wants to. On the face of it, a fixed-width encoding seems simpler to work with, but maybe UTF-8 strings will work out, and you can always back up and try again if they don't.
    
    - Raymond Toy - 2026-04-08
      
      I'm ok with this as long as it's the default. I guess all the translations can use latin1 encoding. Well, maybe not for the Japanese translation, but I don't know if that builds anymore. I don't normally build any of the translations.
      
      If gcl uses utf-8 strings, I'm not likely to help. It seems overly complicated and prone to bugs. Or maybe I'm just not clever enough to know how to do this efficiently and transparently.
      
      Anyway, that's a gcl problem, not a maxima problem (yet?).
      
Jerome Benoit - 2026-04-05

I am playing with the latest unstable packages of gcl27 and maxima. The current texinfo version in unstable is 7.3-2.

Leo Butler - 2026-04-06

build_index.pl (and update_examples) are anachronisms.

Regardless of how we fix this bug report, I think the best way forward
would be to re-write this perl script in lisp and generate the index
file at run-time, not build-time.

BTW, it is an embarrassment that we use perl and not lisp to do this
simple text stuff. There may have been a time when that made sense, but
that is long gone.
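
As a rough illustration of what a Lisp version would have to compute for each index entry, here is a sketch. It is not a translation of build_index.pl, and the name entry-lengths is hypothetical. Given the octets of an info file and the byte offsets where an entry starts and ends, it returns the offset together with both the octet length and the character count, assuming the text is well-formed UTF-8.

```lisp
(defun entry-lengths (octets start end)
  "Return a list (START OCTET-LENGTH CHAR-COUNT) for the text stored in
OCTETS between byte offsets START (inclusive) and END (exclusive).
Characters are counted as octets that are not UTF-8 continuation
bytes (#b10xxxxxx), which assumes the text is well-formed UTF-8."
  (list start
        (- end start)
        (loop for i from start below end
              count (/= (logand (aref octets i) #xC0) #x80))))

;; "x", U+2018 (E2 80 98), "y": 5 octets but 3 characters.
(entry-lengths (vector 120 #xE2 #x80 #x98 121) 0 5)  ; => (0 5 3)
```

Which of the two lengths belongs in the index depends on how the Lisp reading the info file counts characters.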

- Raymond Toy - 2026-04-06
  
  I think many years ago Rupert Swarbrik (?) started on a lisp replacement for build_index.pl. Never finished. I'm not sure how that would work if a Lisp doesn't have unicode support, but it would probably be easier today since we have pregexp support.
  
  I have thought about using m4 to do update_examples, so that every time the manual is generated, the examples are too. But I'm not sure that's a good idea because it would probably really slow down generation of the docs. Plus, someone would have to check the manual to confirm that all the examples were converted correctly.
  
  Also, people complained that my changes to generate grad results at startup took too long (a second or two extra?). Generating the index at start-up would probably take even longer.
  
  But maybe we can get AI to convert build_index.pl and update_examples to Lisp? I would be ok with that.
  
- Robert Dodier - 2026-04-08
  
  Whether or not the index generator is Lisp or Perl is beside the point for the purpose of resolving the bug report -- the problem is that GCL doesn't understand multi-byte characters. @l_butler, with all due respect, can you please open a separate ticket to pursue the reimplementation of build_index.pl, should you choose to take up that topic.
  
  - Jerome Benoit - 2026-04-10
    
    One possibility is to generate a maxima-index.lisp data file per Lisp implementation beside the utf-8 one. For example, maxima-index-gcl.lisp can be generated for the GCL implementation.
    
    - Jaime E. Villate - 2026-04-10
      
      Generating maxima-index-gcl.lisp, without utf-8 characters, would not solve the problem. As Raymond has explained, texinfo introduces utf-8 characters that were not present in the Maxima document sources. I did get correct info files with GCL using an older version of texinfo that didn't add extra utf-8 characters.
      
      - Jerome Benoit - 2026-04-10
        
        But generating maxima-index-gcl.lisp with lengths as counted by gcl would solve the issue if the info file contains non-ASCII characters.
        
Jaime E. Villate - 2026-04-08

I agree with Robert, and I'm still puzzled by the fact that Maxima 5.49 + GCL works fine for me but fails for Raymond and Jerome.

- Raymond Toy - 2026-04-09
  
  Figured it out. The difference is texinfo. IIRC, you use 6.8. I was using 7.3. With 6.8, "? build_info" is not truncated. The last line is "'maxima_frontend_version' accordingly.", which is correct. Presumably, somewhere between 6.8 and 7.3, texinfo switched to using the left backquote character (non-ASCII) for @code in info files.
  
  - Jerome Benoit - 2026-04-10
    
    Other non-ASCII characters appear as well, for example in the description of carlson_rj. In the current Sid, echo '? carlson_rj' | maxima gives
    
    Maxima 5.49.0 https://maxima.sourceforge.io
    using Lisp GNU Common Lisp (GCL) GCL 2.7.1 git tag Version_2_7_2pre13
    Distributed under the GNU Public License. See the file COPYING.
    Dedicated to the memory of William Schelter.
    The function bug_report() provides bug reporting information.
    (%i1)
    
    -- Function: carlson_rj (<x>, <y>, <z>, <p>)
    Carlson's RJ integral is defined by
    
    R_J(x,y,z) = 1/2*integrate(1/(sqrt(t+x)*sqrt(t+y)*sqrt(t+z)*(t+p)), t, 0, inf)
    
    See Numerical Computation of Real or Complex Elliptic Integrals (https://arxiv.org/pdf/math/9409227) for more information.
    
    It is related to the elliptic integral of the third kind (_elliptic_pi_) by
    
    phi
    _
    _                      1
    _ ─────────────────────────────────────────── dtheta =
    _               2                    2
    _ sqrt(1 - m sin (theta)) (n sin (theta) + 1)
    0
    
                                  carlson_rj(c - 1, c - m, c, n + c) n
    carlson_rf(c - 1, c - m, c) - ────────────────────────────────────
    
    (%o1) true
    
    - Jaime E. Villate - 2026-04-10
      
      In this case I think it is a bug in the m4 macros that Raymond introduced to parse mathematical equations. The original expression in the manual source was an ASCII-only equation. I think the m4 macro should use only ASCII for the 2d representation of the equation in the info manual.
      
      - Jerome Benoit - 2026-04-10
        
        Note that the info files also contain author names and paper titles with UTF-8 characters.
        
      - Raymond Toy - 2026-04-10
        
        Ah. I think I regenerated the examples using update_examples and it used unicode characters for the integral signs. We need to modify update_examples not to use unicode. Or just bite the bullet and add a utf-8 decoder for gcl so we can read the file.
        
        I certainly prefer this approach because it's localized to just fixing read-info-text for gcl instead of forcing somewhat arbitrary conditions on what the user manual can use. No one is going to remember and we'll end up debugging this again, and again, and again.
        
Robert Dodier - 2026-04-21

@jgmbenoit I notice that Debian bugreport #1131495 is marked "closed". From your point of view, is there any further action needed on the part of the Maxima project?
