gnuplot / Bugs / #2676 Build fails on macOS due to invalid byte sequence in sed

A portable, multi-platform, command-line driven graphing utility

Milestone: None

Status: closed-fixed

Owner: nobody

Labels: None

Priority:

Updated: 2024-05-28

Created: 2023-12-30

Creator: Anonymous

Private: No

Build of 6.0.0 fails on macOS with:

  for e in `grep -E "^[     ]*START_HELP" ja/term/ai.trm ja/term/aquaterm.trm ja/term/be.trm ja/term/block.trm ja/term/caca.trm ja/term/cairo.trm ja/term/canvas.trm ja/term/cgm.trm ja/term/context.trm ja/term/debug.trm ja/term/djsvga.trm ja/term/dumb.trm ja/term/dxf.trm ja/term/emf.trm ja/term/epson.trm ja/term/estimate.trm ja/term/fig.trm ja/term/gd.trm ja/term/gpic.trm ja/term/grass.trm ja/term/hpgeneric.h ja/term/hpgl.trm ja/term/imagen.trm ja/term/linux-vgagl.trm ja/term/latex_old.h ja/term/lua.trm ja/term/pbm.trm ja/term/pict2e.trm ja/term/pm.trm ja/term/post.trm ja/term/pslatex.trm ja/term/pstricks.trm ja/term/qt.trm ja/term/regis.trm ja/term/svg.trm ja/term/t410x.trm ja/term/tek.trm ja/term/texdraw.trm ja/term/tgif.trm ja/term/tkcanvas.trm ja/term/webp.trm ja/term/win.trm ja/term/wxt.trm ja/term/x11.trm ja/term/xlib.trm |\
         LC_ALL=C sort -f -t':' -k2` ; do \
      f=`echo $e |cut -d\: -f1` ; s=`echo $e | cut -d\: -f2` ;\
      sed -n "/^[   ]*$s/,/^[   ]*END_HELP/p" $f ; \
    done >allterm.tmp
  sed: RE error: illegal byte sequence
  sed: RE error: illegal byte sequence
  sed: RE error: illegal byte sequence
  sed: RE error: illegal byte sequence
  sed: RE error: illegal byte sequence
  […]

This is typically a BSD/macOS sed vs. GNU sed difference: when it encounters a byte sequence that is not valid UTF-8, macOS sed errors out, while GNU sed just passes it through.
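
The locale dependence can be checked outside the build with a small script. This is only a sketch: the sample content, the `try_sed` helper, and the `en_US.UTF-8` locale name are assumptions for illustration, not taken from the gnuplot tree.

```shell
#!/bin/sh
# Sample input (an assumption for illustration): the middle line holds the
# EUC-JP bytes 0xA4 0xA2, which do not form a valid UTF-8 sequence.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'START_HELP(demo)\n\244\242\nEND_HELP(demo)\n' > "$sample"

# Hypothetical helper: run the same range expression under the given locale
# and report sed's exit status.
try_sed() {
    LC_ALL=$1 sed -n '/^START_HELP/,/^END_HELP/p' "$sample" >/dev/null 2>&1
    echo "LC_ALL=$1: sed exit status $?"
}

try_sed en_US.UTF-8   # BSD/macOS sed is expected to fail here, GNU sed to succeed
try_sed C             # both are expected to succeed: every byte is one character
```

If the named UTF-8 locale is not installed, the first call most likely falls back to the C locale and will not show the failure.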

Full build logs available at Homebrew: https://github.com/Homebrew/homebrew-core/pull/158555

Discussion

The Japanese documentation uses EUC-JP encoding. That makefile rule is converting it to UTF-8 for further processing in LaTeX. So yeah, anything inside that makefile rule must necessarily accept non-UTF-8 input - that's the entire point of it.

Does it help to add LC_ALL=C at the start of the sed command?

diff --git a/docs/Makefile.am b/docs/Makefile.am
index 85e82e563..a1e45b2e7 100644
--- a/docs/Makefile.am
+++ b/docs/Makefile.am
@@ -244,7 +244,7 @@ allterm-ja.h: $(CORETERM) $(LUA_HELP)
     $(AM_V_GEN) for e in `grep -E "^[   ]*START_HELP" $(JATERM) |\
          LC_ALL=C sort -f -t':' -k2` ; do \
       f=`echo $$e |cut -d\: -f1` ; s=`echo $$e | cut -d\: -f2` ;\
-      sed -n "/^[   ]*$$s/,/^[  ]*END_HELP/p" $$f ; \
+      LC_ALL=C sed -n "/^[   ]*$$s/,/^[  ]*END_HELP/p" $$f ; \
     done >allterm.tmp
     iconv -f EUC-JP -t UTF-8 < allterm.tmp >$@
     rm allterm.tmp
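
The order of operations in that rule (extract raw bytes first, convert afterwards) can be sketched on its own as follows. The temporary files and the one-character sample are assumptions for illustration, and `[[:blank:]]` stands in for the space-and-tab bracket of the real rule.

```shell
#!/bin/sh
# Hypothetical file names and sample content, for illustration only.
src=$(mktemp) ; out=$(mktemp)
trap 'rm -f "$src" "$out"' EXIT
printf 'START_HELP(demo)\n\244\242\nEND_HELP(demo)\n' > "$src"

# Step 1: cut out the help block as raw bytes. The C locale keeps sed from
# trying to decode the EUC-JP text as UTF-8.
LC_ALL=C sed -n '/^[[:blank:]]*START_HELP/,/^[[:blank:]]*END_HELP/p' "$src" > "$out"

# The whole sample lies inside the range, so sed should have copied it byte for byte.
cmp -s "$src" "$out" && echo "extracted bytes are unchanged"

# Step 2: only now convert the encoding, as the makefile rule does,
# and dump the result as hex bytes.
iconv -f EUC-JP -t UTF-8 < "$out" | od -An -tx1
```

With an iconv that supports EUC-JP, the middle line should come out as the UTF-8 bytes e3 81 82 (the character あ).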

Ethan Merritt - 2024-01-25

status: open --> pending-fixed


Selina Kovacek - 2024-04-08

Does adding "LC_ALL=C" at the start of the sed command have any impact on the conversion process?
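
A minimal sketch of why the prefix is expected to leave the conversion step alone, assuming a POSIX shell: an assignment written in front of a command is placed only in that one command's environment, and the `iconv` call in the rule names both encodings explicitly with `-f` and `-t`. The variable names `inside` and `after` are hypothetical.

```shell
#!/bin/sh
# Start from a known state so the result does not depend on the caller's locale.
unset LC_ALL

# The prefix assignment is visible only to the command it is attached to.
inside=$(LC_ALL=C sh -c 'echo "${LC_ALL-unset}"')

# A later command, such as the iconv call in the rule, does not inherit it.
after=$(sh -c 'echo "${LC_ALL-unset}"')

echo "prefixed command sees LC_ALL=$inside; the next command sees LC_ALL=$after"
```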

Ethan Merritt - 2024-05-28

Status: pending-fixed --> closed-fixed