
#2676 Build fails on macOS due to invalid byte sequence in sed

None
closed-fixed
nobody
None
2024-05-28
2023-12-30
Anonymous
No

Build of 6.0.0 fails on macOS with:

  for e in `grep -E "^[     ]*START_HELP" ja/term/ai.trm ja/term/aquaterm.trm ja/term/be.trm ja/term/block.trm ja/term/caca.trm ja/term/cairo.trm ja/term/canvas.trm ja/term/cgm.trm ja/term/context.trm ja/term/debug.trm ja/term/djsvga.trm ja/term/dumb.trm ja/term/dxf.trm ja/term/emf.trm ja/term/epson.trm ja/term/estimate.trm ja/term/fig.trm ja/term/gd.trm ja/term/gpic.trm ja/term/grass.trm ja/term/hpgeneric.h ja/term/hpgl.trm ja/term/imagen.trm ja/term/linux-vgagl.trm ja/term/latex_old.h ja/term/lua.trm ja/term/pbm.trm ja/term/pict2e.trm ja/term/pm.trm ja/term/post.trm ja/term/pslatex.trm ja/term/pstricks.trm ja/term/qt.trm ja/term/regis.trm ja/term/svg.trm ja/term/t410x.trm ja/term/tek.trm ja/term/texdraw.trm ja/term/tgif.trm ja/term/tkcanvas.trm ja/term/webp.trm ja/term/win.trm ja/term/wxt.trm ja/term/x11.trm ja/term/xlib.trm |\
         LC_ALL=C sort -f -t':' -k2` ; do \
      f=`echo $e |cut -d\: -f1` ; s=`echo $e | cut -d\: -f2` ;\
      sed -n "/^[   ]*$s/,/^[   ]*END_HELP/p" $f ; \
    done >allterm.tmp
  sed: RE error: illegal byte sequence
  sed: RE error: illegal byte sequence
  sed: RE error: illegal byte sequence
  sed: RE error: illegal byte sequence
  sed: RE error: illegal byte sequence
  []

This looks like the usual BSD/macOS sed vs. GNU sed difference: in a UTF-8 locale, macOS sed errors out when it encounters a byte sequence that is not valid UTF-8, while GNU sed just passes it through.
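
As a minimal sketch of the byte-oriented workaround (not part of the gnuplot build), the snippet below feeds sed two bytes that are valid EUC-JP but not valid UTF-8. The temporary file names and the sample content are hypothetical; the only assumption is that forcing the C locale makes sed treat every byte as a character, so there is no multibyte sequence for it to reject.

```shell
# Hypothetical sample file; \244\242 is one EUC-JP character and is not valid UTF-8.
tmp=$(mktemp -d)
printf 'START_HELP\n\244\242\nEND_HELP\n' > "$tmp/sample.trm"

# In the C locale every byte is a single character, so sed has nothing to reject.
LC_ALL=C sed -n '/^START_HELP/,/^END_HELP/p' "$tmp/sample.trm" > "$tmp/out.tmp"

# The selected lines should come out byte-for-byte identical to the input.
if cmp -s "$tmp/sample.trm" "$tmp/out.tmp"; then
    status="unchanged"
else
    status="modified"
fi
echo "$status"
rm -rf "$tmp"
```

Dropping the `LC_ALL=C` prefix on a macOS system running in a UTF-8 locale is the case the report describes.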

Full build logs are available at Homebrew: https://github.com/Homebrew/homebrew-core/pull/158555

Discussion

  • Ethan Merritt

    Ethan Merritt - 2023-12-30

    The Japanese documentation uses EUC-JP encoding. That makefile rule converts it to UTF-8 for further processing in LaTeX. So yeah, anything inside that makefile rule must necessarily accept non-UTF-8 input - that's the entire point of it.

    Does it help to add LC_ALL=C at the start of the sed command?

    diff --git a/docs/Makefile.am b/docs/Makefile.am
    index 85e82e563..a1e45b2e7 100644
    --- a/docs/Makefile.am
    +++ b/docs/Makefile.am
    @@ -244,7 +244,7 @@ allterm-ja.h: $(CORETERM) $(LUA_HELP)
         $(AM_V_GEN) for e in `grep -E "^[   ]*START_HELP" $(JATERM) |\
              LC_ALL=C sort -f -t':' -k2` ; do \
           f=`echo $$e |cut -d\: -f1` ; s=`echo $$e | cut -d\: -f2` ;\
    -      sed -n "/^[   ]*$$s/,/^[  ]*END_HELP/p" $$f ; \
    +      LC_ALL=C sed -n "/^[  ]*$$s/,/^[  ]*END_HELP/p" $$f ; \
         done >allterm.tmp
         iconv -f EUC-JP -t UTF-8 < allterm.tmp >$@
         rm allterm.tmp
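
The two stages of the patched rule can be tried outside the Makefile with a rough sketch like the one below. The file `demo.trm`, its contents, and the `expected` file are made up for illustration; only the `LC_ALL=C sed -n` extraction and the `iconv -f EUC-JP -t UTF-8` call mirror the rule above. `[[:space:]]` stands in for the literal space and tab in the rule's bracket expression, and the sketch assumes the local iconv supports EUC-JP.

```shell
# Hypothetical stand-in for one ja/term/*.trm file.
tmp=$(mktemp -d)
printf 'preamble\nSTART_HELP(demo)\n\244\242\nEND_HELP(demo)\ntrailer\n' > "$tmp/demo.trm"

# Stage 1: byte-oriented extraction of the help block.
LC_ALL=C sed -n '/^[[:space:]]*START_HELP/,/^[[:space:]]*END_HELP/p' \
    "$tmp/demo.trm" > "$tmp/allterm.tmp"

# Stage 2: conversion; -f and -t name both encodings explicitly.
iconv -f EUC-JP -t UTF-8 < "$tmp/allterm.tmp" > "$tmp/allterm-ja.h"

# The same character in UTF-8 is the three bytes \343\201\202.
printf 'START_HELP(demo)\n\343\201\202\nEND_HELP(demo)\n' > "$tmp/expected"
if cmp -s "$tmp/allterm-ja.h" "$tmp/expected"; then
    result="match"
else
    result="differ"
fi
echo "$result"
rm -rf "$tmp"
```

sed with `-n` and `p` only selects lines here, so the bytes reaching iconv are the ones that were in the source file.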
    
     
  • Ethan Merritt

    Ethan Merritt - 2024-01-25
    • Status: open --> pending-fixed
    • Group: -->
    • Priority: -->
     
  • Selina Kovacek

    Selina Kovacek - 2024-04-08

    Does adding "LC_ALL=C" at the start of the sed command have any impact on the conversion process?
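
One part of this can be checked independently of gnuplot: in a POSIX shell, a variable assignment written in front of an external command applies to that command only. The sketch below uses `sh -c` as a hypothetical stand-in for sed and shows that the caller's `LC_ALL` is left as it was, so a later command in the same rule, such as iconv, does not inherit the `C` setting from the prefix.

```shell
# Record the caller's setting before the prefixed command runs.
before="${LC_ALL-<unset>}"

# The prefix assignment is exported to this one child process only.
inside=$(LC_ALL=C sh -c 'printf %s "$LC_ALL"')

# Record the caller's setting again afterwards.
after="${LC_ALL-<unset>}"

echo "child saw:    $inside"
echo "caller saw:   $before / $after"
```

Whether the conversion result itself changes is a separate question; this only shows which commands see the override.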

     
  • Ethan Merritt

    Ethan Merritt - 2024-05-28
    • Status: pending-fixed --> closed-fixed
     
