#17 TOC differs in UTF-8 locale - missing backslash in sed expression


In a UTF-8 locale (e.g. LANG=en_US.UTF-8), tar reports file names containing non-ASCII characters as octal escapes.

These file names are then reported as errors when the file list is compared with the archive TOC.

The sed expression that is meant to replace octal escapes with question marks fails because it needs four backslashes where it has only two. The reason is that the string is first assigned to a shell variable, and at that point it is part of a double-quoted string:

FILTER_UNIFY_NAME="sed 's#\\[0-7]\{3\}#?#g; s#[^a-zA-Z0-9_ .$%:~/=+\#\-]#?#g'"

In a double-quoted string, two backslashes become one, so the shell stores only one backslash in the value of FILTER_UNIFY_NAME.

At this stage the single quotes are just ordinary characters inside the string value. They only take effect when the variable is expanded in an 'eval' statement, and by then one backslash has already been lost: sed receives \[ , which matches a literal '[' instead of a backslash.
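The double-quote stage can be checked directly. The sketch below assumes a POSIX sh and uses a reduced version of the ticket's assignment that keeps only the octal-escape substitution; it prints the stored value verbatim with printf, so nothing else reinterprets the backslashes.

```shell
#!/bin/sh
# Reduced version of the original assignment (only the first sed
# expression is kept). Inside double quotes "\\" collapses to "\",
# while "\{" and "\}" keep their backslash because "{" and "}" are not
# special there. The single quotes are stored as ordinary characters.
FILTER_UNIFY_NAME="sed 's#\\[0-7]\{3\}#?#g'"

# printf '%s' prints the value verbatim, without interpreting backslashes.
printf '%s\n' "$FILTER_UNIFY_NAME"
```

Under sh or bash this prints sed 's#\[0-7]\{3\}#?#g' : a single backslash before the '[', with the single quotes still present as plain characters.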

To have 'sed' see both backslashes, there must be four of them in the script:

FILTER_UNIFY_NAME="sed 's#\\\\[0-7]\{3\}#?#g; s#[^a-zA-Z0-9_ .$%:~/=+\#\-]#?#g'"


  • Enrique Perez-Terron

    Patch adding missing backslash to FILTER_UNIFY_NAME

  • Gundolf Kiefer - 2009-12-13

    status: open --> closed-fixed

  • Gundolf Kiefer - 2009-12-13

    Fixed in 1.5.

