Using Schily-tools 2019-03-29 on Linux (Ubuntu 18.04 LTS).
The following command:
pbosh -c 'v=foo eq== IFS==; echo A=A "$v"=\$bar'
Produces A=A foo $bar in [p]bosh, while all other shells I tested (bash, dash, ksh93, mksh, pdksh, OpenBSD sh, and others) produce A=A foo=$bar
The 2018 spec says (2.6.5):
"... the shell shall scan the results of expansions and substitutions that did not occur in double-quotes for field splitting and multiple fields can result."
So it seems that all other shells interpret "results" as only the new text which came from replacing the $<thing> part in a word, while [p]bosh interprets it as words which had any expansions/substitutions in them.
I tend to think the same as other shells, as otherwise it means that IFS affects literal shell input, which I think should never happen.
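To see the resulting fields rather than echo's space-joined output, the command above can be run through a small helper. This is only a minimal sketch: show_fields is a hypothetical helper that does not appear in the report, and the command substitution keeps the IFS change local. Going by the results reported here, bash, dash and ksh93 would be expected to print [A=A][foo=$bar], while [p]bosh would print [A=A][foo][$bar].

```shell
# show_fields is a hypothetical helper: print every argument in brackets
# so that field boundaries are visible.
show_fields() {
    for arg in "$@"; do
        printf '[%s]' "$arg"
    done
    printf '\n'
}

v=foo
# The command substitution runs in a subshell, so the unusual IFS value
# does not leak into later commands.
fields=$( IFS='='; show_fields A=A "$v"=\$bar )
printf '%s\n' "$fields"
```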
Also, here's a slightly different command which shows another case and also the actual resulting fields (I don't know if the extra field should be considered a bug):
All shells I have except pdksh and [p]bosh produce:
pdksh:
[p]bosh:
Last edit: Avi Halachmi 2019-04-25
Hi,
bosh produces the same output as ksh88, which has been used as the reference implementation for POSIX. ksh88 based POSIX platforms like Solaris pass the POSIX certification tests with that behavior.
Bourne Shell:
sh -c 'v=foo eq== IFS==; echo A=A "$v"=\$bar'
A A foo $bar
ksh88
ksh -c 'v=foo eq== IFS==; echo A=A "$v"=\$bar'
A=A foo $bar
bosh:
bosh -c 'v=foo eq== IFS==; echo A=A "$v"=\$bar'
A=A foo $bar
For your second example you get:
Bourne Shell:
[A][A][B][foo][$bar]
ksh88:
[A=A][B][][foo][$bar]
bosh:
[A=A][B][][foo][$bar]
Note that both ksh88 and bosh are based on a slightly modified Bourne Shell source.
The behavior of ksh93 is most likely correct, but implementing the ksh93 behavior
requires a complete rewrite of the macro expansion code.
ksh88 and bosh modified the macro expansion code in a way that is small enough
to avoid bigger problems that result from a complete rewrite.
The behavior of ksh88 and bosh is to do field splitting in arguments that had a variable
expansion, while the ksh93 behavior is to do field splitting on characters that resulted
from a macro replacement.
Do you have a real world usage for that difference in behavior?
There is a plan to do a complete rewrite of the macro expansion for bosh, but
I did not yet find the time to do so. The background for the rewrite is that it would
make bosh the fastest shell if you use "configure" as the test scenario as I expect a
performance win of approx. 20%.
I don't.
Because I'm relatively new to shell programming, I prefer to quote only where required, and if I'm not sure, then research and learn. I was evaluating to which extent the (my) paradigm of eval $foo=\$bar is susceptible to weird IFS values, and realized that [p]bosh behaves differently than other shells.
To be on the safe side one should use eval "$foo=\$bar", even though in all other current shells (and ksh93) it's enough to quote only $foo.
Interesting. I'm generally interested in shell performance, and contributed some patches to the ffmpeg project which speed up its configure considerably (it's a custom script, no autotools etc).
FWIW, in my experience, the fastest shells are dash and in general ash based shells, and ksh93 to some extent (depending if it can apply its no-fork-subshell optimizations). Bash tends to be on the slow side, though there are slower. I didn't actually try to evaluate [p]bosh in terms of performance.
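The fully quoted eval form discussed above can be wrapped in a small function. This is a sketch under stated assumptions: assign is a hypothetical helper name, and the caller is assumed to pass a valid variable name as the first argument, since eval would execute anything else.

```shell
# assign is a hypothetical helper: store VALUE ($2) in the variable named
# NAME ($1). Quoting the whole eval argument means no part of it is
# subject to field splitting, whatever IFS currently contains.
assign() {
    eval "$1=\$2"
}

old_ifs=$IFS
IFS='='                 # a deliberately awkward IFS value
assign target 'a=b c'
IFS=$old_ifs
printf '%s\n' "$target"
```

Because the value is only referenced as \$2 inside the eval string, it is expanded in assignment context and is never re-parsed as shell code.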
If you make dash POSIX compliant by adding multi byte character support, it would be the slowest shell ever. bosh is currently approx. 5% faster than dash even though it supports multi byte characters and 30% of the CPU time used by bosh is used for multi byte character handling.
The original ksh93 is approx. 10% faster than bosh, but the Red Hat variant has already been made slower than the original by replacing code from David Korn with what the Red Hat people believe is "standard code".
I knew that dash doesn't do multibyte, but to be honest I didn't encounter scripts which require multibyte and performance together, though I'm sure there are such, and I think it's great to support it.
FWIW, I tested a few shells on the same Ubuntu 18.04 LTS system (all built by me with default settings and fairly recent code base, except ksh93 which is an Ubuntu package binary), and used <sh> ./configure at the ffmpeg source tree root. Here are my results:
I know that busybox ash is ~5-10% slower than dash, and FreeBSD sh (running on FreeBSD) is roughly similar to dash.
loksh is supposed to be OpenBSD sh ported to Linux ( https://github.com/dimkr/loksh ), but I don't know how it performs on an actual OpenBSD system.
I didn't try to analyze where the time is spent within configure with each shell.
Well, ffmpeg does not use autoconf but a hand written shell script that is not "compliant"
as it does not quote '^' (as required by the POSIX standard) and as it calls programs like sed
with non-standard options.
For this reason, it is not possible to check the ffmpeg script on an arbitrary certified UNIX
and the script seems to be an example of how scripts should not be written.
Looking closer at the script shows that this shell script causes the shell to spend most
of the time in macro expansion, which is atypical for average shell scripts. So this script
can mainly be seen as a test case for macro expansion performance. This is where the
mentioned rewrite of bosh will happen, so a future version of bosh will be faster.
If you compare shells with autoconf based shell scripts, like the "configure" from the
schilytools, you get different results. Here is what I get on an Opteron based UNIX
system from 2006 (newer CPUs typically show smaller differences between the various
shells, and since Linux does not implement a true vfork(), bosh and ksh93 are
slower on Linux than they are on a typical UNIX system).
Given that sh and obosh mainly differ in the fact that obosh uses a malloc() based string stack replacement, the performance difference is caused by that change.
If you compare obosh and bosh, you see the performance win from a better pipe construction method in the interpreter and the performance win from using vfork().
The performance advantage with bosh compared to dash does not exist on Linux as Linux does not come with a working vfork(); there is a vfork emulation on Linux that just implements all pitfalls of vfork without implementing the advantages that come from the fact that vfork does not need to copy the address space description in the kernel. This makes vfork 3x faster than fork on a UNIX system.
ksh88 and ksh93 differ in the fact that ksh93 implements virtual sub-shells and uses vfork().
mksh is interesting since it is the only shell that spends less than 28 seconds with system CPU time even though it does not use vfork().
As busybox is not portable, I could not test it on UNIX but on Linux it is faster than dash.
Note that you need to call:
CONFIG_SHELL=$shell $shell ./configure
for a correct test that always uses the shell under test.
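The invocation above can be put into a loop to compare several shells. This is a rough sketch rather than the procedure actually used for the numbers in this thread: bench_configure is a hypothetical helper, it assumes it is run from a source tree that contains ./configure, and it assumes a time utility or keyword that accepts -p is available.

```shell
# bench_configure is a hypothetical helper: run ./configure once per shell
# named on the command line, always forcing the shell under test.
bench_configure() {
    for shell in "$@"; do
        if ! command -v "$shell" >/dev/null 2>&1; then
            printf 'skip: %s (not installed)\n' "$shell"
            continue
        fi
        printf '== %s\n' "$shell"
        # "time -p" is the POSIX form; whether it is a keyword or an external
        # utility depends on the shell running this loop. Only stdout is
        # discarded so that the timing report on stderr stays visible.
        ( time -p env CONFIG_SHELL="$shell" "$shell" ./configure >/dev/null )
    done
}

# Example: bench_configure dash bash ksh93 mksh bosh
```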
Last edit: Jörg Schilling 2019-04-29
Thank you for the detailed information.
The only reference I could find in the spec to "^" or "circumflex" is that it's unspecified behavior to use it as negation in a character pattern, but I did not see such usage in ffmpeg's configure. I can't see it specified as a special char which needs to be quoted either. I can send patches if you show me an example where it's incorrectly unquoted.
I'm not familiar enough with sed to comment, but if you could point out an example for me then I can send a patch.
Judging by your earlier comments, I'm assuming "macro" means parameter/command substitutions? Anyway, nice to know that it will be improved.
Yeah, I'm familiar with it and its general not-up-to-par behavior.
Thanks for the enlightening information about where performance goes, and the vfork issue.
'^' was an alias for pipes in the 1970s to allow pipes on upper case only terminals. bosh still supports it as Solaris did come with /bin/sh being the classical Bourne Shell for a long time and as POSIX definitely does not make any assumptions on what shell /bin/sh is. It can be disabled in bosh via set -o posix and it is disabled by default in pbosh.
We recently added '^' to the set of characters that need quoting in the POSIX standard because of some reported issues with globbing and pattern matching.
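A short illustration of the quoting in question, as a sketch with made-up input data: when the circumflex is inside quotes, the shell passes it to the utility unchanged, whether or not the shell treats an unquoted '^' as a pipe.

```shell
# Quoted, ^ is just a regular expression anchor handed to grep.
matches=$(printf 'abc\nxbc\n' | grep '^a')
printf '%s\n' "$matches"

# The same applies to anchors inside sed scripts.
stripped=$(printf '  indented\n' | sed 's/^ *//')
printf '%s\n' "$stripped"
```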
sed is called with -E but there is more (e.g. calling grep with wrong options).
Note that I did not check whether the options for grep may be valid in POSIX mode, but the script does not contain the standard calling procedure to go into POSIX mode which is to set up PATH from the results of getconf PATH followed by calling sh. For a portable script it is not easy to be compliant, as e.g. Mac OS does not mention how to get into POSIX mode and as you should not assume compliance with the most recent POSIX standard.
Oracle Solaris 11.4 is e.g. the only UNIX certified for the most recent POSIX standard but it will not be able to run the ffmpeg script.
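The calling procedure described above (PATH from getconf PATH, then sh) might look roughly like this. It is a sketch only: the variable name posix_path and the fallback to the current PATH are assumptions made for the example, and which directories getconf reports is system-specific.

```shell
# Ask the system for a PATH that finds the POSIX-conforming utilities.
# Falling back to the current PATH is an assumption made for this sketch.
posix_path=$(getconf PATH 2>/dev/null) || posix_path=$PATH

# Run one command in that environment without touching the caller's PATH;
# this shows which grep a script started this way would pick up.
PATH=$posix_path sh -c 'command -v grep'
```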
The term macro expansion is defined in the POSIX standard and is related to shell variable expansion with additional features such as ${var-value}.
I'm unable to find any public reference to either of those. Neither in what I think is the latest version: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html nor otherwise with google. The only pages I get for "posix macro" are related to the C preprocessor.
I believe you that configure won't work on Solaris. The closest I could put my hands on is OpenIndiana (SunOS Release 5.11), which appears to be using AT&T ksh93t+ (2010) as its default sh, where "^" is not interpreted as a pipe (ffmpeg is available for OpenIndiana, but I haven't looked at any patches they may be applying).
I guess. I don't mind spending some time helping to make configure more compliant, especially with the shell code parts (in contrast to sed/grep/etc compliant usage, which I'm less familiar with), but I probably won't do that without seeing a spec which mentions it ("^").
Anyway, I know and understand that ffmpeg's configure is different than autoconf configure, and that their performance depends on different shell features.
Sorry, macro is the internal name in the Bourne Shell source, see macro.c. POSIX calls it parameter expansion.
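For readers unfamiliar with the ${var-value} form mentioned earlier, here is a minimal sketch of how it differs from the colon variant. The variable names are made up for the example.

```shell
unset var
unset_result=${var-value}     # var is unset: the default is used
var=
empty_result=${var-value}     # var is set but empty: the empty value is kept
colon_result=${var:-value}    # with the colon, empty counts as missing too

printf '[%s][%s][%s]\n' "$unset_result" "$empty_result" "$colon_result"
```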
Then also look at: http://austingroupbugs.net/view.php?id=1190 and
http://austingroupbugs.net/view.php?id=1191#c4278
ksh is no problem on Solaris, the problem is that ffmpeg's script is non-portable:
... and if you would remove that, it would complain about grep -q because it does not set up a PATH to use the POSIX variants of the binaries. Note that /usr/bin/grep is called instead of /usr/xpg4/bin/grep.
Also note:
This is not a broken shell, but just a non-POSIX Bourne Shell acting on a non-portable script.
The problem with that script seems to be that it assumes Linux behavior on all possible platforms.
Last edit: Jörg Schilling 2019-04-29
Thanks. It is quite recent (a few weeks ago). I guess it should make it into the next version of the spec.
As far as I can tell it does try to find a suitable shell, and I know for a fact that it works out of the box also on the different BSDs, AIX, and OSX.
Apparently it fails on some cases where it could do better, though because I don't have access to such systems (practically only Solaris derivatives?), I can't try it out.
Though as I mentioned, I saw that OpenIndiana does have an ffmpeg package. I don't know if they apply some patches or whether it configures/builds out of the box.
Anyway, if you can afford to send patches to ffmpeg to make configure more compliant, I'm sure no one would object.
Thanks again for your time, patience and good info.
The hint to quote ^ is recent, but it is just a bugfix to the POSIX standard.
Thanks for the hint on finding a suitable shell. Sometimes it is a good idea to recheck things in a real window with more lines. I first checked the code at the weekend on a laptop and did not see the check for /usr/xpg4/bin. So it seems that some time ago, the authors of ffmpeg did care about the standard and switched to the POSIX environment, and my fear that grep -q could fail later is not correct.
Now I would guess that unless you use bosh, there is only one problem with sed -E in the script, and the problem with bosh could not be known by them since in former times, Solaris did always come with a true old Bourne Shell in /bin/sh which is 100% compatible to obosh, and the check for POSIX (ksh88) enhanced parameter substitution with ${foo%%bar} did work on previous Solaris versions.
Now the Bourne Shell has been enhanced to support POSIX but in a way that does not break old existing scripts and OpenSolaris may have bosh installed as /bin/sh. For the related problem, it would help to quote '^'.
Since ^ is quoted in most of the cases already (except for line 4381, 4401, 4521, 6513), it would be simple to add quoting for that too after sed -E has been removed.
As far as I can tell, sed -E is used in only one line, and only if $target_os is darwin, which is OS-specific and apparently works (even if not robust, e.g. cross-compiling to darwin). I don't think there's any urgency in changing it, and personally I don't know how to change extended regex to standard one before further research on my part.
So basically the only thing preventing it from being reasonably compliant (to the extent which you noticed so far), including with bosh, is quoting "^"?
My configure file has different line numbers (I guess we look at different versions - I'm looking at ffmpeg master git repository), but I do see the 3 first cases which clearly need quoting in bosh, though the 4th is a here document which doesn't need it as far as I know.
I can confirm that configure with bosh failed before quoting the first 3 instances, and succeeds after quoting them. As expected, configure with pbosh succeeds also with them unquoted. On both success cases I also compared the output of configure with other shells.
There are two diffs of the same "class" (SAMPLES and TARGET_SAMPLES in ffbuild/config.mak), and I think it's a bug with [p]bosh. The following:
prints [$y] in all shells, except [p]bosh which prints [\$y].
Other than these two diffs, all the output files are identical as with dash or ksh93.
I'll try to find the time to send a patch to ffmpeg which quotes these 3 instances of "^".
Last edit: Avi Halachmi 2019-04-30
I get two sed -E occurrences:
If you know a fix for that -E problem, I could test it on Solaris.
Regarding your parameter expansion issue, I would need some time to investigate that.
I am not sure whether there is a bug in bosh, since using ${x:-$\y}
prints [$\y] with all shells.
Setting up R=\$y before and then using ${x:-$R}
prints [$y]
The second of those is the darwin one. The first was fixed 5 months ago here https://github.com/FFmpeg/FFmpeg/commit/2f6b1806 .
So configure in ffmpeg git master should not try to use sed -E anywhere except on Darwin/OSX.
As for the here-document backslash, I can only say that all other shells I have agree between themselves and think differently than [p]bosh. On the face of it I think they're right, and I don't think your example demonstrates an inconsistency, but there could always be some edge cases which I didn't interpret correctly.
Last edit: Avi Halachmi 2019-04-30
The -E fix seems to work at first glance, but causes a "Terminated" message. I checked the whole thing with truss -f and it seems this is from an ffmpeg program /tmp/ffconf.XXl7aO2F/test whatever this is...
It is not a bosh problem. Bosh just reports it, in contrast to bash.
BTW: neither ksh88 nor ksh93 work with that on Solaris:
so bosh seems to be better ;-)
Given that both:
echo \$y
and
echo $\y
print $y with any shell, I would expect similar orthogonal behavior in the parameter expansion from your example as well. Do you have a portion in the POSIX standard that requires non-orthogonal behavior in the case you reported?
Last edit: Jörg Schilling 2019-04-30
Is there a system I can install in virtualbox and is similar to yours? Would OpenIndiana suffice?
Yes, because your examples are outside of double quotes, so in \$y the backslash removes the special meaning of $, and in $\y it's simply removed, but then not parameter-expanded because it's not a $<valid-thing-to-expand> (quote removal happens after parameter expansion). It's the same as printf %s%s $ y. So they end up the same but due to different reasons.
According to POSIX, in an "expanding" here-document, a backslash behaves like a backslash inside double quotes.
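The cases discussed above can be put side by side. This is a sketch with hypothetical variable names, not the command from ffmpeg's configure. Going by the reports in this thread, the four results are expected to agree in shells such as bash, dash and ksh93, while [p]bosh is reported to differ in the here-document case.

```shell
unset x

unquoted=$(printf '[%s]\n' ${x:-\$y})     # backslash quotes the $
quoted=$(printf '[%s]\n' "${x:-\$y}")     # \$ is an escape inside double quotes
heredoc=$(cat <<EOF
[${x:-\$y}]
EOF
)
bare=$(printf '[%s]\n' $\y)               # lone $ stays literal, \y becomes y

printf '%s %s %s %s\n' "$unquoted" "$quoted" "$heredoc" "$bare"
```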
Last edit: Avi Halachmi 2019-04-30
If you make sure that /usr/gnu/bin/ is not in front of the PATH, it should work on OpenIndiana. There is a problem: sed was closed source as a collaborative development with IBM and OpenIndiana does not have the Solaris sed but rather the FreeBSD sed that supports -E.
So if you like to check for the possible problems on a certified POSIX issue 7 tc2 UNIX, you should try to fetch the free version of Oracle Solaris-11.4.
SchilliX may be available in a newer variant late this year.
Your claim with double quote like behavior for here documents looks reasonable. I'll investigate whether I could change the behavior for bosh.
For reference, in this command:
Except for the final argument, all shells agree on:
While for the final argument, some shells think it should print [6$\x], while other shells print [6$x]. (EDITED, accidentally replaced bosh with posh. see actual output below).
The here-document case of ffmpeg's configure is most similar to 5"${e:-\$x}" where all shells, including [p]bosh, agree on 5$x when inside double quotes, but [p]bosh interprets it differently than others while inside a here-document.
Here's how they split:
shcmp is https://github.com/avih/shcmp
Last edit: Avi Halachmi 2019-04-30
Slightly off topic:
Is multibyte relevant anywhere other than to calculate ${#foo}, while matching ? in a pattern outside of a bracket expression, and while matching chars in a pattern inside bracket expression (both literals and classes)?
I can't think of other places where it would matter for anything...
If you are on a UTF-8 based locale, try:
There are three arguments for echo (it looks like two spaces between the visible words); the middle argument is empty.
If you are in a UTF-8 terminal this works to compare:
Try this with any shell that could get a UNIX branding and you get only one space between J and rg.
There are many more cases where it is important that the shell knows the margins of true characters (not bytes).
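One of the places where character margins show up, the ${#foo} length mentioned earlier, can be checked with a short sketch. The string and variable names are chosen for the example. In a UTF-8 locale, a shell with multi byte support would be expected to report 4 characters, while a byte-oriented shell such as dash would report 5.

```shell
# "Jörg" spelled with octal escapes so the bytes are the same whatever
# encoding this file is saved in: ö is the two bytes 0xC3 0xB6 in UTF-8.
name=$(printf 'J\303\266rg')

bytes=$(( $(printf '%s' "$name" | wc -c) ))   # arithmetic strips wc's padding
printf 'bytes: %s\n' "$bytes"
printf 'length reported by the shell: %s\n' "${#name}"
```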
Thanks. Right, IFS is also char-based so it's affected as well.
So ${#foo}, IFS, ? wildcard in a pattern, and inside pattern bracket expression, and that's it?
Last edit: Avi Halachmi 2019-05-21