We have been evaluating rp-l2tp for use on an embedded (slow) Linux platform.
We have been stress-testing it by repeatedly creating tunnels, transferring
data over them, and destroying them. After many tunnel creates and destroys,
pppd sometimes fails with:
ppp-c2811.log:Failed to open /dev/pts/3: No such file or directory
We applied all the existing rp-l2tp patches and the problem persisted.
Searching the Internet, we found that the SourceForge "Poptop" project
(an open-source PPTP server) listed the same pppd failure in its TODO
list (v1.27). Poptop applied a workaround in version 1.18 of its
pptpctrl.c: after starting pppd, wait for data to arrive from the pty.
Applying this "fix" to rp-l2tp appeared to eliminate the "Failed to
open" problem and made tunnel usage more reliable.
Instrumenting the "fix" showed that the select() occasionally fails with
EINTR ("Interrupted system call"), yet the tunnel continues to operate
normally. The "fix" therefore seems to work by acting as a "sacrificial
system call": the select() takes the hit of a signal that would otherwise
land on some nearby system call that does not check for EINTR and retry.
This suggests a race between some signal (unknown; possibly SIGCHLD) and
the code that follows pty creation and the exec of pppd.
Note that none of the system calls in /handlers (close(), dup2(), fcntl(), ...)
are wrapped in a loop that retries on EINTR, so any of them could fail
with EINTR without performing the intended action.
We instrumented all of these system calls to test this theory, but the
only place we reliably saw EINTR was the select() in the "fix".
Since the "fix" seems to eliminate the "Failed to open..." problem for us,
we have stopped investigating. If someone finds the root cause, please
post the real fix.
Attached is a patch to sync-pppd.c based on the Poptop project's
workaround.
Patch for sync-pppd.c