From: Tapani T. <tt...@it...> - 2003-05-02 07:52:00
|
Before more radical changes, I have a small bug report and fix suggestion on amavis-milter. I had amavisd crash on me yesterday rather spectacularly, first time with amavisd-0.1, and having just been reading amavis-milter.c I tracked it down (I think). The key hint was this log file entry: "Failed to create temp dir : No such file or directory" Looking at the code this was obviously result of mktemp() failure. I'm not sure if this is HP-UX specific, but mktemp(3) man page says that mktemp() replaces the Xs with a letter and the current process ID. ... mktemp() returns its argument except when it runs out of letters, in which case the result is a pointer to the empty string "". which means it cannot create more than 26 names for one process. Thus a burst of messages can easily cause amavis-milter to run out of temporary directory names. In itself that would not be fatal, but there's also a bug that causes directories occasionally to be left in place after they're no longer needed, and eventually they will run out. (In systems with different mktemp() this might simply result in unnecessary directories accumulating without critical failures.) The critical code is here: ------------------------------------------------------------------ static sfsistat mlfi_cleanup(SMFICTX *ctx, bool ok) { sfsistat rstat = SMFIS_CONTINUE; struct mlfiPriv *priv = MLFIPRIV; char *p; char host[512]; if (!priv) return rstat; /* close the archive file */ if (priv->mlfi_fp != NULL && fclose(priv->mlfi_fp) == EOF) { /* failed; we have to wait until later */ rstat = SMFIS_TEMPFAIL; (void) unlink(priv->mlfi_fname); } else if (ok) { /* add a header to the message announcing our presence */ if (gethostname(host, sizeof(host) - 1) < 0) strlcpy(host, "localhost", sizeof host); p = strrchr(priv->mlfi_fname, '/'); if (p == NULL) p = priv->mlfi_fname; else p++; } else { /* message was aborted -- delete the archive file */ (void) unlink(priv->mlfi_fname); *(strrchr(priv->mlfi_fname, '/')) = 0; rmdir(priv->mlfi_fname); } /* return status */ mydebug(DBG_INFO, "cleanup called"); return clearpriv(ctx, rstat, 0); } ------------------------------------------------------------------ If mlfi_cleanup is called while the file is open, it will close and unlink it but leave the directory intact. If the file is closed at the time, directory is duly removed. It is only called in two situations: (1) When writing the body fails (e.g., disk full). In this case it is always open, but it should be rare. (2) When the message is aborted. In this case the file can be open or not, depending on when it happens. It seems to me there's a race condition in (2): if a message is aborted after mlfi_body() but before mlfi_eom(), the file will be open and consequently the directory will be left in place. That also explains why directories are left in place rather unpredictably. Anyway, if my analysis is correct, the critical fix is to add directory removal after the first unlink() call above. The attached patch should do it. (BTW, as far as I can see the "else if (ok)..." part does nothing useful whatsoever and could be removed.) Replacing the crummy mktemp() call with something more useful would also be worthwhile - although just undefining HAVE_MKTEMP when compiling in systems with such mktemp() might be enough. Incidentally, I'd also suggest having amavis-milter log its messages via syslog rather than to a special file, but that's no big thing. -- Tapani Tarvainen |