From: Nikolay S. <nik...@re...> - 2007-05-03 18:08:26
All,

In addition to the previous letter and CORE-1242, there are two more notes regarding bad errors.

1. Some system_call_failed::raise() calls should probably be converted to abort(), as there is normally no safe recovery path from, for example, a failed mutex release or a failed free() call. GLIBC tends to ignore failed critical syscalls, for example:

    static void munmap_chunk(mchunkptr p)
    {
        ...
        int ret __attribute__ ((unused)) = munmap((char *)block, total_size);
        /* munmap returns non-zero on failure */
        assert(ret == 0);
    }

I think logging a message and calling abort() is better in this or a similar situation. If an exception is raised, the engine is likely to hang, crash or consume all the memory later, which results in a condition that is extremely difficult to debug based on core dumps only.

2. Replacing std::bad_alloc with Firebird::BadAlloc has the effect that the compiler is no longer aware of the low memory condition. By the standard, a C++ compiler can rely on this particular exception type for handling low memory conditions. But I do not think it matters anyway, since the engine doesn't handle critically low memory conditions well either. Recovery paths for failed allocations are not tested and thus not reliable. Moreover, when a process eats too much memory in moderate-size increments, the OS kernel (such as Linux) can kill the process via the OOM killer before the process sees any mmap syscall failures. Or, even more often, the user kills it earlier because the engine becomes unusable due to excessive slowdown and swapping.

I guess the proper way to handle low memory conditions would be to follow the OS design: set (A) a low memory soft watermark and, whenever the watermark is reached, kill the attachment (or shut down the database, if the database pool is the problem) that consumed so much memory. Also listen to OS LMS signals and process memory thrashing stats to determine a proper "default" soft watermark location and prevent the process from being killed by the OS.
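
As a rough illustration of point (A), the sketch below tracks memory per attachment and, once a soft watermark is reached, picks the largest consumer as the candidate to kill. All names here (MemoryWatch, onAlloc, onFree, pickVictim) are hypothetical and not taken from the Firebird code base, and the watermark value is simply supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical bookkeeping, not Firebird code: bytes in use per attachment id.
class MemoryWatch {
public:
    explicit MemoryWatch(std::size_t softWatermark)
        : soft_(softWatermark), total_(0)
    {
        assert(softWatermark > 0);  // a zero watermark would always trigger
    }

    void onAlloc(int attachmentId, std::size_t bytes) {
        usage_[attachmentId] += bytes;
        total_ += bytes;
    }

    void onFree(int attachmentId, std::size_t bytes) {
        std::map<int, std::size_t>::iterator it = usage_.find(attachmentId);
        if (it == usage_.end())
            return;                      // unknown attachment: nothing to release
        if (bytes > it->second)
            bytes = it->second;          // clamp so the counters never underflow
        it->second -= bytes;
        total_ -= bytes;
        if (it->second == 0)
            usage_.erase(it);
    }

    // Returns true and sets `id` to the largest consumer once the total
    // reaches the soft watermark; returns false otherwise.
    bool pickVictim(int& id) const {
        if (total_ < soft_ || usage_.empty())
            return false;
        std::map<int, std::size_t>::const_iterator worst = usage_.begin();
        for (std::map<int, std::size_t>::const_iterator it = usage_.begin();
             it != usage_.end(); ++it) {
            if (it->second > worst->second)
                worst = it;
        }
        id = worst->first;
        return true;
    }

    std::size_t total() const { return total_; }

private:
    std::size_t soft_;
    std::size_t total_;
    std::map<int, std::size_t> usage_;
};
```

A caller would check pickVictim() after each allocation and shut down the returned attachment. How the watermark itself is chosen (OS signals, thrashing stats) is left out of this sketch.
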
We may also set (B) a hard memory usage watermark and kill the process if it is reached, preventing the server from going down due to thrashing and instead letting it restart cleanly. And (C) if a memory allocation syscall fails despite all attempts to free memory, kill the process rather than risk unstable behavior later. I think Yaffil did it this way.

Points (A), (B), (C) are just thoughts; there are no immediate plans to implement this yet.

Alex, what do you think? Other opinions?

Nikolay
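
For illustration only, points (B) and (C), together with the log-and-abort idea from note 1, might look roughly like the sketch below. The names reachesWatermark, fatal and allocOrDie are hypothetical, plain malloc() stands in for the engine's real allocator, and stderr stands in for the real log.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// (B) Overflow-safe check: would `used + request` reach `mark`?
// Hypothetical helper, not part of any existing API.
inline bool reachesWatermark(std::size_t used, std::size_t request,
                             std::size_t mark)
{
    if (used >= mark)
        return true;
    return request >= mark - used;   // same as used + request >= mark
}

// Note 1: log a message and abort instead of raising an exception, so the
// core dump is taken at the point of failure.
[[noreturn]] inline void fatal(const char* what)
{
    std::fprintf(stderr, "fatal error: %s\n", what);
    std::fflush(stderr);
    std::abort();
}

// (C) Allocation wrapper that never returns a null pointer: if the
// allocation fails, the process is killed rather than left unstable.
inline void* allocOrDie(std::size_t bytes)
{
    void* p = std::malloc(bytes ? bytes : 1);
    if (!p)
        fatal("memory allocation failed");
    return p;
}
```

The watermark check is kept as a pure function so the policy can be tested without actually aborting. A real implementation would first try to free memory before reaching fatal().
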