[Firebird-devel] Bad errors - failed system calls and low memory conditions

All,

In addition to the previous letter and CORE-1242, there are two more 
notes re bad errors.

1. Some system_call_failed::raise() calls should probably be converted 
to abort(), as there is normally no safe recovery path from, for 
example, a failed mutex release or a failed free() call. GLIBC tends to 
ignore failed critical syscalls, for example:
static void  munmap_chunk(mchunkptr p) {
  ...
  int ret __attribute__ ((unused)) = munmap((char *)block, total_size);

  /* munmap returns non-zero on failure */
  assert(ret == 0);
}

I think logging a message and calling abort() is better in this or a 
similar situation. If an exception is raised, the engine is likely to 
hang, crash or consume all the memory later, which results in a 
condition that is extremely difficult to debug from coredumps alone.

2. Replacing std::bad_alloc with Firebird::BadAlloc has the effect that 
the compiler is no longer aware of the low memory condition. By the 
standard, a C++ compiler can rely on this particular exception type for 
handling low memory conditions.
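A small sketch of what this means for handlers written against the standard type. Both exception types below are hypothetical and say nothing about how Firebird::BadAlloc is actually declared; the sketch only shows that a handler for std::bad_alloc sees an engine-specific type when that type derives from std::bad_alloc, and not otherwise.

```cpp
#include <exception>
#include <new>
#include <string>

// Hypothetical engine exception that is unrelated to std::bad_alloc.
struct UnrelatedBadAlloc : std::exception
{
    const char* what() const noexcept override { return "engine: out of memory"; }
};

// Hypothetical engine exception that derives from std::bad_alloc.
struct DerivedBadAlloc : std::bad_alloc
{
    const char* what() const noexcept override { return "engine: out of memory"; }
};

// Throws E and reports which handler caught it.
template <typename E>
std::string classify()
{
    try
    {
        throw E();
    }
    catch (const std::bad_alloc&)
    {
        return "bad_alloc handler";   // code written against the standard type
    }
    catch (const std::exception&)
    {
        return "generic handler";     // low memory looks like any other error
    }
}
```

With these definitions, classify<UnrelatedBadAlloc>() ends up in the generic handler, while classify<DerivedBadAlloc>() and classify<std::bad_alloc>() both reach the std::bad_alloc handler.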

But I do not think it matters anyway, as the engine doesn't handle 
critically low memory conditions well either. Recovery paths for failed 
allocations are not tested and thus not reliable. Moreover, when a 
process eats too much memory in moderate-size increments, the OS kernel 
(Linux, for example) can kill the process via the OOM killer before the 
process ever sees an mmap syscall failure. Or, even more often, the user 
kills it earlier because the engine has become unusable due to excessive 
slowdown and swapping.

I guess the proper way to handle low memory conditions would be to 
follow the OS design: set (A) a soft low-memory watermark and, whenever 
the watermark is reached, kill the attachment (or shut down the 
database, if the database pool is the problem) that consumed so much 
memory. Also listen to OS LMS signals and process memory thrashing stats 
to determine the proper "default" soft watermark location and prevent 
the process from being killed by the OS. We may also set (B) a hard 
memory usage watermark and kill the process if it is reached, preventing 
the server from going down due to thrashing and instead letting it 
restart nicely.

And (C) if a memory allocation syscall fails despite all attempts to 
free memory, kill the process rather than risk unstable behavior later. 
I think Yaffil did it this way. Points (A), (B), (C) are just thoughts; 
there are no immediate plans to implement this yet.
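Points (A), (B) and (C) can be sketched as a single decision function. Everything here is hypothetical: the names MemoryAction, MemoryWatermarks and decide() are made up for illustration, and how the watermark values are chosen or how memory usage is measured is left open, as in the text above.

```cpp
#include <cassert>
#include <cstddef>

// What the engine would do after checking memory usage.
enum class MemoryAction
{
    Continue,        // below the soft watermark, nothing to do
    KillAttachment,  // (A) soft watermark reached
    RestartProcess   // (B) hard watermark reached, or (C) allocation failed
};

// Hypothetical configuration: both values are in bytes.
struct MemoryWatermarks
{
    std::size_t soft;
    std::size_t hard;
};

// used:             current memory usage of the process, in bytes
// allocationFailed: an allocation syscall still fails after all attempts
//                   to free memory
MemoryAction decide(const MemoryWatermarks& wm, std::size_t used, bool allocationFailed)
{
    assert(wm.soft <= wm.hard);

    if (allocationFailed || used >= wm.hard)
        return MemoryAction::RestartProcess;
    if (used >= wm.soft)
        return MemoryAction::KillAttachment;
    return MemoryAction::Continue;
}
```

For example, with a soft watermark of 100 and a hard watermark of 200, a usage of 150 gives KillAttachment, a usage of 200 gives RestartProcess, and a failed allocation gives RestartProcess regardless of usage.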

Alex, what do you think? Other opinions?

Nikolay
