libaco
A blazing fast and lightweight C asymmetric coroutine library
...The benchmark section shows that a context switch between coroutines takes only about 10 ns (with a standalone stack) on an AWS c5d.large machine. Users can create a new coroutine with either a standalone stack or a shared stack (which may be shared with other coroutines). It is extremely memory-efficient: running 10,000,000 coroutines simultaneously costs only 2.8 GB of physical memory (with tcmalloc, and each coroutine configured with a 120 B copy-stack size). The phrase "fastest" above means the fastest context-switching implementation that complies with the Sys V ABI of Intel386 or AMD64.
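To put the memory figure in per-coroutine terms, the arithmetic can be sketched in a few lines of C. This is only a back-of-the-envelope calculation, not part of the library: the helper names `bytes_per_coroutine` and `overhead_per_coroutine` and the `EXAMPLE_*` constants are hypothetical, and the sketch assumes that "2.8 GB" means 2.8 × 10^9 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the paragraph above. */
#define EXAMPLE_COROUTINES  UINT64_C(10000000)   /* 10,000,000 coroutines */
#define EXAMPLE_TOTAL_BYTES UINT64_C(2800000000) /* 2.8 GB, ASSUMING 1 GB = 10^9 bytes */
#define EXAMPLE_COPY_STACK  UINT64_C(120)        /* 120 B copy-stack per coroutine */

/* Hypothetical helper: average physical memory per coroutine, in bytes. */
uint64_t bytes_per_coroutine(uint64_t total_bytes, uint64_t coroutines)
{
    assert(coroutines > 0);
    return total_bytes / coroutines;
}

/* Hypothetical helper: bytes left per coroutine once the copy-stack
 * is subtracted from the per-coroutine average. */
uint64_t overhead_per_coroutine(uint64_t per_coroutine, uint64_t copy_stack)
{
    assert(per_coroutine >= copy_stack);
    return per_coroutine - copy_stack;
}
```

Under these assumptions, `bytes_per_coroutine(EXAMPLE_TOTAL_BYTES, EXAMPLE_COROUTINES)` gives 280 bytes per coroutine, of which 120 bytes are the copy-stack and the remaining 160 bytes cover everything else (the text does not break this part down). If "GB" is instead read as 2^30 bytes, the average comes to roughly 300 bytes per coroutine.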