libaco
A blazing fast and lightweight C asymmetric coroutine library
...It is extremely memory-efficient, 10,000,000 coroutines simultaneously to run cost only 2.8 GB physical memory (run with tcmalloc, each coroutine has a 120B copy-stack size configuration). The phrase "fastest" in above means the fastest context switching implementation which complies to the Sys V ABI of Intel386 or AMD64.