chThdCreateFromHeap() allocates the thread working area with
chHeapAllocAligned(), which aligns only the returned base address. The
function then computes wend as wbase + size, so a caller-provided size that is
not a multiple of PORT_STACK_ALIGN can produce a misaligned initial stack top.
Debug checks may catch this later, but builds with those checks disabled
can pass the misaligned wend to PORT_SETUP_CONTEXT(). Round the requested
heap thread size up to a multiple of PORT_STACK_ALIGN before allocation and
before computing wend.
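
A minimal sketch of the rounding step follows. It is not the actual patch.
demo_align_up(), demo_wend_is_aligned() and DEMO_STACK_ALIGN are
hypothetical names, and the alignment value of 8 is an assumption. The real
PORT_STACK_ALIGN is port-specific and is assumed here to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration only; the real PORT_STACK_ALIGN
   depends on the port. */
#define DEMO_STACK_ALIGN 8U

/* Hypothetical helper: rounds size up to the next multiple of align.
   align must be a non-zero power of two. */
static size_t demo_align_up(size_t size, size_t align) {
  assert((align != 0U) && ((align & (align - 1U)) == 0U));
  return (size + align - 1U) & ~(align - 1U);
}

/* Hypothetical helper: returns non-zero when wbase + size, the value
   that becomes wend, is a multiple of align. */
static int demo_wend_is_aligned(uintptr_t wbase, size_t size, size_t align) {
  return ((wbase + size) & (align - 1U)) == 0U;
}
```

With an aligned base of 0x20000000 and a requested size of 100, wend is
misaligned by 4 bytes. After rounding the size up to 104, wend is aligned
again.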