I haven't had much time to look further into the code, but I thought I would provide a minor update.  The race condition appears to be related to ARGB overlay processing, even when the ARGB buffer is not used.  If I comment out the call to vdpau_process_argb_ovls in vdpau_overlay_end (video_out/video_out_vdpau.c), I no longer seem to have the memory problems.  If the overlay is being freed after the overlay blending is done but before the overlay_end routine is called, that might explain the problem, but I cannot yet confirm that this is the case.  If the OSD is explicitly hidden before being freed, rather than being hidden during the osd_free call, I suspect the resulting delay may be enough on some machines to skip over the problem.  I will investigate further if I get a chance during the week.
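
To make the suspected ordering concrete, here is a minimal, self-contained sketch of the sequence described above (blend, then free, then overlay_end) and one possible way to close that window.  It is not xine-lib code: every demo_* name and struct is hypothetical, and it assumes the driver keeps a pointer to the overlay between the blend step and the end-of-frame step, which I have not confirmed.  The idea is that the free path and the end-of-frame path take the same lock, and the free path drops the driver's pending reference before releasing the memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ARGB overlay; not xine-lib's real struct. */
typedef struct {
    unsigned char *argb;   /* pixel buffer owned by the OSD side */
    int width, height;
} demo_overlay_t;

/* Hypothetical driver state; assumes a pointer survives from blend to end. */
typedef struct {
    pthread_mutex_t lock;
    demo_overlay_t *pending;   /* overlay queued for end-of-frame processing */
} demo_driver_t;

void demo_driver_init(demo_driver_t *d) {
    pthread_mutex_init(&d->lock, NULL);
    d->pending = NULL;
}

/* Allocate a zeroed width x height ARGB overlay (4 bytes per pixel). */
demo_overlay_t *demo_overlay_new(int width, int height) {
    demo_overlay_t *o = malloc(sizeof *o);
    if (!o) return NULL;
    o->argb = calloc((size_t)width * (size_t)height, 4);
    if (!o->argb) { free(o); return NULL; }
    o->width = width;
    o->height = height;
    return o;
}

/* Blend step: remember the overlay for later processing. */
void demo_overlay_blend(demo_driver_t *d, demo_overlay_t *o) {
    assert(o != NULL);
    pthread_mutex_lock(&d->lock);
    d->pending = o;
    pthread_mutex_unlock(&d->lock);
}

/* Free step: clear the driver's reference under the lock BEFORE freeing,
 * so the end-of-frame step can never see a dangling pointer. */
void demo_overlay_free(demo_driver_t *d, demo_overlay_t *o) {
    pthread_mutex_lock(&d->lock);
    if (d->pending == o)
        d->pending = NULL;
    pthread_mutex_unlock(&d->lock);
    free(o->argb);
    free(o);
}

/* End-of-frame step: touch the overlay only if it is still registered.
 * Returns 1 if an overlay was processed, 0 if there was nothing to do. */
int demo_overlay_end(demo_driver_t *d) {
    int processed = 0;
    pthread_mutex_lock(&d->lock);
    if (d->pending) {
        d->pending->argb[0] = 0xff;   /* placeholder for real ARGB work */
        d->pending = NULL;
        processed = 1;
    }
    pthread_mutex_unlock(&d->lock);
    return processed;
}
```

With this arrangement, calling demo_overlay_blend, then demo_overlay_free, then demo_overlay_end leaves the end step with nothing to process instead of dereferencing freed memory.  Without the lock and the reference drop in demo_overlay_free, the same ordering would be a use-after-free, which would match the symptoms if the real code behaves this way.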