From: Franz B. <de...@ka...> - 2010-10-01 14:04:35
Hello there,

I've already sent a previous mail with an earlier version of the patch. Maybe someone still monitors the list...

I got some of my movies denoised pretty nicely using yuvdenoise. The quality is ok; my only concern was the performance.

Attached is a patch which contains SSE2-accelerated versions of the (non-MC) functions for temporal and spatial filtering (which I mainly use). Additionally, I've re-enabled the short-circuiting of temporal_filter_planes_MC; otherwise it fails with a divide by zero when using level 0 (e.g. to only filter the luma plane).

The temporal filter function now processes a block of 14 pixels, and the spatial function does 4 pixels at a time, exploiting the fact that adjacent pixels share many of their neighbours, which now only need to be examined once. Other than that, the computations are duplicated from the original functions.

With this revised patch, temporal filtering runs about 8 times as fast and spatial filtering 4 to 5 times as fast, at least on 64-bit machines (tested on a recent Xeon and a recent Opteron processor). Faster-than-realtime filtering is now possible on my machine.

A 32-bit Pentium M doesn't seem to like my SSE2 version of the spatial filtering. I may, however, find the time to optimize the function a bit further. For now it is not enabled on non-x86-64 processors.

Since there are some rounding issues to be taken into account here, I've added a define "OLD_ROUNDING", which produces exactly the same output as the old filter (I've checked the denoising results for differences using md5sum). If it's not set, the filtering should be a bit faster and more accurate, since floating-point rounding towards nearest is used then.

Thank you all for your work on mjpegtools. I hope you find the patch worth incorporating.

Regards,
Franz Brauße
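
As an illustration of the rounding remark above (not taken from the patch): a minimal sketch in C of how truncation and rounding towards nearest can differ when averaging pixel values. It assumes that the old behaviour amounts to truncation, which this mail does not actually state. The helper names mean_truncate and mean_nearest are hypothetical, and the sketch uses integer arithmetic rather than the floating-point rounding mode mentioned for the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from the patch: average of n samples where
 * integer division simply drops the fractional part (truncation). */
static uint8_t mean_truncate(uint32_t sum, uint32_t n)
{
    assert(n > 0);
    return (uint8_t)(sum / n);
}

/* Hypothetical helper, not from the patch: average rounded to the
 * nearest integer by adding half the divisor before dividing. */
static uint8_t mean_nearest(uint32_t sum, uint32_t n)
{
    assert(n > 0);
    return (uint8_t)((sum + n / 2) / n);
}
```

For a sum of 7 over 4 samples, truncation gives 1 while rounding to nearest gives 2; the two agree whenever the sum is an exact multiple of the sample count. Differences of this kind are enough to change an md5sum of the output, which is presumably why a compatibility switch is useful.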