I have ported my MPEG-1 decoding routines to a dowitchers filter, but its performance is poor compared to my stand-alone solution and I can't seem to find the cause. After all, both are pure C# code.
The dowitchers filter version has an extra colorspace transform (YV12 -> YUY2, see the sketch below) and it's still poorly written, so let's allow a 20% overhead for that. Yet my original code ran at around 30-40% CPU usage for my test files, while the dowitchers version used 90-95%, so at least 35% is unaccounted for, and all of it is spent executing privileged instructions (i.e. in kernel mode).
Any ideas where this overhead might come from?
I haven't looked at the MPEG splitting code; I just grab the payload.
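For context, the extra transform is essentially one more full pass over every frame. A simplified sketch of that kind of pass (not the actual filter code, just to show the work involved):

// Illustrative only: a naive YV12 (planar) -> YUY2 (packed) conversion.
// Assumed layouts: YV12 = Y plane, then V plane, then U plane, chroma
// subsampled 2x2; YUY2 = Y0 U Y1 V per pixel pair.
class ColorConvert
{
    static void Yv12ToYuy2(byte[] src, byte[] dst, int width, int height)
    {
        int ySize = width * height;
        int cSize = ySize / 4;
        int vOff = ySize;          // V plane comes before U in YV12
        int uOff = ySize + cSize;

        for (int y = 0; y < height; y++)
        {
            int yRow = y * width;
            int cRow = (y / 2) * (width / 2);
            int dRow = y * width * 2;

            for (int x = 0; x < width; x += 2)
            {
                int c = cRow + x / 2;
                dst[dRow + x * 2 + 0] = src[yRow + x];       // Y0
                dst[dRow + x * 2 + 1] = src[uOff + c];       // U
                dst[dRow + x * 2 + 2] = src[yRow + x + 1];   // Y1
                dst[dRow + x * 2 + 3] = src[vOff + c];       // V
            }
        }
    }
}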
Maybe you are just reallocating too much memory for the output. Are you using MemorySamplePool?
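The idea being something like this: reuse output buffers instead of allocating a new one per decoded frame (the names below are made up for illustration, not the actual MemorySamplePool API):

// Illustrative only: pooling output buffers so each frame reuses memory
// that has already been faulted in, instead of newing up a fresh buffer.
using System.Collections.Generic;

class BufferPool
{
    private readonly Stack<byte[]> _free = new Stack<byte[]>();
    private readonly object _lock = new object();
    private readonly int _size;

    public BufferPool(int bufferSize) { _size = bufferSize; }

    public byte[] Rent()
    {
        lock (_lock)
        {
            if (_free.Count > 0) return _free.Pop();
        }
        return new byte[_size];   // only allocate when the pool is empty
    }

    public void Return(byte[] buf)
    {
        lock (_lock) { _free.Push(buf); }
    }
}

// Per frame: var output = pool.Rent(); ... deliver downstream ... pool.Return(output);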
I thought that was the cause too, since originally I didn't use it. I changed the code to use the sample pool, but it didn't fix the problem.
Then I thought it might have to do with the main thread being STA, so I used the dump filter as the renderer (passing a texture or surface between the streaming thread and the application thread would have made for a good culprit), but that wasn't the problem either.
The thing I can't explain at the moment is that the perf counters "Cache Faults/sec" and "Page Faults/sec" are much higher than with my original code (~50 and ~70 for the dowitchers filter vs ~5 and ~15 for my original code, with the same test file).
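For reference, the per-process counters can be sampled like this if anyone wants to compare runs (just a sketch; note that "Cache Faults/sec" only exists system-wide under the "Memory" category, while the per-process ones live under "Process"):

// Illustrative only: sampling per-process perf counters programmatically.
using System;
using System.Diagnostics;
using System.Threading;

class CounterSample
{
    static void Main()
    {
        string instance = Process.GetCurrentProcess().ProcessName;
        var pageFaults = new PerformanceCounter("Process", "Page Faults/sec", instance);
        var privTime   = new PerformanceCounter("Process", "% Privileged Time", instance);

        // Rate counters need two samples; the first NextValue() returns 0.
        pageFaults.NextValue();
        privTime.NextValue();
        Thread.Sleep(1000);

        Console.WriteLine("Page Faults/sec:   {0:F1}", pageFaults.NextValue());
        Console.WriteLine("% Privileged Time: {0:F1}", privTime.NextValue());
    }
}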
I might have been chasing a ghost after all. I was comparing the dowitchers filter implementation with my original code, but now I ran the dowitchers test app and noticed that the "% Privileged Time" is also ~30-40% for AVIs with XviD.
WaitForVerticalBlank looks like an expensive call (at least on my poor video card).
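A quick way to confirm is to time how much of the run is spent inside it. Rough sketch below, where waitForVerticalBlank is just a placeholder for whatever vblank wait the renderer actually calls:

// Illustrative only: measuring the share of a run spent in a suspected
// expensive call. If most of the wall time (and privileged time) lands
// inside the wait, that call is the culprit.
using System;
using System.Diagnostics;

class VblankTiming
{
    static void Measure(Action waitForVerticalBlank, int frames)
    {
        var total = Stopwatch.StartNew();
        var inWait = new Stopwatch();

        for (int i = 0; i < frames; i++)
        {
            inWait.Start();
            waitForVerticalBlank();   // the call under suspicion
            inWait.Stop();

            // ... decode and present the frame here ...
        }

        Console.WriteLine("{0:P0} of the run spent waiting for vblank",
            (double)inWait.ElapsedMilliseconds / total.ElapsedMilliseconds);
    }
}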