I'm working from assumptions here, so pardon me if I've got it wrong: the SDM codec seems to run through an external Intel DLL which does the encoding/decoding.
The affinity seems to be left unset; I guess the DLL is simply instantiated.
My proposal would be to implement CPUSet parsing and controlled affinity during instantiation.
The problem I have, and judging by the complaints many others have it too, is the high CPU usage.
In my case it's even worse because I have an AMD CPU, a 5950X with 16c/32t, which is not that old and quite powerful; CPU usage on Intel CPUs seems to be much lower, about 50% less.
On my CPU, transcoding to DSD256 needs 30-40% of 2 cores (I guess it's one thread per audio channel) and DSD512 needs 60-70% of 2 cores.
This means there's not enough headroom for DSD512; a high CPU load from other foreground tasks will crash the transcoding.
The affinity, at least in my case, seems to go to the first 2 cores, which is already an issue; the threads bounce back and forth across the first 4 logical threads and do not stick to one specific thread on the 1st or 2nd core.
They jump from one thread to another, probably moved by the Windows scheduler based on load predictions.
This is a problem in itself, as it can trigger issues such as desyncing.
Since I use my PC for other things and it's not a dedicated audio player, listening to music while using it can be tricky.
In general it would be really helpful to parse CPUSet and control affinity to support newer CPUs, which are heavily multithreaded and non-homogeneous.
It's not terribly difficult to implement; you can have a look at this patch for llama.cpp, which I'll have to resume working on some day (it is terribly difficult to get merged into llama.cpp):
https://github.com/mann1x/llama.cpp/tree/mannix-win32-cpuset
Basically once CPUSet is parsed you can set the affinity considering:
- Windows scheduler core priority order
- Filtering and prioritizing by L3 cache (control of scheduling across AMD CCDs)
- Filtering of efficiency cores (Intel E-cores and AMD c cores)
That depends, of course, on whether you can set the affinity at all.
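
To make the three points above concrete, here is a rough sketch of just the selection step, written as portable C++ so it can be tried without Windows. It is not the llama.cpp patch and not anything from the plugin. `CpuSetEntry` and `pick_cpu_sets` are names I made up; the struct only mirrors the fields I'd read from the `CpuSet` member of `SYSTEM_CPU_SET_INFORMATION` (as returned by `GetSystemCpuSetInformation`). I'm assuming that a higher scheduling value means a more preferred core, so check that against the Windows documentation before relying on it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical, portable mirror of the per-CPU-Set fields of interest.
struct CpuSetEntry {
    std::uint32_t id;                     // CPU Set ID
    std::uint8_t  logical_index;          // logical processor index
    std::uint8_t  core_index;             // same value = SMT siblings of one core
    std::uint8_t  last_level_cache_index; // same value = shared L3 (one CCD on AMD)
    std::uint8_t  efficiency_class;       // higher = faster core type
    std::uint8_t  sched_rank;             // ASSUMPTION: higher = preferred core
};

// Pick up to `wanted` CPU Set IDs: performance cores only, all inside one
// L3 domain, best-ranked cores first, one logical processor per physical core.
// Returns fewer IDs (possibly none) if the domain has too few matching cores.
std::vector<std::uint32_t> pick_cpu_sets(std::vector<CpuSetEntry> sets,
                                         std::uint8_t l3_domain,
                                         std::size_t wanted) {
    // The highest efficiency class present marks the performance cores.
    std::uint8_t top_class = 0;
    for (const CpuSetEntry& s : sets)
        top_class = std::max(top_class, s.efficiency_class);

    // Drop efficiency cores and everything outside the requested L3 domain.
    sets.erase(std::remove_if(sets.begin(), sets.end(),
                   [&](const CpuSetEntry& s) {
                       return s.efficiency_class != top_class ||
                              s.last_level_cache_index != l3_domain;
                   }),
               sets.end());

    // Best scheduler rank first, then by core and logical index for stability.
    std::sort(sets.begin(), sets.end(),
              [](const CpuSetEntry& a, const CpuSetEntry& b) {
                  return std::make_tuple(-int(a.sched_rank), a.core_index, a.logical_index) <
                         std::make_tuple(-int(b.sched_rank), b.core_index, b.logical_index);
              });

    // Take the first logical processor of each core, skipping SMT siblings.
    std::vector<std::uint32_t> picked;
    std::vector<std::uint8_t> used_cores;
    for (const CpuSetEntry& s : sets) {
        if (picked.size() == wanted) break;
        if (std::find(used_cores.begin(), used_cores.end(), s.core_index) != used_cores.end())
            continue;
        used_cores.push_back(s.core_index);
        picked.push_back(s.id);
    }
    return picked;
}
```

On Windows the returned IDs would then be handed to `SetThreadSelectedCpuSets` for each channel thread. For two channels I'd call it with `wanted = 2` and an L3 domain away from where the foreground load usually lands, so each channel gets its own physical core and both share one cache.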
Additionally, if possible, I'd like to know which SDM order is used for the transcoding: 4th, 5th, or something else?
Anonymous
The DSD Processor plugin is built with the Intel C++ compiler and the IPP library, and it has no dependencies on other Intel libraries. Plugin performance could actually be improved by pipelining each audio channel in 2 or 3 stages. Currently there is only one thread per channel. SDM type orders: A - 4, B - 5, C - 3, D - 5, E - 5.
Okay, so my assumption was wrong, of course :)
Awesome, thanks for the info.
Threading in 2 or 3 stages would be a godsend.
Do you think you could implement CPUSet for the affinity?
Is there any way I can help if needed?