Hello. Would you consider making the program multithreaded? For example, it could use a thread pool with one image per thread or one algorithm per thread (the latter would be preferable, since it would also speed up single-file compression). This would speed the process up considerably.
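To make the "one image per thread" idea concrete, here is a minimal sketch in Python. It is not FileOptimizer's code: `compress_one` and `compress_all` are hypothetical names, and zlib stands in for whatever optimizer would actually run on each file.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_one(data: bytes) -> bytes:
    # Stand-in for one optimizer run on one file (assumption: the real
    # program would call an image optimizer here, not zlib).
    return zlib.compress(data, 9)


def compress_all(blobs, max_workers=4):
    # One task per input; the pool caps how many run at the same time.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compress_one, blobs))
```

The "one algorithm per thread" variant would submit several different functions for the same input instead of the same function for several inputs.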
Yes, this has been planned for a long time, and some progress has been made at the source level. You can take a look at:
https://sourceforge.net/p/nikkhokkho/code/335/
https://sourceforge.net/p/nikkhokkho/code/337/
https://sourceforge.net/p/nikkhokkho/code/345/
https://sourceforge.net/p/nikkhokkho/code/348/
Unfortunately, VCL is not multi-threaded, which means a major rewrite of the GUI logic would be needed to deal with it. It is not a small task at all.
As a temporary measure, can you dispatch compression jobs to a "worker process" that will handle the threading or multiprocessing? I don't know what it would take to make this work on Windows, but I make heavy use of GNU Parallel from the command line when I'm just trying to run a lot of files through zopflipng.
Parallel provides regular feedback on jobs completed / remaining, estimated ETA, and so forth. It doesn't need to understand the particular sort of jobs it's executing; it simply invokes whatever third-party processes it's told to.
This wouldn't be as nice as a truly multithreaded / multiprocess UI and backend job handler, but it would ensure that multicore machines are used to full, or nearly full, capacity.
In fact, this is why I spend a lot of time compressing things on remote machines instead of using FileOptimizer.
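As a rough illustration of the worker-process approach, here is a small Python sketch that runs arbitrary external commands in parallel and reports completed / remaining jobs with a naive ETA, in the spirit of what Parallel does. `run_jobs` and its `report` parameter are hypothetical names made up for this example; it is not how Parallel or FileOptimizer is implemented.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_jobs(commands, max_workers=4, report=print):
    """Run each command (a list of argv strings) as its own process.

    Returns the exit codes in the same order as `commands`.
    """
    start = time.monotonic()
    total = len(commands)
    codes = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each thread only waits on a child process, so the real work
        # happens in parallel across cores regardless of the dispatcher.
        futures = {
            pool.submit(subprocess.run, cmd, capture_output=True): i
            for i, cmd in enumerate(commands)
        }
        for done, fut in enumerate(as_completed(futures), start=1):
            # fut.result() re-raises errors such as a missing executable.
            codes[futures[fut]] = fut.result().returncode
            elapsed = time.monotonic() - start
            # Naive ETA: assumes remaining jobs take the average time so far.
            eta = elapsed / done * (total - done)
            report(f"{done}/{total} done, about {eta:.1f}s remaining")
    return [codes[i] for i in range(total)]
```

Assuming zopflipng is on the PATH, a call might look like `run_jobs([["zopflipng", src, dst] for src, dst in pairs])`, where `pairs` is a hypothetical list of input and output paths. The dispatcher does not need to know anything about the jobs themselves; it only starts processes and collects exit codes.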