I am working on an MPI implementation for exciting. So far the self-consistent loop (scl) is implemented to use k-point parallelism where possible. Speedups are about 0.6-0.8 per processor, i.e. a parallel efficiency of roughly 60-80%. Relaxation should in principle work, but I have not tested the broadcast of the atomic positions very well.

One thing about MPI is that you get nothing for free: every single task that may call the gndstate routine has to be parallelized explicitly, and in particular the data consistency between the processes has to be managed manually. So it is not realistic to expect an MPI version that is parallel in all of its parts. However, the effort needed to extend the parallelism further may be acceptable.

I would appreciate any help in pointing out features that would benefit a lot from MPI parallelization or that are of current interest to somebody. I'm not at a point where I want to release anything yet, but if you are interested in trying it out anyway, contact me.
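For readers less familiar with MPI, the kind of k-point parallelism described here can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the exciting implementation: MPI ranks are simulated in a plain loop, all function names are hypothetical, the contiguous block distribution is just one possible scheme, and `simulated_allreduce` only stands in for an `MPI_Allreduce` with `MPI_SUM`. It shows why data consistency has to be handled by hand: each rank computes only its own share of k-points, and the partial results must be explicitly combined and redistributed before every process holds the same data.

```python
# Hypothetical sketch: k-point parallelism with simulated MPI ranks.
# No real MPI library is used; the "ranks" are just loop indices.


def kpoint_range(nkpt, nproc, rank):
    """Return the half-open range [first, last) of k-point indices owned by rank.

    k-points are split into contiguous blocks; the first (nkpt % nproc) ranks
    get one extra k-point, so the load differs by at most one k-point.
    Ranks beyond nkpt receive an empty range.
    """
    base, extra = divmod(nkpt, nproc)
    first = rank * base + min(rank, extra)
    last = first + base + (1 if rank < extra else 0)
    return first, last


def partial_sum(weights, values, first, last):
    """Weighted sum over this rank's k-points only (e.g. a band-energy term)."""
    return sum(weights[ik] * values[ik] for ik in range(first, last))


def simulated_allreduce(partials):
    """Stand-in for MPI_Allreduce with MPI_SUM: every rank receives the total."""
    total = sum(partials)
    return [total] * len(partials)


if __name__ == "__main__":
    nkpt, nproc = 10, 4
    weights = [1.0] * nkpt                 # assumed uniform k-point weights
    values = [float(ik) for ik in range(nkpt)]  # dummy per-k-point quantity
    ranges = [kpoint_range(nkpt, nproc, r) for r in range(nproc)]
    partials = [partial_sum(weights, values, f, l) for f, l in ranges]
    # After the reduction all ranks hold identical data again.
    print(ranges, simulated_allreduce(partials))
```

In a real MPI code the same pattern applies to anything that changes between steps: for example, new atomic positions during a relaxation would have to be sent to all processes (e.g. with `MPI_Bcast`) before the next ground-state calculation starts.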