I am working on a parallel rendering project based on Chromium. Our goal is to build a real-time, multi-projector system that works well on various projection backgrounds. Briefly, we load the source image on one client, where the app faker runs, transmit it, and process it on several servers. So far we have implemented several algorithms within the SPU.
To improve performance we tried many approaches and finally ported the algorithm from the CPU to the GPU, with promising results: we now get more than 50 fps on a single PC with a GeForce 8800. However, when we combine PCs over the local network, the frame rate drops dramatically. It fell to nearly 24 fps with only two servers and one client.
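To put rough numbers on the drop, here is a back-of-the-envelope frame-time calculation. It assumes that the single-PC time is pure processing and that everything added in the cluster case is network transfer and waiting, which I have not measured separately. The variable names are only for illustration.

```python
def frame_time_ms(fps):
    """Convert a frame rate into the time spent on one frame, in milliseconds."""
    return 1000.0 / fps

# Measured figures from above ("more than 50 fps", "nearly 24 fps").
single_pc_ms = frame_time_ms(50.0)   # processing only, at most about 20 ms
cluster_ms = frame_time_ms(24.0)     # processing plus network, about 41.7 ms

# Assumption: the difference is all network transfer and synchronization.
network_ms = cluster_ms - single_pc_ms

# If transfer and processing ran one after the other, the times add up.
serial_fps = 1000.0 / (single_pc_ms + network_ms)

# If they overlapped perfectly, the slower stage alone would set the rate.
overlapped_fps = 1000.0 / max(single_pc_ms, network_ms)

print(f"network overhead per frame: {network_ms:.1f} ms")
print(f"serial: {serial_fps:.1f} fps, fully overlapped: {overlapped_fps:.1f} fps")
```

Under these assumptions, perfect overlap would be an upper bound only. Real gains would be smaller because the network and the GPU still share the same machine.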
I'm sure this is due to network performance, and I intend to rework the relevant module of Chromium. I found that the tilesort SPU swaps buffers synchronously: on every frame the application is blocked until all the servers have returned. I want to make the network image transfer and the image processing asynchronous. However, this is still a tough job for me. The only thing I know is that I could change the tilesort SPU, but I'm still not sure that is the right place. Could anyone give me some advice? Or, if you have already worked on this issue, would you mind sharing some resources with me? Thanks a lot.
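To make the idea concrete, here is a minimal sketch of the kind of overlap I have in mind, written in Python rather than Chromium's C and not using any Chromium API. The names `run_pipeline`, `transfer`, `process` and `depth` are hypothetical. One thread receives frames while the main thread processes the previous ones, with a small bounded queue in between so the receiver can only run a couple of frames ahead.

```python
import queue
import threading

def run_pipeline(frames, transfer, process, depth=2):
    """Overlap transfer(frame) with process(frame) through a bounded queue.

    `transfer` stands in for the network receive and `process` for the
    per-frame image processing; both are placeholders, not Chromium calls.
    """
    ready = queue.Queue(maxsize=depth)  # frames received but not yet processed
    done = object()                     # sentinel marking the end of the stream
    results = []

    def receiver():
        try:
            for frame in frames:
                # Blocks only when `depth` frames are already waiting.
                ready.put(transfer(frame))
        finally:
            # Always signal the end so the consumer loop cannot hang.
            ready.put(done)

    worker = threading.Thread(target=receiver)
    worker.start()

    # Process frame N while the receiver is already fetching frame N+1.
    while True:
        item = ready.get()
        if item is done:
            break
        results.append(process(item))

    worker.join()
    return results

# Toy usage: "transfer" passes the frame through, "process" doubles it.
output = run_pipeline(range(5), lambda f: f, lambda f: f * 2)
print(output)
```

Frames still come out in order because there is a single receiver and a single consumer. Whether the tilesort SPU's buffer swap can be restructured this way is exactly the part I am unsure about.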