I was wondering if there is any way to control the number of threads created by the multiprocessor-capable portions of BRL-CAD without having to recompile?
I ask because I am performing some research involving software performance on a cluster. Since BRL-CAD is only one of the many software packages to be run on this cluster, I initially chose Fedora Core with the openMosix kernel extension. The problem with BRL-CAD is that since openMosix load-balances jobs across a cluster (it does not split up jobs), BRL-CAD only sees the CPUs available on the node the job is executed from and therefore only fires up (in the case of single-core, single-CPU nodes) a single thread. openMosix is unable to split threads, so the process never migrates to other nodes. If one were able to control the number of threads spawned by rt, rtcheck, etc., then BRL-CAD would be able to take full advantage of this type of cluster.
I guess this post turned into a question AND a suggestion!
Thanks in advance for any help,
Yes there is, though spawning *threads* won't help you much unless Mosix has changed drastically since the last time I worked with it. As you noted, Mosix works with "jobs", i.e. processes, and process-level parallelism, which is not what you get with threading in general. If Mosix has changed and will migrate threads in addition to processes (which would be astounding for various technical reasons), then you would have to recompile to take advantage of this since by default the ray-trace library will not spawn more workers than there are available processing units.
That said, there is something that can be done without recompiling and something that is possible with a little bit of coding. First the compile-time option -- there is support in BRL-CAD's parallel processing library interface for kicking off forked jobs instead of using POSIX threads. The SGI interface uses sprocs, for example, which creates processes instead of threads since they result in higher performance on that platform, among other benefits. It conceivably wouldn't take much effort to add support for using forked processes instead of threads on other platforms.
The option that requires no recompiling is to use remrt/rtsrv, as those tools are designed in a client/server fashion to work over distributed computing and cluster environments.
Otherwise, for the other tracers, the -P option generally controls how many threads/processes are spawned. Without disabling a safeguard check and recompiling, that option will limit the -P processors so that they don't exceed the number available.
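To make the safeguard concrete, here is a minimal sketch of the kind of clamp described above. It is not BRL-CAD's actual code: the function name `clamp_workers` and the handling of values below 1 are assumptions made purely for illustration.

```python
import os


def clamp_workers(requested, available=None):
    """Return how many workers a -P style request would be allowed to start.

    Hypothetical helper, not part of BRL-CAD. `available` defaults to the
    CPU count the local node reports, which is all a single node can see.
    """
    if available is None:
        available = os.cpu_count() or 1
    if requested < 1:
        # Assumption for this sketch only: treat nonsense values as 1.
        return 1
    # The safeguard: never start more workers than processing units.
    return min(requested, available)


if __name__ == "__main__":
    # Asking for 6 workers on a single-CPU node still yields one worker.
    print(clamp_workers(6, available=1))
```

On a single-core cluster node this is why asking for six workers changes nothing: the node only reports its own CPU.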
My apologies for the inconsistencies in my terminology... too early in the morning and not enough caffeine. As for Mosix, it's just a test platform and I intend to try out several different Single System Image solutions. My goal is to take a cluster and make it emulate an 8+ processor SMP. Basically I want users to be able to use BRL-CAD on the master node in the cluster and have all processors in the cluster available to it. The way Mosix has reacted for me so far is that if a process forks one or more children, then the parent and all of the child processes qualify to be migrated to less loaded processors in the cluster.
Granted, I am talking pure theory here, but if I were able to use a -P6 on a machine that technically only has 1 (or 2) cores in it, then the resulting processes should migrate across the cluster. Is there a serious danger in having more than one of the same process on the same CPU? (That is, during the period of time in my hypothetical situation after the process has forked and before Mosix migrates some of the processes.)
Again, the issue gets down to whether the app forks or uses threads, which are two entirely different parallel mechanisms. BRL-CAD can use either, though forking has never been used on a non-IRIX system as threads usually considerably outperform forking. Mosix, though, only distributes forked processes so this capability to fork instead of using pthreads would have to be added to BRL-CAD before such a behavior of using something like -P6 would be automatic. If it did work, however, the capability would extend to all of the BRL-CAD commands that are SMP-aware including the ray-tracers (rt, rtarea, rtcheck, rtweight, rtedge, etc) and tools like g_qa.
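For anyone unsure what "fork instead of threads" looks like in practice, below is a rough sketch of the process-per-chunk model, assuming a Unix-like system where `os.fork` is available. It is not how BRL-CAD is implemented; `split_rows`, `render_rows` and `render_forked` are made-up names, and the "rendering" is a stand-in that just counts scanlines.

```python
import os


def split_rows(height, workers):
    """Divide scanlines 0..height-1 into contiguous (start, stop) chunks."""
    base, extra = divmod(height, workers)
    chunks, start = [], 0
    for i in range(workers):
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def render_rows(start, stop):
    """Stand-in for tracing rays: just report how many rows were handled."""
    return stop - start


def render_forked(height, workers):
    """Fork one child process per chunk and collect results over pipes."""
    children = []
    for start, stop in split_rows(height, workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: a separate process with its own address space, which
            # is the unit a process-migrating cluster can move elsewhere.
            os.close(read_fd)
            os.write(write_fd, str(render_rows(start, stop)).encode())
            os.close(write_fd)
            os._exit(0)
        # Parent: keep only the read end and remember the child.
        os.close(write_fd)
        children.append((pid, read_fd))

    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            total += int(pipe.read())
        os.waitpid(pid, 0)
    return total


if __name__ == "__main__":
    print(render_forked(10, 3))
```

The point of the sketch is that each chunk runs in its own process and results come back over a pipe rather than through shared memory, which is part of why forking tends to cost more than threads on a single machine.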
There's no real "danger" of having more than one of the same process on the same CPU. It's just incredibly inefficient if there is only one CPU or if Mosix is slow to migrate processes once they start. I'd imagine you'd want to start large forking sets incrementally to not swamp the initiating node. It'll definitely require some development and testing though. The -P option is only going to use threads as things currently stand.
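Starting workers incrementally could look something like the sketch below. This is only an illustration of the idea: `launch_in_batches`, the batch size and the pause length are all hypothetical, and the right values would have to come from testing on the actual cluster.

```python
import time


def launch_in_batches(total, batch_size, start_worker, pause=0.0):
    """Start `total` workers, `batch_size` at a time, pausing between batches.

    `start_worker` is any callable taking the worker index; its return
    values are collected in launch order.
    """
    started = []
    for first in range(0, total, batch_size):
        for index in range(first, min(first + batch_size, total)):
            started.append(start_worker(index))
        if first + batch_size < total:
            # Give the cluster a moment to migrate this batch before the
            # initiating node takes on more work.
            time.sleep(pause)
    return started


if __name__ == "__main__":
    # Six workers, two at a time, with a one second gap between batches.
    print(launch_in_batches(6, 2, lambda i: "worker-%d" % i, pause=1.0))
```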
Good info, thanks a bunch!
Resurrecting an older thread, but where is a good tutorial/reference on the use of remrt and rtsrv? I have looked over the man pages and they are confusing/incomplete to say the least... any help?
Hm, good question. In general, it's been used "in house" by developers and power users that are familiar with its use. Probably the best documentation to date on the topic is the remrt manual page. From what I can tell, though, nobody has gotten around to writing an rtsrv.1 manual page yet, so it might be missing some minor details about how to set up a distributed ray-trace.
The 2-cent introduction, if I can get this right, is that remrt is run on a controlling node where you specify what machines are allowed to connect, where the geometry is, what times you allow clients to connect, and whether to automatically invoke remote processing daemons for you (via rsh only). The rtsrv binary is the remote processing daemon that connects from a remote host to the remrt process and requests/processes jobs.
The simple testing setup is to read remrt's manual page and run it somewhere. Remrt's options are pretty much the same as rt's options, just instead of ray-tracing it sits there waiting for remote connections. Once remrt is running, you can manually or automatically log into a remote host and run rtsrv, providing the hostname of where remrt is running. Repeat as needed for other remote hosts and they'll all begin processing jobs fed to them by remrt. If you were on an HPC asset or other cluster machine, there are remote node execution commands that you'd use to invoke the rtsrv processors.
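As a mental model of that arrangement, here is a toy sketch of a controller handing out jobs to workers that ask for them. It is not the remrt/rtsrv protocol: threads stand in for remote hosts, a queue stands in for the network connection, and `run_dispatch` and the node names are invented for the example.

```python
import queue
import threading


def run_dispatch(frame_rows, worker_names):
    """Hand out scanline jobs to workers that pull them from a controller."""
    jobs = queue.Queue()
    for row in range(frame_rows):
        jobs.put(row)

    done = {}  # row -> name of the worker that processed it
    lock = threading.Lock()

    def worker(name):
        # Each worker plays the role of one remote processing daemon:
        # it keeps requesting jobs until the controller has none left.
        while True:
            try:
                row = jobs.get_nowait()
            except queue.Empty:
                return
            with lock:
                done[row] = name

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    result = run_dispatch(8, ["nodeA", "nodeB"])
    print(sorted(result.items()))
```

Because the workers pull jobs rather than being assigned a fixed share, adding another host only means starting another worker, which matches the "repeat as needed" step above.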
It would be really cool if someone wrote up a nice brief tutorial on how to do this... :-) Hope that gives you an idea of where to begin using remrt, though.