From: Sander v. D. <sgv...@gm...> - 2011-03-02 17:41:18
Hey guys,

Another update: with the ODE version I have now put on my page [1] and the current SVN head of rcssserver3d, I am able to run 9 vs 9 in real time on my i7 740QM laptop, with the agents and the monitor also running on the same machine. Having said that:

- This is when the number of collisions is not too high. Collisions add extra constraints and put the agents involved into one island, so they hurt even more in the multithreaded version. The select-and-move functionality of RoboViz comes in quite handy here, but it's still a pain.
- Even worse: with many collisions ODE crashes. This seems to be a memory issue, probably because ODE allocates everything on the stack and runs out of space there. Indeed, configuring ODE with -DdUSE_MALLOC_FOR_ALLOCA seems to prevent this issue by using the heap instead. However, frequent heap allocation is suboptimal for multithreading, so it might be worth looking into TBB's options to increase the thread stack size.
- Multithreading in Simspark is still sometimes unstable with 18 agents, seemingly on the communication/predicate generation/parsing side. I'm looking into that (after the rather enjoyable 18-robot wrestling match currently going on on my screen ;)

Sander

On Wed, Feb 23, 2011 at 7:43 PM, Sander van Dijk <sgv...@gm...> wrote:
> Hey all,
>
> I have worked on multithreading ODE. First of all, thanks for sending me Hesham Ebrahimi's work. He took basically the same approach as I had, so it confirmed my ideas and helped identify and verify problematic places. I also adopted his use of Intel's Thread Building Blocks library. It is quite helpful: as simple as OpenMP, but it doesn't respawn threads, which is what I ran into on my first attempt. My implementation differs slightly: it creates all tasks before spawning any of them, because spawning them one by one can be an issue when small tasks (like stepping the ball) finish fast.
>
> I have uploaded the current result at [1].
> Using it, I got similar results to Hesham's with the same test he used, i.e. significantly fewer CPU cycles spent on world stepping. However, although I double-checked each part multiple times, the physics seem a little less stable. During most runs it is fine, but in some, especially with a lot of agents, one blows up and the server crashes. I am still digging into it; it is hard to reproduce, and I am not sure yet whether the multithreading makes it worse. If any of you want to do some test runs and see how it works for you, that may help, especially if it does blow up and you see a pattern in when it happens.
>
> Thanks all,
>
> Sander
>
> [1] http://homepages.feis.herts.ac.uk/~sv08aav/ode-0.11.1-tbb.tar.gz
>
> PS: Don't forget to install the TBB dev packages (though configure should warn you about that too) and to rebuild and reinstall simspark.
> PS2: Gmail won't let me attach a tar.gz, which is why it's a targz.
>
> On Wed, Feb 23, 2011 at 7:20 PM, Sander van Dijk <sgv...@gm...> wrote:
>
>> Hey all,
>>
>> I have worked on multithreading ODE. First of all, thanks for sending me Hesham Ebrahimi's work. He took basically the same approach as I had, so it confirmed my ideas and helped identify and verify problematic places. I also adopted his use of Intel's Thread Building Blocks library. It is quite helpful: as simple as OpenMP, but it doesn't respawn threads, which is what I ran into on my first attempt. My implementation differs slightly: it creates all tasks before spawning any of them, because spawning them one by one can be an issue when small tasks (like stepping the ball) finish fast.
>>
>> I have attached the current result here. Using it, I got similar results to Hesham's with the same test he used, i.e. significantly fewer CPU cycles spent on world stepping. However, although I double-checked each part multiple times, the physics seem a little less stable.
>> During most runs it is fine, but in some, especially with a lot of agents, one blows up and the server crashes. I am still digging into it; it is hard to reproduce, and I am not sure yet whether the multithreading makes it worse. If any of you want to do some test runs and see how it works for you, that may help, especially if it does blow up and you see a pattern in when it happens.
>>
>> Thanks all,
>>
>> Sander
>>
>> PS: Don't forget to install the TBB dev packages (though configure should warn you about that too) and to rebuild and reinstall simspark.
>> PS2: Gmail won't let me attach a tar.gz, which is why it's a targz.
>>
>> On Sun, Feb 20, 2011 at 3:28 AM, Hedayat Vatankhah <hed...@gm...> wrote:
>>
>>> Hi,
>>>
>>> On 11/02/20 09:06, Sander van Dijk wrote:
>>>
>>> ....
>>>
>>> Yes, good point. I will make sure to record all test details. For now, I am mostly using valgrind (specifically its callgrind and helgrind tools). I first tried gprof, but everything was very unstable then; maybe the current fixes have helped with that.
>>>
>>> Thanks.
>>>
>>>> ...Also, Andreas Seekircher has reported his experience with the multi-threaded mode, which also confirms that it is faster, but he has also run into a problem that deserves some attention:
>>>>
>>>> However, there was a strange behavior: the simulation was running quite fast on my laptop with up to 9 agents, and the simulator was using more than one core. When I started the 10th agent, it got much slower, and it seemed that the simulator was then using only one core (it was again the same speed as without multi-threading). This happened with 4 cores. On a dual-core system, the 5th agent already slowed down the simulation... Is this a known issue?
>>>>
>>>> I guess that in this situation, ODE is the main bottleneck. But that's just a guess.
>>>
>>> Yes, Andreas notified me of that, too.
>>> This happens when agents and server are run on the same machine with multiple cores. I have found that at some point, the system's scheduler assigns an agent to the same core as the server (even though in practice there is still room on another core), so the server can't run at full speed. With taskset(1) it is possible to explicitly set a process's CPU affinity, and by starting the agents with e.g. 'taskset 2 ./start.sh localhost' the server is able to take up 100% again.
>>>
>>> Great, thanks for the info. We might be able to design a more general framework for running agents using Linux CGroups and maybe taking advantage of perf tools. But I should investigate more before I can comment further on this issue.
>>>
>>> ...
>>>>
>>>> Great. But it'd probably be nice if we parallelized the collision detection too. In particular, its computation time increases considerably when two or more robots collide (fortunately the new referee doesn't allow many robots to collide at the same place, but with more players it is more likely that we'll have collisions in different parts of the field).
>>>
>>> From what I can tell so far, the main part of the slowdown due to collisions is not caused by the collision detection itself, but by the fact that it adds many new constraints to the LCP problem that ODE solves to step the physics. However, I still have to make a team of robots that just run into each other before I can say that with certainty. And you are right, it would be nice in any case ;)
>>>
>>> Thanks for the clarification. Yes, IIRC solving the LCP problem was a real bottleneck. If it isn't worth the effort, we might skip this part for now.
>>>
>>>> So much for what I am doing. Now, I would also like something from you guys ;-) First of all, give the new stuff I committed a good test.
>>>> Behaviour of the simulator should still be the same, but it could be that I missed something and that the timing of messages is slightly different, breaking agents. Also, give the multi-threaded mode a good test; see if you can make it crash. And finally, I will be working full time on the simulator for one and a half more months, so if you think there is anything I may be able to squeeze in, do let me know!
>>>>
>>>> Certainly! :)
>>>> I noticed something in your recent commit: you've removed the ugly busy-waiting loop in SimControlThread, but wouldn't that result in a faster simulation when ODE doesn't have much work to do? The loop was there to make sure that a cycle lasts no less than 0.02 s (mSimStep). If I'm not mistaken, it is now possible for a cycle to finish too soon.
>>>
>>> I think you are referring to:
>>>
>>>     if (isInputControl)
>>>     {
>>>         while (int(mSumDeltaTime*100) < int(mSimStep*100))
>>>             controlNode->StartCycle(); // advance the time
>>>     }
>>>
>>> ? The only use I saw for that was to keep updating the InputControl to get messages from the monitor while the physics are updated. The time check is there to stop doing this when the physics are done. Without this loop, the InputControl (and the other controls, AgentControl and MonitorControl) wait for the physics to be done anyway at the next barrier, so the cycle can't finish too soon. And this loop caused the most problems with multithreading, because it allowed the scene graph to be changed while the physics were still running on it.
>>>
>>> Yes, I was referring to this loop. Unfortunately, InputControl doesn't do exactly what it is supposed to. Besides handling input, it also functions as the simulator timer, which makes its role very ambiguous (and I'm going to replace it with another timer implementation).
>>> This loop is not intended to receive messages from the monitor (you'll see the same loop in SimulationServer::Cycle(), which is run in single-threaded mode); it is a busy-waiting loop to make sure that a cycle does not last less than mSimStep. One of the things InputControl::StartCycle() does is inspect SDL's timer and call SimulationServer::AdvanceTime(), which in turn updates mSumDeltaTime. Yes, it doesn't look good, and it was really hard for me to fully understand what happens :P
>>>
>>> Personally, I'm planning to remove the use of the SDL timer altogether and switch to Boost's timing facilities, which are much cleaner and also don't need a busy-waiting loop.
>>>
>>> Thanks,
>>> Hedayat
>>>
>>>> There are some collaboration opportunities with your work and what I'm planning to do, but I'll talk about them in a separate email soon.
>>>
>>> Looking forward to it :)
>>>
>>>> Thanks,
>>>> Hedayat
>>>
>>> --
>>> Adaptive Systems Research Group
>>> Department of Computer Science
>>> University of Hertfordshire
>>> United Kingdom
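The island effect mentioned in the 2 March message can be illustrated with a toy model. ODE groups bodies that are linked by joints, including contact joints, into islands, and a multithreaded stepper can only hand independent islands to different threads. The sketch below is a hypothetical illustration using a union-find structure, not ODE's own island code; the class name IslandFinder and the body numbering are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <set>
#include <vector>

// Hypothetical toy model: bodies are numbered 0..n-1 and every joint or
// contact couples two bodies. Bodies linked directly or indirectly end up
// in the same island, which has to be stepped as one unit of work.
class IslandFinder {
public:
    explicit IslandFinder(std::size_t bodies) : parent_(bodies) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }

    // Record a joint or contact between bodies a and b.
    void connect(std::size_t a, std::size_t b) {
        assert(a < parent_.size() && b < parent_.size());
        parent_[find(a)] = find(b);
    }

    // Number of independent islands, i.e. units that could run in parallel.
    std::size_t islandCount() {
        std::set<std::size_t> roots;
        for (std::size_t i = 0; i < parent_.size(); ++i) roots.insert(find(i));
        return roots.size();
    }

private:
    // Return the island representative, halving the path as it goes.
    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];
            x = parent_[x];
        }
        return x;
    }

    std::vector<std::size_t> parent_;
};
```

For example, two robots of two bodies each plus a ball form three islands; a single contact between the robots merges them, leaving two units of work instead of three. In ODE, contacts with the static field attach to the world rather than to another body, so they do not merge islands; robot-robot contacts do.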
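On the stack-size question from the 2 March message: TBB lets you request a larger stack for its worker threads (older releases take a thread_stack_size argument in task_scheduler_init, newer ones use tbb::global_control::thread_stack_size), which would let ODE keep allocating on the stack instead of switching to the heap with -DdUSE_MALLOC_FOR_ALLOCA. The sketch below shows the same mechanism with plain POSIX threads rather than TBB; bigStackWork and runWithStackSize are hypothetical names, and the 4 MiB buffer and 16 MiB stack are arbitrary example values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <pthread.h>

// Stand-in for alloca-heavy solver work that needs a few MiB of stack.
// The 4 MiB size is an arbitrary illustration value.
void* bigStackWork(void* arg) {
    char buffer[4 * 1024 * 1024];
    std::memset(buffer, 1, sizeof buffer);
    volatile char* last = buffer + sizeof buffer - 1;  // keep the buffer "used"
    *static_cast<int*>(arg) = *last;
    return nullptr;
}

// Run fn(arg) on a new thread whose stack is stackBytes large, then join it.
// Returns 0 on success, otherwise the pthread error code.
int runWithStackSize(std::size_t stackBytes, void* (*fn)(void*), void* arg) {
    assert(fn != nullptr);
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, stackBytes);
    if (rc == 0) {
        pthread_t thread;
        rc = pthread_create(&thread, &attr, fn, arg);
        if (rc == 0) rc = pthread_join(thread, nullptr);
    }
    pthread_attr_destroy(&attr);
    return rc;
}
```

Calling runWithStackSize(16 * 1024 * 1024, bigStackWork, &result) runs the work on a 16 MiB stack and sets result to 1. Choosing the size is a trade-off: every worker reserves that much address space, so it should cover ODE's worst case (many simultaneous contacts) without being wasteful.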
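In 'taskset 2 ./start.sh localhost' from the 20 February discussion, the 2 is a hexadecimal CPU bitmask rather than a CPU count: bit i allows CPU i, so 2 (binary 10) pins the agent to CPU 1. The helper below is a hypothetical illustration of how such a mask is built; taskset itself also accepts a plain CPU list through its -c option, e.g. 'taskset -c 1 ./start.sh localhost'.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Build a CPU affinity bitmask: bit i set means CPU i is allowed.
std::uint64_t affinityMask(const std::vector<int>& cpus) {
    std::uint64_t mask = 0;
    for (int cpu : cpus) {
        assert(cpu >= 0 && cpu < 64);  // this sketch only covers CPUs 0..63
        mask |= std::uint64_t{1} << cpu;
    }
    return mask;
}

// Format the mask as a hexadecimal taskset argument, e.g. {1} -> "0x2".
std::string tasksetArg(const std::vector<int>& cpus) {
    std::ostringstream out;
    out << "0x" << std::hex << affinityMask(cpus);
    return out.str();
}
```

So tasksetArg({1}) gives "0x2", the same mask as 'taskset 2', and tasksetArg({2, 3}) gives "0xc". Keeping the server on CPU 0 and the agents on the remaining CPUs avoids the scheduler placing an agent on the server's core.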
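Sander's point that the controls "wait for the physics to be done anyway at the next barrier" can be illustrated with a minimal reusable barrier. This is a hypothetical, self-contained C++11 sketch, not SimSpark's actual threading code (which may use a different barrier implementation); Barrier and runTwoPhaseCycles are illustrative names. The control side only reads shared state while the physics thread is parked at a barrier, which is exactly the property the removed loop broke by letting the scene graph change mid-step.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Minimal reusable barrier (C++11): wait() blocks until `count` threads
// have called it in the current round, then releases all of them.
class Barrier {
public:
    explicit Barrier(std::size_t count) : count_(count) { assert(count > 0); }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t round = generation_;
        if (++waiting_ == count_) {
            // Last arrival: reset for the next round and wake everyone.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // The generation check guards against spurious wake-ups.
            cv_.wait(lock, [&] { return round != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t count_;
    std::size_t waiting_ = 0;
    std::size_t generation_ = 0;
};

// Hypothetical two-thread cycle: a "physics" thread advances the state, and a
// "control" thread (here the calling thread) reads it only between barriers.
// Returns true if the control side never observed a half-finished step.
bool runTwoPhaseCycles(int cycles) {
    Barrier barrier(2);
    int stepsDone = 0;
    bool consistent = true;

    std::thread physics([&] {
        for (int i = 0; i < cycles; ++i) {
            ++stepsDone;     // "step the world"
            barrier.wait();  // physics for this cycle is done
            barrier.wait();  // controls have finished reading
        }
    });

    for (int i = 0; i < cycles; ++i) {
        barrier.wait();                              // wait for the physics step
        if (stepsDone != i + 1) consistent = false;  // physics thread is parked here
        barrier.wait();                              // let the next step start
    }
    physics.join();
    return consistent;
}
```

The physics thread only modifies stepsDone before the first barrier of a cycle, and the control thread only reads it between the first and second barriers, so every read sees a completed step and runTwoPhaseCycles returns true.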
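Regarding Hedayat's plan to replace the SDL-timer busy-waiting loop with a proper timer: the sketch below keeps the simulation from running ahead of real time (one step being mSimStep, 0.02 s) by sleeping until a deadline instead of spinning. It uses standard C++11 std::chrono and std::this_thread::sleep_until in place of Boost's timing facilities; runPacedCycles and doCycle are hypothetical names, not SimSpark APIs.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Run `cycles` cycles of doCycle(), never finishing cycle k (1-based)
// before start + k * step. The thread sleeps instead of busy-waiting.
void runPacedCycles(int cycles, std::chrono::milliseconds step,
                    const std::function<void()>& doCycle) {
    assert(cycles >= 0);
    using Clock = std::chrono::steady_clock;
    auto deadline = Clock::now() + step;
    for (int i = 0; i < cycles; ++i) {
        doCycle();                                // physics, agents, monitor...
        std::this_thread::sleep_until(deadline);  // block instead of spinning
        deadline += step;                         // fixed schedule: no drift
    }
}
```

Deadlines are computed as start + k * step rather than "now + step" after each cycle, so small delays do not accumulate as drift. The flip side of this design choice is that if one cycle overruns, sleep_until returns immediately and the following cycles catch up, so an individual cycle can then be shorter than one step.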