From: Sander v. D. <sgv...@gm...> - 2011-02-18 20:23:31
|
Hello MC,

As you may know, the RC federation put out a call for proposals for projects, among others to work on the competition infrastructure. I sent in a proposal together with Ubbo Visser, which got accepted. So now I am with Ubbo to work on simspark for 2 months, and I would like to keep you up to date with what we're doing.

The main aim of our project is to make the simulator usable for bulk training. One part of that is debugging the simulator and making it faster and more stable; the other part is building some external tools. On the second point we are still working out the details, but on the first I have done some work:

* Did a lot of profiling, which among other things showed that the server spent more than 10% of its time on dynamic casting alone. This was mostly caused by continuous searches for nodes in the scene tree. I have put in some caching to alleviate this, reducing the time spent casting to 1%. That 10% of the profile has now shifted to ODE. I still have to create some performance tests to see whether this made things faster overall.

* Multi-threading mode is fixed (but see below). Although at first I doubted that the current approach would help, it should, because the second and third most costly tasks, gathering perception data (20%) and gathering monitor data (8%), can now be done in parallel. However, when running a 6vs6 game there is no really noticeable speed-up. Again, I still have to do proper performance tests to see what it does.

However, the biggest opportunity to optimise is ODE, which now eats up 67-70% of all computation time. There was a project at CMU in 2007 to parallelize ODE [1], in which they made the collision detection parallel. Profiling shows, however, that this will not help us: collision detection takes up only 0.45% in rcssserver3d. What is expensive for us is stepping the physics. Luckily, ODE already splits this work into parts, updating 'islands' separately, where in our case each island is one agent. I am now working on parallelising this, and if that works we can in theory cut the 67% CPU time into 12/18 parts (6vs6/9vs9) that can run in parallel, hopefully making 8 cores actually useful.

So much for what I am doing. Now, I would also like something from you guys ;-) First of all, give the new stuff I committed a good test. The behaviour of the simulator should still be the same, but I may have missed something, and the timing of messages may be slightly different, breaking agents. Also, give the multi-threaded mode a good test and see if you can make it crash. And finally, I will be working full time on the simulator for another 1 1/2 months; if you think there is anything I may be able to squeeze in, do let me know!

Cheers,
Sander

[1] http://www.cs.cmu.edu/~mpa/ode/

--
Adaptive Systems Research Group
Department of Computer Science
University of Hertfordshire
United Kingdom |
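The caching of scene-tree lookups mentioned in the message above could look roughly like the following. This is only a minimal sketch and not the actual simspark code: `Node`, `Body`, `findFirst`, `CachedLookup` and the cast counter are hypothetical names invented for illustration. It contrasts a depth-first search that runs `dynamic_cast` on every visited node with a small cache that performs the search once and reuses the result until the tree changes.

```cpp
#include <memory>
#include <vector>

// Hypothetical scene-tree node; not the real simspark class hierarchy.
struct Node {
    virtual ~Node() = default;
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical node type that callers look up repeatedly.
struct Body : Node {
    double mass = 1.0;
};

// Counts dynamic_cast calls so the effect of caching can be observed.
static long g_castCount = 0;

// Uncached lookup: depth-first search that casts every visited node.
template <typename T>
T* findFirst(Node& root) {
    ++g_castCount;
    if (T* hit = dynamic_cast<T*>(&root)) return hit;
    for (auto& child : root.children) {
        if (T* hit = findFirst<T>(*child)) return hit;
    }
    return nullptr;
}

// Cached lookup: searches once, then reuses the pointer until invalidated.
template <typename T>
class CachedLookup {
public:
    explicit CachedLookup(Node& root) : root_(root) {}

    T* get() {
        if (!valid_) {
            cached_ = findFirst<T>(root_);
            valid_ = true;
        }
        return cached_;
    }

    // Must be called whenever the tree structure changes,
    // otherwise the cached pointer may be stale or dangling.
    void invalidate() {
        valid_ = false;
        cached_ = nullptr;
    }

private:
    Node& root_;
    T* cached_ = nullptr;
    bool valid_ = false;
};
```

The trade-off in a design like this is that every code path that modifies the tree has to call `invalidate()`; a missed invalidation would turn a performance fix into a correctness bug.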
From: Hedayat V. <hed...@gm...> - 2011-02-18 21:09:05
|
Hi Sander,

Thanks for the report. :)
I'll post the real reply soon (with some details about what should be done for RoboCup 2011 in MC).

Just something about ODE islands: I might be wrong but IIRC if two agents collide with each other, they'll become one island until they are completely separated again. As I said, I'm in doubt but it'll probably need some consideration if I'm right.

Good luck,
Hedayat |
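To illustrate the point about colliding agents, islands can be thought of as connected components of bodies linked by joints or contacts. The sketch below is a generic union-find illustration of that idea, not ODE's internal island-building code; `DisjointSet`, `Link` and `countIslands` are hypothetical names. Two agents of three bodies each form two islands, and a single contact between them merges them into one.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Disjoint-set (union-find) over body indices.
class DisjointSet {
public:
    explicit DisjointSet(std::size_t n) : parent_(n) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }

    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }

    void unite(std::size_t a, std::size_t b) { parent_[find(a)] = find(b); }

private:
    std::vector<std::size_t> parent_;
};

// A joint or contact connecting two bodies (by index).
using Link = std::pair<std::size_t, std::size_t>;

// Counts islands: groups of bodies connected through joints or contacts.
std::size_t countIslands(std::size_t bodyCount, const std::vector<Link>& links) {
    DisjointSet sets(bodyCount);
    for (const Link& link : links) sets.unite(link.first, link.second);

    std::size_t islands = 0;
    for (std::size_t i = 0; i < bodyCount; ++i) {
        if (sets.find(i) == i) ++islands;  // each root represents one island
    }
    return islands;
}
```

With bodies 0-2 as one agent and 3-5 as another, the joint links `{0,1}, {1,2}, {3,4}, {4,5}` give two islands; adding a contact `{2,3}` gives one. Any parallel scheme therefore has to work with however many islands exist in a given step rather than assume one per agent.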
From: Sander v. D. <sgv...@gm...> - 2011-02-18 21:22:14
|
Hey Hedayat,

Looking forward to your real reply ;-)

Thanks for your point! It is indeed true that they will become a single island then. However, this doesn't change the problem much. At every physics step the islands are redetermined, after which they can be spread over different threads. So the only situation in which there is no advantage is when everything ends up in a single island, which is not likely to happen. (Note that the agents all being connected through the ground does not cause this: the ground is disabled and therefore not included in islands.)

What could perhaps use some consideration is that, because of this, islands may not all have the same size, so some load balancing may be needed to distribute them over the threads properly. But let's first see if it can work at all :)

Sander |
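The load-balancing concern raised above can be sketched with a simple greedy scheme: sort the islands by cost, largest first, and always hand the next island to the least loaded thread. This is an illustrative assumption about how one might distribute islands, not something taken from ODE or simspark; `balanceIslands` and `stepTime` are hypothetical names, and the cost of an island (for example its number of bodies) is an assumed stand-in for real step time.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Assigns island costs to threads, largest first, always to the least
// loaded thread. Returns the total cost assigned to each thread.
std::vector<double> balanceIslands(std::vector<double> islandCosts,
                                   std::size_t threadCount) {
    std::vector<double> load(threadCount, 0.0);
    if (threadCount == 0) return load;

    std::sort(islandCosts.begin(), islandCosts.end(), std::greater<double>());
    for (double cost : islandCosts) {
        auto lightest = std::min_element(load.begin(), load.end());
        *lightest += cost;
    }
    return load;
}

// The physics step finishes when the most heavily loaded thread finishes.
double stepTime(const std::vector<double>& load) {
    return load.empty() ? 0.0 : *std::max_element(load.begin(), load.end());
}
```

For ten single-agent islands of cost 1 plus one merged island of cost 2 on four threads, this gives a load of 3 per thread, the same as a perfectly even split. If everything collapses into a single island of cost 12, the step time is 12 regardless of the thread count, which is the degenerate case described in the message.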
From: Hedayat V. <hed...@gm...> - 2011-02-18 21:26:53
|
:) Yes, I didn't want to argue against processing separate islands in parallel; I just wanted to help a little if possible (e.g. to prevent the implementation from assuming that every agent is a separate island).
In fact, I'm really in favor of processing collision detection of separate islands in parallel, which is also supported by ODE.

Thanks,
Hedayat |
From: Joschka B. <jos...@am...> - 2011-02-19 05:39:36
|
Hey Sander,

On Feb 19, 2011, at 5:23 AM, Sander van Dijk wrote:

> Hello MC,
>
> As you may know, the RC federation put out a call for proposals for projects, among others to work on the competition infrastructure. I sent in a proposal together with Ubbo Visser, which got accepted. So now I am with Ubbo to work on simspark for 2 months, and I would like to keep you up to date with what we're doing.
>
> The main aim of our project is to make the simulator usable for bulk training. One part of that is debugging the simulator and making it faster and more stable; the other part is building some external tools. On the second point we are still working out the details, but on the first I have done some work:

First of all: great to hear you're working on these things! Congrats on the accepted project proposal :-) Very exciting!

> * Did a lot of profiling, which among other things showed that the server spent more than 10% of its time on dynamic casting alone. This was mostly caused by continuous searches for nodes in the scene tree. I have put in some caching to alleviate this, reducing the time spent casting to 1%. That 10% of the profile has now shifted to ODE. I still have to create some performance tests to see whether this made things faster overall.

Nice :-)

> * Multi-threading mode is fixed (but see below). Although at first I doubted that the current approach would help, it should, because the second and third most costly tasks, gathering perception data (20%) and gathering monitor data (8%), can now be done in parallel. However, when running a 6vs6 game there is no really noticeable speed-up. Again, I still have to do proper performance tests to see what it does.

Very cool.

> However, the biggest opportunity to optimise is ODE, which now eats up 67-70% of all computation time. There was a project at CMU in 2007 to parallelize ODE [1], in which they made the collision detection parallel. Profiling shows, however, that this will not help us: collision detection takes up only 0.45% in rcssserver3d. What is expensive for us is stepping the physics. Luckily, ODE already splits this work into parts, updating 'islands' separately, where in our case each island is one agent. I am now working on parallelising this, and if that works we can in theory cut the 67% CPU time into 12/18 parts (6vs6/9vs9) that can run in parallel, hopefully making 8 cores actually useful.

A long time ago, Hesham Ebrahimi worked on parallelizing parts of ODE as a team member of the MC. He used the Intel Threading Building Blocks (TBB) library back then, but unfortunately I'm not exactly sure anymore what happened to that code :-( Anyway, I remember that he used a book on Intel TBB published by O'Reilly (the author is James Reinders). It has the parallelization of ODE as an example towards the end. See if you can get hold of it; it might be a big help (at least that last part on ODE).

Otherwise, I would suggest considering the integration of the Bullet physics engine [1] as an alternative to ODE. Bullet has more features, a very active development team, and increasing support for massively parallel computation for collision detection and for the solver using OpenCL. This looks like the better alternative for the future to me.

The project I did with Andreas to move ODE to a plugin and enable other physics engines was meant to be the first step towards a later Bullet integration, but unfortunately nobody has had time to work on that yet. Give Bullet integration some thought :-) I don't have any data at hand, but I'm pretty convinced it will lead to much better performance, especially when supported by GPU computation and multiple CPUs.

Keep us updated on your progress and all the best,
Joschka

[1] http://bulletphysics.org/ |
From: Joschka B. <jos...@am...> - 2011-02-19 06:08:30
|
One thing I forgot: have you tested how running agents as plugins in the simulator, like the example in rcssserver3d/plugin/soccer/agentintegration, improves performance (only for training/learning, not in competitions, of course)? It would be interesting to have actual measurements on that.

Cheers,
Joschka

P.S.: did I mention Bullet integration would be a great project? ;-) |
From: Sander v. D. <sgv...@gm...> - 2011-02-19 16:51:35
|
Hey Joschka,

Thanks for your reply. I didn't know about TBB; I will have a look into it. I found a partial copy of the book online, but of course the part about ODE is missing :) I will see if I can find it.

Regarding Bullet, I totally agree that it is probably the way forward for the future. However, we hope to have these improvements ready for Istanbul, and preferably for the German and Iran Open too. Using a different physics engine will most likely change the dynamics, so most teams would probably have to recreate/relearn their movements. It would not be good to throw that at them on such short notice. Probably the best time to make this switch is when we introduce new models/heterogeneous agents, when everybody has to start over anyway. So that's why I am focusing on ODE now: the dynamics should stay the same, and there seem to be relatively straightforward ways of parallelising it.

However, that is not an argument against making a start with Bullet now, and I have started taking a look at the API to see how well it fits the interfaces we now have after Andreas' very useful work.

About the integrated agents, that is a good idea! I will try to include those in my tests.

Thanks again,
Sander

On Sat, Feb 19, 2011 at 6:08 AM, Joschka Boedecker < jos...@am...> wrote: > One thing I forgot: have you tested how running agents as plugins in the > simulator, like the example in rcssserver3d/plugin/soccer/agentintegration, > improves performance (only for training/learning, not in competitions, of > course)? Would be interesting to have actual measurements on that. > > Cheers, > Joschka > > P.S.: did I mention Bullet integration would be a great project? ;-) > > On Feb 19, 2011, at 2:23 PM, Joschka Boedecker wrote: > > > Hey Sander, > > > > On Feb 19, 2011, at 5:23 AM, Sander van Dijk wrote: > > > >> Hello MC, > >> > >> As you may know, the RC federation put out a call for proposals for > projects, among others to work on the competition infrastructure.
I sent in > a proposal together with Ubbo Visser, which got accepted. So now I am with > Ubbo to work on simspark for 2 months, and I would like to keep you up to > date with what we're doing. > >> > >> The main aim of our project is to make the simulator usable for bulk > training. One part of that means debugging the simulator and make it faster > and more stable, the other part is to make some external tools. On the > second point we are still working out the details, but on the first part I > did some work: > >> > > > > First of all: great to hear you're working on these things! Congrats to > you and Ubbo on the accepted project proposal :-) Very exciting! > > > >> * Did a lot of profiling, which a.o. showed that the server spent more > than 10% of the time on dynamic casting alone. This was mostly because of > continuous searches for nodes in the scene tree. I have put in some caching > here to alleviate it, reducing the time spent casting to 1%. This extra 10% > has now gone to ODE. I still have to create some performance tests to see if > this made stuff faster. > > > > Nice :-) > > > >> > >> * Multi threading mode is fixed (but see below). Although at first I was > doubtful of whether the current way it is done would help, it should, > because now the second and third most costly things, gathering perception > data (20%) and gathering monitor data (8%) can now be done in parallel. > However, while running a 6vs6 game there is not a real noticeable speed-up. > But again, I still have to do proper performance tests to see what it does. > > > > Very cool. > > > >> > >> However, the biggest opportunity to optimise is ODE, which now eats up > 67-70% of all computation time. There was a project at CMU in 2007 to > parallelize ODE [1], where they made the collision detection parallel. > Profiling shows however that this will not help: collision detection takes > up 0.45% in rcssserver3d. What is expensive for us, is stepping the physics. 
> Luckily, ODE already splits this work into different parts, updating > 'islands' seperately, where in our case each island is one agent. I am now > working on parallelising this, and if that works we can in theory cut up the > 67% CPU time into 12/18 parts (4vs6/9vs9) that can be run in parallel, > hopefully making having 8 cores actually useful. > >> > > > > Long time ago, Hesham Ebrahimi worked on parallelizing parts of ODE as > team member of the MC. He used the Intel Threading Building Blocks (TBB) > library back then, but unfortunately, I'm not exactly sure anymore what > happened to that code :-( Anyways, I remember that there's a book on Intel > TBB published by O'Reilly (author is James Reinders) which he used. This has > parallelization of ODE as an example towards the end of the book. See if you > can get a hold of that it, it might be a big help (at least that last part > on ODE). > > > > Otherwise, I would suggest considering the integration of the Bullet > physics engine [1] as an alternative to ODE. Bullet has more features, a > very active development team, and increasing support for massively parallel > computation for collision detection and for the solver using OpenCL. This > looks like the better alternative for the future to me. > > > > The project I did with Andreas to move ODE to a plugin and enable other > physics engines was meant to be the first part for a later Bullet > integration, but unfortunately, nobody had time to work on that yet. Give > Bullet integration some thought :-) I don't have any data at hand, but I'm > pretty convinced it will lead to much better performance, especially when > supported by GPU computation and multiple CPUs. 
> > > > Keep us updated on your progress and all the best, > > Joschka > > > > [1] http://bulletphysics.org/ > > -- Adaptive Systems Research Group Department of Computer Science University of Hertfordshire United Kingdom |
From: Hedayat V. <hed...@gm...> - 2011-02-20 00:35:34
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"> <title></title> </head> <body bgcolor="#ffffff" text="#000000"> Hi Sander!<br> (my reply with more time to write! :P)<br> <br> On ۱۱/۰۲/۱۸ 11:53, Sander van Dijk wrote: <blockquote cite="mid:AAN...@ma..." type="cite">Hello MC,<br> <br> As you may know, the RC federation put out a call for proposals for projects, among others to work on the competition infrastructure. I sent in a proposal together with Ubbo Visser, which got accepted. So now I am with Ubbo to work on simspark for 2 months, and I would like to keep you up to date with what we're doing.<br> </blockquote> Thanks for keeping us informed; I'm very happy to hear that this is going to happen in these 2 months. :)<br> <br> <blockquote cite="mid:AAN...@ma..." type="cite"> <br> The main aim of our project is to make the simulator usable for bulk training. One part of that means debugging the simulator and make it faster and more stable, the other part is to make some external tools. On the second point we are still working out the details, but on the first part I did some work:<br> <br> * Did a lot of profiling, which a.o. showed that the server spent more than 10% of the time on dynamic casting alone. This was mostly because of continuous searches for nodes in the scene tree. I have put in some caching here to alleviate it, reducing the time spent casting to 1%. This extra 10% has now gone to ODE. I still have to create some performance tests to see if this made stuff faster.<br> </blockquote> That's great. It would also be nice if you noted, for the record, which tools you use for profiling. IIRC, we previously got inconsistent profiling results depending on the tools we used, so it would help to know the tools alongside the results. It would also help if you provided the complete results for the most time-consuming parts in more detail. 
Maybe there are people who'll work on some other time consuming parts. :)<br> <br> <br> <blockquote cite="mid:AAN...@ma..." type="cite"> <br> * Multi threading mode is fixed (but see below). Although at first I was doubtful of whether the current way it is done would help, it should, because now the second and third most costly things, gathering perception data (20%) and gathering monitor data (8%) can now be done in parallel. However, while running a 6vs6 game there is not a real noticeable speed-up. But again, I still have to do proper performance tests to see what it does.<br> </blockquote> Thanks for the fix. <br> We've not conducted any benchmarks to find the difference when multi-threading is enabled and when it is disabled; but our experience at IranOpen 2010 and apparently GermanOpen 2010 experience have shown that multi-threaded mode is practically faster.<br> Also, Andreas Seekircher has reported his experience with multi-threaded which also confirms that multi-threaded mode is faster, but also he has faced a problem which deserves some attention:<br> <blockquote type="cite">However there was a strange behavior, that the simulation was running quite fast on my laptop with up to 9 agents and the simulator was using more than one core. When I started the 10th agent it was getting much slower and it seemed that the simulator was then using only one core (it was then again the same speed like without multi-threading). This happened with 4 cores. On a dual core system already the 5th agent slowed down the simulation... Is this a known issue? </blockquote> I guess that in this situation, ODE is the main bottleneck. But that's just a guess.<br> <br> <br> <blockquote cite="mid:AAN...@ma..." type="cite"> <br> However, the biggest opportunity to optimise is ODE, which now eats up 67-70% of all computation time. There was a project at CMU in 2007 to parallelize ODE [1], where they made the collision detection parallel. 
Profiling shows however that this will not help: collision detection takes up 0.45% in rcssserver3d. What is expensive for us, is stepping the physics. Luckily, ODE already splits this work into different parts, updating 'islands' seperately, where in our case each island is one agent. I am now working on parallelising this, and if that works we can in theory cut up the 67% CPU time into 12/18 parts (4vs6/9vs9) that can be run in parallel, hopefully making having 8 cores actually useful.<br> </blockquote> Great. But it would probably be nice to parallelize the collision detection too. In particular, its computation time will increase considerably when two or more robots collide (fortunately the new referee doesn't allow many robots to collide at the same place, but with more players it is more likely that we'll have collisions in different parts of the field). <br> <br> <br> <br> <blockquote cite="mid:AAN...@ma..." type="cite"> <br> So far about what I am doing. Now, I would also like something from you guys ;-) First of all, give the new stuff I committed a good test. Behaviour of the simulator should still be the same, but it could be that I missed something and that timing of messages is slightly different, breaking agents. Also, give the multi threaded mode a good test, see if you can make it crash. And, finally, I will be working full time on the simulator for 1 1/2 months more, if you think there is anything that I may be able to squeeze in there, do let me know!<br> </blockquote> Certainly! :) <br> I noticed something in your recent commit: you've removed the ugly busy-waiting loop in SimControlThread, but wouldn't that result in a faster simulation when ODE doesn't have much work to do? The loop was there to make sure that a cycle lasts no less than 0.02 s (mSimStep). If I'm not mistaken, it is now possible for a cycle to finish too soon. 
<br> <br> There are some collaboration opportunities with your work and what I'm planning to do, but I'll talk about them in a separate email soon.<br> <br> Thanks,<br> Hedayat<br> <blockquote cite="mid:AAN...@ma..." type="cite"> <br> Cheers,<br> Sander<br> <br> [1] <a moz-do-not-send="true" href="http://www.cs.cmu.edu/%7Empa/ode/">http://www.cs.cmu.edu/~mpa/ode/</a><br> <br> -- <br> Adaptive Systems Research Group<br> Department of Computer Science<br> University of Hertfordshire<br> United Kingdom<br> <br> </blockquote> </body> </html> |
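A rough back-of-the-envelope check on the figures quoted in this message, offered as an estimate and not as a measurement from the thread. It applies Amdahl's law under the assumptions that the physics stepping is the fraction p ≈ 0.67 of the run time mentioned above, that it splits evenly across n cores (at least as many islands as cores), and that synchronisation overhead is ignored:

```latex
% Amdahl's law: speed-up S when a fraction p of the work runs on n cores
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}

% With p = 0.67 (physics stepping) and n = 8 cores:
S(8) = \frac{1}{0.33 + \dfrac{0.67}{8}} = \frac{1}{0.41375} \approx 2.4

% Upper bound, however many islands or cores are available:
\lim_{n \to \infty} S(n) = \frac{1}{1 - p} = \frac{1}{0.33} \approx 3.0
```

Under these assumptions the remaining 33% of serial work caps the gain at about 3x, so parallel island stepping alone would not make all 8 cores pay off in full.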
From: Sander v. D. <sgv...@gm...> - 2011-02-20 05:36:55
|
Hey, On Sat, Feb 19, 2011 at 7:34 PM, Hedayat Vatankhah <hed...@gm...>wrote: > Hi Sander! > (my reply with more time to write! :P) > Great, thanks :) > > > On ۱۱/۰۲/۱۸ 11:53, Sander van Dijk wrote: > > Hello MC, > > As you may know, the RC federation put out a call for proposals for > projects, among others to work on the competition infrastructure. I sent in > a proposal together with Ubbo Visser, which got accepted. So now I am with > Ubbo to work on simspark for 2 months, and I would like to keep you up to > date with what we're doing. > > Thanks for letting us informed, and very happy to hear that it is going to > happen in these 2 months. :) > > > > The main aim of our project is to make the simulator usable for bulk > training. One part of that means debugging the simulator and make it faster > and more stable, the other part is to make some external tools. On the > second point we are still working out the details, but on the first part I > did some work: > > * Did a lot of profiling, which a.o. showed that the server spent more than > 10% of the time on dynamic casting alone. This was mostly because of > continuous searches for nodes in the scene tree. I have put in some caching > here to alleviate it, reducing the time spent casting to 1%. This extra 10% > has now gone to ODE. I still have to create some performance tests to see if > this made stuff faster. > > That's great. It'd be also nice if you specify which tools do you use for > profiling for the record. IIRC, previously we had some inconsistent > profiling results based on the tools we used; so it might be helpful to know > the tools beside the results. Also, it might be helpful if you provide the > complete results about the most time consuming parts with more details. > Maybe there are people who'll work on some other time consuming parts. :) > Yes, good point. I will make sure to record all test details. For now: I am mostly using valgrind (with the callgrind and helgrind tools in specific). 
I first tried gprof, but with it everything was very unstable; maybe the current fixes have helped with that.

> * Multi threading mode is fixed (but see below). Although at first I was > doubtful of whether the current way it is done would help, it should, > because now the second and third most costly things, gathering perception > data (20%) and gathering monitor data (8%) can now be done in parallel. > However, while running a 6vs6 game there is not a real noticeable speed-up. > But again, I still have to do proper performance tests to see what it does. > > Thanks for the fix. > We've not conducted any benchmarks to find the difference when > multi-threading is enabled and when it is disabled; but our experience at > IranOpen 2010 and apparently GermanOpen 2010 experience have shown that > multi-threaded mode is practically faster. > Also, Andreas Seekircher has reported his experience with multi-threaded > which also confirms that multi-threaded mode is faster, but also he has > faced a problem which deserves some attention: > > However there was a strange behavior, that the simulation was running quite > fast on my laptop with up to 9 agents and the simulator was using more than > one core. When I started the 10th agent it was getting much slower and it > seemed that the simulator was then using only one core (it was then again > the same speed like without multi-threading). This happened with 4 cores. On > a dual core system already the 5th agent slowed down the simulation... Is > this a known issue? > > I guess that in this situation, ODE is the main bottleneck. But that's just > a guess. >

Yes, Andreas notified me of that, too. This happens when agents and server are run on the same machine with multiple cores. I have found that at some point, the system's scheduler assigns an agent to the same core as the server (even though in practice there is still room on another core), so the server can't run at full speed. 
With taskset(1) it is possible to explicitly set a process' CPU affinity, and by starting the agents with e.g. 'taskset 2 ./start.sh localhost' the server is able to use 100% of its core again.

> However, the biggest opportunity to optimise is ODE, which now eats up > 67-70% of all computation time. There was a project at CMU in 2007 to > parallelize ODE [1], where they made the collision detection parallel. > Profiling shows however that this will not help: collision detection takes > up 0.45% in rcssserver3d. What is expensive for us, is stepping the physics. > Luckily, ODE already splits this work into different parts, updating > 'islands' seperately, where in our case each island is one agent. I am now > working on parallelising this, and if that works we can in theory cut up the > 67% CPU time into 12/18 parts (4vs6/9vs9) that can be run in parallel, > hopefully making having 8 cores actually useful. > > Great. But it'd be probably nice if we parallelize the collision detection > too. Specially, it's computation time will increase considerably when two or > more robots collide (fortunately the new referee doesn't allow many robots > to collide at the same place, but with more players it is more likely that > we'll have collisions in different part of the field). >

From what I can tell so far, it seems that the main part of the speed reduction due to collisions is not caused by the collision detection itself, but by the fact that collisions add many new constraints to the LCP problem that ODE solves to step the physics. However, I still have to make a team of robots that just run into each other to be able to say that with certainty. And you are right, it would be nice in any case ;)

> So far about what I am doing. Now, I would also like something from you guys > ;-) First of all, give the new stuff I committed a good test. 
Behaviour of > the simulator should still be the same, but it could be that I missed > something and that timing of messages is slightly different, breaking > agents. Also, give the multi threaded mode a good test, see if you can make > it crash. And, finally, I will be working full time on the simulator for 1 > 1/2 months more, if you think there is anything that I may be able to > squeeze in there, do let me know! > > Certainly! :) > I noticed something in your recent commit: you've removed the ugly > busy-waiting loop in SimControlThread, but wouldn't it result in a faster > simulation when ODE has not much work to do? The loop was there to make sure > that a cycle will last no less than 0.02 (mSimStep). If I'm not mistaken, it > is now possible for a cycle to finish too soon. >

I think you refer to:

    if (isInputControl)
    {
        while (int(mSumDeltaTime*100) < int(mSimStep*100))
            controlNode->StartCycle(); // advance the time
    }

? The only use I saw of that was to keep updating the InputControl to get messages from the monitor while the physics are updated. The time check is there to stop doing this when the physics are done. Without this check, the InputControl (and the other controls, AgentControl and MonitorControl) wait for the physics to be done anyway at the next barrier, so the cycle can't be finished too soon. And this loop caused the most problems with multi threading, because it allowed the scene graph to be changed while the physics were still running on it.

> There are some collaboration opportunities with your work and what I'm > planning to do, but I'll talk about them in a separate email soon. >

Looking forward to it :)

> > Thanks, > Hedayat > > > Cheers, > Sander > > [1] http://www.cs.cmu.edu/~mpa/ode/ > > -- > Adaptive Systems Research Group > Department of Computer Science > University of Hertfordshire > United Kingdom > >

-- Adaptive Systems Research Group Department of Computer Science University of Hertfordshire United Kingdom |
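The taskset workaround described in this message can also be applied from inside a process. The following is a minimal sketch, assuming Linux and glibc (`sched_setaffinity` is Linux-specific); the helper names `cpusInMask` and `pinCurrentProcess` are hypothetical and are not part of SimSpark or of any agent code. It decodes a taskset-style bitmask, so the mask 2 from the example above selects CPU 1 only.

```cpp
#include <sched.h>   // sched_setaffinity, cpu_set_t, CPU_ZERO, CPU_SET (Linux)
#include <vector>

// Hypothetical helper: decode a taskset-style bitmask into CPU indices
// (bit i set means CPU i is allowed).
std::vector<int> cpusInMask(unsigned long mask)
{
    std::vector<int> cpus;
    for (int cpu = 0; mask != 0; ++cpu, mask >>= 1)
    {
        if (mask & 1UL)
            cpus.push_back(cpu);
    }
    return cpus;
}

// Hypothetical helper: restrict the calling process to the CPUs in `mask`.
// Returns true if the kernel accepted the new affinity.
bool pinCurrentProcess(unsigned long mask)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu : cpusInMask(mask))
        CPU_SET(cpu, &set);
    // pid 0 means "the calling process"
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

An agent could call `pinCurrentProcess(2)` at start-up to stay off CPU 0, leaving that core to the server. Whether this is preferable to wrapping the start script in taskset depends on how the agents are launched.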
From: Hedayat V. <hed...@gm...> - 2011-02-20 08:28:56
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"> <title></title> </head> <body bgcolor="#ffffff" text="#000000"> Hi,<br> <br> On ۱۱/۰۲/۲۰ 09:06, Sander van Dijk wrote: <blockquote cite="mid:AANLkTinzmGOLWX+OzoWy9Wb9YWUsE5Q4AqzKtZ=Ez...@ma..." type="cite">.... <div class="gmail_quote"> <div><br> </div> <div>Yes, good point. I will make sure to record all test details. For now: I am mostly using valgrind (with the callgrind and helgrind tools in specific). I first tried gprof, but then everything was very unstable, but maybe that's helped with the current fixes.</div> </div> </blockquote> Thanks.<br> <br> <br> <blockquote cite="mid:AANLkTinzmGOLWX+OzoWy9Wb9YWUsE5Q4AqzKtZ=Ez...@ma..." type="cite"> <div class="gmail_quote"> <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> <div bgcolor="#ffffff" text="#000000">...Also, Andreas Seekircher has reported his experience with multi-threaded which also confirms that multi-threaded mode is faster, but also he has faced a problem which deserves some attention:<br> <blockquote type="cite">However there was a strange behavior, that the simulation was running quite fast on my laptop with up to 9 agents and the simulator was using more than one core. When I started the 10th agent it was getting much slower and it seemed that the simulator was then using only one core (it was then again the same speed like without multi-threading). This happened with 4 cores. On a dual core system already the 5th agent slowed down the simulation... Is this a known issue? </blockquote> I guess that in this situation, ODE is the main bottleneck. But that's just a guess.</div> </blockquote> <div><br> </div> <div>Yes, Andreas notified me of that, too. 
This happens when agents and server are run on the same machine with multiple cores.I have found that at some point, the system's scheduler assigns an agent to the same core as the server (even though in practice there is still room on another core), so the server can't run at full speed. With taskset(1) it is possible to explicitly set a process' CPU affinity, and by starting the agents with e.g. 'taskset 2 ./start.sh localhost' the server is able to take up 100% again.</div> </div> </blockquote> Great, thanks for the info. We might be able to design a more general framework for running agents using Linux CGroups and maybe taking advantage of perf tools. But I should investigate more before being able to comment more on this issue.<br> <br> <br> <blockquote cite="mid:AANLkTinzmGOLWX+OzoWy9Wb9YWUsE5Q4AqzKtZ=Ez...@ma..." type="cite"> <div class="gmail_quote"> <div><br> </div> <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> <div bgcolor="#ffffff" text="#000000"> <div class="im"> <blockquote type="cite">...<br> </blockquote> </div> Great. But it'd be probably nice if we parallelize the collision detection too. Specially, it's computation time will increase considerably when two or more robots collide (fortunately the new referee doesn't allow many robots to collide at the same place, but with more players it is more likely that we'll have collisions in different part of the field).<br> </div> </blockquote> <div><br> </div> <div>From what I can tell so far it seems that the main part of the speed reduction due to collisions is not caused by the collision detection, but by the fact that it adds many new constraints to the LCP problem that ODE solves to step the physics. However, I still have to make a team of robots that just run into each other to be able to say that with certainty. 
And you are right, it would be nice in any case ;) <br> </div> </div> </blockquote> Thanks for the clarification. Yes IIRC solving the LCP problem was a real bottleneck. If it isn't worth the effort, we might skip this part for now. <br> <br> <br> <blockquote cite="mid:AANLkTinzmGOLWX+OzoWy9Wb9YWUsE5Q4AqzKtZ=Ez...@ma..." type="cite"> <div class="gmail_quote"> <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> <div bgcolor="#ffffff" text="#000000"> <div class="im"> <blockquote type="cite">So far about what I am doing. Now, I would also like something from you guys ;-) First of all, give the new stuff I committed a good test. Behaviour of the simulator should still be the same, but it could be that I missed something and that timing of messages is slightly different, breaking agents. Also, give the multi threaded mode a good test, see if you can make it crash. And, finally, I will be working full time on the simulator for 1 1/2 months more, if you think there is anything that I may be able to squeeze in there, do let me know!<br> </blockquote> </div> Certainly! :) <br> I noticed something in your recent commit: you've removed the ugly busy-waiting loop in SimControlThread, but wouldn't it result in a faster simulation when ODE has not much work to do? The loop was there to make sure that a cycle will last no less than 0.02 (mSimStep). If I'm not mistaken, it is now possible for a cycle to finish too soon. <br> </div> </blockquote> <div><br> </div> <div>I think you refer to:</div> <div> <div><br> </div> <div> if (isInputControl)</div> <div> {</div> <div> while (int(mSumDeltaTime*100) &lt; int(mSimStep*100))</div> <div> controlNode-&gt;StartCycle(); // advance the time</div> <div> }</div> </div> <div><br> </div> <div>? The only use I saw of that was to keep updating the InputControl to get messages from the monitor while the physics are updated. 
The time check is there to stop doing this when the physics are done. Without this check, the InputControl (and the other controls, AgentControl and MonitorControl) wait for the physics to be done anyway at the next barrier, so the cycle can't be finished too soon. And this loop caused the most problems with multi threading, because it allowed the scene graph to be changed while the physics were still running on it.</div> </div> </blockquote> Yes, I was referring to this loop. Unfortunately, InputControl doesn't do exactly what its name suggests. Besides handling input, it also functions as the simulator timer, which is very confusing (and I'm going to replace it with another timer implementation). This loop is not intended to receive messages from the monitor (you'll see the same loop in SimulationServer::Cycle(), which is run in single-threaded mode); it is a busy-waiting loop to make sure that a cycle does not last less than mSimStep. One of the things InputControl::StartCycle does is inspect SDL's timer and call SimulationServer::AdvanceTime(), which in turn updates mSumDeltaTime. Yes, it doesn't look good, and it was really hard for me to understand completely what happens :P<br> <br> Personally, I'm planning to remove the use of the SDL timer altogether and switch to Boost's timing facilities, which are much cleaner and also don't need a busy-waiting loop. <br> <br> Thanks,<br> Hedayat<br> <br> <br> <blockquote cite="mid:AANLkTinzmGOLWX+OzoWy9Wb9YWUsE5Q4AqzKtZ=Ez...@ma..." 
type="cite"> <div class="gmail_quote"> <div><br> </div> <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> <div bgcolor="#ffffff" text="#000000"> <br> There are some collaboration opportunities with your work and what I'm planning to do, but I'll talk about them in a separate email soon.<br> </div> </blockquote> <div><br> </div> <div>Looking forward to it :)</div> <div> </div> <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> <div bgcolor="#ffffff" text="#000000"> <br> Thanks,<br> <font color="#888888"> Hedayat</font><br> </div> </blockquote> </div> -- <br> Adaptive Systems Research Group<br> Department of Computer Science<br> University of Hertfordshire<br> United Kingdom<br> </blockquote> </body> </html> |
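The message above argues for pacing each cycle with a sleeping timer instead of a busy-waiting loop. As a minimal sketch of that idea, here is a fixed-step loop that sleeps until a deadline. It uses the C++ standard library (`std::chrono`, `std::this_thread::sleep_until`) and not Boost or SDL, which is a substitution made for this illustration. `runPaced` and its parameters are hypothetical names; this is not the SimSpark implementation.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: run `cycles` iterations of `step`, making each
// iteration last at least `simStep` by sleeping until a deadline rather
// than spinning on a timer.
template <typename Step>
void runPaced(int cycles, std::chrono::milliseconds simStep, Step step)
{
    Clock::time_point deadline = Clock::now();
    for (int i = 0; i < cycles; ++i)
    {
        deadline += simStep;                      // end of this cycle
        step();                                   // e.g. physics + controls
        std::this_thread::sleep_until(deadline);  // no-op if already late
    }
}
```

With `simStep` set to 20 ms (the 0.02 s mSimStep mentioned in the thread), a cycle whose work finishes early yields the CPU until the deadline. Because deadlines are advanced from the previous deadline and not from the current time, a slow cycle is followed by shorter waits until the loop catches up; whether that catch-up behaviour is wanted in the simulator is a separate design decision.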
From: Sander v. D. <sgv...@gm...> - 2011-02-24 00:43:59
|
Hey all,

I have worked on multithreading ODE. First of all, thanks for sending me Hesham Ebrahimi's work. He took basically the same approach as I had, so it confirmed my ideas and helped with identifying and verifying problematic places. I also adopted his use of Intel's Threading Building Blocks library. It is quite helpful: as simple as OpenMP, but it doesn't respawn threads, a problem I ran into in my first attempt. My implementation is slightly different, to make sure that all tasks are created before any of them is spawned, which can be an issue when small tasks (like stepping the ball) finish fast.

I have uploaded the current result at [1]. Using it I got similar results to Hesham's with the same test he used, i.e. significantly fewer CPU cycles spent on the world stepping. However, although I double-checked each part multiple times, the physics seem a little less stable. During most runs it is fine, but in some, especially with a lot of agents, one agent blows up and the server crashes. I am still digging into it. It is hard to reproduce, and I am not sure yet whether the multithreading makes it worse, but if any of you want to do some test runs and see how it works for you, that may help. Especially if it does blow up and you see a pattern in when it happens.

Thanks all,
Sander

[1] http://homepages.feis.herts.ac.uk/~sv08aav/ode-0.11.1-tbb.tar.gz

PS don't forget to install the tbb dev packages (though configure should warn you about that too) and to remake and install simspark
PS2 gmail won't let me tar.gz, that's why it's targz

On Wed, Feb 23, 2011 at 7:20 PM, Sander van Dijk <sgv...@gm...>wrote: > Hey all, > > I have worked on multithreading ODE. First of all thanks for sending me > Hesham Ebrahimi's work. He took basically the same approach as I had, so it > confirmed my ideas and it helped identifying and verifying problematic > places. I also adopted his use if Intel's Thread Building Blocks library. 
It is quite helpful; simple as OpenMP, but doesn't respawn threads, which I ran into at my first attempt. My implementation is slightly different, to make sure that all tasks are created before spawning any of them, which can be an issue when small tasks (like stepping the ball) finish fast.
>
> I have attached the current result here. Using it I got similar results as Hesham with the same test he used, i.e. significantly fewer CPU cycles spent doing the world stepping. However, although I double-checked each part multiple times, the physics seem a little less stable. During most runs it is fine, but in some, especially with a lot of agents, one blows up and the server crashes. I am still digging into it; it is hard to reproduce, and I am not sure yet if the multithreading makes it worse, but if any of you want to do some test runs and see how it works for you, that may help. Especially if it does blow up and you see a pattern in when it happens.
>
> Thanks all,
>
> Sander
>
> PS don't forget to install tbb dev packages (though configure should warn you about that too) and remake and install simspark
> PS2 gmail won't let me tar.gz, that's why it's targz
>
> On Sun, Feb 20, 2011 at 3:28 AM, Hedayat Vatankhah <hed...@gm...> wrote:
>
>> Hi,
>>
>> On 11/02/20 09:06, Sander van Dijk wrote:
>>
>> ....
>>
>> Yes, good point. I will make sure to record all test details. For now: I am mostly using valgrind (with the callgrind and helgrind tools in particular). I first tried gprof, but then everything was very unstable; maybe that has been helped by the current fixes.
>>
>> Thanks.
>>
>>> ...Also, Andreas Seekircher has reported his experience with multi-threaded mode, which also confirms that multi-threaded mode is faster, but he has also faced a problem which deserves some attention:
>>>
>>> However there was a strange behavior, that the simulation was running quite fast on my laptop with up to 9 agents and the simulator was using more than one core. When I started the 10th agent it was getting much slower and it seemed that the simulator was then using only one core (it was then again the same speed as without multi-threading). This happened with 4 cores. On a dual core system already the 5th agent slowed down the simulation... Is this a known issue?
>>>
>>> I guess that in this situation, ODE is the main bottleneck. But that's just a guess.
>>
>> Yes, Andreas notified me of that, too. This happens when agents and server are run on the same machine with multiple cores. I have found that at some point, the system's scheduler assigns an agent to the same core as the server (even though in practice there is still room on another core), so the server can't run at full speed. With taskset(1) it is possible to explicitly set a process' CPU affinity, and by starting the agents with e.g. 'taskset 2 ./start.sh localhost' the server is able to take up 100% again.
>>
>> Great, thanks for the info. We might be able to design a more general framework for running agents using Linux CGroups and maybe taking advantage of perf tools. But I should investigate more before being able to comment more on this issue.
>>
>> ...
>>>
>>> Great. But it'd probably be nice if we parallelized the collision detection too. Especially since its computation time will increase considerably when two or more robots collide (fortunately the new referee doesn't allow many robots to collide at the same place, but with more players it is more likely that we'll have collisions in different parts of the field).
>>
>> From what I can tell so far it seems that the main part of the speed reduction due to collisions is not caused by the collision detection, but by the fact that it adds many new constraints to the LCP problem that ODE solves to step the physics. However, I still have to make a team of robots that just run into each other to be able to say that with certainty. And you are right, it would be nice in any case ;)
>>
>> Thanks for the clarification. Yes, IIRC solving the LCP problem was a real bottleneck. If it isn't worth the effort, we might skip this part for now.
>>
>>> So far about what I am doing. Now, I would also like something from you guys ;-) First of all, give the new stuff I committed a good test. Behaviour of the simulator should still be the same, but it could be that I missed something and that timing of messages is slightly different, breaking agents. Also, give the multi threaded mode a good test, see if you can make it crash. And, finally, I will be working full time on the simulator for 1 1/2 months more, if you think there is anything that I may be able to squeeze in there, do let me know!
>>>
>>> Certainly! :)
>>> I noticed something in your recent commit: you've removed the ugly busy-waiting loop in SimControlThread, but wouldn't it result in a faster simulation when ODE has not much work to do? The loop was there to make sure that a cycle will last no less than 0.02 (mSimStep). If I'm not mistaken, it is now possible for a cycle to finish too soon.
>>
>> I think you refer to:
>>
>>     if (isInputControl)
>>     {
>>         while (int(mSumDeltaTime*100) < int(mSimStep*100))
>>             controlNode->StartCycle(); // advance the time
>>     }
>>
>> ? The only use I saw of that was to keep updating the InputControl to get messages from the monitor while the physics are updated. The time check is there to stop doing this when the physics are done. Without this check, the InputControl (and the other controls, AgentControl and MonitorControl) wait for the physics to be done anyway at the next barrier, so the cycle can't be finished too soon. And this loop caused the most problems with multi threading, because it allowed the scene graph to be changed while the physics were still running on it.
>>
>> Yes, I was referring to this loop. Unfortunately InputControl doesn't exactly do what it is supposed to. Besides handling input, it also functions as the simulator timer, which is very ambiguous (and I'm going to replace it with another timer implementation). This loop is not intended to receive messages from the monitor (you'll see the same loop in SimulationServer::Cycle(), which is run in single threaded mode); it is a busy-waiting loop to make sure that a cycle will not last less than mSimStep. One of the things InputControl::StartCycle does is to inspect SDL's timer and call SimulationServer::AdvanceTime(), which in turn updates mSumDeltaTime. Yes, it doesn't look good and it was really hard for me to understand completely what happens :P
>>
>> Personally, I'm planning to remove the use of the SDL timer altogether and switch to using Boost's timing facilities, which are much cleaner and also don't need a busy-waiting loop.
>>
>> Thanks,
>> Hedayat
>>
>>> There are some collaboration opportunities with your work and what I'm planning to do, but I'll talk about them in a separate email soon.
>>
>> Looking forward to it :)
>>
>>> Thanks,
>>> Hedayat

--
Adaptive Systems Research Group
Department of Computer Science
University of Hertfordshire
United Kingdom |
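For readers following the timer discussion above: the busy-waiting loop exists to keep a cycle from lasting less than mSimStep (0.02, assumed here to be seconds). A minimal sketch of pacing a loop without busy-waiting could look like the following. It uses std::chrono and std::this_thread::sleep_until from the C++11 standard library rather than the Boost facilities Hedayat mentions, and CyclePacer, WaitForEndOfCycle and RunEmptyCycles are hypothetical names that do not exist in simspark.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper, not simspark code: paces a loop so that each
// cycle lasts at least `step`, sleeping instead of busy-waiting.
class CyclePacer
{
public:
    using Clock = std::chrono::steady_clock;

    explicit CyclePacer(std::chrono::milliseconds step)
        : mStep(step), mDeadline(Clock::now() + step) {}

    // Block until the current cycle's deadline, then schedule the next one.
    // If the cycle overran, return immediately and restart from now, so the
    // loop does not try to catch up with a burst of short cycles.
    void WaitForEndOfCycle()
    {
        const Clock::time_point now = Clock::now();
        if (now < mDeadline)
        {
            std::this_thread::sleep_until(mDeadline);
            mDeadline += mStep;
        }
        else
        {
            mDeadline = now + mStep;
        }
    }

private:
    std::chrono::milliseconds mStep;
    Clock::time_point mDeadline;
};

// Run `cycles` empty cycles and return the elapsed wall-clock time.
inline std::chrono::milliseconds RunEmptyCycles(int cycles,
                                                std::chrono::milliseconds step)
{
    const CyclePacer::Clock::time_point start = CyclePacer::Clock::now();
    CyclePacer pacer(step);
    for (int i = 0; i < cycles; ++i)
    {
        // physics, agent and monitor updates would go here
        pacer.WaitForEndOfCycle();
    }
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        CyclePacer::Clock::now() - start);
}
```

Calling RunEmptyCycles(5, std::chrono::milliseconds(20)) should take at least 100 ms even though the cycles do no work, which is the guarantee the original loop was after. Whether sleeping is precise enough for the simulator on a loaded machine is not something this sketch settles.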
From: Sander v. D. <sgv...@gm...> - 2011-03-02 17:41:18
|
Hey guys,

Another update: currently, with the ODE version I have now put on my page [1] and the current SVN head of rcssserver3d, I am able to run 9 vs 9 in real time on my i7 740QM laptop, with the agents and monitor also running on the same machine. Having said that:

- This is when the number of collisions is not too high. Because of the extra constraints, and the fact that collisions result in the agents involved being put into one island, collisions are even worse in the multithreaded version. The select-and-move functionality of RoboViz comes in quite handy here, but it's still a pain.
- Even worse: with many collisions ODE crashes. This seems to be a memory issue, probably because ODE allocates everything on the stack and runs out of space there. Indeed, configuring ODE with -DdUSE_MALLOC_FOR_ALLOCA seems to prevent this issue by using the heap instead. However, using the heap frequently is suboptimal for multithreading, so it might be worth looking into TBB's options to increase the thread stack size.
- Multithreading in Simspark is still sometimes unstable with 18 agents, seemingly on the communication/predicate generation/parsing side. Looking into that (after the rather enjoyable 18-robot wrestling match currently going on on my screen ;)

Sander

[1] http://homepages.feis.herts.ac.uk/~sv08aav/ode-0.11.1-tbb.tar.gz

--
Adaptive Systems Research Group
Department of Computer Science
University of Hertfordshire
United Kingdom |
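To illustrate why collisions hurt the multithreaded version in particular: each island is one unit of parallel work, and a contact between two agents merges their islands into one. The sketch below counts independent islands with a small union-find structure. It is only an illustration under the simplifying assumption that every agent is a single node; IslandTracker and CountIslands are hypothetical names, and this is not how ODE itself builds its islands.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical illustration, not ODE code: agents joined by a contact
// end up in the same island, and each island is one unit of parallel work.
class IslandTracker
{
public:
    explicit IslandTracker(std::size_t agents)
        : mParent(agents), mIslands(agents)
    {
        // Initially every agent is its own island.
        std::iota(mParent.begin(), mParent.end(), std::size_t(0));
    }

    // Representative of the island that contains `agent` (with path halving).
    std::size_t Find(std::size_t agent)
    {
        while (mParent[agent] != agent)
        {
            mParent[agent] = mParent[mParent[agent]];
            agent = mParent[agent];
        }
        return agent;
    }

    // Register a contact between two agents, merging their islands.
    void AddContact(std::size_t a, std::size_t b)
    {
        const std::size_t ra = Find(a);
        const std::size_t rb = Find(b);
        if (ra != rb)
        {
            mParent[ra] = rb;
            --mIslands;
        }
    }

    // Number of islands that could be stepped independently.
    std::size_t IslandCount() const { return mIslands; }

private:
    std::vector<std::size_t> mParent;
    std::size_t mIslands;
};

// Count the independent islands among `agents` agents given the contacts.
inline std::size_t CountIslands(
    std::size_t agents,
    const std::vector<std::pair<std::size_t, std::size_t>>& contacts)
{
    IslandTracker tracker(agents);
    for (const auto& contact : contacts)
        tracker.AddContact(contact.first, contact.second);
    return tracker.IslandCount();
}
```

With 18 agents and no contacts there are 18 islands to spread over the cores; a pile-up in which all agents touch leaves a single island, so the stepping of that pile cannot be split across threads at all, and that one island also carries all the extra contact constraints.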
From: Hedayat V. <hed...@gm...> - 2011-03-05 19:41:40
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Hi Sander,<br>
<br>
<span>
<style type="text/css">blockquote {color: navy !important; background-color: RGB(245,245,245) !important; padding: 0 15 10 15 !important; margin: 15 0 0 0; border-left: #1010ff 2px solid;} blockquote blockquote {color: maroon !important; background-color: RGB(235,235,235) !important; border-left-color:maroon !important} blockquote blockquote blockquote {color: green !important; background-color: RGB(225,225,225) !important; border-left-color:teal !important} blockquote blockquote blockquote blockquote {color: purple !important; background-color: RGB(215,215,215) !important; border-left-color: purple !important} blockquote blockquote blockquote blockquote blockquote {color: teal !important; background-color: RGB(205,205,205) !important; border-left-color: green !important}</style><i><b>Sander van Dijk <a class="moz-txt-link-rfc2396E" href="mailto:sgv...@gm...">&lt;sgv...@gm...&gt;</a></b></i> wrote on 03/02/2011 9:11:10 PM +0350:</span><br>
<blockquote style="color: navy; background-color: rgb(245, 245, 245); padding-left: 15px; border-left: 2px solid rgb(16, 16, 255);" cite="mid:AAN...@ma..." type="cite">Hey guys,
<div><br>
</div>
<div>Another update: currently, with the ODE version I have now put on my page [1] and the current SVN head of rcssserver3d, I am able to run 9 vs 9 in real time on my i7 740QM laptop, with the agents and monitor also running on the same machine. Having said that:</div>
</blockquote>
First, thanks a lot for keeping us informed about your progress. Unfortunately I was a little busy these days and unable to follow you fast enough :)<br>
It's great to hear about the results so far, and I'm really happy to hear that the work of Hesham was not wasted.<br>
<br>
<br>
<blockquote style="color: navy; background-color: rgb(245, 245, 245); padding-left: 15px; border-left: 2px solid rgb(16, 16, 255);" cite="mid:AAN...@ma..." type="cite">
<div><br>
</div>
<div>- This is when the number of collisions is not too high. Because of the extra constraints, and the fact that collisions result in the agents involved being put into one island, collisions are even worse in the multithreaded version. The select-and-move functionality of RoboViz comes in quite handy here, but it's still a pain.</div>
</blockquote>
Is this a considerable issue even with the current automatic referee? Are all agents able to collide with each other even with this referee?<br>
<br>
<blockquote style="color: navy; background-color: rgb(245, 245, 245); padding-left: 15px; border-left: 2px solid rgb(16, 16, 255);" cite="mid:AAN...@ma..." type="cite">
<div>- Even worse: with many collisions ODE crashes. This seems to be a memory issue, probably because ODE allocates everything on the stack and runs out of space there. Indeed, configuring ODE with -DdUSE_MALLOC_FOR_ALLOCA seems to prevent this issue by using the heap instead. However, using the heap frequently is suboptimal for multithreading, so it might be worth looking into TBB's options to increase the thread stack size.</div>
</blockquote>
We might be able to use a custom memory allocator too, so that we can allocate a big block of memory from the heap and hand it to ODE in an efficient manner. However, I don't know how ODE uses memory, so my suggestion might not be very practical.<br>
<br>
<blockquote style="color: navy; background-color: rgb(245, 245, 245); padding-left: 15px; border-left: 2px solid rgb(16, 16, 255);" cite="mid:AAN...@ma..." type="cite">
<div>- Multithreading in Simspark is still sometimes unstable with 18 agents, seemingly on the communication/predicate generation/parsing side. Looking into that (after the rather enjoyable 18-robot wrestling match currently going on on my screen ;)</div>
</blockquote>
About the communication part, I'm going to take this issue very seriously soon.<br>
<br>
Thanks,<br>
Hedayat<br>
</body>
</html> |
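As a rough illustration of the custom allocator idea in the message above (one big heap block, carved up cheaply and released in one go), here is a minimal bump-allocator sketch. StepArena and its methods are hypothetical names; how such an arena would actually be wired into ODE's allocation is not shown, and, as Hedayat notes, it depends on how ODE uses memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not ODE code: one large heap block that is carved
// up by bumping an offset and released all at once after each step.
class StepArena
{
public:
    explicit StepArena(std::size_t bytes) : mBuffer(bytes), mUsed(0) {}

    // Return `size` bytes aligned to `align` (a power of two), or nullptr
    // if the arena is exhausted so the caller can fall back to malloc.
    void* Allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t))
    {
        const std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(mBuffer.data());
        const std::uintptr_t current = base + mUsed;
        // Round the current address up to the next multiple of `align`.
        const std::uintptr_t aligned =
            (current + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t offset = static_cast<std::size_t>(aligned - base);
        if (offset > mBuffer.size() || size > mBuffer.size() - offset)
            return nullptr;
        mUsed = offset + size;
        return mBuffer.data() + offset;
    }

    // Forget all allocations, e.g. at the end of a physics step.
    void Reset() { mUsed = 0; }

    // Bytes consumed so far, including alignment padding.
    std::size_t Used() const { return mUsed; }

private:
    std::vector<unsigned char> mBuffer;
    std::size_t mUsed;
};
```

The arena is not thread-safe as written. In a multithreaded stepper each worker thread would need its own arena (or the offset would have to be updated atomically), which is the same contention concern that makes plain heap allocation unattractive here.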