Re: [opendemo-devel] Perfomance.
From: Dr. U. G. <Uwe...@ph...> - 2002-01-07 07:22:00
Hello!

> > ... Please perform another profiler run first.
> Gotcha, done them.
> I've had to redo the profiling with a special demo for this (my demo i've
> run previous time was lost), the demo spans for 20 seconds and is uploaded
> to cvs (1 megabyte) with a config and windows bat file, so if you set up the
> linux profiling, use the config which starts the demo, they're still not
> perfect, because after the demo ends server doesn't quit automatically and
> library isn't unloaded, thus making stale calls to vmmain and affecting the
> stats a bit.

Two problems here:

I tried the Linux profiler gprof (together with the GCC compiler option -pg),
but this gave me no result file. I suspect that in such binaries a special
initialization function is called before main(), but we link only a shared
library, so this function call is missing. I also tried the environment
variable LD_PROFILE, which was made exactly for the purpose of profiling a
shared library, but the profiler included in ld.so had problems loading the
NVIDIA OpenGL driver (I did not try to profile this library, but it is somehow
involved too, at least for time measurements). So I need to find the init
function, call it myself in vmMain(), and then try -pg/gprof again.

> From what I can see, the optimizations are clearly visible, even without
> profiling my fps has gone better.

This is good news.

>     Func       Func+Child             Hit
>     Time     %       Time     %     Count  Function
> ---------------------------------------------------------------------------
>  852,447  15,8    867,984  16,1     40371  _odpStartElement (odp_parse.obj)

This comes from the object memcopy. Apart from _odpUpdateEntity and
_xmlParseName, the following functions all show only symptoms of the same
problem: the linear buffer, which is a direct copy of a part of the file. As
Conor already mentioned, a ring buffer would be much better here. I'll think
about it (I have already coded several ring buffers in previous projects).
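A minimal sketch of the ring-buffer idea mentioned above, in C. This is not
the opendemo code: the names rb_t, rb_init, rb_put and rb_get and the fixed
capacity are assumptions made for illustration only. The point it shows is
that space freed by reading is reused in place, so the buffer never has to be
grown or have its contents moved.

```c
#include <assert.h>
#include <stddef.h>

#define RB_CAPACITY 8  /* hypothetical size, kept tiny for illustration */

/* Hypothetical ring buffer; not part of the opendemo sources. */
typedef struct {
    unsigned char data[RB_CAPACITY];
    size_t head;   /* index of the next byte to read   */
    size_t count;  /* number of bytes currently stored */
} rb_t;

void rb_init(rb_t *rb)
{
    assert(rb != NULL);
    rb->head = 0;
    rb->count = 0;
}

/* Append up to n bytes from src; returns how many were actually stored. */
size_t rb_put(rb_t *rb, const unsigned char *src, size_t n)
{
    size_t i;
    size_t room;

    assert(rb != NULL && src != NULL);
    room = RB_CAPACITY - rb->count;
    if (n > room)
        n = room;
    for (i = 0; i < n; i++) {
        /* The write position wraps around the end of the array. */
        size_t tail = (rb->head + rb->count) % RB_CAPACITY;
        rb->data[tail] = src[i];
        rb->count++;
    }
    return n;
}

/* Read one byte; returns 0..255, or -1 if the buffer is empty. */
int rb_get(rb_t *rb)
{
    int c;

    assert(rb != NULL);
    if (rb->count == 0)
        return -1;
    c = rb->data[rb->head];
    rb->head = (rb->head + 1) % RB_CAPACITY;
    rb->count--;
    return c;
}
```

In a file reader built this way, a refill step would call rb_put() with
freshly read data whenever the buffer runs low, while the parser pulls
characters with rb_get().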
>  706,381  13,1   1925,211  35,7   2791160  _fbGrow (od_filebuf.obj)
>  602,798  11,2    602,798  11,2   1457110  _trap_FS_Read (g_syscalls.obj)
>  470,155   8,7   2291,271  42,5   2037568  _fbReadChar (od_filebuf.obj)
>  441,877   8,2   1044,675  19,4   1457110  _odFread (od_q3a_fileio.obj)
>  244,663   4,5    394,574   7,3    404480  _odpUpdateEntity (odp_main.obj)
>  224,653   4,2    224,653   4,2   1768386  _mbResize (od_membuf.obj)

The C library realloc() is out of reach for profiling, so we get the same
result for Func and Func+Child.

>  198,653   3,7   1667,464  30,9    110023  _xmlParseName (od_xml_read.obj)
>  160,243   3,0    241,820   4,5    643569  _fbSkip (od_filebuf.obj)

A ring buffer would simply remove this bottleneck of constantly calling
fbGrow(), mbResize() and memcpy(), but we have 21 occurrences of
mbGetBuffer(), which are not so easily converted to a ring buffer. Before
calling mbGetBuffer() we need to know how many bytes should be available.
Maybe this is already ensured by the usual fbGrow() call before mbGetBuffer(),
but I'm not really sure about all 21 cases.

The second problem is the new files in the main directory! Eugene, please
move them into a newly created profiling directory (or test/profile, or
test/profile/data and test/profile/bin). The main directory is really not the
place for this kind of file.

Bye, Uwe
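On the mbGetBuffer() point above: a caller that wants a plain pointer needs
the requested bytes to be contiguous, which a ring buffer cannot guarantee
at its wrap-around point. The following is a minimal sketch of one way to
handle that, assuming the caller states the byte count up front. The names
ring_t, ring_peek and ring_consume are hypothetical and are not the opendemo
API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8  /* hypothetical size, kept tiny for illustration */

/* Hypothetical ring buffer; not part of the opendemo sources. */
typedef struct {
    unsigned char data[RING_SIZE];
    size_t head;   /* index of the oldest stored byte */
    size_t count;  /* number of bytes currently stored */
} ring_t;

/*
 * Return a pointer to the next n stored bytes as one contiguous block,
 * without consuming them, or NULL if fewer than n bytes are stored.
 * scratch must have room for n bytes; it is used only when the
 * requested range wraps past the end of the array.
 */
const unsigned char *ring_peek(const ring_t *r, size_t n,
                               unsigned char *scratch)
{
    size_t first;

    assert(r != NULL && scratch != NULL);
    if (n > r->count)
        return NULL;
    first = RING_SIZE - r->head;        /* bytes before the wrap point */
    if (n <= first)
        return r->data + r->head;       /* already contiguous, no copy */
    memcpy(scratch, r->data + r->head, first);
    memcpy(scratch + first, r->data, n - first);
    return scratch;
}

/* Drop the n oldest bytes after the caller has processed them. */
void ring_consume(ring_t *r, size_t n)
{
    assert(r != NULL && n <= r->count);
    r->head = (r->head + n) % RING_SIZE;
    r->count -= n;
}
```

With this shape, a copy happens only for requests that straddle the wrap
point. Each call site still has to pass the number of bytes it needs, which
is exactly the information that would have to be checked at every
mbGetBuffer() occurrence.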