From: Hanna L. <ha...@us...> - 2002-09-20 18:04:03
|
LSE Con Call Minutes Sept 20, 2002

I. Peter Wong - rmap performance testing

rmap consumes a lot of memory and many PTEs. Using read (as compared to readv) without rmap on 2.5.25 was the only way he was able to finish the runs. Martin said he is running into memory issues; they know the problem and will have the fixes soon. Martin told him to try using threads instead of processes, or fewer tasks in general, which might help Peter's runs. There are optimizations in 2.5.37 that will make rmap take up almost half the space. Badari asked if he was not able to get any throughput runs yet (even without rmap)? Peter said that is true.

II. Duc Vianney - Hyperthreading performance

He is using multithreaded benchmarks. Did not see a big difference between hyperthreading and no hyperthreading, except in the AIM 9 sync disk test, where there was a 38% degradation. Unfortunately the hardware was taken away so he cannot rerun that test right now. Eventually he will get it back and should be able to test it again. Martin asked them to try to rerun the AIM 9 test since that is the only one showing interesting results and it should be fixed.

III. Dave McCracken - rmap performance testing (rollup patch backported to 2.5.26)

He wanted to enable testing of base rmap against non-rmap for performance testing. 2.5.26 was the release before rmap went in. He took the original patch that got merged plus some of the basic optimizations against 2.5.26. The performance team tested it and in general showed it was not a significant cost. Some of the tests they ran didn't stress the areas (fork, exec) where it might have a cost, but akpm is testing fork and exec. However, when they started testing with volanomark (paging) it showed some performance hits. Martin said it wasn't what they were expecting, since they thought the static overhead would be the biggest cost and that rmap would help most with paging. But that wasn't the case, at least with this 2.5.26 (-mm1 was also slower), so something is still broken there.

Dave reminded everyone there are other changes in 2.5 besides rmap. Dave thinks we will just have to live with the instabilities, keep testing as changes come out, and try to abstract out which parts are memory related and which ones are not. Badari reminded everyone that 2.5.32 and later will bounce unless you apply akpm's scsi_hack. It is a one-line fix, but since it is a hack akpm is not pushing it to Linus.

IV. Martin Bligh - rmap status

Covered most already, but we know there are performance issues. Linus has merged the Intel large page support patch. Going to try to replace the interface to make it more standard and easier to use. All of it should be done before Halloween. Bill Hartner asked where Intel has the interface documented. (Laughter, regarding his optimism.) It would be nice to see it documented though. Bill Hartner has a question about large memory reducing TLB misses. Martin: large pages increase TLB coverage by a factor of 1000. How measurable is the reduction in TLB misses? Martin: mostly shown by faster processes and memory interfaces; the main win is in not stalling the CPU. Bill was thinking about some performance counters to measure stalls. Andi Kleen was having horrendous TLB misses using gcc and something?

V. Bill Hartner

Rik asked for 2.4.19 measurements, baseline with the O(1) scheduler. Noticed that a 4 GB system, which should not swap, does swap with the 2.4 rmap patch; even with 1/2 gig of free memory it is swapping. Martin said to definitely mail that out. Bill is going to rerun with 2 GB swap. Might not have those out until Monday.
|
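A rough illustration of the TLB coverage arithmetic from item IV: coverage is the number of TLB entries times the page size, so going from 4 KB pages to 4 MB large pages multiplies it by 1024, roughly the "factor of 1000" mentioned. The numbers below are assumptions for illustration and not figures from the call: the 64-entry TLB is hypothetical, the 4 MB large page is the non-PAE i386 size (2 MB with PAE), and real CPUs often have a separate, smaller TLB for large pages.

```python
# Sketch only: TLB entry count and page sizes are assumed values,
# not measurements or figures quoted on the call.
KIB = 1024
MIB = 1024 * KIB


def tlb_coverage(entries: int, page_size: int) -> int:
    """Bytes of address space that can be mapped without a TLB miss."""
    return entries * page_size


ENTRIES = 64            # hypothetical TLB size
SMALL_PAGE = 4 * KIB    # standard i386 page
LARGE_PAGE = 4 * MIB    # i386 large page without PAE (assumption)

small = tlb_coverage(ENTRIES, SMALL_PAGE)
large = tlb_coverage(ENTRIES, LARGE_PAGE)
ratio = large // small

print(f"4 KB pages: {small // KIB} KiB covered")
print(f"4 MB pages: {large // MIB} MiB covered")
print(f"coverage ratio: {ratio}x")
```

The ratio depends only on the two page sizes, so the assumed entry count cancels out as long as both page sizes share the same number of entries.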
From: William L. I. I. <wl...@ho...> - 2002-09-20 23:15:35
|
On Fri, Sep 20, 2002 at 11:08:18AM -0700, Hanna Linder wrote:
> IV. Martin Bligh - rmap status
> Covered most already but we know there are performance issues. linus has
> merged the intel large page support patch. going to try to replace
> the interface to make it more standard and easier to use. all of
> it should be done before halloween. bill hartner asked where intel
> has the interface documented. laughter, regarding his optimism.
> it would be nice to see it documented though.

Hubertus and I will be pushing this ASAP. Hopefully you can help get
some numbers and/or early testing in.

On Fri, Sep 20, 2002 at 11:08:18AM -0700, Hanna Linder wrote:
> V. Bill Hartner
> Rik asked for 2.4.19 measurements. baseline with O1 scheduler.
> noticed that even on 4bg system should not swap with 2.4 rmap patch.
> even with 1/2 gig free memory it is swapping. martin said to
> definitely mail that out. Bill going to rerun with 2bg swap. might
> not have those out until monday.

That's a bit odd. Could it be related to some of the low memory
thresholds?

Cheers,
Bill
|
From: Erich F. <ef...@es...> - 2002-09-22 11:04:18
|
On Friday 20 September 2002 20:08, Hanna Linder wrote:
> LSE Con Call Minutes Sept 20, 2002
>
> II. Duc Vianney - Hyperthreading performance
>
> He is using multithreaded benchmarks. Did not see a big difference between
> hyperthreading and no hyperthreading. except in aim 9 sync disk test
> there was a 38% degredation. unfortunately the hardware was taken
> away so he can not rerun that test right now. eventually he will
> get it back and should be able to test it again. Martin asked them
> to try to rerun the aim 9 test again since that is the only one
> showing interesting results and it should be fixed.

Hmmm, I think one cannot really expect performance improvements from
the HT patch for benchmarks with many threads. HT brings you some
improvements if there are on average less than 2 tasks running per
CPU core (i.e. each sibling gets on average less than 1 task). The aim
of the HT patches is to get the cores equally loaded when having few
(long-running) tasks. With many tasks all runqueues are full and the
scheduler gets the core loads balanced anyway.

Regards,
Erich

PS: The HT patch for sharing RQs (from Jun+Ingo) is not in the mainline
kernel, I can't see it in 2.5.35-37. Duc wrote that it went into
2.5.32, do I misunderstand something or was it just applied separately
for the tests?
|
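A toy model of the point above, as a sketch under stated assumptions and not the Ingo+Jun patch or any real scheduler code: with few tasks, a placement that ignores which logical CPUs are siblings can stack two tasks on one physical core while another core sits idle, whereas with many tasks every core ends up equally loaded either way. The function names and the "fill logical CPUs in index order" worst case are hypothetical choices made for illustration.

```python
# Toy model only: both placement policies are hypothetical and much
# simpler than the real scheduler. Logical CPUs 2k and 2k+1 are assumed
# to be the two HT siblings of physical core k.

def place_naive(n_tasks: int, n_cores: int, siblings: int = 2) -> list[int]:
    """Fill logical CPUs in index order, ignoring sibling topology."""
    load = [0] * n_cores
    for task in range(n_tasks):
        logical_cpu = task % (n_cores * siblings)
        load[logical_cpu // siblings] += 1
    return load


def place_core_aware(n_tasks: int, n_cores: int) -> list[int]:
    """Spread tasks round-robin over physical cores first."""
    load = [0] * n_cores
    for task in range(n_tasks):
        load[task % n_cores] += 1
    return load


def imbalance(load: list[int]) -> int:
    """Difference between the busiest and the idlest physical core."""
    return max(load) - min(load)


for n_tasks in (2, 8):
    naive = place_naive(n_tasks, n_cores=2)
    aware = place_core_aware(n_tasks, n_cores=2)
    print(n_tasks, "tasks: naive", naive, "core-aware", aware)
```

In this model the two policies differ only when there are fewer runnable tasks than logical CPUs, which is the few-tasks case described above.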
From: Martin J. B. <mb...@ar...> - 2002-09-22 15:06:18
|
> Hmmm, I think one cannot really expect performance improvements from
> the HT patch for benchmarks with many threads. HT brings you some
> improvements if there are in average less than 2 tasks running per
> CPU core (i.e. each sibling gets in average less than 1 task).

Errm, if it can't get perf improvements multithreading, that makes it
rather pointless doesn't it? Isn't that the whole point of SMT in the
first place? Perhaps I'm just misunderstanding you.

> The aim
> of the HT patches is to get the cores equally loaded when having few
> (longrunning) tasks. With many tasks all runqueues are full and the
> scheduler gets the core loads balanced anyway.

Sure, the sched changes seem very sane. There's two things I think we
should gain from the sched changes -

1. balance out across cpus when you have few tasks (what you mention), and
2. try not to keep tasks local to one pair of evil twins when you have
   many tasks.

M.
|
From: Erich F. <ef...@es...> - 2002-09-22 20:05:12
|
On Sunday 22 September 2002 17:04, Martin J. Bligh wrote:
> > Hmmm, I think one cannot really expect performance improvements from
> > the HT patch for benchmarks with many threads. HT brings you some
> > improvements if there are in average less than 2 tasks running per
> > CPU core (i.e. each sibling gets in average less than 1 task).
>
> Errm, if it can't get perf improvements multithreading, that makes it
> rather pointless doesn't it? Isn't that the whole point of SMT in the
> first place? Perhaps I'm just misunderstanding you.

Aaargh, I had the HT patch from Ingo+Jun in mind but wrote about HT.
Sorry, my mistake. I looked at the patch and wanted to compare HT with
the old support (2.4.19) and HT with the shared runqueue. I don't
expect advantages for the new HT patch when testing with many threads,
it should be superior with few threads.

Erich
|