From: Andrew M. T. <ath...@au...> - 2001-05-09 16:29:10
|
Hello,

I am evaluating Linux 2.4 SMP scalability, using Netbench(r) as a workload with Samba, and I wanted to get some feedback on results so far. I would appreciate comments and any suggestions for improving scalability on this workload.

The environment consists of an Intel Profusion based SMP with 8 x 700 MHz Xeon, 1 MB L2, 14+ GB RAM, ServeRAID, 8 Intel ethernet cards (IBM Netfinity 8500R). There are 16 clients (500 MHz PII, 128 MB) running Windows NT. I tested uniprocessor, 2-way, and 4-way SMP configurations. Future plans include testing 8-way performance when more test clients are available.

Netbench(r) 7.0.1 was used with the enterprise disk suite test. The test was modified to use 2 engines per client, and the range of test clients was changed from 1-60 to 8-16 (for 2P & 4P) and 4-12 (for uniprocessor).

My initial results for linux 2.4.0, ext2 are as follows (Mbps):

# Eng   [UP]   [2P]   [4P]
  08     149
  12     199
  16     227    236    260
  20     193    272    317
  24     223    283    369
  28            285    396
  32            285    405

Same test, but with IRQ to processor affinity for 2P & 4P on the 8 ethernet cards (Mbps):

# Eng   [2P]   [4P]
  16     231    259
  20     278    297
  24     293    320
  28     297    365
  32     299    399*

* Still investigating; we had some CPU idle time on the 4P/32 engines run, but not on the test configuration without IRQ affinity.

And for linux 2.4.3 with reiserfs (Mbps):

# Eng   [UP]   [2P]   [4P]
  08     130
  12     190
  16     203    210    231
  20     190    235    279
  24     200    249    319
  28            239    360
  32            251    335

Same, but with IRQ affinity for 2P & 4P on the 8 ethernet cards (Mbps):

# Eng   [2P]   [4P]
  16     224    236
  20     220    308
  24     252    331
  28     269    375
  32     267    382

--All results in Mbps, using Netbench(r) 7.0.1 and Samba 2.0.7
--Netbench(r) is available at http://www.netbench.com

I would like to help improve SMP scalability on this workload. If you have questions or comments about the above results, or if you are conducting similar tests, please send email to lse...@li.... I have some ideas on my next steps, but would like to discuss first.

Regards,
Andrew Theurer
|
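One way to read the tables above is to compare the peak throughput of each configuration against the uniprocessor peak. The following is a minimal sketch of that arithmetic, using only the ext2 / 2.4.0 numbers without IRQ affinity; the names `RESULTS_EXT2`, `CPUS`, and `scaling_summary` are made up for this illustration and are not part of any benchmark tooling.

```python
# Throughput in Mbps keyed by number of engines, copied from the
# linux 2.4.0 / ext2 table (no IRQ affinity) in the message above.
RESULTS_EXT2 = {
    "UP": {8: 149, 12: 199, 16: 227, 20: 193, 24: 223},
    "2P": {16: 236, 20: 272, 24: 283, 28: 285, 32: 285},
    "4P": {16: 260, 20: 317, 24: 369, 28: 396, 32: 405},
}

# Number of processors per configuration label (illustrative helper).
CPUS = {"UP": 1, "2P": 2, "4P": 4}


def scaling_summary(results, baseline="UP"):
    """Return {config: (peak_mbps, speedup, per_cpu_efficiency)}.

    Speedup is the peak throughput divided by the baseline's peak;
    efficiency is speedup divided by the processor count.
    """
    base_peak = max(results[baseline].values())
    summary = {}
    for config, by_engines in results.items():
        peak = max(by_engines.values())
        speedup = peak / base_peak
        summary[config] = (peak, speedup, speedup / CPUS[config])
    return summary


if __name__ == "__main__":
    for config, (peak, speedup, eff) in scaling_summary(RESULTS_EXT2).items():
        print(f"{config}: peak {peak} Mbps, "
              f"speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

On these numbers the 4P peak (405 Mbps) is roughly 1.78 times the UP peak (227 Mbps). Comparing peaks is only one possible choice; comparing at a fixed engine count such as 16 would give different ratios.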
From: Mike K. <mkr...@se...> - 2001-05-09 16:57:15
|
On Wed, May 09, 2001 at 11:29:22AM -0500, Andrew M. Theurer wrote:
>
> I am evaluating Linux 2.4 SMP scalability, using Netbench(r) as a
> workload with Samba, and I wanted to get some feedback on results so
> far.

Do you have any kernel profile or lock contention data?

--
Mike Kravetz mkr...@se...
IBM Linux Technology Center
|
From: Andrew M. T. <ath...@au...> - 2001-05-09 17:30:23
|
I do have kernprof ACG and lockmeter for a 4P run. We saw no significant problems with lockmeter. csum_partial_copy_generic was the highest % in profile, at 4.34%. I'll see if we can get some space on http://lse.sourceforge.net to post the test data.

Andrew Theurer

Mike Kravetz wrote:
>
> On Wed, May 09, 2001 at 11:29:22AM -0500, Andrew M. Theurer wrote:
> >
> > I am evaluating Linux 2.4 SMP scalability, using Netbench(r) as a
> > workload with Samba, and I wanted to get some feedback on results so
> > far.
>
> Do you have any kernel profile or lock contention data?
>
> --
> Mike Kravetz mkr...@se...
> IBM Linux Technology Center
|
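For readers unfamiliar with the top profile entry: csum_partial_copy_generic is, as far as I understand it, the kernel routine that copies a buffer while accumulating the Internet checksum in the same pass. The sketch below illustrates that idea in Python with the RFC 1071 one's-complement sum. It is not the kernel's assembly routine; the function name `csum_partial_copy` and its signature are invented for this example.

```python
def csum_partial_copy(src: bytes, dst: bytearray) -> int:
    """Copy src into dst and return the folded 16-bit one's-complement sum.

    The copy and the checksum share one pass over the data, which is the
    point of combining them. dst must be at least len(src) bytes long.
    """
    total = 0
    n = len(src)
    # Process the data as big-endian 16-bit words.
    for i in range(0, n - 1, 2):
        dst[i] = src[i]
        dst[i + 1] = src[i + 1]
        total += (src[i] << 8) | src[i + 1]
    # An odd trailing byte is padded with a zero low byte.
    if n % 2:
        dst[n - 1] = src[n - 1]
        total += src[n - 1] << 8
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


if __name__ == "__main__":
    data = bytes.fromhex("0001f203f4f5f6f7")
    out = bytearray(len(data))
    folded = csum_partial_copy(data, out)
    # The checksum placed in a header is the complement of the folded sum.
    print(hex(folded), hex(~folded & 0xFFFF))
```

Every byte sent through this path is touched by the CPU once, which is why a copy-and-checksum routine tends to show up in profiles of network-heavy workloads.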
From: Alan C. <al...@lx...> - 2001-05-09 17:35:19
|
> significant problems with lockmeter. csum_partial_copy_generic was the
> highest % in profile, at 4.34%. I'll see if we can get some space on

Are you using Anton's optimisations to samba to use sendfile?

Alan
|
From: Andrew M. T. <ath...@au...> - 2001-05-09 17:43:53
|
Alan Cox wrote:
>
> > significant problems with lockmeter. csum_partial_copy_generic was the
> > highest % in profile, at 4.34%. I'll see if we can get some space on
>
> Are you using Antons optimisations to samba to use sendfile ?
>
> Alan

Not yet. As I understand it, we need a supported NIC to take advantage of the sendfile/zero copy patch. Once we have the HW, we will use it.

Thanks,
Andrew Theurer
|
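The sendfile approach discussed here hands a file to a socket without a read()/write() loop in user space. The sketch below shows the general shape using Python's `socket.sendfile()`, which uses `os.sendfile()` where the platform provides it and otherwise falls back to plain `send()`. It is not Anton's Samba patch; `send_whole_file` and `demo` are hypothetical names, and the local socket pair merely stands in for a client connection.

```python
import os
import socket
import tempfile


def send_whole_file(sock, path):
    """Send the file at path over a connected stream socket.

    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo(payload=b"x" * 4096):
    """Round-trip a small temporary file through a local socket pair."""
    left, right = socket.socketpair()
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.close(fd)
        sent = send_whole_file(left, path)
        left.close()  # signal end-of-stream to the receiver
        received = b""
        while True:
            chunk = right.recv(65536)
            if not chunk:
                break
            received += chunk
        return sent, received
    finally:
        left.close()
        right.close()
        os.unlink(path)


if __name__ == "__main__":
    sent, received = demo()
    print(sent, len(received))
```

Whether the data path is actually copy-free depends on the kernel and, as noted in the message above, on NIC support; the call itself only removes the user-space copy loop.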
From: Chris E. <ch...@sc...> - 2001-05-09 23:35:55
|
On Wed, 9 May 2001, Alan Cox wrote:
> > significant problems with lockmeter. csum_partial_copy_generic was the
> > highest % in profile, at 4.34%. I'll see if we can get some space on
>
> Are you using Antons optimisations to samba to use sendfile ?

And you might like to try 2.4.4 (I saw 2.4.0 and 2.4.3 mentioned). 2.4.4 has the zerocopy TCP stuff (or was it 2.4.3 :)

Also, if the load is not disk limited, you might like to try Mingo's pagecache/timers scalability patches, etc.

Cheers
Chris
|
From: Christoph H. <hc...@ns...> - 2001-05-09 17:35:50
|
On Wed, May 09, 2001 at 12:30:35PM -0500, Andrew M. Theurer wrote:
> I do have kernprof ACG and lockmeter for a 4P run. We saw no
> significant problems with lockmeter. csum_partial_copy_generic was the
> highest % in profile, at 4.34%. I'll see if we can get some space on
> http://lse.sourceforge.net to post the test data.

Maybe you should try Kernel 2.4.4 (with Zerocopy TCP/IP) and Anton's sendfile for samba patch. A copy of the latter was posted to lkml - see http://www.uwsg.indiana.edu/hypermail/linux/kernel/0101.3/0484.html, even if that may be unusable due to HTML crappiness.

Christoph

--
Of course it doesn't work. We've performed a software upgrade.
|
From: Maneesh S. <sma...@se...> - 2001-05-10 04:48:36
|
On Wed, May 09, 2001 at 12:30:35PM -0500, Andrew M. Theurer wrote:
> I do have kernprof ACG and lockmeter for a 4P run. We saw no
> significant problems with lockmeter. csum_partial_copy_generic was the
> highest % in profile, at 4.34%. I'll see if we can get some space on
> http://lse.sourceforge.net to post the test data.
>
> Andrew Theurer
>
> Mike Kravetz wrote:
> >
> > On Wed, May 09, 2001 at 11:29:22AM -0500, Andrew M. Theurer wrote:
> > >
> > > I am evaluating Linux 2.4 SMP scalability, using Netbench(r) as a
> > > workload with Samba, and I wanted to get some feedback on results so
> > > far.
> >
> > Do you have any kernel profile or lock contention data?
> >
> > --
> > Mike Kravetz mkr...@se...
> > IBM Linux Technology Center

Hello Andrew,

If you find "fget" among the high rankers in the kernprof data (say, in the top 10), can you try the scalable FD management patch, which uses the read-copy-update mechanism to protect files_struct?

As of now there are working patches available for the read-copy-update mechanism and FD management at "http://lse.sourceforge.net/locking/rclock.html" as rclock-2.4.2-01.patch and files_struct_rcu-2.4.2-03.patch, but we are working on simpler interfaces.

Also let me know if you need the patches for a different 2.4 kernel version.

Maneesh

--
Maneesh Soni
IBM Linux Technology Center,
IBM India Software Lab, Bangalore.
email: sma...@se...
http://lse.sourceforge.net/locking/rclock.html
|
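The read-copy-update idea mentioned above can be sketched in a few lines: readers look up descriptors without taking a lock, while a writer copies the table, changes the copy, and publishes it with a single reference swap. The toy below is only an illustration of that pattern and not the files_struct_rcu patch. `RcuFdTable` and its methods are invented names, and Python's garbage collector stands in for the grace-period handling that a kernel implementation needs before freeing the old copy.

```python
import threading


class RcuFdTable:
    """Toy descriptor table with lock-free readers and copy-on-update writers."""

    def __init__(self):
        # The published snapshot is never modified in place.
        self._snapshot = {}
        # Writers serialize among themselves; readers never take this lock.
        self._write_lock = threading.Lock()

    def fget(self, fd):
        """Read side: one reference read, then a lookup in that snapshot."""
        return self._snapshot.get(fd)

    def install(self, fd, file_obj):
        """Update side: copy, modify the copy, publish the new snapshot."""
        with self._write_lock:
            new = dict(self._snapshot)
            new[fd] = file_obj
            self._snapshot = new

    def close(self, fd):
        """Remove fd by publishing a snapshot without it; return the old entry."""
        with self._write_lock:
            new = dict(self._snapshot)
            old = new.pop(fd, None)
            self._snapshot = new
            return old


if __name__ == "__main__":
    table = RcuFdTable()
    table.install(3, "socket")
    print(table.fget(3))
    table.close(3)
    print(table.fget(3))
```

The trade-off is that every update pays for a copy, which suits data that is read far more often than it is changed, such as a descriptor table consulted on every system call that takes an fd.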
From: Dipankar S. <dip...@se...> - 2001-05-10 08:39:21
|
Hello Andrew,

You would need to contact one of the administrators of the LSE project for this. You would need a developer id for uploading. You can get all the information from http://sourceforge.net/projects/lse/.

I think it will be very helpful to have the results, including lockmeter and kernprof data, available on lse.sourceforge.net.

Thanks
Dipankar

--
Dipankar Sarma <dip...@se...>
Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

On Wed, May 09, 2001 at 12:30:35PM -0500, Andrew M. Theurer wrote:
> I do have kernprof ACG and lockmeter for a 4P run. We saw no
> significant problems with lockmeter. csum_partial_copy_generic was the
> highest % in profile, at 4.34%. I'll see if we can get some space on
> http://lse.sourceforge.net to post the test data.
>
> Andrew Theurer
>
> Mike Kravetz wrote:
> >
> > On Wed, May 09, 2001 at 11:29:22AM -0500, Andrew M. Theurer wrote:
> > >
> > > I am evaluating Linux 2.4 SMP scalability, using Netbench(r) as a
> > > workload with Samba, and I wanted to get some feedback on results so
> > > far.
> >
> > Do you have any kernel profile or lock contention data?
> >
> > --
> > Mike Kravetz mkr...@se...
> > IBM Linux Technology Center
>
> _______________________________________________
> Lse-tech mailing list
> Lse...@li...
> http://lists.sourceforge.net/lists/listinfo/lse-tech
|
From: David Collier-B. <da...@ca...> - 2001-05-11 15:19:52
|
On Wed, May 09, 2001 at 11:29:22AM -0500, Andrew M. Theurer wrote:
> I am evaluating Linux 2.4 SMP scalability, using Netbench(r) as a
> workload with Samba, and I wanted to get some feedback on results so
> far.

Also consider using Andrew Tridgell's dbench/tbench/smbtorture suite in this process: it is mathematically comparable to NetBench, runs on smaller numbers of load-generating machines, and can give better breakdowns into the disk component, the network component and the on-server component of the available performance.

I also have some results from SPARC Linux: send me email.

--dave

--
David Collier-Brown,           | Always do right. This will gratify
Performance & Engineering Team | some people and astonish the rest.
Americas Customer Engineering  | -- Mark Twain
(905) 415-2849                 | da...@ca...
|
From: Kenichi O. <oku...@dd...> - 2001-05-10 01:27:17
|
>>>>> "AMT" == Andrew M Theurer <ath...@au...> writes:

AMT> I would like to help improve SMP scalability on this workload. If you
AMT> have questions or comments about the above results, or if you are
AMT> conducting similar tests, please send email to
AMT> lse...@li.... I have some ideas on my next steps,
AMT> but would like to discuss first.

Did you check the vmstat results for each benchmark run?

Most of the problems are caused by the kernel. If you look at the vmstat output, more than 80% of the CPU time is spent in the kernel.

It's true that the heavy kernel overhead is due to Samba, which generates lots and lots of requests to the kernel (not only disk I/O, but also a great deal of signal handling, etc.).

So, there are really two things we need to do.

1) Make Linux more scalable. (This sometimes looks like tuning, but it's really bug fixing. So don't ask the performance team to tune. Let them FIX.)

2) Make Samba work with fewer signals. This means: don't call useless system calls, use shared memory more effectively, and divide the Samba source into an OS-dependent part and an OS-independent part, so that you can tune for a specific OS and still have a wide userland, etc.

----
Kenichi Okuyama.
|
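The kernel-time share that vmstat reports can also be estimated directly from two samples of the aggregate "cpu" line in /proc/stat. The sketch below assumes the usual field order (user, nice, system, idle, then iowait, irq, softirq, steal on later kernels) and counts only the "system" column as kernel time; `cpu_times` and `kernel_share` are hypothetical helper names, and the sample strings are made-up values rather than data from these benchmark runs.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts.

    Only the first eight counters (user .. steal) are kept, so that the
    guest counters on newer kernels are not counted twice.
    """
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:9]]


def kernel_share(before, after):
    """Fraction of elapsed CPU ticks spent in the 'system' column."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return deltas[2] / total  # index 2 is the 'system' counter


if __name__ == "__main__":
    # Two made-up samples in the four-field format.
    first = "cpu 100 0 100 800"
    second = "cpu 110 0 180 810"
    print(f"kernel share: {kernel_share(first, second):.0%}")
```

On a live system the two strings would come from reading the first line of /proc/stat before and after the interval of interest.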