From: Dave O. <dm...@os...> - 2002-08-06 21:12:13
This is a summary of where I think we are with the DBT benchmark work on an 8-way system. I'd appreciate any suggestions on other approaches, or on ways to get interesting results more quickly.

Our goal isn't necessarily to produce high DBT numbers (large numbers of DBT transactions per second, for example). Instead, it's to construct an environment where the kernel can be studied doing "interesting" things. Things of interest would include system panics, or kernel algorithms that aren't scaling well.

I'm looking at some of the results we've been getting on an 8-processor system running the SAP database. So far, the results are not as interesting as we'd like, mostly because the database we're running is too small. I think SAP is basically loading the database into its cache. After that, the system runs pretty much entirely in user mode, with less than 5% kernel time. We haven't profiled this environment yet. That might be interesting, but I don't THINK it's the priority right now.

We're trying to maintain a stable environment while we learn to grow the benchmark, so in the short term we're running 2.4.18. We've learned about some limitations in the client simulation software that cap it at a maximum of 800 users. Eventually, we probably need to go back and fix that software, but in the meantime we're adding more client machines to simulate more users. There are other configuration issues, such as the number of database connections, that we're learning to adjust.

Once we reach a number of users that keeps pretty much all of our CPU time busy, we intend to grow the database. I would expect this to exercise the kernel more, doing I/O and paging; that is when results should get more interesting. At that point, I think the 2.4 kernels are less interesting than 2.5. I'd like us to begin doing benchmark runs with the latest 2.5.30 kernel, and then start adding patches of interest to lse-tech or others on lkml. We could initially produce sar statistics.
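To make the sar idea concrete, here is a rough sketch of the kind of wrapper I have in mind for one run, assuming the sysstat tools are installed. The "run_benchmark" function is purely a placeholder for whatever script actually drives the DBT client load; the paths and sample counts are illustrative, not what we actually use:

```shell
#!/bin/sh
# Hypothetical wrapper for one benchmark run, assuming sysstat's sar is
# installed.  "run_benchmark" stands in for the real DBT/SAP driver; here
# it is just a placeholder sleep.

run_benchmark() { sleep 1; }   # replace with the real client driver

OUT=/var/tmp/dbt-run-$(date +%Y%m%d-%H%M)
mkdir -p "$OUT"

# Sample system activity every 30 seconds, saving binary records that can
# be post-processed after the run (480 samples covers about 4 hours).
sar -o "$OUT/sa.data" 30 480 >/dev/null 2>&1 &
SAR_PID=$!

run_benchmark

kill $SAR_PID 2>/dev/null

# The same data file yields several views without redoing the half-day run:
sar -u -f "$OUT/sa.data" > "$OUT/cpu.txt"    # CPU utilization
sar -b -f "$OUT/sa.data" > "$OUT/io.txt"     # I/O and transfer rates
```

The point of saving the binary file (-o) rather than just the text report is that we can go back and pull different statistics out of an old run instead of repeating it.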
But I think doing kernel profile and lockmeter runs on these would be valuable as well. I know there is a cost to doing kernel profiling and lockmeter data collection. Does it make sense to COMBINE lockmeter and profiling in one kernel? I suspect it doesn't. Likewise, how much does profiling or lockmeter data collection corrupt the sar data? Since it takes about half a day to do one benchmark run, it would be good if we could run a profiled kernel right away and still be able to trust the sar data.

There's also a kernel profiling patch from SGI that does more than the built-in profiling: it recompiles the kernel to do mcount data collection, and so on. This is more costly than the native profiling. How much value is there in this extra data?

We also have requests in to the SAP mailing list asking for more information about how SAP works. It's obviously caching database information. We'd like to know how it manages its cache, and what limitations there are on its caching implementation. We're also looking into how it stripes its I/O.

Let me know what you think!

Dave
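P.S. For reference, here is a rough sketch of the built-in profiling flow mentioned above, assuming the kernel was booted with "profile=2" on the command line and that readprofile(8) and a matching System.map are available. As before, "run_benchmark" is only a placeholder for the real DBT driver:

```shell
#!/bin/sh
# Sketch of one profiled run using the kernel's built-in tick profiler.
# Assumes boot parameter "profile=2" and a System.map matching the
# running kernel; adjust the path for your setup.

run_benchmark() { sleep 1; }   # placeholder for the real client driver

readprofile -r                 # zero the profiling counters before the run
run_benchmark
# Top 20 kernel hot spots by tick count, for comparison across patches:
readprofile -m /boot/System.map | sort -nr | head -20 > profile.txt
```

Resetting the counters just before the run (-r) keeps boot-time activity from polluting the numbers, which matters if we want to compare hot-spot lists between 2.5.30 and patched kernels.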