From: Rick L. <ric...@us...> - 2001-11-16 22:22:45
I wrote:

> why I asked the question. Doing e) almost certainly puts it into the
> 2.5 timespace, but not 100% certainly, I suppose. Before I dig too
> deep into some test patches I thought I'd test the waters among the
> folks here in LSE.

Matthew replied:

> Er, hello? 2.4 is supposed to be being STABILISED, not being
> REWRITTEN. If more people were concerned with this, we might be
> almost finished with 2.5 by now.

I think we're on the same side here. What I was saying is that choice
"e" (a rewrite) almost certainly puts us into a 2.5 timeframe, while
acknowledging that it might be possible for one of us to have an
insight that redesigns it yet makes it obviously simple, consummately
safe, and highly desirable to put into 2.4.

For the record, if a rewrite is needed, I'm not holding my breath for
a 2.4-friendly epiphany, and I'm actually hoping to find something
that can safely go into 2.4. A 2.5-only solution, while better than
none, will not help anybody in the near future. And I'm perfectly
willing to acknowledge that 2.4 and 2.5 solutions might look extremely
different, given that 2.4 is trying to stabilize.

In the meantime, I've gotten some useful information back (thanks all)
and Matthew, it sounds like you'd be a good person to consult with on
this (if you don't mind).

Rick
From: Gerrit H. <ge...@us...> - 2001-11-17 00:36:04
We are primarily using benchmarks to highlight kernel internal
bottlenecks. However, in many cases, we have to compare the overall
benchmark results with control data to show that we are getting
reasonable results on comparable configurations. So, short answer,
yes, the overall results are interesting.

This is being done through the Austin LTC Performance Team. Rick
Lindsley (ne...@be...) is helping them with the current bottleneck;
Andrew Theurer is actually running the benchmark. This testing is
happening on IA32 4-proc machines. I'm not sure if it is being done on
8-proc machines as yet.

The ultimate goal is not the best possible number in the world, but
that would be a nice side effect. ;) Also, we would take application
performance patches as well in most cases. The goal is to put the most
pressure on the machine that we can, and fix the bottlenecks that show
up.

gerrit

> Hi Gerrit,
>
> > Good input, Anton - thanks. We'll see if we can recompile and
> > test that way. Of course, that will move the bottleneck somewhere
> > else, but that is fine. <grin>
>
> I forgot to mention, if this isn't sparc, ppc, intel or mips then you
> will need to write a small spinlock stub in source/tdb/spinlock.c.
>
> I'd be interested in the benchmark results you come up with. Are you
> interested only in kernel performance under load or the benchmark
> result as well? I have some old samba performance patches I never
> merged that I can pass on.
>
> Anton
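For concreteness, here is a minimal sketch of the kind of test-and-set
stub Anton describes. The function names, the type, and the use of
GCC-style atomic builtins are my own illustrative assumptions, not the
actual tdb interface, which uses per-architecture inline assembly:

    #include <sched.h>

    typedef volatile int tdb_spinlock_t;   /* hypothetical type name */

    static void tdb_spin_lock(tdb_spinlock_t *lock)
    {
        /* Atomically set *lock to 1 and test the old value;
         * keep spinning while another process still holds it. */
        while (__sync_lock_test_and_set(lock, 1)) {
            while (*lock)
                sched_yield();  /* back off instead of burning the CPU */
        }
    }

    static void tdb_spin_unlock(tdb_spinlock_t *lock)
    {
        __sync_lock_release(lock);  /* store 0 with release semantics */
    }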
From: Rick L. <ric...@us...> - 2001-11-17 01:57:51
> I'd be interested in the benchmark results you come up with. Are you
> interested only in kernel performance under load or the benchmark
> result as well? I have some old samba performance patches I never
> merged that I can pass on.

Thanks, Anton. As Gerrit mentioned, we've got a performance group
running this and I'm doing analysis (and proposing patches :) as a
result of their data. Before we began this discussion, we agreed to
try it without the fcntl locks to see what difference we get. I'd
expect to see the BKL drop way off, but have no idea what to expect
for overall performance.

Matthew, it sounds like you are tied into the 2.5 effort as well.
Could I discuss with you, offlist perhaps, what is planned there?
While I probably can't contribute to that over the long haul, I'd like
to at least understand some of what is being proposed there.

And lastly, we started to diverge into a discussion of "real-world"
benchmarks, and I'd like to add: everybody has a different reality,
and thus (as the linux-kernel mailing list, among others, has proven)
it's nearly impossible to come to consensus on a "real-world"
benchmark. That's one reason I tried to downplay the benchmark and
emphasize the use of the BKL here. I think everybody agrees (and did
agree in this discussion) that it's used inappropriately here. The
discussion then can turn to whether to replace it and what to replace
it with, both short term and long term.

<philosophy>
If you work for a company, "real-world" is what your customers want
(for whatever irrational reasons). If you're free-lancing with Linux
then I envy you!, and "real-world" is whatever you've concluded is
important (for whatever irrational reasons :) After spending months
trying to define "real-world" I've realized that I could have spent
that time better simply fixing bugs, and am content to let others
define the term.
</philosophy>

Rick
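For reference, the fcntl locking in question is the standard POSIX
byte-range lock shown below; in 2.4, every such call funnels through
posix_lock_file() under the BKL. This is a generic sketch, not Samba's
actual tdb code:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Take an exclusive write lock on [start, start+len) of fd.
     * Each call enters the kernel and, in 2.4, contends on
     * kernel_flag via posix_lock_file(). */
    static int lock_record(int fd, off_t start, off_t len)
    {
        struct flock fl;

        fl.l_type   = F_WRLCK;   /* exclusive lock */
        fl.l_whence = SEEK_SET;
        fl.l_start  = start;
        fl.l_len    = len;
        return fcntl(fd, F_SETLKW, &fl);  /* block until granted */
    }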
From: Rick L. <ric...@us...> - 2001-11-17 02:18:10
> This is being done through the Austin LTC Performance Team. Rick
> Lindsley (ne...@be...) is helping them with the current bottleneck;
> Andrew Theurer is actually running the benchmark.

Please note that this address is incorrect for anybody outside of IBM.
In order to keep my blood pressure down, let's just summarize by
saying the email system here is complex. The best external address for
me is ric...@us...; second best is ri...@ea....

Rick
From: Rick L. <ri...@ea...> - 2001-11-17 08:34:32
> The real solution to your problem is not to rearchitect the locking
> system but to compile samba with --spinlocks, which replaces the
> fcntl locks with userspace spinlocks.

> And I believe (please correct me if I'm wrong) that this would not
> benefit from a per-inode spinlock rather than a global spinlock,
> since all the locking is on a single file.

Correct. For this reason, in parallel with the discussion going on
here, I'd asked our tester for more details about the contention they
were seeing. If contention on kernel_flag is because of multiple
processes banging on a single file rather than on multiple files, then
even the per-inode lock would not help.

So the three tracks I have going in parallel are:

  * lse-tech: what issues are there with changing the inode structure?
  * benchmark: describe better the contention you are seeing
  * benchmark: can you lose the fcntl locks and solve your own problem?

Despite the concern expressed by some, I don't have any patches yet,
and won't until I feel I have a strong solution. I am trying to
discern the nature of the problem first, while also collecting
information on possible solutions. Call it "speculative execution" :)

Rick
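To make that point concrete: even with one lock per file, a workload
that hammers a single file funnels every process through the same
lock, so it behaves exactly like a global lock. A small userspace
sketch of the effect, with illustrative names (not proposed kernel
code):

    #include <pthread.h>

    #define NFILES 1024
    static pthread_mutex_t file_lock[NFILES];  /* one lock per "inode" */

    static void locks_init(void)
    {
        int i;
        for (i = 0; i < NFILES; i++)
            pthread_mutex_init(&file_lock[i], NULL);
    }

    /* With many distinct files, threads take different locks and can
     * proceed in parallel.  With one hot file, every thread hashes to
     * the same lock, and per-file locking degenerates to a global
     * lock. */
    static void lock_record(unsigned inode_nr)
    {
        pthread_mutex_lock(&file_lock[inode_nr % NFILES]);
    }

    static void unlock_record(unsigned inode_nr)
    {
        pthread_mutex_unlock(&file_lock[inode_nr % NFILES]);
    }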
From: Paul M. <Pau...@us...> - 2001-11-17 21:26:58
> On Fri, Nov 16, 2001 at 03:45:11PM -0800, Gerrit Huizenga wrote:
> >
> > Aha, the clusters argument. Good for some solutions, terrible for
> > most real-world databases. You can argue for them but over the past
>
> You're talking about traditional clusters, not ccClusters. See
> Larry McVoy's papers on the subject.

I found the following:

  http://www.bitmover.com/talks/llnl/slide01.html

and friends. Could you please send pointers to any others you might
have?

Thanx,

Paul
From: Matthew W. <wi...@de...> - 2001-11-18 00:05:19
On Sat, Nov 17, 2001 at 01:24:37PM -0800, Paul McKenney wrote:
> > On Fri, Nov 16, 2001 at 03:45:11PM -0800, Gerrit Huizenga wrote:
> > >
> > > Aha, the clusters argument. Good for some solutions, terrible for
> > > most real-world databases. You can argue for them but over the past
> >
> > You're talking about traditional clusters, not ccClusters. See
> > Larry McVoy's papers on the subject.
>
> I found the following:
>
>   http://www.bitmover.com/talks/llnl/slide01.html
>
> and friends.
>
> Could you please send pointers to any others you might have?

http://www.bitmover.com/talks/cliq/slide01.html is a more recent talk.

http://www.uwsg.indiana.edu/hypermail/linux/kernel/0001.2/1172.html is
a discussion on l-k which references some more stuff.

http://www.uwsg.indiana.edu/hypermail/linux/kernel/0001.3/0236.html
seems to be a continuation of the previous discussion.

http://www.uwsg.indiana.edu/hypermail/linux/kernel/0007.3/1222.html is
Linus seeming to indicate support for ccCluster.

Anyone else have any good links / bits of documentation?

--
Revolutions do not require corporate support.
From: Paul M. <Pau...@us...> - 2001-11-19 19:31:52
> On Sat, Nov 17, 2001 at 01:24:37PM -0800, Paul McKenney wrote:
> > > On Fri, Nov 16, 2001 at 03:45:11PM -0800, Gerrit Huizenga wrote:
> > > >
> > > > Aha, the clusters argument. Good for some solutions, terrible
> > > > for most real-world databases. You can argue for them but over
> > > > the past
> > >
> > > You're talking about traditional clusters, not ccClusters. See
> > > Larry McVoy's papers on the subject.
> >
> > I found the following:
> >
> >   http://www.bitmover.com/talks/llnl/slide01.html
> >
> > and friends.
> >
> > Could you please send pointers to any others you might have?

Thank you for the pointers!

> http://www.bitmover.com/talks/cliq/slide01.html is a more recent talk.

OK, about 1.5 years old, so a bit more recent.

> http://www.uwsg.indiana.edu/hypermail/linux/kernel/0001.2/1172.html is
> a discussion on l-k which references some more stuff.
>
> http://www.uwsg.indiana.edu/hypermail/linux/kernel/0001.3/0236.html seems
> to be a continuation of the previous discussion.

IMHO, the devil will be in the details on this -- keep in mind that
normal SMP locking looks quite simple if you don't look too closely.
It would be very interesting to see a prototype, particularly of the
various networking protocol stacks.

> http://www.uwsg.indiana.edu/hypermail/linux/kernel/0007.3/1222.html is Linus
> seeming to indicate support for ccCluster.

I interpreted this as Linus saying "let's try SMP, NUMA, and
ccClusters and use whatever works best in whatever area that it works
best". Other views from people with more skill at interpreting Linus?

Thanx,

Paul

> Anyone else have any good links / bits of documentation?
From: Andrew T. <hab...@us...> - 2001-11-20 15:33:09
OK, I ran NetBench on ext3 with tdb spinlocks, and Anton was correct:
this reduced contention on kernel_flag from posix_lock_file(). With
fcntl, posix_lock_file() was responsible for 4.9% spin on kernel_flag;
with user-space spinlocks, it dropped to 1.2%. Note, however, that
total spin time dropped only from 16.8% to 14.8%, and throughput
increased just slightly, from 628 Mbps to 631 Mbps. I also did not
observe any significant difference in idle time.

Andrew Theurer