Re: [perfmon2] Problem with signaling user code from a PMU interrupt handler.
From: Philip M. <mu...@cs...> - 2008-05-13 08:47:42
Hi folks,

I'd like to keep the option of wrapping Perfmon's spinlocks, especially
those that do irqsave/restore. Why? Because it could allow the
implementation of NMI's on systems that don't support them in hardware.

Phil

On May 13, 2008, at 2:41 AM, Corey J Ashford wrote:

> Hi Stephane,
>
> I tried out changing the wrapper for the PMU interrupt from
> STD_EXCEPTION_PSERIES to MASKABLE_EXCEPTION_PSERIES then ran the
> test which usually crashed within a minute or two, and it ran for 20
> minutes non-stop. Then I tried removing POWER's defines for
> pfm_spin_lock_irqsave (and so on), and that worked as expected - no
> problems.
>
> After seeing all of the complications working around the issue of
> the wrapper used for the PMU interrupt, I'm about 95% convinced that
> we should go with an #ifdef for CONFIG_PERFMON to use the
> MASKABLE_EXCEPTION_PSERIES wrapper, and then remove POWER's
> definitions for the locking macros. Optionally, we could get rid of
> the pfm_spin* macros as well from perfmon, simplifying the code
> (back to what it was originally).
>
> The only downside of this is that if someone configures their kernel
> with perfmon and then uses Oprofile, they will not get any samples
> during interrupt disabled code in the kernel.
>
> Any thoughts?
>
> - Corey
>
> Corey Ashford
> Software Engineer
> IBM Linux Technology Center, Linux Toolchain
> Beaverton, OR
> 503-578-3507
> cja...@us...
>
> per...@li... wrote on 05/12/2008 03:34:16 PM:
>
> > Hi Stephane,
> >
> > Thanks for your response.
> >
> > I was thinking of trying out a kernel without the change, but I have
> > a strong suspicion that we are in the interrupt handler because of
> > the dreaded "soft interrupt disabling" optimization in POWER.
> >
> > I'm going to start looking at the schedule code, but I'm afraid we
> > might be in a bind here, because the interrupt disable call is in
> > common code, rather than in perfmon code where we could change it to
> > a hard disable.
> >
> > I may have to start pushing for the PMU interrupt to use the
> > MASKABLE_EXCEPTION_PSERIES wrapper, perhaps only when
> > CONFIG_PERFMON is true.
> >
> > Regards,
> >
> > - Corey
> >
> > Corey Ashford
> > Software Engineer
> > IBM Linux Technology Center, Linux Toolchain
> > Beaverton, OR
> > 503-578-3507
> > cja...@us...
> >
> > "stephane eranian" <er...@go...> wrote on 05/12/2008 01:58:23 PM:
> >
> > > Corey,
> > >
> > > It looks like you have an interrupt masking issue. You should not
> > > get into the interrupt handler while executing in schedule. Does
> > > this happen when you do get into a resend_irq situation or when
> > > you're not?
> > >
> > > On Sat, May 10, 2008 at 1:55 AM, Corey Ashford <cja...@us...> wrote:
> > > > Hello Stephane,
> > > >
> > > > While trying to test my implementation of resend_irq for POWER,
> > > > I ran into a perfmon2 problem, I think.
> > > >
> > > > In order to increase the number of PMU interrupts I'm getting, I
> > > > decided to start with a PAPI C test case "first.c" and modify it
> > > > so that it records both user and kernel domain, and adds a call
> > > > to PAPI_overflow to set the threshold on PAPI_TOT_CYC to 1
> > > > million so that I'd get about 2000 PMU interrupts per second
> > > > (2 GHz processor) just as a starting point.
> > > >
> > > > The overflow handler passed to PAPI_overflow() does nothing
> > > > other than print a count of overflows received every 1000 counts.
> > > >
> > > > Well, I can run this test case to completion sometimes, but
> > > > fairly often it will hang in the kernel with a stack trace
> > > > similar to this:
> > > >
> > > > 3:mon> c0
> > > > 0:mon> t
> > > > [c00000002e32eef0] c0000000004b18d4 ._spin_lock+0x5c/0x88
> > > > [c00000002e32ef70] c00000000004991c .task_rq_lock+0x68/0xcc
> > > > [c00000002e32f010] c000000000049b48 .try_to_wake_up+0x40/0x1c0
> > > > [c00000002e32f0d0] c000000000062758 .signal_wake_up+0x48/0x74
> > > > [c00000002e32f160] c000000000062a88 .__group_send_sig_info+0xa8/0xcc
> > > > [c00000002e32f200] c0000000000631fc .group_send_sig_info+0x64/0xa0
> > > > [c00000002e32f2b0] c0000000000eca24 .send_sigio+0x124/0x1f0
> > > > [c00000002e32f3f0] c0000000000ecb5c .__kill_fasync+0x6c/0xa0
> > > > [c00000002e32f480] c0000000000ecbec .kill_fasync+0x5c/0x94
> > > > [c00000002e32f520] c00000000024d6c4 .pfm_notify_user+0xd4/0xf0
> > > > [c00000002e32f5a0] c00000000024d850 .pfm_ovfl_notify+0x170/0x198
> > > > [c00000002e32f640] c00000000024b314 .pfm_interrupt_handler+0xbec/0xf9c
> > > > [c00000002e32f7b0] d0000000000e82f4 .pfm_power5_irq_handler+0x40/0x80 [perfmon_power5]
> > > > [c00000002e32f840] c000000000043dbc .powerpc_irq_handler+0x60/0x78
> > > > [c00000002e32f8c0] c000000000023488 .performance_monitor_exception+0x38/0x50
> > > > [c00000002e32f940] c000000000003d80 performance_monitor_common+0x100/0x180
> > > > --- Exception: f00 (Performance Monitor) at c000000000045e38 .__enqueue_entity+0x3c/0xb8
> > > > [c00000002e32fcb0] c00000000004e5e8 .put_prev_task_fair+0x74/0x98
> > > > [c00000002e32fd40] c0000000004af708 .schedule+0x46c/0x768
> > > > [c00000002e32fe30] c000000000008c54 do_work+0x14/0x34
> > > > --- Exception: 901 (Decrementer) at 0000000010001ff8
> > > > SP (ffffd320) is in userspace
> > > >
> > > > So what we have here is that spin_lock is getting called in the
> > > > context of schedule(). That doesn't seem good to me, but I am
> > > > not wise enough in the ways of the Linux kernel. Do you think
> > > > this should work correctly?
> > > >
> > > > To make forward progress on resend_irq, I'm going to switch away
> > > > from using an overflow handler to a sampler test case, so this
> > > > shouldn't stop my progress, it will just slow me down a bit.
> > > >
> > > > Regards,
> > > >
> > > > - Corey
> > > >
> > > > --
> > > > Corey Ashford
> > > > Software Engineer
> > > > IBM Linux Technology Center, Linux Toolchain
> > > > Beaverton, OR
> > > > 503-578-3507
> > > > cja...@us...
> > > >
> > > > _______________________________________________
> > > > perfmon2-devel mailing list
> > > > per...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
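
For readers following the stack trace quoted in this thread, here is a minimal user-space sketch in C of why the choice of exception wrapper seems to matter. It assumes one reading of the trace: schedule() holds the runqueue lock with interrupts only soft-disabled, the PMU exception is delivered anyway, and the handler's signal path (try_to_wake_up, task_rq_lock) then needs the same lock. Every name below (model_cpu, model_pmu_handler, model_raise_pmu_irq, model_irq_restore, model_schedule) is hypothetical and exists neither in perfmon2 nor in the kernel; the "lock" is a plain flag, and the model counts a would-be deadlock rather than actually spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; not kernel code. */
struct model_cpu {
    bool lock_held;      /* stands in for the runqueue spinlock */
    bool irqs_masked;    /* stands in for "soft" interrupt disabling */
    bool irq_pending;    /* a PMU interrupt was deferred while masked */
    int handled;         /* handler runs that completed */
    int would_deadlock;  /* handler runs that found the lock already held */
};

/* Stand-in for the PMU handler's signal path, which needs the lock. */
static void model_pmu_handler(struct model_cpu *cpu)
{
    if (cpu->lock_held) {
        /* A real spinlock would spin here forever: the lock holder is
         * the very code this handler interrupted on the same CPU. */
        cpu->would_deadlock++;
        return;
    }
    cpu->lock_held = true;
    cpu->handled++;
    cpu->lock_held = false;
}

/* Deliver a PMU interrupt. A maskable one is deferred while interrupts
 * are masked; a non-maskable one runs the handler immediately. */
static void model_raise_pmu_irq(struct model_cpu *cpu, bool maskable)
{
    if (maskable && cpu->irqs_masked) {
        cpu->irq_pending = true;
        return;
    }
    model_pmu_handler(cpu);
}

/* Re-enable interrupts and replay anything deferred in the meantime. */
static void model_irq_restore(struct model_cpu *cpu)
{
    assert(!cpu->lock_held); /* the model drops the lock before unmasking */
    cpu->irqs_masked = false;
    if (cpu->irq_pending) {
        cpu->irq_pending = false;
        model_pmu_handler(cpu);
    }
}

/* Stand-in for schedule(): mask interrupts, take the lock, and have a
 * PMU counter overflow in the middle of the critical section. */
static struct model_cpu model_schedule(bool maskable)
{
    struct model_cpu cpu = {0};

    cpu.irqs_masked = true;
    cpu.lock_held = true;
    model_raise_pmu_irq(&cpu, maskable); /* overflow arrives here */
    cpu.lock_held = false;
    model_irq_restore(&cpu);
    return cpu;
}
```

Calling model_schedule(false) models the non-maskable delivery and records one would-be deadlock with no completed handler run; model_schedule(true) models the maskable wrapper and records one completed handler run after the lock is released. The sketch also hints at the trade-off raised in the thread: a deferred interrupt is attributed to the point where interrupts are re-enabled, not to the interrupt-disabled code that was actually running.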