From: <bug...@fr...> - 2009-10-14 21:34:25
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

Summary: [KMS] KDE 4.3 causes lock up on RV620 (M82)
Product: DRI
Version: unspecified
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/Radeon
AssignedTo: dri...@li...
ReportedBy: za...@gm...

First I reported bug #24467, but it was incorrect.

I was successfully using openSUSE 11.1 (KDE 3.5.x) with KMS: probably about 30 KDE 3.5 sessions without any lock up. Then I switched to openSUSE 11.2 milestone 6 (KDE 4.3.0) and ran *one* KMS X session, which was fine. Next I switched to 11.2 milestone 8 and experienced a lock up.

At first I thought it was due to regressions in drm/radeon and even tried to trace it in bug #24467. But that was just a matter of (un)luck: one KDE 4.3 session in 20 tries works fine. I even reinstalled milestone 6 and confirmed that it freezes too. I also tried going back to the commits tested with openSUSE 11.1, with no luck.

So finally, it seems KDE 4.3 (both .0 and .1) does some operation that locks up my machine. This is not likely to be related to Mesa's r600 driver, as I removed it.

I've also tried starting plain X, playing a movie with mplayer in it, and starting "konsole" (a KDE4 app), xclock and xeyes. That worked fine, no lock up. Then I started "kwin" (KDE4's window manager/decorator), and that causes a lock up seconds after starting it.

--
Configure bugmail: http://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
From: <bug...@fr...> - 2009-10-19 22:58:31
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #1 from Rafał Miłecki <za...@gm...> 2009-10-19 15:58:14 PST ---
I've tried disabling Composite, DFS and UTS, but that didn't help. Then I hacked xf86-video-ati to disable Solid (return FALSE in R600PrepareSolid). No more lock ups with this hack (checked with 3 reboots).
From: <bug...@fr...> - 2009-10-19 23:14:58
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #2 from Alex Deucher <ag...@ya...> 2009-10-19 16:14:45 PST ---
Created an attachment (id=30577)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=30577)
make vtx buffer 32k like non-kms

Does this patch help at all? I suspect there's a problem with the default state and prepare state when we run out of vtx buffer space. This should match things up more closely with non-kms and help id the problem, but isn't a real solution.
From: <bug...@fr...> - 2009-10-20 05:26:56
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #3 from Rafał Miłecki <za...@gm...> 2009-10-19 22:26:43 PST ---
(In reply to comment #2)
> Created an attachment (id=30577) --> (http://bugs.freedesktop.org/attachment.cgi?id=30577) [details]
> make vtx buffer 32k like non-kms
>
> Does this patch help at all? I suspect there's a problem with the default
> state and prepare state when we run out of vtx buffer space.

It does not :/
From: <bug...@fr...> - 2009-10-21 15:48:24
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #4 from Rafał Miłecki <za...@gm...> 2009-10-21 08:48:13 PST ---
I rebooted once more a day later and got lock ups again, even with Solid disabled :| It seems I just got lucky when testing this with 3 reboots in a row. So turning off Solid does not fix the issue.
From: <bug...@fr...> - 2009-10-21 15:50:24
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #5 from Rafał Miłecki <za...@gm...> 2009-10-21 08:49:43 PST ---
Created an attachment (id=30600)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=30600)
Dumps of ring in several lock up states

I've made some dumps of rings using debugfs (and ssh).
From: <bug...@fr...> - 2009-10-21 15:56:26
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #6 from Rafał Miłecki <za...@gm...> 2009-10-21 08:55:27 PST ---
The weird thing about some dumps is that the ring seems to be... smaller than normal. AFAIK this should never happen. Here is the head of each ring log from the attached archive:

==> ring1.txt <==
CP_STAT 0x80008241
CP_RB_WPTR 0x00000010
CP_RB_RPTR 0x00040790
1920 free dwords in ring
1920 dwords in ring
r[1936]=0x0000002b
r[1937]=0x80000000
r[1938]=0x80000000

==> ring2.txt <==
CP_STAT 0x80000241
CP_RB_WPTR 0x00000010
CP_RB_RPTR 0x000403d8
968 free dwords in ring
968 dwords in ring
r[0984]=0x00000256
r[0985]=0x00000011
r[0986]=0xc0002a00

==> ring3.txt <==
CP_STAT 0x80000241
CP_RB_WPTR 0x00000010
CP_RB_RPTR 0x000400b8
168 free dwords in ring
168 dwords in ring
r[0184]=0x00000256
r[0185]=0x00000011
r[0186]=0xc0002a00

==> ring4.txt <==
CP_STAT 0x800280c1
CP_RB_WPTR 0x00000010
CP_RB_RPTR 0x00040538
1320 free dwords in ring
1320 dwords in ring
r[1336]=0xe400000c
r[1337]=0x40240054
r[1338]=0x00000000

==> ring5.txt <==
CP_STAT 0x00000000
CP_RB_WPTR 0x00000000
CP_RB_RPTR 0x00000000
262128 free dwords in ring
0 dwords in ring
r[0000]=0xc0023200

==> ring6.txt <==
CP_STAT 0x00000000
CP_RB_WPTR 0x00000000
CP_RB_RPTR 0x00000000
262128 free dwords in ring
0 dwords in ring
r[0000]=0xc0023200

==> ring7.txt <==
CP_STAT 0x00000000
CP_RB_WPTR 0x00000000
CP_RB_RPTR 0x00000000
262128 free dwords in ring
0 dwords in ring
r[0000]=0xc0023200

==> ring8.txt <==
CP_STAT 0xffffffff
CP_RB_WPTR 0xffffffff
CP_RB_RPTR 0xffffffff
262127 free dwords in ring
0 dwords in ring
r[262143]=0x80000000

==> ring.normal.txt <==
CP_STAT 0x00000000
CP_RB_WPTR 0x00000530
CP_RB_RPTR 0x00000530
262144 free dwords in ring
0 dwords in ring
r[1328]=0x24242c23

In the first logs you can see a decreased ring size, and the number of dwords in the ring equals the number of free dwords, which is the next weird thing.

I have not had time yet to check the GRBM_STATUS register on the GPU (as suggested by Alex). I was told to check its value and search for it in "r600_reg* files in ddx and also r600_demo". The purpose is "to see what blocks are busy when it hangs".
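The equal "free" and "in ring" counts in ring1.txt through ring4.txt can be reproduced with plain pointer arithmetic. What follows is only a sketch, not the driver's code: it assumes a power-of-two ring of 0x40000 dwords indexed through a wrap mask, and the helper name `masked_distance` is made up for illustration.

```python
RING_DWORDS = 0x40000          # assumed ring size in dwords (1 MiB / 4)
PTR_MASK = RING_DWORDS - 1     # assumed wrap mask for a power-of-two ring


def masked_distance(rptr: int, wptr: int) -> int:
    """Distance from wptr forward to rptr, wrapped to the ring size.

    Hypothetical helper: it only mirrors the kind of mask arithmetic a
    ring-buffer driver typically uses for its free/used counters.
    """
    return (rptr + RING_DWORDS - wptr) & PTR_MASK


# (CP_RB_RPTR, CP_RB_WPTR) pairs copied from the ring1..ring4 dumps
dumps = {
    "ring1": (0x00040790, 0x00000010),
    "ring2": (0x000403D8, 0x00000010),
    "ring3": (0x000400B8, 0x00000010),
    "ring4": (0x00040538, 0x00000010),
}

for name, (rptr, wptr) in dumps.items():
    count = masked_distance(rptr, wptr)
    first_index = rptr & PTR_MASK   # index of the first r[...] entry printed
    print(f"{name}: {count} dwords, first index {first_index}")
```

Under these assumptions the counts come out as 1920, 968, 168 and 1320, and the masked read pointer gives 1936, 984, 184 and 1336, the same numbers as in the dumps. That would be consistent with the odd counts being a side effect of an RPTR just past 0x40000 rather than of a ring that really shrank.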
From: <bug...@fr...> - 2009-10-22 11:42:35
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

Stefano Carignano <sca...@gm...> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sca...@gm...

--- Comment #7 from Stefano Carignano <sca...@gm...> 2009-10-22 04:42:21 PST ---
Out of curiosity, are you using a preemptive kernel? Do you observe a change in the behavior/survival rate when enabling preempt?

As I mentioned in bug #24587, enabling a preemptive kernel reduces the chance of survival for me to ~5% (I won't say 0, because I managed to have a working session with it once, out of 10-15 tries).

Also, does suspend/resume work for you? Or do you too get the same lockup a few seconds after resume?

Is there something I could do to help debug this thing a bit?
From: <bug...@fr...> - 2009-10-28 20:00:23
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #8 from Rafał Miłecki <za...@gm...> 2009-10-28 13:00:09 PST ---
(In reply to comment #7)
> Out of curiosity, are you using a preemptive kernel ? Do you observe a change
> in the behavior/survival rate when enabling preempt ?

I do:
> grep "PREEMPT" .config
# CONFIG_PREEMPT_RCU is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set

I didn't try without PREEMPT yet. Thanks for the tip.

> Also, does suspend/resume work for you ? Or you too upon resume get the same
> lockup after a few seconds ?

I didn't manage to have X running long enough to try that.

> Is there something I could do to help debugging this thing a bit?

That's probably a question for the developers (Alex maybe?)... Maybe you could dump the value of the mentioned register during a lock up; no more ideas from me.
From: <bug...@fr...> - 2009-10-28 21:43:25
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #9 from Rafał Miłecki <za...@gm...> 2009-10-28 14:43:10 PST ---
Before lock up (in console and in X - the same):
0x8010: 0x00003030
0x8014: 0x00000003

During lock up:
0x8010: 0xA0003030
0x8014: 0x00000003
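To see which status bits differ between the two readings at 0x8010 (presumably the GRBM_STATUS value asked for earlier in the thread), the two words can be XORed. This is just an illustrative sketch; the helper name `changed_bits` is invented here, and mapping bit numbers to block names still requires the register headers mentioned above.

```python
GRBM_BEFORE = 0x00003030   # value read at 0x8010 before the lock up
GRBM_LOCKED = 0xA0003030   # value read at 0x8010 during the lock up


def changed_bits(before: int, after: int) -> list:
    """Return the positions (0 = least significant) of bits that differ."""
    diff = before ^ after
    return [bit for bit in range(32) if diff & (1 << bit)]


bits = changed_bits(GRBM_BEFORE, GRBM_LOCKED)
print(f"xor = {GRBM_BEFORE ^ GRBM_LOCKED:#010x}, changed bits: {bits}")
```

Only the two top-nibble bits (29 and 31) flip between the idle and the hung reading; the value at 0x8014 is identical in both cases.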
From: <bug...@fr...> - 2009-10-28 22:36:43
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #10 from Rafał Miłecki <za...@gm...> 2009-10-28 15:36:30 PST ---
Huh, I noticed something new. I am sure we will solve this issue soon now :)

In my case rdev->cp.ring_size is 1'048'576. That means 1'048'576/4 = 262'144 dwords. And... 262'144 == 0x40000!

As you can see in the ring info dumps, the lock up often happens when CP_RB_RPTR is just over 0x00040000 (like 0x00040790).

There are two reasons why this may happen, according to my current knowledge/understanding:
1) The GPU thinks the ring is bigger and keeps reading past 0x00040000 while it should not.
2) The driver thinks the ring is 0x00040000 while it's bigger. As a result we leave some ring space past 0x00040000 untouched (maybe containing garbage) and start writing from 0x0 again. The GPU sees that its read pointer has not caught up yet and tries to read more... from the space past 0x00040000, which contains garbage.
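The size arithmetic above, and how far the dumped read pointers overshoot, can be checked with a few lines. This is a minimal sketch that assumes valid read pointers are dword indices in the range [0, ring size in dwords); the helper name `rptr_in_range` is hypothetical.

```python
RING_SIZE_BYTES = 1_048_576                 # rdev->cp.ring_size from the comment above
RING_SIZE_DWORDS = RING_SIZE_BYTES // 4     # one dword = 4 bytes


def rptr_in_range(rptr: int) -> bool:
    """True if rptr is a valid dword index into the ring (assumed convention)."""
    return 0 <= rptr < RING_SIZE_DWORDS


# CP_RB_RPTR values from ring1..ring4 and ring.normal in the earlier dumps
observed = [0x00040790, 0x000403D8, 0x000400B8, 0x00040538, 0x00000530]

print(f"ring size: {RING_SIZE_DWORDS} dwords = {RING_SIZE_DWORDS:#x}")
for rptr in observed:
    overshoot = max(0, rptr - RING_SIZE_DWORDS)
    print(f"{rptr:#010x}: in range={rptr_in_range(rptr)}, overshoot={overshoot} dwords")
```

All four lock-up dumps have an RPTR between 184 and 1936 dwords past the end of the ring, while the normal dump (0x530) is inside it.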
From: <bug...@fr...> - 2009-10-29 00:34:30
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #11 from Rafał Miłecki <za...@gm...> 2009-10-28 17:24:20 PST ---
Another interesting thing:

# cat r600_ring_info
CP_STAT 0x00000000
CP_RB_WPTR 0x00000000 0x00000010
CP_RB_RPTR 0x00000000 0x00000000
262128 free dwords in ring
16 dwords in ring
r[0000]=0xc0023200
r[0001]=0x101e1000
r[0002]=0x00000000
r[0003]=0x000002a0
r[0004]=0xc0016800
r[0005]=0x00000140
r[0006]=0x00002f10
r[0007]=0x80000000
r[0008]=0x80000000
r[0009]=0x80000000
r[0010]=0x80000000
r[0011]=0x80000000
r[0012]=0x80000000
r[0013]=0x80000000
r[0014]=0x80000000
r[0015]=0x80000000
r[0016]=0xc0016800
r[0017]=0x00000141
r[0018]=0xdeadbeef

... should I ever see 0xdeadbeef? Isn't this some test value of writeback or something?
From: <bug...@fr...> - 2009-11-01 13:49:25
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #12 from Rafał Miłecki <za...@gm...> 2009-11-01 05:32:49 PST ---
I can confirm the lock up happens only when the GPU reads commands around position 0x00040000. Moreover, once it does one round of reading (0x0 → 0x40000) without a lock up, it won't lock up on any further pass through ~0x40000. Rendering is stable then.
From: <bug...@fr...> - 2009-11-02 13:55:46
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #13 from Rafał Miłecki <za...@gm...> 2009-11-02 05:55:33 PST ---
Created an attachment (id=30913)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=30913)
dmesg from clean drm-next
From: <bug...@fr...> - 2009-11-02 13:56:38
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #14 from Rafał Miłecki <za...@gm...> 2009-11-02 05:56:26 PST ---
Created an attachment (id=30914)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=30914)
dmesg with glisse's patch applied

I've applied http://people.freedesktop.org/~glisse/0001-r600-hack-to-test-ring-buffer.patch

[drm] RPTR/WPTR before schedule 0x000000FE 0x000000FE
[drm:r600_ib_test] *ERROR* radeon: fence wait failed (-16).
[drm:r600_init] *ERROR* radeon: failled testing IB (-16).
From: <bug...@fr...> - 2009-11-02 14:02:53
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #15 from Rafał Miłecki <za...@gm...> 2009-11-02 06:02:40 PST ---
Created an attachment (id=30915)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=30915)
r600_ring_info from debugfs with glisse's patch
From: <bug...@fr...> - 2009-11-02 14:35:00
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #16 from Rafał Miłecki <za...@gm...> 2009-11-02 06:34:47 PST ---
Created an attachment (id=30916)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=30916)
dmesg with hacked v2 of glisse's patch

Added register dumps on "radeon: fence wait failed".
From: <bug...@fr...> - 2009-11-02 16:17:51
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #17 from Rafał Miłecki <za...@gm...> 2009-11-02 08:17:38 PST ---
Created an attachment (id=30919)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=30919)
dmesg with v2 of glisse's patch

[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] 1530 0x00000000 CP_RB_CNTL (1024 ring size 256 dw)
[drm] 1534 0x00000000 CP_RB_CNTL (1024 ring size 256 dw) mask: 0x000000FF
[drm] 1282 0x08000F03 CP_RB_CNTL (1024 ring size 256 dw)
[drm] ring test succeeded in 1 usecs
[drm] radeon: ib pool ready.
[drm] RPTR/WPTR before schedule 0x000000FE 0x000000FE
phy0: Selected rate control algorithm 'iwl-agn-rs'
[drm:r600_ib_test] *ERROR* radeon: fence wait failed (-16).
[drm] 1788 0x08000F03 CP_RB_CNTL (1024 ring size 256 dw)
REGISTER: CP_STAT : 0x80000645
REGISTER: CP_RB_RPTR : 0x000001B0
REGISTER: CP_RB_WPTR : 0x00000005
REGISTER: CP_RB_CNTL : 0x08000F03
[drm:r600_init] *ERROR* radeon: failled testing IB (-16).
From: <bug...@fr...> - 2009-11-02 16:45:45
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #18 from Rafał Miłecki <za...@gm...> 2009-11-02 08:45:33 PST ---
(In reply to comment #17)
> Created an attachment (id=30919) --> (http://bugs.freedesktop.org/attachment.cgi?id=30919) [details]
> dmesg with v2 of glisse's patch

I meant v3 here, sorry.
From: <bug...@fr...> - 2009-11-02 17:18:21
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #19 from Rafał Miłecki <za...@gm...> 2009-11-02 09:18:04 PST ---
Created an attachment (id=30920)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=30920)
dmesg with v4 of glisse's patch

[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] 1530 0x00000000 CP_RB_CNTL (1024 ring size 256 dw)
[drm] 1534 0x00000000 CP_RB_CNTL (1024 ring size 256 dw) mask: 0x000000FF
[drm] 1282 0x08000F03 CP_RB_CNTL (1024 ring size 256 dw)
[drm] ring test succeeded in 1 usecs
[drm] radeon: ib pool ready.
[drm] RPTR/WPTR before schedule 0x000000FE 0x000000FE
[drm:r600_ib_test] *ERROR* radeon: fence wait failed (-16).
[drm] 1788 0x08000F03 CP_RB_CNTL (1024 ring size 256 dw)
REGISTER: CP_STAT : 0x80000241
REGISTER: CP_RB_RPTR : 0x000001B0
REGISTER: CP_RB_WPTR : 0x00000005
REGISTER: CP_RB_CNTL : 0x08000F03
[drm:r600_init] *ERROR* radeon: failled testing IB (-16).
From: <bug...@fr...> - 2009-11-02 19:54:11
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #20 from Rafał Miłecki <za...@gm...> 2009-11-02 11:53:58 PST ---
Created an attachment (id=30923)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=30923)
Alex's patch little modified

I've modified http://www.botchco.com/alex/xorg/dump_rb_cntl_and_reset.diff a little. Result:

[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] predicted: 0x08000911
[drm] actual: 0x08000f03
[drm] i'll write: (0x08000000 | 0x00000900 | 0x00000011)
[drm] actual after writing: 0x08000911
[drm] actual after reset: 0x08000911
[drm] actual after CP start: 0x08000911

So we write the value to CNTL and it looks all right when we read it back. But it seems the missing part was the CP reset. With the patch applied it worked in 4 boots in a row: 2 warm, 2 cold.
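The three values OR-ed together in the "i'll write" line can be rebuilt from the ring size. The sketch below is a guess at the field layout and not taken from the patch: it assumes the low bits hold log2 of the ring size in 8-byte units, bits 8 and up hold log2 of a 4096-byte block in 8-byte units, and 0x08000000 is a single flag bit. The names `RB_NO_UPDATE`, `order`, `compose_cp_rb_cntl` and `decode_cp_rb_cntl`, and the 6-bit field widths, are assumptions for illustration.

```python
RB_NO_UPDATE = 1 << 27   # assumed name for the 0x08000000 flag


def order(n: int) -> int:
    """Smallest k with 2**k >= n (stand-in for a log2 helper)."""
    return (n - 1).bit_length()


def compose_cp_rb_cntl(ring_size_bytes: int, block_bytes: int = 4096) -> int:
    """Build a CP_RB_CNTL-style value under the assumed field layout."""
    rb_bufsz = order(ring_size_bytes // 8)   # ring size, log2 of 8-byte units
    rb_blksz = order(block_bytes // 8)       # block size, log2 of 8-byte units
    return RB_NO_UPDATE | (rb_blksz << 8) | rb_bufsz


def decode_cp_rb_cntl(value: int) -> dict:
    """Split a value back into the assumed fields (6-bit widths are a guess)."""
    return {
        "rb_bufsz": value & 0x3F,
        "rb_blksz": (value >> 8) & 0x3F,
        "no_update": bool(value & RB_NO_UPDATE),
    }


print(f"1 MiB ring:     {compose_cp_rb_cntl(1_048_576):#010x}")
print(f"1024-byte ring: {compose_cp_rb_cntl(1024):#010x}")
print("actual 0x08000f03 decodes to", decode_cp_rb_cntl(0x08000F03))
```

With these assumptions a 1 MiB ring gives 0x08000911, the "predicted" value in the log, and a 1024-byte ring gives 0x08000907, which is the value the v5 dmesg in this thread shows after the write. The unexpected 0x08000f03 would decode to a size field of 3 and a block field of 15.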
From: <bug...@fr...> - 2009-11-02 20:03:41
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #21 from Rafał Miłecki <za...@gm...> 2009-11-02 12:03:29 PST ---
Created an attachment (id=30925)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=30925)
dmesg with v5 of glisse's patch

[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] 1533 0x00000000 CP_RB_CNTL (1024 ring size 256 dw)
[drm] 1537 0x00000000 CP_RB_CNTL (1024 ring size 256 dw) mask: 0x000000FF
[drm] 1271 0x08000F03 CP_RB_CNTL (1024 ring size 256 dw) rbbufsz 7
[drm] 1279 0x08000907 CP_RB_CNTL (1024 ring size 256 dw)
[drm] ring test succeeded in 1 usecs
[drm] radeon: ib pool ready.
[drm] RPTR/WPTR before schedule 0x000000FE 0x000000FE
From: <bug...@fr...> - 2009-11-02 20:41:15
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #22 from Alex Deucher <ag...@ya...> 2009-11-02 12:41:03 PST ---
Created an attachment (id=30926)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=30926)
don't RMW cp_rb_cntl

As pointed out by Andre and IRC, we don't RMW CP_RB_CNTL in the non-kms path.
From: <bug...@fr...> - 2009-11-02 21:13:12
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #23 from Rafał Miłecki <za...@gm...> 2009-11-02 13:12:59 PST ---
(In reply to comment #22)
> Created an attachment (id=30926) --> (http://bugs.freedesktop.org/attachment.cgi?id=30926) [details]
> don't RMW cp_rb_cntl
>
> As pointed out by Andre and IRC, we don't RMW CP_RB_CNTL in the non-kms path.

Seems to work, tested this in three boots.
From: <bug...@fr...> - 2009-11-02 21:17:10
|
http://bugs.freedesktop.org/show_bug.cgi?id=24535

--- Comment #24 from Rafał Miłecki <za...@gm...> 2009-11-02 13:16:57 PST ---
OK, so to sum this up: reading CP_RB_CNTL just after writing it is a bad idea. The last patch from Alex avoids that, and it works fine with it applied. I've also tried putting "mdelay(1)" after the write to CP_RB_CNTL and it worked as well.