From: Carlos M. <car...@gm...> - 2005-08-29 20:48:37
|
Hi, I was testing the driver with USB and was seeing some really strange behaviour. It took some time but I finally managed to see (more or less) what's going on. If I load acx-20050818, no problems, but when I load 20050821, things start to go wrong. If it uploads the firmware and then fails allocatig DMA memory. Somthing to the effect of acx: init memory FAILED acx: acx100_init_memory_pools FAILED wlan0: acx100_create_dma_regions FAILED Now, if it doesn't need to upload the firmware[0] there are no errors, however, doing ip l produces a "Cannot handle kernel pointer at address x" message. This might have to do with the fact that the driver doesn't seem to bother unregistering the netdev, so I might end up with wlan3, when the other three (wlan0, wlan1 and wlan2) should no longer be there. I'll try to get debug logs posted somewhere tomorrow. [0] This happens when the module is unloaded and then loaded again. As long as the USB adapter stays plugged-in it will maintain the firmware in memory and report itself booted. cmn --=20 Carlos Mart=EDn http://www.cmartin.tk http://rpgscript.berlios.de "Vacate your oddly-shaped office and leave. The revolution is coming." |
From: Carlos M. <car...@gm...> - 2005-08-30 08:43:59
Attachments:
dmesg.20050818.bz2
dmesg.20050821.bz2
|
OK, here are the logs. I tested them one after the other and they both failed as described in the other mail. In the first run of 20050821, I forgot to turn full debug on, so you don't see much detail. Tell me if you need more. The second run gets a bit further and fails at ctlReadMemoryMap. Notice how it tries to allocate wlan1, when 20050818 allocated wlan0 and should have released it. When I did "ip link", the error you see at the end happened. The box then locked completely when I tried to ssh into it to get the logs. cmn --=20 Carlos Mart=EDn http://www.cmartin.tk http://rpgscript.berlios.de "Vacate your oddly-shaped office and leave. The revolution is coming." |
From: Denis V. <vd...@il...> - 2005-08-30 08:58:21
|
On Tuesday 30 August 2005 11:43, Carlos Martin wrote: > OK, here are the logs. I tested them one after the other and they both > failed as described in the other mail. > > In the first run of 20050821, I forgot to turn full debug on, so you > don't see much detail. Tell me if you need more. The second run gets a > bit further and fails at ctlReadMemoryMap. > > Notice how it tries to allocate wlan1, when 20050818 allocated wlan0 > and should have released it. When I did "ip link", the error you see > at the end happened. The box then locked completely when I tried to > ssh into it to get the logs. Thanks! Does acx-20050818.tar.bz2 work? Does acx-20050816.tar.bz2 work? Note that there is a acx-20050816_split.tar.bz2 tarball, which contains changes from 16 to 18 step-by-step. Try them if 16 works for you and 18 doesn't. -- vda |
From: Carlos M. <car...@gm...> - 2005-08-30 09:04:38
|
On 30/08/05, Denis Vlasenko <vd...@il...> wrote: > Does acx-20050818.tar.bz2 work? Yes, that's the last one that works. That's the first log. However, it couldn't find the signal. I'm not surprised however, Windows itself reports low signal with the one I'm trying to connect. > Does acx-20050816.tar.bz2 work? Will try. cmn --=20 Carlos Mart=EDn http://www.cmartin.tk http://rpgscript.berlios.de "Vacate your oddly-shaped office and leave. The revolution is coming." |
From: Denis V. <vd...@il...> - 2005-08-30 09:10:12
|
On Tuesday 30 August 2005 12:04, Carlos Martin wrote: > On 30/08/05, Denis Vlasenko <vd...@il...> wrote: > > Does acx-20050818.tar.bz2 work? > Yes, that's the last one that works. That's the first log. However, > it couldn't find the signal. Do you mean "it doesn't crash or fail to load the module"? (because without actual traffic it's a bit hard to know for sure that it works) Did older versions manage to find the signal? > I'm not surprised however, Windows itself > reports low signal with the one I'm trying to connect. > > Does acx-20050816.tar.bz2 work? > Will try. -- vda |
From: Carlos M. <car...@gm...> - 2005-08-30 09:19:18
|
Sent to list. Damn, I gotta remember to reply to all... On 30/08/05, Denis Vlasenko <vd...@il...> wrote: > On Tuesday 30 August 2005 12:04, Carlos Martin wrote: > > On 30/08/05, Denis Vlasenko <vd...@il...> wrote: > > > Does acx-20050818.tar.bz2 work? > > Yes, that's the last one that works. That's the first log. However, > > it couldn't find the signal. > > Do you mean "it doesn't crash or fail to load the module"? > (because without actual traffic it's a bit hard to > know for sure that it works) Well, yes, that's what I'm testing at the moment. In a few minutes I should have results of some actual data passing through (or lack of data passing through) > > Did older versions manage to find the signal? I don't think I've ever found that signal. cmn --=20 Carlos Mart=EDn http://www.cmartin.tk http://rpgscript.berlios.de "Vacate your oddly-shaped office and leave. The revolution is coming." |
From: Carlos M. <car...@gm...> - 2005-08-30 11:37:48
Attachments:
dmesg.20050816.bz2
|
Log of 20050816. Careful, uncompressed it's 6.1M. Mostly of the same over and over and over again. This filled the monitor and left the computer unresponsive. The lights on the keyboard wouldn't light up. Suddently it stopped and I couldn't type anything. The cursor wasn't even blinking. I could however ssh into it, but when I tried to unload it, it completely locked up. I'll see when I can do more tests. My brother has taken possesion of the USB device and won't let go. cmn --=20 Carlos Mart=EDn http://www.cmartin.tk http://rpgscript.berlios.de "Vacate your oddly-shaped office and leave. The revolution is coming." |
From: Denis V. <vd...@il...> - 2005-08-30 13:45:30
Attachments:
HOWTO_find_oops.txt
|
On Tuesday 30 August 2005 14:37, Carlos Martin wrote: > I'll see when I can do more tests. My brother has taken possesion of > the USB device and won't let go. 8) > This filled the monitor and left the computer unresponsive. The lights > on the keyboard wouldn't light up. Suddently it stopped and I couldn't > type anything. The cursor wasn't even blinking. I could however ssh > into it, but when I tried to unload it, it completely locked up. Probably irrelevant, but: > Aug 30 13:24:45 localhost kernel: firmware size: 42324 bytes This is fw 1.0.7-USB. Newer fw 1.0.9-USB exists, and you seem to use it for acx20050821: >firmware size: 42344 bytes This adds another variable to the testing. Let's stick to 1.0.9, ok? About 20050816 oops: acx_l_transmit_authen1(wlandevice_t *priv) { ... acxlog(L_ASSOC, "Sending authentication1 request, " "awaiting response!\n"); tx_desc = acx_l_get_tx_desc(priv); if (!tx_desc) { FN_EXIT0; return NOT_OK; } hostdesc = tx_desc->fixed_size.s.host_desc; Aug 30 13:25:41 localhost kernel: ffff06be ==> acx_l_transmit_authen1 Aug 30 13:25:41 localhost kernel: Sending authentication1 request, awaiting response! Aug 30 13:25:41 localhost kernel: ffff06be ==> acx_l_get_tx_desc Aug 30 13:25:41 localhost kernel: tx: got desc 0, 15 remain Aug 30 13:25:41 localhost kernel: ffff06be <== acx_l_get_tx_desc And now... head = (void*)hostdesc->data; BOOM! I suspect hostdesc is NULL here. Use attached HOWTO and check this theory please. Aug 30 13:25:41 localhost kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Aug 30 13:25:41 localhost kernel: printing eip: Aug 30 13:25:41 localhost kernel: d8f5bbf7 Aug 30 13:25:41 localhost kernel: *pde = 00000000 Aug 30 13:25:41 localhost kernel: Oops: 0002 [#1] Aug 30 13:25:41 localhost kernel: Modules linked in: acx_usb ipv6 rivafb vgastate 8139too parport_pc parport Aug 30 13:25:41 localhost kernel: CPU: 0 Aug 30 13:25:41 localhost kernel: EIP: 0060:[pg0+412732407/1067709440] Not tainted VLI Aug 30 13:25:41 localhost kernel: EFLAGS: 00010086 (2.6.10) Aug 30 13:25:41 localhost kernel: EIP is at acx_l_transmit_authen1+0x47/0x150 [acx_usb] Aug 30 13:25:41 localhost kernel: eax: d71cd400 ebx: d4dd0220 ecx: d71cdc00 edx: 00000000 Aug 30 13:25:41 localhost kernel: esi: d4dd0220 edi: d4dd0000 ebp: d71cd400 esp: d7f99e88 Aug 30 13:25:41 localhost kernel: ds: 007b es: 007b ss: 0068 Aug 30 13:25:41 localhost kernel: Process events/0 (pid: 3, threadinfo=d7f98000 task=d7fc1000) Aug 30 13:25:41 localhost kernel: Stack: d8f6d0a0 d4dd02a4 d4dd0220 d4dd025c 00000007 d8f5c9bf d8f6d224 d4dd02db Aug 30 13:25:41 localhost kernel: d4dd0234 0000001f 00000015 00000000 00000004 80000300 d8f61227 d8f71c15 Aug 30 13:25:41 localhost kernel: d4dd1b00 d4dd0234 d4dd0226 09000bb8 00000000 00000000 00000020 00010000 Aug 30 13:25:41 localhost kernel: Call Trace: Aug 30 13:25:41 localhost kernel: [pg0+412735935/1067709440] acx_s_complete_scan+0x3ff/0x870 [acx_usb] Aug 30 13:25:41 localhost kernel: [pg0+412754471/1067709440] acx_s_issue_cmd_timeo_debug+0x357/0x420 [acx_usb] Aug 30 13:25:41 localhost kernel: [pg0+412783632/1067709440] acx_e_after_interrupt_task+0x0/0x280 [acx_usb] Aug 30 13:25:41 localhost kernel: [pg0+412784223/1067709440] acx_e_after_interrupt_task+0x24f/0x280 [acx_usb] Aug 30 13:25:41 localhost kernel: [worker_thread+512/1072] worker_thread+0x200/0x430 Aug 30 13:25:41 localhost kernel: [default_wake_function+0/16] default_wake_function+0x0/0x10 Aug 30 13:25:41 localhost kernel: [default_wake_function+0/16] default_wake_function+0x0/0x10 Aug 30 13:25:41 localhost kernel: [worker_thread+0/1072] worker_thread+0x0/0x430 Aug 30 13:25:41 localhost kernel: [kthread+149/208] kthread+0x95/0xd0 Aug 30 13:25:42 localhost kernel: [kthread+0/208] kthread+0x0/0xd0 Aug 30 13:25:42 localhost kernel: [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18 Aug 30 13:25:42 localhost kernel: Code: 0f 85 00 01 00 00 a8 08 0f 85 d9 00 00 00 89 f8 e8 4f f4 00 00 85 c0 89 c5 0f 84 a6 00 00 00 8b 48 18 8d 9f 20 02 00 00 8b 51 18 <66> c7 0 >If I load acx-20050818, no problems, but when I load 20050821, things >start to go wrong If above is true, you can skip investigating 0816, because there are massive cleanups in USB code after it anyway. Please save the HOWTO though, you may need to decode an OOPS later. I think a good starting point would be a pair of debug=0xffff logs of simple "modprobe acx_usb", one with 20050818 abnd another with 20050821. Reading your log of 20050821. It start to fail... wlan1: warning: command returned status 11 acx100_s_init_wep: writing WEP options ...here: wlan1: warning: command returned status 14 usb 1-1: khubd timed out on ep0out acx: init memory FAILED acx: acx100_init_memory_pools FAILED wlan1: acx100_create_dma_regions FAILED ... But this log is not verbose enough. Thanks! -- vda |
From: Carlos M. <car...@gm...> - 2005-08-30 14:59:23
|
On 30/08/05, Denis Vlasenko <vd...@il...> wrote: > This adds another variable to the testing. Let's stick > to 1.0.9, ok? OK, I tried the older firmware to see if it fixed something, but it's probably best to stick to 1.0.9 > >If I load acx-20050818, no problems, but when I load 20050821, things > >start to go wrong >=20 > If above is true, you can skip investigating 0816, because > there are massive cleanups in USB code after it anyway. > Please save the HOWTO though, you may need to decode an OOPS later. OK. I'll see where this OOPS happens and continue testing with 0818. >=20 > I think a good starting point would be a pair of debug=3D0xffff > logs of simple "modprobe acx_usb", one with 20050818 > abnd another with 20050821. >=20 > Reading your log of 20050821. It start to fail... >=20 > wlan1: warning: command returned status 11 > acx100_s_init_wep: writing WEP options >=20 > ...here: >=20 > wlan1: warning: command returned status 14 > usb 1-1: khubd timed out on ep0out > acx: init memory FAILED > acx: acx100_init_memory_pools FAILED > wlan1: acx100_create_dma_regions FAILED > ... >=20 > But this log is not verbose enough. I'll get one debug=3D0xffff of that. Actually, I've just noticed, my 0816 and my 0818 tarballs are the same. The last ChangeLog entry is for 08016 so I'll redownload 0818 and report with that one. Every mention of 0818 before this line refers to 0816. /me is confused. cmn --=20 Carlos Mart=EDn http://www.cmartin.tk http://rpgscript.berlios.de "Vacate your oddly-shaped office and leave. The revolution is coming." |
From: Carlos M. <car...@gm...> - 2005-08-30 21:37:28
|
Right, the 0816 and 0818 tarball mixup is now officially Your Fault(tm). They are exactly the same. You forgot to patch it with the 0816_split tarball. I've tested the split tarball, and can say it breaks at 0816.4. Also, there is a delay (a second or so) after ACX100_CMD_INIT_MEMORY when it fails. In all other cases, the logs just fly on the console. Attached are the logs of doing `insmod acx_usb.ko debug=3D0xffff` on the different versions in the 0816_split tarball. You probably don't need all of them, but it can't hurt (much) to send them. Just so you know 20050816.2 doesn't compile. cmn --=20 Carlos Mart=EDn http://www.cmartin.tk http://rpgscript.berlios.de "Vacate your oddly-shaped office and leave. The revolution is coming." |
From: Denis V. <vd...@il...> - 2005-08-31 06:45:13
|
On Wednesday 31 August 2005 00:37, Carlos Martin wrote: > Right, the 0816 and 0818 tarball mixup is now officially Your > Fault(tm). They are exactly the same. You forgot to patch it with the > 0816_split tarball. Oh my, I've spent 40 minutes splitting 0818 -> 0821 patch into smaller pieces. Now it turns out a similar patchset already was lying in _my own_ download directory! >:O Rechecked. Indeed 0818 is the same as 0816. I am removing 0818 tarball from dl dir. > I've tested the split tarball, and can say it breaks at 0816.4. Also, > there is a delay (a second or so) after ACX100_CMD_INIT_MEMORY when it > fails. In all other cases, the logs just fly on the console. > Attached are the logs of doing `insmod acx_usb.ko debug=0xffff` on the > different versions in the 0816_split tarball. You probably don't need > all of them, but it can't hurt (much) to send them. Thanks! It's dreaded acx100_s_create_dma_regions() change. This is what's going on: a number of PCI-only fields in wlandevice_t is #ifdef'ed out from USB compilation, and relevant code mostly painlessly moved to pci.c Except for acx100_s_create_dma_regions(). There surprisingly some of purely PCI-related data used for USB initialization! Yay. I just #ifdef'ed out relevant parts but already feared that something is definitely wrong here. Let's try to find out what exactly was programmed into the card. In tarball [sent separately to avoid spamming ml]: acx-20050816_3.0 is former acx-20050816.3 which you found working acx-20050816_4.z is acx-20050816_4 which doesn't work acx-20050816_3.0 -> acx-20050816_3.1 - trivial changes, should work acx-20050816_3.1 -> acx-20050816_3.2 - acx100_s_create_dma_regions change acx-20050816_3.1i == instrumented acx-20050816_3.1 which will dump contents of configuration being showed into acx acx-20050816_3.2i == instrumented acx-20050816_3.2 Please compile and modprobe acx-20050816_3.1i and acx-20050816_3.2i with debug=0xffff and send me logs. Please reboot between modprobes in order to get 'clean state' debugs. -- vda |
From: Carlos M. <car...@gm...> - 2005-08-31 11:10:15
Attachments:
dmesg.0816_3.1i.bz2
dmesg.0816_3.2i.bz2
|
Here you go. With reboots between them. --=20 Carlos Mart=EDn http://www.cmartin.tk http://rpgscript.berlios.de "Vacate your oddly-shaped office and leave. The revolution is coming." |
From: Denis V. <vd...@il...> - 2005-08-31 11:55:30
|
On Wednesday 31 August 2005 14:10, Carlos Martin wrote: > Here you go. With reboots between them. Thanks! Sending you new test tarball. I will need logs again. There is a even chance it will work ;) acx-20050816_3.1i - not changed acx-20050816_3.2i - updated. maybe works now acx-20050816_3.3i - was acx-20050816_3.2i in prev tarball. won't work You need to test 'new' 2i only unless you changed testing environment (in that case test 1i and 2i). [sent off-list] -- vda |
From: Carlos M. <car...@gm...> - 2005-08-31 12:15:21
|
On 31/08/05, Denis Vlasenko <vd...@il...> wrote: > On Wednesday 31 August 2005 14:10, Carlos Martin wrote: > > Here you go. With reboots between them. >=20 > Thanks! >=20 > Sending you new test tarball. I will need logs again. > There is a even chance it will work ;) That's nice. This one will have to wait a bit. And about that OOPS, I've tried to trace it, but the numbers jump around a bit. The printf (from your HOWTO) says the oops is in 0x2ba7, but the "Code: blah, blah" is located at 0x2b7c. So, it is in that area, but I can't be sure exactly where. --=20 Carlos Mart=EDn http://www.cmartin.tk http://rpgscript.berlios.de "Vacate your oddly-shaped office and leave. The revolution is coming." |
From: Carlos M. <car...@gm...> - 2005-08-31 17:21:37
Attachments:
dmesg.0816_3.2i-test2.bz2
|
Attached the log of the .2i. Do yo have any idea why it doesn't deregister the device? I've looked a bit at the code and I suspect a NULL pointer somwhere, so that it doesn't even try to, but I'm not sure. I remember it working in Andreas' tarball, although that doesn't automatically mean that it did, just that I didn't notice a problem. cmn --=20 Carlos Mart=EDn http://www.cmartin.tk http://rpgscript.berlios.de "Vacate your oddly-shaped office and leave. The revolution is coming." |
From: Denis V. <vd...@il...> - 2005-09-01 05:56:52
|
On Wednesday 31 August 2005 20:21, Carlos Martin wrote: > Attached the log of the .2i. Wow. It seems to work, eh? > Do yo have any idea why it doesn't deregister the device? I've looked > a bit at the code and I suspect a NULL pointer somwhere, so that it > doesn't even try to, but I'm not sure. I remember it working in > Andreas' tarball, although that doesn't automatically mean that it > did, just that I didn't notice a problem. Dunno. Add prints to this function: static void acx100usb_e_disconnect(struct usb_interface *intf) { wlandevice_t *priv = usb_get_intfdata(intf); int i, result; printk("acx100usb_e_disconnect: 1\n"); /* No WLAN device...no sense */ if (!priv) {printk("acx100usb_e_disconnect: !priv, bailing\n"); return; } acx_sem_lock(priv); /* stop the transmit queue */ if (priv->netdev) { rtnl_lock(); if (!acx_queue_stopped(priv->netdev)) { printk("acx100usb_e_disconnect: acx_stop_queue\n"); acx_stop_queue(priv->netdev, "on USB disconnect"); } rtnl_unlock(); #ifdef CONFIG_PROC_FS acx_proc_unregister_entries(priv->netdev); #endif } printk.... /* now abort pending URBs and free them */ for (i = 0; i < ACX100_USB_NUM_BULK_URBS; i++) { acx_unlink_and_free_urb(priv->bulkrx_urbs[i]); acx_unlink_and_free_urb(priv->usb_tx[i].urb); } /* Unregister the network devices */ if (priv->netdev) { rtnl_lock(); result = unregister_netdevice(priv->netdev); rtnl_unlock(); kfree(priv->netdev); } acx_sem_unlock(priv); /* finally free the WLAN device */ kfree(priv); } and you will find out. -- vda |