From: David W. <dw...@in...> - 2010-05-26 11:16:36
I've had this crash reported to me...

[18842.727906] EIP: [<e082f490>] br2684_push+0x19/0x234 [br2684] SS:ESP 0068:dfb89d14
[18845.090712]  [<c13ecff3>] ? do_page_fault+0x0/0x2e1
[18845.120042]  [<e082f490>] ? br2684_push+0x19/0x234 [br2684]
[18845.153530]  [<e084fa13>] solos_bh+0x28b/0x7c8 [solos_pci]
[18845.186488]  [<e084f711>] ? solos_irq+0x2d/0x51 [solos_pci]
[18845.219960]  [<c100387b>] ? handle_irq+0x3b/0x48
[18845.247732]  [<c10265cb>] ? irq_exit+0x34/0x57
[18845.274437]  [<c1025720>] tasklet_action+0x42/0x69
[18845.303247]  [<c102643f>] __do_softirq+0x8e/0x129
[18845.331540]  [<c10264ff>] do_softirq+0x25/0x2a
[18845.358274]  [<c102664c>] _local_bh_enable_ip+0x5e/0x6a
[18845.389677]  [<c102666d>] local_bh_enable+0xb/0xe
[18845.417944]  [<e08490a8>] ppp_unregister_channel+0x32/0xbb [ppp_generic]
[18845.458193]  [<e08731ad>] pppox_unbind_sock+0x18/0x1f [pppox]
[18845.492712]  [<e087f5f7>] pppoe_device_event+0xa7/0x159 [pppoe]
[18845.528269]  [<c13ed2ff>] notifier_call_chain+0x2b/0x4a
[18845.559674]  [<c1038a62>] raw_notifier_call_chain+0xc/0xe
[18845.592110]  [<c1300867>] dev_close+0x51/0x8b
[18845.618266]  [<c1300927>] rollback_registered_many+0x86/0x15e
[18845.652813]  [<c1300ab3>] unregister_netdevice_queue+0x67/0x91
[18845.687849]  [<c1300b79>] unregister_netdev+0x14/0x1c
[18845.718221]  [<e082f4d1>] br2684_push+0x5a/0x234 [br2684]
[18845.750676]  [<e083dc21>] vcc_release+0x64/0x100 [atm]

The problem is that the 'find_vcc' functions in these drivers are returning a vcc with the ATM_VF_READY bit cleared, because it's already in the process of being destroyed. If we fix that simple oversight, there's still a race condition, because the socket can still be closed (and completely freed, AFAICT) between our call to find_vcc() and our call to vcc->push() in the RX tasklet.

Here's a patch for solos-pci which should fix it. We prevent the race by making the dev->ops->close() function wait for the RX tasklet to complete, so it can't still be using the vcc in question.
I think this same approach should work OK for usbatm and he. Less sure about atmtcp -- we may need some extra locking there to protect atmtcp_c_send(). And I'm ignoring eni_proc_read() for now.

Can anyone see a better approach -- short of rewriting the whole ATM layer to make the locking saner?

diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index c5f5186..a73f102 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -774,7 +774,8 @@ static struct atm_vcc *find_vcc(struct atm_dev *dev, short vpi, int vci)
 	sk_for_each(s, node, head) {
 		vcc = atm_sk(s);
 		if (vcc->dev == dev && vcc->vci == vci &&
-		    vcc->vpi == vpi && vcc->qos.rxtp.traffic_class != ATM_NONE)
+		    vcc->vpi == vpi && vcc->qos.rxtp.traffic_class != ATM_NONE &&
+		    test_bit(ATM_VF_READY, &vcc->flags))
 			goto out;
 	}
 	vcc = NULL;
@@ -900,6 +901,10 @@ static void pclose(struct atm_vcc *vcc)
 	clear_bit(ATM_VF_ADDR, &vcc->flags);
 	clear_bit(ATM_VF_READY, &vcc->flags);
 
+	/* Hold up vcc_destroy_socket() (our caller) until solos_bh() in the
+	   tasklet has finished processing any incoming packets (and, more to
+	   the point, using the vcc pointer). */
+	tasklet_unlock_wait(&card->tlet);
 	return;
 }

-- 
David Woodhouse                            Open Source Technology Centre
Dav...@in...                              Intel Corporation