From: Ryan R. <ry...@gm...> - 2007-05-25 01:54:45
|
All, Kernel version: 2.6.19 I've been having some connection issues with IrCOMM. It seems that a new ircomm_tty_cb object is created and cached in a hashbin every time a new /dev/ircomm? device is open. I start seeing problems after more than one ircomm device node has been opened by the application (i.e. /dev/ircomm0 and /dev/ircomm1). After some analysis of the problem it seems that I if I update the implementation of ircomm_tty_attach.c:ircomm_tty_discovery_indication() to just use the object passed back to the callback via the priv variable, the problems seem to go away. Is there a reason that this callback loops through the objects in the hash_bin and processes the IRCOMM_TTY_DISCOVERY_INDICATION event for each object rather than just using the structure passed into the callback? It doesn't seem that the search is necessary, but I certainly don't understand the code well enough to assert that. Here is basically what I have done. (Sorry, not at my Linux box right now, so I don't have a diff.) Can somebody tell me what the drawbacks of doing this would be? 352 /* 353 * Function ircomm_tty_discovery_indication (discovery) 354 * 355 * Remote device is discovered, try query the remote IAS to see which 356 * device it is, and which services it has. 357 * 358 */ 359 static void ircomm_tty_discovery_indication(discinfo_t *discovery, 360 DISCOVERY_MODE mode, 361 void *priv) 362 { 363 struct ircomm_tty_cb *self; 364 struct ircomm_tty_info info; 365 366 IRDA_DEBUG(2, "%s()\n", __FUNCTION__ ); 367 368 /* Important note : 369 * We need to drop all passive discoveries. 370 * The LSAP management of IrComm is deficient and doesn't deal 371 * with the case of two instance connecting to each other 372 * simultaneously (it will deadlock in LMP). 373 * The proper fix would be to use the same technique as in IrNET, 374 * to have one server socket and separate instances for the 375 * connecting/connected socket. 376 * The workaround is to drop passive discovery, which drastically 377 * reduce the probability of this happening. 378 * Jean II */ 379 if(mode == DISCOVERY_PASSIVE) 380 return; 381 382 info.daddr = discovery->daddr; 383 info.saddr = discovery->saddr; 384 ---> self = (struct ircomm_tty_cb *) priv; ---> ircomm_tty_do_event(self, IRCOMM_TTY_DISCOVERY_INDICATION, ---> NULL, &info); ---> #if 0 385 /* FIXME. We have a locking problem on the hashbin here. 386 * We probably need to use hashbin_find_next(), but we first 387 * need to ensure that "line" is unique. - Jean II */ 388 self = (struct ircomm_tty_cb *) hashbin_get_first(ircomm_tty); 389 while (self != NULL) { 390 IRDA_ASSERT(self->magic == IRCOMM_TTY_MAGIC, return;); 391 392 ircomm_tty_do_event(self, IRCOMM_TTY_DISCOVERY_INDICATION, 393 NULL, &info); 394 395 self = (struct ircomm_tty_cb *) hashbin_get_next(ircomm_tty); 396 } ---> #endif 397 } Thanks for any info on why the current function is implemented the way it is and if my change is reasonable or not. -- Ryan |
From: Samuel O. <sa...@so...> - 2007-05-28 00:59:54
|
Hi Ryan, On Thu, May 24, 2007 at 09:54:45PM -0400, Ryan Reading wrote: > All, > > Kernel version: 2.6.19 > > I've been having some connection issues with IrCOMM. It seems that a > new ircomm_tty_cb object is created and cached in a hashbin every time > a new /dev/ircomm? device is open. I start seeing problems after more > than one ircomm device node has been opened by the application (i.e. > /dev/ircomm0 and /dev/ircomm1). After some analysis of the problem it > seems that I if I update the implementation of > ircomm_tty_attach.c:ircomm_tty_discovery_indication() to just use the > object passed back to the callback via the priv variable, the problems > seem to go away. > > Is there a reason that this callback loops through the objects in the > hash_bin and processes the IRCOMM_TTY_DISCOVERY_INDICATION event for > each object rather than just using the structure passed into the > callback? It doesn't seem that the search is necessary, but I > certainly don't understand the code well enough to assert that. Well, it seems to me that you're right. However, could you describe more precisely the problem you get with the current code ? Cheers, Samuel. > Here is basically what I have done. (Sorry, not at my Linux box right > now, so I don't have a diff.) Can somebody tell me what the drawbacks > of doing this would be? > > 352 /* > 353 * Function ircomm_tty_discovery_indication (discovery) > 354 * > 355 * Remote device is discovered, try query the remote IAS to see which > 356 * device it is, and which services it has. > 357 * > 358 */ > 359 static void ircomm_tty_discovery_indication(discinfo_t *discovery, > 360 DISCOVERY_MODE mode, > 361 void *priv) > 362 { > 363 struct ircomm_tty_cb *self; > 364 struct ircomm_tty_info info; > 365 > 366 IRDA_DEBUG(2, "%s()\n", __FUNCTION__ ); > 367 > 368 /* Important note : > 369 * We need to drop all passive discoveries. > 370 * The LSAP management of IrComm is deficient and doesn't deal > 371 * with the case of two instance connecting to each other > 372 * simultaneously (it will deadlock in LMP). > 373 * The proper fix would be to use the same technique as in IrNET, > 374 * to have one server socket and separate instances for the > 375 * connecting/connected socket. > 376 * The workaround is to drop passive discovery, which drastically > 377 * reduce the probability of this happening. > 378 * Jean II */ > 379 if(mode == DISCOVERY_PASSIVE) > 380 return; > 381 > 382 info.daddr = discovery->daddr; > 383 info.saddr = discovery->saddr; > 384 > ---> self = (struct ircomm_tty_cb *) priv; > ---> ircomm_tty_do_event(self, IRCOMM_TTY_DISCOVERY_INDICATION, > ---> NULL, &info); > ---> #if 0 > 385 /* FIXME. We have a locking problem on the hashbin here. > 386 * We probably need to use hashbin_find_next(), but we first > 387 * need to ensure that "line" is unique. - Jean II */ > 388 self = (struct ircomm_tty_cb *) hashbin_get_first(ircomm_tty); > 389 while (self != NULL) { > 390 IRDA_ASSERT(self->magic == IRCOMM_TTY_MAGIC, return;); > 391 > 392 ircomm_tty_do_event(self, IRCOMM_TTY_DISCOVERY_INDICATION, > 393 NULL, &info); > 394 > 395 self = (struct ircomm_tty_cb *) > hashbin_get_next(ircomm_tty); > 396 } > ---> #endif > 397 } > > Thanks for any info on why the current function is implemented the way > it is and if my change is reasonable or not. > > -- Ryan > > ------------------------------------------------------------------------- > This SF.net email is sponsored by DB2 Express > Download DB2 Express C - the FREE version of DB2 express and take > control of your XML. No limits. Just data. Click to get it now. > http://sourceforge.net/powerbar/db2/ > _______________________________________________ > irda-users mailing list > ird...@li... > http://lists.sourceforge.net/lists/listinfo/irda-users |
From: Ryan R. <ry...@gm...> - 2007-05-28 21:18:07
|
Samuel, Thanks for the response. I have application code that essentially does the following (this is pseudo-code): fd = open( "/dev/ircomm0", O_NONBLOCK ); if( fd != -1 ) { if( select( fd, 5_sec ) ) /* Wait for port to be ready for data */ return SUCCESS; } fd = open( "/dev/ircomm1", O_NONBLOCK ); if( fd != -1 ) { if( select( fd, 5_sec ) ) /* Wait for port to be ready for data */ return SUCCESS; } ... and so on. As long as this code always finds the device on /dev/ircomm0, then this code works fine. If /dev/ircomm0 can't be opened for whatever reason, /dev/ircomm1 will be opened as well. This creates 2 entries in the ircomm_tty hashbin. If there is only one entry in the hashbin, everything works without problem. As soon as there are two entries in the hashbin, this code stops working and it won't work again until ircomm is unloaded and reloaded. If there are two entries, the open() call will succeed, but the select() call will timeout. At the protocol level, the CONN_CMD is sent twice. I believe the CONN_CMDs have different SLSAPs. A response to the first CONN_CMD is received *after* the second CONN_CMD is sent, but then a disconnect response is sent and the connection is never actually established. The only difference I could see between the working case and the broken case was the two CONN_CMD being sent rather than just one. After digging into the code, I found the multiple CONN_CMD were being sent b/c of the ircomm_tty_discovery_indication() implementation. I changed the implementation to what I have below, and the problems I was having seem to go away. I still have more testing to do over the next week or so, but I wanted to get a sanity check from some experts because I'm obviously speculating on the validity of the patch. -- Ryan On 5/27/07, Samuel Ortiz <sa...@so...> wrote: > Hi Ryan, > > On Thu, May 24, 2007 at 09:54:45PM -0400, Ryan Reading wrote: > > All, > > > > Kernel version: 2.6.19 > > > > I've been having some connection issues with IrCOMM. It seems that a > > new ircomm_tty_cb object is created and cached in a hashbin every time > > a new /dev/ircomm? device is open. I start seeing problems after more > > than one ircomm device node has been opened by the application (i.e. > > /dev/ircomm0 and /dev/ircomm1). After some analysis of the problem it > > seems that I if I update the implementation of > > ircomm_tty_attach.c:ircomm_tty_discovery_indication() to just use the > > object passed back to the callback via the priv variable, the problems > > seem to go away. > > > > Is there a reason that this callback loops through the objects in the > > hash_bin and processes the IRCOMM_TTY_DISCOVERY_INDICATION event for > > each object rather than just using the structure passed into the > > callback? It doesn't seem that the search is necessary, but I > > certainly don't understand the code well enough to assert that. > Well, it seems to me that you're right. However, could you describe more > precisely the problem you get with the current code ? > > Cheers, > Samuel. > > > > Here is basically what I have done. (Sorry, not at my Linux box right > > now, so I don't have a diff.) Can somebody tell me what the drawbacks > > of doing this would be? > > > > 352 /* > > 353 * Function ircomm_tty_discovery_indication (discovery) > > 354 * > > 355 * Remote device is discovered, try query the remote IAS to see which > > 356 * device it is, and which services it has. > > 357 * > > 358 */ > > 359 static void ircomm_tty_discovery_indication(discinfo_t *discovery, > > 360 DISCOVERY_MODE mode, > > 361 void *priv) > > 362 { > > 363 struct ircomm_tty_cb *self; > > 364 struct ircomm_tty_info info; > > 365 > > 366 IRDA_DEBUG(2, "%s()\n", __FUNCTION__ ); > > 367 > > 368 /* Important note : > > 369 * We need to drop all passive discoveries. > > 370 * The LSAP management of IrComm is deficient and doesn't deal > > 371 * with the case of two instance connecting to each other > > 372 * simultaneously (it will deadlock in LMP). > > 373 * The proper fix would be to use the same technique as in IrNET, > > 374 * to have one server socket and separate instances for the > > 375 * connecting/connected socket. > > 376 * The workaround is to drop passive discovery, which drastically > > 377 * reduce the probability of this happening. > > 378 * Jean II */ > > 379 if(mode == DISCOVERY_PASSIVE) > > 380 return; > > 381 > > 382 info.daddr = discovery->daddr; > > 383 info.saddr = discovery->saddr; > > 384 > > ---> self = (struct ircomm_tty_cb *) priv; > > ---> ircomm_tty_do_event(self, IRCOMM_TTY_DISCOVERY_INDICATION, > > ---> NULL, &info); > > ---> #if 0 > > 385 /* FIXME. We have a locking problem on the hashbin here. > > 386 * We probably need to use hashbin_find_next(), but we first > > 387 * need to ensure that "line" is unique. - Jean II */ > > 388 self = (struct ircomm_tty_cb *) hashbin_get_first(ircomm_tty); > > 389 while (self != NULL) { > > 390 IRDA_ASSERT(self->magic == IRCOMM_TTY_MAGIC, return;); > > 391 > > 392 ircomm_tty_do_event(self, IRCOMM_TTY_DISCOVERY_INDICATION, > > 393 NULL, &info); > > 394 > > 395 self = (struct ircomm_tty_cb *) > > hashbin_get_next(ircomm_tty); > > 396 } > > ---> #endif > > 397 } > > > > Thanks for any info on why the current function is implemented the way > > it is and if my change is reasonable or not. > > > > -- Ryan > > > > ------------------------------------------------------------------------- > > This SF.net email is sponsored by DB2 Express > > Download DB2 Express C - the FREE version of DB2 express and take > > control of your XML. No limits. Just data. Click to get it now. > > http://sourceforge.net/powerbar/db2/ > > _______________________________________________ > > irda-users mailing list > > ird...@li... > > http://lists.sourceforge.net/lists/listinfo/irda-users > |
From: Samuel O. <sa...@so...> - 2007-06-02 00:56:44
|
On Mon, May 28, 2007 at 05:18:08PM -0400, Ryan Reading wrote: > If there is only one entry in the hashbin, everything works without > problem. As soon as there are two entries in the hashbin, this code > stops working and it won't work again until ircomm is unloaded and > reloaded. > > If there are two entries, the open() call will succeed, but the > select() call will timeout. You mean the 2nd open(), right ? > At the protocol level, the CONN_CMD is > sent twice. I believe the CONN_CMDs have different SLSAPs. I believe this is an IAS connection request, since this is what the 2 IrCOMM instances are supposed to do after having been notified of a discovery event: checking for the peer's IrCOMM support. > The only difference I could see between the working case and the > broken case was the two CONN_CMD being sent rather than just one. > After digging into the code, I found the multiple CONN_CMD were being > sent b/c of the ircomm_tty_discovery_indication() implementation. I > changed the implementation to what I have below, and the problems I > was having seem to go away. I quickly tried to reproduce this bug with 2 IrDA dongles, and only one of them seeing an IrCOMM capable phone. As expected both of them were sending the IAS request, but I didn't see any further issues. I will try again this weekend, but meanwhile could you do the following on your side: 1) echo 5 > /proc/sys/net/irda/debug 2) Run your application until you hit the failing condition 3) echo 0 > /proc/sys/net/irda/debug 4) Send me the corresponding dmesg output. > I still have more testing to do over the > next week or so, but I wanted to get a sanity check from some experts > because I'm obviously speculating on the validity of the patch. The patch looks OK to me. The current code looks valid as well, even though not optimal. So, I'm trying to understand why it fails on you. Cheers, Samuel. |
From: Samuel O. <sa...@so...> - 2007-07-01 22:34:01
|
Hi Ryan, On Sat, Jun 02, 2007 at 03:56:39AM +0300, Samuel Ortiz wrote: > > I still have more testing to do over the > > next week or so, but I wanted to get a sanity check from some experts > > because I'm obviously speculating on the validity of the patch. > The patch looks OK to me. > The current code looks valid as well, even though not optimal. So, I'm > trying to understand why it fails on you. Have you got more time to test your patch ? It works fine for me, but I'd like to double check with you before pushing it upstream. Cheers, Samuel. > Cheers, > Samuel. > > > ------------------------------------------------------------------------- > This SF.net email is sponsored by DB2 Express > Download DB2 Express C - the FREE version of DB2 express and take > control of your XML. No limits. Just data. Click to get it now. > http://sourceforge.net/powerbar/db2/ > _______________________________________________ > irda-users mailing list > ird...@li... > http://lists.sourceforge.net/lists/listinfo/irda-users |
From: Ryan R. <ry...@gm...> - 2007-07-03 01:04:16
|
Samuel, I've done ad hoc testing and have been using it day to day and it has worked very well for me. I wanted to send debug traces back to the list but have been swamped at work. When I get time I'll try and do some more formal testing. Thanks! -- Ryan On 7/1/07, Samuel Ortiz <sa...@so...> wrote: > Hi Ryan, > > On Sat, Jun 02, 2007 at 03:56:39AM +0300, Samuel Ortiz wrote: > > > I still have more testing to do over the > > > next week or so, but I wanted to get a sanity check from some experts > > > because I'm obviously speculating on the validity of the patch. > > The patch looks OK to me. > > The current code looks valid as well, even though not optimal. So, I'm > > trying to understand why it fails on you. > Have you got more time to test your patch ? > It works fine for me, but I'd like to double check with you before > pushing it upstream. > > Cheers, > Samuel. > > > > Cheers, > > Samuel. > > > > > > ------------------------------------------------------------------------- > > This SF.net email is sponsored by DB2 Express > > Download DB2 Express C - the FREE version of DB2 express and take > > control of your XML. No limits. Just data. Click to get it now. > > http://sourceforge.net/powerbar/db2/ > > _______________________________________________ > > irda-users mailing list > > ird...@li... > > http://lists.sourceforge.net/lists/listinfo/irda-users > |