From: Tziporet K. <tzi...@me...> - 2004-05-31 07:02:25
Hi,

The reason you get this error is not related to any spinlocks. For all privileged verbs, the THH driver checks that the process is not in interrupt context (since we put the process to sleep, and this is forbidden in interrupt context).

We use the function in_interrupt() to decide whether the process is in interrupt context or not. Note that this function also returns TRUE for tasklets.

So it seems you are calling memory registration from a tasklet or an interrupt handler. What you can do is call this function before calling ib_reg_mem() and see what context the process is in.

Tziporet

-----Original Message-----
From: er...@te... [mailto:er...@te...]
Sent: Monday, May 31, 2004 8:54 AM
To: inf...@li...
Subject: [Infiniband-access_layer] ib_reg_mem fails after a few times

Hi,

I'm using ib_reg_mem in order to register memory. It works fine 18 times. The next time, I get the following errors:

May 30 18:14:37 psl-178 kernel: TVPD(1): THH_hob_register_mr: NOT IN TASK CONTEXT)
May 30 18:14:37 psl-178 kernel: TVPD(1): kvp_mlx_mrw.c[493]: kvp_mlnx_register_mr: Could not register the MR, reason: (-255: HH_ERR)
May 30 18:14:37 psl-178 kernel: srpl_queue_req() !ERROR!: ib_reg_mem() failed!, status = 0x2a

This call is done in a synchronous context, and I'm not holding any spinlocks. Also, I'm deregistering the memory successfully.
From: <er...@te...> - 2004-05-31 05:54:02
Hi,

I'm using ib_reg_mem in order to register memory. It works fine 18 times. The next time, I get the following errors:

May 30 18:14:37 psl-178 kernel: TVPD(1): THH_hob_register_mr: NOT IN TASK CONTEXT)
May 30 18:14:37 psl-178 kernel: TVPD(1): kvp_mlx_mrw.c[493]: kvp_mlnx_register_mr: Could not register the MR, reason: (-255: HH_ERR)
May 30 18:14:37 psl-178 kernel: srpl_queue_req() !ERROR!: ib_reg_mem() failed!, status = 0x2a

This call is done in a synchronous context, and I'm not holding any spinlocks. Also, I'm deregistering the memory successfully.
From: <er...@te...> - 2004-05-27 15:27:55
So, how can I deregister this memory? I understand that I cannot do that while holding a spinlock. You said that "migrating the deregistration call to be asynchronous wouldn't be difficult". What do you mean by that, and how can I do that?

BTW, when the SRP driver was released, was it possible to release memory while holding a spinlock? That may explain the current problem.

Quoting "Hefty, Sean" <sea...@in...>:

> >I'm trying to release the memory from a CQ callback function. This command
> >runs after running cl_spinlock_acquire.
> >However, this code was released as a part of the SF project code (SRP), so
> >I assumed that it should work fine...
>
> That call should not have been made from a CQ callback. Memory
> registration/deregistration needs to be done without holding a spinlock.
>
> Note that this is a general issue with the Mellanox HCA, and the reason
> that most of the IBAL destruction APIs work asynchronously is to allow
> destroy calls to be made from any callback. We made a design decision
> not to use asynchronous memory deregistration for security purposes, but
> migrating the deregistration call to be asynchronous wouldn't be
> difficult.
From: Hefty, S. <sea...@in...> - 2004-05-27 15:09:00
> I'm trying to release the memory from a CQ callback function. This command
> runs after running cl_spinlock_acquire.
> However, this code was released as a part of the SF project code (SRP), so
> I assumed that it should work fine...

That call should not have been made from a CQ callback. Memory registration/deregistration needs to be done without holding a spinlock.

Note that this is a general issue with the Mellanox HCA, and the reason that most of the IBAL destruction APIs work asynchronously is to allow destroy calls to be made from any callback. We made a design decision not to use asynchronous memory deregistration for security purposes, but migrating the deregistration call to be asynchronous wouldn't be difficult.
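The idea of making deregistration asynchronous can be illustrated with a user-space analogue. This is a minimal sketch under stated assumptions, not IBAL code: the callback only queues the memory region and returns, and a worker thread, running in a context that is allowed to block, performs the actual deregistration with no lock held. All names (dereg_queue_t, queue_dereg, blocking_dereg, the integer mr_id) are hypothetical; blocking_dereg merely stands in for ib_dereg_mr(). In the kernel, a workqueue item would be the natural equivalent of the worker thread.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical pending-deregistration entry; mr_id stands in for an MR handle. */
typedef struct dereg_item {
    int mr_id;
    struct dereg_item *next;
} dereg_item_t;

typedef struct {
    pthread_mutex_t lock;        /* protects head and stopping; held only briefly */
    pthread_cond_t  cond;        /* wakes the worker when work arrives */
    dereg_item_t   *head;        /* pending deregistrations */
    bool            stopping;
    int             dereg_count; /* touched only by the worker thread */
    pthread_t       worker;
} dereg_queue_t;

/* Stand-in for the blocking ib_dereg_mr() call: must run in thread context. */
static void blocking_dereg(dereg_queue_t *q, int mr_id)
{
    (void)mr_id;
    q->dereg_count++;
}

/* Worker: pops one item at a time and deregisters it with no lock held. */
static void *dereg_worker(void *arg)
{
    dereg_queue_t *q = arg;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->head == NULL && !q->stopping)
            pthread_cond_wait(&q->cond, &q->lock);
        if (q->head == NULL)     /* stopping and fully drained */
            break;
        dereg_item_t *item = q->head;
        q->head = item->next;
        pthread_mutex_unlock(&q->lock);
        blocking_dereg(q, item->mr_id);
        free(item);
        pthread_mutex_lock(&q->lock);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Called from the completion callback: queues the work and never blocks on
 * the HCA. (A kernel version would also avoid a sleeping allocation here.) */
static int queue_dereg(dereg_queue_t *q, int mr_id)
{
    dereg_item_t *item = malloc(sizeof *item);
    if (item == NULL)
        return -1;
    item->mr_id = mr_id;
    pthread_mutex_lock(&q->lock);
    item->next = q->head;
    q->head = item;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

static int dereg_queue_start(dereg_queue_t *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->head = NULL;
    q->stopping = false;
    q->dereg_count = 0;
    return pthread_create(&q->worker, NULL, dereg_worker, q);
}

/* Stops the worker after it has drained every queued deregistration. */
static void dereg_queue_stop(dereg_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    q->stopping = true;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->worker, NULL);
    pthread_cond_destroy(&q->cond);
    pthread_mutex_destroy(&q->lock);
}
```

The trade-off Hefty mentions still applies: until the worker runs, the region remains registered, which is presumably the security concern behind the original synchronous design.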
From: Zilber E. <er...@te...> - 2004-05-27 14:59:19
I'm trying to release the memory from a CQ callback function. This command runs after running cl_spinlock_acquire. However, this code was released as a part of the SF project code (SRP), so I assumed that it should work fine...

On Thu, 27 May 2004, Fab Tillier wrote:

> You are probably holding a spinlock or are running in a tasklet context.
> Memory registration and deregistration has to be done in a thread context
> that is able to block. This is a limitation of the Mellanox HCA.
>
> - Fab
>
> > -----Original Message-----
> > From: er...@te... [mailto:er...@te...]
> > Sent: Thursday, May 27, 2004 5:55 AM
> > To: inf...@li...
> > Subject: [Infiniband-access_layer] ib_dereg_mr fails
> >
> > Hi,
> >
> > I'm registering memory using ib_reg_mem (in order to allow another
> > application to perform RDMA write). Later, I want to free this memory
> > using ib_dereg_mr. I get the following errors:
> >
> > TVPD(1): THH_hob_deregister_mr: NOT IN TASK CONTEXT)
> > May 27 15:43:20 psl-178 kernel:TVPD(1): kvp_mlx_mrw.c[609]:
> > kvp_mlnx_deregister_mr: THH_hob_deregister_mr error (-255: HH_ERR)
> > May 27 15:43:20 psl-178 kernel: srpl_retire() !ERROR!: ib_dereg_mr()
> > failed ! status = 0x2a
> >
> > Why?
> >
> > Thanks
> > Erez
From: Fab T. <fti...@in...> - 2004-05-27 14:45:25
You are probably holding a spinlock or are running in a tasklet context. Memory registration and deregistration have to be done in a thread context that is able to block. This is a limitation of the Mellanox HCA.

- Fab

> -----Original Message-----
> From: er...@te... [mailto:er...@te...]
> Sent: Thursday, May 27, 2004 5:55 AM
> To: inf...@li...
> Subject: [Infiniband-access_layer] ib_dereg_mr fails
>
> Hi,
>
> I'm registering memory using ib_reg_mem (in order to allow another
> application to perform RDMA write). Later, I want to free this memory
> using ib_dereg_mr. I get the following errors:
>
> TVPD(1): THH_hob_deregister_mr: NOT IN TASK CONTEXT)
> May 27 15:43:20 psl-178 kernel: TVPD(1): kvp_mlx_mrw.c[609]:
> kvp_mlnx_deregister_mr: THH_hob_deregister_mr error (-255: HH_ERR)
> May 27 15:43:20 psl-178 kernel: srpl_retire() !ERROR!: ib_dereg_mr()
> failed ! status = 0x2a
>
> Why?
>
> Thanks
> Erez
From: <er...@te...> - 2004-05-27 12:55:28
Hi,

I'm registering memory using ib_reg_mem (in order to allow another application to perform RDMA writes). Later, I want to free this memory using ib_dereg_mr. I get the following errors:

TVPD(1): THH_hob_deregister_mr: NOT IN TASK CONTEXT)
May 27 15:43:20 psl-178 kernel: TVPD(1): kvp_mlx_mrw.c[609]: kvp_mlnx_deregister_mr: THH_hob_deregister_mr error (-255: HH_ERR)
May 27 15:43:20 psl-178 kernel: srpl_retire() !ERROR!: ib_dereg_mr() failed ! status = 0x2a

Why?

Thanks
Erez
From: Tillier, F. <fti...@in...> - 2004-05-21 22:11:01
That means you are specifying too large an outstanding RDMA read limit. The limit seems to be 5; the responder is probably giving too high a value, so when the CM tries to configure the QP based on the REP data it received, the modify QP fails.

This is just a guess from the information you have provided. If you can, find out what the passive side is sending in the REP.

- Fab

> -----Original Message-----
> From: er...@te... [mailto:er...@te...]
> Sent: Friday, May 21, 2004 1:03 PM
> To: inf...@li...
> Subject: [Infiniband-access_layer] Error in tqpm.c when using the CM
>
> Hi,
>
> When using the CM (as the active side), I get the following error:
> May 20 18:01:06 psl-178 kernel: THH(1): tqpm.c[1231]: Error rra_max=0x5 > QPM's log2_max=0x2, attr_p->qp_ous_rd_atom = 0x18
>
> Any idea?
>
> Thanks
> Erez
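One way to read the numbers in the tqpm.c log line, offered only as an assumption: if rra_max is the base-2 logarithm of the requested depth rounded up, then qp_ous_rd_atom = 0x18 (24) gives ceil(log2(24)) = 5, which exceeds the QP manager's log2_max of 2 (that is, at most 4 outstanding reads). The sketch below shows that arithmetic and a hypothetical clamp the active side could apply to the value taken from the REP before modifying the QP. ceil_log2 and clamp_rd_atom are illustrative helpers, not THH or CM functions.

```c
#include <stdint.h>

/* Smallest n such that (1 << n) >= value; returns 0 for value <= 1. */
static unsigned ceil_log2(uint32_t value)
{
    unsigned n = 0;
    while (n < 32 && (UINT32_C(1) << n) < value)
        n++;
    return n;
}

/* Hypothetical clamp: limit the outstanding RDMA read depth received from the
 * peer to what the local QP supports, given as a log2 limit (assumed < 32). */
static uint32_t clamp_rd_atom(uint32_t requested, unsigned local_log2_max)
{
    uint32_t local_max = UINT32_C(1) << local_log2_max;
    return requested > local_max ? local_max : requested;
}
```

Under this reading, ceil_log2(0x18) is 5 and clamp_rd_atom(0x18, 2) is 4. Whether clamping is acceptable, rather than rejecting the connection, depends on what the passive side expects.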
From: <er...@te...> - 2004-05-21 20:02:53
Hi,

When using the CM (as the active side), I get the following error:

May 20 18:01:06 psl-178 kernel: THH(1): tqpm.c[1231]: Error rra_max=0x5 > QPM's log2_max=0x2, attr_p->qp_ous_rd_atom = 0x18

Any idea?

Thanks
Erez
From: Hefty, S. <sea...@in...> - 2004-05-19 00:04:48
> You mean no more changes to the svn openib tree during this time? If
> so, why?

No... we're only asking to delay patches into the SourceForge tree.
From: Greg KH <gr...@kr...> - 2004-05-18 23:55:00
On Tue, May 18, 2004 at 03:55:26PM -0700, Hefty, Sean wrote:
> We are now ready to begin the integration process of the SourceForge
> Infiniband Access Layer into openib. As part of this process, we will
> start taking all necessary steps to prepare the code for consideration
> by the Linux kernel development community. (E.g. reformatting the code,
> integrating complib, removing OS abstractions, etc.) While this is
> occurring, we would like to ask that all non-critical patches be delayed
> until these changes are complete and then recreated using the updated
> code base.

You mean no more changes to the svn openib tree during this time? If so, why?

thanks,

greg k-h
From: Ashok R. <ash...@in...> - 2004-05-18 23:44:38
On Tue, May 18, 2004 at 03:55:26PM -0700, Hefty, Sean wrote:
> We are now ready to begin the integration process of the SourceForge
> Infiniband Access Layer into openib. As part of this process, we will
> start taking all necessary steps to prepare the code for consideration
> by the Linux kernel development community. (E.g. reformatting the
> code, integrating complib, removing OS abstractions, etc.) While this
> is occurring, we would like to ask that all non-critical patches be
> delayed until these changes are complete and then recreated using the
> updated code base.

One suggestion: when you do these, please do them in consolidated groups of changes.

- Fix formatting and check it in as a separate changeset
- Fix the complib dependency as a separate changeset
- etc.

That way you can avoid looking at one huge change that looks entirely different.

On complib: I think for IB code, if the group's interest is portability, you would want to retain that. But what might appease the lkml community (if not completely) is not exporting it outside of the IBAL infrastructure code. So if we don't export new spinlock or other abstractions outside of IBAL, that may be acceptable (???). The only downside is that other channel drivers such as SRP/IPoIB would be completely Linux native, which is probably OK? I guess the SRP code may be able to share a big chunk of code among other implementations; I'm not sure exactly what the best-of-breed SRP looks like today.

Hopefully you will do this on top of version 2.6.6 or greater. If so, complib has some abstractions, such as creating a single kernel thread or creating a pool of threads, which should be changed to use the native 2.6 mechanisms instead. For example, use create_workqueue(), create_singlethread_workqueue(), etc., depending on use; create_workqueue() can automatically handle creating a thread on each CPU. There is also code for handling hotplug CPUs that can scale the kernel threads when a new CPU arrives.

As far as possible, use the system service threads already in place, without creating too many new ones...

... my 2c worth...

Cheers,
ashok
From: Hefty, S. <sea...@in...> - 2004-05-18 23:37:27
> About how long do you expect this process of code re-formatting to take?

I'm hoping to have it completed by the end of the week. I will need to spend some time testing the code after running Lindent, just to make sure that nothing broke.
From: Woodruff, R. J <wo...@co...> - 2004-05-18 23:19:08
Sean Hefty wrote:

>We are now ready to begin the integration process of the SourceForge Infiniband Access
>Layer into openib. As part of this process, we will start taking all necessary steps to
>prepare the code for consideration by the Linux kernel development community. (E.g.
>reformatting the code, integrating complib, removing OS abstractions, etc.) While this is
>occurring, we would like to ask that all non-critical patches be delayed until these
>changes are complete and then recreated using the updated code base.
>Thanks!
>- Sean

Great. Glad to see we are finally starting to get the code into shape such that it follows the Linux coding style.

About how long do you expect this process of code re-formatting to take?

I think the first step is to just run the code through Lindent. Perhaps after that we can get it put into Subversion on the openib.org site.

Then, we need to figure out how to collapse the complib into IBAL, since the feedback we got from Greg and others is that they do not like us having a separate "abstraction" layer, but I suggest we wait till we get it into openib.org before making those mods.

woody
From: Hefty, S. <sea...@in...> - 2004-05-18 22:55:39
We are now ready to begin the integration process of the SourceForge Infiniband Access Layer into openib. As part of this process, we will start taking all necessary steps to prepare the code for consideration by the Linux kernel development community. (E.g. reformatting the code, integrating complib, removing OS abstractions, etc.) While this is occurring, we would like to ask that all non-critical patches be delayed until these changes are complete and then recreated using the updated code base.

Thanks!
- Sean
From: Hefty, S. <sea...@in...> - 2004-05-11 16:31:35
Greetings,

As part of the effort to merge IBAL with openib, we will be modifying a couple of the IBAL APIs. Specifically, all virtual addresses will be converted from void* to uint64_t. This affects the physical and shared memory registration routines. As a result of this change, existing hardware drivers will need to be modified and recompiled. The modifications are fairly simple and mainly require removing typecasts from the existing code. If anyone has any questions or comments, please let us know.

This fixes bugs 932966 and 932967.

Thanks,
- Sean
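A minimal sketch of what carrying a virtual address as uint64_t might look like on the caller's side. phys_reg_req_t, ptr_to_vaddr and vaddr_to_ptr are hypothetical names, not IBAL's actual structures or helpers. Converting through uintptr_t keeps the pointer-to-integer conversion well defined on both 32-bit and 64-bit builds.

```c
#include <stdint.h>

/* Hypothetical registration request: the virtual address is now a
 * 64-bit integer field rather than a void*. */
typedef struct {
    uint64_t vaddr;   /* previously: void *vaddr */
    uint64_t length;
} phys_reg_req_t;

/* Pointer -> 64-bit address, via uintptr_t so the cast is well defined. */
static inline uint64_t ptr_to_vaddr(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* 64-bit address -> pointer, for code that must dereference it locally. */
static inline void *vaddr_to_ptr(uint64_t vaddr)
{
    return (void *)(uintptr_t)vaddr;
}
```

With a field like this, code that previously cast an integer address to void* before passing it can usually pass the value unchanged, which matches the "mainly removing typecasts" description above.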
From: Tillier, F. <fti...@in...> - 2004-03-26 00:52:58
> -----Original Message-----
> From: Hefty, Sean [mailto:sea...@in...]
> Sent: Thursday, March 25, 2004 4:32 PM
>
> > First up are issues with ib_reg_shared:
>
> ib_reg_shared is not intended to be used across multiple processes. It
> is intended to implement the register shared memory region (11.2.7.7).

Nothing in the spec (11.2.7.7) mentions that it is not intended to be used across multiple processes. In fact, it specifically allows the protection domains to be different. The only requirement is that the underlying HCA be the same. While IBAL's implementation of ib_reg_shared might not have been intended for use across multiple processes, I'm suggesting changing that.

>
> > Second issue is SHMID management, which seems to exist mainly to support
> > DAPL. The current ib_create_shmid and ib_reg_shmid don't quite meet the
> > DAPL usage model: DAPL doesn't want to have a master/slave relationship -
> > there should not be any ordering requirement. I would like to propose two
> > changes. The first would merge ib_create_shmid and ib_reg_shmid into a
> > single call ib_reg_shmid that would figure out itself if it was the first
> > call or a subsequent call. ib_reg_shmid would then need to track the page
> > mappings so that use on multiple HCAs would be supported. The shmid object
> > would not be associated with any particular HCA, and would have a per-HCA
> > list of memory regions to determine whether to call ib_reg_mem or
> > ib_reg_shared (based on whether existing memory region handles are available
> > for the target HCA). The shmid object would then auto-destroy when the last
> > memory region associated with it is deregistered.
>
> Shmid was not designed specifically for DAPL.

Fine, but DAPL is the primary (only?) user.

> Shmid is just broken at this point. It needs to be addressed, but has not
> been a priority. I believe that correct support for shmid requires verb
> support from the HCA driver.

I don't see how shmid will ever work without getting ib_reg_shared to work as I suggested. Maybe that's the additional verb support you talk about? The shmid should not depend on verb support, as it should work across multiple HCAs.

>
> My understanding is that an integer identifier is the mechanism provided
> by Linux to share memory across processes. See shmget().

How Linux represents memory that is shared between processes does not have to correspond 1:1 to how IBAL represents SHMIDs. So while there is a 1:1 relationship between the integer returned by shmget and the underlying physical pages, IBAL needs to provide a 1:1 relationship between these underlying pages and some other ID. IBAL's ID does not have to be the same as shmget's ID, and making IBAL's ID support DAPL's shared memory cookie will allow elimination of DAPL's kernel-mode MRDB doohickey.

- Fab
From: Hefty, S. <sea...@in...> - 2004-03-26 00:33:16
> I'd like to propose some changes in the shared memory and shmid management
> in IBAL to better support user-mode applications.
>
> First up are issues with ib_reg_shared:

ib_reg_shared is not intended to be used across multiple processes. It is intended to implement the register shared memory region (11.2.7.7).

> Second issue is SHMID management, which seems to exist mainly to support
> DAPL. The current ib_create_shmid and ib_reg_shmid don't quite meet the
> DAPL usage model: DAPL doesn't want to have a master/slave relationship -
> there should not be any ordering requirement. I would like to propose two
> changes. The first would merge ib_create_shmid and ib_reg_shmid into a
> single call ib_reg_shmid that would figure out itself if it was the first
> call or a subsequent call. ib_reg_shmid would then need to track the page
> mappings so that use on multiple HCAs would be supported. The shmid object
> would not be associated with any particular HCA, and would have a per-HCA
> list of memory regions to determine whether to call ib_reg_mem or
> ib_reg_shared (based on whether existing memory region handles are available
> for the target HCA). The shmid object would then auto-destroy when the last
> memory region associated with it is deregistered.

Shmid was not designed specifically for DAPL. Shmid is just broken at this point. It needs to be addressed, but has not been a priority. I believe that correct support for shmid requires verb support from the HCA driver.

> Further, I believe it would be beneficial to extend the shmid identifier
> from an integer to a byte array. This would remove the need for DAPL to
> manage cookie-to-shmid relationships, and remove the need for DAPL to have a
> kernel agent. Additionally, it would help reduce the potential for ID
> namespace collision between different applications by allowing a more
> natural identifier to be specified.

My understanding is that an integer identifier is the mechanism provided by Linux to share memory across processes. See shmget().
From: Fab T. <fti...@in...> - 2004-03-26 00:06:22
Folks,

I'd like to propose some changes in the shared memory and shmid management in IBAL to better support user-mode applications.

First up are issues with ib_reg_shared:

- The ib_reg_shared entry point for user mode will fail if the input h_mr is not within the same process context as the registration. This prevents ib_reg_shared from working across multiple processes. This is a kernel proxy issue in IBAL. There are two options that I can think of to solve this. The first would require keeping a global map of all memory regions so that the input h_mr could be found, validated, and reference counted for the ib_reg_shared call. This would allow a user-mode process to call ib_reg_shared against a kernel-mode h_mr. The second option is to keep a proxy-global map of all memory handles that would behave in the same way, but would prevent a user-mode app from calling ib_reg_shared with a kernel h_mr.

- ib_reg_shared needs to check that the page mappings match between the input h_mr and the requested vaddr before succeeding the call. Note, however, that this check should be performed by the HCA driver, and I don't believe that the Mellanox VPD for IBAL does this today. IBAL must also check that the input h_mr exists on the HCA represented by the input h_pd.

Second issue is SHMID management, which seems to exist mainly to support DAPL. The current ib_create_shmid and ib_reg_shmid don't quite meet the DAPL usage model: DAPL doesn't want to have a master/slave relationship, and there should not be any ordering requirement. I would like to propose two changes. The first would merge ib_create_shmid and ib_reg_shmid into a single call, ib_reg_shmid, that would figure out by itself whether it was the first call or a subsequent call. ib_reg_shmid would then need to track the page mappings so that use on multiple HCAs would be supported. The shmid object would not be associated with any particular HCA, and would have a per-HCA list of memory regions to determine whether to call ib_reg_mem or ib_reg_shared (based on whether existing memory region handles are available for the target HCA). The shmid object would then auto-destroy when the last memory region associated with it is deregistered.

Further, I believe it would be beneficial to extend the shmid identifier from an integer to a byte array. This would remove the need for DAPL to manage cookie-to-shmid relationships, and remove the need for DAPL to have a kernel agent. Additionally, it would help reduce the potential for ID namespace collision between different applications by allowing a more natural identifier to be specified.

Thoughts? Comments?

- Fab
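The bookkeeping in the SHMID proposal can be sketched as follows. This is a single-process, single-threaded illustration under stated assumptions: every name (shmid_find_or_create, shmid_add_mr, shmid_remove_mr, the fixed-size table and limits) is hypothetical, the proposal's per-HCA list of memory regions is reduced to a per-HCA count, and the page-mapping checks and real verb calls are omitted. It shows only the three ideas: a merged create/register with no ordering requirement, a byte-array key, and self-destruction with the last memory region.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SHMID_KEY_MAX 64  /* hypothetical maximum key length in bytes */
#define MAX_HCAS      4   /* hypothetical number of HCAs tracked */
#define MAX_SHMIDS    8   /* hypothetical size of the shmid table */

typedef enum { REG_MEM, REG_SHARED } reg_kind_t;

typedef struct {
    unsigned char key[SHMID_KEY_MAX]; /* byte-array identifier, not an int */
    size_t        key_len;
    int           slot;               /* index in shmid_table */
    int           mr_count[MAX_HCAS]; /* live memory regions per HCA */
    int           total_mrs;          /* live memory regions on all HCAs */
} shmid_t;

static shmid_t *shmid_table[MAX_SHMIDS];

/* Merged create/register lookup: whichever caller uses a key first creates
 * the shmid; later callers get the same object, in any order. */
static shmid_t *shmid_find_or_create(const void *key, size_t key_len)
{
    int free_slot = -1;

    if (key_len == 0 || key_len > SHMID_KEY_MAX)
        return NULL;
    for (int i = 0; i < MAX_SHMIDS; i++) {
        shmid_t *cur = shmid_table[i];
        if (cur == NULL) {
            if (free_slot < 0)
                free_slot = i;
        } else if (cur->key_len == key_len &&
                   memcmp(cur->key, key, key_len) == 0) {
            return cur;
        }
    }
    if (free_slot < 0)
        return NULL;

    shmid_t *s = calloc(1, sizeof *s);
    if (s == NULL)
        return NULL;
    memcpy(s->key, key, key_len);
    s->key_len = key_len;
    s->slot = free_slot;
    shmid_table[free_slot] = s;
    return s;
}

/* Record a new region on hca (0 <= hca < MAX_HCAS) and report which verb to
 * use: the first region on an HCA is a full registration, later ones shared. */
static reg_kind_t shmid_add_mr(shmid_t *s, int hca)
{
    reg_kind_t kind = (s->mr_count[hca] == 0) ? REG_MEM : REG_SHARED;
    s->mr_count[hca]++;
    s->total_mrs++;
    return kind;
}

/* Drop one region; the shmid frees itself with its last region.
 * Returns true if the shmid was destroyed. */
static bool shmid_remove_mr(shmid_t *s, int hca)
{
    s->mr_count[hca]--;
    if (--s->total_mrs > 0)
        return false;
    shmid_table[s->slot] = NULL;
    free(s);
    return true;
}
```

A real implementation would also need locking, cross-process lookup through the kernel proxy, and the page-mapping validation described in the first part of the proposal.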
From: Hefty, S. <sea...@in...> - 2004-02-26 21:59:28
It's taken us a while, but we do know of this issue and believe that we have identified its cause. The problem is that dlopen acquires a mutex internally and then tries to initialize ibal. The initialization of ibal spawns a thread for additional initialization, then waits on an event that is signaled by the spawned thread. Part of the processing done by the spawned thread is to call dlopen on the user-level VPD, and it does this before signaling the event mentioned above. This call to dlopen hangs, as the first thread is holding the internal mutex.

We have a possible fix for this that we are testing.

> -----Original Message-----
> From: inf...@li...
> [mailto:inf...@li...] On Behalf Of
> Hassan M. Jafri
> Sent: Tuesday, January 27, 2004 11:31 AM
> To: inf...@li...
> Subject: [Infiniband-access_layer] dlopen hanging with alllib
>
> Hardware:
> Intel IA-32 Xeon
> Mellanox a1 silicon HCA
>
> Software:
> SourceForge Alpha2 release for SDK 1.00 BK 1.163
> thca-x86-thca_1_0_release-build-011
>
> I have created a dynamically loadable library (call it ibal.so) that
> contains all the code which talks to the ibal layer. ibal.so is linked
> dynamically with allib and complib. I load my ibal.so library dynamically
> using dlopen in my program (call it "test"). The problem is that my
> program hangs when dlopen is issued. Appended below is the backtrace of
> the thread that seems to be hanging.
>
> This problem goes away when I link "test" with allib.so and complib.so. In
> that case, by the time dlopen is issued for ibal.so, complib and allib
> symbols are already in the address space of "test", and that somehow keeps
> the hang from occurring.
>
> **************************************************************************
> #0  0x401206a8 in sigsuspend () from /lib/libc.so.6
> #1  0x400adc28 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
> #2  0x400a9f9b in pthread_cond_wait@GLIBC_2.0 () from /lib/libpthread.so
> #3  0x40450814 in cl_event_wait_on () from /usr/lib/libcomplib.so.0.0
> #4  0x40446208 in create_al_mgr () from /usr/lib/liballib.so.0.0
> #5  0x40448641 in ual_init () from /usr/lib/liballib.so.0.0
> #6  0x404484fb in _init () from /usr/lib/liballib.so.0.0
> #7  0x4000c4b1 in _dl_init_internal () from /lib/ld-linux.so.2
> #8  0x4020af42 in dl_open_worker () from /lib/libc.so.6
> #9  0x4000c266 in _dl_catch_error_internal () from /lib/ld-linux.so.2
> #10 0x4020a9af in _dl_open () from /lib/libc.so.6
> #11 0x40081eeb in dlopen_doit () from /lib/libdl.so.2
> #12 0x4000c266 in _dl_catch_error_internal () from /lib/ld-linux.so.2
> #13 0x40081316 in _dlerror_run () from /lib/libdl.so.2
> #14 0x40081e92 in dlopen@GLIBC_2.0 () from /lib/libdl.so.2
> #15 0x40021db4 in VMI_Load_Device (info=0x80cc0d8, newDevice=0x80c9c78)
>     at vmidevmgr_utils.c:86
> #16 0x400218a2 in VMI_Device_Register (info=0x80cc0d8, device=0xbfffde64
>     at vmidevmgr.c:222
> #17 0x40025225 in VMI_XMLParser_Register (handle=0x80cc9a0)
>     at xmlparser.c:827
> #18 0x40023cda in VMI_Init_Subsystems (argc=1, argv=0xbfffdf54)
>     at vmicore_utils.c:167
> #19 0x4002396f in VMI_Init (argc=1, argv=0xbfffdf54) at vmi.c:55
> #20 0x0804a0e8 in main (argc=1, argv=0xbfffdf54) at bandwidth.c:949
> #21 0x4010d917 in __libc_start_main () from /lib/libc.so.6
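The backtrace shows the wait happening inside liballib's _init(), that is, while the dynamic loader is still running the library constructor. One common way to avoid this class of deadlock, offered as an assumption rather than the fix the IBAL team was testing, is to move the heavy initialization out of the constructor and run it lazily on the first API call with pthread_once(). By then dlopen() has returned and released its internal lock, so the initialization thread can call dlopen() on the VPD. al_do_init and al_ensure_init are hypothetical names.

```c
#include <pthread.h>

static pthread_once_t al_once = PTHREAD_ONCE_INIT;
static int al_init_calls;   /* how many times the heavy init actually ran */
static int al_init_status;  /* result of the heavy init; 0 == success here */

/* Heavy initialization: in the real library this is where a helper thread
 * might be spawned and the user-level VPD loaded with dlopen(). It runs on
 * first use, not from a library constructor, so the loader lock is not held
 * (unless the first API call itself comes from another constructor). */
static void al_do_init(void)
{
    al_init_calls++;
    al_init_status = 0;
}

/* Every public entry point would call this first; pthread_once guarantees
 * al_do_init runs exactly once even if several threads race here. */
static int al_ensure_init(void)
{
    pthread_once(&al_once, al_do_init);
    return al_init_status;
}
```

The cost of this design is that the first API call pays the initialization latency and must be able to report an initialization failure.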
From: Hefty, S. <sea...@in...> - 2004-02-19 19:51:24
|
Sent MADs already contain a 64-bit context that is given back to the user with any matching response, which I think is capable of doing what you're suggesting.

There is nothing in the spec that dictates how a receiver interprets a TID. I can easily envision a receiving client that uses the TID to determine whether a MAD is a repeated request. So I would rather not have AL set any policy regarding how TIDs must be interpreted.

-----Original Message-----
From: Fab Tillier [mailto:fti...@in...]
Sent: Thursday, February 19, 2004 11:30 AM
To: Hefty, Sean; inf...@li...
Subject: RE: [Infiniband-access_layer] TID management in AL

Don't interpret the data...

I was thinking the policy would be for AL to not even look at the client TID and generate a new TID for every distinct send. So the upper 32-bits of the TID would still be the client index, and the lower 32-bits would be a per-mad-service counter that increments for every send request, allowing the mad service to properly match a response to a send. One of the benefits of this is that it eliminates the ability of mad clients to confuse their mad services - a response will only ever match to a single send, even if the client provides the same TID for multiple sends.

As you mention, having a generated TID might cause some weird behavior at the receiving end, hence my original question - is it valid for a MAD sender to encode some meaningful value into the TID? I would expect not - the TID should be opaque to the recipient, in which case having AL generate the full TID independently of the client's TID would work well.

If a policy is needed I don't think it's worth making the change. If there's no policy needed, however, I think not exposing any limitation on TID usage to clients is beneficial.

- Fab

-----Original Message-----
From: Hefty, Sean [mailto:sea...@in...]
Sent: Thursday, February 19, 2004 11:07 AM
To: Tillier, Fabian; inf...@li...
Subject: RE: [Infiniband-access_layer] TID management in AL

We tried to do something like this, but AL doesn't try to interpret the data. So, it's not clear what policy AL would use when assigning TIDs. Does it always generate a new TID, or does it try to re-use TIDs? The result may not be the same at the receiver's side. And trying to guess based on the TID given by the client causes all sorts of headaches trying to cache information.

- Sean

-----Original Message-----
From: Fab Tillier [mailto:fti...@in...]
Sent: Thursday, February 19, 2004 10:53 AM
To: Hefty, Sean; inf...@li...
Subject: RE: [Infiniband-access_layer] TID management in AL

The TIDs on the wire would still be unique, with IBAL managing the full 64-bits rather than only 32-bits. A client would be none the wiser, though. I'm not suggesting AL shouldn't assign the TID, just expanding how much of the TID AL assigns, but preserving the illusion at the client interface that the user has access to the full 64-bit TID. AL can easily keep the association between its generated TID and the client's requested TID for response processing.

- Fab

-----Original Message-----
From: Hefty, Sean [mailto:sea...@in...]
Sent: Thursday, February 19, 2004 10:51 AM
To: Tillier, Fabian; Eitan Zahavi; inf...@li...
Subject: RE: [Infiniband-access_layer] TID management in AL

TIDs must be unique from the same source. If clients are given full access to the TID, then they must coordinate between each other to ensure that none of them put the same TID on the wire as another client. The easiest solution is to have AL assign part of the TID, which ensures that no two clients above AL use the same TID.

- Sean

-----Original Message-----
From: inf...@li... [mailto:inf...@li...] On Behalf Of Tillier, Fabian
Sent: Wednesday, February 18, 2004 10:53 PM
To: Eitan Zahavi; inf...@li...
Subject: RE: [Infiniband-access_layer] TID management in AL

Ok, let's take a step back on this and look at the general question independently from implementation:

Goal:
- Make the full 64-bit TID available to MAD service clients.

Requirements:
- Preserve IBAL's ability to route solicited responses to the proper client.
- Responses to sends must report the client TID specified in the matching send request.

Cost:
- On-the-wire TID would be fully independent of the client's specified TID. That is, no part of the TID in a MAD specified in ib_send_mad would make it to the wire - the full 64-bits would be overridden by IBAL.

Is the cost worth the goal? Are there any compliance issues in having the on-wire TID decoupled from the client's requested TID?

Note that I'm not looking for implementation details, only input on the general design tradeoffs.

Thanks,

- Fab

-----Original Message-----
From: Eitan Zahavi [mailto:ei...@me...]
Sent: Wednesday, February 18, 2004 10:34 PM
To: Tillier, Fabian; inf...@li...
Subject: RE: [Infiniband-access_layer] TID management in AL

The TID is used to map mads back to different clients, forcing uniqueness on the receiver side.
I think that the only other option is to add an API to get a unique TID from the driver. But I think the current approach is better.

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

-----Original Message-----
From: Fab Tillier [mailto:fti...@in...]
Sent: Wednesday, February 18, 2004 9:51 PM
To: inf...@li...
Subject: [Infiniband-access_layer] TID management in AL

Is there a requirement that any part of the TID specified by a MAD service client go on the wire?
Is it legal for a client to encode a value in the TID that has meaning to the recipient of a MAD?
If not, would we want to change TID usage to allow a client to use the full 64-bits of the TID in sends, and have that TID restored in any matching responses?

Currently, the mad service stores the client TID in the send tracking structure's work request (h_send->mad_wr.client_tid). For the send side, the mad service already properly restores the full 64-bits of the client TID. For responses, only the 32-bits reserved for the client are used to match an incoming MAD response to a send. If the mad service had as a member a TID counter that it incremented for every send, the on-wire TID would be the client ID for the mad service in the upper 32-bits, and this counter in the lower 32-bits - no part of the client's TID would actually go on the wire. Response processing would then match to the send using this counter value, and then be able to restore the full 64-bit client TID for the response MADs.

Thoughts?

- Fab
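Sean's point that every sent MAD already carries a 64-bit context returned with the matching response can be illustrated with a small sketch. It assumes a hypothetical `struct mad_send` with `wire_tid` and `context` fields plus hypothetical `prepare_send`/`on_response` helpers; none of these are the real IBAL types or callbacks. The only idea taken from the thread is that AL hands an opaque 64-bit value back untouched, so the client can correlate responses without encoding anything in the TID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical send descriptor; NOT the real IBAL MAD element layout. */
struct mad_send {
    uint64_t wire_tid;  /* TID that goes on the wire (partly AL-assigned) */
    uint64_t context;   /* opaque to AL; handed back with the matching response */
};

/* Whatever the client needs to find again when the response arrives. */
struct client_req {
    int request_id;
    int retries;
};

/* Client side: stash a pointer to its own record in the 64-bit context
 * instead of encoding anything meaningful in the TID. */
static void prepare_send(struct mad_send *send, struct client_req *req,
                         uint64_t tid)
{
    assert(send != NULL && req != NULL);
    send->wire_tid = tid;
    send->context = (uint64_t)(uintptr_t)req;
}

/* Response side (hypothetical): AL has already matched the response to
 * `send` by TID and simply returns the untouched context. */
static struct client_req *on_response(const struct mad_send *send)
{
    return (struct client_req *)(uintptr_t)send->context;
}
```

The design point is that the context never reaches the wire, so it cannot collide with another client's TIDs and places no constraints on how a receiver interprets the TID.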
From: Fab T. <fti...@in...> - 2004-02-19 19:38:18
Don't interpret the data.

I was thinking the policy would be for AL to not even look at the client TID and generate a new TID for every distinct send. So the upper 32-bits of the TID would still be the client index, and the lower 32-bits would be a per-mad-service counter that increments for every send request, allowing the mad service to properly match a response to a send. One of the benefits of this is that it eliminates the ability of mad clients to confuse their mad services - a response will only ever match to a single send, even if the client provides the same TID for multiple sends.

As you mention, having a generated TID might cause some weird behavior at the receiving end, hence my original question - is it valid for a MAD sender to encode some meaningful value into the TID? I would expect not - the TID should be opaque to the recipient, in which case having AL generate the full TID independently of the client's TID would work well.

If a policy is needed I don't think it's worth making the change. If there's no policy needed, however, I think not exposing any limitation on TID usage to clients is beneficial.

- Fab
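Fab's claim that a response would only ever match a single send, even when a client reuses a TID, can be checked with a toy model. The sketch below counts how many outstanding sends a response would match when matching on the client-chosen lower 32 bits (the current behaviour described in the thread) versus on an AL-generated per-service counter (the proposal). `struct outstanding`, `record_send`, and `count_matches` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one outstanding send (names are illustrative). */
struct outstanding {
    uint32_t client_low;  /* lower 32 bits chosen by the client */
    uint32_t al_counter;  /* per-service counter AL would generate */
};

enum match_key { MATCH_CLIENT_BITS, MATCH_AL_COUNTER };

/* Track a send; the counter is assigned regardless of what the client chose. */
static void record_send(struct outstanding *slot, uint32_t client_low,
                        uint32_t *counter)
{
    assert(slot != NULL && counter != NULL);
    slot->client_low = client_low;
    slot->al_counter = (*counter)++;
}

/* How many outstanding sends would a response whose lower TID bits are
 * `resp_low` match under the given matching key? */
static size_t count_matches(const struct outstanding *sends, size_t n,
                            enum match_key key, uint32_t resp_low)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = (key == MATCH_CLIENT_BITS) ? sends[i].client_low
                                                : sends[i].al_counter;
        if (v == resp_low)
            hits++;
    }
    return hits;
}
```

With two outstanding sends that both use the client value 5, matching on the client bits yields two candidates, while matching on the counter yields exactly one.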
From: Hefty, S. <sea...@in...> - 2004-02-19 19:13:23
We tried to do something like this, but AL doesn't try to interpret the data. So, it's not clear what policy AL would use when assigning TIDs. Does it always generate a new TID, or does it try to re-use TIDs? The result may not be the same at the receiver's side. And trying to guess based on the TID given by the client causes all sorts of headaches trying to cache information.

- Sean
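Sean's question of whether AL would "always generate a new TID, or ... re-use TIDs" matters most to a receiver that treats a repeated TID as a retransmission, a case he raises elsewhere in the thread. The following is a minimal sketch of such a receiver. `struct dup_filter`, `is_repeat`, and the 8-entry window are hypothetical choices for illustration, not anything defined by the spec or by IBAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEEN_SLOTS 8  /* arbitrary window of remembered TIDs */

/* Hypothetical receiver that treats a recently seen TID as a retransmission. */
struct dup_filter {
    uint64_t seen[SEEN_SLOTS];
    unsigned count;  /* TIDs recorded so far; the oldest slot is reused */
};

/* Returns true if `tid` is in the window (a repeat); otherwise records it. */
static bool is_repeat(struct dup_filter *f, uint64_t tid)
{
    assert(f != NULL);
    unsigned live = f->count < SEEN_SLOTS ? f->count : SEEN_SLOTS;
    for (unsigned i = 0; i < live; i++)
        if (f->seen[i] == tid)
            return true;
    f->seen[f->count % SEEN_SLOTS] = tid;
    f->count++;
    return false;
}
```

If a retry goes out with the same TID, this receiver recognizes it as a repeat. If AL instead stamped a fresh counter value on every send, including retries, the same receiver would process the retry as a new request. That is the receiver-visible difference between the two policies.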
From: Fab T. <fti...@in...> - 2004-02-19 19:01:20
The TIDs on the wire would still be unique, with IBAL managing the full 64-bits rather than only 32-bits. A client would be none the wiser, though. I'm not suggesting AL shouldn't assign the TID, just expanding how much of the TID AL assigns, but preserving the illusion at the client interface that the user has access to the full 64-bit TID. AL can easily keep the association between its generated TID and the client's requested TID for response processing.

- Fab
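The association Fab describes, an AL-generated wire TID on the send path and the client's own 64-bit TID restored on the response path, could look roughly like the sketch below. The bit layout (client index in the upper 32 bits, per-service counter in the lower 32 bits) follows the proposal in this thread. Everything else is an assumption made for illustration: `struct mad_svc`, `svc_send`, `svc_match_response`, the fixed 16-entry table, and the linear search are all hypothetical, and counter wraparound is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OUTSTANDING 16  /* arbitrary table size for the sketch */

/* Hypothetical per-MAD-service state; names are illustrative, not IBAL's. */
struct mad_svc {
    uint32_t client_index;   /* AL-assigned; fills the upper 32 bits */
    uint32_t next_counter;   /* bumped on every send; wraps at 2^32 (not handled) */
    struct {
        bool     in_use;
        uint64_t wire_tid;   /* TID AL actually put on the wire */
        uint64_t client_tid; /* full 64-bit TID the client asked for */
    } sends[MAX_OUTSTANDING];
};

/* Send path: build the wire TID without looking at the client's TID,
 * but remember the client's value for the response. */
static bool svc_send(struct mad_svc *svc, uint64_t client_tid,
                     uint64_t *wire_tid)
{
    assert(svc != NULL && wire_tid != NULL);
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (!svc->sends[i].in_use) {
            uint64_t wire = ((uint64_t)svc->client_index << 32) |
                            svc->next_counter++;
            svc->sends[i].in_use = true;
            svc->sends[i].wire_tid = wire;
            svc->sends[i].client_tid = client_tid;
            *wire_tid = wire;
            return true;
        }
    }
    return false;  /* no free tracking slot */
}

/* Response path: match on the wire TID, then restore the client's TID. */
static bool svc_match_response(struct mad_svc *svc, uint64_t resp_wire_tid,
                               uint64_t *client_tid)
{
    assert(svc != NULL && client_tid != NULL);
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (svc->sends[i].in_use && svc->sends[i].wire_tid == resp_wire_tid) {
            svc->sends[i].in_use = false;
            *client_tid = svc->sends[i].client_tid;
            return true;
        }
    }
    return false;  /* unsolicited or already completed */
}
```

Two sends that carry the same client TID get distinct wire TIDs, yet each response still reports the client's original 64-bit value. No part of the client's TID reaches the wire, which is the "cost" listed in the thread.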
From: Hefty, S. <sea...@in...> - 2004-02-19 18:57:00
TIDs must be unique from the same source. If clients are given full access to the TID, then they must coordinate between each other to ensure that none of them put the same TID on the wire as another client. The easiest solution is to have AL assign part of the TID, which ensures that no two clients above AL use the same TID.

- Sean
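The current scheme Sean describes, in which AL assigns part of the TID so that no two clients collide, can be sketched in two helpers. Per the thread, AL owns the upper 32 bits (the client index) and the client owns the lower 32 bits. `al_stamp_tid` and `al_route_response` are hypothetical names, and the values are in host byte order. MAD header fields are big-endian on the wire, and that conversion is omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Send path (hypothetical): AL overwrites the upper 32 bits with the
 * client's index, so two clients that pick the same lower 32 bits still
 * put different TIDs on the wire. */
static uint64_t al_stamp_tid(uint32_t client_index, uint64_t client_tid)
{
    return ((uint64_t)client_index << 32) | (client_tid & 0xFFFFFFFFULL);
}

/* Receive path (hypothetical): the upper 32 bits of a response's TID
 * identify which client the response belongs to. */
static uint32_t al_route_response(uint64_t wire_tid)
{
    return (uint32_t)(wire_tid >> 32);
}
```

The trade-off the thread debates is visible here: uniqueness and routing come for free, but any value a client places in the upper 32 bits is discarded before the MAD is sent.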