I apologize. I misunderstood your original question.
Steve reviewed the VAPI path and agrees with your assessment. Private
data support is not yet implemented on this code path.
Unfortunately, I cannot give you a time estimate on when this will
be implemented. As I've mentioned before, I develop on the IB API
and rely on others to maintain the VAPI code paths. If you are in
desperate need of this functionality, I would encourage you to
investigate implementing this feature yourself. If you are interested
in this option, check out section 6.4 of the Mellanox CM manual (which
is part of our SourceForge API docs). The documentation says you set
up private data when you transition the QP to the INIT state.
If you were willing to contribute this feature to SourceForge, that
would be much appreciated.
james
On Tue, 29 Apr 2003, Biju Raman wrote:
br76> Hi James
br76>
br76>
br76> Regarding private data: we have a Mellanox setup with a working CM
br76> library. So the problem I reported was with respect to a working CM
br76> and not the CM_BUSTED scenario.
br76>
br76> Thanks
br76> biju
br76>
br76>
br76> On Tue, 29 Apr 2003, Lentini, James wrote:
br76>
br76> >
br76> > Answers below:
br76> >
br76> > On Mon, 28 Apr 2003, Biju Raman wrote:
br76> >
br76> > br76> Hi
br76> > br76>
br76> > br76> Thank you.
br76> > br76>
br76> > br76> There are two issues which I believe are attributable to the
br76> > br76> dapl-vapi related integration code.
br76> > br76>
br76> > br76> 1) Private data functionality - we were not able to send private
br76> > br76> data from an active endpoint to a psp endpoint. The private data
br76> > br76> size received on the psp side was 0. We checked dapl_vapi_cm.c for
br76> > br76> the dapls_ib_connect() code - even for the !CM_BUSTED scenario,
br76> > br76> private data functionality is not implemented.
br76> > br76>
br76> > br76> I guess this will be taken care of when the new dapl-vapi code is
br76> > br76> ready.
br76> >
br76> > Private data is only supported with a working CM. When CM_BUSTED is
br76> > defined, DAPL uses a heuristic to connect two QPs. The code
br76> > transitions the EP's QP to the RTS state using the same process as the
br76> > CM. However, there is no protocol to synchronize with the remote side.
br76> >
br76> > To perform the transitions, the QPN of the remote side, QPN_remote,
br76> > must be known. The code assumes that QPN_remote = QPN_local if
br76> > connecting to a remote LID and QPN_remote = QPN_local & 0x1 when
br76> > connecting to a local QP (DAPL_LOOPBACK).
br76> >
br76> > Since there is no protocol used to setup the connection, there is no
br76> > way to exchange any private data.
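
For illustration only, the QPN guess described above can be written as a small C function. This is a sketch and not the DAPL source: the function name and the loopback flag are made up for this mail, and the expressions simply restate the two rules as worded above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of DAPL: guess the remote QPN the way the
 * CM_BUSTED heuristic is described in the text above. */
uint32_t sketch_guess_remote_qpn(uint32_t qpn_local, int loopback)
{
    /* InfiniBand QP numbers are 24-bit values. */
    assert(qpn_local <= 0xFFFFFFu);

    if (loopback) {
        /* Local QP (DAPL_LOOPBACK): QPN_remote = QPN_local & 0x1 */
        return qpn_local & 0x1u;
    }
    /* Remote LID: assume both sides were handed the same QPN. */
    return qpn_local;
}
```

Because the guess is computed locally and nothing is exchanged with the peer, there is no message that could carry private data.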
br76> >
br76> > br76> 2) NULL DTO EVDs on dat_ep_create() - we got an error on
br76> > br76> invoking dat_ep_create() with null DTO EVDs for the recv and
br76> > br76> request evd. The following is what we got on debugging.
br76> > br76>
br76> > br76> *********** *******************************
br76> > br76>
br76> > br76> THH(3): thhul_qpm.c[1365]: Got a request for a QP with 0 WQEs on both SQ and RQ - rejecting !
br76> > br76> THH(1): thhul_qpm.c[1163]: : Failed allocating WQEs buffers.
br76> > br76>
br76> > br76> failed code = -246
br76> > br76> dapls_ib_qp_alloc() - VAPI_create_qp() - -246
br76> > br76>
br76> > br76> *****************************************
br76> > br76>
br76> >
br76> >
br76> > The first two messages are not from DAPL. They appear to have been
br76> > generated by your IB stack. These errors cause VAPI_create_qp() to
br76> > fail, which in turn causes dapls_ib_qp_alloc() to fail.
br76> >
br76> > I would suggest you try the following:
br76> >
br76> > - Check the call to dat_ep_create(), verify that DAT_HANDLE_NULL is
br76> > being passed for the send and receive EVD parameters
br76> >
br76> > - Determine the value that should be passed to VAPI_create_qp() to
br76> > specify no send or receive CQ. While setting up a QP in this manner
br76> > seems reasonable, it may not be supported yet. You should check with
br76> > the supplier of your IB software to see if this feature is implemented
br76> > and if so how to use it.
br76> >
br76> > - Verify that DAPL passes the correct CQ values to VAPI_create_qp()
br76> > when an EVD parameter of dat_ep_create() is DAT_HANDLE_NULL
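
While checking these, it may help to trap the condition the THH message complains about before the verb is called. The sketch below is an assumption about what the stack rejects, based only on the log text ("0 WQEs on both SQ and RQ"). The struct and function names are hypothetical and are not VAPI or DAPL types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the queue sizes requested for a new QP.
 * These are not the real VAPI capability fields. */
typedef struct {
    unsigned int send_wqes;   /* work queue entries requested for the SQ */
    unsigned int recv_wqes;   /* work queue entries requested for the RQ */
} sketch_qp_request;

/* Returns 1 when the request asks for 0 WQEs on both SQ and RQ, which is
 * the case the log above reports as rejected, and 0 otherwise. */
int sketch_qp_request_is_empty(const sketch_qp_request *req)
{
    assert(req != NULL);
    return (req->send_wqes == 0 && req->recv_wqes == 0) ? 1 : 0;
}
```

Logging the requested sizes next to such a check in dapls_ib_qp_alloc() would show whether DAPL or the caller is responsible for the zero values.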
br76> >
br76> > br76>
br76> > br76> Thanks & Regards
br76> > br76> Biju,MahaA
br76> > br76>
br76> > br76>
br76> > br76> On Mon, 28 Apr 2003, Lentini, James wrote:
br76> > br76>
br76> > br76> >
br76> > br76> > Thank you very much for the patch. Sending a tar file
containing
br76> > the
br76> > br76> > updated files is perfect.
br76> > br76> >
br76> > br76> > I have incorporated your changes into my sandbox and
will
br76> > hopefully
br76> > br76> > check them in shortly. I will follow up with Mellanox to
obtain
br76> > the
br76> > br76> > new updated VAPI header files. Apparently the
SourceForge
br76> > headers are
br76> > br76> > out of date and do not contain definitions for all of
your
br76> > fixes. For
br76> > br76> > example, the SourceForge headers do not contain a
definition for
br76> > the
br76> > br76> > EVAPI_async_handler_hndl_t type.
br76> > br76> >
br76> > br76> > I imagine that this is why your makefile defined the
Mellanox
br76> > include
br76> > br76> > directory as
br76> > br76> >
br76> > br76> > /usr/mellanox/include (Mellanox SDK headers)
br76> > br76> >
br76> > br76> > instead of
br76> > br76> >
br76> > br76> > ../include/ib/MELLANOX (DAPL SourceForge headers)
br76> > br76> >
br76> > br76> > Thanks,
br76> > br76> > james
br76> > br76> >
br76> > br76> > On Fri, 25 Apr 2003, Biju Raman wrote:
br76> > br76> >
br76> > br76> > br76>
br76> > br76> > br76> Hi James
br76> > br76> > br76>
br76> > br76> > br76>
br76> > br76> > br76> I have attached the tar file which contains the
following
br76> > files
br76> > br76> > br76>
br76> > br76> > br76> 1) dapl/udapl/Makefile
br76> > br76> > br76> 2) dapl/vapi/dapl_vapi_util.c and
br76> > dapl/vapi/dapl_vapi_util.h
br76> > br76> > br76>
br76> > br76> > br76> Can you please update with these files ??
br76> > br76> > br76>
br76> > br76> > br76> Frankly we are not aware of how a patch needs to
be
br76> > submitted. So
br76> > br76> > if
br76> > br76> > br76> there is any specific procedure which needs to be
followed
br76> > in
br76> > br76> > order to
br76> > br76> > br76> submit files please do tell us.
br76> > br76> > br76>
br76> > br76> > br76>
br76> > br76> > br76> Thanks & regards
br76> > br76> > br76> Biju, Maha
br76> > br76> > br76>
br76> > br76> > br76>
br76> > br76> > br76> On Fri, 25 Apr 2003, Vu Pham wrote:
br76> > br76> > br76>
br76> > br76> > br76> > Hi Biju,
br76> > br76> > br76> > I am responsible for updating/contributing
all files
br76> > under
br76> > br76> > ~dapl/vapi
br76> > br76> > br76> > along with our MSDK
br76> > br76> > br76> > However, James and referenced team recently
are
br76> > moving with
br76> > br76> > br76> > speed-of-light and I am behind to update
~dapl/vapi with
br76> > new
br76> > br76> > dapl
br76> > br76> > br76> > alpha-release.
br76> > br76> > br76> > Since the project is in SF domain, please
feel free
br76> > to
br76> > br76> > contribute your
br76> > br76> > br76> > changes if possible
br76> > br76> > br76> >
br76> > br76> > br76> > Thanks,
br76> > br76> > br76> > Vu
br76> > br76> > br76> >
br76> > br76> > br76> > -----Original Message-----
br76> > br76> > br76> > From: Biju Raman [mailto:br76@...]
br76> > br76> > br76> > Sent: Friday, April 25, 2003 12:47 PM
br76> > br76> > br76> > To: Lentini, James
br76> > br76> > br76> > Cc: dapl-devel@...;
br76> > maha@...;
br76> > br76> > Kevin Deierling
br76> > br76> > br76> > Subject: RE: [Dapl-devel] About dapl
br76> > br76> > br76> >
br76> > br76> > br76> > Hi James and Vu
br76> > br76> > br76> >
br76> > br76> > br76> > I guess i made it work.
br76> > br76> > br76> >
br76> > br76> > br76> > As i mentioned in my previous mail, i had faced
this
br76> > problem in
br76> > br76> > alpha10
br76> > br76> > br76> > release and Vu had asked to do the following
steps :
br76> > br76> > br76> >
br76> > br76> > br76> > 1) + Disable line 54 and enable line 53
br76> > br76> > br76> > 2)+ Using this patch files (dapl_vapi_util.h
and
br76> > br76> > dapl_vapi_util.c) for
br76> > br76> > br76> > new
br76> > br76> > br76> > SDK 0.1.x
br76> > br76> > br76> > 3) lastly add -DOLD_QP_STATE_TO_INIT to
udapl/Makefile
br76> > br76> > br76> >
br76> > br76> > br76> >
br76> > br76> > br76> > well in dapl_alpha12 release, i tried the same
and it
br76> > ended up
br76> > br76> > giving me
br76> > br76> > br76> > a compilation error of not finding the
prototype for
br76> > br76> > br76> > dapls_modify_qp_state_to_error() function. so i
provided
br76> > the
br76> > br76> > same from the
br76> > br76> > br76> > dapl_vapi_util.h file of alpha12 release.
br76> > br76> > br76> >
br76> > br76> > br76> > And it seems to be working. frankly i am not
sure of its
br76> > side
br76> > br76> > effects or
br76> > br76> > br76> > any problems it may raise in future.
br76> > br76> > br76> >
br76> > br76> > br76> >
br76> > br76> > br76> > But it works !!. Vu, If what i had done is
correct, then
br76> > the
br76> > br76> > vapi files
br76> > br76> > br76> > may require to be updated.
br76> > br76> > br76> >
br76> > br76> > br76> > If i am wrong, do pardon me.
br76> > br76> > br76> >
br76> > br76> > br76> > Thanks & regards
br76> > br76> > br76> > biju,maha
br76> > br76> > br76> >
br76> > br76> > br76> >
br76> > br76> > br76> >
br76> > br76> > br76> > On Fri, 25 Apr 2003, Biju Raman wrote:
br76> > br76> > br76> >
br76> > br76> > br76> > > Hi James
br76> > br76> > br76> > >
br76> > br76> > br76> > > First of all i apologise for not framing my
previous
br76> > query
br76> > br76> > properly. I
br76> > br76> > br76> > > guess that would have made your reply short.
:-))
br76> > br76> > br76> > >
br76> > br76> > br76> > > As you had guessed i am using the default mode
for
br76> > dapltest
br76> > br76> > with static
br76> > br76> > br76> > > registry.
br76> > br76> > br76> > >
br76> > br76> > br76> > > The problem is the following :-
br76> > br76> > br76> > >
br76> > br76> > br76> > > Initially the dat.conf file - contained the
entry for
br76> > br76> > br76> > > dapl_alpha10/libdapl.so file. So when i tested
the
br76> > alpha12
br76> > br76> > release with
br76> > br76> > br76> > > the same file it worked.
br76> > br76> > br76> > >
br76> > br76> > br76> > > But when i replaced the entry(in dat.conf)
file with
br76> > br76> > alpha12/libdapl.so,
br76> > br76> > br76> > > it started giving errors.
br76> > br76> > br76> > >
br76> > br76> > br76> > > VAPI_create_qp() returns failed code -235.
This api
br76> > is
br76> > br76> > called from
br76> > br76> > br76> > > dapls_ib_qp_alloc() which inturn is invoked
during
br76> > br76> > dapl_ep_create().
br76> > br76> > br76> > >
br76> > br76> > br76> > > Since VAPI_create_qp() is a mellanox verb api
, i am
br76> > not able
br76> > br76> > to debug
br76> > br76> > br76> > > further.
br76> > br76> > br76> > >
br76> > br76> > br76> > > Please understand that this is not a reference
br76> > implementation
br76> > br76> > br76> > > error.
br76> > br76> > br76> > >
br76> > br76> > br76> > >
br76> > br76> > br76> > > Even during alpha10 release,we faced this
problem.At
br76> > that
br76> > br76> > time, Vu had
br76> > br76> > br76> > > asked me to do the following
br76> > br76> > br76> > >
br76> > br76> > br76> > > 1) + Disable line 54 and enable line 53
br76> > br76> > br76> > > 2)+ Using this patch files (dapl_vapi_util.h
and
br76> > br76> > dapl_vapi_util.c) for new
br76> > br76> > br76> > > SDK 0.1.x
br76> > br76> > br76> > > 3) lastly add -DOLD_QP_STATE_TO_INIT to
udapl/Makefile
br76> > br76> > br76> > >
br76> > br76> > br76> > > I tried doing the same but ended up with compilation errors.
br76> > br76> > br76> > >
br76> > br76> > br76> > >
br76> > br76> > br76> > > So I believe it has to do with VAPI related code. Vu, any
br76> > br76> > br76> > > suggestions ??
br76> > br76> > br76> > >
br76> > br76> > br76> > > Thanks & Regards
br76> > br76> > br76> > > Biju, Maha
br76> > br76> > br76> > >
br76> > br76> > br76> > >
br76> > br76> > br76> > >
br76> > br76> > br76> > >
br76> > br76> > br76> > >
br76> > br76> > br76> > >
br76> > br76> > br76> > > On Fri, 25 Apr 2003, Lentini, James wrote:
br76> > br76> > br76> > >
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > Answers below:
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > On Thu, 24 Apr 2003, Biju Raman wrote:
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > br76> Hi
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76> Thanks for the info
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76> 1) Regarding the issue of
dat_ep_create()
br76> > generating
br76> > br76> > br76> > > > DAT_INTERNAL_ERROR
br76> > br76> > br76> > > > br76> - i noticed that as long as there is
no entry
br76> > in
br76> > br76> > dat.conf for the
br76> > br76> > br76> > > > alpha12
br76> > br76> > br76> > > > br76> version. The dapltest works fine else
it
br76> > generates the
br76> > br76> > following
br76> > br76> > br76> > > > error
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76> ************************
br76> > br76> > br76> > > > br76> bash-2.05$ ./dapltest -T S -d -D
InfiniHost0
br76> > br76> > br76> > > > br76> Server_Cmd.debug: 1
br76> > br76> > br76> > > > br76> Server_Cmd.dapl_name: InfiniHost0
br76> > br76> > br76> > > > br76> DT_cs_Server: IA InfiniHost0 opened
br76> > br76> > br76> > > > br76> DT_cs_Server: PZ created
br76> > br76> > br76> > > > br76> failed code = -235
br76> > br76> > br76> > > > br76> DT_cs_Server: dat_ep_create error:
br76> > DAT_INTERNAL_ERROR
br76> > br76> > br76> > > > br76> DT_cs_Server: Waiting for clients ...
br76> > br76> > br76> > > > br76> DT_cs_Server: Cleaning up ...
br76> > br76> > br76> > > > br76> DT_cs_Server: IA InfiniHost0 closed
br76> > br76> > br76> > > > br76> DT_cs_Server (InfiniHost0): Exiting.
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76> ********************
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76> The following are the contents of the
br76> > /etc/dat.conf
br76> > br76> > file
br76> > br76> > br76> > > > br76> **************************
br76> > br76> > br76> > > > br76> InfiniHost0 u1.0 nonthreadsafe default /root/infiniband/dapl_alpha12/dapl/udapl/Target/libdapl.so 1.0 ""
br76> > br76> > br76> > > > br76> **********************************
br76> > br76> > br76> > > > br76> Is something wrong with this
configuration
br76> > setting ???
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76> (our infiniband setup - mellanox
hardware (ver
br76> > A0) and
br76> > br76> > sdk version
br76> > br76> > br76> > > > 1.0.)
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > Before answering your questions, let me
provide a
br76> > bit of
br76> > br76> > background
br76> > br76> > br76> > > > on the static registry. There are two ways
in which
br76> > a uDAPL
br76> > br76> > provider
br76> > br76> > br76> > > > library may be loaded into the address space
of a
br76> > uDAPL
br76> > br76> > consumer:
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > - static registry
br76> > br76> > br76> > > > - explicit reference
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > The first option requires an entry to be
placed in
br76> > the
br76> > br76> > static registry
br76> > br76> > br76> > > > database associating a particular IA name
with a
br76> > uDAPL
br76> > br76> > provider
br76> > br76> > br76> > > > library.
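
As a rough illustration of what such an entry carries, the sketch below splits one static registry line, like the /etc/dat.conf line quoted in this thread, into its whitespace-separated fields. It is not the DAT registry parser. The struct, the field names and the buffer sizes are assumptions chosen for this example, and the trailing parameter string ("") is ignored.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical view of one static registry entry. Field names are
 * guesses based on the sample line, not taken from the DAT sources. */
typedef struct {
    char ia_name[64];          /* e.g. InfiniHost0 */
    char api_version[16];      /* e.g. u1.0 */
    char thread_safety[32];    /* e.g. nonthreadsafe */
    char default_flag[16];     /* e.g. default */
    char lib_path[256];        /* path to the provider library */
    char provider_version[16]; /* e.g. 1.0 */
} sketch_registry_entry;

/* Parses the first six fields of a registry line.
 * Returns 0 on success and -1 if fewer than six fields are present. */
int sketch_parse_registry_line(const char *line, sketch_registry_entry *out)
{
    int fields;

    assert(line != NULL && out != NULL);
    fields = sscanf(line, "%63s %15s %31s %15s %255s %15s",
                    out->ia_name, out->api_version, out->thread_safety,
                    out->default_flag, out->lib_path, out->provider_version);
    return (fields == 6) ? 0 : -1;
}
```

Printing ia_name and lib_path for each line is a quick way to confirm which library a given IA name, such as InfiniHost0, will actually load.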
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > The second option requires the uDAPL
consumer to
br76> > setup a
br76> > br76> > dependency
br76> > br76> > br76> > > > between their application's executable and
the
br76> > provider
br76> > br76> > library. Some
br76> > br76> > br76> > > > of the mechanisms that might be used include
br76> > statically
br76> > br76> > linking a
br76> > br76> > br76> > > > provider library to the executable,
dynamically
br76> > linking the
br76> > br76> > provider
br76> > br76> > br76> > > > library to the executable, or dynamically
loading
br76> > (e.g. via
br76> > br76> > dlopen()
br76> > br76> > br76> > > > on Linux) the provider library. Regardless
of the
br76> > method
br76> > br76> > chosen the
br76> > br76> > br76> > > > end result is that the provider library is
included
br76> > in the
br76> > br76> > uDAPL
br76> > br76> > br76> > > > consumer's address space outside the scope
of the
br76> > static
br76> > br76> > registry.
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > When building dapltest, either the static
registry
br76> > or
br76> > br76> > explicit method
br76> > br76> > br76> > > > may be used. Therefore it is important to
know the
br76> > platform
br76> > br76> > you are
br76> > br76> > br76> > > > running on (Linux or Windows) and the
command you
br76> > use to
br76> > br76> > build.
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > Since you are using the static registry
file, I'll
br76> > assume
br76> > br76> > you are
br76> > br76> > br76> > > > working on Linux.
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > To build dapltest, you can simply type
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > > make
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > in the dapltest directory and create a
dapltest
br76> > executable
br76> > br76> > that will
br76> > br76> > br76> > > > exclusively use the static registry. To
verify this,
br76> > type
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > > ldd dapltest
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > you should see output similar to
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > libdat.so => test/udapl/dapltest/../../../dat/udat/Target/libdat.so (0x40014000)
br76> > br76> > br76> > > > libpthread.so.0 => /lib/i686/libpthread.so.0 (0x40036000)
br76> > br76> > br76> > > > libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
br76> > br76> > br76> > > > libdl.so.2 => /lib/libdl.so.2 (0x4004a000)
br76> > br76> > br76> > > > /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > To set up an explicit reference to the uDAPL reference
br76> > br76> > br76> > > > implementation's provider library, type
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > > make EXPLICIT_LINK=1
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > This will create a dynamic dependency
between the
br76> > dapltest
br76> > br76> > executable
br76> > br76> > br76> > > > and the uDAPL reference implementation
provider
br76> > library:
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > > ldd dapltest
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > libdat.so => test/udapl/dapltest/../../../dat/udat/Target/libdat.so (0x40015000)
br76> > br76> > br76> > > > libpthread.so.0 => /lib/i686/libpthread.so.0 (0x40036000)
br76> > br76> > br76> > > > libdapl.so => test/udapl/dapltest/../../../dapl/udapl/Target/libdapl.so (0x4004a000)
br76> > br76> > br76> > > > libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
br76> > br76> > br76> > > > libdl.so.2 => /lib/libdl.so.2 (0x40061000)
br76> > br76> > br76> > > > libJniTavorVerbs.so => /usr/lib/libJniTavorVerbs.so (0x40064000)
br76> > br76> > br76> > > > /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
br76> > br76> > br76> > > > librt.so.1 => /lib/librt.so.1 (0x4007b000)
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > Which way did you build dapltest? What is
the output
br76> > of ldd
br76> > br76> > dapltest?
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > Assuming you built dapltest using the
default
br76> > method, your
br76> > br76> > results are
br76> > br76> > br76> > > > very strange. In this case, no entries in
the static
br76> > br76> > registry file
br76> > br76> > br76> > > > should have resulted in no IA addresses
being found
br76> > and the
br76> > br76> > dapltest
br76> > br76> > br76> > > > server failing.
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > The fact that dapltest fails when there is a
static
br76> > registry
br76> > br76> > entry
br76> > br76> > br76> > > > present likely does not have anything to do
with the
br76> > static
br76> > br76> > registry.
br76> > br76> > br76> > > > From the output you included, it appears
that the
br76> > provider
br76> > br76> > library was
br76> > br76> > br76> > > > found and that DAT API calls are failing
(first
br76> > br76> > dat_pz_create() then
br76> > br76> > br76> > > > dat_ep_create()). The source of this problem
is
br76> > likely quite
br76> > br76> > br76> > > > different.
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > The error message "failed code = -235" appears to mean that
br76> > br76> > br76> > > > the call to VAPI_alloc_pd() failed in dapls_ib_pd_alloc(),
br76> > br76> > br76> > > > dapl/vapi/dapl_vapi_util.c line 404.
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > A scan of the VAPI headers shows that error
code
br76> > -235
br76> > br76> > appears to be
br76> > br76> > br76> > > > VAPI_EINVAL_PKEY_IX. Unfortunately I don't
know how
br76> > to use
br76> > br76> > that
br76> > br76> > br76> > > > information to zero in on the problem. I
suggest you
br76> > place a
br76> > br76> > br76> > > > breakpoint in VAPI_alloc_pd() and step
through the
br76> > code to
br76> > br76> > determine
br76> > br76> > br76> > > > why this error is being returned.
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > br76> 2) Regarding dat_cr_reject() - the
br76> > segmentation fault
br76> > br76> > occurs on
br76> > br76> > br76> > > > executing
br76> > br76> > br76> > > > br76> CM_reject_connect_req().
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > By "on executing" I assume you mean that the
br76> > segmentation
br76> > br76> > fault is
br76> > br76> > br76> > > > within CM_reject_connect_req(). If this is
the case
br76> > then it
br76> > br76> > is
br76> > br76> > br76> > > > unlikely to be a bug in the reference
br76> > implementation.
br76> > br76> > br76> > > >
br76> > br76> > br76> > > > br76> Thanks & Regards
br76> > br76> > br76> > > > br76> Biju, Maha
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76> On Thu, 24 Apr 2003, Kevin Deierling
wrote:
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76> > Hi,
br76> > br76> > br76> > > > br76> > I agree with James that it is
likely the
br76> > issue
br76> > br76> > is in the
br76> > br76> > br76> > > > Mellanox CM
br76> > br76> > br76> > > > br76> > that is currently in the SDK. We are
moving
br76> > to the
br76> > br76> > IB Access
br76> > br76> > br76> > > > Layer (IBAL) CM
br76> > br76> > br76> > > > br76> > ( http://infiniband.sourceforge.net/ )
br76> > br76> > br76> > > > br76> > and will shortly have a uDAPL
implementation
br76> > on top
br76> > br76> > of this API
br76> > br76> > br76> > > > and CM. The
br76> > br76> > br76> > > > br76> > IBAL CM is more robust and scalable
and I
br76> > believe
br76> > br76> > will solve the
br76> > br76> > br76> > > > issue you
br76> > br76> > br76> > > > br76> > are seeing. We expect to have an
alpha
br76> > version of
br76> > br76> > uDAPL/IBAL the
br76> > br76> > br76> > > > first week
br76> > br76> > br76> > > > br76> > in May but will need to work with
Netapp to
br76> > get this
br76> > br76> > integrated
br76> > br76> > br76> > > > into the
br76> > br76> > br76> > > > br76> > DAPL SourceForge distribution and
stabilize
br76> > the
br76> > br76> > code.
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > KD
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > -----Original Message-----
br76> > br76> > br76> > > > br76> > From: Lentini, James
br76> > br76> > [mailto:James.Lentini@...]
br76> > br76> > br76> > > > br76> > Sent: Thursday, April 24, 2003 9:25
AM
br76> > br76> > br76> > > > br76> > To: Biju Raman
br76> > br76> > br76> > > > br76> > Cc:
dapl-devel@...;
br76> > br76> > maha@...
br76> > br76> > br76> > > > br76> > Subject: Re: [Dapl-devel] About dapl
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > Answers below:
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > On Thu, 24 Apr 2003, Biju Raman
wrote:
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > br76> Hi
br76> > br76> > br76> > > > br76> > br76>
br76> > br76> > br76> > > > br76> > br76> I have some doubts about udapl
apis
br76> > br76> > br76> > > > br76> > br76>
br76> > br76> > br76> > > > br76> > br76>
br76> > br76> > br76> > > > br76> > br76> 1) As per the udapl specs, in
br76> > dat_ia_open()-
br76> > br76> > the first
br76> > br76> > br76> > > > consumer that
br76> > br76> > br76> > > > br76> > opens
br76> > br76> > br76> > > > br76> > br76> IA is supposed to pass
br76> > "DAT_HANDLE_NULL" for
br76> > br76> > br76> > > > async_evd_handle.
br76> > br76> > br76> > > > br76> > Subsequent
br76> > br76> > br76> > > > br76> > br76> attempts to open IA should
have
br76> > br76> > DAT_ASYNC_EXISTS value for
br76> > br76> > br76> > > > br76> > br76> async_evd_handle.
br76> > br76> > br76> > > > br76> > br76>
br76> > br76> > br76> > > > br76> > br76> Does this restriction apply to
br76> > consumers
br76> > br76> > within a single
br76> > br76> > br76> > > > br76> > br76> process or across processes (i
mean
br76> > different
br76> > br76> > dat
br76> > br76> > br76> > > > applications )with
br76> > br76> > br76> > > > br76> > in
br76> > br76> > br76> > > > br76> > br76> a host machine which has a
single hca
br76> > ??
br76> > br76> > little confused
br76> > br76> > br76> > > > abt the
br76> > br76> > br76> > > > br76> > statement
br76> > br76> > br76> > > > br76> > br76> in udapl specs .
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > Within a single DAT consumer, there
may be
br76> > only one
br76> > br76> > asynchronous
br76> > br76> > br76> > > > EVD
br76> > br76> > br76> > > > br76> > registered. This restriction is
inherited
br76> > from
br76> > br76> > InfiniBand.
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > In InfiniBand, only one asynchronous
event
br76> > handler
br76> > br76> > may be
br76> > br76> > br76> > > > br76> > registered (see the InfiniBand
br76> > Specification,
br76> > br76> > release 1.1, vol
br76> > br76> > br76> > > > I,
br76> > br76> > br76> > > > br76> > section 11.5.2, page 559).
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > The enforcement of this restriction
is left
br76> > to uDAPL
br76> > br76> > consumers.
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > I can understand your confusion
regarding
br76> > the number
br76> > br76> > of
br76> > br76> > br76> > > > consumers
br76> > br76> > br76> > > > br76> > which may obtain async evds. The
point the
br76> > text is
br76> > br76> > trying to
br76> > br76> > br76> > > > br76> > make is the following:
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > If an application wishes to obtain
an async
br76> > evd and
br76> > br76> > does not
br76> > br76> > br76> > > > have
br76> > br76> > br76> > > > br76> > special knowledge regarding the
existence of
br76> > other
br76> > br76> > async evds,
br76> > br76> > br76> > > > then it
br76> > br76> > br76> > > > br76> > must request that the dispatcher be
created
br76> > br76> > (DAT_HANDLE_NULL)
br76> > br76> > br76> > > > and not
br76> > br76> > br76> > > > br76> > attempt to use an existing async evd
br76> > br76> > (DAT_EVD_ASYNC_EXISTS).
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > If you have suggestions for how to
state
br76> > this more
br76> > br76> > clearly, I'd
br76> > br76> > br76> > > > br76> > encourage you to send email to the
DAT
br76> > Collaborative
br76> > br76> > email
br76> > br76> > br76> > > > reflector:
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > dat-discussions@...
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > The authors of the DAT specification
read
br76> > this email
br76> > br76> > list and
br76> > br76> > br76> > > > may use
br76> > br76> > br76> > > > br76> > your feedback to improve the text.
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > br76> 2)How is the async_evd_handle
useful
br76> > to a dat
br76> > br76> > consumer ? i
br76> > br76> > br76> > > > mean how it
br76> > br76> > br76> > > > br76> > can
br76> > br76> > br76> > > > br76> > br76> be used by a dat consumer ???
is it
br76> > equivalent
br76> > br76> > to
br76> > br76> > br76> > > > VipErrorCallback of
br76> > br76> > br76> > > > br76> > VI
br76> > br76> > br76> > > > br76> > br76> architecture ??
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > All of the EVD operations may be
performed
br76> > on the
br76> > br76> > br76> > > > async_evd_handle.
br76> > br76> > br76> > > > br76> > For example, a consumer could call
br76> > dat_evd_wait() on
br76> > br76> > the
br76> > br76> > br76> > > > br76> > async_evd_handle to wait for an
error to
br76> > occur.
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > The same sorts of error messages
that are
br76> > delivered
br76> > br76> > to a VIPL
br76> > br76> > br76> > > > consumer
br76> > br76> > br76> > > > br76> > by VipErrorCallback are delivered
via the
br76> > async evd.
br76> > br76> > However,
br76> > br76> > br76> > > > the
br76> > br76> > br76> > > > br76> > async evd mechanism is much cleaner
than the
br76> > br76> > VipErrorCallback
br76> > br76> > br76> > > > method.
br76> > br76> > br76> > > > br76> > First, event delivery in DAPL is
performed
br76> > using a
br76> > br76> > single,
br76> > br76> > br76> > > > consistent
br76> > br76> > br76> > > > br76> > mechanism. Second, the context of
the
br76> > br76> > VipErrorCallback handler
br76> > br76> > br76> > > > was not
br76> > br76> > br76> > > > br76> > specified. Since the VIPL was not
reentrant,
br76> > it was
br76> > br76> > technically
br76> > br76> > br76> > > > a
br76> > br76> > br76> > > > br76> > violation of the VIPL specification
to call
br76> > VIPL API
br76> > br76> > functions
br76> > br76> > br76> > > > from
br76> > br76> > br76> > > > br76> > the VipErrorCallback handler.
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > br76> 1) We recently shifted from
alpha10 to
br76> > alpha12
br76> > br76> > release ,
br76> > br76> > br76> > > > but on
br76> > br76> > br76> > > > br76> > executing
br76> > br76> > br76> > > > br76> > br76> the dapltest and our
applications,
br76> > br76> > DAT_INTERNAL_ERROR
br76> > br76> > br76> > > > occurs on
br76> > br76> > br76> > > > br76> > executing
br76> > br76> > br76> > > > br76> > br76> dat_ep_create ?? The same code
works
br76> > with
br76> > br76> > alpha10.
br76> > br76> > br76> > > > br76> > br76>
br76> > br76> > br76> > > > br76> > br76> our infiniband setup consists
of
br76> > mellanox
br76> > br76> > hardware (ver
br76> > br76> > br76> > > > A0) and sdk
br76> > br76> > br76> > > > br76> > br76> version 1.0.
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > We have tested alpha12 extensively on a hardware and software
br76> > br76> > br76> > > > br76> > stack obtained from JNI and have not experienced this problem.
br76> > br76> > br76> > > > br76> > My guess would be that either:
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > - there is a disconnect between the updated DAPL software and
br76> > br76> > br76> > > > br76> >   the Mellanox SDK
br76> > br76> > br76> > > > br76> > - there is a configuration error of some kind
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > If we rule out the second option, I think the next step is
br76> > br76> > br76> > > > br76> > narrowing down where the error occurs. If you could provide
br76> > br76> > br76> > > > br76> > more information about where in the code the error is
br76> > br76> > br76> > > > br76> > generated, we could give more help. Is there a specific verb
br76> > br76> > br76> > > > br76> > failing?
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > br76> 2) On using dapl_alpha10, we found that dat_cr_reject()
br76> > br76> > br76> > > > br76> > br76> results in a segmentation fault. The error occurs during
br76> > br76> > br76> > > > br76> > br76> invocation of the CM_reject_connect_req() function within
br76> > br76> > br76> > > > br76> > br76> dapls_ib_reject_connection() in dapl_vapi_cm.c
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > Again, we have not seen this behavior.
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > Do you have a stack trace of the segmentation fault? Did the
br76> > br76> > br76> > > > br76> > error occur in the CM_reject_connect_req() Verb or in
br76> > br76> > br76> > > > br76> > dapls_ib_reject_connection()?
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> > If the error occurs in CM_reject_connect_req() and the
br76> > br76> > br76> > > > br76> > parameters being passed by DAPL appear reasonable, you should
br76> > br76> > br76> > > > br76> > bring this to the attention of the Mellanox SDK developers.
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76> >
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > > br76>
br76> > br76> > -------------------------------------------------------
br76> > br76> > br76> > > > br76> This sf.net email is sponsored
by:ThinkGeek
br76> > br76> > br76> > > > br76> Welcome to geek heaven.
br76> > br76> > br76> > > > br76> http://thinkgeek.com/sf
br76> > br76> > br76> > > > br76>
br76> > _______________________________________________
br76> > br76> > br76> > > > br76> Dapl-devel mailing list
br76> > br76> > br76> > > > br76> Dapl-devel@...
br76> > br76> > br76> > > > br76>
br76> > br76> > https://lists.sourceforge.net/lists/listinfo/dapl-devel
br76> > br76> > br76> > > > br76>
br76> > br76> > br76> > > >
br76> > br76> > br76> > > >
br76> > br76> > br76> > >
br76> > br76> > br76> > >
br76> > br76> > br76> >
br76> > br76> > br76> >
br76> > br76> > br76> >
br76> > br76> > br76> >
br76> > br76> > br76> >
br76> > br76> > br76>
br76> > br76> >
br76> > br76>
br76> > br76>
br76> > br76>
br76> > br76>
br76> >
br76>
br76>