From: radhika s. <de...@gm...> - 2015-07-15 16:25:10

Hi there,

So I was trying to learn Portals 4 and was going through PtlPut, which takes a list of arguments. The second argument (local_offset) is the pointer to the data to be sent (I am still confused why it has type ptl_size_t, though) and the third argument is the length of the data. My question is: what is user_ptr? I have read the sentence describing user_ptr but I am still confused. Can someone please help me understand this?

- Solti
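The pattern user_ptr enables can be sketched without any real Portals headers: the application attaches a pointer to its own per-request state to the operation, and the implementation echoes that pointer back, untouched, in the completion event. (In the Portals 4 spec, local_offset is an offset into an already-bound memory descriptor, which is why it is a ptl_size_t rather than a pointer.) All names below (toy_ptl_put, my_request_t, and the cut-down ptl_event_t) are invented stand-ins for illustration, not the real Portals 4 API:

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-in for the full-event structure: the only field that
 * matters here is user_ptr, which the implementation copies verbatim
 * from the corresponding put/get call. */
typedef struct {
    void *user_ptr;
} ptl_event_t;

/* Application-owned, per-operation state.  The implementation never
 * interprets it; it only carries the pointer. */
typedef struct {
    int request_id;
    int completed;
} my_request_t;

static ptl_event_t pending_event;

/* Toy put: local_offset is an offset into an already-bound MD (hence a
 * size type, not a pointer), and user_ptr just rides along untouched. */
static void toy_ptl_put(size_t local_offset, size_t length, void *user_ptr)
{
    (void)local_offset;
    (void)length;
    pending_event.user_ptr = user_ptr;   /* echoed back at completion */
}

/* Event handler: cast user_ptr back to find out which request finished. */
static void handle_event(const ptl_event_t *ev)
{
    my_request_t *req = ev->user_ptr;
    req->completed = 1;
}
```

So user_ptr is purely a correlation token: when many operations are in flight, the event alone tells you *which* of your own requests it belongs to.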
From: Dave L. <dleimbac@MPI-SoftTech.Com> - 2003-08-28 17:40:54
Ron, Thanks for clearing that up... I was getting confused/concerned :). Dave On Wed, Aug 27, 2003 at 03:47:45PM -0600, Ron Brightwell wrote: > > This is slightly offtopic but when did Portals start to concern itself with > > collective routines? I don't remember seeing that in any specification. > > > > These were collective routines (broadcast and reduce) built on top of Portals, > so they weren't part of the Portals API. > > -Ron > -- David Leimbach Software Engineer MPI Software Technology Inc. Phone: 662-320-4300 x43 Fax: 662-320-4301 http://www.mpi-softtech.com The information contained in this communication may be confidential and is intended only for the use of the recipient(s) named above. If the reader of this communication is not the intended recipient(s), you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you are not a named recipient or received this communication by mistake, please notify the sender and delete the communication and all copies of it. |
From: Ron B. <rb...@va...> - 2003-08-27 21:57:57
> This is slightly offtopic but when did Portals start to concern itself with > collective routines? I don't remember seeing that in any specification. > These were collective routines (broadcast and reduce) built on top of Portals, so they weren't part of the Portals API. -Ron |
From: Dave L. <dleimbac@MPI-SoftTech.Com> - 2003-08-27 21:47:48
This is slightly offtopic but when did Portals start to concern itself with collective routines? I don't remember seeing that in any specification. Dave On Wed, Aug 27, 2003 at 11:55:32AM -0600, Trammell Hudson wrote: > I know the Portals 3 collective routines have been removed from the > mainline Portals code base, but they are in the most recent cplant code > drop that I received. I'm in the process of implementing a separate > fanout routine and noticed that top/compute/lib/p30/api-p30/bcast.c does > its sends in the reverse order that it should. > > It sends to its lower rank children before the higher ones, which > does not optimally overlap the tree. For instance, in an 8 node fanout, > this is the order that it would use: > > 0 -> 1 > 0 -> 2 > 0 -> 4 > > 1 <- 0 > > 2 <- 0 > 2 -> 3 > > 3 <- 2 > > 4 <- 0 > 4 -> 5 > 4 -> 6 > > 5 <- 4 > > 6 <- 4 > 6 -> 7 > > 7 <- 6 > > 0 would send to 1 first, then 2 then 4, but sending to 4 first would > allow 4 to overlap its sends to 5 and 6 with 0's sends to 2 and 4. > In general, the optimal mapping would be to send in the opposite order > than the algorithm generates. > > I can not see into the MPI library on Janus to say if this is what > happens as well. I think that I wrote that particular library, but it > was so many years ago that I do not remember how I did it. However, > I expect that I copied the code from it for the Portals collective > routine. > > Trammell > -- > -----|----- hu...@os... W 240-283-1700 > *>=====[]L\ hu...@ro... M 505-463-1896 > ' -'-`- http://www.swcp.com/~hudson/ KC5RNF > > > > ------------------------------------------------------- > This sf.net email is sponsored by:ThinkGeek > Welcome to geek heaven. > http://thinkgeek.com/sf > _______________________________________________ > Sandiaportals-devel mailing list > San...@li... > https://lists.sourceforge.net/lists/listinfo/sandiaportals-devel -- David Leimbach Software Engineer MPI Software Technology Inc. 
Phone: 662-320-4300 x43 Fax: 662-320-4301 http://www.mpi-softtech.com The information contained in this communication may be confidential and is intended only for the use of the recipient(s) named above. If the reader of this communication is not the intended recipient(s), you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you are not a named recipient or received this communication by mistake, please notify the sender and delete the communication and all copies of it. |
From: Trammell H. <hu...@os...> - 2003-08-27 17:55:56

I know the Portals 3 collective routines have been removed from the mainline Portals code base, but they are in the most recent cplant code drop that I received. I'm in the process of implementing a separate fanout routine and noticed that top/compute/lib/p30/api-p30/bcast.c does its sends in the reverse order that it should.

It sends to its lower rank children before the higher ones, which does not optimally overlap the tree. For instance, in an 8 node fanout, this is the order that it would use:

    0 -> 1
    0 -> 2
    0 -> 4

    1 <- 0

    2 <- 0
    2 -> 3

    3 <- 2

    4 <- 0
    4 -> 5
    4 -> 6

    5 <- 4

    6 <- 4
    6 -> 7

    7 <- 6

0 would send to 1 first, then 2 then 4, but sending to 4 first would allow 4 to overlap its sends to 5 and 6 with 0's sends to 2 and 4. In general, the optimal mapping would be to send in the opposite order than the algorithm generates.

I can not see into the MPI library on Janus to say if this is what happens as well. I think that I wrote that particular library, but it was so many years ago that I do not remember how I did it. However, I expect that I copied the code from it for the Portals collective routine.

Trammell
--
-----|----- hu...@os... W 240-283-1700
*>=====[]L\ hu...@ro... M 505-463-1896
 ' -'-`-    http://www.swcp.com/~hudson/ KC5RNF
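The tree shape in the message above is the standard binomial broadcast tree, and its child enumeration can be sketched in a few lines (invented function name, not the bcast.c code). The ascending order the function returns is the order the post says bcast.c sends in; walking the result backwards sends to the largest subtree first, which is the better-overlapping order the post argues for:

```c
#include <assert.h>

/* Enumerate the children of `rank` in a binomial broadcast tree over n
 * nodes, in ascending order (the order the post says bcast.c uses).
 * Sending to out[count-1] first -- i.e. iterating the array in reverse
 * -- reaches the largest subtree soonest, giving better overlap. */
static int binomial_children(int rank, int n, int *out)
{
    int count = 0;
    for (int mask = 1; mask < n; mask <<= 1) {
        if (rank & mask)
            break;                 /* this bit marks our parent's side */
        if ((rank | mask) < n)
            out[count++] = rank | mask;
    }
    return count;
}
```

For an 8-node fanout this reproduces the sends in the post exactly: rank 0 gets children {1, 2, 4}, rank 4 gets {5, 6}, rank 2 gets {3}, rank 6 gets {7}.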
From: Dave L. <dleimbac@MPI-SoftTech.Com> - 2003-07-09 13:36:31
> > Yes, this is definitely a bug. Are you using the HEAD? Or are you > > using Cray Portals? > > Cray. Is Cray portals available somehow? -- David Leimbach Software Engineer MPI Software Technology Inc. Phone: 662-320-4300 x43 Fax: 662-320-4301 http://www.mpi-softtech.com The information contained in this communication may be confidential and is intended only for the use of the recipient(s) named above. If the reader of this communication is not the intended recipient(s), you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you are not a named recipient or received this communication by mistake, please notify the sender and delete the communication and all copies of it. |
From: Jared R. <ja...@cr...> - 2003-06-27 20:53:04

> Interesting... I never thought of doing this. The spec doesn't
> explicitly state that PtlMDUpdate resets an MD's local offset but maybe
> it should. Do you have a good reason for wanting to do this?

It's just a way to wake up an MD without creating a new one, once its data is no longer cared about. I think the offset has to be reset, since the API md structure can't keep this information itself, and PtlMDUpdate doesn't really care that the new md happens to be at the same starting address as the old one. I'm pretty sure the problem would still manifest itself even if you were to unlink the MD and reattach it into the match list, since the problem appears to be in the md structure data, not in MDUpdate.

One possible use for that might be keeping a chain of MDs on a single matchlist so that incoming messages don't get dropped when one MD fills, then being able to reattach the MD at the end of the list without having to worry about which particular MD we are referring to.

> Yes, this is definitely a bug. Are you using the HEAD? Or are you
> using Cray Portals?

Cray.

> This is a bug also. The api-side should be translating lib-side handle
> to the api-side handle before revealing it to you.
>
> I can take a stab at fixing these problems. Do you have a test program
> handy?

Yes, I'll send them to you in a separate mail.

--Jared
From: Kevin P. <ped...@ie...> - 2003-06-27 18:26:06

Jared Roberts wrote:

> I was exploring the possibility of using PtlMDUpdate to reset the local
> offset of an MD after a put when I came across some unexpected behavior.

Interesting... I never thought of doing this. The spec doesn't explicitly state that PtlMDUpdate resets an MD's local offset but maybe it should. Do you have a good reason for wanting to do this?

> I made a copy of the MD returned in the PUT event and used that as the new
> argument in PtlMDUpdate, along with the md_handle given back in the event --
> and got back PTL_NOINIT (!). It looks like the nal_idx field of the
> md_handle in the event isn't being set, so PtlMDUpdate sees that it is out
> of range and whines.

Yes, this is definitely a bug. Are you using the HEAD? Or are you using Cray Portals?

> This is easy enough to work around, but even after doing this, I end up
> with a PTL_INV_EQ error. It looks like the handle_idx field of the eventq
> handle in the MD in the event is pointing to a lib_eq_t where the API-side
> expects it to be pointing to a ptl_eq_t. Is this behavior documented
> anywhere?

This is a bug also. The api-side should be translating the lib-side handle to the api-side handle before revealing it to you.

I can take a stab at fixing these problems. Do you have a test program handy?

Kevin

> It seems easy enough to avoid if you know where to look for it, but I
> couldn't find anything warning of this kind of behavior.
>
> --Jared
From: Jared R. <ja...@cr...> - 2003-06-26 19:48:58

I was exploring the possibility of using PtlMDUpdate to reset the local offset of an MD after a put when I came across some unexpected behavior. I made a copy of the MD returned in the PUT event and used that as the new argument in PtlMDUpdate, along with the md_handle given back in the event -- and got back PTL_NOINIT (!). It looks like the nal_idx field of the md_handle in the event isn't being set, so PtlMDUpdate sees that it is out of range and whines.

This is easy enough to work around, but even after doing this, I end up with a PTL_INV_EQ error. It looks like the handle_idx field of the eventq handle in the MD in the event is pointing to a lib_eq_t where the API-side expects it to be pointing to a ptl_eq_t. Is this behavior documented anywhere? It seems easy enough to avoid if you know where to look for it, but I couldn't find anything warning of this kind of behavior.

--Jared
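A minimal sketch of the handle bookkeeping the two bugs above point at (all names invented, not the actual lib-p30 structures): a handle packs a NAL index plus a table slot, and the api side is supposed to patch both before an event reaches the user — fill in nal_idx (the field the event path left zeroed, triggering the PTL_NOINIT range check) and remap the lib-side slot to the api-side object (so handle_idx refers to a ptl_eq_t, not a lib_eq_t):

```c
#include <assert.h>

/* Hypothetical handle layout: which NAL it belongs to, plus a slot in
 * that NAL's object table. */
typedef struct {
    unsigned nal_idx;      /* the field the event path left zeroed */
    unsigned handle_idx;   /* lib-side slot; must be remapped for the api */
} handle_t;

/* What the api side should do before handing an event to the user:
 * fill in nal_idx and translate the lib-side slot (here via a toy
 * lib-to-api lookup table). */
static handle_t lib_to_api_handle(handle_t lib_h, unsigned nal_idx,
                                  const unsigned *lib_to_api_idx)
{
    handle_t api_h;
    api_h.nal_idx = nal_idx;
    api_h.handle_idx = lib_to_api_idx[lib_h.handle_idx];
    return api_h;
}
```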
From: Hsing-bung C. <hb...@la...> - 2003-06-03 14:42:19

Hi,

I have a couple of questions about "Run kernel NAL test using Portals only". I am developing an Infiniband kernel NAL and would like to run tests to verify my ibnal code. Here is what I would like to do:

1. Bring up Portals 0.6.0.3.
2. Create the /dev/portals device.
3. Bring up the tcp and ibnal modules.
4. Use startserver.sh, startclient.sh, stopserver.sh, and stopclient.sh under portals/linux/tests to run some ping tests.

And some questions:

5. How do I use ptlctl to run tests?
6. Do I need to add Infiniband to ptlctl?
7. What does an end-to-end test look like?
8. How do I use the utilities under portals/linux/utils and portals/user/tests?

Thanks.

HB Chen
LANL
hb...@la...
505-665-3591
From: Ron B. <rb...@va...> - 2003-05-16 19:12:37

Greetings:

Attached is the Portals 3.3 specification document. The document has gone through some changes. The section with examples was deleted since the examples were becoming increasingly inconsistent with the API. Several new functions were added, including:

  PtlEQPoll()    - look for an event on multiple event queues
  PtlGetJid()    - associate a job id with a process
  PtlGetPut()    - atomic swap operation
  PtlGetRegion() - allow a get operation into part of an MD
  PtlPutRegion() - allow a put operation from part of an MD

Support for gather/scatter MDs and event handlers was also added.

-Ron
From: Eric H. <ho...@cr...> - 2003-05-15 04:06:25

> > 2/ I _do_ believe there is a need to allow PtlMDUnlink() to apply to the
> >    MDs passed to PtlMDGet() and PtlMDPut(PTL_ACK_REQ). We can't bound the
> >    time that memory is "exposed" to the network otherwise. This has been
> >    implemented in sandiaportals, but not documented in the spec.
>
> I'm not sure about this one. Unless the target is down, a PtlPut() or
> PtlGet() should complete pretty quickly. If the target is down, the END
> event (in Portals 3.2/3.3) will eventually happen and indicate an error.
> When the END event happens is implementation specific but will probably
> be after a longish timeout when the message's retry limit is reached.

There is a substantial window in the remote case allowing (for example) the target MD handle to be freed and reallocated, causing unexpected results. Without an instance-unique identifier (your suggested generation number or a random cookie), there doesn't seem to be any way to prevent this without placing additional restrictions on the use of the API.
From: Kevin P. <kt...@ie...> - 2003-05-15 03:25:01
Eric Barton wrote: >Guys, > >I've got a few observations I'd like to share and get feedback on. > >1/ I think MD handles (and if them, why not all) need to be "single shot" > in implementations that purport to be thread-safe. > > This is needed to ensure that PtlMDUnlink() can only unlink the intended > MD when it is racing with an incoming message that could unlink the same > MD. If PtlMDUnlink() loses the race, the caller should get PTL_INV_MD. > However if the handle happens to get re-used (say by a concurrent thread > doing PtlMDAttach()), completely the wrong thing will happen! > > If you concur, it might be a good idea to point this out in the Portals > spec so implementers get the picture. > I agree. This is something we didn't think of when trying to make Portals thread safe. Or at least I didn't think of it. By "single shot" you mean adding a random number or generation count to each handle, right? > >2/ I _do_ believe there is a need to allow PtlMDUnlink() to apply to the > MDs passed to PltMDGet() and PtlMDPut(PTL_ACK_REQ). We can't bound the > time that memory is "exposed" to the network otherwise. This has been > implemented in sandiaportals, but not documented in the spec. > I'm not sure about this one. Unless the target is down, a PtlPut() or PtlGet() should complete pretty quickly. If the target is down, the END event (in Portals 3.2/3.3) will eventually happen and indicate an error. When the END event happens is implementation specific but will probably be after a longish timeout when the message's retry limit is reached. > >3/ Why bother with specifying error return codes that _always_ mean the > programmer screwed up, rather than there was some resource shortage or > a lost race? > > For example it's highly unlikely there are any real programs that test > for PTL_NOINIT, and I bet most Portals implementations break if this has > to work "under fire" (i.e. 
comms racing with interfaces begin brought up > and down and the RC is being used to determine what state it's in). > > Why not core dump instead so (a) crap programmers get to know that they > _have_ screwed up and where, and (b) decent programmers don't have to > clutter their programs with unnecessary conditionals. > > Maybe this is a bit tongue in cheek, but some of the advertised error > codes can't be tested for efficiently, and if we did this, we could get > the number of return codes down to 3 or 4! > > This sounds logical to me. It's my understanding that Portals provides low-level building blocks that library writers can use to make higher-level, and friendlier, communication libraries (MPICH, Lustre, etc.). core dumping gets the message across and is easier to implement than things like returning PTL_NOINIT and handling PtlFini() and PtlNIFini() correctly. The Portals 3.3 draft that Ron is preparing contains the following section: 3.3 Return Codes The API specifies return codes that indicate success or failure of a function call. In the case where the failure is due to invalid arguments being passed into the function, the exact behavior of an implementation is undefined. The API suggests error codes that provide more detail about specific invalid parameters, but an implementation is not required to return these specific error codes. For example, an implementation is free to allow the caller to fault when given an invalid address, rather than return PTL_SEGV. In addition, an implementation is free to map these return codes to standard return codes where appropriate. For example, a Linux kernel-space implementation may want to map Portals return codes to POSIX-compliant return codes. |
From: Eric B. <er...@ba...> - 2003-05-14 20:16:17

Guys,

I've got a few observations I'd like to share and get feedback on.

1/ I think MD handles (and if them, why not all) need to be "single shot"
   in implementations that purport to be thread-safe.

   This is needed to ensure that PtlMDUnlink() can only unlink the intended
   MD when it is racing with an incoming message that could unlink the same
   MD. If PtlMDUnlink() loses the race, the caller should get PTL_INV_MD.
   However if the handle happens to get re-used (say by a concurrent thread
   doing PtlMDAttach()), completely the wrong thing will happen!

   If you concur, it might be a good idea to point this out in the Portals
   spec so implementers get the picture.

2/ I _do_ believe there is a need to allow PtlMDUnlink() to apply to the
   MDs passed to PtlMDGet() and PtlMDPut(PTL_ACK_REQ). We can't bound the
   time that memory is "exposed" to the network otherwise. This has been
   implemented in sandiaportals, but not documented in the spec.

3/ Why bother with specifying error return codes that _always_ mean the
   programmer screwed up, rather than there was some resource shortage or
   a lost race?

   For example it's highly unlikely there are any real programs that test
   for PTL_NOINIT, and I bet most Portals implementations break if this has
   to work "under fire" (i.e. comms racing with interfaces being brought up
   and down and the RC is being used to determine what state it's in).

   Why not core dump instead so (a) crap programmers get to know that they
   _have_ screwed up and where, and (b) decent programmers don't have to
   clutter their programs with unnecessary conditionals.

   Maybe this is a bit tongue in cheek, but some of the advertised error
   codes can't be tested for efficiently, and if we did this, we could get
   the number of return codes down to 3 or 4!

--
Cheers,
Eric

----------------------------------------------------
|Eric Barton       Barton Software                 |
|9 York Gardens    Tel:    +44 (117) 330 1575      |
|Clifton           Mobile: +44 (7909) 680 356      |
|Bristol BS8 4LL   Fax:    call first              |
|United Kingdom    E-Mail: er...@ba...             |
----------------------------------------------------
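The "single shot" handle idea from point 1/ (later echoed in the thread as a generation number or random cookie) can be sketched as follows. All names and widths below are invented for illustration: a generation count is folded into each handle, so a stale handle can never alias a slot that has since been recycled, and a racing unlink fails cleanly instead of destroying the wrong MD:

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BITS 8   /* illustrative split: low bits = slot, rest = gen */

typedef uint32_t md_handle_t;   /* [generation | slot] */

typedef struct {
    uint16_t generation;
    int in_use;
} md_slot_t;

static md_slot_t slots[1 << SLOT_BITS];

/* Allocate (or recycle) a slot: bumping the generation on every
 * allocation invalidates all previously issued handles for this slot. */
static md_handle_t make_handle(unsigned slot)
{
    slots[slot].generation++;
    slots[slot].in_use = 1;
    return ((md_handle_t)slots[slot].generation << SLOT_BITS) | slot;
}

/* Returns 0 on success, -1 (think PTL_INV_MD) if the handle is stale --
 * either already unlinked, or the slot was recycled by another thread. */
static int md_unlink(md_handle_t h)
{
    unsigned slot = h & ((1u << SLOT_BITS) - 1);
    uint16_t gen = (uint16_t)(h >> SLOT_BITS);
    if (!slots[slot].in_use || slots[slot].generation != gen)
        return -1;                  /* lost the race: report, don't destroy */
    slots[slot].in_use = 0;
    return 0;
}
```

The point is exactly Eric's: after a concurrent PtlMDAttach() reuses the slot, the old handle's generation no longer matches, so the late PtlMDUnlink() gets PTL_INV_MD instead of unlinking the wrong MD.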
From: Eric B. <er...@ba...> - 2003-05-06 16:05:21

Guys,

This doesn't affect the portals API _at_all_, but will be of interest to implementors. I've implemented a new message type, so that portals over TCP/IP can verify that the peer on the other end of a socket is running the same protocol version and discover/verify its NID.

The new message type should be exchanged when a new portals/TCP/IP connection is established. In this message, all fields are zeroed except for...

  type:    set to PTL_MSG_HELLO
  src_nid: the NID of the newly connected peer.
  dst_nid: A magic number which identifies the byte stream as a portals
           message stream, and version numbers which effectively describe
           how to parse the stream. Note that the dst_nid comprises the
           first few bytes on the wire.

Note that the "common" payload length field (currently implemented as the macro PTL_HDR_LENGTH(h) since this length has been buried in the variant part of the header for historical reasons) is zero, implying no further data to follow.

This message type has been implemented on CVS branch b_devel, but disabled by default. The structure definitions are in include/portals/lib-types.h and the linux 'acceptor' and 'ptlctl' utilities implement the exchange. In the not too distant future, these utilities will disappear, leaving the NALs to establish connections themselves. When this occurs, exchange of the "hello" message when connections are established should become mandatory.

--
Cheers,
Eric

----------------------------------------------------
|Eric Barton       Barton Software                 |
|9 York Gardens    Tel:    +44 (117) 330 1575      |
|Clifton           Mobile: +44 (7909) 680 356      |
|Bristol BS8 4LL   Fax:    call first              |
|United Kingdom    E-Mail: er...@ba...             |
----------------------------------------------------
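A rough sketch of the handshake described above. The field names follow the message description, but the layout, the magic value, and the helper functions are invented for illustration; the real definitions live in include/portals/lib-types.h on branch b_devel:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative hello message: every field zeroed except the three the
 * post calls out.  dst_nid carries the magic number plus protocol
 * version, and sits first on the wire, so a peer can reject a
 * mismatched byte stream immediately. */
#define TOY_PTL_MSG_HELLO 0x48u
#define TOY_HELLO_MAGIC   0x0be91159ull   /* invented value */

typedef struct {
    uint64_t dst_nid;   /* magic + version: the first bytes on the wire */
    uint64_t src_nid;   /* NID of the newly connected peer */
    uint32_t type;      /* TOY_PTL_MSG_HELLO */
    uint32_t length;    /* payload length: zero, nothing follows */
} toy_hello_t;

static toy_hello_t make_hello(uint64_t my_nid)
{
    toy_hello_t h;
    memset(&h, 0, sizeof h);        /* "all fields are zeroed except..." */
    h.type = TOY_PTL_MSG_HELLO;
    h.src_nid = my_nid;
    h.dst_nid = TOY_HELLO_MAGIC;
    return h;
}

/* Receiver side: accept the connection only if the magic/version word
 * matches and no payload is promised. */
static int hello_ok(const toy_hello_t *h)
{
    return h->type == TOY_PTL_MSG_HELLO &&
           h->dst_nid == TOY_HELLO_MAGIC &&
           h->length == 0;
}
```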
From: Ron B. <rb...@va...> - 2003-04-07 17:18:42
> From: "David Leimbach" <dle...@mp...> > Subject: [Sandiaportals-devel] Implementation vs Standard? > Date: Mon, 7 Apr 2003 11:41:52 -0500 > To: san...@li... > > I don't know how I ever missed some of the inconsistencies but I was > just looking > at the CPlant Portals function PtlGetId. On CPlant this function takes > 3 parameters > but in every version of the standard I have seen it only takes 2. > > Portals 3.0: > 3.4.2 PtlGetId > int PtlGetId( ptl_process_id_t* id, ptl_id_t* gsize ); > > Portals 3.2 rev1: > 3.6.2 PtlGetId > int PtlGetId(ptl_handle_ni_t *ni_handle, ptl_process_id_t * id); > > I think the size field is useful... what happened to it? :) > > The CPlant implementation conforms to neither of the above. > This is one of those places where Portals was overlapping with the runtime system. The group size notion is specific to a collection of Portals processes running in the compute partition of parallel machine. We got rid of some of these things (eg. PtlBarrier) when we tried to make Portals more general-purpose in the changes from 3.1 to 3.2. The Cplant implementation has lagged for several reasons, but probably mostly because we didn't want to mess with code for a production platform that was working. -Ron |
From: David L. <dle...@mp...> - 2003-04-07 16:42:00
I don't know how I ever missed some of the inconsistencies but I was just looking at the CPlant Portals function PtlGetId. On CPlant this function takes 3 parameters but in every version of the standard I have seen it only takes 2. Portals 3.0: 3.4.2 PtlGetId int PtlGetId( ptl_process_id_t* id, ptl_id_t* gsize ); Portals 3.2 rev1: 3.6.2 PtlGetId int PtlGetId(ptl_handle_ni_t *ni_handle, ptl_process_id_t * id); I think the size field is useful... what happened to it? :) The CPlant implementation conforms to neither of the above. Dave --- David Leimbach Software Engineer MPI Software Technology Inc. Phone: 662-320-4300 x43 Fax: 662-320-4301 http://www.mpi-softtech.com The information contained in this communication may be confidential and is intended only for the use of the recipient(s) named above. If the reader of this communication is not the intended recipient(s), you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you are not a named recipient or received this communication by mistake, please notify the sender and delete the communication and all copies of it. |
From: Trammell H. <hu...@os...> - 2003-04-06 17:56:12

[ cc'd to the Sandia portals development mailing list ]

On Fri, Apr 04, 2003 at 05:11:43PM -0700, Zhiyong wrote:
> [...] I have some questions in lib-parse. There are parse_put
> and parse_get. In parse_put, it calls cb_recv.

The library is only responsible for decoding the Portals message header, so most NAL implementations will only read the header or first packet from the wire upon receiving an interrupt. The library decodes the header and in the event of a put operation it will ask the NAL to receive the rest of the message and copy the data into user memory. Once the message is fully received, the NAL is to do whatever it needs to do to reset the NIC to be ready for an incoming message.

> In parse_get, it first called cb_recv, then it called
> cb_send.

The call to cb_nal->cb_recv() with a zero length and null user address token is to allow the NAL to flush anything from the wire that it needs to or to reset the state of the network. cb_nal->cb_send is then called to copy data from user memory to the wire.

Hope this helps.

Trammell
--
-----|----- hu...@os... W 240-283-1700
*>=====[]L\ hu...@ro... M 505-463-1896
 ' -'-`-    http://www.swcp.com/~hudson/ KC5RNF
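The dispatch Trammell describes can be sketched with mock callbacks (names mirror cb_recv/cb_send, but this is a simplified stand-in, not lib-p30's actual lib_parse()): after decoding only the header, a put asks the NAL to receive the payload into user memory, while a get first issues a zero-length, NULL-address receive to flush/reset the wire and then sends the reply data:

```c
#include <assert.h>
#include <stddef.h>

enum { MSG_PUT, MSG_GET };

/* Mock NAL callbacks recording what the "library" asked them to do. */
static size_t last_recv_len = (size_t)-1;
static int send_called;

static void cb_recv(void *user_addr, size_t len)
{
    (void)user_addr;
    last_recv_len = len;
}

static void cb_send(const void *user_addr, size_t len)
{
    (void)user_addr;
    (void)len;
    send_called = 1;
}

/* After decoding only the header, the library drives the NAL: for a put
 * it receives the payload into user memory; for a get it "flushes" with
 * a zero-length, NULL-address receive, then sends the reply data. */
static void lib_parse(int type, void *user_addr, size_t payload_len)
{
    if (type == MSG_PUT) {
        cb_recv(user_addr, payload_len);
    } else {
        cb_recv(NULL, 0);            /* let the NAL reset wire state */
        cb_send(user_addr, payload_len);
    }
}
```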
From: Kevin P. <ped...@ie...> - 2003-04-03 19:29:09
Trammell, I thought you would be the best person to address this but please, others chime in if you can help. Why does the api-side nal only have validate()? invalidate() is missing. I can see how api-side memory validation/invalidation might involve the kernel whereas forward might always go to the NIC. However, if the nic can invalidate memory (which I'm assuming it can since invalidate() is missing) why can't it also validate it? Thanks, Kevin |
From: Ron B. <rb...@va...> - 2003-03-31 23:35:54

> > Let me try. Suppose an MPI process on node A sends message m1 then
> > message m2 to a process on node B. By MPI semantics, m1 must arrive
> > before m2 on node B. However, if the underlying MPI implementation
> > starts the send of m2 before the send for m1 is complete (there are
> > good reasons for wanting to do this.....), it is possible that the
> > sending of m2 will end before the sending of m1. Thus, Ron needs to
> > know which one started first and doesn't care which one ended first.
>
> Does portals guarantee that the message that "started" to send first
> will arrive first? Or does it guarantee that the one that "finished"
> sending first will arrive first? Or neither? :)

Portals guarantees the same pairwise ordering semantics that MPI does. If A starts m1 and then starts m2, they will arrive (traverse the Portals structures) in that order and, if they end up generating events on the same event queue, m1's PUT_START event will be delivered from the API before m2's PUT_START event. Portals doesn't say anything about the order of the completion of m1 and m2.

> In Portals 3.0 all I had to go by was the single event for the send...
> and now I am confused :)

Portals 3.0 guaranteed that the messages arrived (traversed the Portals structures) in order, but did not restrict how the subsequent events were delivered from the API. From an MPI implementor's point of view, you could never be sure that there wasn't a "newer" message that might show up in the event queue.

> My internal wiring tells me it's the "send-end" that matters, not the
> "send-start", but that could just be my thought about internal buffering
> of the sent message, which I guess ideally there is none of. [reduce the
> amount of copies of the message]

The PUT_START events are only important if you need to ensure pairwise ordering.

-Ron
From: David L. <dleimbac@MPI-SoftTech.Com> - 2003-03-31 19:21:35
On Monday, March 31, 2003, at 09:50 AM, Arthur B. (Barney) Maccabe wrote: >> >>> Start events are only there to preserve order. Most of the >>> higher-level >>> protocols that we implement on top of Portals need to preserve >>> ordering. >>> You may not need it in the kernel, but we need to do it for the >>> user-level. >> >> Given that END events are not guaranteed ordered (i.e. completion of >> 'y' >> initiated after 'x' does not guarantee completion of 'x') I'm not >> sure I follow. >> If it's not too much of a drag, could you give me an example? > > Let me try. Suppose an MPI process on node A sends message m1 then > message m2 to a process on node B. By MPI semantics, m1 must arrive > before m2 on node B. However, if the underlying MPI implementation > starts the send of m2 before the send for m1 is complete (there are > good > reasons for wanting to do this.....), it is possible that the sending > of > m2 will end before the sending of m1. This, Ron needs to know which > one > started first and doesn't care which one ended first. > Does portals guarantee that the message that "started" to send first will arrive first? Or does it guarantee that the one that "finished" sending first will arrive first? Or neither? :) In Portals 3.0 all I had to go by was the single event for the send... and now I am confused :) My internal wiring tells me its the "send-end" that matters not the "send-start" but that could just be my thought about internal buffering of the sent message which I guess ideally there is none of. [reduce the amount of copies of the message] Dave --- David Leimbach Software Engineer MPI Software Technology Inc. Phone: 662-320-4300 x43 Fax: 662-320-4301 http://www.mpi-softtech.com The information contained in this communication may be confidential and is intended only for the use of the recipient(s) named above. 
If the reader of this communication is not the intended recipient(s), you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you are not a named recipient or received this communication by mistake, please notify the sender and delete the communication and all copies of it. |
From: Arthur B. (B. M. <ma...@cs...> - 2003-03-31 17:27:59

On Mon, 2003-03-31 at 09:36, Eric Barton wrote:
> Barney,
>
> If I read you right...
>
> 1/ There are only 2 types of events; START and END.

Yes.

> 2/ I can disable either or both types via the MD options
>    PTL_MD_DISABLE_START_EVENT or PTL_MD_DISABLE_END_EVENT

Yes.

> 3/ The event tells me (somehow) if the MD is now inactive
>    and/or has been unlinked.

Yes.

> ...then I'm a happy chappie :)

I thought you might say that..... well, not exactly that, but the general idea :)

> BTW, is it only OK to specify PTL_EQ_NONE if both PTL_MD_DISABLE_START_EVENT
> and PTL_MD_DISABLE_END_EVENT are present?

There are possible inconsistencies. You might also provide an event queue and disable both types of events. (I told you, bit maps are evil.)

> Also, I guess half of (3) can be inferred by looking at the MD in the
> event...
>
>     inactive = (threshold == 0 ||
>                 ((options & PTL_MD_MAX_SIZE) != 0 &&
>                  (offset > length - max_size)))
>
> ...but I need to remember if 'unlink_op' was set on the attach to
> infer that it was automatically unlinked.

I think the reason(s) for unlinking should be explicit -- assuming that we have space in the event struct.

> I can live with that...
>
> Cheers,
> Eric

--
Arthur B. (Barney) Maccabe
Associate Professor, Computer Science Department
Associate Director, The UNM Center for High Performance Computing
The University of New Mexico, Albuquerque, NM 87131-1386
email: ma...@cs... http://www.cs.unm.edu/~maccabe
voice: (505) 277-6504 FAX: (505) 277-6927
From: Eric B. <er...@ba...> - 2003-03-31 16:37:29

Barney,

If I read you right...

1/ There are only 2 types of events; START and END.

2/ I can disable either or both types via the MD options
   PTL_MD_DISABLE_START_EVENT or PTL_MD_DISABLE_END_EVENT

3/ The event tells me (somehow) if the MD is now inactive
   and/or has been unlinked.

...then I'm a happy chappie :)

BTW, is it only OK to specify PTL_EQ_NONE if both PTL_MD_DISABLE_START_EVENT and PTL_MD_DISABLE_END_EVENT are present?

Also, I guess half of (3) can be inferred by looking at the MD in the event...

    inactive = (threshold == 0 ||
                ((options & PTL_MD_MAX_SIZE) != 0 &&
                 (offset > length - max_size)))

...but I need to remember if 'unlink_op' was set on the attach to infer that it was automatically unlinked.

I can live with that...

Cheers,
Eric

----------------------------------------------------
|Eric Barton       Barton Software                 |
|9 York Gardens    Tel:    +44 (117) 330 1575      |
|Clifton           Mobile: +44 (7909) 680 356      |
|Bristol BS8 4LL   Fax:    call first              |
|United Kingdom    E-Mail: er...@ba...             |
----------------------------------------------------
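Eric's inactivity inference can be checked directly by mocking up the MD fields. The field names and the PTL_MD_MAX_SIZE bit value below are invented stand-ins, not the spec's actual definitions:

```c
#include <assert.h>
#include <stddef.h>

#define PTL_MD_MAX_SIZE_OPT 0x1u   /* invented bit value for this sketch */

typedef struct {
    int threshold;
    unsigned options;
    size_t offset, length, max_size;
} toy_md_t;

/* Eric's inference, kept term for term: the MD is spent when its
 * threshold hits zero, or when max-size semantics are enabled and fewer
 * than max_size bytes remain past the current offset. */
static int md_inactive(const toy_md_t *md)
{
    return md->threshold == 0 ||
           ((md->options & PTL_MD_MAX_SIZE_OPT) != 0 &&
            md->offset > md->length - md->max_size);
}
```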
From: Arthur B. (Barney) M. <ma...@cs...> - 2003-03-31 15:52:59
|
On Mon, 2003-03-31 at 03:40, Eric Barton wrote:
> I guess my feeling was that exceeding the maximum offset and the threshold
> reaching zero were both ways in which an MD could be "exhausted".
> PTL_EVENT_INACTIVE certainly says this clearly, but I _do_ feel performance
> takes a hit as the number of events delivered proliferates.
>
> Could we turn the event type into a bitmask? Then a single event (and its
> accompanying locking/context switching overhead) could tell you everything
> that happened; e.g. (PTL_EVENT_GET_END | PTL_EVENT_UNLINK | PTL_EVENT_INACTIVE).

GACK! A bit mask..... me fears this might be overkill......

As I understand events (so far), we essentially have a START and an END with an indication of whether or not the operation succeeded or failed. (By the way, this was a good suggestion; it clears up cruft that was creeping into the API.) Given the semantics of unlinking, along with the END we could deliver an indication of whether or not the MD was unlinked and why (offset exceeded or threshold). Thus, every operation generates exactly two events, START and END. The END event tells you a lot about what is happening.

It might make more sense to associate unlinking with the START event; this is when the MD should be unlinked by the underlying implementation. However, *some people* don't want to look at START events and everyone will need to look at END events; to make this easier, I suggest that we minimize anything that we associate with START events.

> > Start events are only there to preserve order. Most of the higher-level
> > protocols that we implement on top of Portals need to preserve ordering.
> > You may not need it in the kernel, but we need to do it for the user-level.
>
> Given that END events are not guaranteed ordered (i.e. completion of 'y'
> initiated after 'x' does not guarantee completion of 'x') I'm not sure I follow.
> If it's not too much of a drag, could you give me an example?

Let me try. Suppose an MPI process on node A sends message m1 then message m2 to a process on node B. By MPI semantics, m1 must arrive before m2 on node B. However, if the underlying MPI implementation starts the send of m2 before the send of m1 is complete (there are good reasons for wanting to do this.....), it is possible that the sending of m2 will end before the sending of m1. Thus, Ron needs to know which one started first and doesn't care which one ended first.

> > > Anyway, assuming START events are here to stay, and I neither want to
> > > receive them, nor take the locking/context switching overhead of
> > > ignoring them, why not let me specify that I don't want them when I
> > > create the MD? It's already possible to filter out _all_ events by
> > > specifying PTL_EQ_NONE when the MD gets built, and we've got lots of
> > > options bits left :)
> >
> > That would be ok with me. It may be a good way to convey the ordering
> > semantics that one would like to have. I think Barney was the most vocal
> > opponent of this...
>
> If my suggestion above for the event type to be a bitmask is good, it would be
> easy :))

Bit mask this, bit mask that...... one day, you will learn that bit masks, macros, and var args are the true axis of evil.....

As the "most vocal opponent," I ought to chime in...... My position has been that you are solving a non-problem, and adding cruft to the API to solve a non-problem messes up the API. I think we were pretty clear that there is very little cost for actually delivering the event to the application layer. The cost, it seems, will be in the application that has to deal with events it doesn't want. Why not write a macro that will drop the events you don't want to see? (That is, substitute one form of evil for another....)

OK, that was fun...... How about we add two more options to the MD options: PTL_MD_DISABLE_START_EVENT and PTL_MD_DISABLE_END_EVENT. (The negative was intentional -- by default, you get both types of events; you have to do something to disable one type or the other.)

--
Arthur B. (Barney) Maccabe
Associate Professor, Computer Science Department
Associate Director, The UNM Center for High Performance Computing
email: ma...@cs...    http://www.cs.unm.edu/~maccabe
The University of New Mexico    voice: (505) 277-6504
Albuquerque, NM 87131-1386    FAX: (505) 277-6927
|
From: Eric B. <er...@ba...> - 2003-03-31 10:41:12
|
Ron,

Thanks for your mail.

> > Incidentally, is there any reason why we don't set the MD's threshold to
> > zero when offset exceeds max_offset/(length - max_size)? It would
> > provide a clear indication in the corresponding event that the MD is now
> > inactive.
>
> Yes, because they are two different things. It sounds like what you want is
> an easier way to tell that an MD is inactive. PTL_EVENT_UNLINK (which is in
> the current spec) is the easy way to tell for an MD that automatically unlinks.
> Would you want something like PTL_EVENT_INACTIVE?

I guess my feeling was that exceeding the maximum offset and the threshold reaching zero were both ways in which an MD could be "exhausted". PTL_EVENT_INACTIVE certainly says this clearly, but I _do_ feel performance takes a hit as the number of events delivered proliferates.

Could we turn the event type into a bitmask? Then a single event (and its accompanying locking/context switching overhead) could tell you everything that happened; e.g. (PTL_EVENT_GET_END | PTL_EVENT_UNLINK | PTL_EVENT_INACTIVE).

> Start events are only there to preserve order. Most of the higher-level
> protocols that we implement on top of Portals need to preserve ordering.
> You may not need it in the kernel, but we need to do it for the user-level.

Given that END events are not guaranteed ordered (i.e. completion of 'y' initiated after 'x' does not guarantee completion of 'x'), I'm not sure I follow. If it's not too much of a drag, could you give me an example?

> > Anyway, assuming START events are here to stay, and I neither want to
> > receive them, nor take the locking/context switching overhead of
> > ignoring them, why not let me specify that I don't want them when I
> > create the MD? It's already possible to filter out _all_ events by
> > specifying PTL_EQ_NONE when the MD gets built, and we've got lots of
> > options bits left :)
>
> That would be ok with me. It may be a good way to convey the ordering
> semantics that one would like to have. I think Barney was the most vocal
> opponent of this...

If my suggestion above for the event type to be a bitmask is good, it would be easy :))

--
Cheers,
    Eric

----------------------------------------------------
|Eric Barton       Barton Software                 |
|9 York Gardens    Tel:    +44 (117) 330 1575      |
|Clifton           Mobile: +44 (7909) 680 356      |
|Bristol BS8 4LL   Fax:    call first              |
|United Kingdom    E-Mail: er...@ba...             |
----------------------------------------------------
|