From: radhika s. <de...@gm...> - 2015-07-15 16:25:10

Hi there,

So I was trying to learn Portals 4 and was going through PtlPut, which takes a list of arguments. The second argument (local_offset) is the pointer to the data to be sent (I am still confused why it has type ptl_size_t, though) and the third argument is the length of the data. My question is: what is user_ptr? I have read the sentence describing user_ptr but I am still confused. Can someone please help me understand this?

- Solti
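The pattern user_ptr enables can be sketched without any real Portals headers: the application attaches a pointer to its own per-request state to the operation, and the implementation echoes that pointer back, untouched, in the completion event. (In the Portals 4 spec, local_offset is an offset into an already-bound memory descriptor, which is why it is a ptl_size_t rather than a pointer.) All names below (toy_ptl_put, my_request_t, and the cut-down ptl_event_t) are invented stand-ins for illustration, not the real Portals 4 API:

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-in for the full-event structure: the only field that
 * matters here is user_ptr, which the implementation copies verbatim
 * from the corresponding put/get call. */
typedef struct {
    void *user_ptr;
} ptl_event_t;

/* Application-owned, per-operation state.  The implementation never
 * interprets it; it only carries the pointer. */
typedef struct {
    int request_id;
    int completed;
} my_request_t;

static ptl_event_t pending_event;

/* Toy put: local_offset is an offset into an already-bound MD (hence a
 * size type, not a pointer), and user_ptr just rides along untouched. */
static void toy_ptl_put(size_t local_offset, size_t length, void *user_ptr)
{
    (void)local_offset;
    (void)length;
    pending_event.user_ptr = user_ptr;   /* echoed back at completion */
}

/* Event handler: cast user_ptr back to find out which request finished. */
static void handle_event(const ptl_event_t *ev)
{
    my_request_t *req = ev->user_ptr;
    req->completed = 1;
}
```

So user_ptr is purely a correlation token: when many operations are in flight, the event alone tells you *which* of your own requests it belongs to.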
From: Dave L. <dleimbac@MPI-SoftTech.Com> - 2003-08-28 17:40:54
Ron, Thanks for clearing that up... I was getting confused/concerned :). Dave On Wed, Aug 27, 2003 at 03:47:45PM -0600, Ron Brightwell wrote: > > This is slightly offtopic but when did Portals start to concern itself with > > collective routines? I don't remember seeing that in any specification. > > > > These were collective routines (broadcast and reduce) built on top of Portals, > so they weren't part of the Portals API. > > -Ron > -- David Leimbach Software Engineer MPI Software Technology Inc. Phone: 662-320-4300 x43 Fax: 662-320-4301 http://www.mpi-softtech.com The information contained in this communication may be confidential and is intended only for the use of the recipient(s) named above. If the reader of this communication is not the intended recipient(s), you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you are not a named recipient or received this communication by mistake, please notify the sender and delete the communication and all copies of it. |
From: Ron B. <rb...@va...> - 2003-08-27 21:57:57
> This is slightly offtopic but when did Portals start to concern itself with > collective routines? I don't remember seeing that in any specification. > These were collective routines (broadcast and reduce) built on top of Portals, so they weren't part of the Portals API. -Ron |
From: Dave L. <dleimbac@MPI-SoftTech.Com> - 2003-08-27 21:47:48
This is slightly offtopic but when did Portals start to concern itself with collective routines? I don't remember seeing that in any specification. Dave On Wed, Aug 27, 2003 at 11:55:32AM -0600, Trammell Hudson wrote: > I know the Portals 3 collective routines have been removed from the > mainline Portals code base, but they are in the most recent cplant code > drop that I received. I'm in the process of implementing a separate > fanout routine and noticed that top/compute/lib/p30/api-p30/bcast.c does > its sends in the reverse order that it should. > > It sends to its lower rank children before the higher ones, which > does not optimally overlap the tree. For instance, in an 8 node fanout, > this is the order that it would use: > > 0 -> 1 > 0 -> 2 > 0 -> 4 > > 1 <- 0 > > 2 <- 0 > 2 -> 3 > > 3 <- 2 > > 4 <- 0 > 4 -> 5 > 4 -> 6 > > 5 <- 4 > > 6 <- 4 > 6 -> 7 > > 7 <- 6 > > 0 would send to 1 first, then 2 then 4, but sending to 4 first would > allow 4 to overlap its sends to 5 and 6 with 0's sends to 2 and 4. > In general, the optimal mapping would be to send in the opposite order > than the algorithm generates. > > I can not see into the MPI library on Janus to say if this is what > happens as well. I think that I wrote that particular library, but it > was so many years ago that I do not remember how I did it. However, > I expect that I copied the code from it for the Portals collective > routine. > > Trammell > -- > -----|----- hu...@os... W 240-283-1700 > *>=====[]L\ hu...@ro... M 505-463-1896 > ' -'-`- http://www.swcp.com/~hudson/ KC5RNF > > > > ------------------------------------------------------- > This sf.net email is sponsored by:ThinkGeek > Welcome to geek heaven. > http://thinkgeek.com/sf > _______________________________________________ > Sandiaportals-devel mailing list > San...@li... > https://lists.sourceforge.net/lists/listinfo/sandiaportals-devel -- David Leimbach Software Engineer MPI Software Technology Inc. 
Phone: 662-320-4300 x43 Fax: 662-320-4301 http://www.mpi-softtech.com The information contained in this communication may be confidential and is intended only for the use of the recipient(s) named above. If the reader of this communication is not the intended recipient(s), you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you are not a named recipient or received this communication by mistake, please notify the sender and delete the communication and all copies of it. |
From: Trammell H. <hu...@os...> - 2003-08-27 17:55:56

I know the Portals 3 collective routines have been removed from the mainline Portals code base, but they are in the most recent cplant code drop that I received. I'm in the process of implementing a separate fanout routine and noticed that top/compute/lib/p30/api-p30/bcast.c does its sends in the reverse order that it should.

It sends to its lower rank children before the higher ones, which does not optimally overlap the tree. For instance, in an 8 node fanout, this is the order that it would use:

    0 -> 1
    0 -> 2
    0 -> 4

    1 <- 0

    2 <- 0
    2 -> 3

    3 <- 2

    4 <- 0
    4 -> 5
    4 -> 6

    5 <- 4

    6 <- 4
    6 -> 7

    7 <- 6

0 would send to 1 first, then 2 then 4, but sending to 4 first would allow 4 to overlap its sends to 5 and 6 with 0's sends to 2 and 4. In general, the optimal mapping would be to send in the opposite order than the algorithm generates.

I can not see into the MPI library on Janus to say if this is what happens as well. I think that I wrote that particular library, but it was so many years ago that I do not remember how I did it. However, I expect that I copied the code from it for the Portals collective routine.

Trammell
--
-----|----- hu...@os... W 240-283-1700
*>=====[]L\ hu...@ro... M 505-463-1896
 ' -'-`-    http://www.swcp.com/~hudson/ KC5RNF
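The tree shape in the message above is the standard binomial broadcast tree, and its child enumeration can be sketched in a few lines (invented function name, not the bcast.c code). The ascending order the function returns is the order the post says bcast.c sends in; walking the result backwards sends to the largest subtree first, which is the better-overlapping order the post argues for:

```c
#include <assert.h>

/* Enumerate the children of `rank` in a binomial broadcast tree over n
 * nodes, in ascending order (the order the post says bcast.c uses).
 * Sending to out[count-1] first -- i.e. iterating the array in reverse
 * -- reaches the largest subtree soonest, giving better overlap. */
static int binomial_children(int rank, int n, int *out)
{
    int count = 0;
    for (int mask = 1; mask < n; mask <<= 1) {
        if (rank & mask)
            break;                 /* this bit marks our parent's side */
        if ((rank | mask) < n)
            out[count++] = rank | mask;
    }
    return count;
}
```

For an 8-node fanout this reproduces the sends in the post exactly: rank 0 gets children {1, 2, 4}, rank 4 gets {5, 6}, rank 2 gets {3}, rank 6 gets {7}.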
From: Dave L. <dleimbac@MPI-SoftTech.Com> - 2003-07-09 13:36:31
> > Yes, this is definitely a bug. Are you using the HEAD? Or are you > > using Cray Portals? > > Cray. Is Cray portals available somehow? -- David Leimbach Software Engineer MPI Software Technology Inc. Phone: 662-320-4300 x43 Fax: 662-320-4301 http://www.mpi-softtech.com The information contained in this communication may be confidential and is intended only for the use of the recipient(s) named above. If the reader of this communication is not the intended recipient(s), you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you are not a named recipient or received this communication by mistake, please notify the sender and delete the communication and all copies of it. |
From: Jared R. <ja...@cr...> - 2003-06-27 20:53:04

> Interesting... I never thought of doing this. The spec doesn't
> explicitly state that PtlMDUpdate resets an MD's local offset but maybe
> it should. Do you have a good reason for wanting to do this?

It's just a way to wake up an MD without creating a new one, once its data is no longer cared about. I think the offset has to be reset, since the API md structure can't keep this information itself, and PtlMDUpdate doesn't really care that the new md happens to be at the same starting address as the old one. I'm pretty sure the problem would still manifest itself even if you were to unlink the MD and reattach it into the match list, since the problem appears to be in the md structure data, not in MDUpdate.

One possible use for that might be keeping a chain of MDs on a single matchlist so that incoming messages don't get dropped when one MD fills, then being able to reattach the MD at the end of the list without having to worry about which particular MD we are referring to.

> Yes, this is definitely a bug. Are you using the HEAD? Or are you
> using Cray Portals?

Cray.

> This is a bug also. The api-side should be translating lib-side handle
> to the api-side handle before revealing it to you.
>
> I can take a stab at fixing these problems. Do you have a test program
> handy?

Yes, I'll send them to you in a separate mail.

--Jared
From: Kevin P. <ped...@ie...> - 2003-06-27 18:26:06

Jared Roberts wrote:

> I was exploring the possibility of using PtlMDUpdate to reset the local
> offset of an MD after a put when I came across some unexpected behavior.

Interesting... I never thought of doing this. The spec doesn't explicitly state that PtlMDUpdate resets an MD's local offset but maybe it should. Do you have a good reason for wanting to do this?

> I made a copy of the MD returned in the PUT event and used that as the new
> argument in PtlMDUpdate, along with the md_handle given back in the event --
> and got back PTL_NOINIT (!). It looks like the nal_idx field of the
> md_handle in the event isn't being set, so PtlMDUpdate sees that it is out
> of range and whines.

Yes, this is definitely a bug. Are you using the HEAD? Or are you using Cray Portals?

> This is easy enough to work around, but even after doing this, I end up
> with a PTL_INV_EQ error. It looks like the handle_idx field of the eventq
> handle in the MD in the event is pointing to a lib_eq_t where the API-side
> expects it to be pointing to a ptl_eq_t. Is this behavior documented
> anywhere?

This is a bug also. The api-side should be translating the lib-side handle to the api-side handle before revealing it to you.

I can take a stab at fixing these problems. Do you have a test program handy?

Kevin

> It seems easy enough to avoid if you know where to look for it, but I
> couldn't find anything warning of this kind of behavior.
>
> --Jared
From: Jared R. <ja...@cr...> - 2003-06-26 19:48:58

I was exploring the possibility of using PtlMDUpdate to reset the local offset of an MD after a put when I came across some unexpected behavior. I made a copy of the MD returned in the PUT event and used that as the new argument in PtlMDUpdate, along with the md_handle given back in the event -- and got back PTL_NOINIT (!). It looks like the nal_idx field of the md_handle in the event isn't being set, so PtlMDUpdate sees that it is out of range and whines.

This is easy enough to work around, but even after doing this, I end up with a PTL_INV_EQ error. It looks like the handle_idx field of the eventq handle in the MD in the event is pointing to a lib_eq_t where the API-side expects it to be pointing to a ptl_eq_t. Is this behavior documented anywhere? It seems easy enough to avoid if you know where to look for it, but I couldn't find anything warning of this kind of behavior.

--Jared
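A minimal sketch of the handle bookkeeping the two bugs above point at (all names invented, not the actual lib-p30 structures): a handle packs a NAL index plus a table slot, and the api side is supposed to patch both before an event reaches the user — fill in nal_idx (the field the event path left zeroed, triggering the PTL_NOINIT range check) and remap the lib-side slot to the api-side object (so handle_idx refers to a ptl_eq_t, not a lib_eq_t):

```c
#include <assert.h>

/* Hypothetical handle layout: which NAL it belongs to, plus a slot in
 * that NAL's object table. */
typedef struct {
    unsigned nal_idx;      /* the field the event path left zeroed */
    unsigned handle_idx;   /* lib-side slot; must be remapped for the api */
} handle_t;

/* What the api side should do before handing an event to the user:
 * fill in nal_idx and translate the lib-side slot (here via a toy
 * lib-to-api lookup table). */
static handle_t lib_to_api_handle(handle_t lib_h, unsigned nal_idx,
                                  const unsigned *lib_to_api_idx)
{
    handle_t api_h;
    api_h.nal_idx = nal_idx;
    api_h.handle_idx = lib_to_api_idx[lib_h.handle_idx];
    return api_h;
}
```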
From: Hsing-bung C. <hb...@la...> - 2003-06-03 14:42:19

Hi,

I have a couple of questions about "Run kernel NAL test using Portals only". I am developing an Infiniband kernel NAL and would like to run tests to verify my ibnal code. Here is what I would like to do:

1. Bring up Portals 0.6.0.3.
2. Create the /dev/portals device.
3. Bring up the tcp and ibnal modules.
4. Use startserver.sh, startclient.sh, stopserver.sh, and stopclient.sh under portals/linux/tests to run some ping tests.

And some questions:

5. How do I use ptlctl to run tests?
6. Do I need to add Infiniband to ptlctl?
7. What does an end-to-end test look like?
8. How do I use the utilities under portals/linux/utils and portals/user/tests?

Thanks.

HB Chen
LANL
hb...@la...
505-665-3591
From: Ron B. <rb...@va...> - 2003-05-16 19:12:37

Greetings:

Attached is the Portals 3.3 specification document. The document has gone through some changes. The section with examples was deleted since the examples were becoming increasingly inconsistent with the API. Several new functions were added, including:

  PtlEQPoll()    - look for an event on multiple event queues
  PtlGetJid()    - associate a job id with a process
  PtlGetPut()    - atomic swap operation
  PtlGetRegion() - allow a get operation into part of an MD
  PtlPutRegion() - allow a put operation from part of an MD

Support for gather/scatter MDs and event handlers was also added.

-Ron
From: Eric H. <ho...@cr...> - 2003-05-15 04:06:25

> > 2/ I _do_ believe there is a need to allow PtlMDUnlink() to apply to the
> >    MDs passed to PtlMDGet() and PtlMDPut(PTL_ACK_REQ). We can't bound the
> >    time that memory is "exposed" to the network otherwise. This has been
> >    implemented in sandiaportals, but not documented in the spec.
>
> I'm not sure about this one. Unless the target is down, a PtlPut() or
> PtlGet() should complete pretty quickly. If the target is down, the END
> event (in Portals 3.2/3.3) will eventually happen and indicate an error.
> When the END event happens is implementation specific but will probably
> be after a longish timeout when the message's retry limit is reached.

There is a substantial window in the remote case allowing (for example) the target MD handle to be freed and reallocated, causing unexpected results. Without an instance-unique identifier (your suggested generation number or a random cookie), there doesn't seem to be any way to prevent this without placing additional restrictions on the use of the API.
From: Kevin P. <kt...@ie...> - 2003-05-15 03:25:01
Eric Barton wrote: >Guys, > >I've got a few observations I'd like to share and get feedback on. > >1/ I think MD handles (and if them, why not all) need to be "single shot" > in implementations that purport to be thread-safe. > > This is needed to ensure that PtlMDUnlink() can only unlink the intended > MD when it is racing with an incoming message that could unlink the same > MD. If PtlMDUnlink() loses the race, the caller should get PTL_INV_MD. > However if the handle happens to get re-used (say by a concurrent thread > doing PtlMDAttach()), completely the wrong thing will happen! > > If you concur, it might be a good idea to point this out in the Portals > spec so implementers get the picture. > I agree. This is something we didn't think of when trying to make Portals thread safe. Or at least I didn't think of it. By "single shot" you mean adding a random number or generation count to each handle, right? > >2/ I _do_ believe there is a need to allow PtlMDUnlink() to apply to the > MDs passed to PltMDGet() and PtlMDPut(PTL_ACK_REQ). We can't bound the > time that memory is "exposed" to the network otherwise. This has been > implemented in sandiaportals, but not documented in the spec. > I'm not sure about this one. Unless the target is down, a PtlPut() or PtlGet() should complete pretty quickly. If the target is down, the END event (in Portals 3.2/3.3) will eventually happen and indicate an error. When the END event happens is implementation specific but will probably be after a longish timeout when the message's retry limit is reached. > >3/ Why bother with specifying error return codes that _always_ mean the > programmer screwed up, rather than there was some resource shortage or > a lost race? > > For example it's highly unlikely there are any real programs that test > for PTL_NOINIT, and I bet most Portals implementations break if this has > to work "under fire" (i.e. 
comms racing with interfaces begin brought up > and down and the RC is being used to determine what state it's in). > > Why not core dump instead so (a) crap programmers get to know that they > _have_ screwed up and where, and (b) decent programmers don't have to > clutter their programs with unnecessary conditionals. > > Maybe this is a bit tongue in cheek, but some of the advertised error > codes can't be tested for efficiently, and if we did this, we could get > the number of return codes down to 3 or 4! > > This sounds logical to me. It's my understanding that Portals provides low-level building blocks that library writers can use to make higher-level, and friendlier, communication libraries (MPICH, Lustre, etc.). core dumping gets the message across and is easier to implement than things like returning PTL_NOINIT and handling PtlFini() and PtlNIFini() correctly. The Portals 3.3 draft that Ron is preparing contains the following section: 3.3 Return Codes The API specifies return codes that indicate success or failure of a function call. In the case where the failure is due to invalid arguments being passed into the function, the exact behavior of an implementation is undefined. The API suggests error codes that provide more detail about specific invalid parameters, but an implementation is not required to return these specific error codes. For example, an implementation is free to allow the caller to fault when given an invalid address, rather than return PTL_SEGV. In addition, an implementation is free to map these return codes to standard return codes where appropriate. For example, a Linux kernel-space implementation may want to map Portals return codes to POSIX-compliant return codes. |
From: Eric B. <er...@ba...> - 2003-05-14 20:16:17

Guys,

I've got a few observations I'd like to share and get feedback on.

1/ I think MD handles (and if them, why not all) need to be "single shot"
   in implementations that purport to be thread-safe.

   This is needed to ensure that PtlMDUnlink() can only unlink the intended
   MD when it is racing with an incoming message that could unlink the same
   MD. If PtlMDUnlink() loses the race, the caller should get PTL_INV_MD.
   However if the handle happens to get re-used (say by a concurrent thread
   doing PtlMDAttach()), completely the wrong thing will happen!

   If you concur, it might be a good idea to point this out in the Portals
   spec so implementers get the picture.

2/ I _do_ believe there is a need to allow PtlMDUnlink() to apply to the
   MDs passed to PtlMDGet() and PtlMDPut(PTL_ACK_REQ). We can't bound the
   time that memory is "exposed" to the network otherwise. This has been
   implemented in sandiaportals, but not documented in the spec.

3/ Why bother with specifying error return codes that _always_ mean the
   programmer screwed up, rather than there was some resource shortage or
   a lost race?

   For example it's highly unlikely there are any real programs that test
   for PTL_NOINIT, and I bet most Portals implementations break if this has
   to work "under fire" (i.e. comms racing with interfaces being brought up
   and down and the RC is being used to determine what state it's in).

   Why not core dump instead so (a) crap programmers get to know that they
   _have_ screwed up and where, and (b) decent programmers don't have to
   clutter their programs with unnecessary conditionals.

   Maybe this is a bit tongue in cheek, but some of the advertised error
   codes can't be tested for efficiently, and if we did this, we could get
   the number of return codes down to 3 or 4!

--
Cheers,
Eric

----------------------------------------------------
|Eric Barton       Barton Software                 |
|9 York Gardens    Tel:    +44 (117) 330 1575      |
|Clifton           Mobile: +44 (7909) 680 356      |
|Bristol BS8 4LL   Fax:    call first              |
|United Kingdom    E-Mail: er...@ba...             |
----------------------------------------------------
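The "single shot" handle idea from point 1/ (later echoed in the thread as a generation number or random cookie) can be sketched as follows. All names and widths below are invented for illustration: a generation count is folded into each handle, so a stale handle can never alias a slot that has since been recycled, and a racing unlink fails cleanly instead of destroying the wrong MD:

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BITS 8   /* illustrative split: low bits = slot, rest = gen */

typedef uint32_t md_handle_t;   /* [generation | slot] */

typedef struct {
    uint16_t generation;
    int in_use;
} md_slot_t;

static md_slot_t slots[1 << SLOT_BITS];

/* Allocate (or recycle) a slot: bumping the generation on every
 * allocation invalidates all previously issued handles for this slot. */
static md_handle_t make_handle(unsigned slot)
{
    slots[slot].generation++;
    slots[slot].in_use = 1;
    return ((md_handle_t)slots[slot].generation << SLOT_BITS) | slot;
}

/* Returns 0 on success, -1 (think PTL_INV_MD) if the handle is stale --
 * either already unlinked, or the slot was recycled by another thread. */
static int md_unlink(md_handle_t h)
{
    unsigned slot = h & ((1u << SLOT_BITS) - 1);
    uint16_t gen = (uint16_t)(h >> SLOT_BITS);
    if (!slots[slot].in_use || slots[slot].generation != gen)
        return -1;                  /* lost the race: report, don't destroy */
    slots[slot].in_use = 0;
    return 0;
}
```

The point is exactly Eric's: after a concurrent PtlMDAttach() reuses the slot, the old handle's generation no longer matches, so the late PtlMDUnlink() gets PTL_INV_MD instead of unlinking the wrong MD.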
From: Eric B. <er...@ba...> - 2003-05-06 16:05:21

Guys,

This doesn't affect the portals API _at_all_, but will be of interest to implementors. I've implemented a new message type, so that portals over TCP/IP can verify that the peer on the other end of a socket is running the same protocol version and discover/verify its NID.

The new message type should be exchanged when a new portals/TCP/IP connection is established. In this message, all fields are zeroed except for...

  type:    set to PTL_MSG_HELLO
  src_nid: the NID of the newly connected peer.
  dst_nid: A magic number which identifies the byte stream as a portals
           message stream, and version numbers which effectively describe
           how to parse the stream. Note that the dst_nid comprises the
           first few bytes on the wire.

Note that the "common" payload length field (currently implemented as the macro PTL_HDR_LENGTH(h) since this length has been buried in the variant part of the header for historical reasons) is zero, implying no further data to follow.

This message type has been implemented on CVS branch b_devel, but disabled by default. The structure definitions are in include/portals/lib-types.h and the linux 'acceptor' and 'ptlctl' utilities implement the exchange. In the not too distant future, these utilities will disappear, leaving the NALs to establish connections themselves. When this occurs, exchange of the "hello" message when connections are established should become mandatory.

--
Cheers,
Eric

----------------------------------------------------
|Eric Barton       Barton Software                 |
|9 York Gardens    Tel:    +44 (117) 330 1575      |
|Clifton           Mobile: +44 (7909) 680 356      |
|Bristol BS8 4LL   Fax:    call first              |
|United Kingdom    E-Mail: er...@ba...             |
----------------------------------------------------
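A rough sketch of the handshake described above. The field names follow the message description, but the layout, the magic value, and the helper functions are invented for illustration; the real definitions live in include/portals/lib-types.h on branch b_devel:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative hello message: every field zeroed except the three the
 * post calls out.  dst_nid carries the magic number plus protocol
 * version, and sits first on the wire, so a peer can reject a
 * mismatched byte stream immediately. */
#define TOY_PTL_MSG_HELLO 0x48u
#define TOY_HELLO_MAGIC   0x0be91159ull   /* invented value */

typedef struct {
    uint64_t dst_nid;   /* magic + version: the first bytes on the wire */
    uint64_t src_nid;   /* NID of the newly connected peer */
    uint32_t type;      /* TOY_PTL_MSG_HELLO */
    uint32_t length;    /* payload length: zero, nothing follows */
} toy_hello_t;

static toy_hello_t make_hello(uint64_t my_nid)
{
    toy_hello_t h;
    memset(&h, 0, sizeof h);        /* "all fields are zeroed except..." */
    h.type = TOY_PTL_MSG_HELLO;
    h.src_nid = my_nid;
    h.dst_nid = TOY_HELLO_MAGIC;
    return h;
}

/* Receiver side: accept the connection only if the magic/version word
 * matches and no payload is promised. */
static int hello_ok(const toy_hello_t *h)
{
    return h->type == TOY_PTL_MSG_HELLO &&
           h->dst_nid == TOY_HELLO_MAGIC &&
           h->length == 0;
}
```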
From: Ron B. <rb...@va...> - 2003-04-07 17:18:42
> From: "David Leimbach" <dle...@mp...> > Subject: [Sandiaportals-devel] Implementation vs Standard? > Date: Mon, 7 Apr 2003 11:41:52 -0500 > To: san...@li... > > I don't know how I ever missed some of the inconsistencies but I was > just looking > at the CPlant Portals function PtlGetId. On CPlant this function takes > 3 parameters > but in every version of the standard I have seen it only takes 2. > > Portals 3.0: > 3.4.2 PtlGetId > int PtlGetId( ptl_process_id_t* id, ptl_id_t* gsize ); > > Portals 3.2 rev1: > 3.6.2 PtlGetId > int PtlGetId(ptl_handle_ni_t *ni_handle, ptl_process_id_t * id); > > I think the size field is useful... what happened to it? :) > > The CPlant implementation conforms to neither of the above. > This is one of those places where Portals was overlapping with the runtime system. The group size notion is specific to a collection of Portals processes running in the compute partition of parallel machine. We got rid of some of these things (eg. PtlBarrier) when we tried to make Portals more general-purpose in the changes from 3.1 to 3.2. The Cplant implementation has lagged for several reasons, but probably mostly because we didn't want to mess with code for a production platform that was working. -Ron |
From: David L. <dle...@mp...> - 2003-04-07 16:42:00
I don't know how I ever missed some of the inconsistencies but I was just looking at the CPlant Portals function PtlGetId. On CPlant this function takes 3 parameters but in every version of the standard I have seen it only takes 2. Portals 3.0: 3.4.2 PtlGetId int PtlGetId( ptl_process_id_t* id, ptl_id_t* gsize ); Portals 3.2 rev1: 3.6.2 PtlGetId int PtlGetId(ptl_handle_ni_t *ni_handle, ptl_process_id_t * id); I think the size field is useful... what happened to it? :) The CPlant implementation conforms to neither of the above. Dave --- David Leimbach Software Engineer MPI Software Technology Inc. Phone: 662-320-4300 x43 Fax: 662-320-4301 http://www.mpi-softtech.com The information contained in this communication may be confidential and is intended only for the use of the recipient(s) named above. If the reader of this communication is not the intended recipient(s), you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you are not a named recipient or received this communication by mistake, please notify the sender and delete the communication and all copies of it. |
From: Trammell H. <hu...@os...> - 2003-04-06 17:56:12

[ cc'd to the Sandia portals development mailing list ]

On Fri, Apr 04, 2003 at 05:11:43PM -0700, Zhiyong wrote:
> [...] I have some questions in lib-parse. There are parse_put
> and parse_get. In parse_put, it calls cb_recv.

The library is only responsible for decoding the Portals message header, so most NAL implementations will only read the header or first packet from the wire upon receiving an interrupt. The library decodes the header and in the event of a put operation it will ask the NAL to receive the rest of the message and copy the data into user memory. Once the message is fully received, the NAL is to do whatever it needs to do to reset the NIC to be ready for an incoming message.

> In parse_get, it first called cb_recv, then it called
> cb_send.

The call to cb_nal->cb_recv() with a zero length and null user address token is to allow the NAL to flush anything from the wire that it needs to or to reset the state of the network. cb_nal->cb_send is then called to copy data from user memory to the wire.

Hope this helps.

Trammell
--
-----|----- hu...@os... W 240-283-1700
*>=====[]L\ hu...@ro... M 505-463-1896
 ' -'-`-    http://www.swcp.com/~hudson/ KC5RNF
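The dispatch Trammell describes can be sketched with mock callbacks (names mirror cb_recv/cb_send, but this is a simplified stand-in, not lib-p30's actual lib_parse()): after decoding only the header, a put asks the NAL to receive the payload into user memory, while a get first issues a zero-length, NULL-address receive to flush/reset the wire and then sends the reply data:

```c
#include <assert.h>
#include <stddef.h>

enum { MSG_PUT, MSG_GET };

/* Mock NAL callbacks recording what the "library" asked them to do. */
static size_t last_recv_len = (size_t)-1;
static int send_called;

static void cb_recv(void *user_addr, size_t len)
{
    (void)user_addr;
    last_recv_len = len;
}

static void cb_send(const void *user_addr, size_t len)
{
    (void)user_addr;
    (void)len;
    send_called = 1;
}

/* After decoding only the header, the library drives the NAL: for a put
 * it receives the payload into user memory; for a get it "flushes" with
 * a zero-length, NULL-address receive, then sends the reply data. */
static void lib_parse(int type, void *user_addr, size_t payload_len)
{
    if (type == MSG_PUT) {
        cb_recv(user_addr, payload_len);
    } else {
        cb_recv(NULL, 0);            /* let the NAL reset wire state */
        cb_send(user_addr, payload_len);
    }
}
```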
From: Kevin P. <ped...@ie...> - 2003-04-03 19:29:09
Trammell, I thought you would be the best person to address this but please, others chime in if you can help. Why does the api-side nal only have validate()? invalidate() is missing. I can see how api-side memory validation/invalidation might involve the kernel whereas forward might always go to the NIC. However, if the nic can invalidate memory (which I'm assuming it can since invalidate() is missing) why can't it also validate it? Thanks, Kevin |
From: Ron B. <rb...@va...> - 2003-03-31 23:35:54

> > Let me try. Suppose an MPI process on node A sends message m1 then
> > message m2 to a process on node B. By MPI semantics, m1 must arrive
> > before m2 on node B. However, if the underlying MPI implementation
> > starts the send of m2 before the send for m1 is complete (there are
> > good reasons for wanting to do this.....), it is possible that the
> > sending of m2 will end before the sending of m1. Thus, Ron needs to
> > know which one started first and doesn't care which one ended first.
>
> Does portals guarantee that the message that "started" to send first
> will arrive first? Or does it guarantee that the one that "finished"
> sending first will arrive first? Or neither? :)

Portals guarantees the same pairwise ordering semantics that MPI does. If A starts m1 and then starts m2, they will arrive (traverse the Portals structures) in that order and, if they end up generating events on the same event queue, m1's PUT_START event will be delivered from the API before m2's PUT_START event. Portals doesn't say anything about the order of the completion of m1 and m2.

> In Portals 3.0 all I had to go by was the single event for the send...
> and now I am confused :)

Portals 3.0 guaranteed that the messages arrived (traversed the Portals structures) in order, but did not restrict how the subsequent events were delivered from the API. From an MPI implementor's point of view, you could never be sure that there wasn't a "newer" message that might show up in the event queue.

> My internal wiring tells me it's the "send-end" that matters, not the
> "send-start", but that could just be my thought about internal buffering
> of the sent message, which I guess ideally there is none of. [reduce the
> amount of copies of the message]

The PUT_START events are only important if you need to ensure pairwise ordering.

-Ron
From: David L. <dleimbac@MPI-SoftTech.Com> - 2003-03-31 19:21:35
On Monday, March 31, 2003, at 09:50 AM, Arthur B. (Barney) Maccabe wrote: >> >>> Start events are only there to preserve order. Most of the >>> higher-level >>> protocols that we implement on top of Portals need to preserve >>> ordering. >>> You may not need it in the kernel, but we need to do it for the >>> user-level. >> >> Given that END events are not guaranteed ordered (i.e. completion of >> 'y' >> initiated after 'x' does not guarantee completion of 'x') I'm not >> sure I follow. >> If it's not too much of a drag, could you give me an example? > > Let me try. Suppose an MPI process on node A sends message m1 then > message m2 to a process on node B. By MPI semantics, m1 must arrive > before m2 on node B. However, if the underlying MPI implementation > starts the send of m2 before the send for m1 is complete (there are > good > reasons for wanting to do this.....), it is possible that the sending > of > m2 will end before the sending of m1. This, Ron needs to know which > one > started first and doesn't care which one ended first. > Does portals guarantee that the message that "started" to send first will arrive first? Or does it guarantee that the one that "finished" sending first will arrive first? Or neither? :) In Portals 3.0 all I had to go by was the single event for the send... and now I am confused :) My internal wiring tells me its the "send-end" that matters not the "send-start" but that could just be my thought about internal buffering of the sent message which I guess ideally there is none of. [reduce the amount of copies of the message] Dave --- David Leimbach Software Engineer MPI Software Technology Inc. Phone: 662-320-4300 x43 Fax: 662-320-4301 http://www.mpi-softtech.com The information contained in this communication may be confidential and is intended only for the use of the recipient(s) named above. 
If the reader of this communication is not the intended recipient(s), you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you are not a named recipient or received this communication by mistake, please notify the sender and delete the communication and all copies of it. |
From: Arthur B. (B. M. <ma...@cs...> - 2003-03-31 17:27:59

On Mon, 2003-03-31 at 09:36, Eric Barton wrote:
> Barney,
>
> If I read you right...
>
> 1/ There are only 2 types of events; START and END.

Yes.

> 2/ I can disable either or both types via the MD options
>    PTL_MD_DISABLE_START_EVENT or PTL_MD_DISABLE_END_EVENT

Yes.

> 3/ The event tells me (somehow) if the MD is now inactive
>    and/or has been unlinked.

Yes.

> ...then I'm a happy chappie :)

I thought you might say that..... well, not exactly that, but the general idea :)

> BTW, is it only OK to specify PTL_EQ_NONE if both PTL_MD_DISABLE_START_EVENT
> and PTL_MD_DISABLE_END_EVENT are present?

There are possible inconsistencies. You might also provide an event queue and disable both types of events. (I told you, bit maps are evil.)

> Also, I guess half of (3) can be inferred by looking at the MD in the
> event...
>
>     inactive = (threshold == 0 ||
>                 ((options & PTL_MD_MAX_SIZE) != 0 &&
>                  (offset > length - max_size)))
>
> ...but I need to remember if 'unlink_op' was set on the attach to
> infer that it was automatically unlinked.

I think the reason(s) for unlinking should be explicit -- assuming that we have space in the event struct.

> I can live with that...
>
> Cheers,
> Eric

--
Arthur B. (Barney) Maccabe
Associate Professor, Computer Science Department
Associate Director, The UNM Center for High Performance Computing
The University of New Mexico, Albuquerque, NM 87131-1386
email: ma...@cs... http://www.cs.unm.edu/~maccabe
voice: (505) 277-6504 FAX: (505) 277-6927
From: Eric B. <er...@ba...> - 2003-03-31 16:37:29

Barney,

If I read you right...

1/ There are only 2 types of events; START and END.

2/ I can disable either or both types via the MD options
   PTL_MD_DISABLE_START_EVENT or PTL_MD_DISABLE_END_EVENT

3/ The event tells me (somehow) if the MD is now inactive
   and/or has been unlinked.

...then I'm a happy chappie :)

BTW, is it only OK to specify PTL_EQ_NONE if both PTL_MD_DISABLE_START_EVENT and PTL_MD_DISABLE_END_EVENT are present?

Also, I guess half of (3) can be inferred by looking at the MD in the event...

    inactive = (threshold == 0 ||
                ((options & PTL_MD_MAX_SIZE) != 0 &&
                 (offset > length - max_size)))

...but I need to remember if 'unlink_op' was set on the attach to infer that it was automatically unlinked.

I can live with that...

Cheers,
Eric

----------------------------------------------------
|Eric Barton       Barton Software                 |
|9 York Gardens    Tel:    +44 (117) 330 1575      |
|Clifton           Mobile: +44 (7909) 680 356      |
|Bristol BS8 4LL   Fax:    call first              |
|United Kingdom    E-Mail: er...@ba...             |
----------------------------------------------------
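Eric's inactivity inference can be checked directly by mocking up the MD fields. The field names and the PTL_MD_MAX_SIZE bit value below are invented stand-ins, not the spec's actual definitions:

```c
#include <assert.h>
#include <stddef.h>

#define PTL_MD_MAX_SIZE_OPT 0x1u   /* invented bit value for this sketch */

typedef struct {
    int threshold;
    unsigned options;
    size_t offset, length, max_size;
} toy_md_t;

/* Eric's inference, kept term for term: the MD is spent when its
 * threshold hits zero, or when max-size semantics are enabled and fewer
 * than max_size bytes remain past the current offset. */
static int md_inactive(const toy_md_t *md)
{
    return md->threshold == 0 ||
           ((md->options & PTL_MD_MAX_SIZE_OPT) != 0 &&
            md->offset > md->length - md->max_size);
}
```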
From: Arthur B. (Barney) M. <ma...@cs...> - 2003-03-31 15:52:59
|
On Mon, 2003-03-31 at 03:40, Eric Barton wrote:
> I guess my feeling was that exceeding the maximum offset and the threshold
> reaching zero were both ways in which an MD could be "exhausted".
> PTL_EVENT_INACTIVE certainly says this clearly, but I _do_ feel performance
> takes a hit as the number of events delivered proliferates.
>
> Could we turn the event type into a bitmask? Then a single event (and its
> accompanying locking/context switching overhead) could tell you everything
> that happened; e.g. (PTL_EVENT_GET_END | PTL_EVENT_UNLINK | PTL_EVENT_INACTIVE).

GACK! A bit mask..... me fears this might be overkill......

As I understand events (so far), we essentially have a START and an END with an indication of whether or not the operation succeeded or failed. (By the way, this was a good suggestion; it clears up cruft that was creeping into the API.) Given the semantics of unlinking, along with the END we could deliver an indication of whether or not the MD was unlinked and why (offset exceeded or threshold). Thus, every operation generates exactly two events, START and END. The END event tells you a lot about what is happening.

It might make more sense to associate unlinking with the START event; this is when the MD should be unlinked by the underlying implementation. However, *some people* don't want to look at START events and everyone will need to look at END events; to make this easier, I suggest that we minimize anything that we associate with START events.

> > Start events are only there to preserve order. Most of the higher-level
> > protocols that we implement on top of Portals need to preserve ordering.
> > You may not need it in the kernel, but we need to do it for the user-level.
>
> Given that END events are not guaranteed ordered (i.e. completion of 'y'
> initiated after 'x' does not guarantee completion of 'x') I'm not sure I follow.
> If it's not too much of a drag, could you give me an example?

Let me try. Suppose an MPI process on node A sends message m1 then message m2 to a process on node B. By MPI semantics, m1 must arrive before m2 on node B. However, if the underlying MPI implementation starts the send of m2 before the send of m1 is complete (there are good reasons for wanting to do this.....), it is possible that the sending of m2 will end before the sending of m1. Thus, Ron needs to know which one started first and doesn't care which one ended first.

> > > Anyway, assuming START events are here to stay, and I neither want to
> > > receive them, nor take the locking/context switching overhead of
> > > ignoring them, why not let me specify that I don't want them when I
> > > create the MD? It's already possible to filter out _all_ events by
> > > specifying PTL_EQ_NONE when the MD gets built, and we've got lots of
> > > options bits left :)
> >
> > That would be ok with me. It may be a good way to convey the ordering
> > semantics that one would like to have. I think Barney was the most vocal
> > opponent of this...
>
> If my suggestion above for the event type to be a bitmask is good, it would be
> easy :))

Bit mask this, bit mask that...... one day, you will learn that bit masks, macros, and var args are the true axis of evil.....

As the "most vocal opponent," I ought to chime in...... My position has been that you are solving a non-problem, and adding cruft to the API to solve a non-problem messes up the API. I think we were pretty clear that there is very little cost for actually delivering the event to the application layer. The cost, it seems, will be in the application that has to deal with events it doesn't want. Why not write a macro that will drop the events you don't want to see? (That is, substitute one form of evil for another....)

OK, that was fun...... How about we add two more options to the MD options: PTL_MD_DISABLE_START_EVENT and PTL_MD_DISABLE_END_EVENT. (The negative was intentional -- by default, you get both types of events; you have to do something to disable one type or the other.)

--
Arthur B. (Barney) Maccabe
Associate Professor, Computer Science Department
Associate Director, The UNM Center for High Performance Computing
email: ma...@cs...    http://www.cs.unm.edu/~maccabe
The University of New Mexico    voice: (505) 277-6504
Albuquerque, NM 87131-1386    FAX: (505) 277-6927
|
From: Eric B. <er...@ba...> - 2003-03-31 10:41:12
|
Ron,

Thanks for your mail.

> > Incidentally, is there any reason why we don't set the MD's threshold to
> > zero when offset exceeds max_offset/(length - max_size)? It would
> > provide a clear indication in the corresponding event that the MD is now
> > inactive.
>
> Yes, because they are two different things. It sounds like what you want is
> an easier way to tell that an MD is inactive. PTL_EVENT_UNLINK (which is in
> the current spec) is the easy way to tell for an MD that automatically unlinks.
> Would you want something like PTL_EVENT_INACTIVE?

I guess my feeling was that exceeding the maximum offset and the threshold reaching zero were both ways in which an MD could be "exhausted". PTL_EVENT_INACTIVE certainly says this clearly, but I _do_ feel performance takes a hit as the number of events delivered proliferates.

Could we turn the event type into a bitmask? Then a single event (and its accompanying locking/context switching overhead) could tell you everything that happened; e.g. (PTL_EVENT_GET_END | PTL_EVENT_UNLINK | PTL_EVENT_INACTIVE).

> Start events are only there to preserve order. Most of the higher-level
> protocols that we implement on top of Portals need to preserve ordering.
> You may not need it in the kernel, but we need to do it for the user-level.

Given that END events are not guaranteed ordered (i.e. completion of 'y' initiated after 'x' does not guarantee completion of 'x'), I'm not sure I follow. If it's not too much of a drag, could you give me an example?

> > Anyway, assuming START events are here to stay, and I neither want to
> > receive them, nor take the locking/context switching overhead of
> > ignoring them, why not let me specify that I don't want them when I
> > create the MD? It's already possible to filter out _all_ events by
> > specifying PTL_EQ_NONE when the MD gets built, and we've got lots of
> > options bits left :)
>
> That would be ok with me. It may be a good way to convey the ordering
> semantics that one would like to have. I think Barney was the most vocal
> opponent of this...

If my suggestion above for the event type to be a bitmask is good, it would be easy :))

--
Cheers,
    Eric

----------------------------------------------------
|Eric Barton       Barton Software                 |
|9 York Gardens    Tel:    +44 (117) 330 1575      |
|Clifton           Mobile: +44 (7909) 680 356      |
|Bristol BS8 4LL   Fax:    call first              |
|United Kingdom    E-Mail: er...@ba...             |
----------------------------------------------------
|