#5 Multi-part messages can get out of sync if source=-1

closed
None
5
2005-02-23
2003-09-09
Nathan P. Stien
No

When pyMPI runs on a large cluster under heavy traffic,
the two-part message mechanism used in mpi.recv() can
get out of sync.

Specifically, this happens on a receive with
source = -1 (meaning the receive should take the first
available source). The initial part of a message is
received in pympi_pt_to_pt_recv_implementation(), and
pyMPI then issues a second MPI_Recv to grab the
remaining bytes. If the source is -1, that second
receive accepts any message at all, even the init
message from an altogether different node. That second
init message (plus whatever garbage memory fills the
gap) is then copied into the buffer instead of the real
message remainder.

The fix is to detect which source the init message came
from and then MPI_Recv the remainder from that source
only. I have enclosed a small patch that does this by
modifying comm_point_to_point.c.
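
To illustrate the race and the fix, here is a minimal toy model in plain Python. It is not the attached patch and does not use pyMPI or MPI at all: ToyNetwork, recv_naive, recv_fixed and make_traffic are hypothetical names invented for this sketch, and each "message" is simplified to two string parts with no length header. In real MPI the wildcard is MPI_ANY_SOURCE and the actual sender is reported in the status object (status.MPI_SOURCE); the toy recv_part() returns the sender directly instead.

```python
ANY_SOURCE = -1  # stands in for the source=-1 wildcard described above


class ToyNetwork:
    """Arrival-ordered list of (sender, part) pairs; a stand-in for MPI."""

    def __init__(self):
        self.queue = []

    def send_part(self, sender, part):
        self.queue.append((sender, part))

    def recv_part(self, source):
        """Pop the first queued part whose sender matches source."""
        for i, (sender, _) in enumerate(self.queue):
            if source == ANY_SOURCE or sender == source:
                return self.queue.pop(i)
        raise RuntimeError("no matching part queued")


def recv_naive(net, source):
    # Buggy pattern: the second receive reuses the caller's source, so a
    # wildcard can match another node's init part.
    _, init = net.recv_part(source)
    _, rest = net.recv_part(source)
    return init + rest


def recv_fixed(net, source):
    # Fixed pattern: remember who sent the init part and receive the
    # remainder from that sender only.
    sender, init = net.recv_part(source)
    _, rest = net.recv_part(sender)
    return init + rest


def make_traffic():
    """Node 2's init part arrives before node 1's remainder."""
    net = ToyNetwork()
    net.send_part(1, "AAAA")  # init part from node 1
    net.send_part(2, "BBBB")  # init part from node 2
    net.send_part(1, "aaaa")  # remainder from node 1
    net.send_part(2, "bbbb")  # remainder from node 2
    return net


if __name__ == "__main__":
    print(recv_naive(make_traffic(), ANY_SOURCE))  # AAAABBBB (mixed up)
    print(recv_fixed(make_traffic(), ANY_SOURCE))  # AAAAaaaa
```

With an explicit source the two functions behave the same; they differ only when the caller passes the wildcard and parts from several nodes are interleaved.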

Discussion

  • Patch to make sure messages stay in sync

     
  • zudatron
    2004-06-16

    Thanks for submitting this. This bug has been the root of
    my problems for the last week. I almost had to abandon
    pyMPI. I see the fix is incorporated in v2.0b0 (also
    addresses tags) but I'm stuck using 1.3 for now.

     
  • Patrick Miller
    2005-02-23

    • assigned_to: nobody --> patmiller
    • status: open --> closed
     
  • Patrick Miller
    2005-02-23

    Fixed in 2.0