Data placement for OpenMP

2004-08-20
  • Nobody/Anonymous

    I'm interested in using this library for my C++ simulation codes. I think it has considerable potential to improve the performance of OpenMP-parallel C++ codes on ccNUMA machines (the SGI Altix is our main target).

    We find that data placement is critical to the performance and scaling of our simulations. To control it, we have developed a simple allocator that manages a chunk of memory on each physical processor. The main objects are shared, but the allocator places each one in the memory of a specific physical processor. We can structure the code so that remote processors only read an object, while the processor that owns it can both read and write it. However, our classes are not generic, so I cannot use the allocator for arbitrary objects, and they are not compatible with the STL.

    My question: is it possible to use your library (with some modifications if necessary) to control data placement on each physical processor and implement the ideas described above?

    Thanks in advance.

     
    • Marc Bumble

      Marc Bumble - 2004-08-20

      Perhaps I do not understand your question.
      Effectively, this allocator works by calling
      shm_open(), a POSIX system call, to open a UNIX
      shared memory segment. Then mmap() is used to map
      that shared memory segment to an address specified
      by the allocator's key parameters. The rest of the
      allocator deterministically divides up the memory
      and provides element-sized portions to the STL
      container. Shared access is provided by allowing
      secondary processes to use an identical key to
      open existing shared memory segments and attach to
      existing containers placed in those shared
      segments.
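
      As a rough illustration of the mechanism just described
      (not the library's actual code), here is a minimal sketch
      assuming a POSIX system. The names map_segment and slot are
      hypothetical. The segment is identified by a name rather
      than by the library's keys, and the kernel chooses the
      mapping address, whereas the library maps to an address
      derived from the key.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: open (or create) a named POSIX shared memory
// segment of at least `bytes` bytes and map it into this process.
// Returns nullptr on failure.
void* map_segment(const char* name, std::size_t bytes) {
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1) return nullptr;

    // Grow the segment only if it is still smaller than requested, so a
    // second process attaching to an existing segment does not resize it.
    struct stat st;
    if (fstat(fd, &st) == -1 ||
        (static_cast<std::size_t>(st.st_size) < bytes &&
         ftruncate(fd, static_cast<off_t>(bytes)) == -1)) {
        close(fd);
        return nullptr;
    }

    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : p;
}

// Hypothetical helper: deterministic, element-sized subdivision of the
// segment. Slot i always lives at the same offset from the base.
template <typename T>
T* slot(void* base, std::size_t index) {
    return static_cast<T*>(base) + index;
}
```

      A second process that calls map_segment with the same name
      sees the same bytes, which is the "attach with an identical
      key" step. On older Linux systems the program may need to
      be linked with -lrt.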

      What you could perhaps do, assuming you have
      enough control, is give the process on each
      processor its own set of keys. Suppose you have
      4 processors: 1, 2, 3, and 4. Processor 1 could
      create allocators using keys 1, 5, 9, 13, ...
      Processor 2 could create allocators using keys
      2, 6, 10, 14, ..., and so forth. Processor 2
      could access the elements created by Processor 1
      and stored using key 1, and Processor 1 could
      access the elements stored in the allocator using
      key 6 created by Processor 2. The issue would
      just be getting the underlying system calls to
      create the shared memory segments in memory
      controlled by specific processors. This assumes
      you are describing a single multiprocessor
      machine. It also assumes that the operating
      system does not migrate the processes among the
      processors; if it does, the situation is
      obviously more dynamic. If system memory is
      assigned to processors based on physical address,
      then perhaps your goal is reasonable.
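
      The key numbering in the example above can be sketched
      as follows. The function names key_for and owner_of are
      hypothetical and are not part of the library. Processors
      are numbered from 1, as in the example.

```cpp
#include <cassert>

// i-th key (i = 0, 1, 2, ...) reserved for processor `proc` when there
// are `n_procs` processors numbered 1..n_procs.
// With n_procs = 4, processor 1 gets 1, 5, 9, 13, ...
int key_for(int proc, int n_procs, int i) {
    return proc + i * n_procs;
}

// Inverse mapping: which processor created the allocator for `key`?
int owner_of(int key, int n_procs) {
    return (key - 1) % n_procs + 1;
}
```

      Any process can compute owner_of(key, 4) to tell whether a
      key refers to its own data or to data it should treat as
      read-only.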

      Also, if you are thinking about memory control in
      terms of, say, how the elements of a matrix are
      laid out in memory, you do not have that much
      control with C++ STL allocators. The allocators
      simply request blocks of adjacent memory cells for
      their containers. So the container elements end up
      in adjacent cells, but it's not quite the same as
      choosing between row-major and column-major array
      layouts, as in Fortran. There is not that level of
      control.
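
      To illustrate the point, here is a minimal sketch of a
      standard-conforming allocator (BlockAllocator, a
      hypothetical name, not this library's class). All it ever
      sees is a request for n adjacent elements. Any row-major or
      column-major interpretation has to be added by the caller
      as index arithmetic on top of that block, shown here with
      the hypothetical helpers row_major and col_major.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Minimal C++11 allocator: hands out one contiguous block per request.
template <typename T>
struct BlockAllocator {
    using value_type = T;
    BlockAllocator() = default;
    template <typename U>
    BlockAllocator(const BlockAllocator<U>&) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const BlockAllocator<T>&, const BlockAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const BlockAllocator<T>&, const BlockAllocator<U>&) { return false; }

// Layout is a property of the indexing, not of the allocator.
std::size_t row_major(std::size_t r, std::size_t c, std::size_t cols) {
    return r * cols + c;
}
std::size_t col_major(std::size_t r, std::size_t c, std::size_t rows) {
    return r + c * rows;
}
```

      For a 2 x 3 matrix stored in a
      std::vector<double, BlockAllocator<double>> of 6 elements,
      element (1, 0) sits at index row_major(1, 0, 3) or
      col_major(1, 0, 2) depending on the convention the caller
      picks; the allocator is not involved in that choice.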

      Please accept my apologies if I have misunderstood
      your question. Thank you for your interest.

       
