pnumpy does not impose any domain decomposition on your array; you are free to use any MPI topology you like. To fetch remote data, however, you need to know which processes neighbor a given sub-array. The following helps you choose a decomposition that is optimal in the sense of minimizing the ratio of surface (communication) to volume (work), and find the neighbors of a given rank.
```python
import pnumpy
# create a domain decomposition for 24 procs, the topology is a 2x3x4 cube
decomp = pnumpy.CubeDecomp(nprocs=24, dims=(2, 3, 4))
# find the rank one block down from rank 5
neighRk = decomp.getNeighborProc(5, offset=(0, 0, -1))
```
Note that getNeighborProc returns None when there is no neighbor in the requested direction. You can specify which axes wrap around with the optional periodic argument:
```python
# axes 0 and 2 wrap around, axis 1 does not
neighRk = decomp.getNeighborProc(5, offset=(0, 0, -1), periodic=(True, False, True))
```
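To make the neighbor lookup concrete, here is a minimal pure-Python sketch of how such a lookup can work on the 2x3x4 process grid from the example. It assumes ranks are laid out in row-major order (last axis varying fastest); pnumpy's actual ordering may differ, and `rank_to_index`, `index_to_rank`, and `neighbor_rank` are hypothetical helper names, not part of the pnumpy API.

```python
from typing import Optional, Sequence, Tuple


def rank_to_index(rank: int, proc_dims: Sequence[int]) -> Tuple[int, ...]:
    """Map a rank to its process-grid index (assumes row-major, last axis fastest)."""
    index = []
    for n in reversed(proc_dims):
        index.append(rank % n)
        rank //= n
    return tuple(reversed(index))


def index_to_rank(index: Sequence[int], proc_dims: Sequence[int]) -> int:
    """Inverse of rank_to_index under the same row-major assumption."""
    rank = 0
    for i, n in zip(index, proc_dims):
        rank = rank * n + i
    return rank


def neighbor_rank(
    rank: int,
    offset: Sequence[int],
    proc_dims: Sequence[int],
    periodic: Optional[Sequence[bool]] = None,
) -> Optional[int]:
    """Hypothetical helper: rank at `offset` from `rank`, or None past a non-periodic edge."""
    if periodic is None:
        periodic = (False,) * len(proc_dims)
    index = rank_to_index(rank, proc_dims)
    neighbor = []
    for i, d, n, wrap in zip(index, offset, proc_dims, periodic):
        j = i + d
        if wrap:
            j %= n  # periodic axis: wrap around to the other side
        elif not 0 <= j < n:
            return None  # non-periodic axis: no neighbor beyond the edge
        neighbor.append(j)
    return index_to_rank(neighbor, proc_dims)
```

Under these assumptions, rank 5 sits at grid index (0, 1, 1), so `neighbor_rank(5, (0, 0, -1), (2, 3, 4))` returns 4. Moving one block down from rank 4 falls off the edge and returns None, unless the last axis is periodic: `neighbor_rank(4, (0, 0, -1), (2, 3, 4), (True, False, True))` wraps around to rank 7.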
Also note that in some cases no regular domain decomposition can be found; this happens whenever the number of processes does not match the product of the dimensions.
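The surface-to-volume criterion can be illustrated with a small brute-force search. The sketch below is a hypothetical helper, `best_proc_grid`, and not pnumpy's own algorithm: it assumes a regular decomposition means every process-grid extent divides the matching global array extent evenly, tries every grid that uses exactly `nprocs` processes, and keeps the one whose blocks have the smallest surface-to-volume ratio. It returns None when no regular grid exists.

```python
from itertools import product
from math import prod
from typing import Optional, Sequence, Tuple


def best_proc_grid(nprocs: int, global_dims: Sequence[int]) -> Optional[Tuple[int, ...]]:
    """Pick the process grid whose blocks have the smallest surface-to-volume ratio.

    Hypothetical helper, not part of pnumpy: only grids whose extents multiply to
    nprocs and evenly divide the matching global extents (regular blocks) are
    considered. Returns None when no such grid exists.
    """
    divisors = [d for d in range(1, nprocs + 1) if nprocs % d == 0]
    best: Optional[Tuple[int, ...]] = None
    best_ratio = float("inf")
    for grid in product(divisors, repeat=len(global_dims)):
        if prod(grid) != nprocs:
            continue  # must use exactly nprocs processes
        if any(n % p for n, p in zip(global_dims, grid)):
            continue  # blocks would be uneven along some axis
        block = [n // p for n, p in zip(global_dims, grid)]
        # For a box with sides b_i, surface / volume = sum(2 / b_i):
        # the communication cost per unit of local work.
        ratio = sum(2.0 / b for b in block)
        if ratio < best_ratio:
            best, best_ratio = grid, ratio
    return best
```

For example, `best_proc_grid(4, (100, 100))` returns `(2, 2)`, since square 50x50 blocks exchange less halo data per cell than 100x25 strips, while `best_proc_grid(5, (8, 8))` returns None because 5 does not divide 8. Brute force is adequate here because the number of candidates is at most the number of divisors of `nprocs` raised to the number of axes.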