JAGS on multiple cores

  • Michael Rutter

    Based on the install manual, there are a number of ways to compile JAGS
    against different BLAS libraries. I have tried a number of them, but I am
    still unable to get JAGS to run multiple chains on multiple cores. Has
    anyone successfully had JAGS running on multiple cores, short of starting
    multiple instances of JAGS?


  • xevi

    Yes, I have successfully run JAGS on multiple cores on a Gentoo GNU/Linux
    system, using the ATLAS BLAS library and compiling JAGS against it.

    Which OS do you use? Which "configure" command do you run? Please give
    more details so that we can help you.

  • Michael Rutter

    I am running 64-bit Ubuntu (lucid) with an AMD processor. I have tried a
    couple of different "configure" commands, all based on what is recommended in
    the manual. For example, I tried to use the AMD math libraries with:

    LDFLAGS="-L/opt/acml4.3.0/gfortran64_mp/lib" \
    CXXFLAGS="-O2 -g -fopenmp" ./configure --with-blas="-lacml_mp -lacml_mv"

    I have also used the GOTO BLAS libraries, using the stock "./configure" after
    replacing the default BLAS libraries with GOTO. I know the GOTO BLAS libraries
    can use multiple cores, because they do so under R.

    Thanks for any help.

  • Martyn Plummer

    A few points:

    • Not all models will use BLAS/LAPACK calls. They are only needed when you have multivariate distributions such as multivariate normal and Wishart.
    • BLAS speed probably isn't a bottleneck, even when you are using it.
    • As stated in the manual, when linking against ACML, the number of threads is controlled by OMP_NUM_THREADS, so you will need to set this to the number of cores you want to use in order to get multi-core performance.
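
    A minimal sketch of setting that variable before a run, assuming a
    bash-like shell. The thread count of 4 and the script name model.cmd are
    placeholders for illustration, not values from the manual.

```shell
# Assumption: a quad-core machine; set this to the number of cores to use.
export OMP_NUM_THREADS=4

# Assumption: model.cmd is a hypothetical JAGS script file.
# Run JAGS only if both the binary and the script are present.
if command -v jags >/dev/null 2>&1 && [ -f model.cmd ]; then
    jags model.cmd
else
    echo "jags or model.cmd not found; OMP_NUM_THREADS=$OMP_NUM_THREADS"
fi
```

    The variable has to be exported in the same shell session that starts
    JAGS, otherwise the threaded library will not see it.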

  • xevi

    I have not tried ACML, but I have successfully built JAGS to run with
    GotoBLAS2, with multiple chains. I use:

    LDFLAGS="-L/usr/local/GotoBLAS2" ./configure \
        CFLAGS="-march=core2 -O2 -fomit-frame-pointer" \
        FFLAGS="-march=core2 -O2 -fomit-frame-pointer" \
        CXXFLAGS="-march=core2 -O2 -fomit-frame-pointer"

    And GotoBLAS2:


    As martyn_plummer points out, BLAS is probably not the bottleneck, even if it
    can make some difference.

  • Martyn Plummer

    I just tested JAGS compiled against ACML on my quad-core desktop using the
    model "birats2.bug" from the classic BUGS examples, volume 1. It does run
    multi-threaded, but in performance terms, single-threaded is the best!


    real 0m15.046s
    user 0m11.311s
    sys 0m3.699s

    real 0m25.980s
    user 0m45.282s
    sys 0m6.583s

    real 0m27.650s
    user 1m16.371s
    sys 0m6.377s

    real 0m28.409s
    user 1m47.518s
    sys 0m5.762s

    All the multi-threaded runs take between 25 and 30 seconds, as opposed to 15
    seconds for the single-threaded version. As we add more threads, the "user"
    time increases (this is the sum over the different cores) but the real time
    does not decrease.
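
    One way to read these numbers is the ratio of CPU time (user + sys) to
    wall-clock time (real). The sketch below computes it for the four runs
    above. The helper name cpu_ratio is hypothetical, and the times are
    converted to seconds by hand (1m16.371s = 76.371).

```shell
# Ratio of CPU time (user + sys) to wall-clock time (real), all in seconds.
# A value near N suggests that roughly N cores were kept busy.
cpu_ratio() {
    LC_ALL=C awk -v real="$1" -v user="$2" -v sys="$3" \
        'BEGIN { printf "%.2f\n", (user + sys) / real }'
}

# The four runs reported above, in the order given.
cpu_ratio 15.046  11.311 3.699   # prints 1.00
cpu_ratio 25.980  45.282 6.583   # prints 2.00
cpu_ratio 27.650  76.371 6.377   # prints 2.99
cpu_ratio 28.409 107.518 5.762   # prints 3.99
```

    The ratios are consistent with one to four cores being kept busy, yet the
    real time is higher for every multi-threaded run than for the first one.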

  • Michael Rutter

    Thank you for the useful replies. I have just acquired a quad-core processor,
    so as soon as I get that installed and running, I can begin to test these.

    Thanks again.