Has anybody managed to use both cores of a dual-core AMD64 CPU without adapting the code? I get only 50% CPU load with IT++ simulations, i.e. just one core is busy.
I wonder whether the ACML-MP or MKL-MP variants can parallelize long calculations (FFT, solvers) internally. Or does the "multi-core support" in the features announcement mean that I have to do this manually with multi-threaded code?
I had a similar disappointment with MATLAB. It has the preference option "enable multithreaded computation" for multiple cores, but it does not work: just 50% overall CPU load.
This is a question for ACML or MKL support.
IT++ is written in a single-threaded manner, although I agree that it would be nice to have some support for OpenMP parallelism included.
Personally I use MPI when I need to write a parallel simulator suited to my needs, which can be run on a cluster of dozens of nodes.
/Adam
I'm not familiar with the OpenMP and MPI concepts. Cluster computing is another issue. At the moment I'm thinking about the 50% unused dual-core capacity. I'm just wondering whether it is possible, with existing MATLAB-like IT++ code, to get some automatic parallelism with a simple recompile against the appropriate MP library, for example to distribute a large FFT evenly over 2 cores. I hoped that the MP variants of ACML and MKL would do this, but apparently not (MP = just the thread-safe variant). If it's not possible, OK, then I will try to do this by manually creating 2 FFT threads. I will ask in the ACML forum.
I checked the MKL 10 documents. They say that even if the user (or IT++) code itself is not multithreaded or thread-safe, the MKL library will parallelize some functions such as FFT across cores automatically (internal MKL computing threads). The ACML 4 doc says: "Furthermore, key LAPACK routines have been treated using OpenMP to take advantage of multiple processors when running on SMP machines. Your application will automatically benefit when you link with the OpenMP versions of ACML." I saw some OpenMP examples where they went from 50% CPU load to 100% on dual-core CPUs just by introducing OpenMP, without changing the algorithm ("for" loops) at all. The execution time for this loop test was just half that of the non-MP-compiled version.
For ACML 3.6 GCC on Windows they don't offer OpenMP. So it would be very interesting to have ACML 4 support on Windows in future versions of IT++ (solving the name mangling problem).
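To make the "no change to the for loops" point concrete, here is a minimal sketch in C. It is a toy example of my own, not one of the examples mentioned above, and the function name sum_of_squares is made up for illustration. Compiled with gcc -fopenmp, the loop iterations are split across the available cores; without the switch the pragma is ignored and the same code runs on one core.

```c
/* Hypothetical helper: sums i*i for i in [0, n).
 * The loop is the same as in a serial version; only the pragma is new. */
double sum_of_squares(long n)
{
    double total = 0.0;
    long i;

    /* reduction(+:total) gives every thread a private partial sum and
     * adds the partial sums together when the loop finishes. */
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += (double)i * (double)i;
    }
    return total;
}
```

Whether the run time really halves depends on how much work each iteration does; for very short loops the thread start-up cost can dominate.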
It works. OpenMP parallelism is great. However, the compiler has to be built with OpenMP support. I built a new GCC 4.2.3 for Cygwin like this (watch for "libgomp", which is what enables OpenMP):
../gcc-4.2.3/configure --disable-nls --enable-threads=posix --enable-libgomp --with-x --enable-java-awt=gtk,xlib --without-included-gettext --enable-version-specific-runtime-libs --with-system-zlib --disable-win32-registry --enable-sjlj-exceptions --enable-hash-synchronization --enable-libstdcxx-debug
GCC can call the Intel Fortran version of the ACML 4 DLL without problems. I got matrix multiplication in ACML 4 working on 2 cores without changing the code, just by compiling with the "-fopenmp" switch. I got the FFT running in parallel with a #pragma in my FFT loop.
So now let's hope that IT++ will support ACML4 soon.
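A quick way to check whether a given GCC build really has OpenMP enabled is to test the standard _OPENMP macro, which the compiler defines only when -fopenmp is active. The sketch below is my own; the function names threads_in_use and report_openmp are made up for illustration.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the number of threads a parallel region actually used,
 * or 1 if the file was compiled without -fopenmp. */
int threads_in_use(void)
{
    int count = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        /* Only the master thread writes the shared variable. */
        #pragma omp master
        count = omp_get_num_threads();
    }
#endif
    return count;
}

/* Prints whether OpenMP support was compiled in. */
void report_openmp(void)
{
#ifdef _OPENMP
    printf("OpenMP enabled (_OPENMP=%d), threads: %d\n",
           _OPENMP, threads_in_use());
#else
    printf("OpenMP not enabled, threads: %d\n", threads_in_use());
#endif
}
```

Calling report_openmp() from main and building once with and once without -fopenmp should show the difference. The thread count can be limited with the OMP_NUM_THREADS environment variable.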
Hi Frank,
> So now let's hope that IT++ will support ACML4 soon.
In fact, it already does, but on Linux only.
Besides, you should not expect support for other platforms to be included in IT++ automatically. IT++ is an open-source library licensed under the GNU GPL. In practice this means that nobody behind IT++ is paid for working on it. Therefore, unless you or other users provide a ready (or almost ready) solution that can be easily incorporated into IT++ without breaking other things, you may only dream that "IT++ will support ACML4 (built with Intel Fortran) soon".
If you think differently, you can always try to contact any of the IT++ developers directly and offer some gratification for the particular work you would like done. But this does not guarantee that the things you request will be accepted.
Sorry, but this is how the open-source model works for most projects.
BR,
/Adam
Hi Frank
Could you provide a small example of your matrix multiplication program using two cores? I have not succeeded in doing the same thing on Linux and I am interested in this topic.
regards
Bogdan
Yes, I know, I already had it running with ACML 4 on Linux. This is the most elegant way. But Windows is not a minority platform. Maybe Cygwin is, but I suppose Visual Studio users will run into the same problem, since func_ is demanded but only FUNC is provided by the library.
I tried to correct it manually with
#define fortranfunct_ FORTRANFUNCT
But it needs some architectural changes in the autoconf setup and in the switches between the libraries. Before changing anything, the problem has to be understood, including how it shows up with the compilers other users have. It does not help if you delete the bug report.
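To show what the #define workaround does, here is a small self-contained sketch in C. Everything in it is a stand-in: FORTRANFUNCT is defined locally only so that the example links (in a real build it would be a symbol exported by the Fortran DLL), and UPPERCASE_FORTRAN_SYMBOLS is a hypothetical switch, not an existing IT++ or autoconf macro.

```c
/* Stand-in for a routine the library exports under an upper-case name
 * without a trailing underscore. Here it just doubles the array. */
void FORTRANFUNCT(const int *n, double *x)
{
    int i;
    for (i = 0; i < *n; i++) {
        x[i] *= 2.0;
    }
}

/* Hypothetical configuration switch for this sketch. */
#define UPPERCASE_FORTRAN_SYMBOLS 1

#if UPPERCASE_FORTRAN_SYMBOLS
/* Redirect the lower-case, underscore-suffixed name used by the caller
 * to the symbol the library actually provides. */
#define fortranfunct_ FORTRANFUNCT
#endif

/* The caller's declaration, written with the name it expects.
 * After preprocessing it refers to FORTRANFUNCT. */
void fortranfunct_(const int *n, double *x);

/* The calling code itself stays unchanged. */
void double_in_place(double *x, int n)
{
    fortranfunct_(&n, x);
}
```

The catch is that one such #define is needed for every routine that gets called, and the switch has to be set by the build system depending on which library is linked, which is where the autoconf changes come in.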
Frank,
To me it is not a bug but a lack of a particular feature. It is clearly stated (in red) on the bug submission page that the bug tracker is only for confirmed bugs in IT++. The Help forum is for discussing problems, missing functionality, etc. Therefore, I had to remove your report from the bug tracker. Sorry!
If you really have something to contribute in this area, you can open a new Feature Request ticket and attach some patches, etc. there.
But please do not expect that someone else will immediately start working on this issue, just because you would like to see it in IT++.
BR,
/Adam
The parallel matrix multiplication came from the examples directory of ACML. Just call
make OMP_NUM_THREADS=2
Parallel FFT does not seem to work automatically. But I made an FFT loop in C with a #pragma in the source. This also ran the loop partitions in parallel.
You need the GCC switch -fopenmp and a GCC that supports it, see above.
I don't understand the autoconf scripting. Is nobody interested in making IT++ ACML4-compatible on the Windows platform? Before taking any action, the problem has to be discussed in depth, to understand the linker problems.
If you're interested, this is how I got the FFT loop working in parallel:
#pragma omp parallel
for(j=0; j<ntimes; j++)
    cfft1d(0,n,x,comm,&info);
#pragma omp barrier
It runs with ACML4 (IFORT32 DLL), compiled with Cygwin/GCC 4.2.3, on both an Athlon64 and an Intel CoreDuo (I don't have MKL). Note that cfft1d is the C interface; the Fortran variant would be CFFT1D. IT++ calls the Fortran interfaces, except for the FFT functions. Run time went down from 50 sec single-core to 30 sec dual-core (on Pentium). This was without IT++. I think the IT++ code does not have to change for a similar loop in IT++; the user can put the #pragma on an IT++ loop.
A single big 1D FFT does not run in parallel without the #pragma, while DGEMM does.
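For anyone who wants to try the same idea without ACML, here is a self-contained sketch of a batch of independent transforms split across cores. It is not the code from the run above: dft4 is a made-up 4-point DFT that stands in for the library FFT call, and transform_batch is a hypothetical name. Two points, as far as I understand OpenMP: a bare "#pragma omp parallel" makes every thread execute the whole loop, whereas "#pragma omp parallel for" divides the iterations among the threads, and each iteration should work on its own data so that threads do not write to the same buffer.

```c
#define POINTS 4

/* Stand-in for the library FFT: in-place 4-point DFT on separate
 * real/imaginary arrays. The twiddle factors (-i)^m are exact. */
static void dft4(double *re, double *im)
{
    static const double wr[4] = { 1.0, 0.0, -1.0, 0.0 };
    static const double wi[4] = { 0.0, -1.0, 0.0, 1.0 };
    double out_re[POINTS], out_im[POINTS];
    int k, t;

    for (k = 0; k < POINTS; k++) {
        out_re[k] = 0.0;
        out_im[k] = 0.0;
        for (t = 0; t < POINTS; t++) {
            int m = (k * t) % 4;
            out_re[k] += re[t] * wr[m] - im[t] * wi[m];
            out_im[k] += re[t] * wi[m] + im[t] * wr[m];
        }
    }
    for (k = 0; k < POINTS; k++) {
        re[k] = out_re[k];
        im[k] = out_im[k];
    }
}

/* Transforms `count` independent signals stored back to back.
 * Each iteration touches only its own slice of re/im. */
void transform_batch(int count, double *re, double *im)
{
    int j;

    #pragma omp parallel for
    for (j = 0; j < count; j++) {
        dft4(re + j * POINTS, im + j * POINTS);
    }
}
```

Build with gcc -fopenmp to get the parallel version; without the switch the same source runs serially. With a real FFT library, any per-call work array would also need to be private to each thread.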