Hi everybody,
I am a new IT++ user and I am using it to simulate an OFDM-like system. I profiled my program and found that most of the time is spent generating channel coefficients (it is even slower than doing the same operations in MATLAB). I looked at the source code of, for example, Rice_Fading_Generator and noticed some points that might be optimized for performance. For example:
void Rice_Fading_Generator::generate(const int no_samples, cvec &output)
{
  if (init_flag == false)
    init();

  if (n_dopp == 0.0)
    generate_zero_doppler(no_samples, output);
  else {
    output.set_size(no_samples, false);

    for (int i=0; i<no_samples; i++) {
      output(i) = std::complex<double>( sum( elem_mult( c1, cos(2*pi*f1*n_dopp*(i+time_offset)+th1) ) ),
                                        sum( elem_mult( c2, cos(2*pi*f2*n_dopp*(i+time_offset)+th2) ) ) );
    }

    if(los_power > 0.0) { // LOS component exist
      double diffuse = std::sqrt(1.0/(1.0+los_power));
      double direct = diffuse*std::sqrt(los_power);
      for (int i=0; i<no_samples; i++)
        output(i) = diffuse*output(i) + direct*std::complex<double>(std::cos(2*pi*los_dopp*n_dopp*(i+time_offset)),std::sin(2*pi*los_dopp*n_dopp*(i+time_offset)));
    }
    time_offset += no_samples;
  }
}
In this function, 2*pi*f1*n_dopp and 2*pi*f2*n_dopp are constant, so these multiplications could be avoided (at the expense of more memory).
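A minimal sketch of that idea, written with std::vector instead of IT++'s vec so that it compiles without the library. The names precompute_omega and sum_of_sinusoids are hypothetical and are not IT++ functions; the sketch only illustrates caching the constant factor once and reusing it for every sample.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of IT++): scale each normalized Doppler
// frequency once, so the per-sample loop no longer multiplies by 2*pi*n_dopp.
std::vector<double> precompute_omega(const std::vector<double> &f, double n_dopp)
{
  const double two_pi = 2.0 * std::acos(-1.0);
  std::vector<double> omega(f.size());
  for (std::size_t j = 0; j < f.size(); ++j)
    omega[j] = two_pi * f[j] * n_dopp;
  return omega;
}

// One real-valued sum of sinusoids at sample index t, using the cached omega.
double sum_of_sinusoids(const std::vector<double> &c,
                        const std::vector<double> &omega,
                        const std::vector<double> &th, double t)
{
  assert(c.size() == omega.size() && c.size() == th.size());
  double acc = 0.0;
  for (std::size_t j = 0; j < c.size(); ++j)
    acc += c[j] * std::cos(omega[j] * t + th[j]);
  return acc;
}
```

The extra memory is one vector of the same length as f1 (and another for f2). The cosine evaluations themselves are untouched, so the saving is limited to the scaling multiplications.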
Dear ediap,
I also implemented the modifications I mentioned and recompiled the library. In fact, the speed did not improve. I am really sorry for this error of mine. I will continue to use IT++, and if I find and can prove something interesting, I will share it with you.
BR,
tfmaciel
Dear tfmaciel,
Have you tested the proposed solution?
Encouraged by your suggestion, I tried to optimise the Rice_Fading_Generator today with the following steps:
- moving the 2*pi*n_dopp calculation to the function where the f1 and f2 vectors are calculated,
- replacing the sum(elem_mult(...)) operations with a single loop (sum uses one 'for' loop and elem_mult uses another):
  for (int j=0; j < c1.size(); j++) {
    out_re += c1(j) * cos((i + time_offset) * f1(j) + th1(j));
    ...
  }
  output(i) = complex<double>(out_re, out_im);
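For readers without IT++ at hand, here is a rough standalone sketch of the two loop shapes being compared. It uses std::vector rather than IT++ types, and the function names two_pass and fused_sample are made up for illustration. The imaginary-part accumulation is an assumption based on the c2, f2 and th2 terms of the original generate() function, since the snippet above elides it. The w vectors are assumed to already contain the 2*pi*n_dopp factor.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Two-pass shape: build the element-wise product, then sum it.
// Mirrors sum(elem_mult(c, cos(...))) and allocates a temporary per call.
double two_pass(const std::vector<double> &c, const std::vector<double> &w,
                const std::vector<double> &th, double t)
{
  assert(c.size() == w.size() && c.size() == th.size());
  std::vector<double> prod(c.size());
  for (std::size_t j = 0; j < c.size(); ++j)
    prod[j] = c[j] * std::cos(w[j] * t + th[j]);
  double acc = 0.0;
  for (std::size_t j = 0; j < prod.size(); ++j)
    acc += prod[j];
  return acc;
}

// Fused shape: one loop accumulates both components, no temporaries.
std::complex<double> fused_sample(
    const std::vector<double> &c1, const std::vector<double> &w1,
    const std::vector<double> &th1, const std::vector<double> &c2,
    const std::vector<double> &w2, const std::vector<double> &th2, double t)
{
  assert(c1.size() == w1.size() && c1.size() == th1.size());
  assert(c2.size() == c1.size() && c2.size() == w2.size() &&
         c2.size() == th2.size());
  double out_re = 0.0, out_im = 0.0;
  for (std::size_t j = 0; j < c1.size(); ++j) {
    out_re += c1[j] * std::cos(t * w1[j] + th1[j]);
    out_im += c2[j] * std::cos(t * w2[j] + th2[j]);
  }
  return std::complex<double>(out_re, out_im);
}
```

Both shapes perform the same number of cosine evaluations; the fused loop only avoids the temporary vector and the second pass over it.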
The simulation time improvement was negligible. My OFDM simulator with the ITU_Vehicular_A channel model (Rice MEDS method) ran for 237 s instead of the 241 s it took before the modifications. So, as you can see, the gain is small: only about 1.7%. I compiled IT++ and my simulator with a few g++ optimisations for my processor (-march=pentium3 -ffast-math -mfpmath=sse -msse -O3).
Summarising, the main inefficiency is not due to the way the code is written, since the compiler optimises it during compilation, but due to the overall complexity of the algorithm.
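To make the complexity argument concrete, the per-sample work of the generate() loop from the original post can be written out. This is only a reading of that code, assuming c1 and c2 both hold N coefficients. The symbols N, M (for no_samples), t_0 (for time_offset), the index k and the imaginary unit j are introduced here for illustration.

```latex
\begin{align}
x(i) &= \sum_{k=1}^{N} c_{1,k}\cos\!\bigl(2\pi f_{1,k}\, n_{\mathrm{dopp}}\,(i+t_0)+\theta_{1,k}\bigr)
      + \mathrm{j}\sum_{k=1}^{N} c_{2,k}\cos\!\bigl(2\pi f_{2,k}\, n_{\mathrm{dopp}}\,(i+t_0)+\theta_{2,k}\bigr),
      \qquad i = 0,\dots,M-1, \\
\text{cosine evaluations for } M \text{ samples} &= 2NM .
\end{align}
```

Rearranging the loops or caching the constant factor does not change the 2NM cosine evaluations, which would be consistent with the small gain measured above.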
Anyway, if you find hints on how and where to make the code more efficient, and you can back them up with simulation results, we will be glad to hear from you again.
BR
/ediap