From: Andy A. <ad...@sc...> - 2013-11-04 14:42:12
Hi Alistair,
After some more thinking about this, I decided to check
Matlab's sparse matrix efficiency. It turns out to be
unbelievably badly coded. Since (to be general) we use
sparse matrices as selector matrices, this cost is serious
for EIDORS.
Here's some code:
rhs = rand(10e3,1e3);          % dense right-hand side
lhs = sparse(16,10e3);         % sparse selector matrix
lhs(:,end-15:end) = eye(16);   % selects the last 16 rows
tic; for i=1:100
   out = lhs * rhs;                  % MATLAB MULTIPLY
end; toc
tic; for i=1:100
   out = rhs(end-15:end, :);         % DIRECT SELECT
end; toc
idx = any(lhs);
tic; for i=1:100
   out = lhs(:,idx)*rhs(idx,:);      % SELECT WITH IDX
end; toc
idx = find(any(lhs));
tic; for i=1:100
   out = lhs(:,idx)*rhs(idx,:);      % SELECT WITH FIND
end; toc
TIMES ARE:
Elapsed time is 0.866917 seconds.
Elapsed time is 0.005294 seconds.
Elapsed time is 0.014516 seconds.
Elapsed time is 0.014434 seconds.
A *60 times* improvement for the indexed multiply (and over 160 times for the direct select).
QUESTION:
To solve this, we can
- rewrite every matrix multiply in EIDORS
- create a function 'left_multiply' that is more efficient
- write a mex file with our own class which overrides the Matlab ones
Ideas?
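A hedged sketch of the second option; the name left_multiply, its signature, and the sparsity threshold are illustrative, not existing EIDORS code:

```matlab
function out = left_multiply(lhs, rhs)
% Compute lhs*rhs, but when lhs is a sparse selector-like matrix,
% index out the contributing rows of rhs first (the trick timed above).
   if issparse(lhs)
      idx = find(any(lhs,1));            % columns of lhs with any non-zero
      if numel(idx) < size(lhs,2)/2      % sparse enough to be worth it
         out = lhs(:,idx) * rhs(idx,:);  % multiply only the rows actually used
         return
      end
   end
   out = lhs * rhs;                      % fall back to the plain product
end
```

Called as left_multiply(lhs, rhs) in place of lhs*rhs, this should reproduce the 'SELECT WITH FIND' timing above while behaving identically on dense or non-selector operands.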
(I've checked in some code which makes some of this faster)
--
Andy Adler <ad...@sc...> +1-613-520-2600x8785
On 3 November 2013 22:47, Andy Adler <ad...@sc...> wrote:
> Hi Alistair,
>
> I've spent a few hours looking at it, and I've made some improvements, but
> not enough to really speed things up.
>
> First, the qr code in left_divide is giving quite a lot of help.
>
> My fixes will make fwd_model_parameters do a better job of caching (i.e.
> not recalculating parameters unless strictly necessary)
>
> The problem comes mostly from
> 1. eidors_var_id (i.e. the caching function)
> 2. the loops in fwd_solve_1st_order (the loops to reassemble)
>
> After some thinking, I'm still not sure how to make big improvements
> in the timing of these functions, while still being general. For example,
> we want to have arbitrary channel gains, and arbitrary numbers of
> measurements per stimulation pattern.
>
> For the record, here are some timing results
> FEM SIZE (elements) 53k 418k
> Patterns = 2 time = 1.63 time = 21.79
> Patterns = 5 time = 1.82 time = 20.38
> Patterns = 10 time = 1.74 time = 20.33
> Patterns = 20 time = 2.05 time = 20.32
> Patterns = 50 time = 1.97 time = 20.37
> Patterns = 100 time = 2.00 time = 20.84
> Patterns = 200 time = 2.09 time = 20.95
> Patterns = 500 time = 2.37 time = 22.51
> Patterns = 1000 time = 2.94 time = 24.13
> Patterns = 2000 time = 4.59 time = 25.89
> Patterns = 5000 time = 8.75 time = 34.14
> Patterns = 10000 time = 16.08 time = 48.61
>
> CODE:
> N_elec = 16; N_tries = 10;
> img = mk_image( ng_mk_cyl_models([2,1,.1],[N_elec,1],[0.1]), 1 );
> for N_pat = [2,5,10,20,50,100,200,500,1000,2000,5000,10000]
>    time = 0;
>    for i = 1:N_tries % Tries
>       patterns = ceil( N_elec*rand(N_pat,4) );
>       img.fwd_model.stimulation = stim_meas_list(patterns, N_elec);
>       t = cputime;
>       vh = fwd_solve(img);
>       time = time + (cputime - t);
>    end
>    fprintf('Patterns = %5d time = %4.2f \n', [N_pat, time/N_tries]);
> end
>
> --
> Andy Adler <ad...@sc...> +1-613-520-2600x8785
>
>
> On 2 November 2013 03:54, Alistair Boyle <bo...@sc...> wrote:
>> Andy, (and anyone else interested in getting EIDORS to run faster)
>>
>> We had a conversation last week about the issue I ran into when trying
>> to get my rather enormous stimulation/measurement patterns to run
>> (37k+ measurements). As I mentioned, I built a function to rearrange
>> and merge the stim/meas patterns (and undo this arranging to get the
>> original ordering once the measurements are calculated from the
>> forward model) to speed up my run times.
>>
>> I went from 45 minutes to 4 minutes for a forward solution: 3.25 min
>> for the search for a better arrangement, 30 s for the caching overhead
>> on my large cache configuration, and then 3-15 s for the actual
>> solution. Quite the improvement.
>>
>> Part of this function addresses the comment in left_divide() (line 22)
>> % V= E\I;
>> % This takes MUCH longer when you have more vectors in I,
>> % even if they are repeated. There must be some way to simplify
>> % this to speed it up. Matlab's sparse operators really should
>> % do this for you.
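A minimal sketch of what that comment asks for, assuming I is dense (the idea: solve only for the unique columns of I, then scatter the results back; variable names match the comment, nothing else is existing code):

```matlab
% Solve E\I when I has many repeated columns by deduplicating first.
% E: system matrix; I: current patterns, one column per stimulation.
[Iu, ~, ic] = unique(I', 'rows');   % unique RHS vectors, as rows
Vu = E \ Iu';                       % solve only for the unique columns
V  = Vu(:, ic);                     % scatter back to the original ordering
```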
>>
>> Along the way, I was digging in EIDORS' guts trying to figure out why
>> an optimization I thought ought to work, concatenating stim_patterns
>> (stimulation(i).stim_pattern(:,1:2) = S;), wasn't working when
>> fwd_solve() ran. The first cause is at fwd_model_parameters() line 145,
>> where the pp.QQ(:,i) = src; assignment assumes src is a single column,
>> and the following line similarly assumes this results in a single set
>> of meas_pattern selections for the resulting measurement voltages.
>>
>> All columns of pp.QQ are then used in the left_divide() of
>> fwd_solve_1st_order() (line 45) (good) and then at line 54 there is
>> another for loop over the stim_patterns (one per stimulation assumed)
>> to select out the measurements.
>>
>> The for loops are really slow when the number of iterations gets big
>> and these are what I was cutting down to reduce run times. Allowing
>> multiple stim_pattern columns would reduce the number of iterations
>> further.
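In principle that per-stimulation loop can collapse into one sparse multiply. A hedged sketch, assuming stim is the EIDORS stimulation struct array and v is the electrode-voltage matrix with one column per stimulation (names of both are assumptions here):

```matlab
% Stack every meas_pattern into one block-diagonal sparse matrix so
% all measurements are extracted in a single multiply rather than a
% for loop over stimulations.
M  = blkdiag( stim(:).meas_pattern );  % struct array expands to an argument list
vv = M * v(:);                         % v(:) stacks the per-stimulation columns
```

Because blkdiag preserves sparsity, M stays a sparse selector, which is exactly the case where the indexing trick from the first message pays off.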
>>
>> These loops don't seem to be particularly problematic for the most
>> common electrode stim/meas patterns with 16 electrodes. For common
>> patterns, my optimization routine (without the stim_pattern merging)
>> only catches a few opportunities, for example with the dipole-dipole
>> geophysics patterns. I am working with a fixed dipole-dipole
>> 32-electrode stim/meas pattern and adding electrode movements on top
>> of it. The resulting combinatorial explosion produces a great deal of
>> redundant calculation and many for-loop iterations.
>>
>> So to answer your question from last week, there is no reason the
>> left_divide QR optimizations wouldn't be doing a great job here. It's
>> the for loops on the incoming and outgoing sides that were slowing
>> things down for me. We could have a go at cleaning those up sometime,
>> but the immediate need for speed has abated somewhat... or at least
>> left room for another order of magnitude more complexity somewhere.
>>
>> Alistair