No rating

# Optimizing Max Kernel Operation in IDL

Atle Borsholm

I found this optimization question on the comp.lang.idl-pvwave, and decided to give it a try. The question was to implement an algorithm that replaces every element in a 2-D array with its neighborhood maximum value. In this case the neighborhood size was 101x101 (i.e. -50 to +50), and the array size was 3200x3248. The nested FOR loop approach looks like the following code snippet.

data = randomu(seed,3200,3248)
dim = size(data,/dimension)
nx = dim
ny = dim

t0 = tic('Nested FOR')
result2 = data
for i=0,nx-1 do begin
for j=0,ny-1 do begin
result2[i,j] = max(data[(i-50)>0:(i+50)<(nx-1),(j-50)>0:(j+50)<(ny-1)])
endfor
endfor
toc,t0

My first thought was to use the > operator which returns the maximum of 2 arguments. It operates on arrays, and in conjunction with the SHIFT function it serves to return the larger of 2 neighbors. The other trick here is that since we are looking for a 101x101 neighborhood maximum, we can use a combination of smaller neighborhood maxima as input in a structured way in order to achieve the exact 101x101 neighborhood size. The code that I ended up with after some trial and error was the following.

t0 = tic('Iterative >')
; Using SHIFT and > in an iterative way
tmp9 = shift(tmp3,3,0) > tmp3 > shift(tmp3,-3,0)
tmp27 = shift(tmp9,9,0) > tmp9 > shift(tmp9,-9,0)
tmp81 = shift(tmp27,27,0) > tmp27 > shift(temporary(tmp27),-27,0)
tmp99 = shift(tmp9,44,0) > temporary(tmp81) > shift(temporary(tmp9),-44,0)
tmp101 = shift(tmp3,49,0) > temporary(tmp99) > shift(temporary(tmp3),-49,0)

; Same for Y-dim
tmp3 = shift(tmp101,0,1) > tmp101 > shift(temporary(tmp101),0,-1)
tmp9 = shift(tmp3,0,3) > tmp3 > shift(tmp3,0,-3)
tmp27 = shift(tmp9,0,9) > tmp9 > shift(tmp9,0,-9)
tmp81 = shift(tmp27,0,27) > tmp27 > shift(temporary(tmp27),0,-27)
tmp99 = shift(tmp9,0,44) > temporary(tmp81) > shift(temporary(tmp9),0,-44)
tmp101 = shift(tmp3,0,49) > temporary(tmp99) > shift(temporary(tmp3),0,-49)

result1 = (temporary(tmp101))[50:50+nx-1,50:50+ny-1]
toc,t0

I didn’t say that optimized code always looks pretty, but the goal here is to run fast. Adding in some result comparison checking to make sure the results are equivalent.

print, array_equal(result1,result2) ? 'Results are matching' : 'SOMETHING went wrong'

Finally, here are the results, which yielded an impressive amount of speed-up, the execution time went from 189.4 seconds to 1.4 seconds, and the results are identical:

% Time elapsed Nested FOR: 189.39470 seconds.
% Time elapsed Iterative >: 1.4241931 seconds.
Results are matching