-
Adjust the over-relaxation factor as a function of problem size. Remove the second array and update in place. Factor branches and indexing out of the inner loop: instead, precompute a list of the pixels inside the brush mask and which neighbors each one has. Switch from scalar double to SIMD float. Speedup (of the Laplace part, excluding gamma correction): 10x-20x, depending on brush size.
8602fdbe
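
The changes described above can be illustrated with a minimal sketch. It is not the code from this commit: the function names, the flat-list image layout, and the 4-neighbor stencil are assumptions made for illustration, and the relaxation formula is the textbook optimum for a square Dirichlet grid, which may differ from the one actually used. The SIMD float part is not shown; the sketch uses plain Python floats.

```python
import math


def relaxation_factor(n):
    # Textbook optimal SOR factor for an n x n interior grid with fixed
    # (Dirichlet) boundaries. Assumed here; the commit does not give a formula.
    return 2.0 / (1.0 + math.sin(math.pi / (n + 1)))


def build_pixel_list(mask, width, height):
    # Done once, outside the solver loop: for every masked pixel, record its
    # flat index and the flat indices of the neighbors that exist in the image.
    pixels = []
    for y in range(height):
        for x in range(width):
            idx = y * width + x
            if not mask[idx]:
                continue
            nbrs = []
            if x > 0:
                nbrs.append(idx - 1)
            if x < width - 1:
                nbrs.append(idx + 1)
            if y > 0:
                nbrs.append(idx - width)
            if y < height - 1:
                nbrs.append(idx + width)
            if nbrs:  # skip isolated pixels (only possible in a 1x1 image)
                pixels.append((idx, tuple(nbrs)))
    return pixels


def solve_laplace_in_place(image, pixels, omega, iterations):
    # Successive over-relaxation on a single array: each update immediately
    # sees the new values of pixels visited earlier in the same sweep.
    # Pixels outside the mask are never written and act as boundary values.
    for _ in range(iterations):
        for idx, nbrs in pixels:
            avg = sum(image[j] for j in nbrs) / len(nbrs)
            image[idx] += omega * (avg - image[idx])
```

The inner loop contains no bounds checks or mask tests; all of that is resolved once in `build_pixel_list`, which is the point of the precomputed list.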