    This is a better way to accomplish the box-blur cl operation by using ocl's · e7e640f3
    Yongjia Zhang authored
    
    local memory from the OpenCV source code. It uses local (shared) memory to
    reduce global memory accesses, which reduces the kernel's processing time
    by 70 percent compared to the original kernel. Because of the barriers and
    the local work-size limit, processing with a radius larger than 110 becomes
    slower than the original algorithm, so I keep the original kernels to handle
    box blurs with a radius larger than 110.
    All tests were run on Intel Beignet and on Intel Ivy Bridge CPU and GPU.
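The local-memory idea described above can be sketched on the CPU in plain C. This is an illustration only, not the kernel from this commit: the function names, the clamped edge handling, and the scheme in which each work-group loads LOCAL_SIZE values and emits LOCAL_SIZE - 2*radius outputs are all assumptions made for the example. Under that assumed scheme, the number of outputs per group shrinks as the radius grows, which is one plausible reading of why large radii stop paying off.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SIZE 256 /* assumed work-group size, taken from the (256,1,1) note */

/* Clamp an index to [0, n-1]; edge handling is an assumption of this sketch. */
static int clamp_index(int i, int n) {
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* Reference: every output reads 2*radius+1 values directly from "global" memory. */
static void box_blur_naive(const float *in, float *out, int n, int radius) {
    for (int x = 0; x < n; x++) {
        float sum = 0.0f;
        for (int k = -radius; k <= radius; k++)
            sum += in[clamp_index(x + k, n)];
        out[x] = sum / (float)(2 * radius + 1);
    }
}

/* Tiled variant: each simulated work-group copies LOCAL_SIZE inputs into a
 * local tile once, then computes its outputs from the tile only.
 * Returns the number of groups used, or -1 if the radius leaves no room
 * for any output in a tile. */
static int box_blur_tiled(const float *in, float *out, int n, int radius) {
    assert(in != NULL && out != NULL && n >= 0 && radius >= 0);
    int outputs_per_group = LOCAL_SIZE - 2 * radius;
    if (outputs_per_group <= 0) return -1;

    float tile[LOCAL_SIZE]; /* stands in for __local memory */
    int groups = 0;
    for (int start = 0; start < n; start += outputs_per_group, groups++) {
        /* Load phase: one global read per simulated work-item. */
        for (int lid = 0; lid < LOCAL_SIZE; lid++)
            tile[lid] = in[clamp_index(start - radius + lid, n)];
        /* A real kernel would need a local-memory barrier here. */
        for (int lid = radius; lid < LOCAL_SIZE - radius; lid++) {
            int x = start + lid - radius;
            if (x >= n) break;
            float sum = 0.0f;
            for (int k = -radius; k <= radius; k++)
                sum += tile[lid + k];
            out[x] = sum / (float)(2 * radius + 1);
        }
    }
    return groups;
}
```

With this layout a radius of 3 leaves 250 outputs per group, while a radius of 110 leaves only 36, so far more groups (and loads) are needed for the same row.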
    
    v2: add a kernel attribute to restrict the local size to (256,1,1).
    
    Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>