Addressing mode of sampler objects for image types in OpenCL reviewed


When working with image objects in OpenCL (e.g. image2d_t) special image functions like read_imagef must be used for accessing the individual pixels. One of the parameters, beside the image object and the coordinates, is a sampler_t object. This object defines the details about the image access. One detail is what should happen if coordinates are accessed which are out of the image boundaries. Several addressing modes can be used and this article discussed the difference on a small example.

If you are familiar with OpenCV's border types, you know that there are many ways to handle out of border accesses. Unfortunately, in OpenCL 2.0 the possible modes are much more limited. What is more, some of the addressing modes are limited to normalized coordinates \(\left([0;1[\right)\) only. The following table lists the addressing modes I tested here (c.f. the documentation for the full description).

Addressing mode Example Normalized only? OpenCV's counterpart
CLK_ADDRESS_MIRRORED_REPEAT cba|abcd|dcb yes cv::BORDER_REFLECT
CLK_ADDRESS_REPEAT bcd|abcd|abc yes cv::BORDER_WRAP
CLK_ADDRESS_CLAMP_TO_EDGE aaa|abcd|ddd no cv::BORDER_REPLICATE
CLK_ADDRESS_CLAMP 000|abcd|000 no cv::BORDER_CONSTANT

The example column is already the result from my test program since the official documentation lacks an example here. In my test project (GitHub) I used a 1D image object with ten values incrementing from 1 to 10, i.e. a vector defined as


std::vector<float> dataIn = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

For simplicity, I used a global size of exactly 1 work item and the following kernel code:


kernel void sampler_test(read_only image1d_t imgIn, write_only image1d_t imgOut, const int sizeIn, sampler_t sampler, global float* coords)
{
    int j = 0;                              // Index to access the output image
    for (int i = - sizeIn/2; i < sizeIn + sizeIn/2 + 1; ++i)
    {
        float coordIn = i / (float)sizeIn;  // Normalized coordinates in the range [-0.5;1.5] in steps of 0.1
        float color = read_imagef(imgIn, sampler, coordIn).x;

        coords[j] = coordIn;
        write_imagef(imgOut, j, color);     // The accessed color is just written to the output image
        j++;
    }
}

So, I access the coordinates in the range \([-0.5;1.5]\) in steps of 0.1 leading to 21 values in total. This results in five accesses outside the image on the left side, one access exactly at 1.0 and five more outside accesses on the right side. The fact that the access to 1.0 results already in a mode dependent value is indeed an indicator for the \([0;1[\) range (note that the right side is not included) of the normalized coordinates. The output of the test program running on an AMD R9 290 GPU device and on an Intel CPU device together with the differences:

            
            Platforms
                    0: Intel(R) OpenCL - OpenCL 1.2
                    1: Experimental OpenCL 2.1 CPU Only Platform - OpenCL 2.1
                    2: AMD Accelerated Parallel Processing - OpenCL 2.0 AMD-APP (2348.3)
            Choose plattform: 2
            Devices on plattform AMD Accelerated Parallel Processing
                    0: Hawaii - OpenCL 2.0 AMD-APP (2348.3)
                    1: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz - OpenCL 1.2 AMD-APP (2348.3)
            Choose device: 0
              coords | MIRRORED_REPEAT | REPEAT | CLAMP_TO_EDGE | CLAMP
            ---------|-----------------|--------|---------------|------
            -0.50000 |               5 |      6 |             1 |     0
            -0.40000 |               4 |      7 |             1 |     0
            -0.30000 |               3 |      8 |             1 |     0
            -0.20000 |               2 |      9 |             1 |     0
            -0.10000 |               1 |     10 |             1 |     0
             0.00000 |               1 |      1 |             1 |     1
             0.10000 |               2 |      2 |             2 |     2
             0.20000 |               3 |      3 |             3 |     3
             0.30000 |               4 |      4 |             4 |     4
             0.40000 |               5 |      5 |             5 |     5
             0.50000 |               6 |      6 |             6 |     6
             0.60000 |               7 |      7 |             7 |     7
             0.70000 |               8 |      8 |             8 |     8
             0.80000 |               9 |      9 |             9 |     9
             0.90000 |              10 |     10 |            10 |    10
             1.00000 |              10 |      1 |            10 |     0
             1.10000 |               9 |      2 |            10 |     0
             1.20000 |               8 |      3 |            10 |     0
             1.30000 |               7 |      4 |            10 |     0
             1.40000 |               6 |      5 |            10 |     0
             1.50000 |               5 |      6 |            10 |     0
            
            
            
            Platforms
                    0: Intel(R) OpenCL - OpenCL 1.2
                    1: Experimental OpenCL 2.1 CPU Only Platform - OpenCL 2.1
                    2: AMD Accelerated Parallel Processing - OpenCL 2.0 AMD-APP (2348.3)
            Choose plattform: 1
            Devices on plattform Experimental OpenCL 2.1 CPU Only Platform
                    0: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz - OpenCL 2.1 (Build 18)
            Choose device: 0
              coords | MIRRORED_REPEAT | REPEAT | CLAMP_TO_EDGE | CLAMP
            ---------|-----------------|--------|---------------|------
            -0.50000 |               6 |      6 |             1 |     0
            -0.40000 |               5 |      7 |             1 |     0
            -0.30000 |               4 |      8 |             1 |     0
            -0.20000 |               3 |      9 |             1 |     0
            -0.10000 |               2 |     10 |             1 |     0
             0.00000 |               1 |      1 |             1 |     1
             0.10000 |               2 |      2 |             2 |     2
             0.20000 |               3 |      3 |             3 |     3
             0.30000 |               4 |      4 |             4 |     4
             0.40000 |               5 |      5 |             5 |     5
             0.50000 |               6 |      6 |             6 |     6
             0.60000 |               7 |      7 |             7 |     7
             0.70000 |               8 |      8 |             8 |     8
             0.80000 |               9 |      9 |             9 |     9
             0.90000 |              10 |     10 |            10 |    10
             1.00000 |              10 |      1 |            10 |     0
             1.10000 |              10 |      2 |            10 |     0
             1.20000 |               8 |      3 |            10 |     0
             1.30000 |               8 |      3 |            10 |     0
             1.40000 |               7 |      4 |            10 |     0
             1.50000 |               6 |      6 |            10 |     0
            
            

Interestingly, the two devices produce slightly different values for the CLK_ADDRESS_MIRRORED_REPEAT and CLK_ADDRESS_REPEAT modes. I am not sure why this happens, but the AMD output seems more reasonable to me.