Welcome to Open Discussion

  • Hi,

    I am very interested in this and would like to use it in my personal project. I can't contribute money, but I can contribute some of my time, so if you need anything done, let me know.

    I have downloaded the files and have tried the sample ImageCrossFade. However, OpenCLImageTest fails for me. It gives me an error code MEM_OBJECT_ALLOCATION_FAILURE when it's doing OpenCL.EnqueueCopyImageToBuffer. I have the following device info:

    {[448177536, Name: GeForce GTX 580
    Vendor: NVIDIA Corporation
    VendorID: 4318
    DriverVersion: 306.97
    Profile: FULL_PROFILE
    Version: OpenCL 1.1 CUDA
    Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
    DeviceType: GPU
    MaxComputeUnits: 16
    MaxWorkItemDimensions: 3
    MaxWorkItemSizes: 0=1024 1=1024 2=64
    MaxWorkGroupSize: 1024
    PreferredVectorWidthChar: 1
    PreferredVectorWidthShort: 1
    PreferredVectorWidthInt: 1
    PreferredVectorWidthLong: 1
    PreferredVectorWidthFloat: 1
    PreferredVectorWidthDouble: 1
    MaxClockFrequency: 1544
    AddressBits: 32
    MaxMemAllocSize: 402571264
    ImageSupport: True
    MaxReadImageArgs: 128
    MaxWriteImageArgs: 8
    Image2DMaxWidth: 32768
    Image2DMaxHeight: 32768
    Image3DMaxWidth: 2048
    Image3DMaxHeight: 2048
    Image3DMaxDepth: 2048
    MaxSamplers: 16
    MaxParameterSize: 4352
    MemBaseAddrAlign: 4096
    MinDataTypeAlignSize: 128
    SingleFPConfig: 63
    GlobalMemCacheType: READ_WRITE_CACHE
    GlobalMemCacheLineSize: 128
    GlobalMemCacheSize: 262144
    GlobalMemSize: 1610285056
    MaxConstantBufferSize: 65536
    MaxConstantArgs: 9
    LocalMemType: LOCAL
    LocalMemSize: 49152
    ErrorCorrectionSupport: False
    ProfilingTimerResolution: 1000
    EndianLittle: True
    Available: True
    CompilerAvailable: True
    ExecutionCapabilities: 1
    QueueProperties: 3

    I am investigating, and will post the reason for this test failure on this forum.
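For readers matching the wrapper's error name against the C API: the sketch below maps a few raw OpenCL status codes to their names. It assumes the wrapper's MEM_OBJECT_ALLOCATION_FAILURE corresponds to CL_MEM_OBJECT_ALLOCATION_FAILURE (value -4 in the Khronos cl.h header). The `DEMO_` constants and the `demo_cl_error_name` helper are hypothetical names used only for illustration, redefined locally so the snippet builds without an OpenCL SDK.

```c
#include <stddef.h>

/* Status values as defined in the Khronos cl.h header. They are redefined
   here under a DEMO_ prefix (our own naming, not part of any library). */
enum {
    DEMO_CL_SUCCESS                       =  0,
    DEMO_CL_DEVICE_NOT_FOUND              = -1,
    DEMO_CL_DEVICE_NOT_AVAILABLE          = -2,
    DEMO_CL_COMPILER_NOT_AVAILABLE        = -3,
    DEMO_CL_MEM_OBJECT_ALLOCATION_FAILURE = -4,
    DEMO_CL_OUT_OF_RESOURCES              = -5,
    DEMO_CL_OUT_OF_HOST_MEMORY            = -6
};

/* Hypothetical helper: returns a readable name for a raw status code,
   or "UNKNOWN" for codes this small table does not cover. */
static const char *demo_cl_error_name(int code)
{
    switch (code) {
    case DEMO_CL_SUCCESS:                       return "CL_SUCCESS";
    case DEMO_CL_DEVICE_NOT_FOUND:              return "CL_DEVICE_NOT_FOUND";
    case DEMO_CL_DEVICE_NOT_AVAILABLE:          return "CL_DEVICE_NOT_AVAILABLE";
    case DEMO_CL_COMPILER_NOT_AVAILABLE:        return "CL_COMPILER_NOT_AVAILABLE";
    case DEMO_CL_MEM_OBJECT_ALLOCATION_FAILURE: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case DEMO_CL_OUT_OF_RESOURCES:              return "CL_OUT_OF_RESOURCES";
    case DEMO_CL_OUT_OF_HOST_MEMORY:            return "CL_OUT_OF_HOST_MEMORY";
    default:                                    return "UNKNOWN";
    }
}
```

Calling `demo_cl_error_name(-4)` yields the name reported in the post above. The name alone says nothing about which argument or access triggered the failure, so it is a starting point for a search, not a diagnosis.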

  • After much experimentation, I found out what my GTX 580 was complaining about. The error code and error message you get from running the OpenCLImageTest example provide no clue whatsoever as to what was wrong, so I tried several things. I read in the NVIDIA forum that older driver versions work better, so I downgraded to earlier versions: it still failed. I downloaded the AMD OpenCL driver, and even though I don't have an AMD graphics card, the OpenCLImageTest example works with their driver when it runs on my Intel i7 CPU. I also tried revising the code after noticing the only real difference between the scaling example and this one: the scaling example uses unstructured OpenCL buffers throughout, whereas OpenCLImageTest uses OpenCL images.

    The problem turned out to be the kernel in the example. I tried several changes, focusing on the possibility that the kernel was writing beyond the memory bounds of the output image. The revised code below works with the GTX 580. This card (or its driver) just doesn't like writing beyond the bounds of an image.

    kernel void FilterImage( float inputLeft,
                             float inputTop,
                             float inputWidth,
                             float inputHeight,
                             float outputLeft,
                             float outputTop,
                             float outputWidth,
                             float outputHeight,
                             read_only image2d_t input,
                             write_only image2d_t output,
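                             sampler_t sampler )
    {
        // Position of this work-item in the 2D global range
        size_t x = get_global_id(0);
        size_t y = get_global_id(1);
        int width = get_global_size(0);
        int height = get_global_size(1);

        // Normalize the position to [0, 1) and map it into the input rectangle
        float nX = x/(float)width;
        float nY = y/(float)height;
        float inputX = inputLeft+inputWidth*nX;
        float inputY = inputTop+inputHeight*nY;

        // Output position for this work-item
        float outputX = outputLeft+outputWidth*x;
        float outputY = outputTop+outputHeight*y;

        // Sample the input image at the mapped position
        uint4 rgba = read_imageui(input, sampler, (float2)(inputX,inputY));
        // ...
    }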
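The OpenCL specification leaves the result of write_imageui undefined when the coordinates fall outside the image, which would be consistent with one driver tolerating the out-of-bounds write and another rejecting it. The sketch below emulates the guard in plain C so it can be run without a GPU. It is not the poster's actual fix, which the thread does not show in full. The names `in_bounds` and `guarded_fill` are hypothetical; in a real kernel the limits would typically come from get_image_width(output) and get_image_height(output).

```c
#include <stddef.h>

/* Returns 1 when pixel (x, y) lies inside a width-by-height image. */
static int in_bounds(int x, int y, int width, int height)
{
    return x >= 0 && x < width && y >= 0 && y < height;
}

/* Emulates a 2D launch of global_w by global_h work-items. Each work-item
   computes its output position with the same formula as the kernel
   (left + step * id) and writes only when that position is inside the
   image. Returns the number of writes that were skipped by the guard. */
static size_t guarded_fill(unsigned char *image, int width, int height,
                           size_t global_w, size_t global_h,
                           float out_left, float out_top,
                           float out_step_x, float out_step_y)
{
    size_t skipped = 0;
    for (size_t gy = 0; gy < global_h; ++gy) {
        for (size_t gx = 0; gx < global_w; ++gx) {
            int ox = (int)(out_left + out_step_x * (float)gx);
            int oy = (int)(out_top + out_step_y * (float)gy);
            if (in_bounds(ox, oy, width, height)) {
                image[(size_t)oy * (size_t)width + (size_t)ox] = 255;
            } else {
                ++skipped;  /* the unguarded kernel would write out of bounds here */
            }
        }
    }
    return skipped;
}
```

With a 4x4 image and a 6x6 launch (origin 0, step 1), the guard skips the 20 work-items whose position falls outside the image and fills the other 16 pixels. Launches are often rounded up to a multiple of the work-group size, so a guard like this is a common precaution.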

  • What I still don't understand in the kernel are the following two lines:

    float outputX = outputLeft+outputWidth*x;
    float outputY = outputTop+outputHeight*y;

    This calculates the location of the output pixel for the work-item (x, y). Why does this even work? Multiplying the output width (and height) by the work-item's x (and y) id should give a result that goes beyond the bounds of the output image. If you are reading this and understand it, please explain.
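The arithmetic behind the question can be checked directly. The sketch below computes the largest coordinate the formula `outputLeft + outputWidth * x` can produce over a launch, under two readings of `outputWidth`: as a per-work-item step (for example 1.0), or as a width in pixels. The thread does not show the host code, so which reading applies is an assumption; the helper names `max_output_coord` and `stays_in_bounds` are hypothetical.

```c
/* Largest output coordinate any work-item can produce along one axis,
   using the kernel's formula: out = left + scale * id,
   with id ranging over [0, global_size). Assumes scale >= 0. */
static float max_output_coord(float left, float scale, int global_size)
{
    return left + scale * (float)(global_size - 1);
}

/* Returns 1 when every work-item along this axis lands inside an image
   that is image_size pixels wide (or tall), and 0 otherwise. */
static int stays_in_bounds(float left, float scale, int global_size,
                           int image_size)
{
    float lowest  = left;  /* id = 0 */
    float highest = max_output_coord(left, scale, global_size);
    return lowest >= 0.0f && highest < (float)image_size;
}
```

For a 640-wide image and 640 work-items, a step of 1.0 gives a maximum coordinate of 639, which is in bounds. Passing the pixel width 640 as the multiplier gives 640 * 639 = 408960, far outside the image. So the formula only stays in bounds if the host passes a small per-item step, or if the out-of-range writes happen to be tolerated by the driver.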