kernel not doing its job on ati card

Help
mux85
2010-05-19
2012-12-21
  • mux85

    mux85 - 2010-05-19

    I recently bought a new PC with a Radeon HD5850 GPU and a Core i7 920 CPU. Before this I was using a laptop with a GeForce 8800M GTX GPU and a Core 2 Duo CPU. I am developing an application that, among other things, computes a background from a video. This background extraction worked very well on the GeForce card. On the new PC I get the right results when running on the CPU (after that console problem), but when running on the ATI card I get almost no result: all the pixels in a frame have the same color. I am not using image2d or image3d but an array of structures composed of 3 bytes (r, g and b). The strange thing is that I always get g=r+1 and b=g+1. The base value of this progression increases slowly with time, that is, in the first frame background I get 0-1-2, in the second 9-10-11, in the third 15-16-17, and so on. This makes absolutely no sense because there is no such operation in the kernel. I think the problem may be linked in some way to memory initialization or communication (read/write on the GPU memory). Any suggestions? Thanks

     
  • nythrix

    nythrix - 2010-05-19

    Hmmm…
    The only issues I can think of here are memory misalignment and byte addressability. Have you triple-checked that your kernel fulfills these?

     
  • mux85

    mux85 - 2010-05-19

    1) memory misalignment
    ??????
    2) byte addressability
    I have #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable and get no build error. Is this enough?
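
    The pragma only enables the feature in the kernel source; whether the device actually supports it is advertised in CL_DEVICE_EXTENSIONS and can be checked from the host. A minimal C sketch, assuming nothing beyond a valid cl_device_id:

    #include <CL/cl.h>
    #include <string.h>

    /* Sketch: returns non-zero if the device advertises the extension. */
    int supports_byte_addressable_store(cl_device_id dev)
    {
        char ext[8192] = "";
        if (clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, sizeof(ext), ext, NULL) != CL_SUCCESS)
            return 0;
        return strstr(ext, "cl_khr_byte_addressable_store") != NULL;
    }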

     
  • mux85

    mux85 - 2010-05-19

    I've done some tests, simplifying the kernel to find out the problem. It turns out that simply transferring the input to the output works well. The problem comes up when I put some pixel info (from pixels at the same position in different frames) into a temporary fixed-size array and then extract one of them (in the complete version I extract the median of this temporary array, but extracting the first, last or middle element also causes the problem). The array has a size of 50 rgb elements (150 bytes); I doubt this is too much. I can successfully use a 2x2 array of rgb pixels, from which I extract the average to downsample the image to half size.
    What should I try?
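
    A stripped-down kernel showing just the pattern that fails would look roughly like the sketch below; the names and sizes are illustrative, not the actual code:

    #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable

    typedef struct { uchar r; uchar g; uchar b; } rgb;

    /* Sketch of the failing pattern: fill a private array of 3-byte structs
       inside a loop, then write a single element back out. */
    kernel void repro(global const rgb * in, global rgb * out, uint n)
    {
        uint gid = get_global_id(0);
        rgb temp[50];                        /* private, fixed-size scratch array */
        for (uint k = 0; k < n; k++)         /* n <= 50 frames */
            temp[k] = in[k * get_global_size(0) + gid];
        out[gid] = temp[0];                  /* even the first element comes back wrong */
    }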

     
  • nythrix

    nythrix - 2010-05-19

    I think the RGB struct should probably be aligned to (at least) 32 bits. Try attaching a dummy field to it. The name isn't important but the size of its type is (should be a single byte).
    struct RGB { byte R; byte G; byte B; byte alignment_fill; }
    Does this help?
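
    In OpenCL C (where the byte type is uchar) that idea would look roughly like this; just a sketch, the field and type names are arbitrary:

    /* Padded to 4 bytes so every element starts on a 32-bit boundary. */
    typedef struct
    {
        uchar r;
        uchar g;
        uchar b;
        uchar pad;   /* dummy field, its value is never used */
    } rgb4;

    /* Alternatively, keep three fields and force the alignment explicitly: */
    typedef struct { uchar r; uchar g; uchar b; } __attribute__((aligned(4))) rgb_aligned;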

     
  • mux85

    mux85 - 2010-05-19

    I can't do this. I use the struct also as input, and the program passes the data as an array of bytes where each pixel is represented as 3 consecutive bytes; using a 32-bit struct would cause a misalignment of the pixels. I've tried declaring the array with the aligned attribute and with a size of 64 instead of 50, but this doesn't solve it. I've also noticed that the problem comes up only when using the array inside a for loop (but I can't simply delete the loop, it's quite necessary).
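
    If the host-side layout has to stay packed at 3 bytes per pixel, one possible workaround (only a sketch under that assumption) is to declare the input buffer as plain uchar* and assemble each pixel in private memory, so the 3-byte struct is never read from global memory directly:

    typedef struct { uchar r; uchar g; uchar b; } rgb;

    /* Hypothetical helper: load pixel number 'idx' from a packed
       3-bytes-per-pixel buffer without dereferencing an rgb* in global memory. */
    rgb load_rgb(global const uchar * buf, uint idx)
    {
        rgb p;
        p.r = buf[3 * idx];
        p.g = buf[3 * idx + 1];
        p.b = buf[3 * idx + 2];
        return p;
    }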

     
  • mux85

    mux85 - 2010-05-19

    I added the aligned attribute to the type definition as well. The problem is still there. I think the problem is somehow related to the loop, but I don't understand how.

     
  • nythrix

    nythrix - 2010-05-19

    Ah, I see. I thought you were using an array of 24-bit structs. An array of flat bytes doesn't suffer from memory alignment problems, so ignore my previous post.

    I found this http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=128573 Are you running the latest SDK?

     
  • mux85

    mux85 - 2010-05-19

    Anyway, adding a byte to the struct doesn't help. I'll post the code that works on the NVIDIA GPU and on the CPU so that you can take a look at it.

     
  • mux85

    mux85 - 2010-05-19

    #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable

    #define MAX_FRAMES 64

    typedef struct
    {
        uchar r;
        uchar g;
        uchar b;
    } rgb;

    /* swap two pixels (used by the sort below) */
    void swap(rgb * a, rgb * b)
    {
        rgb t = *a;
        *a = *b;
        *b = t;
    }

    /* luminance of a pixel, used as the sort key */
    float rgbToLum(rgb pix)
    {
        return 0.3f*pix.r + 0.59f*pix.g + 0.11f*pix.b;
    }

    /* bubble sort of v[0..n-1] by increasing luminance */
    void sort(rgb * v, uint n)
    {
        bool swapped = true;
        while(swapped)
        {
            swapped = false;
            for(uint i = 0; i < n-1; i++)
                if(rgbToLum(v[i]) > rgbToLum(v[i+1]))
                {
                    swap(&v[i], &v[i+1]);
                    swapped = true;
                }
            n = n-1;
        }
    }

    /* median pixel of v[0..n-1] (middle element after sorting) */
    rgb median(rgb * v, uint n)
    {
        sort(v, n);
        return v[n/2];
    }

    /* average of a 2x2 block of pixels */
    rgb average(rgb m[2][2])
    {
        rgb a;
        ushort r = 0, g = 0, b = 0;
        for(uint i = 0; i < 2; i++)
            for(uint j = 0; j < 2; j++)
            {
                r += m[i][j].r;
                g += m[i][j].g;
                b += m[i][j].b;
            }
        a.r = r/4;
        a.g = g/4;
        a.b = b/4;
        return a;
    }

    kernel void BackgroundKernel( global read_only rgb * buf_in,
                                  global write_only rgb * buf_bg,
                                  read_only uint n)
    {
        ushort i = get_global_id(0);
        ushort j = get_global_id(1);
        ushort h_bg = get_global_size(0);
        ushort w_bg = get_global_size(1);
        ushort h_in = h_bg*2;
        ushort w_in = w_bg*2;

        uint frame_size = w_in*h_in;
        uint pos_in = w_in*i*2 + j*2;
        uint pos_bg = w_bg*i + j;

        //down-scaling of the frames
        rgb matr[2][2];
        rgb temp[MAX_FRAMES];
        for(uint k = 0; k < n; k++)
        {
            matr[0][0] = buf_in[pos_in];
            matr[0][1] = buf_in[pos_in + 1];
            matr[1][0] = buf_in[pos_in + w_in];
            matr[1][1] = buf_in[pos_in + w_in + 1];
            temp[k] = average(matr);
            pos_in += frame_size;
        }

        //median of the frames used as bg
        buf_bg[pos_bg] = median(temp, n);
    }

     
  • mux85

    mux85 - 2010-05-19

    Yes, I am using the latest SDK. cl_khr_byte_addressable_store is in the supported extensions, and the Stream KernelAnalyzer doesn't find any errors.

     
  • mux85

    mux85 - 2010-05-19

    For the moment I've solved it by using a buffer passed as a parameter in place of the array declared inside the kernel, but this way I must use global memory, which makes the whole program very, very, very inefficient (it becomes slower on the GPU than on the CPU), so I am still searching for a solution.
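
    For reference, that workaround presumably looks roughly like the sketch below, with one large scratch buffer allocated by the host and each work-item using its own MAX_FRAMES-sized slice of it; the names and indexing are illustrative, not the actual code:

    #define MAX_FRAMES 64
    typedef struct { uchar r; uchar g; uchar b; } rgb;

    /* Illustrative global-memory workaround: 'scratch' must hold
       global_size(0) * global_size(1) * MAX_FRAMES elements. */
    kernel void BackgroundKernelGlobalScratch(global const rgb * buf_in,
                                              global rgb * buf_bg,
                                              global rgb * scratch,
                                              uint n)
    {
        uint gid = get_global_id(0) * get_global_size(1) + get_global_id(1);
        global rgb * temp = scratch + gid * MAX_FRAMES;   /* this work-item's slice */
        /* ... fill temp[k] in the frame loop and take its median, as before ... */
    }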

     
  • nythrix

    nythrix - 2010-05-19

    I couldn't find any visible problems in the code. I still don't like the rgb struct, but if aligning it doesn't make a difference I'm short of other ideas.
    You can use local memory instead of global. That should be cached somewhere near the shader. Just make sure to set its size with kernel.SetArgument( index, size, IntPtr.Zero);
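
    On the kernel side that means taking the scratch space as a local pointer argument; a rough sketch, assuming the rgb typedef and MAX_FRAMES from the kernel posted above and one MAX_FRAMES-sized slice per work-item within the work-group's block:

    #define MAX_FRAMES 64
    typedef struct { uchar r; uchar g; uchar b; } rgb;

    /* Sketch: the runtime allocates 'scratch' in local memory; its size is the
       one passed from the host (work-items per group * MAX_FRAMES * 3 bytes). */
    kernel void BackgroundKernelLocalScratch(global const rgb * buf_in,
                                             global rgb * buf_bg,
                                             local rgb * scratch,
                                             uint n)
    {
        uint lid = get_local_id(0) * get_local_size(1) + get_local_id(1);
        local rgb * temp = scratch + lid * MAX_FRAMES;   /* this work-item's slice */
        /* ... same frame loop and median as before, but on 'temp' ... */
    }

    With the plain C API the host call would be clSetKernelArg(kernel, 2, bytes, NULL), i.e. a size with a NULL data pointer, which is presumably what the binding call above maps to.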

     
  • mux85

    mux85 - 2010-05-19

    But to use local memory I need to know the local work size before executing the kernel. In general I don't specify the local work size when executing the kernel and let OpenCL decide the best value. Furthermore, the local work size must be a divisor of the global work size, which in my case is not constant, so I would have to find a way to pick a good value, and that doesn't seem simple to me. I'll think about it anyway. I have also posted this problem on the AMD forum and sent them the complete project to help them find the problem, because just looking at the kernel they couldn't say anything useful.
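
    One common way around the divisibility constraint (a sketch, not necessarily what this project will do) is to fix a local size, round the global size up to a multiple of it, and have the kernel return early for the padding work-items:

    #include <stddef.h>

    /* Round n up to the next multiple of m (m > 0). */
    static size_t round_up(size_t n, size_t m)
    {
        return ((n + m - 1) / m) * m;
    }

    /* Host-side setup, assuming h_bg and w_bg are the background dimensions
       that were previously used directly as the global NDRange. */
    void choose_sizes(size_t h_bg, size_t w_bg,
                      size_t global_size[2], size_t local_size[2])
    {
        local_size[0] = 8;                               /* 8x8 = 64 work-items per group */
        local_size[1] = 8;
        global_size[0] = round_up(h_bg, local_size[0]);  /* now divisible by local size */
        global_size[1] = round_up(w_bg, local_size[1]);
    }

    The kernel would then need the real h_bg/w_bg passed in as extra arguments and a guard like if (i >= h_bg || j >= w_bg) return; at the top, since get_global_size would now report the padded range.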

     
  • nythrix

    nythrix - 2010-05-19

    I can think of a way. Launch an empty kernel and retrieve the recommended local work size. Then re-use it.
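
    With the plain C API, a per-kernel, per-device maximum can be queried via clGetKernelWorkGroupInfo; a minimal sketch:

    #include <CL/cl.h>

    /* Sketch: ask the runtime how large a work-group this kernel may use on
       this device; the result can then be factored into a 2D local size. */
    size_t max_work_group_size(cl_kernel kernel, cl_device_id dev)
    {
        size_t wg = 0;
        clGetKernelWorkGroupInfo(kernel, dev, CL_KERNEL_WORK_GROUP_SIZE,
                                 sizeof(wg), &wg, NULL);
        return wg;
    }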

     
  • mux85

    mux85 - 2010-05-19

    Mmm, doesn't the local work size depend on the code of the kernel? In that case using an empty kernel would be useless.

     
  • nythrix

    nythrix - 2010-05-19

    AFAIK the local size depends on the number of launched kernels (and the underlying HW architecture). However, it is just a hint.

     
  • mux85

    mux85 - 2010-05-20

    OK, thanks. I'll try that.

     
