Kernel execution failure

Help
tmac7
2011-04-05
2012-12-21
  • tmac7
    2011-04-05

    I am running into problems executing some of my kernels on an AMD HD 6950 card.  They run correctly on both Intel and ATI Stream CPU devices, but fail silently on the GPU once they reach a certain level of complexity (number of arguments or variables).  I have hooked up a ComputeContextNotifier, but it does not get called.

    Is there a way to use events in Cloo to get notified when kernel execution fails?

    I am running Cloo 0.9alpha2 with ATI Stream SDK 2.3 installed and the 11-2_vista64_win7_dd_ccc_ocl drivers dated 3/17/2011.

     
  • nythrix
    2011-04-06

    I haven't been able to get a context notifier callback either. The spec never says when it should be invoked; it only states that it "may" be.
    Try the following:

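    // Collect the event generated by the kernel launch.
    ComputeEventList eventList = new ComputeEventList();
    commands.Execute(kernel, null, new long[] { count }, null, eventList);
    // Subscribe to the abort notification of the event that Execute just added.
    eventList.Last.Aborted += new ComputeCommandStatusChanged(CommandAborted);
    //...
    // Called when the command terminates abnormally.
    void CommandAborted(object sender, ComputeCommandStatusArgs args)
    {
        // behave accordingly
    }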
    

    This should do the job.

    Please note that under certain rare circumstances "CommandAborted" might not get called. This is a problem inside Cloo that I'm aware of. I'm currently looking for a solution.

     
  • tmac7
    2011-04-06

    Thanks for the reply.  I will try that.  How & when should I unhook the event handler to avoid a memory leak from the events? I'm thinking that I should hook up a handler for all possible events and unhook all of them for that ComputeEvent when any of the events occur.

    BTW, I once got a context notifier callback when using the NVIDIA 1.1 beta (which I have since abandoned).  I believe I received an invalid argument message.  I later realized that one of the buffer arguments was probably corrupted.

     
  • nythrix
    2011-04-07

    Personally, I don't bother unhooking event handlers unless I constantly create them in a loop or so.

    Without knowing your code, it's hard to recommend anything. It should be safe to unhook it after the associated command has finished/failed (either one can occur but not both).
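
    The unhooking idea discussed above (attach a handler for each outcome, then detach all of them once any one fires) can be sketched in plain C#. This is only an illustration under stated assumptions: FakeCommandEvent is a hypothetical stand-in, not a Cloo type, and its Completed/Aborted events and the OneShot.Watch helper are invented for the example. With Cloo, the same pattern would presumably be applied to the events of the object returned by eventList.Last.

```csharp
using System;

// Hypothetical stand-in for an OpenCL command event; it is NOT a Cloo type.
// At most one of the two events fires for a given command.
class FakeCommandEvent
{
    public event EventHandler Completed;
    public event EventHandler Aborted;

    public void RaiseCompleted() { Completed?.Invoke(this, EventArgs.Empty); }
    public void RaiseAborted() { Aborted?.Invoke(this, EventArgs.Empty); }

    // True while any handler is still attached (for demonstration only).
    public bool HasSubscribers
    {
        get { return Completed != null || Aborted != null; }
    }
}

static class OneShot
{
    // Attaches both callbacks and detaches both as soon as either one fires,
    // so the event object no longer keeps the handlers (and whatever they
    // capture) alive.
    public static void Watch(FakeCommandEvent ev, Action onCompleted, Action onAborted)
    {
        EventHandler completed = null;
        EventHandler aborted = null;

        completed = (s, e) =>
        {
            ev.Completed -= completed;
            ev.Aborted -= aborted;
            onCompleted();
        };
        aborted = (s, e) =>
        {
            ev.Completed -= completed;
            ev.Aborted -= aborted;
            onAborted();
        };

        ev.Completed += completed;
        ev.Aborted += aborted;
    }
}

static class Program
{
    static void Main()
    {
        var ev = new FakeCommandEvent();
        OneShot.Watch(ev,
            () => Console.WriteLine("command completed"),
            () => Console.WriteLine("command aborted"));

        ev.RaiseAborted();                      // prints "command aborted"
        Console.WriteLine(ev.HasSubscribers);   // prints "False"
    }
}
```

    Detaching inside the handler keeps the cleanup next to the code that knows the command is done, which avoids tracking subscriptions elsewhere.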

    Generally speaking, you should avoid creating a lot of OpenCL events. They consume resources and slow down execution quite a bit. I've tested this myself with a Stopwatch and a very simple loop of read/write operations (Nvidia 9600GT, OpenCL 1.0 drivers).

    That said, a small amount of the code regarding the ComputeEvents will *probably* change in the next version. It's unfortunate but the current model doesn't guarantee the callback under all circumstances. Long story short, if the OpenCL callback is fired before you hook the handler, you'll miss it.
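
    The missed-callback problem described above, and one common way to guard against it, can be sketched in plain C#. The idea is to subscribe first and then poll the status once, with a guard so the handler runs only once even if both paths trigger. Everything here is an assumption made for illustration: FakeCommandEvent, FakeStatus and LateSubscriber are hypothetical names, not Cloo types, and whether Cloo 0.9 offers a usable status query on its events should be checked against the library itself. The stand-in is also deliberately simple and is not a thread-safe model of a real driver.

```csharp
using System;
using System.Threading;

// Hypothetical final states of a command; NOT a Cloo enum.
enum FakeStatus { Running, Complete, Failed }

// Hypothetical stand-in for a command event; NOT a Cloo type.
class FakeCommandEvent
{
    // Defaults to Running, the first enum member.
    public FakeStatus Status { get; private set; }

    public event Action<FakeStatus> Finished;

    // Simulates the driver reporting the final state.
    public void Finish(FakeStatus final)
    {
        Status = final;
        Finished?.Invoke(final);
    }
}

static class LateSubscriber
{
    // Subscribes, then polls the status once in case the command had already
    // finished before the subscription was in place.
    public static void Watch(FakeCommandEvent ev, Action<FakeStatus> handler)
    {
        int fired = 0;
        Action<FakeStatus> once = status =>
        {
            // Only the first caller gets through.
            if (Interlocked.Exchange(ref fired, 1) == 0)
                handler(status);
        };

        ev.Finished += once;

        FakeStatus now = ev.Status;
        if (now != FakeStatus.Running)
            once(now);
    }
}

static class Program
{
    static void Main()
    {
        var ev = new FakeCommandEvent();
        ev.Finish(FakeStatus.Failed);   // finishes before anyone subscribes

        // Without the status poll this handler would never run.
        LateSubscriber.Watch(ev, s => Console.WriteLine("final status: " + s));
        // prints "final status: Failed"
    }
}
```

    Subscribing before polling matters: polling first would leave a window in which the command could finish between the poll and the subscription.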