How often should I be creating ComputeBuffer?

Help
sebj
2012-04-28
2012-12-21
  • sebj
    2012-04-28

    Hello,

    I have been using Cloo for a short while and am finding it fantastically intuitive. Too much so, even: because everything has worked so far, I have not been forced to code defensively, and now my application's performance is becoming an issue! ;)

    I wondered if someone could give me a couple of tips on how to use Cloo efficiently to avoid things like unnecessarily creating objects?

    Specifically…
    1. How expensive is the creation of a ComputeBuffer? Should I attempt to reuse these as much as possible or will the time spent caching them be equivalent to creating and cleaning up?

    2. Should I dispose of all my Cloo objects myself or can I rely on the garbage collector to do it?

    3. Can/should I reuse ComputeEventList and ComputeCommandQueue objects?

    4. Most importantly, in my application I have this embarrassing method:

            protected void Execute(ComputeCommandQueue commands, ComputeEventList eventList, int globalWorkSize)
            {
                int count = globalWorkSize;
                int done = 0;
                do
                {
                    int workCount = Toolbox.Min(maxGlobalWorkSize, count - done);
                    commands.Execute(kernel, new long[] { done }, new long[] { workCount }, null, eventList);
                    System.Threading.Thread.Sleep(2);  //i hate myself so much....
                    done += workCount;
                } while (done < count);
            }
    

    For very computationally intensive tasks I hit the 5-second timeout on my NVidia GPU, which causes an OutOfResources exception, so I put the thread to sleep to give the GPU a chance to catch up. However, this is (a) horrible and (b) causing messages in my debug stream similar to those in this thread: https://sourceforge.net/projects/cloo/forums/forum/1048266/topic/4589811

    What is the correct way for my application to assess the status of the GPU, and avoid locking it?

    SJ

     
  • nythrix
    2012-04-29

    1. It is slightly faster to create buffers once and then read from/write to them. The drivers should cache the reads and writes into larger chunks which are executed at once. See also 3.
    2. As with all unmanaged resources, it is recommended that you clean up manually. If you ignore this recommendation everything still works, except in the following situation: OpenCL objects that are created from or share data with OpenGL objects must be disposed manually BEFORE the OpenGL context becomes unavailable, otherwise crashes may occur during cleanup/application exit.
    3 a) An application doesn't usually create and destroy many command queues. Reuse the existing ones.
    3 b) Do not use events unless absolutely necessary. From my experience their creation is very slow. This isn't Cloo specific. I've hit this issue with pure C programs as well.
    4. There is only one way I know of: reduce the size of the computation that is executed at once. Also note that the events are being accumulated in the eventList. Consider passing eventList.AsReadOnly() so that commands.Execute does not generate a new event, or don't use eventList at all if you don't need it (pass null).
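
    A minimal sketch of point 4, assuming the work can be split along a single dimension. The ChunkedDispatch class and its Split method are hypothetical names, not part of Cloo; the commented-out loop reuses the names from the Execute method in the question (kernel, commands, maxGlobalWorkSize). Whether this avoids the time-out depends on each chunk finishing within the driver's watchdog window.

    ```csharp
    using System;
    using System.Collections.Generic;

    static class ChunkedDispatch
    {
        // Splits the range [0, total) into consecutive (offset, count) pairs,
        // each covering at most maxChunk work items.
        public static IEnumerable<(long Offset, long Count)> Split(long total, long maxChunk)
        {
            if (maxChunk <= 0)
                throw new ArgumentOutOfRangeException(nameof(maxChunk));

            for (long done = 0; done < total; done += maxChunk)
                yield return (done, Math.Min(maxChunk, total - done));
        }
    }

    // Possible use inside the Execute method from the question (assumption):
    //
    // foreach (var (offset, count) in ChunkedDispatch.Split(globalWorkSize, maxGlobalWorkSize))
    // {
    //     // Passing null as the last argument means no event is recorded.
    //     commands.Execute(kernel, new long[] { offset }, new long[] { count }, null, null);
    //     // Wait for this chunk to complete before queuing the next one,
    //     // instead of sleeping for a fixed time.
    //     commands.Finish();
    // }
    ```

    For example, Split(10, 4) yields the pairs (0, 4), (4, 4) and (8, 2).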

     
  • sebj
    2012-04-29

    Hi nythrix,

    Thank you! I have removed the ComputeEventLists and am using only one CommandQueue for the lifetime of the objects that utilize the GPU and performance has significantly improved.

    I was using the event lists because they are needed if I want a blocking read, which I was under the impression was required to prevent my host application reading back data before all kernel executions have completed. My application still works, though. Is my understanding wrong, or is it luck that I haven't read incomplete data yet?

    I will dispose of the objects manually; I also want to reuse buffers where possible so I have implemented some simple caching but am getting some odd behaviour.
    My caching works by checking the Count property of the buffer against the size of the array that is about to be written, and recreating the buffer if it is smaller. However, at one point in my application I attempt to write data into a previously created buffer and it fails with an AccessViolationException.
    What is odd is that it always occurs at exactly the same place (writing 14976 integers into a buffer previously loaded with 361488); this is on the 6th run of this set of Cloo executions, and at least one run before this one updates the buffer with smaller data than its capacity.
    Each run consists of a number of calls (creating and filling other buffers, kernel execution and reads), so it cannot be any other calls to Cloo methods which are causing it (since the calls don't change).

    There isn't something simple I have missed about the buffers is there? Like limitations on the sizes/packing/updates?

    Thanks again,
    SJ

     
  • sebj
    2012-04-29

    (Additional info, I have been playing with it and from what I can see, if I create a buffer with a larger capacity than what I attempt to write to it, the write will sometimes fail with AccessViolationException; I have been unable to spot a pattern between the sizes I create the buffer with, and those I am writing which fail and those that don't.)

     
  • nythrix
    2012-04-29

    "I was using the event lists as they are needed if I want a blocking read, which I was under the impression was required to prevent my host application reading back data before all kernel executions have completed. My application still works though, is my understanding wrong or is it luck that I haven't read incomplete data yet?"

    Not required. All commands of a given queue will, unless otherwise specified, be executed in order. This means that every command will finish before the next one starts. Only when you need to access the data the queue works on from the host do you either:
    1) issue commands.Finish() or
    2) use the bool blocking parameter of the read/write commands. Guess what it does ;)

    "There isn't something simple I have missed about the buffers is there? Like limitations on the sizes/packing/updates?"

    No limitations. Well, not in Cloo that is. I can't really help you without seeing some code…

     
  • sebj
    2012-04-29

    Hi nythrix,

    Thank you, I am using commands.Finish() now.

    I have reduced the problem with the buffers down to the following example:

            static void Main(string[] args)
            {
                ComputePlatform platform;
                ComputeContextPropertyList properties;
                ComputeContext context;
                ComputeCommandQueue commands;

                platform = ComputePlatform.Platforms.FirstOrDefault();
                if (platform == null)
                    return;

                if (platform.Devices[0].Version < new Version(1, 1))
                    return;

                properties = new ComputeContextPropertyList(platform);
                context = new ComputeContext(platform.Devices, properties, null, IntPtr.Zero);
                commands = new ComputeCommandQueue(context, context.Devices[0], ComputeCommandQueueFlags.None);

                int[] d1 = new int[14976];
                var buffer = new ComputeBuffer<int>(context, ComputeMemoryFlags.ReadWrite, 361488);
                commands.WriteToBuffer(d1, buffer, false, null);    //<- unhandled AccessViolationException every time
            }

    The difference between the capacity and the amount being written makes a difference, for example, if I create a buffer with a count of 20000 above, the code executes even though I'm still only writing 14976 integers into it. From what I have seen it appears as if a buffer with a capacity significantly greater than the amount being written will throw the exception but my tests are too limited to say this for sure. Something to do with pitch or packing was the only thing I could think of to explain it!

    This occurs on an NVidia GTX 480 with driver v. 296.10.

     
  • nythrix
    2012-04-29

    Aaaah yes.
    You need to use

    WriteToBuffer<T>(T[] source, ComputeBufferBase<T> destination, bool blocking, long sourceOffset, long destinationOffset, long region, IList<ComputeEventBase> events)
    

    for that. Cloo cannot know what to do if the array and buffer sizes don't match ;)
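
    A minimal sketch of such a call, assuming the overload signature quoted above and a blocking write of the whole array to the start of the buffer. The BufferWrites class and WritePrefix method are hypothetical names used only for illustration; running it requires Cloo and an OpenCL device.

    ```csharp
    using System;
    using Cloo;

    static class BufferWrites
    {
        // Hypothetical helper: copies all of `source` into the beginning of a
        // (possibly larger) device buffer using the offset/region overload.
        public static void WritePrefix(ComputeCommandQueue commands, int[] source, ComputeBuffer<int> destination)
        {
            // The region being written must fit inside the buffer.
            if (source.Length > destination.Count)
                throw new ArgumentException("Source array does not fit in the buffer.");

            // blocking = true, sourceOffset = 0, destinationOffset = 0,
            // region = number of elements to copy, events = null.
            commands.WriteToBuffer(source, destination, true, 0, 0, source.Length, null);
        }
    }
    ```

    With the example from the previous post, this would be called as BufferWrites.WritePrefix(commands, d1, buffer), so only the 14976 elements of d1 are written into the 361488-element buffer.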

     
  • sebj
    2012-04-30

    AH! It is all working much better now.

    Thank you nythrix for your help!