Hi, I'm writing a program that processes video. I have a kernel that I run once for every frame. All the objects (buffers, kernel, queues, etc.) and functions are contained in a class that I instantiate only once; the objects are created in the constructor of that class. The first iteration of the program works (I get no exception), but in the second I get a NullReferenceException. I've tested many objects and all of them turn out to be null after the first iteration. Until this morning everything worked; the problem appeared after I modified the type and size of some of the kernel parameters. What could it be? Thanks
1) Everything becomes null after calling the Finish method.
2) If I remove the Finish call, I always get a FatalExecutionEngineError after 40 iterations, when I call SetMemoryArgument and ComputeEventBase.StatusNotify is called; nothing gets nulled.
3) If I also remove the call to read right before the call to Finish, or remove only the call to read and keep the call to Finish, the program ends normally.
Hi Nythrix, are you still there? :-)
The problem of everything becoming null was somehow caused by using an array of booleans as a kernel parameter; switching back to an array of bytes resolved it.
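The thread never confirms why the boolean array misbehaved. One plausible explanation is that the size of a boolean is not guaranteed to be the same on the host and on the device (OpenCL C does not accept bool as a kernel argument type for that reason). A minimal sketch of the workaround described above, in plain C, copies the flags into a byte array with exactly one byte per flag before the data is handed to a buffer. The function name flags_to_bytes is hypothetical and is not part of Cloo or OpenCL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copy host-side flags into a byte array that holds
   one 0/1 byte per flag, so host and device agree on the element size. */
void flags_to_bytes(const bool *flags, unsigned char *out, size_t count)
{
    size_t i;
    assert(flags != NULL && out != NULL);
    for (i = 0; i < count; ++i) {
        out[i] = flags[i] ? 1u : 0u;
    }
}
```

The kernel side would then declare the parameter as a pointer to uchar and test each element against zero.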
Anyway, I'm still having some very strange bugs. I get these errors only with videos of certain resolutions. For example, with 640x480 or 400x320 the program works well; with 320x240 or 320x256 I get the bugs. Obviously the resolution influences the size of the array passed as a kernel parameter, and I have verified that the sizes are computed correctly. The error is different at every execution of the program (even with the same video). Here's a list:
1) FatalExecutionEngineError with the following message: "The runtime has encountered a fatal error. The address of the error was at 0x6f4aba18, on thread 0x198. The error code is 0xc0000005. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, which may corrupt the stack." It is thrown at a different point every time: for example inside Tools.ParseVersionString when the string is initialized, inside different methods of ComputeCommandQueue, or in the constructor of ComputeEvent.
2) InvalidContextComputeException or InvalidEventWaitListComputeException when I call ComputeCommandQueue.Execute.
3) AccessViolationException inside ComputeCommandQueue.Execute, where CL10.EnqueueNDRangeKernel is called.
4) Sometimes the program simply closes prematurely without any exception.
Each error happens after the kernel has been used many times (often, but not always, after 120 iterations, i.e. 120 frames, or more; in fact, processing a very short video sometimes terminates normally), and the output is correct for the iterations before the error.
I think there might be some memory corruption. It may be caused by a driver bug, but also by incorrect marshaling inside Cloo (as suggested by the description of the FatalExecutionEngineError, which happens to be the most frequent error).
Do you have any suggestions for resolving the problem? Let me know if you need more information. Thanks
I've tried videos with many different resolutions. I got many different errors in different places, but most importantly I have arrived at a strange conclusion: the problems occur only with videos below a certain resolution (specifically, fewer than about 115000 pixels).
I'm more confused than before. Anyway, I tried one thing: when I create the buffers, I set the size to an arbitrarily high value. The first time the program writes to the buffer it crashes, saying "vshost.exe" has stopped working. Is it a problem to have a buffer larger than what I use? Which size does the WriteToBuffer method use? The size of the buffer or the size of the array?
Going back to Cloo 0.8.0 didn't resolve it, but going back to 0.7.4 did. This probably means that the error was introduced in version 0.8. I also noticed a notable performance boost going back to 0.7.4. You should inspect the 0.8 version closely. Let me know if you need help. Bye
Hi Nythrix, did you take a look at this thread?
Yes. This is very probably a bug (or more) in the new code of 0.8.
There's a lot of info in this thread, so I'd like to start small and see if we can get on top of it. All of this mayhem could be caused by erroneous states piling one over another.
1) Are you using any new CommandQueue methods introduced with 0.8 (see Changelog.txt, section 0.8.0, "New Functionality")?
2) Can you specify your OpenCL platform and drivers?
I guess there's something wrong with Cloo auto-tracking stuff and/or callbacks from OpenCL. That would explain errors popping out at different places.
All buffer read/write/copy methods use element indexing NOT number of bytes (contrary to native OpenCL). It is safer because a ComputeBuffer knows the type of its elements and can therefore compute the proper sizes/offsets. Methods operating on images also work with element/pixel indexing (same as native OpenCL).
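To make the difference concrete, here is a small sketch in plain C of the arithmetic involved. It is not Cloo's actual code: the type byte_range and the functions element_range_to_bytes and transfer_fits are hypothetical names used only for illustration. Native calls such as clEnqueueWriteBuffer take an offset and a size in bytes, so an element-indexed wrapper has to multiply both by the element size, and a transfer is only safe when the element range lies inside the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: the byte range covered by an element-indexed transfer. */
typedef struct {
    size_t offset_bytes;
    size_t size_bytes;
} byte_range;

/* Convert (first element, element count) into the (byte offset, byte size)
   pair that a native call such as clEnqueueWriteBuffer expects. */
byte_range element_range_to_bytes(size_t first_elem, size_t elem_count,
                                  size_t elem_size)
{
    byte_range r;
    assert(elem_size > 0);
    r.offset_bytes = first_elem * elem_size;
    r.size_bytes = elem_count * elem_size;
    return r;
}

/* Returns 1 when the element range stays inside a buffer that holds
   buffer_elems elements, 0 otherwise. */
int transfer_fits(size_t first_elem, size_t elem_count, size_t buffer_elems)
{
    return first_elem <= buffer_elems &&
           elem_count <= buffer_elems - first_elem;
}
```

For a 320x240 frame of 4-byte elements, for example, a transfer of 76800 elements corresponds to 307200 bytes; passing the byte count where an element count is expected would ask for four times more data than the array holds.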
I checked Cloo documentation and you're right. It's not very clear on this.
1) Yes, I used some of the new methods.
2) The platform is OpenCL 1.1 ATI-Stream-v2.2 (302) FULL_PROFILE. I always use the latest Catalyst, so at the time I was having those bugs I was using 10.9. I have an ATI 5850 GPU and an Intel i7 920 CPU.
Let me know if you need anything else.
Cannot reproduce. Part of the problem is I don't have an ATI card. Can you post a small example (if possible/available/doable) or at least a more precise description?
Marshaling is a big headache. I'm also having problems with resource management (especially on program exit or general cleanup) and I'm running out of ideas.
I'm putting together a version that should output debugging info here and there. Hopefully, it'll help us when bug reporting.
OK, I'll try to put something together to demonstrate the bug. But it may actually be caused by the ATI driver, in which case you will not be able to reproduce it anyway. It may also be caused by some interaction with other libraries I'm using, so if I write a simplified version the bug may go away. It has appeared and disappeared many times during the development of my project. Lately I experienced this problem with Cloo 0.7.4 as well, but it went away after some other modifications to the code.
That'll be great. Yeah, I don't have the GPU but I still can test on the CPU.
0.8 introduced a lot of new code under the hood. I suspected this was the cause of the errors. It is a bit strange that 0.7 is also affected, since that one was a lot simpler (with little room for bugs).
I don't know what to say. I'm not able to reproduce the bug anymore, even using Cloo 0.8.1 and its new methods. Maybe the bug has simply gone away with a newer ATI driver. I will try again when I have more time.