From: Noah R. <noa...@gm...> - 2018-12-21 22:10:28
Hi,

I figured it is about time I gave pocl a try with my physics simulation code. I've been using Intel's OpenCL library for computing on Cray systems with Xeon CPUs.

Today I built pocl (today's git master) on a Cray XC40 using clang+llvm-7.0.0-x86_64-linux-sles12.3. I was able to run a simple Hello World kernel as well as clinfo.

When running my physics application at the necessary scale, I'm seeing about 0.2% of clBuildProgram calls fail with a SEGFAULT, all with a common stack signature (pasted below). I'm not sure why this would be so intermittent. I've tried reducing to one process per compute node, so that only one clBuildProgram would be executing on a node at a time. In this testing, that leaves 90 processes doing the same program compile simultaneously in the same working directory. Is pocl or clang trying to write anything to the working directory? In my restricted case, /tmp is private to each compute node and thus to each process.

Googling for similar stack language, I find one mention that may well be the same bug:
https://www.mail-archive.com/llv...@li.../msg28677.html
https://bugs.llvm.org/show_bug.cgi?id=39833

"poclcc" is successful with the same OpenCL kernel source. I assume I'd need to run it hundreds of times, perhaps in parallel, to potentially trigger the same bug.

Any advice would be appreciated. Now that I've thought through the situation, I think I should probably create an account and contribute to the LLVM bug 39833 discussion with a me-too.

Cheers,
Noah Reddell

WmResidentPatchProcessor::WmResidentPatchProcessor(WmComputeProgram*, boost::shared_ptr<WmComputeAssignment const>, std::vector<boost::shared_ptr<WmSubDomain const>, std::allocator<boost::shared_ptr<WmSubDomain const> > > const&, WmComputeMachine&)@wmresidentpatchprocessor.cc:358
POclBuildProgram@clBuildProgram.c:37
compile_and_link_program@pocl_build.c:624
pocl_llvm_build_program@pocl_llvm_build.cc:489
clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)@0x2aaaabebfd07
clang::FrontendAction::Execute()@0x2aaaabf1c106
clang::PrintPreprocessedAction::ExecuteAction()@0x2aaaabf22328
clang::DoPrintPreprocessedInput(clang::Preprocessor&, llvm::raw_ostream*, clang::PreprocessorOutputOptions const&)@0x2aaaabf51226
clang::Preprocessor::EnterMainSourceFile()@0x2aaaacc1cabc
clang::Preprocessor::EnterSourceFile(clang::FileID, clang::DirectoryLookup const*, clang::SourceLocation)@0x2aaaacbf7407
(anonymous namespace)::PrintPPOutputPPCallbacks::FileChanged(clang::SourceLocation, clang::PPCallbacks::FileChangeReason, clang::SrcMgr::CharacteristicKind, clang::FileID)@0x2aaaabf5212d
clang::SourceManager::getPresumedLoc(clang::SourceLocation, bool) const@0x2aaaacc4e00e
clang::SourceManager::getLineNumber(clang::FileID, unsigned int, bool*) const@0x2aaaacc4e43a
*ComputeLineNumbers*(clang::DiagnosticsEngine&, clang::SrcMgr::ContentCache*, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>&, clang::SourceManager const&, bool&)@0x2aaaacc4e683
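
For the idea of running poclcc hundreds of times in parallel, a rough sketch of a stress harness follows. It is only an illustration under assumptions: the names `run_once` and `stress` are made up here, the command to run is supplied by the caller, and the check for a negative return code relies on POSIX behaviour, where Python reports a child killed by a signal as the negated signal number. Nothing here is taken from pocl itself.

```python
import signal
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd):
    """Run cmd once and return its exit status.

    On POSIX a negative value means the child was killed by that signal.
    """
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def stress(cmd, runs=200, workers=8):
    """Run cmd `runs` times, `workers` at a time, and tally the outcomes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(run_once, [cmd] * runs))
    segfaults = sum(1 for code in codes if code == -signal.SIGSEGV)
    other = sum(1 for code in codes if code not in (0, -signal.SIGSEGV))
    return {"runs": runs, "segfaults": segfaults, "other_failures": other}
```

A call might look like `stress(["poclcc", "kernel.cl"], runs=500, workers=16)`, where `kernel.cl` is a placeholder for the real kernel source and the exact poclcc arguments should be checked against its help output. Since each run may write an output file, giving every run its own working directory would avoid the runs clobbering each other.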