Version 0.8.1: CUDA 12.x compatibility improvements and minor fixes

Changes since v0.8.0:

CUDA 12.x compatibility

  • [#711] : Added preliminary information regarding Blackwell cards and micro-architecture
  • [#701] : The --version-ident compilation option to NVRTC was dropped in CUDA 12.2; this is now respected by the wrappers and the option is not exposed for 12.2 and newer versions of CUDA.
  • [#702] : Fixed handling of --version-ident (we had a spacing issue)
  • [#635], [#701] : Added support for the --fdevice-syntax-only and --minimal options for NVRTC compilation
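
The version-dependent handling of --version-ident described above can be illustrated with a small, library-independent sketch. Everything in it is hypothetical and is not the wrappers' actual API: the `cuda_version` struct, the helper names, and the `=true` value form of the option are assumptions made for illustration. The option spellings follow NVRTC's documented `--fdevice-syntax-only` and `--minimal`; these two are not version-gated here because the notes do not say which CUDA versions accept them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical version descriptor; not a type from the wrappers library.
struct cuda_version {
    int major_number;
    int minor_number;
};

// Per the release note: --version-ident was dropped in CUDA 12.2,
// so only older versions should ever be handed that option.
inline bool supports_version_ident(cuda_version v)
{
    return v.major_number < 12 ||
           (v.major_number == 12 && v.minor_number < 2);
}

// Builds the list of option strings a caller might pass on to NVRTC.
// The "=true" value form is an assumption made for this sketch.
inline std::vector<std::string> build_nvrtc_options(
    cuda_version v, bool want_version_ident, bool syntax_only, bool minimal)
{
    std::vector<std::string> options;
    if (want_version_ident && supports_version_ident(v)) {
        options.push_back("--version-ident=true");
    }
    if (syntax_only) { options.push_back("--fdevice-syntax-only"); }
    if (minimal)     { options.push_back("--minimal"); }
    return options;
}
```

With this sketch, asking for the version identifier on CUDA 12.4 silently omits the option, while the same request on CUDA 12.1 emits it. A real wrapper might instead not expose the setting at all on newer versions, which is what the note above describes.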

Changes to the unique_span & unique_region classes

  • [#703] : unique_span<T>::swap() now correctly swaps the deleters as well
  • [#713] : Move constructor and assignment operator of unique_region_t
  • [#702] : Fixed a typo when passing the --no-source-include option to NVRTC
  • [#719]: Removed redundant cast operations from unique_span<T>
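
To see why the deleter fix in #703 matters, consider a minimal sketch of an owning span with a per-object deleter. The `owning_span` class and the two deleter functions below are hypothetical stand-ins written for this illustration; they are not the library's `unique_span` implementation. If `swap()` exchanged only the pointer and size, each buffer would later be released by the other object's deleter.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for an owning span; not the library's unique_span.
template <typename T>
class owning_span {
public:
    using deleter_type = void (*)(T*);

    owning_span(T* data, std::size_t size, deleter_type deleter) noexcept
        : data_(data), size_(size), deleter_(deleter) {}
    owning_span(const owning_span&) = delete;
    owning_span& operator=(const owning_span&) = delete;
    ~owning_span() { if (data_ != nullptr) { deleter_(data_); } }

    // Swap all three members: the deleter must travel with its buffer.
    void swap(owning_span& other) noexcept
    {
        std::swap(data_, other.data_);
        std::swap(size_, other.size_);
        std::swap(deleter_, other.deleter_);
    }

    T* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }
    deleter_type deleter() const noexcept { return deleter_; }

private:
    T* data_;
    std::size_t size_;
    deleter_type deleter_;
};

// Call counters, so the effect of each deleter can be observed.
inline int& array_delete_count() { static int n = 0; return n; }
inline int& leave_alone_count()  { static int n = 0; return n; }

// Deleter for heap arrays allocated with new[].
inline void delete_array(int* p) { ++array_delete_count(); delete[] p; }
// Deleter for memory the span does not really own (e.g. a stack buffer).
inline void leave_alone(int*) { ++leave_alone_count(); }
```

In this sketch, swapping a heap-backed span with one that wraps a stack buffer leaves each buffer paired with its original deleter, so `delete[]` is never applied to stack memory.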

Bug fixes

  • [#706] : Made context_t::flags() non-virtual
  • [#710] : Fixed the comparison operators for launch configurations
  • [#709] : Span-to-C-array copy no longer ignoring the designated stream
  • [#708] : Avoiding infinite recursion in link_t::add_file()

Build & installation

  • [#717]: Creating possibly-missing CUDAToolkit targets in installed config files, so that library targets can rely on them: nvfatbin, nvfatbin_static and cufilt.

Other changes

  • [#704] : Limited the clang warning flags (no -pedantic) to avoid warnings we can't resolve
  • [#705] : Made some methods of library_t be const
  • [#721] : device::properties_t::max_in_flight_threads_on_device() now returns an unsigned (rather than unsigned long long)

Example programs

  • [#720] : Avoiding suspicious numeric conversions in the example programs (mostly inherited from NVIDIA, tsk tsk tsk)
  • [#722]: In simpleCudaGraphs, when using stream capture, now enqueueing the correct, existing event rather than an anonymous transient event
  • Now compiling the example programs with more warning flags enabled.
Source: README.md, updated 2025-03-19