Version 0.6.9: Documentation update, unique_span's, bug fixes, many small improvements

(This is planned to be the last release before 0.7.0, which will add support for CUDA graphs.)

Changes since v0.6.8:

  • [#606] Can now copy directly to and from containers with contiguous storage - without going through pointers or specifying the size
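
The idea behind [#606] can be illustrated with a host-only sketch. This is not the library's implementation and makes no CUDA calls: `copy_contiguous` and `example` are hypothetical names, and `std::memcpy` stands in for an actual device copy. It only shows how the address and size can be deduced from any container exposing `data()` and `size()`:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper (not part of CUDA API Wrappers): copies between two
// containers with contiguous storage, deducing addresses and sizes from
// data() and size() rather than taking raw pointers and a byte count.
template <typename Destination, typename Source>
void copy_contiguous(Destination& destination, const Source& source)
{
    using dest_value_type =
        typename std::remove_reference<decltype(*destination.data())>::type;
    using src_value_type = typename std::remove_cv<
        typename std::remove_reference<decltype(*source.data())>::type>::type;
    static_assert(std::is_same<dest_value_type, src_value_type>::value,
        "source and destination element types must match");
    static_assert(std::is_trivially_copyable<src_value_type>::value,
        "elements must be trivially copyable");

    // The destination must be large enough to hold the entire source.
    assert(destination.size() >= source.size());
    if (source.size() == 0) { return; }
    std::memcpy(destination.data(), source.data(),
        source.size() * sizeof(src_value_type));
}

// Usage: no pointers or sizes appear at the call site.
inline std::array<int, 4> example()
{
    std::vector<int> source = {1, 2, 3};
    std::array<int, 4> destination = {{0, 0, 0, 9}};
    copy_contiguous(destination, source);
    return destination;
}
```

In this sketch, elements of the destination beyond the size of the source are left untouched.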

Owning typed and untyped memory: unique_span and unique_region

  • [#291] Added a unique_span<T> template class, combining the functionality of cuda::unique_ptr and cuda::span (and being somewhat similar to std::dynarray which almost made it into C++14). Many CUDA programs want to represent both the ownership of allocated memory, and the range of that memory for actual use, in the same variable - without the on-the-fly reallocation behavior of std::vector. This is now possible. Also implemented an untyped version of this, named unique_region.
  • [#617] Replaced memory::external::mapped_region_t with memory::unique_region
  • [#601] Added an empty() method to cuda::span (to match that of std::span - as it is now sometimes used)
  • [#603] Use unique_span instead of our cuda::dynarray (which had been an std::vector under the hood), in various places in the API, especially RTC
  • [#610] Now returning unique_span's from the cuda::rtc::program_output class methods which allocate their own buffers: the methods for getting the compilation log, the cubin data, the PTX and the LTO IR.
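
To make the "ownership plus range in one variable" idea of [#291] concrete, here is a minimal host-only sketch. It is not the library's `unique_span`: `toy_unique_span` and `example_sum` are hypothetical names, and plain `new[]` stands in for a CUDA allocation. The class is move-only like a unique_ptr, knows its extent like a span, and has no way to grow or reallocate:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical host-only stand-in, NOT the library's unique_span.
template <typename T>
class toy_unique_span {
public:
    // Allocates exactly `size` value-initialized elements, once.
    explicit toy_unique_span(std::size_t size)
        : data_(new T[size]()), size_(size) {}

    // Move-only: ownership is transferred, never duplicated.
    toy_unique_span(const toy_unique_span&) = delete;
    toy_unique_span& operator=(const toy_unique_span&) = delete;
    toy_unique_span(toy_unique_span&& other) noexcept
        : data_(std::move(other.data_)), size_(other.size_) { other.size_ = 0; }
    toy_unique_span& operator=(toy_unique_span&& other) noexcept
    {
        if (this != &other) {
            data_ = std::move(other.data_);
            size_ = other.size_;
            other.size_ = 0;
        }
        return *this;
    }

    // Span-like accessors; there is deliberately no resize() or push_back().
    T* data() const noexcept { return data_.get(); }
    std::size_t size() const noexcept { return size_; }
    bool empty() const noexcept { return size_ == 0; }
    T* begin() const noexcept { return data_.get(); }
    T* end() const noexcept { return data_.get() + size_; }
    T& operator[](std::size_t i) const { assert(i < size_); return data_[i]; }

private:
    std::unique_ptr<T[]> data_;  // releases the memory on destruction
    std::size_t size_;           // the usable range of that memory
};

// Usage: fill the span, transfer ownership, then iterate over the range.
inline int example_sum()
{
    toy_unique_span<int> values(4);
    for (std::size_t i = 0; i < values.size(); ++i) {
        values[i] = static_cast<int>(i) + 1;
    }
    toy_unique_span<int> moved = std::move(values);
    int sum = 0;
    for (int x : moved) { sum += x; }
    return sum;
}
```

In this sketch a moved-from span is left empty, so it can still be destroyed or queried safely.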

More robust memory regions (memory::region_t)

  • [#592] Changed the approach used in v0.6.8 to bring managed regions and general regions in line with each other; now, memory::managed::region_t inherits memory::region_t
  • [#594] Now using memory::region_t for mapped memory rather than a different, mapped-memory specific region class
  • [#602] Make memory::region_t more constexpr-friendly
  • [#604] memory::region_t's are now CUDA-independent, i.e. do not utilize any CUDA-specific definitions
  • [#605] Can now construct const_region_t's from rvalue references to regions
  • [#640] The user no longer needs to know about range_attribute_t or advice_t - those are left to detail_ namespaces; also, fixed the implementation of attribute setting for attributes which are not device-specific
  • [#647] Mapped memory: Can now implicitly convert memory::mapped::span_pair_t<T> into a pair of region_t's

Documentation & comments

  • [#595] Correct the documentation for supports_memory_pools

Launch configuration & launch config builder changes

  • [#596] Corrected a check against the associated device in the kernel-setting method of the launch config builder
  • [#619], [#618] Fixed launch configuration comparisons, and now using defaulted comparison
  • [#619] Fixed a bug in checking whether some CUDA-12-introduced launch config parameters are set

CUDA libraries and in-library, non-associated kernel support

  • [#598] Corrected the API and implementation of get_attribute() and set_attribute() for library kernels

Internal refactoring

  • [#607] Split off a detail/type_traits.hpp from types.hpp
  • [#620] context::current::scoped_override_t now declared in current_context.hpp
  • [#611] Reduced code repetition between context_t and primary_context_t.
  • [#622] link::marshalled_options_t and link::option_t are now in the detail_ namespace - the user should typically never use them
  • [#624] Now collecting the log-related link options into a sub-structure of link::options_t
  • [#625] Dropped specify_default_load_caching_mode from link::options_t, in favor of using an stdx::optional
  • [#626] Now using optional's instead of bespoke constructs in pci_location_t
  • [#628] Corrected the signature of context::current::peer_to_peer functions
  • [#630] Moved program_base_t into the detail_ namespace
  • [#632] Move rtc::marshalled_options_t and rtc::marshal() into the detail_ subnamespace - users should not need to use this themselves
  • [#643] Moved memory::pool::ipc::ptr_handle_t out of ipc.hpp up into types.hpp (so that memory_pool.hpp doesn't depend on ipc.hpp)
  • [#621] Renamed: link::fallback_strategy_t -> link::fallback_strategy_for_binary_code_t
  • [#600] Now adhering to underscore suffix for proxy class field names
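
The pattern behind [#625] and [#626], replacing a "was this specified?" flag with an optional value, can be sketched as follows. This is an illustration only: the struct names, the `caching_mode` enumerators and the `convert` function are hypothetical, and C++17's `std::optional` stands in for the optional type the library uses. Only the field name `specify_default_load_caching_mode` comes from the notes above:

```cpp
#include <cassert>
#include <optional>

// Hypothetical enumeration, for illustration only.
enum class caching_mode { cache_all, cache_global_only };

// Before: a boolean flag says whether the neighboring value is meaningful.
struct options_with_flag {
    bool specify_default_load_caching_mode = false;
    caching_mode default_load_caching_mode = caching_mode::cache_all;
};

// After: a single optional field; an empty optional means "not specified".
struct options_with_optional {
    std::optional<caching_mode> default_load_caching_mode;
};

// Converts the flag-based form into the optional-based form.
inline options_with_optional convert(const options_with_flag& old_style)
{
    options_with_optional result;
    if (old_style.specify_default_load_caching_mode) {
        result.default_load_caching_mode = old_style.default_load_caching_mode;
    }
    // The optional is engaged exactly when the old flag was set.
    assert(result.default_load_caching_mode.has_value()
        == old_style.specify_default_load_caching_mode);
    return result;
}
```

With the optional form, the flag and the value cannot fall out of sync, since there is only one field to set.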

Other changes

  • [#599] An invalid file, name_caching_program.hpp, had snuck into our code - removed it
  • [#609] "Robustified" the buffers returned from cuda::rtc::program_output's various methods, so that they are all padded with an extra '\0' character past the end of the span's actual range. This is not strictly necessary, and that data should hopefully never actually be reached, but - let's be on the safe side.
  • [#627] Dropped context-specification from host-memory allocator functions - it's not actually used
  • [#629] Added a device ID field to the texture view object
  • [#631] Dropped examples/rtc_common.hpp, which is no longer in use
  • [#638] Dropped the native_word_t type
  • [#639] Simplified the memory access permissions code somewhat + some renaming (access_permissions -> permissions)
  • [#637] Devices and contexts no longer have flag-related members in their public interface; these are now just implementation details
  • [#645] Bug fix: Now using the correct free() function in memory::managed::detail_::deleter
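
The padding described in [#609] can be sketched in host-only code. This is not the library's implementation: `padded_buffer` and `make_padded_buffer` are hypothetical names. The buffer allocates one character more than its visible size and writes '\0' there, so code which mistakenly treats the data as a C string still stops at the end of the range:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical type, for illustration only: `size` is the visible range,
// while `storage` actually holds size + 1 characters.
struct padded_buffer {
    std::unique_ptr<char[]> storage;
    std::size_t size;
    const char* data() const { return storage.get(); }
};

// Copies `size` characters from `contents` and pads them with one '\0'
// which is NOT counted in the buffer's size.
inline padded_buffer make_padded_buffer(const char* contents, std::size_t size)
{
    assert(contents != nullptr || size == 0);
    padded_buffer result{ std::unique_ptr<char[]>(new char[size + 1]), size };
    if (size > 0) { std::memcpy(result.storage.get(), contents, size); }
    result.storage[size] = '\0';  // one past the end of the visible range
    return result;
}
```
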
Source: README.md, updated 2024-05-03