Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.md | 2024-05-03 | 5.5 kB | |
Version 0.6.9_ Documentation update, unique_span_s, bug fixes, many small improvements source code.tar.gz | 2024-05-03 | 336.6 kB | |
Version 0.6.9_ Documentation update, unique_span_s, bug fixes, many small improvements source code.zip | 2024-05-03 | 446.2 kB | |
Totals: 3 Items | 788.4 kB | 0 |
(This is planned to be the last release before 0.7.0, which will add support for CUDA graphs.)
Changes since v0.6.8:
Memory allocation & copying-related changes
- [#606] Can now copy directly to and from containers with contiguous storage - without going through pointers or specifying the size
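As a rough illustration of what "copying without pointers or sizes" can look like, here is a minimal host-only sketch. The name `copy_contiguous` is hypothetical and is not the library's API, and `std::memcpy` merely stands in for whatever host/device copy the library actually performs. The sketch assumes both containers expose `data()`, `size()` and `value_type`, as `std::vector` and `std::array` do.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper (not part of the library): copies all elements of
// `source` into the beginning of `destination`. The pointer and size are
// taken from the containers themselves, so the caller passes neither.
template <typename Destination, typename Source>
void copy_contiguous(Destination& destination, const Source& source)
{
    using value_type = typename Source::value_type;
    static_assert(std::is_same<value_type, typename Destination::value_type>::value,
        "source and destination element types must match");
    static_assert(std::is_trivially_copyable<value_type>::value,
        "a raw byte copy is only valid for trivially-copyable elements");
    assert(destination.size() >= source.size());
    if (source.size() == 0) { return; } // nothing to copy
    // Stand-in for a real host/device copy: a plain host-side memcpy
    std::memcpy(destination.data(), source.data(), source.size() * sizeof(value_type));
}
```

With this helper, `copy_contiguous(my_array, my_vector)` is all the caller writes; the element count and the addresses are deduced from the arguments.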
Owning typed and untyped memory: `unique_span` and `unique_region`
- [#291] Added a `unique_span<T>` template class, combining the functionality of `cuda::unique_ptr` and `cuda::span` (and being somewhat similar to `std::dynarray`, which almost made it into C++14). Many CUDA programs want to represent both the ownership of allocated memory, and the range of that memory for actual use, in the same variable - without the on-the-fly reallocation behavior of `std::vector`. This is now possible. Also implemented an untyped version of this, named `unique_region`.
- [#617] Replaced `memory::external::mapped_region_t` with `memory::unique_region`
- [#601] Added an `empty()` method to `cuda::span` (to match that of `std::span` - as it is now sometimes used)
- [#603] Use `unique_span` instead of our `cuda::dynarray` (which had been an `std::vector` under the hood), in various places in the API, especially RTC
- [#610] Return `unique_span`'s from the `cuda::rtc::program_output` class methods which allocated their own buffers: The methods for getting the compilation log, the cubin data, the PTX and the LTO IR.
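To make the "ownership plus range in one variable" idea concrete, here is a minimal host-only sketch. The class name `toy_unique_span` is hypothetical; it is not the library's `unique_span`, which manages CUDA allocations and has a richer interface. The sketch only assumes what the notes state: a move-only owner that also knows its extent and never reallocates.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical illustration: owns a fixed-size array and exposes it as a range.
template <typename T>
class toy_unique_span {
public:
    // Allocates `size` value-initialized elements; the size never changes afterwards
    explicit toy_unique_span(std::size_t size) : data_(new T[size]()), size_(size) {}

    // Move-only, like std::unique_ptr: ownership is transferred, never duplicated
    toy_unique_span(toy_unique_span&& other) noexcept
        : data_(std::move(other.data_)), size_(other.size_) { other.size_ = 0; }
    toy_unique_span& operator=(toy_unique_span&& other) noexcept
    {
        if (this != &other) {
            data_ = std::move(other.data_);
            size_ = other.size_;
            other.size_ = 0;
        }
        return *this;
    }
    toy_unique_span(const toy_unique_span&) = delete;
    toy_unique_span& operator=(const toy_unique_span&) = delete;

    // Span-like accessors
    T* data() const noexcept { return data_.get(); }
    std::size_t size() const noexcept { return size_; }
    bool empty() const noexcept { return size_ == 0; }
    T* begin() const noexcept { return data(); }
    T* end() const noexcept { return data() + size_; }
    T& operator[](std::size_t index) const noexcept { return data_[index]; }

private:
    std::unique_ptr<T[]> data_; // releases the memory on destruction
    std::size_t size_;
};
```

Unlike `std::vector`, this type has no `push_back()` or `resize()`, so pointers into it stay valid for the object's whole lifetime (until it is moved from or destroyed).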
More robust memory regions (`memory::region_t`)
- [#592] Changed the approach used in v0.6.8 to bring managed regions and general regions in line with each other; now, `memory::managed::region_t` inherits `memory::region_t`
- [#594] Now using `memory::region_t` for mapped memory rather than a different, mapped-memory specific region class
- [#602] Make `memory::region_t` more constexpr-friendly
- [#604] `memory::region_t`'s are now CUDA-independent, i.e. do not utilize any CUDA-specific definitions
- [#605] Can now construct `const_region_t`'s from rvalue references to regions
- [#640] User no longer needs to know about `range_attribute_t` or `advice_t` - those are left to `detail_` namespaces; also, fixed implementation of attribute setting for non-device-specific attributes
- [#647] Mapped memory: Can now implicitly convert `memory::mapped::span_pair_t<T>` into a pair of `region_t`'s
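The relationships described in this list can be sketched without any CUDA headers. All names below (`toy_region`, `toy_const_region`, `toy_managed_region`) are hypothetical stand-ins rather than the library's actual definitions; the sketch assumes only what the notes say: a region is an untyped address-plus-size pair, usable in constant expressions, with a managed variant that inherits from it and a const variant constructible from temporaries.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical untyped region: just a start address and a size in bytes.
// Nothing here depends on CUDA.
class toy_region {
public:
    constexpr toy_region(void* start, std::size_t size) noexcept
        : start_(start), size_(size) {}
    constexpr void* start() const noexcept { return start_; }
    constexpr std::size_t size() const noexcept { return size_; }
private:
    void* start_;
    std::size_t size_;
};

// Hypothetical read-only counterpart. Taking a const reference lets it be
// constructed from lvalues and from temporaries (rvalues) alike.
class toy_const_region {
public:
    constexpr toy_const_region(const toy_region& region) noexcept
        : start_(region.start()), size_(region.size()) {}
    constexpr const void* start() const noexcept { return start_; }
    constexpr std::size_t size() const noexcept { return size_; }
private:
    const void* start_;
    std::size_t size_;
};

// Hypothetical managed-memory region: inherits the general region, so any
// code accepting a toy_region also accepts a managed one.
class toy_managed_region : public toy_region {
public:
    using toy_region::toy_region; // inherit the constructors
};

// The general region can be used in constant expressions
static_assert(toy_region(nullptr, 16).size() == 16, "constexpr-friendly");
```

Inheritance keeps a single set of region operations for both kinds of memory, which is the design direction the first bullet describes.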
Documentation & comments
- [#595] Correct the documentation for `supports_memory_pools`
Launch configuration & launch config builder changes
- [#596] Corrected a check against the associated device in the kernel-setting method of the launch config builder
- [#619], [#618] Fixed launch configuration comparisons, which now use defaulted comparison
- [#619] Fixed a bug in checking whether some CUDA-12-introduced launch config parameters are set
CUDA libraries and in-library, non-associated kernel support
- [#598] Corrected the API and implementation of `get_attribute()` and `set_attribute()` for library kernels
Internal refactoring
- [#607] Split off a `detail/type_traits.hpp` from `types.hpp`
- [#620] `context::current::scoped_override_t` now declared in `current_context.hpp`
- [#611] Reduced code repetition between `context_t` and `primary_context_t`.
- [#622] `link::marshalled_options_t` and `link::option_t` are now in the `detail_` namespace - the user should typically never use them
- [#624] Now collecting the log-related link options into a sub-structure of `link::options_t`
- [#625] Dropped `specify_default_load_caching_mode` from `link::options_t`, in favor of using an `stdx::optional`
- [#626] Now using optional's instead of bespoke constructs in `pci_location_t`
- [#628] Corrected the signature of `context::current::peer_to_peer` functions
- [#630] Moved `program_base_t` into the `detail_` namespace
- [#632] Move `rtc::marshalled_options_t` and `rtc::marshal()` into the `detail_` subnamespace - users should not need to use this themselves
- [#643] Moved `memory::pool::ipc::ptr_handle_t` out of `ipc.hpp` up into `types.hpp` (so that `memory_pool.hpp` doesn't depend on `ipc.hpp`)
- [#621] Renamed: `link::fallback_strategy_t` -> `link::fallback_strategy_for_binary_code_t`
- [#600] Now adhering to underscore suffix for proxy class field names
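The move from a separate "is it specified?" flag to an optional, as in #625 and #626, can be sketched as follows. This is only an illustration under assumptions: the struct names, the enum and its values are hypothetical, and `std::optional` (C++17) is used in place of the library's `stdx::optional`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical enum, for illustration only
enum class toy_caching_mode { cache_all, cache_global_only };

// Before: a value plus a separate flag saying whether the value is meaningful.
// The two fields can get out of sync (e.g. value set but flag left false).
struct toy_options_before {
    bool specify_default_load_caching_mode = false;
    toy_caching_mode default_load_caching_mode = toy_caching_mode::cache_all;
};

// After: a single optional field; "not specified" is simply an empty optional
struct toy_options_after {
    std::optional<toy_caching_mode> default_load_caching_mode;
};

// Converts the old representation into the new one
inline toy_options_after convert(const toy_options_before& before)
{
    toy_options_after after;
    if (before.specify_default_load_caching_mode) {
        after.default_load_caching_mode = before.default_load_caching_mode;
    }
    return after;
}
```

The optional makes the invalid state (a value that is set but not "specified") unrepresentable, which is the usual motivation for this kind of refactoring.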
Other changes
- [#599] An invalid file, `name_caching_program.hpp`, had snuck into our code - removed it
- [#609] "Robustified" the buffers returned from `cuda::rtc::program_output`'s various methods, so that they are all padded with an extra '\0' character past the end of the span's actual range. This is not strictly necessary, and that data should hopefully never actually be reached, but let's be on the safe side.
- [#627] Dropped context-specification from host-memory allocator functions - it's not actually used
- [#629] Added a device ID field to the texture view object
- [#631] Dropped `examples/rtc_common.hpp`, which is no longer in use
- [#638] Dropped the `native_word_t` type
- [#639] Simplified the memory access permissions code somewhat + some renaming (`access_permissions` -> `permissions`)
- [#637] Devices and contexts no longer have `flag`-related members in their public interface; these are now just implementation details
- [#645] Bug fix: Now using the correct `free()` function in `memory::managed::detail_::deleter`
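The buffer padding described in #609 can be illustrated with a small host-only sketch. The names `toy_text_buffer` and `make_padded_buffer` are hypothetical and do not reflect how `cuda::rtc::program_output` is implemented; the sketch only shows the general technique of allocating one byte more than the reported size and writing a '\0' there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical buffer type: `size` is the logical length and excludes the padding
struct toy_text_buffer {
    std::unique_ptr<char[]> storage;
    std::size_t size;
    const char* data() const { return storage.get(); }
};

// Copies `contents` into a buffer with one extra byte past the logical end,
// and sets that byte to '\0'
inline toy_text_buffer make_padded_buffer(const std::string& contents)
{
    toy_text_buffer result{
        std::unique_ptr<char[]>(new char[contents.size() + 1]),
        contents.size()
    };
    std::memcpy(result.storage.get(), contents.data(), contents.size());
    result.storage[contents.size()] = '\0'; // the padding, outside the logical range
    return result;
}
```

The reported size stays unchanged, but code which mistakenly treats `data()` as a C string (for example by passing it to `std::strlen` or `printf("%s")`) stops at the padding instead of reading past the allocation.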