Device Functions
In-place Functions
subroutine devf_zeros(dv_A)
Initializes the device variable to zero
Input:
type(devVar) dv_A
Output:
None
Description:
Sets device variable created by allocate_dv to 0.
Error handling:
None
Examples of use:
dv_A = allocate_dv('real', 4)
call devf_zeros(dv_A)
subroutine devf_conjugate(dv_A)
In-place complex conjugate
Input:
type(devVar) dv_A
Output:
None
Description:
Sets the device variable created by allocate_dv to its complex conjugate.
Error handling:
In the case where the device variable is not of type cmplx4, an error message is displayed and execution stopped.
Examples of use:
complex array(4)
...
dv_A = allocate_dv('complex', 4)
call transfer_c4(array,dv_A,.true.)
call devf_conjugate(dv_A)
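For reference, the effect of the in-place conjugate can be reproduced on the host with the Fortran intrinsic conjg. This is only a host-side sketch of the arithmetic, not the library's device code; the program name and sample values are made up for illustration.

```fortran
program conjugate_sketch
  implicit none
  complex :: array(4)
  integer :: i

  ! Fill the host array with sample values (1,1), (2,2), (3,3), (4,4).
  array = [(cmplx(real(i), real(i)), i = 1, 4)]

  ! Host-side equivalent of the in-place conjugate: the sign of every
  ! imaginary part is flipped and the original array is overwritten.
  array = conjg(array)

  ! array now holds (1,-1), (2,-2), (3,-3), (4,-4).
  print *, array
end program conjugate_sketch
```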
subroutine devf_partofcmplx(dv_A)
Input:
type(devVar) dv_A
Output:
None
Description:
Replaces each complex value of the device variable with its real or imaginary part. The variable stays in the same place and the result is still complex.
Error handling:
In the case where the device variable is not of type cmplx4, an error message is displayed and execution stopped.
Examples of use:
complex array(4)
...
dv_A = allocate_dv('complex', 4)
call transfer_c4(array,dv_A,.true.)
call devf_partofcmplx(dv_A)
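The description above can be illustrated on the host with standard intrinsics. The sketch below assumes that the extracted part is stored in the real component and the imaginary component is set to zero; the source does not state this, nor how the real or imaginary part is selected, so treat both as assumptions.

```fortran
program partofcmplx_sketch
  implicit none
  complex :: array(4), real_part(4), imag_part(4)

  array = [(1.0, 2.0), (3.0, -4.0), (0.0, 5.0), (6.0, 0.0)]

  ! Keep only the real part; the result is still a complex array.
  real_part = cmplx(real(array), 0.0)

  ! Keep only the imaginary part, stored (by assumption) in the
  ! real component of a complex result.
  imag_part = cmplx(aimag(array), 0.0)

  print *, 'real part:', real_part
  print *, 'imag part:', imag_part
end program partofcmplx_sketch
```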
subroutine devf_iscal(dv_A,a,b)
Input:
type(devVar) dv_A, integer a, integer b
Output:
None
Description:
Performs an in-place linear transform (integer): deviceVariable=a*deviceVariable+b
Error handling:
In the case where the device variable is not of type int4, an error message is displayed and execution stopped.
Examples of use:
integer array(4)
...
dv_A = allocate_dv('integer', 4)
call transfer_i4(array,dv_A,.true.)
call devf_iscal(dv_A,2,3)
subroutine devf_sscal(dv_A,a,b)
Input:
type(devVar) dv_A, real a, real b
Output:
None
Description:
Performs an in-place linear transform (real(4)): deviceVariable=a*deviceVariable+b
Error handling:
In the case where the device variable is not of type real, an error message is displayed and execution stopped.
Examples of use:
real array(4)
...
dv_A = allocate_dv('real', 4)
call transfer_r4(array,dv_A,.true.)
call devf_sscal(dv_A,2.0,3.0)
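The linear transform used by the scal routines is the element-wise operation x = a*x + b. A minimal host-side sketch with the same constants as the example above (the program name and input values are illustrative) looks like this:

```fortran
program scal_sketch
  implicit none
  real :: array(4)
  real, parameter :: a = 2.0, b = 3.0

  array = [1.0, 2.0, 3.0, 4.0]

  ! In-place linear transform: every element is scaled by a and
  ! shifted by b, overwriting the original values.
  array = a*array + b

  ! array now holds 5.0, 7.0, 9.0, 11.0.
  print *, array
end program scal_sketch
```

The integer and complex variants follow the same pattern with integer or complex values for a and b.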
subroutine devf_cscal(dv_A,a,b)
Input:
type(devVar) dv_A, complex a, complex b
Output:
None
Description:
Performs an in-place linear transform (complex(4)): deviceVariable=a*deviceVariable+b
Error handling:
In the case where the device variable is not of type complex, an error message is displayed and execution stopped.
Examples of use:
complex array(4)
...
dv_A = allocate_dv('complex', 4)
call transfer_c4(array,dv_A,.true.)
call devf_cscal(dv_A,(2.0,0.0),(3.0,0.0))
subroutine devf_cscalconj(dv_A,a,b)
Input:
type(devVar) dv_A, complex a, complex b
Output:
None
Description:
Performs an in-place linear conjugate transform (complex(4)): deviceVariable=a*conj(deviceVariable)+b
Error handling:
In the case where the device variable is not of type complex, an error message is displayed and execution stopped.
Examples of use:
complex array(4)
...
dv_A = allocate_dv('complex', 4)
call transfer_c4(array,dv_A,.true.)
call devf_cscalconj(dv_A,(2.0,0.0),(3.0,0.0))
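The conjugate transform combines the two previous operations: x = a*conj(x) + b. The host-side sketch below shows the arithmetic with complex constants for a and b; the input values are chosen only for illustration.

```fortran
program cscalconj_sketch
  implicit none
  complex :: array(4)
  complex, parameter :: a = (2.0, 0.0), b = (3.0, 0.0)

  array = [(1.0, 1.0), (0.0, 2.0), (-1.0, 0.5), (4.0, -3.0)]

  ! Conjugate each element first, then scale by a and shift by b.
  array = a*conjg(array) + b

  ! array now holds (5,-2), (3,-4), (1,-1), (11,6).
  print *, array
end program cscalconj_sketch
```

Because a and b are declared complex, real values are written as complex constants such as (2.0,0.0).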
subroutine devf_hadamardf(dv_C, dv_A, dv_B, option)
Input:
type(devVar) dv_C, dv_A, dv_B; parameter option is optional (in most cases can be omitted)
Output:
None
Description:
Performs the point-wise product dv_C = dv_A .* dv_B
Error handling:
If dv_A and dv_B are of different types and dv_C cannot be converted to the corresponding type, an error message is displayed and execution stopped.
Examples of use:
complex arrayA(4)
complex arrayB(4)
...
dv_C = allocate_dv('complex', 4)
dv_A = allocate_dv('complex', 4)
dv_B = allocate_dv('complex', 4)
call transfer_c4(arrayA,dv_A,.true.)
call transfer_c4(arrayB,dv_B,.true.)
call devf_hadamardf(dv_C, dv_A, dv_B)
subroutine devf_dividef(dv_C, dv_A, dv_B, option)
Input:
type(devVar) dv_C, dv_A, dv_B; parameter option is optional (in most cases can be omitted)
Output:
None
Description:
Performs the point-wise division dv_C = dv_A ./ dv_B
Error handling:
If dv_A and dv_B are of different types and dv_C cannot be converted to the corresponding type, an error message is displayed and execution stopped.
Examples of use:
complex arrayA(4)
complex arrayB(4)
...
dv_C = allocate_dv('complex', 4)
dv_A = allocate_dv('complex', 4)
dv_B = allocate_dv('complex', 4)
call transfer_c4(arrayA,dv_A,.true.)
call transfer_c4(arrayB,dv_B,.true.)
call devf_dividef(dv_C, dv_A, dv_B)
subroutine devf_additionf(dv_C, dv_A, dv_B, option)
Input:
type(devVar) dv_C, dv_A, dv_B; parameter option is optional (in most cases can be omitted)
Output:
None
Description:
Performs the point-wise addition dv_C = dv_A + dv_B
Error handling:
If dv_A and dv_B are of different types and dv_C cannot be converted to the corresponding type, an error message is displayed and execution stopped.
Examples of use:
complex arrayA(4)
complex arrayB(4)
...
dv_C = allocate_dv('complex', 4)
dv_A = allocate_dv('complex', 4)
dv_B = allocate_dv('complex', 4)
call transfer_c4(arrayA,dv_A,.true.)
call transfer_c4(arrayB,dv_B,.true.)
call devf_additionf(dv_C, dv_A, dv_B)
subroutine devf_subtractionf(dv_C, dv_A, dv_B, option)
Input:
type(devVar) dv_C, dv_A, dv_B; parameter option is optional (in most cases can be omitted)
Output:
None
Description:
Performs the point-wise subtraction dv_C = dv_A - dv_B
Error handling:
If dv_A and dv_B are of different types and dv_C cannot be converted to the corresponding type, an error message is displayed and execution stopped.
Examples of use:
complex arrayA(4)
complex arrayB(4)
...
dv_C = allocate_dv('complex', 4)
dv_A = allocate_dv('complex', 4)
dv_B = allocate_dv('complex', 4)
call transfer_c4(arrayA,dv_A,.true.)
call transfer_c4(arrayB,dv_B,.true.)
call devf_subtractionf(dv_C, dv_A, dv_B)
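The four point-wise routines above correspond to Fortran's element-wise array operators. The following host-side sketch shows the results one would expect for same-typed complex inputs; it does not model the optional option argument or the type conversion of dv_C, and the array names and values are illustrative.

```fortran
program pointwise_sketch
  implicit none
  complex :: a(4), b(4), c(4)
  integer :: i

  ! Sample inputs; the real part of b is 2, so no element is zero.
  a = [(cmplx(real(i), 1.0), i = 1, 4)]
  b = [(cmplx(2.0, real(i)), i = 1, 4)]

  c = a * b          ! point-wise (Hadamard) product
  print *, 'product:    ', c

  c = a / b          ! point-wise division
  print *, 'quotient:   ', c

  c = a + b          ! point-wise addition
  print *, 'sum:        ', c

  c = a - b          ! point-wise subtraction
  print *, 'difference: ', c
end program pointwise_sketch
```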
Setting Functions
path_to_devObjects
Input:
None
Output:
None
Description:
Sets the path to the public command-line variable.
Error handling:
Checks the status and reports a failure to get the command line.
Examples of use:
call path_to_devObjects
open_devObjects
Input:
None
Output:
None
Description:
Instantiates and loads devObject, paths, CUBLAS, and external kernel functions into device memory.
Error handling:
None
Examples of use:
call open_devObjects
close_devObjects
Input:
None
Output:
None
Description:
Deallocates devObject parameters from device memory, shuts down CUBLAS, and unloads external functions.
Error handling:
None
Examples of use:
call close_devObjects
Helper Functions
devObject_get_number_of_blocks(length,block_size)
Input:
integer length, block_size
Output:
integer
Description:
Returns the padded number of blocks for a CUDA kernel call, given a length and the block size along that dimension.
Error handling:
None
Examples of use:
integer block_size_x,block_size_y,block_size_z
integer nx,ny,nz
...
block_size_x=devObject_block_size_x_3d
block_size_y=devObject_block_size_y_3d
block_size_z=devObject_block_size_z_3d
nblocks_x=devObject_get_number_of_blocks(nx,block_size_x)
nblocks_y=devObject_get_number_of_blocks(ny,block_size_y)
nblocks_z=devObject_get_number_of_blocks(nz,block_size_z)
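A common way to compute a padded block count is ceiling division, so that a partially filled last block is still launched. The sketch below assumes that this is what "padded" means here; the function name ceil_div_blocks is hypothetical and is not part of the library.

```fortran
program nblocks_sketch
  implicit none

  print *, ceil_div_blocks(1000, 256)   ! 4 blocks, last one partly filled
  print *, ceil_div_blocks(1024, 256)   ! 4 blocks, exact fit
  print *, ceil_div_blocks(1025, 256)   ! 5 blocks, one extra element

contains

  ! Hypothetical helper: smallest number of blocks of size block_size
  ! that covers length elements (integer ceiling division).
  pure integer function ceil_div_blocks(length, block_size) result(nblocks)
    integer, intent(in) :: length, block_size
    nblocks = (length + block_size - 1) / block_size
  end function ceil_div_blocks

end program nblocks_sketch
```

With this rounding, a kernel launched over nblocks*block_size threads needs a bounds check against length inside the kernel.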
devObject_get_type(chartype)
Input:
character chartype
Output:
integer(4)
Description:
Returns int4, real4, or cmplx4 depending on whether 'i', 'r', or 'c' is passed as an argument.
Error handling:
None
Examples of use:
integer(4) vartype
...
vartype=devObject_get_type('i')
devObject_get_relsize(vartype)
Input:
integer(4) vartype
Output:
integer(4)
Description:
Returns the size of type vartype (int4, real4, cmplx4) relative to type real
Error handling:
None
Examples of use:
integer(4) vartype, varsize
...
vartype=cmplx4
varsize=devObject_get_relsize(vartype)
devObject_get_size(vartype)
Input:
integer(4) vartype
Output:
integer(4)
Description:
Returns the size of a variable type (int4, real4, cmplx4) in terms of bytes
Error handling:
None
Examples of use:
integer(4) varsize, vartype
...
vartype=int4
varsize=devObject_get_size(vartype)
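The byte sizes and relative sizes of the three supported types can be checked on the host with the storage_size intrinsic, which returns a size in bits. This sketch assumes a compiler where kind 4 denotes 4-byte integers and reals (true for common compilers, but not guaranteed by the standard); it does not call the library routines.

```fortran
program type_size_sketch
  implicit none
  integer(4) :: i4_sample
  real(4)    :: r4_sample
  complex(4) :: c4_sample
  integer    :: real_bytes

  ! storage_size returns bits; divide by 8 to get bytes.
  real_bytes = storage_size(r4_sample) / 8

  print *, 'integer(4): bytes =', storage_size(i4_sample) / 8, &
           ' relative to real =', (storage_size(i4_sample) / 8) / real_bytes
  print *, 'real(4):    bytes =', real_bytes, &
           ' relative to real =', 1
  print *, 'complex(4): bytes =', storage_size(c4_sample) / 8, &
           ' relative to real =', (storage_size(c4_sample) / 8) / real_bytes
end program type_size_sketch
```

Under that assumption the sizes are 4, 4, and 8 bytes, and the sizes relative to real are 1, 1, and 2.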
devObject_check1d(deviceVariable,i1,i2)
Input:
type(devVar) deviceVariable, integer i1, integer i2
Output:
None
Description:
Error checking for a 1D device variable
*Non-allocated status
*Index range i2-i1 is greater than or equal to the size of deviceVariable
*Treating 1D device variable as 2D, 3D
Error handling:
None
Examples of use:
type(devVar) dv_A
...
dv_A = allocate_dv('complex', 4)
call devObject_check1d(dv_A,0,3)
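The three conditions listed in the description can be sketched on the host as a single logical test. The derived type host_var and the function check1d_ok are hypothetical stand-ins; the real devVar layout and the way the library reports failures are not described in this section.

```fortran
program check1d_sketch
  implicit none

  ! Hypothetical host-side stand-in for a device variable.
  type :: host_var
    logical :: is_allocated = .false.
    integer :: ndims = 0
    integer :: length = 0
  end type host_var

  type(host_var) :: v

  v = host_var(.true., 1, 4)
  print *, check1d_ok(v, 0, 3)   ! T: i2-i1 = 3 is smaller than the size 4
  print *, check1d_ok(v, 0, 4)   ! F: i2-i1 = 4 is not smaller than the size

contains

  ! Returns .true. only if the variable is allocated, is one-dimensional,
  ! and the index range i2-i1 is smaller than its length.
  logical function check1d_ok(var, i1, i2) result(ok)
    type(host_var), intent(in) :: var
    integer, intent(in) :: i1, i2
    ok = var%is_allocated .and. var%ndims == 1 .and. (i2 - i1) < var%length
  end function check1d_ok

end program check1d_sketch
```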