I am running Ubuntu 14.04 on a 64 GB server. I tried both the CUDA 6.0 and CUDA 6.5 builds of BALSA, and all the CUDA sample executables run without error. I have been running soap3-dp on CUDA 5.5.
BALSA is the version downloaded from http://www.l3-bioinfo.com/balsa/web/index.php?r=site%2Fregister
Preparation went without error. Then I ran:
time ~/Downloads/balsax/balsa/balsa_cuda_6.0/balsa pair reference/mm10.fa.index mm10.snpDB mm10.indelDB mm10-gene-region-list.index fastq/index39_CTATAC_L001-L002_R1_001.fastq fastq/index39_CTATAC_L001-L002_R2_001.fastq result -b 3
Launching BALSA version 1.0.
[BALSA] Analyzing read file for auto configuration.
[BALSA] Detected max. read length : 106.
[BALSA] Detected quality constant : +33.
...
I see this about 7 minutes into this alignment:
...
38 -> 38
39 -> 39
40 -> 40
[BALSA] Finish Score Recalibration Process
[BALSA] Score Recalibration processing time ( including load reads ): 300.7750 seconds
[BALSA] Loading read files fastq/index39_CTATAC_L001-L002_R1_001.fastq and fastq/index39_CTATAC_L001-L002_R2_001.fastq
[BALSA] Loaded 12582912 short reads from the query file.
[BALSA] Elapsed time on host : 8.8898 seconds
[BALSA] Finished copying index into device (GPU).
[BALSA] Loading time : 0.6853 seconds
CUDA MALLOC FAILED .. an illegal memory access was encountered(77)
real 7m15.004s
user 8m24.920s
sys 1m17.336s
My hardware:
bob@homequad:~/NVIDIA_CUDA-6.0_Samples/bin/x86_64/linux/release$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 2 CUDA Capable device(s)
Device 0: "GeForce GTX 690"
CUDA Driver Version / Runtime Version 6.0 / 6.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)
( 8) Multiprocessors, (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock rate: 1020 MHz (1.02 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "GeForce GTX 690"
CUDA Driver Version / Runtime Version 6.0 / 6.0
CUDA Capability Major/Minor version number: 3.0
and so on....
Result = PASS
From top:
KiB Mem: 65924676 total, 24184836 used, 41739840 free, 317288 buffers
KiB Swap: 16773112 total, 0 used, 16773112 free. 20857260 cached Mem
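Each GPU on the GTX 690 reports only 2048 MB of global memory, so it may help to check how much of that is already in use before starting an alignment. The following is a minimal sketch, assuming Python 3 and that nvidia-smi is on the PATH; the function names parse_gpu_memory and query_gpu_memory are hypothetical and are not part of BALSA.

```python
import subprocess

# nvidia-smi query; with "nounits" the memory columns are plain MiB values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn nvidia-smi CSV rows into dicts with total/used/free MiB."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split the index off the front and the two numbers off the back,
        # so a device name containing a comma would not break parsing.
        index, rest = line.split(",", 1)
        name, total, used = rest.rsplit(",", 2)
        total_mib, used_mib = int(total), int(used)
        gpus.append({
            "index": int(index),
            "name": name.strip(),
            "total_mib": total_mib,
            "used_mib": used_mib,
            "free_mib": total_mib - used_mib,
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return one dict per GPU (hypothetical helper)."""
    out = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_memory(out)
```

Calling query_gpu_memory() just before launching the aligner and printing free_mib for each device shows whether another process is already holding GPU memory. It does not say how much memory BALSA itself will request.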
The "CUDA MALLOC FAILED" problem went away after I reduced the size of the snpDB input file, but there is a new problem: the alignment now stops prematurely with messages about base quality. Adding the -I option doesn't help. The same sequence data is mapped successfully by both BBMap and Bowtie2, and neither gives any indication that base quality is less than very good.
Please add the "-qc" flag instead of "-I".
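Since the remaining failure concerns base quality, one independent check is to look at the raw quality characters in the FASTQ files and compare them with the "+33" constant that BALSA auto-detected. This is a minimal sketch, assuming Python 3 and plain four-line FASTQ records (no wrapped sequence lines); quality_range is a hypothetical helper, not a BALSA tool.

```python
import itertools


def quality_range(fastq_path, max_reads=100000):
    """Return (lowest, highest) ASCII code seen in the quality lines.

    Assumes four lines per record, so the quality string is every
    fourth line. Returns None if no quality characters were found.
    """
    lo, hi = 255, 0
    with open(fastq_path, "rb") as handle:
        # Line indexes 3, 7, 11, ... are quality lines; stop after max_reads.
        for qual in itertools.islice(handle, 3, 4 * max_reads, 4):
            qual = qual.rstrip(b"\r\n")
            if qual:
                # Iterating over bytes yields integer ASCII codes.
                lo = min(lo, min(qual))
                hi = max(hi, max(qual))
    if hi < lo:
        return None
    return lo, hi
```

Any code below 59 can only occur with the Phred+33 encoding, and subtracting 33 from the two values gives the Phred score range of the sampled reads. Running it on both the R1 and R2 files would show whether the two mates use the same encoding.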