#1 "CUDA MALLOC FAILED" running balsa

Milestone: 1.0

Status: open

Owner: nobody

Labels: None

Updated: 2015-03-23

Created: 2015-01-23

Creator: robertthebob2

Private: No

I am running Ubuntu 14.04 on a server with 64 GB of RAM. I tried both the CUDA 6.0 and CUDA 6.5 builds of balsa, and all of the CUDA sample executables run without error. I have been running soap3-dp on CUDA 5.5.
The balsa build is the one from http://www.l3-bioinfo.com/balsa/web/index.php?r=site%2Fregister

Preparation went without error. Then I ran:
time ~/Downloads/balsax/balsa/balsa_cuda_6.0/balsa pair reference/mm10.fa.index mm10.snpDB mm10.indelDB mm10-gene-region-list.index fastq/index39_CTATAC_L001-L002_R1_001.fastq fastq/index39_CTATAC_L001-L002_R2_001.fastq result -b 3
Launching BALSA version 1.0.

[BALSA] Analyzing read file for auto configuration.
[BALSA] Detected max. read length : 106.
[BALSA] Detected quality constant : +33.
...

I see this about 7 minutes into this alignment:
...
38 -> 38
39 -> 39
40 -> 40
[BALSA] Finish Score Recalibration Process
[BALSA] Score Recalibration processing time ( including load reads ): 300.7750 seconds

[BALSA] Loading read files fastq/index39_CTATAC_L001-L002_R1_001.fastq and fastq/index39_CTATAC_L001-L002_R2_001.fastq
[BALSA] Loaded 12582912 short reads from the query file.
[BALSA] Elapsed time on host : 8.8898 seconds

[BALSA] Finished copying index into device (GPU).
[BALSA] Loading time : 0.6853 seconds

CUDA MALLOC FAILED .. an illegal memory access was encountered(77)

real 7m15.004s
user 8m24.920s
sys 1m17.336s

My hardware:
bob@homequad:~/NVIDIA_CUDA-6.0_Samples/bin/x86_64/linux/release$ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "GeForce GTX 690"
CUDA Driver Version / Runtime Version 6.0 / 6.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)
( 8) Multiprocessors, (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock rate: 1020 MHz (1.02 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "GeForce GTX 690"
CUDA Driver Version / Runtime Version 6.0 / 6.0
CUDA Capability Major/Minor version number: 3.0
and so on....
Result = PASS

From top:
KiB Mem: 65924676 total, 24184836 used, 41739840 free, 317288 buffers
KiB Swap: 16773112 total, 0 used, 16773112 free. 20857260 cached Mem
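
Since each GTX 690 device above reports only 2048 MBytes of global memory, one quick sanity check is to compare the on-disk size of each input against that figure. The following is a minimal sketch, not anything shipped with BALSA. It assumes that file size is a rough proxy for what has to fit on the GPU; how BALSA actually lays out its data in device memory is not known here. The function name `report_sizes` is hypothetical.

```python
import os

# Per-device global memory in bytes, copied from the deviceQuery output above.
GTX690_GLOBAL_MEM_BYTES = 2147287040


def report_sizes(paths, device_bytes=GTX690_GLOBAL_MEM_BYTES):
    """Return (rows, total_bytes) for the files in `paths`.

    Each row is (path, size_in_bytes, size / device_bytes).
    Paths that are not regular files are skipped.
    ASSUMPTION: on-disk size only approximates the GPU footprint.
    """
    rows = []
    total = 0
    for path in paths:
        if not os.path.isfile(path):
            continue  # skip missing paths or index prefixes that are not files
        size = os.path.getsize(path)
        total += size
        rows.append((path, size, size / device_bytes))
    return rows, total
```

For example, `report_sizes(["mm10.snpDB", "mm10.indelDB"])` would list both database files with their sizes as a fraction of one device's memory. A fraction near or above 1.0 would suggest trying a smaller input.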

Discussion

The "CUDA MALLOC FAILED" problem was eliminated by reducing the size of the snpDB input file, but there is a new problem: the alignment stops prematurely with the following messages about base quality. Adding the -I option doesn't help. The same sequence data is mapped successfully by both BBMap and Bowtie2, and neither gives any indication that the base quality is anything less than very good.

38  -> 38
39  -> 39
40  -> 40
[BALSA] Finish Score Recalibration Process
[BALSA] Score Recalibration processing time ( including load reads ):  292.6144 seconds

[BALSA] Loading read files fastq/index39_CTATAC_L001-L002_R1_001.fastq and fastq/index39_CTATAC_L001-L002_R2_001.fastq
[BALSA] Loaded 12582912 short reads from the query file.
[BALSA] Elapsed time on host :   10.6545 seconds

[BALSA] Finished copying index into device (GPU).
[BALSA] Loading time :    1.2311 seconds

[BALSA] Error in Base Quality. Please check your input read file.
[BALSA] Error in Base Quality. Please check your input read file.
[BALSA] Error in Base Quality. Please check your input read file.
[BALSA] Error in Base Quality. Please check your input read file.
[BALSA] Error in Base Quality. Please check your input read file.
[BALSA] Error in Base Quality. Please check your input read file.
Command exited with non-zero status 1
488.45user 76.01system 5:09.76elapsed 182%CPU (0avgtext+0avgdata 29715560maxresident)k
1224inputs+360outputs (8major+6019741minor)pagefaults 0swaps
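
To check the read file independently of BALSA, the quality strings can be scanned directly. The sketch below is a hypothetical helper, not part of BALSA. It assumes a plain, uncompressed four-line-per-record FASTQ file, and it reports the lowest and highest quality character codes along with the number of malformed records. With the "+33" quality constant that BALSA detected, the codes would be expected to start at 33 or above. What exactly triggers BALSA's "Error in Base Quality" message is not known here.

```python
def quality_range(fastq_path, max_reads=100000):
    """Scan up to `max_reads` FASTQ records.

    Returns (min_code, max_code, bad_records), where the codes are the
    smallest and largest ord() values seen in the quality lines.
    If no quality characters are seen, returns (255, 0, bad_records).
    ASSUMPTION: each record is exactly four lines, with no line wrapping.
    """
    lo, hi, bad = 255, 0, 0
    with open(fastq_path, "r") as fh:
        for _ in range(max_reads):
            header = fh.readline()
            if not header:
                break  # end of file
            seq = fh.readline().rstrip("\r\n")
            fh.readline()  # '+' separator line, contents ignored
            qual = fh.readline().rstrip("\r\n")
            # A record is malformed if the header lacks '@' or the
            # sequence and quality lengths disagree.
            if not header.startswith("@") or len(seq) != len(qual):
                bad += 1
                continue
            if qual:
                codes = [ord(ch) for ch in qual]
                lo = min(lo, min(codes))
                hi = max(hi, max(codes))
    return lo, hi, bad
```

Calling `quality_range("fastq/index39_CTATAC_L001-L002_R1_001.fastq")` and then again with the R2 file would show whether either file contains quality characters below code 33 or records whose sequence and quality lengths differ.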

Chun - 2015-01-28

Please use the flag "-qc" instead of "-I".
