Hello everyone,
I've been experimenting lately with GPU-enabled nnet2 training. I tried to train a nnet with steps/nnet2/train_tanh_fast.sh. On the 253rd iteration training was terminated with this log:
LOG (nnet-train-simple:FinalizeActiveGpu():cu-device.cc:194) The active GPU is [0]: GeForce GTX 770 free:537M, used:1510M, total:2047M, free/total:0.262352 version 3.0
LOG (nnet-train-simple:PrintMemoryUsage():cu-device.cc:334) Memory used: 1048576 bytes.
ERROR (nnet-train-simple:Scale():cu-vector.cc:1037) cudaError_t 2 : "out of memory" returned from 'cudaGetLastError()'
ERROR (nnet-train-simple:Scale():cu-vector.cc:1037) cudaError_t 2 : "out of memory" returned from 'cudaGetLastError()'
[stack trace: ]
kaldi::KaldiGetStackTrace()
kaldi::KaldiErrorMessage::~KaldiErrorMessage()
kaldi::CuVectorBase<double>::Scale(double)
kaldi::nnet2::NonlinearComponent::Scale(float)
kaldi::nnet2::Nnet::ZeroStats()
nnet-train-simple(main+0x552) [0x65f08c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fcfaa3f2ec5]
nnet-train-simple() [0x65ea79]
Accounting: time=2 threads=1
Ended (code 65280) at Sat May 23 22:33:53 MSK 2015, elapsed time 2 seconds
Could this be due to GPU overheating?
Thanks, all.
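(Not part of the original thread: one way to check the overheating hypothesis, and to see how much memory the card is really holding, is to poll the card with nvidia-smi, the standard NVIDIA driver tool. A minimal sketch:)

```shell
# Query GPU temperature and memory usage in a parseable form.
# These query fields are standard nvidia-smi options; run it while
# training is active to watch the temperature climb.
nvidia-smi --query-gpu=temperature.gpu,memory.used,memory.total --format=csv
```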
Why did you have so much memory used? -- see the line
The active GPU is [0]: GeForce GTX 770 free:537M, used:1510M,
total:2047M, free/total:0.262352 version 3.0
Did you have some other CUDA-enabled program running? If yes, that might be
the problem.
y.
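(Not part of the original thread: a quick way to answer this question is to list the compute processes currently holding GPU memory, again via nvidia-smi. A sketch:)

```shell
# List every process with memory allocated on the GPU; if anything other
# than nnet-train-simple shows up here, that explains the 1510M "used".
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```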
On Mon, May 25, 2015 at 5:11 AM, JTDamaja jtdamaja@users.sf.net wrote:
Thanks for your reply.
No, I don't have any other CUDA-enabled program running, except the nnet training.
At rest, only about 250 MB of the GPU's memory is used.
If the error is not reproducible then it is probably because of
overheating. You can restart training from the 253rd iteration by
using the --stage option to the script.
Dan
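(Not part of the original thread: a sketch of the restart Dan describes. The data/lang/alignment/output directory names below are placeholders, not taken from the poster's setup; --stage makes the nnet2 training scripts skip straight to the given iteration rather than starting over.)

```shell
# Resume the interrupted run at iteration 253 instead of from scratch.
# Positional arguments are the usual nnet2 ones: <data> <lang> <ali-dir> <exp-dir>;
# the directory names here are hypothetical.
steps/nnet2/train_tanh_fast.sh --stage 253 \
  data/train data/lang exp/tri3_ali exp/nnet2_tanh
```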
On Tue, May 26, 2015 at 4:21 AM, JTDamaja jtdamaja@users.sf.net wrote:
Thanks for your reply.