Hello everyone,
I've been experimenting lately with GPU-enabled nnet2 training. I tried to train a nnet with steps/nnet2/train_tanh_fast.sh. On the 253rd iteration training was terminated with this log:
LOG (nnet-train-simple:FinalizeActiveGpu():cu-device.cc:194) The active GPU is [0]: GeForce GTX 770 free:537M, used:1510M, total:2047M, free/total:0.262352 version 3.0
LOG (nnet-train-simple:PrintMemoryUsage():cu-device.cc:334) Memory used: 1048576 bytes.
ERROR (nnet-train-simple:Scale():cu-vector.cc:1037) cudaError_t 2 : "out of memory" returned from 'cudaGetLastError()'
ERROR (nnet-train-simple:Scale():cu-vector.cc:1037) cudaError_t 2 : "out of memory" returned from 'cudaGetLastError()'
[stack trace: ]
kaldi::KaldiGetStackTrace()
kaldi::KaldiErrorMessage::~KaldiErrorMessage()
kaldi::CuVectorBase<double>::Scale(double)
kaldi::nnet2::NonlinearComponent::Scale(float)
kaldi::nnet2::Nnet::ZeroStats()
nnet-train-simple(main+0x552) [0x65f08c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fcfaa3f2ec5]
nnet-train-simple() [0x65ea79]
Accounting: time=2 threads=1
Ended (code 65280) at Sat May 23 22:33:53 MSK 2015, elapsed time 2 seconds
Could this be due to GPU overheating?
Thanks, all.
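(Not part of the original thread: one way to check the overheating hypothesis, and to see how much memory the card is really holding, is to poll the card with nvidia-smi, the standard NVIDIA driver tool. A minimal sketch:)

```shell
# Query GPU temperature and memory usage in a parseable form.
# These query fields are standard nvidia-smi options; run it while
# training is active to watch the temperature climb.
nvidia-smi --query-gpu=temperature.gpu,memory.used,memory.total --format=csv
```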
Why did you have so much memory used? -- see the line
The active GPU is [0]: GeForce GTX 770 free:537M, used:1510M,
total:2047M, free/total:0.262352 version 3.0
Did you have some other CUDA-enabled program running? If yes, that might be
the problem.
y.
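(Not part of the original thread: a quick way to answer this question is to list the compute processes currently holding GPU memory, again via nvidia-smi. A sketch:)

```shell
# List every process with memory allocated on the GPU; if anything other
# than nnet-train-simple shows up here, that explains the 1510M "used".
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```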
On Mon, May 25, 2015 at 5:11 AM, JTDamaja jtdamaja@users.sf.net wrote:
Thanks for your reply.
No, I don't have any other CUDA-enabled program running, except the nnet training.
At rest, only about 250 MB of the GPU's memory is used.
If the error is not reproducible then it is probably because of
overheating. You can restart training from the 253rd iteration by
using the --stage option to the script.
Dan
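(Not part of the original thread: a sketch of the restart Dan describes. The data/lang/alignment/output directory names below are placeholders, not taken from the poster's setup; --stage makes the nnet2 training scripts skip straight to the given iteration rather than starting over.)

```shell
# Resume the interrupted run at iteration 253 instead of from scratch.
# Positional arguments are the usual nnet2 ones: <data> <lang> <ali-dir> <exp-dir>;
# the directory names here are hypothetical.
steps/nnet2/train_tanh_fast.sh --stage 253 \
  data/train data/lang exp/tri3_ali exp/nnet2_tanh
```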
On Tue, May 26, 2015 at 4:21 AM, JTDamaja jtdamaja@users.sf.net wrote:
Thanks for your reply.