"out of memory" returned from 'cudaGetLastError()'

Help
JTDamaja
2015-05-25
2015-05-29
  • JTDamaja

    JTDamaja - 2015-05-25

    Hello everyone,

    I've been experimenting lately with GPU-enabled nnet2 training. I tried to train a network with steps/nnet2/train_tanh_fast.sh. On the 253rd iteration, training terminated with this log:

    LOG (nnet-train-simple:FinalizeActiveGpu():cu-device.cc:194) The active GPU is [0]: GeForce GTX 770 free:537M, used:1510M, total:2047M, free/total:0.262352 version 3.0
    LOG (nnet-train-simple:PrintMemoryUsage():cu-device.cc:334) Memory used: 1048576 bytes.
    ERROR (nnet-train-simple:Scale():cu-vector.cc:1037) cudaError_t 2 : "out of memory" returned from 'cudaGetLastError()'
    ERROR (nnet-train-simple:Scale():cu-vector.cc:1037) cudaError_t 2 : "out of memory" returned from 'cudaGetLastError()'

    [stack trace: ]
    kaldi::KaldiGetStackTrace()
    kaldi::KaldiErrorMessage::~KaldiErrorMessage()
    kaldi::CuVectorBase<double>::Scale(double)
    kaldi::nnet2::NonlinearComponent::Scale(float)
    kaldi::nnet2::Nnet::ZeroStats()
    nnet-train-simple(main+0x552) [0x65f08c]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fcfaa3f2ec5]
    nnet-train-simple() [0x65ea79]

    Accounting: time=2 threads=1

    Ended (code 65280) at Sat May 23 22:33:53 MSK 2015, elapsed time 2 seconds

    Could this be due to GPU overheating?

    Thanks to all.

     
    • Jan "yenda" Trmal

      Why is so much memory already in use? See the line
      The active GPU is [0]: GeForce GTX 770 free:537M, used:1510M,
      total:2047M, free/total:0.262352 version 3.0

      Did you have some other CUDA-enabled program running? If so, that might be
      the problem.
      y.
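
One way to follow up on this is to pull the free/used/total figures out of the training logs and compare them across iterations. The following is only a minimal sketch, assuming the log line has exactly the format quoted above; the function name `gpu_mem_from_log` is made up for illustration and is not part of Kaldi.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): read log text on stdin and print
# "free used total" in MB for the first "The active GPU is" line found.
gpu_mem_from_log() {
  sed -n 's/.*The active GPU is.*free:\([0-9]*\)M, used:\([0-9]*\)M, total:\([0-9]*\)M.*/\1 \2 \3/p' | head -n 1
}

# The log line from the original post, used here as sample input.
line='LOG (nnet-train-simple:FinalizeActiveGpu():cu-device.cc:194) The active GPU is [0]: GeForce GTX 770 free:537M, used:1510M, total:2047M, free/total:0.262352 version 3.0'

read -r free used total <<< "$(printf '%s\n' "$line" | gpu_mem_from_log)"
echo "free=${free}M used=${used}M total=${total}M"
```

Feeding it the log of an iteration that succeeded and the log of the one that failed would show whether the "used" figure was already high before training started.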


       
      • JTDamaja

        JTDamaja - 2015-05-26

        Thanks for your reply.
        No, I don't have any other CUDA-enabled program running, only the nnet training.
        At rest, only 250 MB of GPU memory is in use.

         
        • Daniel Povey

          Daniel Povey - 2015-05-26

          If the error is not reproducible, then it is probably caused by
          overheating. You can restart training from the 253rd iteration by
          using the --stage option to the script.
          Dan
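
As a rough illustration of the restart, the sketch below only prints the command that would be run (a dry run). The helper name `resume_cmd` and the `ORIGINAL_ARGS` placeholder are assumptions for illustration; the real options and directories of the original run are not shown in this thread and would have to be supplied unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the command that would resume training
# at the given stage. It does not run anything.
resume_cmd() {
  local stage=$1
  shift
  # --stage goes first, followed by whatever the original run was given.
  printf 'steps/nnet2/train_tanh_fast.sh --stage %s' "$stage"
  if [ "$#" -gt 0 ]; then
    printf ' %q' "$@"
  fi
  printf '\n'
}

# ORIGINAL_ARGS is a placeholder for the original options and directories.
resume_cmd 253 ORIGINAL_ARGS
```

Printing the command first makes it easy to check that nothing other than the stage has changed relative to the original invocation.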


           
          • JTDamaja

            JTDamaja - 2015-05-29

            Thanks for your reply.

             