From: KERMORVANT, C. <Chr...@a2...> - 2013-09-15 10:41:04
Hi,

I have configured an automatic build of Kaldi which runs the egs/rm/s5 recipe every day. It runs on an Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 12*2 CPUs, 64 GB of memory. The current time for running this recipe on this machine is 55 minutes.

-- Chris

________________________________________
From: fe...@in... [fe...@in...]
Sent: Thursday, 5 September 2013 16:25
To: kal...@li...
Subject: [Kaldi-developers] Questions About Hardware

Dear Sirs,

We are with the Speech Processing and Transmission Laboratory at the University of Chile. We want to install Kaldi for speech recognition tasks that use the Wall Street Journal database (WSJ0).
Link: http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC93S6A

Currently, we don't have access to any cluster or GPU for computation. So, our questions are:

1. Do you think it is feasible to use just a regular computer (e.g. Intel i7, Xeon) to run Kaldi without a cluster or GPU?
2. How long do you think an experiment would take for each configuration (just one PC, GPU, and cluster)? I mean a rough idea (hours, days, more than 3 days, etc.)?
3. Can you suggest an economical hardware alternative for running Kaldi on this task?

Best Regards,
-Felipe Espic
From: Daniel P. <dp...@gm...> - 2013-09-13 15:35:06
The basic configuration does not require GPUs at all. You need Linux.
(In a meeting -> short reply.)

Dan

On Thu, Sep 5, 2013 at 10:25 AM, <fe...@in...> wrote:
> Dear Sirs,
>
> We are with the Speech Processing and Transmission Laboratory at the University of Chile. We want to install Kaldi for speech recognition tasks that use the Wall Street Journal database (WSJ0).
> Link: http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC93S6A
>
> Currently, we don't have access to any cluster or GPU for computation. So, our questions are:
>
> 1. Do you think it is feasible to use just a regular computer (e.g. Intel i7, Xeon) to run Kaldi without a cluster or GPU?
> 2. How long do you think an experiment would take for each configuration (just one PC, GPU, and cluster)? I mean a rough idea (hours, days, more than 3 days, etc.)?
> 3. Can you suggest an economical hardware alternative for running Kaldi on this task?
>
> Best Regards,
> -Felipe Espic
From: Daniel P. <dp...@gm...> - 2013-09-12 01:05:44
I just fixed this issue (it was a change to CuRand that didn't compile without CUDA; I added an #ifdef). Chris, thanks for setting up the automatic build.

Dan

On Wed, Sep 11, 2013 at 5:24 PM, <jen...@a2...> wrote:
> Kaldi - Build # 136 - Failure:
>
> Check console output at http://jenkins.a2ialab.com/jenkins/job/Kaldi/136/ to view the results.
From: <jen...@a2...> - 2013-09-12 01:01:20
Kaldi - Build # 136 - Failure: Check console output at http://jenkins.a2ialab.com/jenkins/job/Kaldi/136/ to view the results.
From: Tony R. <to...@ca...> - 2013-09-11 08:09:35
Hi Karel, Indeed, you are spending nearly all your time in cublasSgemm() which is as fast as you are ever going to go - congratulations. I see you've thought about whether to store matrices or transposes in AddMatMat(). For CPU code I think it's worth storing both, that way you get efficient memory access on both the forward and the backward pass. For tiled matrix matrix CUDA code there probably is nothing to be gained, but I do seem to recall that cublasSgemm() uses rectangular tiles of 64x32 so perhaps there is a little. I also like the fact that computing gradients and updating weights take virtually none of your time. All very interesting, many thanks, Tony On 09/10/2013 06:27 PM, Karel Vesely wrote: > Hi Tony, Arnab, Dan and others, > the idea is interesting, but first let's have a look at the profiling > numbers > coming from 12th iteration of switchboard DNN recipe on GTX680, > topology with 6hid sigmoid layers, 2048 neurons each, roughly 9000 outputs: > > AddColSumMat 239.611s > AddMat 1341.95s > AddMatMat 20580s > AddRowSumMat 927.644s > AddVec 937.368s > AddVecToRows 164.433s > CuMatrix::CopyFromMatD2D 194.821s > CuMatrix::CopyFromMatH2D 11.0674s > CuMatrix::CopyToMatD2H 0.113835s > CuMatrix::SetZero 51.3638s > CuStlVector::CopyFromVecH2D 3.81734s > CuStlVector::CopyToVecD2H 7.35736s > CuStlVector::SetZero 4.45224s > CuVector::CopyFromVecH2D 0.000355005s > CuVector::CopyToVecD2H 5.94601s > CuVector::SetZero 109.982s > DiffSigmoid 197.664s > DiffXent 2.89855s > DivRowsVec 80.7666s > FindRowMaxId 177.468s > MulColsVec 5.66732s > Randomize 6.10032s > Set 21.3223s > Sigmoid 267.29s > Softmax 733.461s > Splice 7.73193s > > > The total amount of time was 25380s, and CUBLAS matrix multiplication is > 81% of the time. > The idea is suitable for simple hidden layer activations (Sigmoids), > where we would save > 2x access to global GPU memory. On the other hand Sigmoid corresponds to > 1% of run-time. > > Based on these stats and assumption that CUBLAS is written optimally, > we can say that the extra memory access for activation functions is not > an issue. > Maybe in case of smaller nets, the numbers would be different, > but those also have faster training times. > > Is this argumentation convincing? :) > > Best regards, > Karel > > > > On 09/05/13 18:06, Tony Robinson wrote: >> On 09/05/2013 04:18 PM, Arnab Ghoshal wrote: >>> On Thu, Sep 5, 2013 at 4:54 PM, Tony Robinson <to...@ca...> wrote: >>>> I guess we can ask the question in the other way: does anyone have any >>>> profile information to share? That is, what GPU utilisation does Kaldi >>>> achieve? Clearly if it's currently getting over (say) 50% then there >>>> is no point in thinking about this any more. >>> I don't think it is possible to look up the computation utilization of >>> the GTX cards or at least I haven't figured out how to. >> If you run the nvidia visual profiler (nvvp) which is available as part >> of the CUDA toolkit download >> (https://developer.nvidia.com/nvidia-visual-profiler) you can get the >> compute utilization and much else besides. All you need do is create a >> new session with the binary and relevant arguments (ensuring that you >> binary will only run for a short amount of time e.g ~3 secs) and then >> generate a timeline for your program. Once you have a timeline you can >> used the 'guided analysis' to measure different metrics >> (http://docs.nvidia.com/cuda/profiler-users-guide/index.html#analysis-view) >> including compute utilization. 
>> >> Tony >> (who cheated - I'm not a GPU guru - I had to ask a colleague to write >> the above paragraph for me) >> > > ------------------------------------------------------------------------------ > How ServiceNow helps IT people transform IT departments: > 1. Consolidate legacy IT systems to a single system of record for IT > 2. Standardize and globalize service processes across IT > 3. Implement zero-touch automation to replace manual, redundant tasks > http://pubads.g.doubleclick.net/gampad/clk?id=51271111&iu=/4140/ostg.clktrk > _______________________________________________ > Kaldi-developers mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-developers -- Dr A J Robinson, Founder and CEO, Cantab Research Ltd Company reg no GB 05697423, VAT reg no 925606030 Phone direct: +44 (0)1223 977211, office: +44 (0)1223 794497 St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK |
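[Editor's note: for readers following Tony's point above about storing both the weight matrix and its transpose for CPU code, here is a minimal illustrative sketch. It is not Kaldi code; the function names and layouts are assumptions. It shows why, in row-major storage, the forward pass favours W while the backward pass favours the stored transpose.]

    // Illustrative only -- not Kaldi's API. W is stored row-major as out_dim
    // rows of in_dim floats; Wt is a separately stored transpose.
    #include <vector>

    // Forward pass y = W x: the inner loop walks one row of W, unit stride.
    void Forward(const std::vector<float>& W, const std::vector<float>& x,
                 std::vector<float>* y, int out_dim, int in_dim) {
      for (int o = 0; o < out_dim; ++o) {
        const float* row = &W[o * in_dim];      // contiguous reads
        float sum = 0.0f;
        for (int i = 0; i < in_dim; ++i) sum += row[i] * x[i];
        (*y)[o] = sum;
      }
    }

    // Backward pass dx = W^T dy: with only W available the inner loop would
    // stride by in_dim; reading the stored transpose keeps it unit stride.
    void BackpropInput(const std::vector<float>& Wt, const std::vector<float>& dy,
                       std::vector<float>* dx, int out_dim, int in_dim) {
      for (int i = 0; i < in_dim; ++i) {
        const float* row = &Wt[i * out_dim];    // contiguous reads
        float sum = 0.0f;
        for (int o = 0; o < out_dim; ++o) sum += row[o] * dy[o];
        (*dx)[i] = sum;
      }
    }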
From: Karel V. <ve...@gm...> - 2013-09-10 17:27:45
Hi Tony, Arnab, Dan and others,

The idea is interesting, but first let's have a look at the profiling numbers coming from the 12th iteration of the Switchboard DNN recipe on a GTX680, with a topology of 6 hidden sigmoid layers, 2048 neurons each, and roughly 9000 outputs:

    AddColSumMat                   239.611s
    AddMat                        1341.95s
    AddMatMat                    20580s
    AddRowSumMat                   927.644s
    AddVec                         937.368s
    AddVecToRows                   164.433s
    CuMatrix::CopyFromMatD2D       194.821s
    CuMatrix::CopyFromMatH2D        11.0674s
    CuMatrix::CopyToMatD2H           0.113835s
    CuMatrix::SetZero               51.3638s
    CuStlVector::CopyFromVecH2D      3.81734s
    CuStlVector::CopyToVecD2H        7.35736s
    CuStlVector::SetZero             4.45224s
    CuVector::CopyFromVecH2D         0.000355005s
    CuVector::CopyToVecD2H           5.94601s
    CuVector::SetZero              109.982s
    DiffSigmoid                    197.664s
    DiffXent                         2.89855s
    DivRowsVec                      80.7666s
    FindRowMaxId                   177.468s
    MulColsVec                       5.66732s
    Randomize                        6.10032s
    Set                             21.3223s
    Sigmoid                        267.29s
    Softmax                        733.461s
    Splice                           7.73193s

The total amount of time was 25380s, and the CUBLAS matrix multiplication is 81% of it. The idea would suit simple hidden-layer activations (sigmoids), where we would save 2x the accesses to global GPU memory. On the other hand, Sigmoid corresponds to 1% of the run time.

Based on these stats, and on the assumption that CUBLAS is written optimally, we can say that the extra memory access for activation functions is not an issue. Maybe in the case of smaller nets the numbers would be different, but those also have faster training times.

Is this argumentation convincing? :)

Best regards,
Karel

On 09/05/13 18:06, Tony Robinson wrote:
> On 09/05/2013 04:18 PM, Arnab Ghoshal wrote:
>> On Thu, Sep 5, 2013 at 4:54 PM, Tony Robinson <to...@ca...> wrote:
>>> I guess we can ask the question in the other way: does anyone have any profile information to share? That is, what GPU utilisation does Kaldi achieve? Clearly if it's currently getting over (say) 50% then there is no point in thinking about this any more.
>> I don't think it is possible to look up the computation utilization of the GTX cards or at least I haven't figured out how to.
> If you run the nvidia visual profiler (nvvp), which is available as part of the CUDA toolkit download (https://developer.nvidia.com/nvidia-visual-profiler), you can get the compute utilization and much else besides. All you need do is create a new session with the binary and relevant arguments (ensuring that your binary will only run for a short amount of time, e.g. ~3 secs) and then generate a timeline for your program. Once you have a timeline you can use the 'guided analysis' to measure different metrics (http://docs.nvidia.com/cuda/profiler-users-guide/index.html#analysis-view), including compute utilization.
>
> Tony
> (who cheated - I'm not a GPU guru - I had to ask a colleague to write the above paragraph for me)
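[Editor's note: as an aside on how per-operation totals like the ones above can be gathered, the sketch below is illustrative only -- it is not Kaldi's actual profiling code, and the map and function names are assumptions. It brackets a cuBLAS call with CUDA events and accumulates elapsed GPU time under an operation name.]

    // Illustrative sketch, not Kaldi's profiler. Accumulates GPU time per
    // operation name by bracketing each call with CUDA events.
    #include <cuda_runtime.h>
    #include <cublas_v2.h>
    #include <map>
    #include <string>

    static std::map<std::string, double> g_gpu_seconds;   // op name -> seconds

    // Times one C = A*B (column-major, m x n x k); the handle and device
    // pointers are assumed to be set up by the caller.
    void TimedSgemm(cublasHandle_t handle, int m, int n, int k,
                    const float* A, const float* B, float* C) {
      const float alpha = 1.0f, beta = 0.0f;
      cudaEvent_t start, stop;
      cudaEventCreate(&start);
      cudaEventCreate(&stop);
      cudaEventRecord(start, 0);
      cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                  &alpha, A, m, B, k, &beta, C, m);
      cudaEventRecord(stop, 0);
      cudaEventSynchronize(stop);                 // wait so the timing is valid
      float ms = 0.0f;
      cudaEventElapsedTime(&ms, start, stop);
      g_gpu_seconds["AddMatMat"] += ms / 1000.0;
      cudaEventDestroy(start);
      cudaEventDestroy(stop);
    }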
From: Tony R. <to...@ca...> - 2013-09-05 16:06:10
On 09/05/2013 04:18 PM, Arnab Ghoshal wrote:
> On Thu, Sep 5, 2013 at 4:54 PM, Tony Robinson <to...@ca...> wrote:
>> I guess we can ask the question in the other way: does anyone have any profile information to share? That is, what GPU utilisation does Kaldi achieve? Clearly if it's currently getting over (say) 50% then there is no point in thinking about this any more.
> I don't think it is possible to look up the computation utilization of the GTX cards or at least I haven't figured out how to.

If you run the nvidia visual profiler (nvvp), which is available as part of the CUDA toolkit download (https://developer.nvidia.com/nvidia-visual-profiler), you can get the compute utilization and much else besides. All you need do is create a new session with the binary and relevant arguments (ensuring that your binary will only run for a short amount of time, e.g. ~3 secs) and then generate a timeline for your program. Once you have a timeline you can use the 'guided analysis' to measure different metrics (http://docs.nvidia.com/cuda/profiler-users-guide/index.html#analysis-view), including compute utilization.

Tony
(who cheated - I'm not a GPU guru - I had to ask a colleague to write the above paragraph for me)

--
Dr A J Robinson, Founder and Director of Cantab Research Limited.
St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK.
Company reg no 05697423 (England and Wales), VAT reg no 925606030.
From: Arnab G. <ar...@gm...> - 2013-09-05 15:18:59
On Thu, Sep 5, 2013 at 4:54 PM, Tony Robinson <to...@ca...> wrote:
> I guess we can ask the question in the other way: does anyone have any profile information to share? That is, what GPU utilisation does Kaldi achieve? Clearly if it's currently getting over (say) 50% then there is no point in thinking about this any more.

I don't think it is possible to look up the computation utilization of the GTX cards, or at least I haven't figured out how to. You can look up the GPU memory utilization, temperature, fan speed, etc. with nvidia-smi. We have found CPU utilization, GPU fan speed and (to a lesser extent) temperature to be reasonable surrogates for GPU utilization.

We are still not using the Kaldi nnet code at Edinburgh, due to the inertia of changing the experimental setups, and so have no stats to share.
From: Daniel P. <dp...@gm...> - 2013-09-05 14:58:46
>> I'm not too concerned about this as it seems to me that the matrix-multiply will be much slower than the softmax (as it's O(n^3) not O(n^2)), and therefore the small penalty from doing them separately does not matter relative to the possibly large performance gain from the faster matrix multiply.
>
> Certainly the problem goes away when n is large. But do we have large n? This implies large minibatch sizes which may slow down the weight updates and certainly take quite a lot of GPU memory. Assuming about 2GB RAM there is a limit to the size you can make minibatches.

The factor by which matmul is slower than softmax is anyway not the minibatch size, it's the size of the nonlinear layers, I think.

> I guess we can ask the question in the other way: does anyone have any profile information to share? That is, what GPU utilisation does Kaldi achieve? Clearly if it's currently getting over (say) 50% then there is no point in thinking about this any more. As it is, my main concern is satisfied, I was just looking in the wrong place.

I'm not sure. Karel would know (for his codebase); mine is still in the early stages (I'm still fixing issues with it).

Dan

> Tony
>
> --
> Dr A J Robinson, Founder and Director of Cantab Research Limited.
> St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK.
> Company reg no 05697423 (England and Wales), VAT reg no 925606030.
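[Editor's note: to make the O(n^3)-vs-O(n^2) argument concrete, here is a rough operation count; an illustration with assumed layer shapes, not a measurement. For a minibatch of $m$ frames feeding a layer of width $n$:]

    \text{GEMM FLOPs} \approx 2\,m\,n^{2}, \qquad
    \text{softmax/sigmoid FLOPs} \approx c\,m\,n \quad (c\ \text{small}), \qquad
    \frac{\text{GEMM FLOPs}}{\text{nonlinearity FLOPs}} \approx \frac{2n}{c}.

[The ratio depends on the layer width $n$, not on the minibatch size $m$, which is Dan's point; with $n$ around 2048 the matrix multiply dominates by roughly three orders of magnitude.]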
From: Tony R. <to...@ca...> - 2013-09-05 14:55:08
On 09/05/2013 03:41 PM, Daniel Povey wrote:
> So what you're concerned about is the fact that the results of the sgemm CUDA kernel are written out to GPU memory before being read in again to do the nonlinearity?

Yes, exactly.

> I'm not too concerned about this as it seems to me that the matrix-multiply will be much slower than the softmax (as it's O(n^3) not O(n^2)), and therefore the small penalty from doing them separately does not matter relative to the possibly large performance gain from the faster matrix multiply.

Certainly the problem goes away when n is large. But do we have large n? This implies large minibatch sizes, which may slow down the weight updates and certainly take quite a lot of GPU memory. Assuming about 2 GB of RAM, there is a limit to the size you can make minibatches.

I guess we can ask the question in the other way: does anyone have any profile information to share? That is, what GPU utilisation does Kaldi achieve? Clearly if it's currently getting over (say) 50% then there is no point in thinking about this any more. As it is, my main concern is satisfied; I was just looking in the wrong place.

Tony

--
Dr A J Robinson, Founder and Director of Cantab Research Limited.
St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK.
Company reg no 05697423 (England and Wales), VAT reg no 925606030.
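[Editor's note: a rough sizing of the 2 GB budget, under assumed dimensions taken from Karel's profiling message later in the thread (6 hidden layers of 2048 units, ~9000 outputs, single precision):]

    \text{weights} \approx \left(6 \times 2048^{2} + 2048 \times 9000\right) \times 4\ \text{B} \approx 175\ \text{MB}, \qquad
    \text{hidden activations} \approx 7 \times m \times 2048 \times 4\ \text{B} \approx 57\ \text{MB for } m = 1000.

[The point is only that the memory footprint can be estimated layer by layer before choosing the minibatch size $m$.]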
From: <fe...@in...> - 2013-09-05 14:51:16
Dear Sirs,

We are with the Speech Processing and Transmission Laboratory at the University of Chile. We want to install Kaldi for speech recognition tasks that use the Wall Street Journal database (WSJ0).
Link: http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC93S6A

Currently, we don't have access to any cluster or GPU for computation. So, our questions are:

1. Do you think it is feasible to use just a regular computer (e.g. Intel i7, Xeon) to run Kaldi without a cluster or GPU?
2. How long do you think an experiment would take for each configuration (just one PC, GPU, and cluster)? I mean a rough idea (hours, days, more than 3 days, etc.)?
3. Can you suggest an economical hardware alternative for running Kaldi on this task?

Best Regards,
-Felipe Espic
From: Daniel P. <dp...@gm...> - 2013-09-05 14:41:19
So what you're concerned about is the fact that the results of the sgemm CUDA kernel are written out to GPU memory before being read in again to do the nonlinearity? I'm not too concerned about this as it seems to me that the matrix-multiply will be much slower than the softmax (as it's O(n^3) not O(n^2)), and therefore the small penalty from doing them separately does not matter relative to the possibly large performance gain from the faster matrix multiply. BTW, all this stuff is on GPU memory, cublasSgemm works with inputs and outputs on the GPU board. Also, the way the nnet software in Kaldi is currently written, the softmax is anyway bound to be a separate operation from the matrix-multiply. Dan On Thu, Sep 5, 2013 at 10:37 AM, Tony Robinson <to...@ca...> wrote: > Hi Dan, > > Ah yes, I found this call in cudamatrix/cu-matrix.cc now - thanks. > > I think it's an open question as to whether CUBLAS is the right way to go or > not. > > For: As you say, cublasSgemm() is very optimised. The example code that > comes with CUDA 5.5 gets 1.4TFLOP from a GTX TITAN and 2 * 1.0TFLOP from a > GTX 690 - impressive stuff. > > Against: CUBLAS doens't do what we want for NN implementations. There is a > high latency overhead in writing out the results of cublasSgemm() and then > reading it in again to do a trivial sigmoid/ReLU non-linearity (or softmax > or the indirect you need for sparse inputs or ouputs). You can mask this > to some degree with streams, but the overhead is still there. > > Ideally we'd have access to the CUBLAS source code and would be able to add > the non-linearity in just before writing out and so it would come for free. > My feeling right now is that it could well be better to use a slower matrix > multiply that is modifiable just to avoid the extra write then read. > > > Tony > > > On 09/05/2013 03:06 PM, Daniel Povey wrote: >> >> For matrix multiplication we just call CUBLAS >> (cublasDgemm/cublasSgemm), because we imagine it will be more highly >> optimized than anything we can code. >> BTW, the latest on the cudamatrix stuff is in the sandbox in >> ^/sandbox/dan2. This is being actively developed right now. >> >> Dan >> >> >> On Thu, Sep 5, 2013 at 7:50 AM, Tony Robinson <to...@ca...> >> wrote: >>> >>> Karel et al, >>> >>> I've spent a long time thinking about how to efficiently implement NNs >>> on GPUs (inc. taking the Coursera and Udacity courses). >>> >>> As I understand it GPUs aren't all that good at the simple view of a NN >>> which is outputVector = sigmoid(inputVector * weightMatrix) as they have >>> to read the entire weight matrix just to compute one output. However, >>> we often use minibatches so instead of doing vector matrix operations we >>> can group all the input vectors in a minibatch into a matrix and run >>> matrix matrix operations. That is do outputVector[t] = >>> sigmoid(inputVector[t] * weightMatrix) all in one go and so >>> substantially reduce memory bandwidth. >>> >>> Having got somewhat disillusioned with the CUBLAS calls I've poked >>> around kaldi/src/cudamatrix and I find cuda_mul_elements(), >>> cuda_mul_cols_vec() and cuda_mul_rows_vec() but no cuda_mul_mat(). >>> >>> Have I got this right in that Kaldi doesn't use GPU matrix matrix >>> operations? If so, is there a theoretical reason why not? >>> >>> >>> Tony >>> >>> -- >>> Dr A J Robinson, Founder and Director of Cantab Research Limited. >>> St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK. >>> Company reg no 05697423 (England and Wales), VAT reg no 925606030. 
>>> >>> >>> ------------------------------------------------------------------------------ >>> Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more! >>> Discover the easy way to master current and previous Microsoft >>> technologies >>> and advance your career. Get an incredible 1,500+ hours of step-by-step >>> tutorial videos with LearnDevNow. Subscribe today and save! >>> >>> http://pubads.g.doubleclick.net/gampad/clk?id=58041391&iu=/4140/ostg.clktrk >>> _______________________________________________ >>> Kaldi-developers mailing list >>> Kal...@li... >>> https://lists.sourceforge.net/lists/listinfo/kaldi-developers > > > > -- > Dr A J Robinson, Founder and Director of Cantab Research Limited. > St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK. > Company reg no 05697423 (England and Wales), VAT reg no 925606030. |
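[Editor's note: a minimal sketch of the two-pass pattern Dan and Tony are discussing. The names and dimensions are assumptions and this is not Kaldi's AddMatMat/Sigmoid code: cublasSgemm writes the affine output to global GPU memory, and a separate elementwise kernel then re-reads it to apply the sigmoid -- the extra write-then-read that a fused GEMM epilogue would avoid.]

    // Illustrative CUDA sketch of the unfused forward pass. Column-major
    // storage: X is in_dim x mb (one column per frame), W is in_dim x out_dim,
    // Y is out_dim x mb.
    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <math.h>

    __global__ void SigmoidInPlace(float* y, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) y[i] = 1.0f / (1.0f + expf(-y[i]));   // second pass over Y
    }

    void ForwardLayer(cublasHandle_t handle, const float* X, const float* W,
                      float* Y, int mb, int in_dim, int out_dim) {
      const float alpha = 1.0f, beta = 0.0f;
      // Pass 1: Y = W^T * X, written out to global GPU memory by CUBLAS.
      cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N, out_dim, mb, in_dim,
                  &alpha, W, in_dim, X, in_dim, &beta, Y, out_dim);
      // Pass 2: read Y back in and apply the nonlinearity elementwise.
      int n = out_dim * mb;
      SigmoidInPlace<<<(n + 255) / 256, 256>>>(Y, n);
    }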
From: Tony R. <to...@ca...> - 2013-09-05 14:37:38
Hi Dan,

Ah yes, I found this call in cudamatrix/cu-matrix.cc now - thanks.

I think it's an open question as to whether CUBLAS is the right way to go or not.

For: As you say, cublasSgemm() is very optimised. The example code that comes with CUDA 5.5 gets 1.4 TFLOPS from a GTX TITAN and 2 x 1.0 TFLOPS from a GTX 690 - impressive stuff.

Against: CUBLAS doesn't do what we want for NN implementations. There is a high latency overhead in writing out the results of cublasSgemm() and then reading them in again to do a trivial sigmoid/ReLU non-linearity (or the softmax, or the indirection you need for sparse inputs or outputs). You can mask this to some degree with streams, but the overhead is still there.

Ideally we'd have access to the CUBLAS source code and would be able to add the non-linearity in just before writing out, so it would come for free. My feeling right now is that it could well be better to use a slower matrix multiply that is modifiable, just to avoid the extra write then read.

Tony

On 09/05/2013 03:06 PM, Daniel Povey wrote:
> For matrix multiplication we just call CUBLAS (cublasDgemm/cublasSgemm), because we imagine it will be more highly optimized than anything we can code.
> BTW, the latest on the cudamatrix stuff is in the sandbox in ^/sandbox/dan2. This is being actively developed right now.
>
> Dan
>
> On Thu, Sep 5, 2013 at 7:50 AM, Tony Robinson <to...@ca...> wrote:
>> Karel et al,
>>
>> I've spent a long time thinking about how to efficiently implement NNs on GPUs (inc. taking the Coursera and Udacity courses).
>>
>> As I understand it, GPUs aren't all that good at the simple view of a NN, which is outputVector = sigmoid(inputVector * weightMatrix), as they have to read the entire weight matrix just to compute one output. However, we often use minibatches, so instead of doing vector-matrix operations we can group all the input vectors in a minibatch into a matrix and run matrix-matrix operations. That is, do outputVector[t] = sigmoid(inputVector[t] * weightMatrix) all in one go and so substantially reduce memory bandwidth.
>>
>> Having got somewhat disillusioned with the CUBLAS calls, I've poked around kaldi/src/cudamatrix and I find cuda_mul_elements(), cuda_mul_cols_vec() and cuda_mul_rows_vec(), but no cuda_mul_mat().
>>
>> Have I got this right in that Kaldi doesn't use GPU matrix-matrix operations? If so, is there a theoretical reason why not?
>>
>> Tony
>>
>> --
>> Dr A J Robinson, Founder and Director of Cantab Research Limited.
>> St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK.
>> Company reg no 05697423 (England and Wales), VAT reg no 925606030.

--
Dr A J Robinson, Founder and Director of Cantab Research Limited.
St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK.
Company reg no 05697423 (England and Wales), VAT reg no 925606030.
From: Daniel P. <dp...@gm...> - 2013-09-05 14:06:43
For matrix multiplication we just call CUBLAS (cublasDgemm/cublasSgemm), because we imagine it will be more highly optimized than anything we can code. BTW, the latest on the cudamatrix stuff is in the sandbox in ^/sandbox/dan2. This is being actively developed right now. Dan On Thu, Sep 5, 2013 at 7:50 AM, Tony Robinson <to...@ca...> wrote: > Karel et al, > > I've spent a long time thinking about how to efficiently implement NNs > on GPUs (inc. taking the Coursera and Udacity courses). > > As I understand it GPUs aren't all that good at the simple view of a NN > which is outputVector = sigmoid(inputVector * weightMatrix) as they have > to read the entire weight matrix just to compute one output. However, > we often use minibatches so instead of doing vector matrix operations we > can group all the input vectors in a minibatch into a matrix and run > matrix matrix operations. That is do outputVector[t] = > sigmoid(inputVector[t] * weightMatrix) all in one go and so > substantially reduce memory bandwidth. > > Having got somewhat disillusioned with the CUBLAS calls I've poked > around kaldi/src/cudamatrix and I find cuda_mul_elements(), > cuda_mul_cols_vec() and cuda_mul_rows_vec() but no cuda_mul_mat(). > > Have I got this right in that Kaldi doesn't use GPU matrix matrix > operations? If so, is there a theoretical reason why not? > > > Tony > > -- > Dr A J Robinson, Founder and Director of Cantab Research Limited. > St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK. > Company reg no 05697423 (England and Wales), VAT reg no 925606030. > > ------------------------------------------------------------------------------ > Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more! > Discover the easy way to master current and previous Microsoft technologies > and advance your career. Get an incredible 1,500+ hours of step-by-step > tutorial videos with LearnDevNow. Subscribe today and save! > http://pubads.g.doubleclick.net/gampad/clk?id=58041391&iu=/4140/ostg.clktrk > _______________________________________________ > Kaldi-developers mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-developers |
From: Tony R. <to...@ca...> - 2013-09-05 12:15:19
Karel et al,

I've spent a long time thinking about how to efficiently implement NNs on GPUs (inc. taking the Coursera and Udacity courses).

As I understand it, GPUs aren't all that good at the simple view of a NN, which is outputVector = sigmoid(inputVector * weightMatrix), as they have to read the entire weight matrix just to compute one output. However, we often use minibatches, so instead of doing vector-matrix operations we can group all the input vectors in a minibatch into a matrix and run matrix-matrix operations. That is, do outputVector[t] = sigmoid(inputVector[t] * weightMatrix) all in one go and so substantially reduce memory bandwidth.

Having got somewhat disillusioned with the CUBLAS calls, I've poked around kaldi/src/cudamatrix and I find cuda_mul_elements(), cuda_mul_cols_vec() and cuda_mul_rows_vec(), but no cuda_mul_mat().

Have I got this right in that Kaldi doesn't use GPU matrix-matrix operations? If so, is there a theoretical reason why not?

Tony

--
Dr A J Robinson, Founder and Director of Cantab Research Limited.
St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK.
Company reg no 05697423 (England and Wales), VAT reg no 925606030.
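[Editor's note: the minibatch formulation Tony describes, written out with assumed dimensions. Stack the $m$ frames of a minibatch as the rows of $X$ and let $W$ be the weight matrix:]

    Y = \sigma(X W), \qquad X \in \mathbb{R}^{m \times d}, \quad W \in \mathbb{R}^{d \times h}, \quad Y \in \mathbb{R}^{m \times h}.

[Computed frame by frame, the weight matrix is streamed from memory once per frame, roughly $m\,d\,h$ reads; computed as a tiled matrix-matrix product, each weight tile is reused across the $b$ frames held in a tile, cutting weight-matrix traffic by roughly a factor of $b$. That reuse is the bandwidth saving the minibatch formulation buys.]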
From: Ben J. <be...@ne...> - 2013-09-03 00:36:21
Ok, got it. Let me try 200k first. I just updated the trunk, but couldn't find run_nnet2.sh. Is it supposed to be in wsj/s5/local/? Thanks Ben On Mon, Sep 2, 2013 at 7:52 PM, Daniel Povey <dp...@gm...> wrote: > That log-prob per frame if -7.31 is too low, it should be something > like -2, no lower-- maybe -3 on the 1st iteration. The size of your > training data does not matter, what matters is the #samples you > process per iteration. Maybe try reducing it from 400k (the default, > I think) to 200k. Or use the newer example scripts where I think that > is the default. (if you update the trunk and look at the example > scripts run_nnet2.sh, you'll see what I mean). > > But definitely something is wrong here. > > Dan > > > On Mon, Sep 2, 2013 at 7:47 PM, Ben Jiang <be...@ne...> wrote: > > The nonlinearaty type should be the default in train_nnet_cpu.sh, which > > should be tanh. The log-prob doesn't look too bad. Below is the output > from > > a run that actually succeeded: > > LOG > (nnet-train-parallel:DoBackpropParallel():nnet-update-parallel.cc:179) > > Did backprop on 399889 examples, average log-prob per frame is -7.31817 > > > > The learning rates are 0.01 initial and 0.001 final. I kind of used the > > value from swbd, but maybe my training data is quite bigger than swbd. I > > previously tried 0.001 and 0.0001, which also failed due to an error of > > "Cannot invert: matrix is singular", but I didn't have debug on back > then, > > so it's probably the same issue. Maybe I should try even smaller, such > as > > 0.0001 and 0.00001? > > > > > > Ben > > > > > > > > On Mon, Sep 2, 2013 at 6:55 PM, Daniel Povey <dp...@gm...> wrote: > >> > >> I think the underlying cause is instability in the training, causing > >> the derivatives to become too large. This is something that commonly > >> happens in neural net training, and the solution is generally to > >> decrease the learning rate. What nonlinearity type are you using? > >> And do the log-probs printed out in train.*.log or compute_prob_*.log > >> get very negative? > >> > >> Unbounded nonlinearities such as ReLUs are more susceptible to this > >> instability. > >> Dan > >> > >> > >> On Mon, Sep 2, 2013 at 6:50 PM, Ben Jiang <be...@ne...> wrote: > >> > I see. Thanks for the fast response, Dan. > >> > > >> > So any idea on this "random" error I am stuck with at pass 27? I have > >> > pasted the stacktrace below. This error doesn't always happen, even > >> > after > >> > I removed the randomness introduced in the input mdl and shuffled egs. > >> > (eg, > >> > save the input mdl and shuffled egs to files and re-run the failed > >> > nnet-train-parallel from those files in debugger). The re-run would > >> > sometimes fail and sometimes succeed. > >> > > >> > Anyway, I was able catch the error in my debugger and examine the > >> > variables. > >> > I think the reason is that the deriv variable in > >> > NnetUpdater::Backprop() > >> > contains some "bad" value, such as 1.50931703e+20. This caused the > >> > trace of > >> > the matrix to become infinite and in turn cause the p_trace to become > 0 > >> > and > >> > fail the assert. I probably need more time to see how this value got > in > >> > there, but again, since the exact re-run would pass sometimes, it's > kind > >> > of > >> > hard to debug. > >> > > >> > Any idea? 
> >> > > >> > Here's the stacktrace: > >> > =============================== > >> > KALDI_ASSERT: at > >> > > >> > > nnet-train-parallel:PreconditionDirectionsAlphaRescaled:nnet-precondition.cc:128, > >> > failed: p_trace != 0.0 > >> > Stack trace is: > >> > kaldi::KaldiGetStackTrace() > >> > kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*) > >> > > >> > > kaldi::nnet2::PreconditionDirectionsAlphaRescaled(kaldi::MatrixBase<float> > >> > const&, double, kaldi::MatrixBase<float>*) > >> > > >> > > kaldi::nnet2::AffineComponentPreconditioned::Update(kaldi::MatrixBase<float> > >> > const&, kaldi::MatrixBase<float> const&) > >> > kaldi::nnet2::AffineComponent::Backprop(kaldi::MatrixBase<float> > const&, > >> > kaldi::MatrixBase<float> const&, kaldi::MatrixBase<float> const&, int, > >> > kaldi::nnet2::Component*, kaldi::Matrix<float>*) const > >> > > >> > > kaldi::nnet2::NnetUpdater::Backprop(std::vector<kaldi::nnet2::NnetTrainingExample, > >> > std::allocator<kaldi::nnet2::NnetTrainingExample> > const&, > >> > kaldi::Matrix<float>*) > >> > > >> > > kaldi::nnet2::NnetUpdater::ComputeForMinibatch(std::vector<kaldi::nnet2::NnetTrainingExample, > >> > std::allocator<kaldi::nnet2::NnetTrainingExample> > const&) > >> > kaldi::nnet2::DoBackprop(kaldi::nnet2::Nnet const&, > >> > std::vector<kaldi::nnet2::NnetTrainingExample, > >> > std::allocator<kaldi::nnet2::NnetTrainingExample> > const&, > >> > kaldi::nnet2::Nnet*) > >> > kaldi::nnet2::DoBackpropParallelClass::operator()() > >> > kaldi::MultiThreadable::run(void*) > >> > > >> > Ben > >> > > >> > > >> > On Mon, Sep 2, 2013 at 6:25 PM, Daniel Povey <dp...@gm...> > wrote: > >> >> > >> >> That's how it's supposed to be-- AFAIK that's basically the point of > >> >> Hogwild, that you allow these kinds of updates and accept the > >> >> possibility that due to race conditions you will occasionally lose a > >> >> bit of date. The parameters only change slightly on the timescales > >> >> that these different threads access them. > >> >> Dan > >> >> > >> >> > >> >> On Mon, Sep 2, 2013 at 6:01 PM, Ben Jiang <be...@ne...> wrote: > >> >> > Hi all, > >> >> > > >> >> > While hunting some random error from nnet-train-parallel, I noticed > >> >> > the > >> >> > nnet_to_update is shared among the threads, but there is no > >> >> > synchronization > >> >> > checks when updating the components in the threads. I still > haven't > >> >> > gone > >> >> > too deep in the code yet, but should there be synchronization > checks? > >> >> > > >> >> > For example, the deriv variable in NnetUpdater::Backprop() is > updated > >> >> > and > >> >> > passed between the components. Could this be an issue if the > >> >> > components > >> >> > are > >> >> > being updated by other threads? > >> >> > > >> >> > > >> >> > Or am I missing something totally? > >> >> > > >> >> > > >> >> > -- > >> >> > Thanks > >> >> > Ben > >> >> > > >> >> > > >> >> > > >> >> > > ------------------------------------------------------------------------------ > >> >> > Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, > >> >> > more! > >> >> > Discover the easy way to master current and previous Microsoft > >> >> > technologies > >> >> > and advance your career. Get an incredible 1,500+ hours of > >> >> > step-by-step > >> >> > tutorial videos with LearnDevNow. Subscribe today and save! 
> >> >> > > >> >> > > >> >> > > http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk > >> >> > _______________________________________________ > >> >> > Kaldi-developers mailing list > >> >> > Kal...@li... > >> >> > https://lists.sourceforge.net/lists/listinfo/kaldi-developers > >> >> > > >> > > >> > > >> > > >> > > >> > -- > >> > > >> > -- > >> > Thanks > >> > Ben Jiang > >> > > >> > Co-Founder/Principal/CTO > >> > Nexiwave.com > >> > Tel: 226-975-2172 / 617-245-0916 > >> > "Confidential & Privileged: This email message is for the sole use of > >> > the > >> > intended recipient(s) and may contain confidential and privileged > >> > information. Any unauthorized review, use, disclosure or distribution > is > >> > prohibited. if you are not the intended recipient, please contact the > >> > sender > >> > by reply email and destroy all copies of the original message.” > > > > > > > > > > -- > > > > -- > > Thanks > > Ben Jiang > > > > Co-Founder/Principal/CTO > > Nexiwave.com > > Tel: 226-975-2172 / 617-245-0916 > > "Confidential & Privileged: This email message is for the sole use of the > > intended recipient(s) and may contain confidential and privileged > > information. Any unauthorized review, use, disclosure or distribution is > > prohibited. if you are not the intended recipient, please contact the > sender > > by reply email and destroy all copies of the original message.” > -- -- Thanks Ben Jiang Co-Founder/Principal/CTO Nexiwave.com Tel: 226-975-2172 / 617-245-0916 "Confidential & Privileged: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. if you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.” |
From: Ben J. <be...@ne...> - 2013-09-03 00:20:30
Got it. Really appreciate the help here! I'll report any findings back here. Ben On Mon, Sep 2, 2013 at 8:06 PM, Daniel Povey <dp...@gm...> wrote: > Sorry, in rm/s5, it's local/run_nnet2.sh, in wsj/s5 it's > local/run_nnet_cpu.sh > Dan > > > On Mon, Sep 2, 2013 at 8:04 PM, Ben Jiang <be...@ne...> wrote: > > Ok, got it. Let me try 200k first. > > > > I just updated the trunk, but couldn't find run_nnet2.sh. Is it > supposed to > > be in wsj/s5/local/? > > > > > > Thanks > > Ben > > > > > > > > On Mon, Sep 2, 2013 at 7:52 PM, Daniel Povey <dp...@gm...> wrote: > >> > >> That log-prob per frame if -7.31 is too low, it should be something > >> like -2, no lower-- maybe -3 on the 1st iteration. The size of your > >> training data does not matter, what matters is the #samples you > >> process per iteration. Maybe try reducing it from 400k (the default, > >> I think) to 200k. Or use the newer example scripts where I think that > >> is the default. (if you update the trunk and look at the example > >> scripts run_nnet2.sh, you'll see what I mean). > >> > >> But definitely something is wrong here. > >> > >> Dan > >> > >> > >> On Mon, Sep 2, 2013 at 7:47 PM, Ben Jiang <be...@ne...> wrote: > >> > The nonlinearaty type should be the default in train_nnet_cpu.sh, > which > >> > should be tanh. The log-prob doesn't look too bad. Below is the output > >> > from > >> > a run that actually succeeded: > >> > LOG > >> > (nnet-train-parallel:DoBackpropParallel():nnet-update-parallel.cc:179) > >> > Did backprop on 399889 examples, average log-prob per frame is > -7.31817 > >> > > >> > The learning rates are 0.01 initial and 0.001 final. I kind of used > the > >> > value from swbd, but maybe my training data is quite bigger than swbd. > >> > I > >> > previously tried 0.001 and 0.0001, which also failed due to an error > of > >> > "Cannot invert: matrix is singular", but I didn't have debug on back > >> > then, > >> > so it's probably the same issue. Maybe I should try even smaller, > such > >> > as > >> > 0.0001 and 0.00001? > >> > > >> > > >> > Ben > >> > > >> > > >> > > >> > On Mon, Sep 2, 2013 at 6:55 PM, Daniel Povey <dp...@gm...> > wrote: > >> >> > >> >> I think the underlying cause is instability in the training, causing > >> >> the derivatives to become too large. This is something that commonly > >> >> happens in neural net training, and the solution is generally to > >> >> decrease the learning rate. What nonlinearity type are you using? > >> >> And do the log-probs printed out in train.*.log or compute_prob_*.log > >> >> get very negative? > >> >> > >> >> Unbounded nonlinearities such as ReLUs are more susceptible to this > >> >> instability. > >> >> Dan > >> >> > >> >> > >> >> On Mon, Sep 2, 2013 at 6:50 PM, Ben Jiang <be...@ne...> wrote: > >> >> > I see. Thanks for the fast response, Dan. > >> >> > > >> >> > So any idea on this "random" error I am stuck with at pass 27? I > >> >> > have > >> >> > pasted the stacktrace below. This error doesn't always happen, > even > >> >> > after > >> >> > I removed the randomness introduced in the input mdl and shuffled > >> >> > egs. > >> >> > (eg, > >> >> > save the input mdl and shuffled egs to files and re-run the failed > >> >> > nnet-train-parallel from those files in debugger). The re-run > would > >> >> > sometimes fail and sometimes succeed. > >> >> > > >> >> > Anyway, I was able catch the error in my debugger and examine the > >> >> > variables. 
> >> >> > I think the reason is that the deriv variable in > >> >> > NnetUpdater::Backprop() > >> >> > contains some "bad" value, such as 1.50931703e+20. This caused the > >> >> > trace of > >> >> > the matrix to become infinite and in turn cause the p_trace to > become > >> >> > 0 > >> >> > and > >> >> > fail the assert. I probably need more time to see how this value > got > >> >> > in > >> >> > there, but again, since the exact re-run would pass sometimes, it's > >> >> > kind > >> >> > of > >> >> > hard to debug. > >> >> > > >> >> > Any idea? > >> >> > > >> >> > Here's the stacktrace: > >> >> > =============================== > >> >> > KALDI_ASSERT: at > >> >> > > >> >> > > >> >> > > nnet-train-parallel:PreconditionDirectionsAlphaRescaled:nnet-precondition.cc:128, > >> >> > failed: p_trace != 0.0 > >> >> > Stack trace is: > >> >> > kaldi::KaldiGetStackTrace() > >> >> > kaldi::KaldiAssertFailure_(char const*, char const*, int, char > >> >> > const*) > >> >> > > >> >> > > >> >> > > kaldi::nnet2::PreconditionDirectionsAlphaRescaled(kaldi::MatrixBase<float> > >> >> > const&, double, kaldi::MatrixBase<float>*) > >> >> > > >> >> > > >> >> > > kaldi::nnet2::AffineComponentPreconditioned::Update(kaldi::MatrixBase<float> > >> >> > const&, kaldi::MatrixBase<float> const&) > >> >> > kaldi::nnet2::AffineComponent::Backprop(kaldi::MatrixBase<float> > >> >> > const&, > >> >> > kaldi::MatrixBase<float> const&, kaldi::MatrixBase<float> const&, > >> >> > int, > >> >> > kaldi::nnet2::Component*, kaldi::Matrix<float>*) const > >> >> > > >> >> > > >> >> > > kaldi::nnet2::NnetUpdater::Backprop(std::vector<kaldi::nnet2::NnetTrainingExample, > >> >> > std::allocator<kaldi::nnet2::NnetTrainingExample> > const&, > >> >> > kaldi::Matrix<float>*) > >> >> > > >> >> > > >> >> > > kaldi::nnet2::NnetUpdater::ComputeForMinibatch(std::vector<kaldi::nnet2::NnetTrainingExample, > >> >> > std::allocator<kaldi::nnet2::NnetTrainingExample> > const&) > >> >> > kaldi::nnet2::DoBackprop(kaldi::nnet2::Nnet const&, > >> >> > std::vector<kaldi::nnet2::NnetTrainingExample, > >> >> > std::allocator<kaldi::nnet2::NnetTrainingExample> > const&, > >> >> > kaldi::nnet2::Nnet*) > >> >> > kaldi::nnet2::DoBackpropParallelClass::operator()() > >> >> > kaldi::MultiThreadable::run(void*) > >> >> > > >> >> > Ben > >> >> > > >> >> > > >> >> > On Mon, Sep 2, 2013 at 6:25 PM, Daniel Povey <dp...@gm...> > >> >> > wrote: > >> >> >> > >> >> >> That's how it's supposed to be-- AFAIK that's basically the point > of > >> >> >> Hogwild, that you allow these kinds of updates and accept the > >> >> >> possibility that due to race conditions you will occasionally > lose a > >> >> >> bit of date. The parameters only change slightly on the > timescales > >> >> >> that these different threads access them. > >> >> >> Dan > >> >> >> > >> >> >> > >> >> >> On Mon, Sep 2, 2013 at 6:01 PM, Ben Jiang <be...@ne...> > wrote: > >> >> >> > Hi all, > >> >> >> > > >> >> >> > While hunting some random error from nnet-train-parallel, I > >> >> >> > noticed > >> >> >> > the > >> >> >> > nnet_to_update is shared among the threads, but there is no > >> >> >> > synchronization > >> >> >> > checks when updating the components in the threads. I still > >> >> >> > haven't > >> >> >> > gone > >> >> >> > too deep in the code yet, but should there be synchronization > >> >> >> > checks? > >> >> >> > > >> >> >> > For example, the deriv variable in NnetUpdater::Backprop() is > >> >> >> > updated > >> >> >> > and > >> >> >> > passed between the components. 
Could this be an issue if the > >> >> >> > components > >> >> >> > are > >> >> >> > being updated by other threads? > >> >> >> > > >> >> >> > > >> >> >> > Or am I missing something totally? > >> >> >> > > >> >> >> > > >> >> >> > -- > >> >> >> > Thanks > >> >> >> > Ben > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > ------------------------------------------------------------------------------ > >> >> >> > Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, > >> >> >> > more! > >> >> >> > Discover the easy way to master current and previous Microsoft > >> >> >> > technologies > >> >> >> > and advance your career. Get an incredible 1,500+ hours of > >> >> >> > step-by-step > >> >> >> > tutorial videos with LearnDevNow. Subscribe today and save! > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk > >> >> >> > _______________________________________________ > >> >> >> > Kaldi-developers mailing list > >> >> >> > Kal...@li... > >> >> >> > https://lists.sourceforge.net/lists/listinfo/kaldi-developers > >> >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > > >> >> > -- > >> >> > Thanks > >> >> > Ben Jiang > >> >> > > >> >> > Co-Founder/Principal/CTO > >> >> > Nexiwave.com > >> >> > Tel: 226-975-2172 / 617-245-0916 > >> >> > "Confidential & Privileged: This email message is for the sole use > of > >> >> > the > >> >> > intended recipient(s) and may contain confidential and privileged > >> >> > information. Any unauthorized review, use, disclosure or > distribution > >> >> > is > >> >> > prohibited. if you are not the intended recipient, please contact > the > >> >> > sender > >> >> > by reply email and destroy all copies of the original message.” > >> > > >> > > >> > > >> > > >> > -- > >> > > >> > -- > >> > Thanks > >> > Ben Jiang > >> > > >> > Co-Founder/Principal/CTO > >> > Nexiwave.com > >> > Tel: 226-975-2172 / 617-245-0916 > >> > "Confidential & Privileged: This email message is for the sole use of > >> > the > >> > intended recipient(s) and may contain confidential and privileged > >> > information. Any unauthorized review, use, disclosure or distribution > is > >> > prohibited. if you are not the intended recipient, please contact the > >> > sender > >> > by reply email and destroy all copies of the original message.” > > > > > > > > > > -- > > > > -- > > Thanks > > Ben Jiang > > > > Co-Founder/Principal/CTO > > Nexiwave.com > > Tel: 226-975-2172 / 617-245-0916 > > "Confidential & Privileged: This email message is for the sole use of the > > intended recipient(s) and may contain confidential and privileged > > information. Any unauthorized review, use, disclosure or distribution is > > prohibited. if you are not the intended recipient, please contact the > sender > > by reply email and destroy all copies of the original message.” > -- -- Thanks Ben Jiang Co-Founder/Principal/CTO Nexiwave.com Tel: 226-975-2172 / 617-245-0916 "Confidential & Privileged: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. if you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.” |
From: Daniel P. <dp...@gm...> - 2013-09-03 00:07:07
Sorry, in rm/s5, it's local/run_nnet2.sh, in wsj/s5 it's local/run_nnet_cpu.sh Dan On Mon, Sep 2, 2013 at 8:04 PM, Ben Jiang <be...@ne...> wrote: > Ok, got it. Let me try 200k first. > > I just updated the trunk, but couldn't find run_nnet2.sh. Is it supposed to > be in wsj/s5/local/? > > > Thanks > Ben > > > > On Mon, Sep 2, 2013 at 7:52 PM, Daniel Povey <dp...@gm...> wrote: >> >> That log-prob per frame if -7.31 is too low, it should be something >> like -2, no lower-- maybe -3 on the 1st iteration. The size of your >> training data does not matter, what matters is the #samples you >> process per iteration. Maybe try reducing it from 400k (the default, >> I think) to 200k. Or use the newer example scripts where I think that >> is the default. (if you update the trunk and look at the example >> scripts run_nnet2.sh, you'll see what I mean). >> >> But definitely something is wrong here. >> >> Dan >> >> >> On Mon, Sep 2, 2013 at 7:47 PM, Ben Jiang <be...@ne...> wrote: >> > The nonlinearaty type should be the default in train_nnet_cpu.sh, which >> > should be tanh. The log-prob doesn't look too bad. Below is the output >> > from >> > a run that actually succeeded: >> > LOG >> > (nnet-train-parallel:DoBackpropParallel():nnet-update-parallel.cc:179) >> > Did backprop on 399889 examples, average log-prob per frame is -7.31817 >> > >> > The learning rates are 0.01 initial and 0.001 final. I kind of used the >> > value from swbd, but maybe my training data is quite bigger than swbd. >> > I >> > previously tried 0.001 and 0.0001, which also failed due to an error of >> > "Cannot invert: matrix is singular", but I didn't have debug on back >> > then, >> > so it's probably the same issue. Maybe I should try even smaller, such >> > as >> > 0.0001 and 0.00001? >> > >> > >> > Ben >> > >> > >> > >> > On Mon, Sep 2, 2013 at 6:55 PM, Daniel Povey <dp...@gm...> wrote: >> >> >> >> I think the underlying cause is instability in the training, causing >> >> the derivatives to become too large. This is something that commonly >> >> happens in neural net training, and the solution is generally to >> >> decrease the learning rate. What nonlinearity type are you using? >> >> And do the log-probs printed out in train.*.log or compute_prob_*.log >> >> get very negative? >> >> >> >> Unbounded nonlinearities such as ReLUs are more susceptible to this >> >> instability. >> >> Dan >> >> >> >> >> >> On Mon, Sep 2, 2013 at 6:50 PM, Ben Jiang <be...@ne...> wrote: >> >> > I see. Thanks for the fast response, Dan. >> >> > >> >> > So any idea on this "random" error I am stuck with at pass 27? I >> >> > have >> >> > pasted the stacktrace below. This error doesn't always happen, even >> >> > after >> >> > I removed the randomness introduced in the input mdl and shuffled >> >> > egs. >> >> > (eg, >> >> > save the input mdl and shuffled egs to files and re-run the failed >> >> > nnet-train-parallel from those files in debugger). The re-run would >> >> > sometimes fail and sometimes succeed. >> >> > >> >> > Anyway, I was able catch the error in my debugger and examine the >> >> > variables. >> >> > I think the reason is that the deriv variable in >> >> > NnetUpdater::Backprop() >> >> > contains some "bad" value, such as 1.50931703e+20. This caused the >> >> > trace of >> >> > the matrix to become infinite and in turn cause the p_trace to become >> >> > 0 >> >> > and >> >> > fail the assert. 
I probably need more time to see how this value got >> >> > in >> >> > there, but again, since the exact re-run would pass sometimes, it's >> >> > kind >> >> > of >> >> > hard to debug. >> >> > >> >> > Any idea? >> >> > >> >> > Here's the stacktrace: >> >> > =============================== >> >> > KALDI_ASSERT: at >> >> > >> >> > >> >> > nnet-train-parallel:PreconditionDirectionsAlphaRescaled:nnet-precondition.cc:128, >> >> > failed: p_trace != 0.0 >> >> > Stack trace is: >> >> > kaldi::KaldiGetStackTrace() >> >> > kaldi::KaldiAssertFailure_(char const*, char const*, int, char >> >> > const*) >> >> > >> >> > >> >> > kaldi::nnet2::PreconditionDirectionsAlphaRescaled(kaldi::MatrixBase<float> >> >> > const&, double, kaldi::MatrixBase<float>*) >> >> > >> >> > >> >> > kaldi::nnet2::AffineComponentPreconditioned::Update(kaldi::MatrixBase<float> >> >> > const&, kaldi::MatrixBase<float> const&) >> >> > kaldi::nnet2::AffineComponent::Backprop(kaldi::MatrixBase<float> >> >> > const&, >> >> > kaldi::MatrixBase<float> const&, kaldi::MatrixBase<float> const&, >> >> > int, >> >> > kaldi::nnet2::Component*, kaldi::Matrix<float>*) const >> >> > >> >> > >> >> > kaldi::nnet2::NnetUpdater::Backprop(std::vector<kaldi::nnet2::NnetTrainingExample, >> >> > std::allocator<kaldi::nnet2::NnetTrainingExample> > const&, >> >> > kaldi::Matrix<float>*) >> >> > >> >> > >> >> > kaldi::nnet2::NnetUpdater::ComputeForMinibatch(std::vector<kaldi::nnet2::NnetTrainingExample, >> >> > std::allocator<kaldi::nnet2::NnetTrainingExample> > const&) >> >> > kaldi::nnet2::DoBackprop(kaldi::nnet2::Nnet const&, >> >> > std::vector<kaldi::nnet2::NnetTrainingExample, >> >> > std::allocator<kaldi::nnet2::NnetTrainingExample> > const&, >> >> > kaldi::nnet2::Nnet*) >> >> > kaldi::nnet2::DoBackpropParallelClass::operator()() >> >> > kaldi::MultiThreadable::run(void*) >> >> > >> >> > Ben >> >> > >> >> > >> >> > On Mon, Sep 2, 2013 at 6:25 PM, Daniel Povey <dp...@gm...> >> >> > wrote: >> >> >> >> >> >> That's how it's supposed to be-- AFAIK that's basically the point of >> >> >> Hogwild, that you allow these kinds of updates and accept the >> >> >> possibility that due to race conditions you will occasionally lose a >> >> >> bit of date. The parameters only change slightly on the timescales >> >> >> that these different threads access them. >> >> >> Dan >> >> >> >> >> >> >> >> >> On Mon, Sep 2, 2013 at 6:01 PM, Ben Jiang <be...@ne...> wrote: >> >> >> > Hi all, >> >> >> > >> >> >> > While hunting some random error from nnet-train-parallel, I >> >> >> > noticed >> >> >> > the >> >> >> > nnet_to_update is shared among the threads, but there is no >> >> >> > synchronization >> >> >> > checks when updating the components in the threads. I still >> >> >> > haven't >> >> >> > gone >> >> >> > too deep in the code yet, but should there be synchronization >> >> >> > checks? >> >> >> > >> >> >> > For example, the deriv variable in NnetUpdater::Backprop() is >> >> >> > updated >> >> >> > and >> >> >> > passed between the components. Could this be an issue if the >> >> >> > components >> >> >> > are >> >> >> > being updated by other threads? >> >> >> > >> >> >> > >> >> >> > Or am I missing something totally? >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > Thanks >> >> >> > Ben >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > ------------------------------------------------------------------------------ >> >> >> > Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, >> >> >> > more! 
>> >> >> > Discover the easy way to master current and previous Microsoft >> >> >> > technologies >> >> >> > and advance your career. Get an incredible 1,500+ hours of >> >> >> > step-by-step >> >> >> > tutorial videos with LearnDevNow. Subscribe today and save! >> >> >> > >> >> >> > >> >> >> > >> >> >> > http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk >> >> >> > _______________________________________________ >> >> >> > Kaldi-developers mailing list >> >> >> > Kal...@li... >> >> >> > https://lists.sourceforge.net/lists/listinfo/kaldi-developers >> >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > >> >> > -- >> >> > Thanks >> >> > Ben Jiang >> >> > >> >> > Co-Founder/Principal/CTO >> >> > Nexiwave.com >> >> > Tel: 226-975-2172 / 617-245-0916 >> >> > "Confidential & Privileged: This email message is for the sole use of >> >> > the >> >> > intended recipient(s) and may contain confidential and privileged >> >> > information. Any unauthorized review, use, disclosure or distribution >> >> > is >> >> > prohibited. if you are not the intended recipient, please contact the >> >> > sender >> >> > by reply email and destroy all copies of the original message.” >> > >> > >> > >> > >> > -- >> > >> > -- >> > Thanks >> > Ben Jiang >> > >> > Co-Founder/Principal/CTO >> > Nexiwave.com >> > Tel: 226-975-2172 / 617-245-0916 >> > "Confidential & Privileged: This email message is for the sole use of >> > the >> > intended recipient(s) and may contain confidential and privileged >> > information. Any unauthorized review, use, disclosure or distribution is >> > prohibited. if you are not the intended recipient, please contact the >> > sender >> > by reply email and destroy all copies of the original message.” > > > > > -- > > -- > Thanks > Ben Jiang > > Co-Founder/Principal/CTO > Nexiwave.com > Tel: 226-975-2172 / 617-245-0916 > "Confidential & Privileged: This email message is for the sole use of the > intended recipient(s) and may contain confidential and privileged > information. Any unauthorized review, use, disclosure or distribution is > prohibited. if you are not the intended recipient, please contact the sender > by reply email and destroy all copies of the original message.” |
From: Daniel P. <dp...@gm...> - 2013-09-02 23:52:30
|
That log-prob per frame of -7.31 is too low; it should be something like -2, no lower -- maybe -3 on the first iteration. The size of your training data does not matter; what matters is the number of samples you process per iteration. Maybe try reducing it from 400k (the default, I think) to 200k. Or use the newer example scripts, where I think that is the default. (If you update the trunk and look at the example script run_nnet2.sh, you'll see what I mean.) But definitely something is wrong here.

Dan |
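A back-of-envelope sketch of the samples-per-iteration arithmetic Dan refers to, with made-up numbers and function names (this is not how Kaldi's scripts compute it, just the rough idea): for the same amount of data, halving the samples per iteration doubles the number of iterations, i.e. you get more, smaller parameter updates.

// Back-of-envelope sketch (hypothetical, not Kaldi code): how the number of
// training iterations falls out of the samples-per-iteration setting.
#include <cstdio>

int NumIterations(long long total_frames, int num_epochs,
                  long long samples_per_iter) {
  // Each iteration consumes roughly samples_per_iter frames, so shrinking it
  // gives more, smaller iterations over the same amount of data.
  return static_cast<int>((total_frames * num_epochs + samples_per_iter - 1)
                          / samples_per_iter);
}

int main() {
  long long frames = 100000000;  // hypothetical 100M training frames
  printf("400k per iter: %d iterations\n", NumIterations(frames, 10, 400000));
  printf("200k per iter: %d iterations\n", NumIterations(frames, 10, 200000));
  return 0;
}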
From: Ben J. <be...@ne...> - 2013-09-02 23:48:12
|
The nonlinearity type should be the default in train_nnet_cpu.sh, which should be tanh. The log-prob doesn't look too bad. Below is the output from a run that actually succeeded:

LOG (nnet-train-parallel:DoBackpropParallel():nnet-update-parallel.cc:179) Did backprop on 399889 examples, average log-prob per frame is -7.31817

The learning rates are 0.01 initial and 0.001 final. I basically used the values from swbd, but my training data may be quite a bit bigger than swbd. I previously tried 0.001 and 0.0001, which also failed, with a "Cannot invert: matrix is singular" error, but I didn't have debug on back then, so it's probably the same issue. Maybe I should try even smaller, such as 0.0001 and 0.00001?

Ben |
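For reference, a small sketch of the kind of schedule being discussed here: a learning rate that moves geometrically from an initial value to a final value over the iterations. This is only an illustration under that assumption; the exact schedule train_nnet_cpu.sh applies may differ.

// Hypothetical sketch of a geometric learning-rate schedule from an initial
// rate to a final rate over num_iters iterations.  Illustration only, not
// Kaldi's code.
#include <cmath>
#include <cstdio>

double LearningRate(double initial_lr, double final_lr,
                    int iter, int num_iters) {
  double frac = static_cast<double>(iter) / num_iters;   // 0.0 .. 1.0
  return initial_lr * std::pow(final_lr / initial_lr, frac);
}

int main() {
  // With 0.01 -> 0.001 the rate decays by 10x over training; a smaller pair
  // such as 0.001 -> 0.0001 simply shifts the whole curve down by 10x.
  for (int i = 0; i <= 20; i += 5)
    printf("iter %2d: lr = %.5f\n", i, LearningRate(0.01, 0.001, i, 20));
  return 0;
}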
From: Daniel P. <dp...@gm...> - 2013-09-02 22:55:13
|
I think the underlying cause is instability in the training, causing the derivatives to become too large. This is something that commonly happens in neural net training, and the solution is generally to decrease the learning rate. What nonlinearity type are you using? And do the log-probs printed out in train.*.log or compute_prob_*.log get very negative?

Unbounded nonlinearities such as ReLUs are more susceptible to this instability.

Dan |
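Dan's diagnosis is that the backprop derivatives blow up. One generic guard against that -- shown purely as an illustration, not as the mechanism Kaldi's nnet2 code uses -- is to rescale any derivative whose norm exceeds a cap before applying it, which limits the damage a single bad minibatch can do.

// Illustration only (generic code, not Kaldi's): rescale a derivative whose
// norm exceeds a cap, keeping its direction but limiting its magnitude.
#include <cmath>
#include <cstdio>
#include <vector>

void ClipDerivative(std::vector<float> *deriv, float max_norm) {
  double sumsq = 0.0;
  for (float v : *deriv) sumsq += static_cast<double>(v) * v;
  double norm = std::sqrt(sumsq);
  if (norm > max_norm && norm > 0.0) {
    float scale = static_cast<float>(max_norm / norm);
    for (float &v : *deriv) v *= scale;   // shrink, keep direction
  }
}

int main() {
  std::vector<float> deriv = {3.0f, 4.0f};   // norm 5
  ClipDerivative(&deriv, 1.0f);              // rescaled to norm 1
  printf("%f %f\n", deriv[0], deriv[1]);     // ~0.6 0.8
  return 0;
}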
From: Ben J. <be...@ne...> - 2013-09-02 22:51:01
|
I see. Thanks for the fast response, Dan.

So any idea on this "random" error I am stuck with at pass 27? I have pasted the stack trace below. This error doesn't always happen, even after I removed the randomness introduced in the input mdl and shuffled egs (e.g., saving the input mdl and shuffled egs to files and re-running the failed nnet-train-parallel from those files in the debugger). The re-run would sometimes fail and sometimes succeed.

Anyway, I was able to catch the error in my debugger and examine the variables. I think the reason is that the deriv variable in NnetUpdater::Backprop() contains some "bad" value, such as 1.50931703e+20. This caused the trace of the matrix to become infinite, which in turn caused p_trace to become 0 and fail the assert. I probably need more time to see how this value got in there, but again, since the exact re-run would pass sometimes, it's kind of hard to debug.

Any idea?

Here's the stacktrace:
===============================
KALDI_ASSERT: at nnet-train-parallel:PreconditionDirectionsAlphaRescaled:nnet-precondition.cc:128, failed: p_trace != 0.0
Stack trace is:
kaldi::KaldiGetStackTrace()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
kaldi::nnet2::PreconditionDirectionsAlphaRescaled(kaldi::MatrixBase<float> const&, double, kaldi::MatrixBase<float>*)
kaldi::nnet2::AffineComponentPreconditioned::Update(kaldi::MatrixBase<float> const&, kaldi::MatrixBase<float> const&)
kaldi::nnet2::AffineComponent::Backprop(kaldi::MatrixBase<float> const&, kaldi::MatrixBase<float> const&, kaldi::MatrixBase<float> const&, int, kaldi::nnet2::Component*, kaldi::Matrix<float>*) const
kaldi::nnet2::NnetUpdater::Backprop(std::vector<kaldi::nnet2::NnetTrainingExample, std::allocator<kaldi::nnet2::NnetTrainingExample> > const&, kaldi::Matrix<float>*)
kaldi::nnet2::NnetUpdater::ComputeForMinibatch(std::vector<kaldi::nnet2::NnetTrainingExample, std::allocator<kaldi::nnet2::NnetTrainingExample> > const&)
kaldi::nnet2::DoBackprop(kaldi::nnet2::Nnet const&, std::vector<kaldi::nnet2::NnetTrainingExample, std::allocator<kaldi::nnet2::NnetTrainingExample> > const&, kaldi::nnet2::Nnet*)
kaldi::nnet2::DoBackpropParallelClass::operator()()
kaldi::MultiThreadable::run(void*)

Ben |
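For what it's worth, a value like 1.50931703e+20 is already near the edge of what single precision can handle: squaring it overflows FLT_MAX (about 3.4e38) to +inf, so any trace built from products of such entries goes infinite, and a quantity normalized by that trace collapses to exactly zero -- one plausible route to tripping an assert like p_trace != 0.0. A tiny stand-alone demonstration (hypothetical code, not Kaldi's):

// Demonstration (hypothetical, not Kaldi code) of how one huge derivative
// entry pushes a float computation to +inf, after which anything normalized
// by it becomes exactly zero.
#include <cmath>
#include <cstdio>

int main() {
  float bad = 1.50931703e+20f;          // the kind of value reported above
  float sq = bad * bad;                 // ~2.3e40 overflows FLT_MAX (~3.4e38)
  printf("bad*bad = %g (inf? %d)\n", sq, (int)std::isinf(sq));
  float trace = sq;                     // one inf entry makes the trace inf
  float rescaled = 1.0f / trace;        // normalizing by inf gives exactly 0
  printf("1/trace = %g\n", rescaled);   // -> 0, which an assert such as
                                        //    p_trace != 0.0 would catch
  return 0;
}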
From: Daniel P. <dp...@gm...> - 2013-09-02 22:25:45
|
That's how it's supposed to be -- AFAIK that's basically the point of Hogwild: you allow these kinds of updates and accept the possibility that due to race conditions you will occasionally lose a bit of data. The parameters only change slightly on the timescales at which these different threads access them.

Dan |
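A minimal sketch of the Hogwild idea Dan describes, using plain std::thread and a toy update rule rather than Kaldi's nnet2 classes: several threads write gradient steps into one shared parameter vector with no locking, and the occasional update lost to a race is simply tolerated.

// Minimal Hogwild-style sketch (toy code, not Kaldi's implementation):
// several threads update a shared parameter vector with no locking at all.
// Formally the unsynchronized writes are a data race; Hogwild-style training
// accepts the resulting imprecision because each individual update is tiny.
#include <cstdio>
#include <thread>
#include <vector>

int main() {
  std::vector<float> params(1000, 0.0f);     // shared model parameters
  const int kThreads = 4, kSteps = 100000;
  const float kLearningRate = 1e-4f;

  auto worker = [&](unsigned int seed) {
    unsigned int r = seed;
    for (int step = 0; step < kSteps; ++step) {
      int i = static_cast<int>((r = r * 1103515245u + 12345u) % params.size());
      // Toy gradient step toward 1.0 on one coordinate; with no mutex, two
      // threads hitting the same index at once may lose one of the updates.
      params[i] += kLearningRate * (1.0f - params[i]);
    }
  };

  std::vector<std::thread> pool;
  for (int t = 0; t < kThreads; ++t) pool.emplace_back(worker, t + 1u);
  for (auto &th : pool) th.join();
  printf("params[0] after training: %f\n", params[0]);
  return 0;
}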
From: Ben J. <be...@ne...> - 2013-09-02 22:01:28
|
Hi all,

While hunting down a random error from nnet-train-parallel, I noticed that nnet_to_update is shared among the threads, but there are no synchronization checks when updating the components in the threads. I haven't gone too deep into the code yet, but should there be synchronization checks?

For example, the deriv variable in NnetUpdater::Backprop() is updated and passed between the components. Could this be an issue if the components are being updated by other threads?

Or am I missing something totally?

--
Thanks
Ben |
From: Daniel P. <dp...@gm...> - 2013-08-28 16:01:46
|
That does sound like a memory leak -- try running it for just a few utterances with valgrind. It's possible the memory leak happens only under certain rare circumstances, so if you don't see it there you may have to run for a thousand utterances or so and see if valgrind reports a leak.

Dan

On Wed, Aug 28, 2013 at 3:48 PM, Li Peng <lip...@gm...> wrote:
> Sorry, I don't know the details of how the program works, so I'm not sure how to tune the options properly.
>
> The memory consumption grows little by little as the number of processed utterances increases. When the program runs for one day, it grows from the starting 100MB to several GB.
>
> On Wed, Aug 28, 2013 at 9:33 PM, "Arnab Ghoshal" <ar...@gm...> wrote:
>> It is possible for the lattice generation to take a lot of memory. Have you tried changing the --max-mem, --lattice-beam, --beam options to see if it runs properly? Also, does this happen for a particular utterance or all utterances?
>> -Arnab
>>
>> On Wed, Aug 28, 2013 at 7:22 AM, Li Peng <lip...@gm...> wrote:
>> > Hi,
>> >
>> > When I used latgen-faster-mapped with a DNN model to generate lattices, I observed that the memory usage keeps growing and eventually exhausts the system's memory. I tried to use valgrind to find out if there is a memory leak, but got no clues. So I am writing to report this problem, though I'm not sure whether there is a bug or it is just my own case.
>> >
>> > Best regards,
>> >
>> > Li Peng |
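The growth pattern Li Peng describes -- memory creeping up with every utterance processed -- is typically some per-utterance allocation that a code path never frees. A toy illustration of that pattern (deliberately nothing to do with Kaldi's actual decoder) and the kind of thing valgrind's --leak-check=full report would flag:

// Toy illustration (not Kaldi code) of a per-utterance leak that shows up as
// slowly growing memory and that valgrind would report, e.g.:
//   valgrind --leak-check=full ./decode_toy
#include <cstdio>

struct Lattice { float data[4096]; };   // stand-in for per-utterance output

Lattice *DecodeUtterance(int utt) {
  Lattice *lat = new Lattice();         // allocated for every utterance
  lat->data[0] = static_cast<float>(utt);
  return lat;
}

int main() {
  for (int utt = 0; utt < 1000; ++utt) {
    Lattice *lat = DecodeUtterance(utt);
    printf("utt %d: %f\n", utt, lat->data[0]);
    // Missing "delete lat;" -- each utterance leaks ~16 KB, so memory grows
    // steadily with the number of utterances processed, as described above.
  }
  return 0;
}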