Re: [Kaldi-users] LibriSpeech nnet2 model: training more on a new dataset


To make sure it all makes sense: bit rate is not a property of an *audio* stream; it is a property of an encoded, *network* stream. It tells you how much bandwidth is needed to transmit the audio, and/or how much space it takes at rest. You should not worry about the bit rate at all.

Audio is determined by its sample rate, sample bit width and sample format. The latter is loosely a way of interpreting the N bits of the sample: is it a signed or unsigned int? A float maybe? The bit rate is something one worries about when transmitting (compressed) audio over the network. The same 16-bit, 44.1 kHz audio track may be compressed into a 128 kbps MP3 stream or a 64 kbps one, at the expense of decoded quality. For uncompressed PCM audio the "bit rate" is simply the sample rate multiplied by the number of bits in a sample (and by the number of channels, if there is more than one); it is a derived quantity that you are unlikely to need to specify.
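
To make the "derived quantity" point concrete, here is a minimal Python sketch of that arithmetic. The function name pcm_bit_rate is made up for illustration, and the channel-count factor is the usual convention for multi-channel PCM rather than something taken from any tool's output.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM audio, in bits per second.

    Hypothetical helper: nothing is measured here, the value is
    simply derived from the three format parameters.
    """
    return sample_rate_hz * bits_per_sample * channels


# 16 kHz, 16-bit, mono (the LibriSpeech format)
print(pcm_bit_rate(16000, 16))      # 256000

# 44.1 kHz, 16-bit, stereo (CD audio)
print(pcm_bit_rate(44100, 16, 2))   # 1411200
```

Since the number follows entirely from the sample rate, bit width and channel count, there is nothing extra to choose when writing plain PCM WAV files.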

soxi is arguably the easiest tool to show audio format data:

$ soxi /data/LibriSpeech/train-clean-100/103/1241/103-1241-0001.flac
Input File     : '103-1241-0001.flac'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:15.55 = 248800 samples ~ 1166.25 CDDA sectors
File Size      : 272k
Bit Rate       : 140k
Sample Encoding: 16-bit FLAC

What you really want to know is the sample rate (16000) and the sample bit width (16). You hardly care about the bit rate (140k) at all. It is only an indication of how tightly the file is compressed: 16000*16 = 256000 bit/s are packed by flac into about 140000 bit/s for storage and transmission, a little under 2× compression. So what? We process only uncompressed audio data anyway. MP3 compresses much more tightly but, unlike flac, it is lossy.
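
The same numbers can be checked with a few lines of Python. This is only a sketch: the helper names are made up, the inputs are the rounded figures from the soxi output above, and I am assuming the "272k" file size means roughly 272000 bytes.

```python
def average_bit_rate(file_size_bytes, duration_s):
    """Average stored bit rate of a file, in bits per second."""
    return file_size_bytes * 8 / duration_s


def compression_ratio(sample_rate_hz, bits_per_sample, channels, stored_bit_rate):
    """Raw PCM bit rate divided by the stored (compressed) bit rate."""
    raw_bit_rate = sample_rate_hz * bits_per_sample * channels
    return raw_bit_rate / stored_bit_rate


# Rounded values from the soxi output: 272k file size, 15.55 s duration.
stored = average_bit_rate(272_000, 15.55)
print(f"{stored / 1000:.0f}k")      # 140k

# 16 kHz, 16-bit, mono audio stored at about 140 kbit/s.
ratio = compression_ratio(16000, 16, 1, 140_000)
print(f"{ratio:.2f}")               # 1.83
```

So the reported bit rate is just file size over duration, and it says nothing about the format of the decoded audio.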

$ soxi ~/08-01.mp3
Input File     : '/home/kkm/08-01.mp3'
Channels       : 2
Sample Rate    : 44100
Precision      : 16-bit
Duration       : 00:08:13.73 = 21773273 samples = 37029.4 CDDA sectors
File Size      : 19.8M
Bit Rate       : 320k
Sample Encoding: MPEG audio (layer I, II or III)
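
As for the conversion itself, here is a hedged sketch of how one might build a sox command that turns such an MP3 into 16 kHz, 16-bit, mono WAV to match the LibriSpeech format above. The helper name and file names are made up, mono output is my assumption based on the LibriSpeech file shown earlier, and sox must have been built with MP3 support for this to work. Note that there is no bit rate option anywhere: only the sample rate (-r), bit width (-b) and channel count (-c) of the output are given.

```python
import shlex


def sox_convert_command(src, dst, sample_rate_hz=16000, bits=16, channels=1):
    """Build (but do not run) a sox command converting src to PCM audio.

    Hypothetical helper. Format options placed before the output file
    name apply to the output file.
    """
    return [
        "sox", src,
        "-r", str(sample_rate_hz),  # output sample rate
        "-b", str(bits),            # output sample bit width
        "-c", str(channels),        # output channel count (stereo is mixed down)
        dst,
    ]


cmd = sox_convert_command("08-01.mp3", "08-01.wav")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Running soxi on the resulting WAV file should then report the sample rate and precision you asked for.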

 -kkm

> -----Original Message-----
> From: Jonathan L [mailto:jon...@gm...]
> Sent: 2015-07-03 1045
> To: Vijayaditya Peddinti
> Cc: soroush mehri; kal...@li...
> Subject: Re: [Kaldi-users] LibriSpeech nnet2 model: training more on a
> new dataset
> 
> The data I want to train on is in MP3 format at a 128kbps bitrate and a
> 44.1kHz sample rate. The LibriSpeech data has a 16kHz sample rate, but
> doesn't seem to have a specified bitrate. When I convert the MP3 files
> into 16kHz sample-rate WAV files, what bitrate should I convert them
> to?
> 
> Is there anything else I should consider when converting the speech
> files?
> 
> On Mon, Jun 29, 2015 at 12:24 PM, Vijayaditya Peddinti
> <p.v...@gm...> wrote:
> 
> 
> 	You need to provide the egs directory, not exp directory. You can
> check stage -3 of steps/nnet2/train_multisplice_accel2.sh to see how
> egs directory can be created from the alignment and data directories.
> 	The context variables necessary for creating these examples can
> be found in nnet_ms_a_online/conf/splice.conf file.
> 
> 	Vijay
> 
> 	On Mon, Jun 29, 2015 at 9:14 AM, Jonathan L
> <jon...@gm...> wrote:
> 
> 
> 		The train_more*.sh scripts accept an 'exp' directory
> instead of a 'data/train' directory. Is there another script that would
> accept the 'data/train' directory as input instead?
> 
> 		On Mon, Jun 29, 2015 at 12:08 PM, Vijayaditya Peddinti
> <p.v...@gm...> wrote:
> 
> 
> 			See the scripts steps/nnet2/train_more*.sh
> 
> 			Vijay
> 
> 			On Mon, Jun 29, 2015 at 9:02 AM, Jonathan L
> <jon...@gm...> wrote:
> 
> 
> 
> 				I'm looking to further train an existing
> LibriSpeech nnet2_a_online model on a new dataset.
> 
> 				I have prepared the files for this new dataset
> inside a data/train directory, as described in the Data Preparation
> tutorial. I want to keep the nnet2_a_online model initialized to the
> parameters it learned from training on LibriSpeech, but continue its
> training on this new dataset. Is there a script that would allow me to
> specify the nnet2_a_online model and the dataset's data/train directory
> as input in order to output a model that has been trained more on this
> new dataset?
> 
> 
> _______________________________________________
> 				Kaldi-users mailing list
> 				Kal...@li...
> 
> https://lists.sourceforge.net/lists/listinfo/kaldi-users
> 
> 
> 
> 
> 
>