From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-18 11:27:03
Hi all, I have added a script "install_sctk_patched.sh" in tools/extras for smooth sctk-2.4.0 installation under Cygwin. The corresponding tools/Makefile has been changed to reflect it.
Ricky

On Tue, Jul 9, 2013 at 12:21 AM, Daniel Povey <dp...@gm...> wrote:
> Thanks, everyone!
> Dan
>
> On Mon, Jul 8, 2013 at 4:59 AM, ondrej platek <ond...@se...> wrote:
> > I just checked the results for my modified Voxforge-like recipe.
> > Everything worked: training, decoding, evaluation.
> >
> > My configuration: Ubuntu 10.04, using OpenBLAS and the shared flag:
> > ./configure --openblas-root=`pwd`/../tools/OpenBLAS/install --fst-root=`pwd`/../tools/openfst --shared
> >
> > Ondra
> >
> > On Mon, Jul 8, 2013 at 7:54 AM, Ho Yin Chan <ric...@gm...> wrote:
> >> Simulated mode in the online decoding demo ran fine on CentOS too.
> >> Ricky
> >>
> >> On Sun, Jul 7, 2013 at 10:07 PM, Vassil Panayotov <vas...@gm...> wrote:
> >>> The compilation (including "make ext") is working OK for me too on Ubuntu 10.04.
> >>> I've only tried to run the online decoders (voxforge/online_demo) so far; everything seems to be fine with them.
> >>> Vassil
> >>>
> >>> On Sun, Jul 7, 2013 at 5:31 AM, Daniel Povey <dp...@gm...> wrote:
> >>> > Everyone,
> >>> > I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej Platek and others have been working on different build scripts that now support a shared-library option. If anyone can test it and make sure it still works for them, it would be great.
> >>> > If people have made local changes to their Makefiles, they may get conflicts.
> >>> > Dan
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-16 19:52:02
Thanks.
Nathan
On Jul 16, 2013, at 12:39 PM, Daniel Povey wrote:
> The issue seems to be script incompatibility: the yesno example is
> based on the older "s3" scripts. The "s5" ones are recommended and
> aren't compatible with the older ones.
>
> You could probably adapt the Switchboard setup, although you'd have to
> mess with the data preparation scripts a bit and figure out how to
> build the language model and the dictionary. The s5 scripts are all
> basically the same-- only the data preparation and things like the
> number of Gaussians differ.
> Dan
>
>
>
> On Tue, Jul 16, 2013 at 3:37 PM, Nathan Dunn <nd...@ca...> wrote:
>>
>> I adapted from something a grad student had written using a combination of rm/s5 and quite possibly yesno. The more I read through this, the more I'm thinking that I need to rewrite it.
>>
>> Would you suggest basing it on switchboard?
>>
>> I have Switchboard-2 Phase II LDC97S62 versus LDC97S62 Switchboard-1 Release II. I'm assuming that it could be adapted without too much effort?
>>
>> Nathan
>>
>>
>> On Jul 16, 2013, at 12:24 PM, Daniel Povey wrote:
>>
>>> Are you using an older script? When I look at the current scripts
>>> (s5/), I see things like this:
>>>
>>> if [ $stage -le -3 ] && $train_tree; then
>>> echo "$0: Getting questions for tree clustering."
>>> # preparing questions, roots file...
>>> cluster-phones $context_opts $dir/treeacc $lang/phones/sets.int
>>> $dir/questions.int 2> $dir/log/questions.log || exit 1;
>>> cat $lang/phones/extra_questions.int >> $dir/questions.int
>>> compile-questions $context_opts $lang/topo $dir/questions.int
>>> $dir/questions.qst 2>$dir/log/compile_questions.log || exit 1;
>>> ...
>>>
>>> Where did you get this setup?
>>> Dan
>>>
>>>
>>> On Tue, Jul 16, 2013 at 3:21 PM, Nathan Dunn <nd...@ca...> wrote:
>>>>
>>>> I'm having some issues compiling questions (# error below):
>>>>
>>>> cat $lang/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>>>> cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
>>>> scripts/int2sym.pl $lang/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
>>>> ## this next line goes boom
>>>> compile-questions $lang/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
>>>>
>>>> So the issue is that the topo only has the first 323 symbols. The difference is the disambiguation symbols (#0 ... #27). I tried hacking the disambiguation phones into topo, but then I got complaints about my language directory.
>>>>
>>>> I can of course remove the disambiguation symbols:
>>>>
>>>> cat $lang/phones.txt | grep -v "^#" | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>>>>
>>>> but I'm not sure if that is the right thing to do in this instance, or if it is correct overall.
>>>>
>>>> Thanks,
>>>>
>>>> Nathan
>>>>
>>>>
>>>> === ERROR . . . compile_questions.log ===
>>>>
>>>> compile-questions data/lang/topo exp/tri1/questions.txt exp/tri1/questions.qst
>>>> WARNING (compile-questions:ProcessTopo():compile-questions.cc:36) ProcessTopo: phones seen in questions differ from those in topology: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 ]
>>>> vs. [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 ]
>>>>
>>>> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>>>> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>>>>
>>>> [stack trace: ]
>>>> 0 compile-questions 0x0000000109711f2b _ZN5kaldi18KaldiGetStackTraceEv + 59
>>>> 1 compile-questions 0x00000001097122c1 _ZN5kaldi17KaldiErrorMessageD1Ev + 241
>>>> 2 compile-questions 0x000000010966c4c4 _ZN5kaldi11ProcessTopoERKNS_11HmmTopologyERKSt6vectorIS3_IiSaIiEESaIS5_EE + 1284
>>>> 3 compile-questions 0x000000010966db79 main + 4409
>>>>
>>>>
>>>>
>>>>
>>>> Nathan
>>>>
>>
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-16 19:39:46
The issue seems to be script incompatibility: the yesno example is
based on the older "s3" scripts. The "s5" ones are recommended and
aren't compatible with the older ones.
You could probably adapt the Switchboard setup, although you'd have to
mess with the data preparation scripts a bit and figure out how to
build the language model and the dictionary. The s5 scripts are all
basically the same-- only the data preparation and things like the
number of Gaussians differ.
Dan
On Tue, Jul 16, 2013 at 3:37 PM, Nathan Dunn <nd...@ca...> wrote:
>
> I adapted from something a grad student had written using a combination of rm/s5 and quite possibly yesno. The more I read through this, the more I'm thinking that I need to rewrite it.
>
> Would you suggest basing it on switchboard?
>
> I have Switchboard-2 Phase II LDC97S62 versus LDC97S62 Switchboard-1 Release II. I'm assuming that it could be adapted without too much effort?
>
> Nathan
>
>
> On Jul 16, 2013, at 12:24 PM, Daniel Povey wrote:
>
>> Are you using an older script? When I look at the current scripts
>> (s5/), I see things like this:
>>
>> if [ $stage -le -3 ] && $train_tree; then
>> echo "$0: Getting questions for tree clustering."
>> # preparing questions, roots file...
>> cluster-phones $context_opts $dir/treeacc $lang/phones/sets.int
>> $dir/questions.int 2> $dir/log/questions.log || exit 1;
>> cat $lang/phones/extra_questions.int >> $dir/questions.int
>> compile-questions $context_opts $lang/topo $dir/questions.int
>> $dir/questions.qst 2>$dir/log/compile_questions.log || exit 1;
>> ...
>>
>> Where did you get this setup?
>> Dan
>>
>>
>> On Tue, Jul 16, 2013 at 3:21 PM, Nathan Dunn <nd...@ca...> wrote:
>>>
>>> I'm having some issues compiling questions (# error below):
>>>
>>> cat $lang/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>>> cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
>>> scripts/int2sym.pl $lang/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
>>> ## this next line goes boom
>>> compile-questions $lang/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
>>>
>>> So the issue is that the topo only has the first 323 symbols. The difference is the disambiguation symbols (#0 ... #27). I tried hacking the disambiguation phones into topo, but then I got complaints about my language directory.
>>>
>>> I can of course remove the disambiguation symbols:
>>>
>>> cat $lang/phones.txt | grep -v "^#" | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>>>
>>> but I'm not sure if that is the right thing to do in this instance, or if it is correct overall.
>>>
>>> Thanks,
>>>
>>> Nathan
>>>
>>>
>>> === ERROR . . . compile_questions.log ===
>>>
>>> compile-questions data/lang/topo exp/tri1/questions.txt exp/tri1/questions.qst
>>> WARNING (compile-questions:ProcessTopo():compile-questions.cc:36) ProcessTopo: phones seen in questions differ from those in topology: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 ]
>>> vs. [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 ]
>>>
>>> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>>> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>>>
>>> [stack trace: ]
>>> 0 compile-questions 0x0000000109711f2b _ZN5kaldi18KaldiGetStackTraceEv + 59
>>> 1 compile-questions 0x00000001097122c1 _ZN5kaldi17KaldiErrorMessageD1Ev + 241
>>> 2 compile-questions 0x000000010966c4c4 _ZN5kaldi11ProcessTopoERKNS_11HmmTopologyERKSt6vectorIS3_IiSaIiEESaIS5_EE + 1284
>>> 3 compile-questions 0x000000010966db79 main + 4409
>>>
>>>
>>>
>>>
>>> Nathan
>>>
>
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-16 19:37:26
I adapted from something a grad student had written using a combination of rm/s5 and quite possibly yesno. The more I read through this, the more I'm thinking that I need to rewrite it.
Would you suggest basing it on switchboard?
I have Switchboard-2 Phase II LDC97S62 versus LDC97S62 Switchboard-1 Release II. I'm assuming that it could be adapted without too much effort?
Nathan
On Jul 16, 2013, at 12:24 PM, Daniel Povey wrote:
> Are you using an older script? When I look at the current scripts
> (s5/), I see things like this:
>
> if [ $stage -le -3 ] && $train_tree; then
> echo "$0: Getting questions for tree clustering."
> # preparing questions, roots file...
> cluster-phones $context_opts $dir/treeacc $lang/phones/sets.int
> $dir/questions.int 2> $dir/log/questions.log || exit 1;
> cat $lang/phones/extra_questions.int >> $dir/questions.int
> compile-questions $context_opts $lang/topo $dir/questions.int
> $dir/questions.qst 2>$dir/log/compile_questions.log || exit 1;
> ...
>
> Where did you get this setup?
> Dan
>
>
> On Tue, Jul 16, 2013 at 3:21 PM, Nathan Dunn <nd...@ca...> wrote:
>>
>> I'm having some issues compiling questions (# error below):
>>
>> cat $lang/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>> cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
>> scripts/int2sym.pl $lang/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
>> ## this next line goes boom
>> compile-questions $lang/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
>>
>> So the issue is that the topo only has the first 323 symbols. The difference is the disambiguation symbols (#0 ... #27). I tried hacking the disambiguation phones into topo, but then I got complaints about my language directory.
>>
>> I can of course remove the disambiguation symbols:
>>
>> cat $lang/phones.txt | grep -v "^#" | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>>
>> but I'm not sure if that is the right thing to do in this instance, or if it is correct overall.
>>
>> Thanks,
>>
>> Nathan
>>
>>
>> === ERROR . . . compile_questions.log ===
>>
>> compile-questions data/lang/topo exp/tri1/questions.txt exp/tri1/questions.qst
>> WARNING (compile-questions:ProcessTopo():compile-questions.cc:36) ProcessTopo: phones seen in questions differ from those in topology: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 ]
>> vs. [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 ]
>>
>> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>>
>> [stack trace: ]
>> 0 compile-questions 0x0000000109711f2b _ZN5kaldi18KaldiGetStackTraceEv + 59
>> 1 compile-questions 0x00000001097122c1 _ZN5kaldi17KaldiErrorMessageD1Ev + 241
>> 2 compile-questions 0x000000010966c4c4 _ZN5kaldi11ProcessTopoERKNS_11HmmTopologyERKSt6vectorIS3_IiSaIiEESaIS5_EE + 1284
>> 3 compile-questions 0x000000010966db79 main + 4409
>>
>>
>>
>>
>> Nathan
>>
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-16 19:24:25
Are you using an older script? When I look at the current scripts
(s5/), I see things like this:
if [ $stage -le -3 ] && $train_tree; then
  echo "$0: Getting questions for tree clustering."
  # preparing questions, roots file...
  cluster-phones $context_opts $dir/treeacc $lang/phones/sets.int \
    $dir/questions.int 2> $dir/log/questions.log || exit 1;
  cat $lang/phones/extra_questions.int >> $dir/questions.int
  compile-questions $context_opts $lang/topo $dir/questions.int \
    $dir/questions.qst 2>$dir/log/compile_questions.log || exit 1;
  ...
Where did you get this setup?
Dan
On Tue, Jul 16, 2013 at 3:21 PM, Nathan Dunn <nd...@ca...> wrote:
>
> I'm having some issues compiling questions (# error below):
>
> cat $lang/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
> cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
> scripts/int2sym.pl $lang/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
> ## this next line goes boom
> compile-questions $lang/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
>
> So the issue is that the topo only has the first 323 symbols. The difference is the disambiguation symbols (#0 ... #27). I tried hacking the disambiguation phones into topo, but then I got complaints about my language directory.
>
> I can of course remove the disambiguation symbols:
>
> cat $lang/phones.txt | grep -v "^#" | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>
> but I'm not sure if that is the right thing to do in this instance, or if it is correct overall.
>
> Thanks,
>
> Nathan
>
>
> === ERROR . . . compile_questions.log ===
>
> compile-questions data/lang/topo exp/tri1/questions.txt exp/tri1/questions.qst
> WARNING (compile-questions:ProcessTopo():compile-questions.cc:36) ProcessTopo: phones seen in questions differ from those in topology: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 ]
> vs. [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 ]
>
> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>
> [stack trace: ]
> 0 compile-questions 0x0000000109711f2b _ZN5kaldi18KaldiGetStackTraceEv + 59
> 1 compile-questions 0x00000001097122c1 _ZN5kaldi17KaldiErrorMessageD1Ev + 241
> 2 compile-questions 0x000000010966c4c4 _ZN5kaldi11ProcessTopoERKNS_11HmmTopologyERKSt6vectorIS3_IiSaIiEESaIS5_EE + 1284
> 3 compile-questions 0x000000010966db79 main + 4409
>
>
>
>
> Nathan
>
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-16 19:21:16
I'm having some issues compiling questions (# error below):
cat $lang/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl $lang/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
## this next line goes boom
compile-questions $lang/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
So the issue is that the topo only has the first 323 symbols. The difference is the disambiguation symbols (#0 ... #27). I tried hacking the disambiguation phones into topo, but then I got complaints about my language directory.
I can of course remove the disambiguation symbols:
cat $lang/phones.txt | grep -v "^#" | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
but I'm not sure if that is the right thing to do in this instance, or if it is correct overall.
Thanks,
Nathan
=== ERROR . . . compile_questions.log ===
compile-questions data/lang/topo exp/tri1/questions.txt exp/tri1/questions.qst
WARNING (compile-questions:ProcessTopo():compile-questions.cc:36) ProcessTopo: phones seen in questions differ from those in topology: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 ]
vs. [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 ]
ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
[stack trace: ]
0 compile-questions 0x0000000109711f2b _ZN5kaldi18KaldiGetStackTraceEv + 59
1 compile-questions 0x00000001097122c1 _ZN5kaldi17KaldiErrorMessageD1Ev + 241
2 compile-questions 0x000000010966c4c4 _ZN5kaldi11ProcessTopoERKNS_11HmmTopologyERKSt6vectorIS3_IiSaIiEESaIS5_EE + 1284
3 compile-questions 0x000000010966db79 main + 4409
Nathan
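As a side note on the workaround in the message above, the effect of the extra grep -v "^#" can be seen on a toy phones.txt (the file and paths below are invented purely for illustration, not taken from this setup): the IDs present only in the unfiltered list are exactly the disambiguation-symbol IDs that the topology does not know about.

```shell
# Toy phones.txt: <eps>, three real phones, and two disambiguation symbols.
lang=$(mktemp -d)
printf '<eps> 0\na 1\nb 2\nc 3\n#0 4\n#1 5\n' > $lang/phones.txt

# Unfiltered phone list (what the failing run used): keeps the # symbols.
awk '{print $NF}' $lang/phones.txt | grep -v -w 0 > $lang/all.list
# Filtered list: drop disambiguation symbols before building questions.
grep -v '^#' $lang/phones.txt | awk '{print $NF}' | grep -v -w 0 > $lang/real.list

# IDs in the first list but not the second are the disambiguation-symbol IDs.
extra=$(comm -23 $lang/all.list $lang/real.list)
echo $extra    # prints: 4 5
```

On the real 351-symbol phones.txt described above, the same comparison would list the 28 IDs (324-351) that compile-questions complains about.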
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 23:25:32
Alright, I updated the output, which looks closer to what I want, but I'm a little unclear how to pull stuff out of this:

lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-to-phone-lattice exp/tri2a/final.mdl ark:- ark,t:- | utils/int2sym.pl -f 3 g300_lang/phones.txt
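For what it's worth, arc lines in a text-format lattice have the form "state next-state label weight", so one quick way to pull just the labels out of output like this is to print field 3 of each 4-field line. The snippet below is a sketch with toy input standing in for the real pipeline output, not something from the thread itself:

```shell
# Toy stand-in for the text-format lattice printed by the pipeline above:
# an utterance-id line, arc lines ("state next-state label weight"), and a
# final-state line. Field 3 of each 4-field arc line is the label.
labels=$(printf '02.cut1-1\n0 1 SEE_TRANSCRIPT_E 14.9,31091.3,2960\n1 2 YAWN_B 0,0,114\n2\n' \
  | awk 'NF==4 {print $3}')
echo "$labels"
# prints:
# SEE_TRANSCRIPT_E
# YAWN_B
```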
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 23:18:50
Something is definitely wrong there. You shouldn't see something with
an _E suffix right at the start like that: if it's the only phone in a
word it should have the singleton _S suffix, and if it doesn't have a
word symbol it should have no suffix at all. I suspect you may have
built the system with a different phone set, or the word-boundary info
is very wrong.
Dan
On Thu, Jul 11, 2013 at 7:03 PM, Nathan Dunn <nd...@ca...> wrote:
>
> Alright, I updated the output, which looks closer to what I want, but I'm a little unclear how to pull stuff out of this:
>
> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-to-phone-lattice exp/tri2a/final.mdl ark:- ark,t:- | utils/int2sym.pl -f 3 g300_lang/phones.txt
>
>
>
>
> The first few lines look like this where "02.cut1-1" is the name of the transcript:
>
> 02.cut1-1
> 0 1 SEE_TRANSCRIPT_E 14.9888,31091.3,2960_2962_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961
> 1 2 END_CROSSTALK_NOISE_E 0,0,656_655_655_655_655_655_655_655_655_655_655_655_655_655_706_705_705_705
> 2 3 SEE_TRANSCRIPT_E 0,0,2960_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2962
> 3 4 END_MICROPHONE_NOISE_I 3.56562,5210.82,854_853_872
> 4 5 YAWN_B 0,0,114_113_113_113_178_177_177_177_177_177_177_177_177_177
> 5 6 END_YAWN_B 0,0,2008_2007_2007_2074_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073
> 6 7 SEE_TRANSCRIPT_E 11.9189,5022.05,2960_2959_2959_2962_2961_2961
> 7 8 END_NOISE_B 0,0,952_951_951_951_951_996_995_995_995_995_995_995_995
> 8 9 END_YAWN_B 0,0,1958_1957_1957_2036_2035_2035_2035
> 9 10 END_HUMAN_NOISE 0,0,1540_1539_1539_1539_1539_1539_1539_1539_1594_1593_1593
> 10 11 SEE_TRANSCRIPT_E 0,0,2960_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2962
> 11 12 END_MICROPHONE_NOISE_I 7.45918,2101.25,854_872
>
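[Editorial note: each arc line in the text-form phone lattice above can be unpacked mechanically. A minimal illustrative sketch (not a Kaldi tool), assuming the `from-state to-state symbol graph-cost,acoustic-cost,tid_tid_...` layout shown in the dump:]

```python
# Minimal sketch: unpack one arc line of the text-form phone lattice above.
# Assumed layout (from the dump):
#   <from-state> <to-state> <phone-symbol> <graph-cost>,<acoustic-cost>,<tid>_<tid>_...
def parse_arc(line):
    from_state, to_state, symbol, weight = line.split()
    graph_cost, acoustic_cost, tids = weight.split(",")
    transition_ids = [int(t) for t in tids.split("_")] if tids else []
    return {
        "from": int(from_state),
        "to": int(to_state),
        "symbol": symbol,
        "graph_cost": float(graph_cost),
        "acoustic_cost": float(acoustic_cost),
        "num_frames": len(transition_ids),  # one transition-id per frame
    }

arc = parse_arc("3 4 END_MICROPHONE_NOISE_I 3.56562,5210.82,854_853_872")
print(arc["symbol"], arc["num_frames"])  # END_MICROPHONE_NOISE_I 3
```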
>
> Nathan
>
> On Jul 10, 2013, at 10:10 PM, Daniel Povey wrote:
>
>> It's possible that your word_boundary.txt is OK.
>> You could try to get the one best from the lattice using lattice-1best
>> (I think), get the phone sequence from the 1-best lattice using
>> lat-to-phones (I think), doing output in text form using ark,t:- and
>> then get the text form of the phone-level lattice using
>> utils/int2sym.pl -f 3 g300_lang/phones.txt (or something similar), and
>> see if the sequence of phonemes looks reasonable for the word sequence
>> you have.
>>
>> Dan
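[Editorial note: `utils/int2sym.pl -f N` simply replaces the integer in field N of each line with its symbol from the given symbol table. A rough Python equivalent, assuming the usual `symbol integer-id` layout of a Kaldi symbol table; the toy table below is hypothetical:]

```python
# Rough sketch of utils/int2sym.pl -f <n>: replace the integer in field n
# of each line with its symbol from a Kaldi-style symbol table
# (lines of the form "<symbol> <integer-id>").
def load_symtab(lines):
    table = {}
    for line in lines:
        sym, idx = line.split()
        table[int(idx)] = sym
    return table

def int2sym(line, field, table):
    parts = line.split()
    parts[field - 1] = table[int(parts[field - 1])]  # fields are 1-based
    return " ".join(parts)

# hypothetical toy symbol table
table = load_symtab(["<eps> 0", "SIL 1", "AA_B 2"])
print(int2sym("0 1 2 4.5,3.2", 3, table))  # 0 1 AA_B 4.5,3.2
```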
>>
>>
>> On Thu, Jul 11, 2013 at 1:00 AM, Nathan Dunn <nd...@me...> wrote:
>>>
>>> I think that was part of it. I fixed one problem with the oov.txt / oov.int
>>>
>>> I'll try to recompile with that bug fix and see if that works. It's possible that I'm creating word_boundaries incorrectly. How many entries would you expect to get? (I am getting 315.) I wonder if I am using word_boundaries for the wrong set of phones...
>>>
>>> Checking g300_lang/phones.txt ...
>>> --> g300_lang/phones.txt is OK
>>>
>>> Checking words.txt: #0 ...
>>> --> g300_lang/words.txt has "#0"
>>> --> g300_lang/words.txt is OK
>>>
>>> Checking g300_lang/phones/context_indep.{txt, int, csl} ...
>>> --> 75 entry/entries in g300_lang/phones/context_indep.txt
>>> --> g300_lang/phones/context_indep.int corresponds to g300_lang/phones/context_indep.txt
>>> --> g300_lang/phones/context_indep.csl corresponds to g300_lang/phones/context_indep.txt
>>> --> g300_lang/phones/context_indep.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/disambig.{txt, int, csl} ...
>>> --> 28 entry/entries in g300_lang/phones/disambig.txt
>>> --> g300_lang/phones/disambig.int corresponds to g300_lang/phones/disambig.txt
>>> --> g300_lang/phones/disambig.csl corresponds to g300_lang/phones/disambig.txt
>>> --> g300_lang/phones/disambig.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/nonsilence.{txt, int, csl} ...
>>> --> 240 entry/entries in g300_lang/phones/nonsilence.txt
>>> --> g300_lang/phones/nonsilence.int corresponds to g300_lang/phones/nonsilence.txt
>>> --> g300_lang/phones/nonsilence.csl corresponds to g300_lang/phones/nonsilence.txt
>>> --> g300_lang/phones/nonsilence.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/silence.{txt, int, csl} ...
>>> --> 75 entry/entries in g300_lang/phones/silence.txt
>>> --> g300_lang/phones/silence.int corresponds to g300_lang/phones/silence.txt
>>> --> g300_lang/phones/silence.csl corresponds to g300_lang/phones/silence.txt
>>> --> g300_lang/phones/silence.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/optional_silence.{txt, int, csl} ...
>>> --> 1 entry/entries in g300_lang/phones/optional_silence.txt
>>> --> g300_lang/phones/optional_silence.int corresponds to g300_lang/phones/optional_silence.txt
>>> --> g300_lang/phones/optional_silence.csl corresponds to g300_lang/phones/optional_silence.txt
>>> --> g300_lang/phones/optional_silence.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/extra_questions.{txt, int} ...
>>> --> ERROR: fail to open g300_lang/phones/extra_questions.txt
>>>
>>> Checking g300_lang/phones/roots.{txt, int} ...
>>> --> 75 entry/entries in g300_lang/phones/roots.txt
>>> --> g300_lang/phones/roots.int corresponds to g300_lang/phones/roots.txt
>>> --> g300_lang/phones/roots.{txt, int} are OK
>>>
>>> Checking g300_lang/phones/sets.{txt, int} ...
>>> --> ERROR: fail to open g300_lang/phones/sets.int
>>>
>>> Checking g300_lang/phones/word_boundary.{txt, int} ...
>>> --> 315 entry/entries in g300_lang/phones/word_boundary.txt
>>> --> g300_lang/phones/word_boundary.int corresponds to g300_lang/phones/word_boundary.txt
>>> --> g300_lang/phones/word_boundary.{txt, int} are OK
>>>
>>> Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
>>> --> silence.txt and nonsilence.txt are disjoint
>>> --> silence.txt and disambig.txt are disjoint
>>> --> disambig.txt and nonsilence.txt are disjoint
>>> --> disjoint property is OK
>>>
>>> Checking summation: silence.txt, nonsilence.txt, disambig.txt ...
>>> --> summation property is OK
>>>
>>> Checking optional_silence.txt ...
>>> --> reading g300_lang/phones/optional_silence.txt
>>> --> g300_lang/phones/optional_silence.txt is OK
>>>
>>> Checking disambiguation symbols: #0 and #1
>>> --> g300_lang/phones/disambig.txt has "#0" and "#1"
>>> --> g300_lang/phones/disambig.txt is OK
>>>
>>> Checking topo ...
>>> --> g300_lang/topo's nonsilence section is OK
>>> --> g300_lang/topo's silence section is OK
>>> --> g300_lang/topo is OK
>>>
>>> Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
>>> --> g300_lang/phones/word_boundary.txt doesn't include disambiguation symbols
>>> --> g300_lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
>>> --> g300_lang/phones/word_boundary.txt is OK
>>> --> checking L.fst and L_disambig.fst...
>>> --> generating a 46 words sequence
>>> --> resulting phone sequence from L.fst corresponds to the word sequence
>>> --> L.fst is OK
>>> --> resulting phone sequence from L_disambig.fst corresponds to the word sequence
>>> --> L_disambig.fst is OK
>>>
>>> Checking g300_lang/oov.{txt, int} ...
>>> --> 1 entry/entries in g300_lang/oov.txt
>>> --> g300_lang/oov.int corresponds to g300_lang/oov.txt
>>> --> g300_lang/oov.{txt, int} are OK
>>>
>>>
>>>
>>> Nathan
>>>
>>> On Jul 10, 2013, at 9:12 PM, Daniel Povey wrote:
>>>
>>>> OK-- so the word-alignment seems to have failed. Generally that is
>>>> because of invalid word-boundary information. That file is indexed by
>>>> phones, not words. Issues can include a mismatch in phone set; words
>>>> that don't have any phones in them; or phones that have only one state
>>>> in their topology (this is a bug that was recently fixed, those should
>>>> work now if you update and recompile).
>>>> That program should not generally output any warnings, if all is OK.
>>>> Try to use the program utils/validate_lang.pl to make sure your
>>>> g300_lang/ directory is OK.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On Thu, Jul 11, 2013 at 12:06 AM, Nathan Dunn <nd...@me...> wrote:
>>>>>
>>>>> Sorry, and it ends with this:
>>>>>
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 98.cut1
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 98.cut2
>>>>> LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 132 to best
>>>>> path, 0 had errors.
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 98.cut3
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:104) Successfully
>>>>> aligned 0 lattices; 132 had errors.
>>>>> LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 132 linear lattices
>>>>> to ctm format; 0 had errors.
>>>>> ndunn:childspeech%
>>>>>
>>>>>
>>>>> Nathan
>>>>>
>>>>> On Jul 10, 2013, at 9:06 PM, Nathan Dunn wrote:
>>>>>
>>>>>
>>>>> The std err output is this:
>>>>>
>>>>> ndunn:childspeech% lattice-1best "ark:gunzip -c
>>>>> exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words
>>>>> g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- |
>>>>> nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt >
>>>>> exp/tri2a/ctm2/output.txt
>>>>> lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|'
>>>>> ark:-
>>>>> lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl
>>>>> ark:- ark:-
>>>>> nbest-to-ctm ark:- -
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 02.cut1
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 02.cut2
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 02.cut3
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 03.cut1
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 03.cut2
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 03.cut3
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>>
>>>>>
>>>>> Nathan Dunn, Ph.D.
>>>>> Scientific Programmer
>>>>> College of Arts and Science IT
>>>>> 541-221-2418
>>>>> nd...@ca...
>>>>>
>>>>>
>>>>>
>>>>> On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>>>>>
>>>>> Can you provide the logging output, at least some representative lines
>>>>> from it. Are there any warnings?
>>>>> Dan
>>>>>
>>>>> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User
>>>>> Communication and Updates <kal...@li...> wrote:
>>>>>
>>>>>
>>>>> I'm trying to get word timing information out of a successfully trained
>>>>> language model that I've already been able to decode with, following these
>>>>> instructions:
>>>>>
>>>>>
>>>>> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>>>>>
>>>>>
>>>>> This is the command I've run:
>>>>>
>>>>>
>>>>> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|"
>>>>> ark:- | lattice-align-words g300_lang/phones/word_boundary.int
>>>>> exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f
>>>>> 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>>>>>
>>>>>
>>>>>
>>>>> The problem is that I only have one entry per transcript (these transcripts
>>>>> are 1 minute long), and it doesn't seem to bear any relation to the word
>>>>> input:
>>>>>
>>>>>
>>>>> 02.cut1 1 0.00 67.11 I
>>>>>
>>>>> 02.cut2 1 0.00 62.44 HIS
>>>>>
>>>>> 02.cut3 1 0.00 65.76 MOUNT
>>>>>
>>>>> 03.cut1 1 0.00 62.62 I
>>>>>
>>>>> 03.cut2 1 0.00 62.41 WHO
>>>>>
>>>>> 03.cut3 1 0.00 63.72 I
>>>>>
>>>>> 06.cut1 1 0.00 62.13 STANDING
>>>>>
>>>>> 06.cut2 1 0.00 57.95 A
>>>>>
>>>>> 06.cut3 1 0.00 66.78 I
>>>>>
>>>>> . . .
>>>>>
>>>>> What I want is one entry like this for each word:
>>>>>
>>>>> 02.cut1 1 0.00 43.7 YOU
>>>>>
>>>>> 02.cut1 1 81.2 121.3 ARE
>>>>>
>>>>> 02.cut1 1 145.4 163.8 STANDING
>>>>>
>>>>> . . .
>>>>>
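[Editorial note: each CTM line above has the form `utterance-id channel start-time duration word`. A minimal reader sketch with toy data taken from the output above (illustrative only):]

```python
# Minimal sketch of reading CTM lines ("utt channel start duration word")
# like those above, grouping entries per utterance.
from collections import defaultdict

def read_ctm(lines):
    by_utt = defaultdict(list)
    for line in lines:
        utt, channel, start, dur, word = line.split()
        by_utt[utt].append((float(start), float(dur), word))
    return by_utt

ctm = read_ctm([
    "02.cut1 1 0.00 67.11 I",
    "02.cut2 1 0.00 62.44 HIS",
])
# a correct word alignment would give one tuple per word, not one per utterance
print(len(ctm["02.cut1"]))  # 1
```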
>>>>>
>>>>> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>>>>>
>>>>> 1 nonword
>>>>>
>>>>> 2 begin
>>>>>
>>>>> 3 end
>>>>>
>>>>> 4 internal
>>>>>
>>>>> 5 singleton
>>>>>
>>>>> 6 nonword
>>>>>
>>>>> 7 begin
>>>>>
>>>>> 8 end
>>>>>
>>>>> . . .
>>>>>
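[Editorial note: as noted later in the thread, word_boundary.int is indexed by phones, not words, so a quick sanity check is that it has exactly one valid entry per phone id. An illustrative sketch with toy data (not a Kaldi tool):]

```python
# Quick consistency sketch: word_boundary entries should cover the phone
# set exactly, with one of nonword/begin/end/internal/singleton per phone id.
VALID = {"nonword", "begin", "end", "internal", "singleton"}

def check_word_boundary(entries, phone_ids):
    seen = {}
    for line in entries:
        idx, marker = line.split()
        assert marker in VALID, "bad marker: " + marker
        seen[int(idx)] = marker
    missing = set(phone_ids) - set(seen)
    return sorted(missing)

# toy data in the format shown above
entries = ["1 nonword", "2 begin", "3 end", "4 internal", "5 singleton"]
print(check_word_boundary(entries, range(1, 7)))  # [6] -> phone 6 has no entry
```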
>>>>>
>>>>>
>>>>> Any help is much appreciated.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> Nathan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>>
>>>>> See everything from the browser to the database with AppDynamics
>>>>>
>>>>> Get end-to-end visibility with application monitoring from AppDynamics
>>>>>
>>>>> Isolate bottlenecks and diagnose root cause in seconds.
>>>>>
>>>>> Start your free trial of AppDynamics Pro today!
>>>>>
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>>>>
>>>>> _______________________________________________
>>>>>
>>>>> Kaldi-users mailing list
>>>>>
>>>>> Kal...@li...
>>>>>
>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
>>>>>
>>>>>
>>>>>
>>>
>
>
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 05:11:07
|
It's possible that your word_boundary.txt is OK.
You could try to get the one best from the lattice using lattice-1best
(I think), get the phone sequence from the 1-best lattice using
lat-to-phones (I think), doing output in text form using ark,t:- and
then get the text form of the phone-level lattice using
utils/int2sym.pl -f 3 g300_lang/phones.txt (or something similar), and
see if the sequence of phonemes looks reasonable for the word sequence
you have.
Dan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 05:00:55
|
I think that was part of it. I fixed one problem with the oov.txt / oov.int
I'll try to recompile with that bug fix and see if that works. It's possible that I'm creating word_boundaries incorrectly. How many entries would you expect to get? (I am getting 315.) I wonder if I am using word_boundaries for the wrong set of phones...
Checking g300_lang/phones.txt ...
--> g300_lang/phones.txt is OK
Checking words.txt: #0 ...
--> g300_lang/words.txt has "#0"
--> g300_lang/words.txt is OK
Checking g300_lang/phones/context_indep.{txt, int, csl} ...
--> 75 entry/entries in g300_lang/phones/context_indep.txt
--> g300_lang/phones/context_indep.int corresponds to g300_lang/phones/context_indep.txt
--> g300_lang/phones/context_indep.csl corresponds to g300_lang/phones/context_indep.txt
--> g300_lang/phones/context_indep.{txt, int, csl} are OK
Checking g300_lang/phones/disambig.{txt, int, csl} ...
--> 28 entry/entries in g300_lang/phones/disambig.txt
--> g300_lang/phones/disambig.int corresponds to g300_lang/phones/disambig.txt
--> g300_lang/phones/disambig.csl corresponds to g300_lang/phones/disambig.txt
--> g300_lang/phones/disambig.{txt, int, csl} are OK
Checking g300_lang/phones/nonsilence.{txt, int, csl} ...
--> 240 entry/entries in g300_lang/phones/nonsilence.txt
--> g300_lang/phones/nonsilence.int corresponds to g300_lang/phones/nonsilence.txt
--> g300_lang/phones/nonsilence.csl corresponds to g300_lang/phones/nonsilence.txt
--> g300_lang/phones/nonsilence.{txt, int, csl} are OK
Checking g300_lang/phones/silence.{txt, int, csl} ...
--> 75 entry/entries in g300_lang/phones/silence.txt
--> g300_lang/phones/silence.int corresponds to g300_lang/phones/silence.txt
--> g300_lang/phones/silence.csl corresponds to g300_lang/phones/silence.txt
--> g300_lang/phones/silence.{txt, int, csl} are OK
Checking g300_lang/phones/optional_silence.{txt, int, csl} ...
--> 1 entry/entries in g300_lang/phones/optional_silence.txt
--> g300_lang/phones/optional_silence.int corresponds to g300_lang/phones/optional_silence.txt
--> g300_lang/phones/optional_silence.csl corresponds to g300_lang/phones/optional_silence.txt
--> g300_lang/phones/optional_silence.{txt, int, csl} are OK
Checking g300_lang/phones/extra_questions.{txt, int} ...
--> ERROR: fail to open g300_lang/phones/extra_questions.txt
Checking g300_lang/phones/roots.{txt, int} ...
--> 75 entry/entries in g300_lang/phones/roots.txt
--> g300_lang/phones/roots.int corresponds to g300_lang/phones/roots.txt
--> g300_lang/phones/roots.{txt, int} are OK
Checking g300_lang/phones/sets.{txt, int} ...
--> ERROR: fail to open g300_lang/phones/sets.int
Checking g300_lang/phones/word_boundary.{txt, int} ...
--> 315 entry/entries in g300_lang/phones/word_boundary.txt
--> g300_lang/phones/word_boundary.int corresponds to g300_lang/phones/word_boundary.txt
--> g300_lang/phones/word_boundary.{txt, int} are OK
> Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK
> Checking summation: silence.txt, nonsilence.txt, disambig.txt ...
--> summation property is OK
Checking optional_silence.txt ...
--> reading g300_lang/phones/optional_silence.txt
--> g300_lang/phones/optional_silence.txt is OK
Checking disambiguation symbols: #0 and #1
--> g300_lang/phones/disambig.txt has "#0" and "#1"
--> g300_lang/phones/disambig.txt is OK
Checking topo ...
--> g300_lang/topo's nonsilence section is OK
--> g300_lang/topo's silence section is OK
--> g300_lang/topo is OK
Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
--> g300_lang/phones/word_boundary.txt doesn't include disambiguation symbols
--> g300_lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
--> g300_lang/phones/word_boundary.txt is OK
--> checking L.fst and L_disambig.fst...
--> generating a 46 words sequence
--> resulting phone sequence from L.fst corresponds to the word sequence
--> L.fst is OK
--> resulting phone sequence from L_disambig.fst corresponds to the word sequence
--> L_disambig.fst is OK
Checking g300_lang/oov.{txt, int} ...
--> 1 entry/entries in g300_lang/oov.txt
--> g300_lang/oov.int corresponds to g300_lang/oov.txt
--> g300_lang/oov.{txt, int} are OK
Nathan
On Jul 10, 2013, at 9:12 PM, Daniel Povey wrote:
> OK-- so the word-alignment seems to have failed. Generally that is
> because of invalid word-boundary information. That file is indexed by
> phones, not words. Issues can include a mismatch in phone set; words
> that don't have any phones in them; or phones that have only one state
> in their topology (this is a bug that was recently fixed, those should
> work now if you update and recompile).
> That program should not generally output any warnings, if all is OK.
> Try to use the program utils/validate_lang.pl to make sure your
> g300_lang/ directory is OK.
>
> Dan
>
>
> On Thu, Jul 11, 2013 at 12:06 AM, Nathan Dunn <nd...@me...> wrote:
>>
>> Sorry, and it ends with this:
>>
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 98.cut1
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 98.cut2
>> LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 132 to best
>> path, 0 had errors.
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 98.cut3
>> LOG (lattice-align-words:main():lattice-align-words.cc:104) Successfully
>> aligned 0 lattices; 132 had errors.
>> LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 132 linear lattices
>> to ctm format; 0 had errors.
>> ndunn:childspeech%
>>
>>
>> Nathan
>>
>> On Jul 10, 2013, at 9:06 PM, Nathan Dunn wrote:
>>
>>
>> The std err output is this:
>>
>> ndunn:childspeech% lattice-1best "ark:gunzip -c
>> exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words
>> g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- |
>> nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt >
>> exp/tri2a/ctm2/output.txt
>> lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|'
>> ark:-
>> lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl
>> ark:- ark:-
>> nbest-to-ctm ark:- -
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 02.cut1
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 02.cut2
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 02.cut3
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 03.cut1
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 03.cut2
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 03.cut3
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>>
>>
>> Nathan Dunn, Ph.D.
>> Scientific Programmer
>> College of Arts and Science IT
>> 541-221-2418
>> nd...@ca...
>>
>>
>>
>> On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>>
>> Can you provide the logging output, at least some representative lines
>> from it? Are there any warnings?
>> Dan
>>
>> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User
>> Communication and Updates <kal...@li...> wrote:
>>
>>
>> I'm trying to get word timing information out of a successfully trained
>> language model that I've already decoded successfully, following these
>> instructions:
>>
>>
>> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>>
>>
>> This is the command I've run:
>>
>>
>> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|"
>> ark:- | lattice-align-words g300_lang/phones/word_boundary.int
>> exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f
>> 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>>
>>
>>
>> The problem is that I only get one entry per transcript (these
>> transcripts are one minute long), and the timings don't seem to bear any
>> relation to the word input:
>>
>>
>> 02.cut1 1 0.00 67.11 I
>>
>> 02.cut2 1 0.00 62.44 HIS
>>
>> 02.cut3 1 0.00 65.76 MOUNT
>>
>> 03.cut1 1 0.00 62.62 I
>>
>> 03.cut2 1 0.00 62.41 WHO
>>
>> 03.cut3 1 0.00 63.72 I
>>
>> 06.cut1 1 0.00 62.13 STANDING
>>
>> 06.cut2 1 0.00 57.95 A
>>
>> 06.cut3 1 0.00 66.78 I
>>
>> . . .
>>
>> What I want is an entry for each word:
>>
>> 02.cut1 1 0.00 43.7 YOU
>>
>> 02.cut1 1 81.2 121.3 ARE
>>
>> 02.cut1 1 145.4 163.8 STANDING
>>
>> . . .
>>
>>
>> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>>
>> 1 nonword
>>
>> 2 begin
>>
>> 3 end
>>
>> 4 internal
>>
>> 5 singleton
>>
>> 6 nonword
>>
>> 7 begin
>>
>> 8 end
>>
>> . . .
>>
>>
>>
>> Any help is much appreciated.
>>
>>
>> Thanks,
>>
>>
>> Nathan
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>>
>> See everything from the browser to the database with AppDynamics
>>
>> Get end-to-end visibility with application monitoring from AppDynamics
>>
>> Isolate bottlenecks and diagnose root cause in seconds.
>>
>> Start your free trial of AppDynamics Pro today!
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>
>> _______________________________________________
>>
>> Kaldi-users mailing list
>>
>> Kal...@li...
>>
>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
>>
>>
>>
|
|
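Dan's checklist above (phone-set mismatch, words with no phones, invalid boundary labels) can be illustrated with a small consistency check in the spirit of utils/validate_lang.pl. This is a minimal sketch, not Kaldi code: the file layouts are the usual Kaldi text tables ("SYMBOL ID" in phones.txt, "ID LABEL" in word_boundary.int), but the function and the sample data are invented for illustration.

```python
# Sketch: check that every phone in phones.txt has a word-boundary label,
# one of the causes Dan lists for lattice-align-words failing.
VALID_LABELS = {"nonword", "begin", "end", "internal", "singleton"}

def check_word_boundaries(phones_lines, boundary_lines):
    """Return (missing_phone_ids, bad_label_lines) for the two tables."""
    phone_ids = set()
    for line in phones_lines:
        sym, pid = line.split()
        if sym != "<eps>":          # epsilon has no boundary entry
            phone_ids.add(int(pid))
    labels = {}
    bad = []
    for line in boundary_lines:
        pid, label = line.split()
        labels[int(pid)] = label
        if label not in VALID_LABELS:
            bad.append(line)
    missing = sorted(phone_ids - set(labels))
    return missing, bad

phones = ["<eps> 0", "SIL 1", "AA_B 2", "AA_E 3", "AA_I 4", "AA_S 5"]
bounds = ["1 nonword", "2 begin", "3 end", "4 internal", "5 singleton"]
print(check_word_boundaries(phones, bounds))  # ([], []) when consistent
```

A phone id reported as missing here corresponds to the "mismatch in phone set" case; in practice you would run the real validate_lang.pl instead.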
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 04:12:54
|
OK -- so the word alignment seems to have failed. Generally that is because
of invalid word-boundary information. That file is indexed by phones, not
words. Issues can include a mismatch in phone set; words that don't have any
phones in them; or phones that have only one state in their topology (this is
a bug that was recently fixed; those should work now if you update and
recompile). That program should generally not output any warnings if all is
OK. Try running utils/validate_lang.pl to make sure your g300_lang/ directory
is OK.

Dan

On Thu, Jul 11, 2013 at 12:06 AM, Nathan Dunn <nd...@me...> wrote:
>
> Sorry, and it ends with this:
>
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 98.cut1
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 98.cut2
> LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 132 to best path, 0 had errors.
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 98.cut3
> LOG (lattice-align-words:main():lattice-align-words.cc:104) Successfully aligned 0 lattices; 132 had errors.
> LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 132 linear lattices to ctm format; 0 had errors.
> ndunn:childspeech%
>
> Nathan
>
> On Jul 10, 2013, at 9:06 PM, Nathan Dunn wrote:
>
> The std err output is this:
>
> ndunn:childspeech% lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
> lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|' ark:-
> lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:-
> nbest-to-ctm ark:- -
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut1
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut2
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut3
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut1
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut2
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut3
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
>
> Nathan Dunn, Ph.D.
> Scientific Programmer
> College of Arts and Science IT
> 541-221-2418
> nd...@ca...
>
> On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>
> Can you provide the logging output, at least some representative lines from it? Are there any warnings?
> Dan
>
> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>
> I'm trying to get word timing information out of a successfully trained language model that I've already decoded successfully, following these instructions:
>
> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>
> This is the command I've run:
>
> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>
> The problem is that I only get one entry per transcript (these transcripts are one minute long), and the timings don't seem to bear any relation to the word input:
>
> 02.cut1 1 0.00 67.11 I
> 02.cut2 1 0.00 62.44 HIS
> 02.cut3 1 0.00 65.76 MOUNT
> 03.cut1 1 0.00 62.62 I
> 03.cut2 1 0.00 62.41 WHO
> 03.cut3 1 0.00 63.72 I
> 06.cut1 1 0.00 62.13 STANDING
> 06.cut2 1 0.00 57.95 A
> 06.cut3 1 0.00 66.78 I
> . . .
>
> What I want is an entry for each word:
>
> 02.cut1 1 0.00 43.7 YOU
> 02.cut1 1 81.2 121.3 ARE
> 02.cut1 1 145.4 163.8 STANDING
> . . .
>
> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>
> 1 nonword
> 2 begin
> 3 end
> 4 internal
> 5 singleton
> 6 nonword
> 7 begin
> 8 end
> . . .
>
> Any help is much appreciated.
>
> Thanks,
>
> Nathan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 04:07:05
|
Sorry, and it ends with this:

WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 98.cut1
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 98.cut2
LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 132 to best path, 0 had errors.
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 98.cut3
LOG (lattice-align-words:main():lattice-align-words.cc:104) Successfully aligned 0 lattices; 132 had errors.
LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 132 linear lattices to ctm format; 0 had errors.
ndunn:childspeech%

Nathan

On Jul 10, 2013, at 9:06 PM, Nathan Dunn wrote:
>
> The std err output is this:
>
> ndunn:childspeech% lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
> lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|' ark:-
> lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:-
> nbest-to-ctm ark:- -
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut1
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut2
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut3
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut1
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut2
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut3
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
>
> Nathan Dunn, Ph.D.
> Scientific Programmer
> College of Arts and Science IT
> 541-221-2418
> nd...@ca...
>
> On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>
>> Can you provide the logging output, at least some representative lines from it? Are there any warnings?
>> Dan
>>
>> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>
>>> I'm trying to get word timing information out of a successfully trained language model that I've already decoded successfully, following these instructions:
>>>
>>> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>>>
>>> This is the command I've run:
>>>
>>> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>>>
>>> The problem is that I only get one entry per transcript (these transcripts are one minute long), and the timings don't seem to bear any relation to the word input:
>>>
>>> 02.cut1 1 0.00 67.11 I
>>> 02.cut2 1 0.00 62.44 HIS
>>> 02.cut3 1 0.00 65.76 MOUNT
>>> 03.cut1 1 0.00 62.62 I
>>> 03.cut2 1 0.00 62.41 WHO
>>> 03.cut3 1 0.00 63.72 I
>>> 06.cut1 1 0.00 62.13 STANDING
>>> 06.cut2 1 0.00 57.95 A
>>> 06.cut3 1 0.00 66.78 I
>>> . . .
>>>
>>> What I want is an entry for each word:
>>>
>>> 02.cut1 1 0.00 43.7 YOU
>>> 02.cut1 1 81.2 121.3 ARE
>>> 02.cut1 1 145.4 163.8 STANDING
>>> . . .
>>>
>>> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>>>
>>> 1 nonword
>>> 2 begin
>>> 3 end
>>> 4 internal
>>> 5 singleton
>>> 6 nonword
>>> 7 begin
>>> 8 end
>>> . . .
>>>
>>> Any help is much appreciated.
>>>
>>> Thanks,
>>>
>>> Nathan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 04:06:35
|
The std err output is this:

ndunn:childspeech% lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|' ark:-
lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:-
nbest-to-ctm ark:- -
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut1
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut2
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut3
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut1
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut2
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut3
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]

Nathan Dunn, Ph.D.
Scientific Programmer
College of Arts and Science IT
541-221-2418
nd...@ca...

On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>
> Can you provide the logging output, at least some representative lines from it? Are there any warnings?
> Dan
>
> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>
>> I'm trying to get word timing information out of a successfully trained language model that I've already decoded successfully, following these instructions:
>>
>> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>>
>> This is the command I've run:
>>
>> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>>
>> The problem is that I only get one entry per transcript (these transcripts are one minute long), and the timings don't seem to bear any relation to the word input:
>>
>> 02.cut1 1 0.00 67.11 I
>> 02.cut2 1 0.00 62.44 HIS
>> 02.cut3 1 0.00 65.76 MOUNT
>> 03.cut1 1 0.00 62.62 I
>> 03.cut2 1 0.00 62.41 WHO
>> 03.cut3 1 0.00 63.72 I
>> 06.cut1 1 0.00 62.13 STANDING
>> 06.cut2 1 0.00 57.95 A
>> 06.cut3 1 0.00 66.78 I
>> . . .
>>
>> What I want is an entry for each word:
>>
>> 02.cut1 1 0.00 43.7 YOU
>> 02.cut1 1 81.2 121.3 ARE
>> 02.cut1 1 145.4 163.8 STANDING
>> . . .
>>
>> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>>
>> 1 nonword
>> 2 begin
>> 3 end
>> 4 internal
>> 5 singleton
>> 6 nonword
>> 7 begin
>> 8 end
>> . . .
>>
>> Any help is much appreciated.
>>
>> Thanks,
>>
>> Nathan
|
|
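The stderr quoted above can be summarised mechanically. Here is a small sketch (assuming only the log format visible in this thread) that collects the utterances lattice-align-words gave up on, which is useful when "Successfully aligned 0 lattices; 132 had errors" sends you looking for the failing cases:

```python
# Sketch: tally lattice-align-words stderr to list utterances that only
# got a partial (failed) word alignment. The log format is taken from the
# messages quoted in this thread.
import re

def tally_alignment_log(lines):
    """Return the utterance ids for which a partial lattice was output."""
    failed = []
    for line in lines:
        m = re.search(r"Outputting partial lattice for (\S+)", line)
        if m:
            failed.append(m.group(1))
    return failed

log = [
    "WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) "
    "Invalid word at end of lattice [partial lattice, forced out?]",
    "LOG (lattice-align-words:main():lattice-align-words.cc:89) "
    "Outputting partial lattice for 02.cut1",
    "LOG (lattice-align-words:main():lattice-align-words.cc:89) "
    "Outputting partial lattice for 02.cut2",
]
print(tally_alignment_log(log))  # ['02.cut1', '02.cut2']
```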
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 03:45:23
|
Can you provide the logging output, at least some representative lines from
it? Are there any warnings?
Dan

On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User Communication
and Updates <kal...@li...> wrote:
>
> I'm trying to get word timing information out of a successfully trained language model that I've already decoded successfully, following these instructions:
>
> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>
> This is the command I've run:
>
> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>
> The problem is that I only get one entry per transcript (these transcripts are one minute long), and the timings don't seem to bear any relation to the word input:
>
> 02.cut1 1 0.00 67.11 I
> 02.cut2 1 0.00 62.44 HIS
> 02.cut3 1 0.00 65.76 MOUNT
> 03.cut1 1 0.00 62.62 I
> 03.cut2 1 0.00 62.41 WHO
> 03.cut3 1 0.00 63.72 I
> 06.cut1 1 0.00 62.13 STANDING
> 06.cut2 1 0.00 57.95 A
> 06.cut3 1 0.00 66.78 I
> . . .
>
> What I want is an entry for each word:
>
> 02.cut1 1 0.00 43.7 YOU
> 02.cut1 1 81.2 121.3 ARE
> 02.cut1 1 145.4 163.8 STANDING
> . . .
>
> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>
> 1 nonword
> 2 begin
> 3 end
> 4 internal
> 5 singleton
> 6 nonword
> 7 begin
> 8 end
> . . .
>
> Any help is much appreciated.
>
> Thanks,
>
> Nathan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 03:39:05
|
I'm trying to get word timing information out of a successfully trained
language model that I've already decoded successfully, following these
instructions:

https://sourceforge.net/mailarchive/message.php?msg_id=30729903

This is the command I've run:

lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt

The problem is that I only get one entry per transcript (these transcripts
are one minute long), and the timings don't seem to bear any relation to the
word input:

02.cut1 1 0.00 67.11 I
02.cut2 1 0.00 62.44 HIS
02.cut3 1 0.00 65.76 MOUNT
03.cut1 1 0.00 62.62 I
03.cut2 1 0.00 62.41 WHO
03.cut3 1 0.00 63.72 I
06.cut1 1 0.00 62.13 STANDING
06.cut2 1 0.00 57.95 A
06.cut3 1 0.00 66.78 I
. . .

What I want is an entry for each word:

02.cut1 1 0.00 43.7 YOU
02.cut1 1 81.2 121.3 ARE
02.cut1 1 145.4 163.8 STANDING
. . .

The words.txt is 116K, but word_boundary.int has only 316 entries like this:

1 nonword
2 begin
3 end
4 internal
5 singleton
6 nonword
7 begin
8 end
. . .

Any help is much appreciated.

Thanks,

Nathan
|
|
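For reference, the output being produced here is CTM: one line per word with utterance id, channel, start time, and duration. A minimal parser sketch follows, with the field layout inferred from the examples in this thread; a healthy run has many lines per utterance, while the failed alignment above collapses each utterance to a single line spanning the whole recording:

```python
# Sketch: parse CTM lines (utt channel start duration word), the format
# produced by the nbest-to-ctm | int2sym.pl pipeline quoted above.
def parse_ctm(lines):
    entries = []
    for line in lines:
        utt, chan, start, dur, word = line.split()
        entries.append((utt, int(chan), float(start), float(dur), word))
    return entries

ctm = [
    "02.cut1 1 0.00 67.11 I",
    "02.cut2 1 0.00 62.44 HIS",
]
for utt, chan, start, dur, word in parse_ctm(ctm):
    # The fourth field is a duration, not an end time, so a 67-second "I"
    # on a one-minute recording is the symptom of the failed alignment.
    print(utt, word, "ends at", round(start + dur, 2))
```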
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-08 16:39:50
|
Hi all,

If you do not want to read the instructions from the mail, they are in a
more pleasant form at
https://github.com/oplatek/pykaldi/blob/master/src/python-kaldi-decoding/pykaldi/binutils/README.md
The README.md should stay up to date.

Ondra

On Mon, Jul 8, 2013 at 6:25 PM, ondrej platek <ond...@se...> wrote:
>
> Hi all,
>
> I would like to thank you for implementing the Kaldi compilation to shared
> libraries and merging it to trunk. It allows me to easily build Python
> bindings for the Kaldi decoders using the cffi library
> (http://cffi.readthedocs.org/en/latest/).
>
> So far, I have managed to set up a decoding example based on the Voxforge
> online demo. All the C++ Kaldi functionality is called from Python via cffi.
>
> To try it, follow the steps below:
>
> # 1.
> svn checkout svn+ssh://op...@sv.../p/kaldi/code/sandbox/oplatek2  # Change your username
>
> # 2. Install portaudio and cffi.
> # For portaudio:
> cd oplatek2/tools; ./install_portaudio.sh
> # For cffi you have options a) or b):
> # a) Go to http://cffi.readthedocs.org/en/latest/ and, following the
> #    instructions, install cffi system-wide (recommended).
> #    Read the Requirements section!
> # b) Go to oplatek2/tools and install cffi locally using install_cffi.sh.
> #    After a successful installation the script prompts you to add the
> #    installation directory to PYTHONPATH. Do it; it will be needed in step 7.
>
> # 3.
> cd oplatek2/src
>
> # 4. Configure with the --shared flag
> ./configure --fst-root=`pwd`/../tools/openfst --shared
>
> # 5. Build Kaldi. Clean and test it to be sure it is not corrupted.
> make clean; make depend && make ext_depend && make && make ext && make test && make ext_test
>
> # 6. Change to the directory with the example
> cd python-kaldi-decoding/pykaldi/binutils/
>
> # 7. Run make test; it should compile and download everything needed
> make test
>
> # 8. Check the results! My results for python-online-wav-gmm-decode-faster are:
>
> python-compute-wer --config=configs/wer.config ark:work/reference.txt ark:work/online.trans.compact
> %WER 15.03 [ 55 / 366, 6 ins, 15 del, 34 sub ]
> %SER 100.00 [ 3 / 3 ]
> Scored 3 sentences, 0 not present in hyp.
>
> Any feedback is welcome!
>
> I am committing to https://github.com/oplatek/pykaldi .
> To svn.code.sf.net/p/kaldi/code/sandbox/oplatek2 I will commit just major
> updates which should not break things.
>
> Cheers,
>
> Ondra
>
> On Mon, Jul 8, 2013 at 10:59 AM, ondrej platek <ond...@se...> wrote:
>>
>> I just checked the results for my modified Voxforge-like recipe.
>> Everything worked: training, decoding, evaluation.
>>
>> My configuration: Ubuntu 10.04, using OpenBLAS and the shared flag:
>> ./configure --openblas-root=`pwd`/../tools/OpenBLAS/install --fst-root=`pwd`/../tools/openfst --shared
>>
>> Ondra
>>
>> On Mon, Jul 8, 2013 at 7:54 AM, Ho Yin Chan <ric...@gm...> wrote:
>>>
>>> Simulated mode of the online decoding demo runs fine on CentOS too.
>>>
>>> Ricky
>>>
>>> On Sun, Jul 7, 2013 at 10:07 PM, Vassil Panayotov <vas...@gm...> wrote:
>>>>
>>>> The compilation (including "make ext") is working OK for me too on Ubuntu 10.04.
>>>> Only tried to run the online decoders (voxforge/online_demo) so far --
>>>> everything seems to be fine with them.
>>>>
>>>> Vassil
>>>>
>>>> On Sun, Jul 7, 2013 at 5:31 AM, Daniel Povey <dp...@gm...> wrote:
>>>> > Everyone,
>>>> > I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej
>>>> > Platek and others have been working on different build scripts that
>>>> > now support a shared-library option. If anyone can test it and make
>>>> > sure it still works for them it would be great.
>>>> > If people have made local changes to their Makefiles they may get conflicts.
>>>> > Dan
|
|
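The pykaldi bindings described above use cffi; the underlying idea -- load the shared libraries produced by `./configure --shared` and call their exported C symbols from Python -- can be sketched with the stdlib ctypes module instead. Everything Kaldi-specific below is hypothetical (library and function names invented for illustration); the runnable part calls into the already-loaded C library as a stand-in, which assumes a Unix-like platform:

```python
# Sketch of calling C symbols from Python, the mechanism behind bindings
# like the cffi-based ones in this thread (shown here with stdlib ctypes).
import ctypes

libc = ctypes.CDLL(None)            # handle to the already-loaded C library
libc.abs.restype = ctypes.c_int     # declare the C signature before calling
libc.abs.argtypes = [ctypes.c_int]
print(libc.abs(-42))                # 42

# A Kaldi wrapper would load the built shared library the same way, e.g.
#   kaldi = ctypes.CDLL("libkaldi-decoder.so")   # hypothetical name
# and declare argtypes/restype for each exported C function it uses.
```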
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-08 16:26:04
|
Hi all,

I would like to thank you for implementing the Kaldi compilation to shared
libraries and merging it to trunk. It allows me to easily build Python
bindings for the Kaldi decoders using the cffi library
(http://cffi.readthedocs.org/en/latest/).

So far, I have managed to set up a decoding example based on the Voxforge
online demo. All the C++ Kaldi functionality is called from Python via cffi.

To try it, follow the steps below:

# 1.
svn checkout svn+ssh://oplatek@svn.code.sf.net/p/kaldi/code/sandbox/oplatek2  # Change your username

# 2. Install portaudio and cffi.
# For portaudio:
cd oplatek2/tools; ./install_portaudio.sh
# For cffi you have options a) or b):
# a) Go to http://cffi.readthedocs.org/en/latest/ and, following the
#    instructions, install cffi system-wide (recommended).
#    Read the Requirements section!
# b) Go to oplatek2/tools and install cffi locally using install_cffi.sh.
#    After a successful installation the script prompts you to add the
#    installation directory to PYTHONPATH. Do it; it will be needed in step 7.

# 3.
cd oplatek2/src

# 4. Configure with the --shared flag
./configure --fst-root=`pwd`/../tools/openfst --shared

# 5. Build Kaldi. Clean and test it to be sure it is not corrupted.
make clean; make depend && make ext_depend && make && make ext && make test && make ext_test

# 6. Change to the directory with the example
cd python-kaldi-decoding/pykaldi/binutils/

# 7. Run make test; it should compile and download everything needed
make test

# 8. Check the results! My results for python-online-wav-gmm-decode-faster are:

python-compute-wer --config=configs/wer.config ark:work/reference.txt ark:work/online.trans.compact
%WER 15.03 [ 55 / 366, 6 ins, 15 del, 34 sub ]
%SER 100.00 [ 3 / 3 ]
Scored 3 sentences, 0 not present in hyp.

Any feedback is welcome!

I am committing to https://github.com/oplatek/pykaldi .
To svn.code.sf.net/p/kaldi/code/sandbox/oplatek2 I will commit just major
updates which should not break things.

Cheers,

Ondra

On Mon, Jul 8, 2013 at 10:59 AM, ondrej platek <ond...@se...> wrote:
>
> I just checked the results for my modified Voxforge-like recipe.
> Everything worked: training, decoding, evaluation.
>
> My configuration: Ubuntu 10.04, using OpenBLAS and the shared flag:
> ./configure --openblas-root=`pwd`/../tools/OpenBLAS/install --fst-root=`pwd`/../tools/openfst --shared
>
> Ondra
>
> On Mon, Jul 8, 2013 at 7:54 AM, Ho Yin Chan <ric...@gm...> wrote:
>>
>> Simulated mode of the online decoding demo runs fine on CentOS too.
>>
>> Ricky
>>
>> On Sun, Jul 7, 2013 at 10:07 PM, Vassil Panayotov <vas...@gm...> wrote:
>>>
>>> The compilation (including "make ext") is working OK for me too on Ubuntu 10.04.
>>> Only tried to run the online decoders (voxforge/online_demo) so far --
>>> everything seems to be fine with them.
>>>
>>> Vassil
>>>
>>> On Sun, Jul 7, 2013 at 5:31 AM, Daniel Povey <dp...@gm...> wrote:
>>> > Everyone,
>>> > I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej
>>> > Platek and others have been working on different build scripts that
>>> > now support a shared-library option. If anyone can test it and make
>>> > sure it still works for them it would be great.
>>> > If people have made local changes to their Makefiles they may get conflicts.
>>> > Dan
|
|
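The %WER line in the quoted results is just edit distance over reference words: (insertions + deletions + substitutions) / reference length. A sketch of the computation (standard Levenshtein over word lists; this is not the actual compute-wer implementation):

```python
# Sketch: word error rate as normalized Levenshtein distance over words.
def wer(ref, hyp):
    """WER = min(edits to turn hyp into ref) / len(ref)."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[-1][-1] / len(ref)

# The quoted run reports 6 ins + 15 del + 34 sub = 55 errors over 366
# reference words:
print(round(100 * 55 / 366, 2))   # 15.03, matching "%WER 15.03"
```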
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-08 16:21:35
|
Thanks, everyone!
Dan

On Mon, Jul 8, 2013 at 4:59 AM, ondrej platek <ond...@se...> wrote:
>
> I just checked the results for my modified Voxforge-like recipe.
> Everything worked: training, decoding, evaluation.
>
> My configuration: Ubuntu 10.04, using OpenBLAS and the shared flag:
> ./configure --openblas-root=`pwd`/../tools/OpenBLAS/install --fst-root=`pwd`/../tools/openfst --shared
>
> Ondra
>
> On Mon, Jul 8, 2013 at 7:54 AM, Ho Yin Chan <ric...@gm...> wrote:
>>
>> Simulated mode of the online decoding demo runs fine on CentOS too.
>>
>> Ricky
>>
>> On Sun, Jul 7, 2013 at 10:07 PM, Vassil Panayotov <vas...@gm...> wrote:
>>>
>>> The compilation (including "make ext") is working OK for me too on Ubuntu 10.04.
>>> Only tried to run the online decoders (voxforge/online_demo) so far --
>>> everything seems to be fine with them.
>>>
>>> Vassil
>>>
>>> On Sun, Jul 7, 2013 at 5:31 AM, Daniel Povey <dp...@gm...> wrote:
>>> > Everyone,
>>> > I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej
>>> > Platek and others have been working on different build scripts that
>>> > now support a shared-library option. If anyone can test it and make
>>> > sure it still works for them it would be great.
>>> > If people have made local changes to their Makefiles they may get conflicts.
>>> > Dan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-08 08:59:36
|
I just checked the results for my modified Voxforge-like recipe.
Everything worked: training, decoding, evaluation.

My configuration: Ubuntu 10.04, using OpenBLAS and the shared flag:

./configure --openblas-root=`pwd`/../tools/OpenBLAS/install --fst-root=`pwd`/../tools/openfst --shared

Ondra

On Mon, Jul 8, 2013 at 7:54 AM, Ho Yin Chan <ric...@gm...> wrote:
> Simulated mode on the online decoding demo runs fine on CentOS too.
>
> Ricky
> [...]
|
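For anyone reproducing this, the configure call above slots into the usual Kaldi build sequence roughly as follows. This is a sketch that assumes OpenBLAS has already been compiled into tools/OpenBLAS/install; the exact steps may differ by revision:

```shell
# Build the bundled dependencies (OpenFst etc.) first.
cd tools && make

# Configure and build Kaldi itself with shared libraries and OpenBLAS.
cd ../src
./configure --openblas-root=`pwd`/../tools/OpenBLAS/install \
            --fst-root=`pwd`/../tools/openfst --shared
make depend && make
```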
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-08 05:54:56
|
Simulated mode on the online decoding demo runs fine on CentOS too.

Ricky

On Sun, Jul 7, 2013 at 10:07 PM, Vassil Panayotov <vas...@gm...> wrote:
> The compilation (including "make ext") is working OK for me too on Ubuntu 10.04.
> Only tried to run the online decoders (voxforge/online_demo) so far -
> everything seems to be fine with them.
>
> Vassil
> [...]
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-07 14:07:24
|
The compilation (including "make ext") is working OK for me too on Ubuntu 10.04.
Only tried to run the online decoders (voxforge/online_demo) so far -
everything seems to be fine with them.

Vassil

On Sun, Jul 7, 2013 at 5:31 AM, Daniel Povey <dp...@gm...> wrote:
> Everyone,
> I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej
> Platek and others have been working on different build scripts that
> now support a shared-library option. If anyone can test it and make
> sure it still works for them it would be great.
> If people have made local changes to their Makefiles they may get conflicts.
> Dan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-07 11:31:36
|
Successfully built on MacOS 10.8 and the commands are working. I haven't
tried training a system yet.

Paul

On 7 July 2013 11:31, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> Everyone,
> I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej
> Platek and others have been working on different build scripts that
> now support a shared-library option. If anyone can test it and make
> sure it still works for them it would be great.
> If people have made local changes to their Makefiles they may get conflicts.
> Dan
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
>
> http://p.sf.net/sfu/windows-dev2dev
> _______________________________________________
> Kaldi-users mailing list
> Kal...@li...
> https://lists.sourceforge.net/lists/listinfo/kaldi-users
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-07 02:31:13
|
Everyone,
I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej
Platek and others have been working on different build scripts that
now support a shared-library option. If anyone can test it and make
sure it still works for them it would be great.
If people have made local changes to their Makefiles they may get conflicts.
Dan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-02 13:51:45
|
Thanks.

On Tue, Jul 2, 2013 at 9:38 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> Hi Lahiru,
> I already fixed this issue in the trunk; the PdfPrior is now activated
> only when the option --class-frame-counts is present.
>
> Karel
>
> Dne 2.7.2013 9:22, Mailing list used for User Communication and Updates napsal(a):
>> Sorry, I was wrong. It selects the GPU automatically.
>>
>> I found the error in the exp/tri4b_pretrain-dbn/log/cmvn_glob_fwd.log file:
>>
>> ERROR (nnet-forward:PdfPrior():nnet-pdf-prior.cc:26) --class-frame-counts
>> is empty: Cannot initialize priors without the counts.
>> ERROR (nnet-forward:main():nnet-forward.cc:196) ERROR
>> (nnet-forward:PdfPrior():nnet-pdf-prior.cc:26) --class-frame-counts is
>> empty: Cannot initialize priors without the counts.
>>
>> Thanks,
>> Lahiru
>>
>> On Tue, Jul 2, 2013 at 9:10 PM, Lahiru Samarakoon <lah...@gm...> wrote:
>>> Hi All,
>>>
>>> When running DNN training on GPUs, I am getting the following error.
>>>
>>> Log file: exp/tri4b_pretrain-dbn/_pretrain_dbn.log
>>>
>>> # PRE-TRAINING RBM LAYER 1
>>> Initializing 'exp/tri4b_pretrain-dbn/1.rbm.init'
>>> Traceback (most recent call last):
>>>   File "utils/nnet/gen_rbm_init.py", line 40, in ?
>>>     dimL.append(int(dimStrL[i]))
>>> ValueError: invalid literal for int():
>>>
>>> I am running this in a GPU cluster which assigns the job to a GPU
>>> dynamically, so I cannot configure "gpu_id=" (which manually selects the
>>> GPU id to run on; -1 disables the GPU). Can this be the cause?
>>>
>>> Thanks,
>>> Lahiru
>>>
>>> On Fri, Jun 28, 2013 at 11:06 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>> It's not the same as that. Each machine does SGD separately and,
>>>> periodically, the parameters are averaged across machines.
>>>> Dan
>>>>
>>>> On Fri, Jun 28, 2013 at 11:03 AM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>>> Wow, nice.
>>>>> Is the implementation similar to Jeff Dean's paper "Large Scale
>>>>> Distributed Deep Networks"
>>>>> (http://www.cs.toronto.edu/~ranzato/publications/DistBeliefNIPS2012_withAppendix.pdf)?
>>>>> Does Kaldi use asynchronous SGD?
>>>>>
>>>>> Please give me a brief description.
>>>>>
>>>>> Thanks,
>>>>> Lahiru
>>>>>
>>>>> On Fri, Jun 28, 2013 at 10:28 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>>>> It's on multiple machines and also multiple threads per machine.
>>>>>> Dan
>>>>>>
>>>>>> On Fri, Jun 28, 2013 at 2:05 AM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>>>>> Thanks guys :-)
>>>>>>>
>>>>>>> Dan, is your setup for distributed training? Or does it only
>>>>>>> parallelize within a single machine?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Lahiru
>>>>>>>
>>>>>>> On Fri, Jun 28, 2013 at 5:29 AM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>>>>>> In my setup there is RBM pre-training:
>>>>>>>> http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf
>>>>>>>> followed by per-frame cross-entropy training and sMBR training:
>>>>>>>> http://www.danielpovey.com/files/2013_interspeech_dnn.pdf
>>>>>>>>
>>>>>>>> Dne 27.6.2013 13:21, Mailing list used for User Communication and Updates napsal(a):
>>>>>>>>> There are basically two setups there: Karel's setup, generally called
>>>>>>>>> run_dnn.sh or run_nnet.sh, which is for GPUs, and my setup, called
>>>>>>>>> run_nnet_cpu.sh, which is for CPUs in parallel. Karel's setup may
>>>>>>>>> have an ICASSP paper; Karel can tell you. Mine is mostly unpublished.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> On Thu, Jun 27, 2013 at 5:31 AM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> I am in the process of running the wsj/s5 recipe. Now I am about to
>>>>>>>>>> run the DNN experiments and am specifically interested in the DNN
>>>>>>>>>> training. I am planning to look into the DNN code for more
>>>>>>>>>> understanding. Since there are many DNN variants, could anyone tell
>>>>>>>>>> me which papers the Kaldi DNN implementation follows?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Lahiru
|
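Dan's description in the thread above (each machine runs SGD independently and the parameters are periodically averaged across machines) can be illustrated with a toy sketch. This is my own illustration of the general idea, not Kaldi's actual code:

```python
def sgd_step(params, grad, lr=0.1):
    """One local SGD step on a single worker (gradient supplied by caller)."""
    return [p - lr * g for p, g in zip(params, grad)]

def average_parameters(worker_params):
    """Periodic synchronization: element-wise mean over all workers' models."""
    n = len(worker_params)
    return [sum(vals) / n for vals in zip(*worker_params)]

# Two workers start from a shared model, take different local steps...
start = [0.0, 0.0, 0.0]
w1 = sgd_step(start, [1.0, 0.0, 0.0])
w2 = sgd_step(start, [0.0, 1.0, 0.0])
# ...and are then merged into one model that both resume from.
merged = average_parameters([w1, w2])
print(merged)  # [-0.05, -0.05, 0.0]
```

The averaging step is what distinguishes this from asynchronous SGD in the DistBelief paper Lahiru mentions: there is no shared parameter server, only occasional model averaging.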
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-02 13:38:24
|
Hi Lahiru,
I already fixed this issue in the trunk; the PdfPrior is now activated
only when the option --class-frame-counts is present.

Karel

Dne 2.7.2013 9:22, Mailing list used for User Communication and Updates napsal(a):
> Sorry, I was wrong. It selects the GPU automatically.
>
> I found the error in the exp/tri4b_pretrain-dbn/log/cmvn_glob_fwd.log file:
>
> ERROR (nnet-forward:PdfPrior():nnet-pdf-prior.cc:26)
> --class-frame-counts is empty: Cannot initialize priors without the
> counts.
> ERROR (nnet-forward:main():nnet-forward.cc:196) ERROR
> (nnet-forward:PdfPrior():nnet-pdf-prior.cc:26) --class-frame-counts is
> empty: Cannot initialize priors without the counts.
>
> Thanks,
> Lahiru
> [...]
|
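For context on Karel's fix: the --class-frame-counts file feeds the prior term that is subtracted in the log domain to turn DNN posteriors into scaled likelihoods for decoding, which is why it cannot be initialized from an empty value. A minimal sketch of that idea, with hypothetical function names of my own (not Kaldi's API):

```python
import math

def log_priors_from_counts(frame_counts, floor=1e-10):
    """Per-pdf log-priors estimated from training frame counts.
    Zero counts are floored so log() stays defined."""
    total = sum(frame_counts)
    return [math.log(max(c / total, floor)) for c in frame_counts]

def posteriors_to_scaled_loglikes(log_posteriors, log_priors):
    """log p(obs|state), up to a constant: log p(state|obs) - log p(state)."""
    return [post - prior for post, prior in zip(log_posteriors, log_priors)]

counts = [100, 300]                      # toy frame counts for two pdfs
priors = log_priors_from_counts(counts)  # [log 0.25, log 0.75]
# A uniform posterior gets boosted for the rare pdf, penalized for the common one.
scaled = posteriors_to_scaled_loglikes([math.log(0.5), math.log(0.5)], priors)
```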