From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-18 11:27:03
Hi all, I have added a script "install_sctk_patched.sh" in tools/extras for smooth sctk-2.4.0 installation under Cygwin. The corresponding tools/Makefile has been changed to reflect it.
Ricky

On Tue, Jul 9, 2013 at 12:21 AM, Daniel Povey <dp...@gm...> wrote:
> Thanks, everyone!
> Dan
>
> On Mon, Jul 8, 2013 at 4:59 AM, ondrej platek <ond...@se...> wrote:
> > I just checked the results for my modified Voxforge-like recipe.
> > Everything worked: training, decoding, evaluation.
> >
> > My configuration: Ubuntu 10.04, using OpenBLAS and the shared flag:
> > ./configure --openblas-root=`pwd`/../tools/OpenBLAS/install --fst-root=`pwd`/../tools/openfst --shared
> >
> > Ondra
> >
> > On Mon, Jul 8, 2013 at 7:54 AM, Ho Yin Chan <ric...@gm...> wrote:
> >> Simulated mode in the online decoding demo ran fine on CentOS too.
> >> Ricky
> >>
> >> On Sun, Jul 7, 2013 at 10:07 PM, Vassil Panayotov <vas...@gm...> wrote:
> >>> The compilation (including "make ext") is working OK for me too on Ubuntu 10.04.
> >>> I've only tried to run the online decoders (voxforge/online_demo) so far; everything seems to be fine with them.
> >>> Vassil
> >>>
> >>> On Sun, Jul 7, 2013 at 5:31 AM, Daniel Povey <dp...@gm...> wrote:
> >>> > Everyone,
> >>> > I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej Platek and others have been working on different build scripts that now support a shared-library option. If anyone can test it and make sure it still works for them, it would be great.
> >>> > If people have made local changes to their Makefiles, they may get conflicts.
> >>> > Dan
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-16 19:52:02
Thanks.
Nathan
On Jul 16, 2013, at 12:39 PM, Daniel Povey wrote:
> The issue seems to be script incompatibility: the yesno example is
> based on the older "s3" scripts. The "s5" ones are recommended and
> aren't compatible with the older ones.
>
> You could probably adapt the Switchboard setup, although you'd have to
> mess with the data preparation scripts a bit and figure out how to
> build the language model and the dictionary. The s5 scripts are all
> basically the same-- only the data preparation and things like the
> number of Gaussians differ.
> Dan
>
>
>
> On Tue, Jul 16, 2013 at 3:37 PM, Nathan Dunn <nd...@ca...> wrote:
>>
>> I adapted from something a grad student had written using a combination of rm/s5 and quite possibly yesno. The more I read through this, the more I'm thinking that I need to rewrite it.
>>
>> Would you suggest basing it on switchboard?
>>
>> I have Switchboard-2 Phase II LDC97S62 versus LDC97S62 Switchboard-1 Release II. I'm assuming that it could be adapted without too much effort?
>>
>> Nathan
>>
>>
>> On Jul 16, 2013, at 12:24 PM, Daniel Povey wrote:
>>
>>> Are you using an older script? When I look at the current scripts
>>> (s5/), I see things like this:
>>>
>>> if [ $stage -le -3 ] && $train_tree; then
>>> echo "$0: Getting questions for tree clustering."
>>> # preparing questions, roots file...
>>> cluster-phones $context_opts $dir/treeacc $lang/phones/sets.int
>>> $dir/questions.int 2> $dir/log/questions.log || exit 1;
>>> cat $lang/phones/extra_questions.int >> $dir/questions.int
>>> compile-questions $context_opts $lang/topo $dir/questions.int
>>> $dir/questions.qst 2>$dir/log/compile_questions.log || exit 1;
>>> ...
>>>
>>> Where did you get this setup?
>>> Dan
>>>
>>>
>>> On Tue, Jul 16, 2013 at 3:21 PM, Nathan Dunn <nd...@ca...> wrote:
>>>>
>>>> I'm having some issues compiling questions (# error below):
>>>>
>>>> cat $lang/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>>>> cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
>>>> scripts/int2sym.pl $lang/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
>>>> ## this next line goes boom
>>>> compile-questions $lang/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
>>>>
>>>> So the issue is that the topo only has the first 323 symbols. The difference is the disambiguation symbols (#0 ... #27). I tried hacking the disambiguation phones into topo, but then I got complaints about my language directory.
>>>>
>>>> I can of course remove the disambiguation symbols:
>>>>
>>>> cat $lang/phones.txt | grep -v "^#" | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>>>>
>>>> but I'm not sure if that is the right thing to do in this instance, or if it is correct overall.
>>>>
>>>> Thanks,
>>>>
>>>> Nathan
>>>>
>>>>
>>>> === ERROR . . . compile_questions.log ===
>>>>
>>>> compile-questions data/lang/topo exp/tri1/questions.txt exp/tri1/questions.qst
>>>> WARNING (compile-questions:ProcessTopo():compile-questions.cc:36) ProcessTopo: phones seen in questions differ from those in topology: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 ]
>>>> vs. [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 ]
>>>>
>>>> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>>>> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>>>>
>>>> [stack trace: ]
>>>> 0 compile-questions 0x0000000109711f2b _ZN5kaldi18KaldiGetStackTraceEv + 59
>>>> 1 compile-questions 0x00000001097122c1 _ZN5kaldi17KaldiErrorMessageD1Ev + 241
>>>> 2 compile-questions 0x000000010966c4c4 _ZN5kaldi11ProcessTopoERKNS_11HmmTopologyERKSt6vectorIS3_IiSaIiEESaIS5_EE + 1284
>>>> 3 compile-questions 0x000000010966db79 main + 4409
>>>>
>>>>
>>>>
>>>>
>>>> Nathan
>>>>
>>
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-16 19:39:46
The issue seems to be script incompatibility: the yesno example is
based on the older "s3" scripts. The "s5" ones are recommended and
aren't compatible with the older ones.
You could probably adapt the Switchboard setup, although you'd have to
mess with the data preparation scripts a bit and figure out how to
build the language model and the dictionary. The s5 scripts are all
basically the same-- only the data preparation and things like the
number of Gaussians differ.
Dan
On Tue, Jul 16, 2013 at 3:37 PM, Nathan Dunn <nd...@ca...> wrote:
>
> I adapted from something a grad student had written using a combination of rm/s5 and quite possibly yesno. The more I read through this, the more I'm thinking that I need to rewrite it.
>
> Would you suggest basing it on switchboard?
>
> I have Switchboard-2 Phase II LDC97S62 versus LDC97S62 Switchboard-1 Release II. I'm assuming that it could be adapted without too much effort?
>
> Nathan
>
>
> On Jul 16, 2013, at 12:24 PM, Daniel Povey wrote:
>
>> Are you using an older script? When I look at the current scripts
>> (s5/), I see things like this:
>>
>> if [ $stage -le -3 ] && $train_tree; then
>> echo "$0: Getting questions for tree clustering."
>> # preparing questions, roots file...
>> cluster-phones $context_opts $dir/treeacc $lang/phones/sets.int
>> $dir/questions.int 2> $dir/log/questions.log || exit 1;
>> cat $lang/phones/extra_questions.int >> $dir/questions.int
>> compile-questions $context_opts $lang/topo $dir/questions.int
>> $dir/questions.qst 2>$dir/log/compile_questions.log || exit 1;
>> ...
>>
>> Where did you get this setup?
>> Dan
>>
>>
>> On Tue, Jul 16, 2013 at 3:21 PM, Nathan Dunn <nd...@ca...> wrote:
>>>
>>> I'm having some issues compiling questions (# error below):
>>>
>>> cat $lang/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>>> cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
>>> scripts/int2sym.pl $lang/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
>>> ## this next line goes boom
>>> compile-questions $lang/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
>>>
>>> So the issue is that the topo only has the first 323 symbols. The difference is the disambiguation symbols (#0 ... #27). I tried hacking the disambiguation phones into topo, but then I got complaints about my language directory.
>>>
>>> I can of course remove the disambiguation symbols:
>>>
>>> cat $lang/phones.txt | grep -v "^#" | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>>>
>>> but I'm not sure if that is the right thing to do in this instance, or if it is correct overall.
>>>
>>> Thanks,
>>>
>>> Nathan
>>>
>>>
>>> === ERROR . . . compile_questions.log ===
>>>
>>> compile-questions data/lang/topo exp/tri1/questions.txt exp/tri1/questions.qst
>>> WARNING (compile-questions:ProcessTopo():compile-questions.cc:36) ProcessTopo: phones seen in questions differ from those in topology: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 ]
>>> vs. [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 ]
>>>
>>> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>>> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>>>
>>> [stack trace: ]
>>> 0 compile-questions 0x0000000109711f2b _ZN5kaldi18KaldiGetStackTraceEv + 59
>>> 1 compile-questions 0x00000001097122c1 _ZN5kaldi17KaldiErrorMessageD1Ev + 241
>>> 2 compile-questions 0x000000010966c4c4 _ZN5kaldi11ProcessTopoERKNS_11HmmTopologyERKSt6vectorIS3_IiSaIiEESaIS5_EE + 1284
>>> 3 compile-questions 0x000000010966db79 main + 4409
>>>
>>>
>>>
>>>
>>> Nathan
>>>
>
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-16 19:37:26
I adapted from something a grad student had written using a combination of rm/s5 and quite possibly yesno. The more I read through this, the more I'm thinking that I need to rewrite it.
Would you suggest basing it on switchboard?
I have Switchboard-2 Phase II LDC97S62 versus LDC97S62 Switchboard-1 Release II. I'm assuming that it could be adapted without too much effort?
Nathan
On Jul 16, 2013, at 12:24 PM, Daniel Povey wrote:
> Are you using an older script? When I look at the current scripts
> (s5/), I see things like this:
>
> if [ $stage -le -3 ] && $train_tree; then
> echo "$0: Getting questions for tree clustering."
> # preparing questions, roots file...
> cluster-phones $context_opts $dir/treeacc $lang/phones/sets.int
> $dir/questions.int 2> $dir/log/questions.log || exit 1;
> cat $lang/phones/extra_questions.int >> $dir/questions.int
> compile-questions $context_opts $lang/topo $dir/questions.int
> $dir/questions.qst 2>$dir/log/compile_questions.log || exit 1;
> ...
>
> Where did you get this setup?
> Dan
>
>
> On Tue, Jul 16, 2013 at 3:21 PM, Nathan Dunn <nd...@ca...> wrote:
>>
>> I'm having some issues compiling questions (# error below):
>>
>> cat $lang/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>> cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
>> scripts/int2sym.pl $lang/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
>> ## this next line goes boom
>> compile-questions $lang/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
>>
>> So the issue is that the topo only has the first 323 symbols. The difference is the disambiguation symbols (#0 ... #27). I tried hacking the disambiguation phones into topo, but then I got complaints about my language directory.
>>
>> I can of course remove the disambiguation symbols:
>>
>> cat $lang/phones.txt | grep -v "^#" | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>>
>> but I'm not sure if that is the right thing to do in this instance, or if it is correct overall.
>>
>> Thanks,
>>
>> Nathan
>>
>>
>> === ERROR . . . compile_questions.log ===
>>
>> compile-questions data/lang/topo exp/tri1/questions.txt exp/tri1/questions.qst
>> WARNING (compile-questions:ProcessTopo():compile-questions.cc:36) ProcessTopo: phones seen in questions differ from those in topology: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 ]
>> vs. [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 ]
>>
>> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>>
>> [stack trace: ]
>> 0 compile-questions 0x0000000109711f2b _ZN5kaldi18KaldiGetStackTraceEv + 59
>> 1 compile-questions 0x00000001097122c1 _ZN5kaldi17KaldiErrorMessageD1Ev + 241
>> 2 compile-questions 0x000000010966c4c4 _ZN5kaldi11ProcessTopoERKNS_11HmmTopologyERKSt6vectorIS3_IiSaIiEESaIS5_EE + 1284
>> 3 compile-questions 0x000000010966db79 main + 4409
>>
>>
>>
>>
>> Nathan
>>
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-16 19:24:25
Are you using an older script? When I look at the current scripts
(s5/), I see things like this:
if [ $stage -le -3 ] && $train_tree; then
  echo "$0: Getting questions for tree clustering."
  # preparing questions, roots file...
  cluster-phones $context_opts $dir/treeacc $lang/phones/sets.int \
    $dir/questions.int 2> $dir/log/questions.log || exit 1;
  cat $lang/phones/extra_questions.int >> $dir/questions.int
  compile-questions $context_opts $lang/topo $dir/questions.int \
    $dir/questions.qst 2>$dir/log/compile_questions.log || exit 1;
  ...
Where did you get this setup?
Dan
On Tue, Jul 16, 2013 at 3:21 PM, Nathan Dunn <nd...@ca...> wrote:
>
> I'm having some issues compiling questions (# error below):
>
> cat $lang/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
> cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
> scripts/int2sym.pl $lang/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
> ## this next line goes boom
> compile-questions $lang/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
>
> So the issue is that the topo only has the first 323 symbols. The difference is the disambiguation symbols (#0 ... #27). I tried hacking the disambiguation phones into topo, but then I got complaints about my language directory.
>
> I can of course remove the disambiguation symbols:
>
> cat $lang/phones.txt | grep -v "^#" | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
>
> but I'm not sure if that is the right thing to do in this instance, or if it is correct overall.
>
> Thanks,
>
> Nathan
>
>
> === ERROR . . . compile_questions.log ===
>
> compile-questions data/lang/topo exp/tri1/questions.txt exp/tri1/questions.qst
> WARNING (compile-questions:ProcessTopo():compile-questions.cc:36) ProcessTopo: phones seen in questions differ from those in topology: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 ]
> vs. [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 ]
>
> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
> ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
>
> [stack trace: ]
> 0 compile-questions 0x0000000109711f2b _ZN5kaldi18KaldiGetStackTraceEv + 59
> 1 compile-questions 0x00000001097122c1 _ZN5kaldi17KaldiErrorMessageD1Ev + 241
> 2 compile-questions 0x000000010966c4c4 _ZN5kaldi11ProcessTopoERKNS_11HmmTopologyERKSt6vectorIS3_IiSaIiEESaIS5_EE + 1284
> 3 compile-questions 0x000000010966db79 main + 4409
>
>
>
>
> Nathan
>
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-16 19:21:16
I'm having some issues compiling questions (# error below):
cat $lang/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl $lang/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
## this next line goes boom
compile-questions $lang/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
So the issue is that the topo only has the first 323 symbols. The difference is the disambiguation symbols (#0 ... #27). I tried hacking the disambiguation phones into topo, but then I got complaints about my language directory.
I can of course remove the disambiguation symbols:
cat $lang/phones.txt | grep -v "^#" | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
but I'm not sure if that is the right thing to do in this instance, or if it is correct overall.
Thanks,
Nathan
=== ERROR . . . compile_questions.log ===
compile-questions data/lang/topo exp/tri1/questions.txt exp/tri1/questions.qst
WARNING (compile-questions:ProcessTopo():compile-questions.cc:36) ProcessTopo: phones seen in questions differ from those in topology: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 ]
vs. [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 ]
ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
ERROR (compile-questions:ProcessTopo():compile-questions.cc:39) ProcessTopo: phones are asked about that are undefined in the topology.
[stack trace: ]
0 compile-questions 0x0000000109711f2b _ZN5kaldi18KaldiGetStackTraceEv + 59
1 compile-questions 0x00000001097122c1 _ZN5kaldi17KaldiErrorMessageD1Ev + 241
2 compile-questions 0x000000010966c4c4 _ZN5kaldi11ProcessTopoERKNS_11HmmTopologyERKSt6vectorIS3_IiSaIiEESaIS5_EE + 1284
3 compile-questions 0x000000010966db79 main + 4409
Nathan
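As a side note on the workaround in the message above, the effect of the extra grep -v "^#" can be seen on a toy phones.txt (the file and paths below are invented purely for illustration, not taken from this setup): the IDs present only in the unfiltered list are exactly the disambiguation-symbol IDs that the topology does not know about.

```shell
# Toy phones.txt: <eps>, three real phones, and two disambiguation symbols.
lang=$(mktemp -d)
printf '<eps> 0\na 1\nb 2\nc 3\n#0 4\n#1 5\n' > $lang/phones.txt

# Unfiltered phone list (what the failing run used): keeps the # symbols.
awk '{print $NF}' $lang/phones.txt | grep -v -w 0 > $lang/all.list
# Filtered list: drop disambiguation symbols before building questions.
grep -v '^#' $lang/phones.txt | awk '{print $NF}' | grep -v -w 0 > $lang/real.list

# IDs in the first list but not the second are the disambiguation-symbol IDs.
extra=$(comm -23 $lang/all.list $lang/real.list)
echo $extra    # prints: 4 5
```

On the real 351-symbol phones.txt described above, the same comparison would list the 28 IDs (324-351) that compile-questions complains about.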
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 23:25:32
Alright, I updated the output, which looks closer to what I want, but I'm a little unclear how to pull stuff out of this:

lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-to-phone-lattice exp/tri2a/final.mdl ark:- ark,t:- | utils/int2sym.pl -f 3 g300_lang/phones.txt
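For what it's worth, arc lines in a text-format lattice have the form "state next-state label weight", so one quick way to pull just the labels out of output like this is to print field 3 of each 4-field line. The snippet below is a sketch with toy input standing in for the real pipeline output, not something from the thread itself:

```shell
# Toy stand-in for the text-format lattice printed by the pipeline above:
# an utterance-id line, arc lines ("state next-state label weight"), and a
# final-state line. Field 3 of each 4-field arc line is the label.
labels=$(printf '02.cut1-1\n0 1 SEE_TRANSCRIPT_E 14.9,31091.3,2960\n1 2 YAWN_B 0,0,114\n2\n' \
  | awk 'NF==4 {print $3}')
echo "$labels"
# prints:
# SEE_TRANSCRIPT_E
# YAWN_B
```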
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 23:18:50
Something is definitely wrong there. You shouldn't see something with
an _E suffix right at the start like that: if it's the only phone in a
word it should have the singleton _S suffix, and if it doesn't have a
word symbol it should have no suffix at all. I suspect you may have
built the system with a different phone set, or the word-boundary info
is very wrong.
Dan
On Thu, Jul 11, 2013 at 7:03 PM, Nathan Dunn <nd...@ca...> wrote:
>
> Alright, I updated the output, which looks closer to what I want, but I'm a little unclear how to pull stuff out of this:
>
> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-to-phone-lattice exp/tri2a/final.mdl ark:- ark,t:- | utils/int2sym.pl -f 3 g300_lang/phones.txt
>
>
>
>
> The first few lines look like this where "02.cut1-1" is the name of the transcript:
>
> 02.cut1-1
> 0 1 SEE_TRANSCRIPT_E 14.9888,31091.3,2960_2962_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961
> 1 2 END_CROSSTALK_NOISE_E 0,0,656_655_655_655_655_655_655_655_655_655_655_655_655_655_706_705_705_705
> 2 3 SEE_TRANSCRIPT_E 0,0,2960_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2962
> 3 4 END_MICROPHONE_NOISE_I 3.56562,5210.82,854_853_872
> 4 5 YAWN_B 0,0,114_113_113_113_178_177_177_177_177_177_177_177_177_177
> 5 6 END_YAWN_B 0,0,2008_2007_2007_2074_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073
> 6 7 SEE_TRANSCRIPT_E 11.9189,5022.05,2960_2959_2959_2962_2961_2961
> 7 8 END_NOISE_B 0,0,952_951_951_951_951_996_995_995_995_995_995_995_995
> 8 9 END_YAWN_B 0,0,1958_1957_1957_2036_2035_2035_2035
> 9 10 END_HUMAN_NOISE 0,0,1540_1539_1539_1539_1539_1539_1539_1539_1594_1593_1593
> 10 11 SEE_TRANSCRIPT_E 0,0,2960_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2962
> 11 12 END_MICROPHONE_NOISE_I 7.45918,2101.25,854_872
>
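[Editorial note: each arc line in the text-form phone lattice above can be unpacked mechanically. A minimal illustrative sketch (not a Kaldi tool), assuming the `from-state to-state symbol graph-cost,acoustic-cost,tid_tid_...` layout shown in the dump:]

```python
# Minimal sketch: unpack one arc line of the text-form phone lattice above.
# Assumed layout (from the dump):
#   <from-state> <to-state> <phone-symbol> <graph-cost>,<acoustic-cost>,<tid>_<tid>_...
def parse_arc(line):
    from_state, to_state, symbol, weight = line.split()
    graph_cost, acoustic_cost, tids = weight.split(",")
    transition_ids = [int(t) for t in tids.split("_")] if tids else []
    return {
        "from": int(from_state),
        "to": int(to_state),
        "symbol": symbol,
        "graph_cost": float(graph_cost),
        "acoustic_cost": float(acoustic_cost),
        "num_frames": len(transition_ids),  # one transition-id per frame
    }

arc = parse_arc("3 4 END_MICROPHONE_NOISE_I 3.56562,5210.82,854_853_872")
print(arc["symbol"], arc["num_frames"])  # END_MICROPHONE_NOISE_I 3
```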
>
> Nathan
>
> On Jul 10, 2013, at 10:10 PM, Daniel Povey wrote:
>
>> It's possible that your word_boundary.txt is OK.
>> You could try to get the one best from the lattice using lattice-1best
>> (I think), get the phone sequence from the 1-best lattice using
>> lat-to-phones (I think), doing output in text form using ark,t:- and
>> then get the text form of the phone-level lattice using
>> utils/int2sym.pl -f 3 g300_lang/phones.txt (or something similar), and
>> see if the sequence of phonemes looks reasonable for the word sequence
>> you have.
>>
>> Dan
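[Editorial note: `utils/int2sym.pl -f N` simply replaces the integer in field N of each line with its symbol from the given symbol table. A rough Python equivalent, assuming the usual `symbol integer-id` layout of a Kaldi symbol table; the toy table below is hypothetical:]

```python
# Rough sketch of utils/int2sym.pl -f <n>: replace the integer in field n
# of each line with its symbol from a Kaldi-style symbol table
# (lines of the form "<symbol> <integer-id>").
def load_symtab(lines):
    table = {}
    for line in lines:
        sym, idx = line.split()
        table[int(idx)] = sym
    return table

def int2sym(line, field, table):
    parts = line.split()
    parts[field - 1] = table[int(parts[field - 1])]  # fields are 1-based
    return " ".join(parts)

# hypothetical toy symbol table
table = load_symtab(["<eps> 0", "SIL 1", "AA_B 2"])
print(int2sym("0 1 2 4.5,3.2", 3, table))  # 0 1 AA_B 4.5,3.2
```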
>>
>>
>> On Thu, Jul 11, 2013 at 1:00 AM, Nathan Dunn <nd...@me...> wrote:
>>>
>>> I think that was part of it. I fixed one problem with the oov.txt / oov.int
>>>
>>> I'll try to recompile with that bug fix and see if that works. It's possible that I'm creating word_boundaries incorrectly. How many entries would you expect to get? (I am getting 315.) I wonder if I am using word_boundaries for the wrong set of phones...
>>>
>>> Checking g300_lang/phones.txt ...
>>> --> g300_lang/phones.txt is OK
>>>
>>> Checking words.txt: #0 ...
>>> --> g300_lang/words.txt has "#0"
>>> --> g300_lang/words.txt is OK
>>>
>>> Checking g300_lang/phones/context_indep.{txt, int, csl} ...
>>> --> 75 entry/entries in g300_lang/phones/context_indep.txt
>>> --> g300_lang/phones/context_indep.int corresponds to g300_lang/phones/context_indep.txt
>>> --> g300_lang/phones/context_indep.csl corresponds to g300_lang/phones/context_indep.txt
>>> --> g300_lang/phones/context_indep.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/disambig.{txt, int, csl} ...
>>> --> 28 entry/entries in g300_lang/phones/disambig.txt
>>> --> g300_lang/phones/disambig.int corresponds to g300_lang/phones/disambig.txt
>>> --> g300_lang/phones/disambig.csl corresponds to g300_lang/phones/disambig.txt
>>> --> g300_lang/phones/disambig.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/nonsilence.{txt, int, csl} ...
>>> --> 240 entry/entries in g300_lang/phones/nonsilence.txt
>>> --> g300_lang/phones/nonsilence.int corresponds to g300_lang/phones/nonsilence.txt
>>> --> g300_lang/phones/nonsilence.csl corresponds to g300_lang/phones/nonsilence.txt
>>> --> g300_lang/phones/nonsilence.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/silence.{txt, int, csl} ...
>>> --> 75 entry/entries in g300_lang/phones/silence.txt
>>> --> g300_lang/phones/silence.int corresponds to g300_lang/phones/silence.txt
>>> --> g300_lang/phones/silence.csl corresponds to g300_lang/phones/silence.txt
>>> --> g300_lang/phones/silence.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/optional_silence.{txt, int, csl} ...
>>> --> 1 entry/entries in g300_lang/phones/optional_silence.txt
>>> --> g300_lang/phones/optional_silence.int corresponds to g300_lang/phones/optional_silence.txt
>>> --> g300_lang/phones/optional_silence.csl corresponds to g300_lang/phones/optional_silence.txt
>>> --> g300_lang/phones/optional_silence.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/extra_questions.{txt, int} ...
>>> --> ERROR: fail to open g300_lang/phones/extra_questions.txt
>>>
>>> Checking g300_lang/phones/roots.{txt, int} ...
>>> --> 75 entry/entries in g300_lang/phones/roots.txt
>>> --> g300_lang/phones/roots.int corresponds to g300_lang/phones/roots.txt
>>> --> g300_lang/phones/roots.{txt, int} are OK
>>>
>>> Checking g300_lang/phones/sets.{txt, int} ...
>>> --> ERROR: fail to open g300_lang/phones/sets.int
>>>
>>> Checking g300_lang/phones/word_boundary.{txt, int} ...
>>> --> 315 entry/entries in g300_lang/phones/word_boundary.txt
>>> --> g300_lang/phones/word_boundary.int corresponds to g300_lang/phones/word_boundary.txt
>>> --> g300_lang/phones/word_boundary.{txt, int} are OK
>>>
>>> Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
>>> --> silence.txt and nonsilence.txt are disjoint
>>> --> silence.txt and disambig.txt are disjoint
>>> --> disambig.txt and nonsilence.txt are disjoint
>>> --> disjoint property is OK
>>>
>>> Checking summation: silence.txt, nonsilence.txt, disambig.txt ...
>>> --> summation property is OK
>>>
>>> Checking optional_silence.txt ...
>>> --> reading g300_lang/phones/optional_silence.txt
>>> --> g300_lang/phones/optional_silence.txt is OK
>>>
>>> Checking disambiguation symbols: #0 and #1
>>> --> g300_lang/phones/disambig.txt has "#0" and "#1"
>>> --> g300_lang/phones/disambig.txt is OK
>>>
>>> Checking topo ...
>>> --> g300_lang/topo's nonsilence section is OK
>>> --> g300_lang/topo's silence section is OK
>>> --> g300_lang/topo is OK
>>>
>>> Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
>>> --> g300_lang/phones/word_boundary.txt doesn't include disambiguation symbols
>>> --> g300_lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
>>> --> g300_lang/phones/word_boundary.txt is OK
>>> --> checking L.fst and L_disambig.fst...
>>> --> generating a 46 words sequence
>>> --> resulting phone sequence from L.fst corresponds to the word sequence
>>> --> L.fst is OK
>>> --> resulting phone sequence from L_disambig.fst corresponds to the word sequence
>>> --> L_disambig.fst is OK
>>>
>>> Checking g300_lang/oov.{txt, int} ...
>>> --> 1 entry/entries in g300_lang/oov.txt
>>> --> g300_lang/oov.int corresponds to g300_lang/oov.txt
>>> --> g300_lang/oov.{txt, int} are OK
>>>
>>>
>>>
>>> Nathan
>>>
>>> On Jul 10, 2013, at 9:12 PM, Daniel Povey wrote:
>>>
>>>> OK-- so the word-alignment seems to have failed. Generally that is
>>>> because of invalid word-boundary information. That file is indexed by
>>>> phones, not words. Issues can include a mismatch in phone set; words
>>>> that don't have any phones in them; or phones that have only one state
>>>> in their topology (this is a bug that was recently fixed, those should
>>>> work now if you update and recompile).
>>>> That program should not generally output any warnings, if all is OK.
>>>> Try to use the program utils/validate_lang.pl to make sure your
>>>> g300_lang/ directory is OK.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On Thu, Jul 11, 2013 at 12:06 AM, Nathan Dunn <nd...@me...> wrote:
>>>>>
>>>>> Sorry, and it ends with this:
>>>>>
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 98.cut1
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 98.cut2
>>>>> LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 132 to best
>>>>> path, 0 had errors.
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 98.cut3
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:104) Successfully
>>>>> aligned 0 lattices; 132 had errors.
>>>>> LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 132 linear lattices
>>>>> to ctm format; 0 had errors.
>>>>> ndunn:childspeech%
>>>>>
>>>>>
>>>>> Nathan
>>>>>
>>>>> On Jul 10, 2013, at 9:06 PM, Nathan Dunn wrote:
>>>>>
>>>>>
>>>>> The std err output is this:
>>>>>
>>>>> ndunn:childspeech% lattice-1best "ark:gunzip -c
>>>>> exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words
>>>>> g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- |
>>>>> nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt >
>>>>> exp/tri2a/ctm2/output.txt
>>>>> lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|'
>>>>> ark:-
>>>>> lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl
>>>>> ark:- ark:-
>>>>> nbest-to-ctm ark:- -
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 02.cut1
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 02.cut2
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 02.cut3
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 03.cut1
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 03.cut2
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 03.cut3
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>>
>>>>>
>>>>> Nathan Dunn, Ph.D.
>>>>> Scientific Programmer
>>>>> College of Arts and Science IT
>>>>> 541-221-2418
>>>>> nd...@ca...
>>>>>
>>>>>
>>>>>
>>>>> On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>>>>>
>>>>> Can you provide the logging output, at least some representative lines
>>>>> from it. Are there any warnings?
>>>>> Dan
>>>>>
>>>>> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User
>>>>> Communication and Updates <kal...@li...> wrote:
>>>>>
>>>>>
>>>>> I'm trying to get word timing information out of a successfully trained
>>>>> language model that I've already been able to decode with, following these
>>>>> instructions:
>>>>>
>>>>>
>>>>> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>>>>>
>>>>>
>>>>> This is the command I've run:
>>>>>
>>>>>
>>>>> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|"
>>>>> ark:- | lattice-align-words g300_lang/phones/word_boundary.int
>>>>> exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f
>>>>> 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>>>>>
>>>>>
>>>>>
>>>>> The problem is that I only have one entry per transcript (these transcripts
>>>>> are 1 minute long), and it doesn't seem to bear any relation to the word
>>>>> input:
>>>>>
>>>>>
>>>>> 02.cut1 1 0.00 67.11 I
>>>>>
>>>>> 02.cut2 1 0.00 62.44 HIS
>>>>>
>>>>> 02.cut3 1 0.00 65.76 MOUNT
>>>>>
>>>>> 03.cut1 1 0.00 62.62 I
>>>>>
>>>>> 03.cut2 1 0.00 62.41 WHO
>>>>>
>>>>> 03.cut3 1 0.00 63.72 I
>>>>>
>>>>> 06.cut1 1 0.00 62.13 STANDING
>>>>>
>>>>> 06.cut2 1 0.00 57.95 A
>>>>>
>>>>> 06.cut3 1 0.00 66.78 I
>>>>>
>>>>> . . .
>>>>>
>>>>> What I want is one entry like this for each word:
>>>>>
>>>>> 02.cut1 1 0.00 43.7 YOU
>>>>>
>>>>> 02.cut1 1 81.2 121.3 ARE
>>>>>
>>>>> 02.cut1 1 145.4 163.8 STANDING
>>>>>
>>>>> . . .
>>>>>
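[Editorial note: each CTM line above has the form `utterance-id channel start-time duration word`. A minimal reader sketch with toy data taken from the output above (illustrative only):]

```python
# Minimal sketch of reading CTM lines ("utt channel start duration word")
# like those above, grouping entries per utterance.
from collections import defaultdict

def read_ctm(lines):
    by_utt = defaultdict(list)
    for line in lines:
        utt, channel, start, dur, word = line.split()
        by_utt[utt].append((float(start), float(dur), word))
    return by_utt

ctm = read_ctm([
    "02.cut1 1 0.00 67.11 I",
    "02.cut2 1 0.00 62.44 HIS",
])
# a correct word alignment would give one tuple per word, not one per utterance
print(len(ctm["02.cut1"]))  # 1
```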
>>>>>
>>>>> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>>>>>
>>>>> 1 nonword
>>>>>
>>>>> 2 begin
>>>>>
>>>>> 3 end
>>>>>
>>>>> 4 internal
>>>>>
>>>>> 5 singleton
>>>>>
>>>>> 6 nonword
>>>>>
>>>>> 7 begin
>>>>>
>>>>> 8 end
>>>>>
>>>>> . . .
>>>>>
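[Editorial note: as noted later in the thread, word_boundary.int is indexed by phones, not words, so a quick sanity check is that it has exactly one valid entry per phone id. An illustrative sketch with toy data (not a Kaldi tool):]

```python
# Quick consistency sketch: word_boundary entries should cover the phone
# set exactly, with one of nonword/begin/end/internal/singleton per phone id.
VALID = {"nonword", "begin", "end", "internal", "singleton"}

def check_word_boundary(entries, phone_ids):
    seen = {}
    for line in entries:
        idx, marker = line.split()
        assert marker in VALID, "bad marker: " + marker
        seen[int(idx)] = marker
    missing = set(phone_ids) - set(seen)
    return sorted(missing)

# toy data in the format shown above
entries = ["1 nonword", "2 begin", "3 end", "4 internal", "5 singleton"]
print(check_word_boundary(entries, range(1, 7)))  # [6] -> phone 6 has no entry
```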
>>>>>
>>>>>
>>>>> Any help is much appreciated.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> Nathan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>>
>>>>> See everything from the browser to the database with AppDynamics
>>>>>
>>>>> Get end-to-end visibility with application monitoring from AppDynamics
>>>>>
>>>>> Isolate bottlenecks and diagnose root cause in seconds.
>>>>>
>>>>> Start your free trial of AppDynamics Pro today!
>>>>>
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>>>>
>>>>> _______________________________________________
>>>>>
>>>>> Kaldi-users mailing list
>>>>>
>>>>> Kal...@li...
>>>>>
>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
>>>>>
>>>>>
>>>>>
>>>
>
>
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 05:11:07
|
It's possible that your word_boundary.txt is OK.
You could try to get the one best from the lattice using lattice-1best
(I think), get the phone sequence from the 1-best lattice using
lat-to-phones (I think), doing output in text form using ark,t:- and
then get the text form of the phone-level lattice using
utils/int2sym.pl -f 3 g300_lang/phones.txt (or something similar), and
see if the sequence of phonemes looks reasonable for the word sequence
you have.
Dan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 05:00:55
|
I think that was part of it. I fixed one problem with the oov.txt / oov.int
I'll try to recompile with that bug fix and see if that works. It's possible that I'm creating word_boundaries incorrectly. How many entries would you expect to get? (I am getting 315.) I wonder if I am using word_boundaries for the wrong set of phones...
Checking g300_lang/phones.txt ...
--> g300_lang/phones.txt is OK
Checking words.txt: #0 ...
--> g300_lang/words.txt has "#0"
--> g300_lang/words.txt is OK
Checking g300_lang/phones/context_indep.{txt, int, csl} ...
--> 75 entry/entries in g300_lang/phones/context_indep.txt
--> g300_lang/phones/context_indep.int corresponds to g300_lang/phones/context_indep.txt
--> g300_lang/phones/context_indep.csl corresponds to g300_lang/phones/context_indep.txt
--> g300_lang/phones/context_indep.{txt, int, csl} are OK
Checking g300_lang/phones/disambig.{txt, int, csl} ...
--> 28 entry/entries in g300_lang/phones/disambig.txt
--> g300_lang/phones/disambig.int corresponds to g300_lang/phones/disambig.txt
--> g300_lang/phones/disambig.csl corresponds to g300_lang/phones/disambig.txt
--> g300_lang/phones/disambig.{txt, int, csl} are OK
Checking g300_lang/phones/nonsilence.{txt, int, csl} ...
--> 240 entry/entries in g300_lang/phones/nonsilence.txt
--> g300_lang/phones/nonsilence.int corresponds to g300_lang/phones/nonsilence.txt
--> g300_lang/phones/nonsilence.csl corresponds to g300_lang/phones/nonsilence.txt
--> g300_lang/phones/nonsilence.{txt, int, csl} are OK
Checking g300_lang/phones/silence.{txt, int, csl} ...
--> 75 entry/entries in g300_lang/phones/silence.txt
--> g300_lang/phones/silence.int corresponds to g300_lang/phones/silence.txt
--> g300_lang/phones/silence.csl corresponds to g300_lang/phones/silence.txt
--> g300_lang/phones/silence.{txt, int, csl} are OK
Checking g300_lang/phones/optional_silence.{txt, int, csl} ...
--> 1 entry/entries in g300_lang/phones/optional_silence.txt
--> g300_lang/phones/optional_silence.int corresponds to g300_lang/phones/optional_silence.txt
--> g300_lang/phones/optional_silence.csl corresponds to g300_lang/phones/optional_silence.txt
--> g300_lang/phones/optional_silence.{txt, int, csl} are OK
Checking g300_lang/phones/extra_questions.{txt, int} ...
--> ERROR: fail to open g300_lang/phones/extra_questions.txt
Checking g300_lang/phones/roots.{txt, int} ...
--> 75 entry/entries in g300_lang/phones/roots.txt
--> g300_lang/phones/roots.int corresponds to g300_lang/phones/roots.txt
--> g300_lang/phones/roots.{txt, int} are OK
Checking g300_lang/phones/sets.{txt, int} ...
--> ERROR: fail to open g300_lang/phones/sets.int
Checking g300_lang/phones/word_boundary.{txt, int} ...
--> 315 entry/entries in g300_lang/phones/word_boundary.txt
--> g300_lang/phones/word_boundary.int corresponds to g300_lang/phones/word_boundary.txt
--> g300_lang/phones/word_boundary.{txt, int} are OK
> Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK
> Checking summation: silence.txt, nonsilence.txt, disambig.txt ...
--> summation property is OK
Checking optional_silence.txt ...
--> reading g300_lang/phones/optional_silence.txt
--> g300_lang/phones/optional_silence.txt is OK
Checking disambiguation symbols: #0 and #1
--> g300_lang/phones/disambig.txt has "#0" and "#1"
--> g300_lang/phones/disambig.txt is OK
Checking topo ...
--> g300_lang/topo's nonsilence section is OK
--> g300_lang/topo's silence section is OK
--> g300_lang/topo is OK
Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
--> g300_lang/phones/word_boundary.txt doesn't include disambiguation symbols
--> g300_lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
--> g300_lang/phones/word_boundary.txt is OK
--> checking L.fst and L_disambig.fst...
--> generating a 46 words sequence
--> resulting phone sequence from L.fst corresponds to the word sequence
--> L.fst is OK
--> resulting phone sequence from L_disambig.fst corresponds to the word sequence
--> L_disambig.fst is OK
Checking g300_lang/oov.{txt, int} ...
--> 1 entry/entries in g300_lang/oov.txt
--> g300_lang/oov.int corresponds to g300_lang/oov.txt
--> g300_lang/oov.{txt, int} are OK
Nathan
On Jul 10, 2013, at 9:12 PM, Daniel Povey wrote:
> OK-- so the word-alignment seems to have failed. Generally that is
> because of invalid word-boundary information. That file is indexed by
> phones, not words. Issues can include a mismatch in phone set; words
> that don't have any phones in them; or phones that have only one state
> in their topology (this is a bug that was recently fixed, those should
> work now if you update and recompile).
> That program should not generally output any warnings, if all is OK.
> Try to use the program utils/validate_lang.pl to make sure your
> g300_lang/ directory is OK.
>
> Dan
>
>
> On Thu, Jul 11, 2013 at 12:06 AM, Nathan Dunn <nd...@me...> wrote:
>>
>> Sorry, and it ends with this:
>>
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 98.cut1
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 98.cut2
>> LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 132 to best
>> path, 0 had errors.
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 98.cut3
>> LOG (lattice-align-words:main():lattice-align-words.cc:104) Successfully
>> aligned 0 lattices; 132 had errors.
>> LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 132 linear lattices
>> to ctm format; 0 had errors.
>> ndunn:childspeech%
>>
>>
>> Nathan
>>
>> On Jul 10, 2013, at 9:06 PM, Nathan Dunn wrote:
>>
>>
>> The std err output is this:
>>
>> ndunn:childspeech% lattice-1best "ark:gunzip -c
>> exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words
>> g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- |
>> nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt >
>> exp/tri2a/ctm2/output.txt
>> lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|'
>> ark:-
>> lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl
>> ark:- ark:-
>> nbest-to-ctm ark:- -
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 02.cut1
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 02.cut2
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 02.cut3
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 03.cut1
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 03.cut2
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 03.cut3
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>>
>>
>> Nathan Dunn, Ph.D.
>> Scientific Programmer
>> College of Arts and Science IT
>> 541-221-2418
>> nd...@ca...
>>
>>
>>
>> On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>>
>> Can you provide the logging output, at least some representative lines
>> from it? Are there any warnings?
>> Dan
>>
>> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User
>> Communication and Updates <kal...@li...> wrote:
>>
>>
>> I'm trying to get word timing information out of a successfully trained
>> language model that I've already decoded successfully, following these
>> instructions:
>>
>>
>> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>>
>>
>> This is the command I've run:
>>
>>
>> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|"
>> ark:- | lattice-align-words g300_lang/phones/word_boundary.int
>> exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f
>> 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>>
>>
>>
>> The problem is that I only get one entry per transcript (these
>> transcripts are one minute long), and the timings don't seem to bear any
>> relation to the word input:
>>
>>
>> 02.cut1 1 0.00 67.11 I
>>
>> 02.cut2 1 0.00 62.44 HIS
>>
>> 02.cut3 1 0.00 65.76 MOUNT
>>
>> 03.cut1 1 0.00 62.62 I
>>
>> 03.cut2 1 0.00 62.41 WHO
>>
>> 03.cut3 1 0.00 63.72 I
>>
>> 06.cut1 1 0.00 62.13 STANDING
>>
>> 06.cut2 1 0.00 57.95 A
>>
>> 06.cut3 1 0.00 66.78 I
>>
>> . . .
>>
>> What I want is an entry for each word:
>>
>> 02.cut1 1 0.00 43.7 YOU
>>
>> 02.cut1 1 81.2 121.3 ARE
>>
>> 02.cut1 1 145.4 163.8 STANDING
>>
>> . . .
>>
>>
>> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>>
>> 1 nonword
>>
>> 2 begin
>>
>> 3 end
>>
>> 4 internal
>>
>> 5 singleton
>>
>> 6 nonword
>>
>> 7 begin
>>
>> 8 end
>>
>> . . .
>>
>>
>>
>> Any help is much appreciated.
>>
>>
>> Thanks,
>>
>>
>> Nathan
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>>
>> See everything from the browser to the database with AppDynamics
>>
>> Get end-to-end visibility with application monitoring from AppDynamics
>>
>> Isolate bottlenecks and diagnose root cause in seconds.
>>
>> Start your free trial of AppDynamics Pro today!
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>
>> _______________________________________________
>>
>> Kaldi-users mailing list
>>
>> Kal...@li...
>>
>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
>>
>>
>>
|
|
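Dan's checklist above (phone-set mismatch, words with no phones, invalid boundary labels) can be illustrated with a small consistency check in the spirit of utils/validate_lang.pl. This is a minimal sketch, not Kaldi code: the file layouts are the usual Kaldi text tables ("SYMBOL ID" in phones.txt, "ID LABEL" in word_boundary.int), but the function and the sample data are invented for illustration.

```python
# Sketch: check that every phone in phones.txt has a word-boundary label,
# one of the causes Dan lists for lattice-align-words failing.
VALID_LABELS = {"nonword", "begin", "end", "internal", "singleton"}

def check_word_boundaries(phones_lines, boundary_lines):
    """Return (missing_phone_ids, bad_label_lines) for the two tables."""
    phone_ids = set()
    for line in phones_lines:
        sym, pid = line.split()
        if sym != "<eps>":          # epsilon has no boundary entry
            phone_ids.add(int(pid))
    labels = {}
    bad = []
    for line in boundary_lines:
        pid, label = line.split()
        labels[int(pid)] = label
        if label not in VALID_LABELS:
            bad.append(line)
    missing = sorted(phone_ids - set(labels))
    return missing, bad

phones = ["<eps> 0", "SIL 1", "AA_B 2", "AA_E 3", "AA_I 4", "AA_S 5"]
bounds = ["1 nonword", "2 begin", "3 end", "4 internal", "5 singleton"]
print(check_word_boundaries(phones, bounds))  # ([], []) when consistent
```

A phone id reported as missing here corresponds to the "mismatch in phone set" case; in practice you would run the real validate_lang.pl instead.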
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 04:12:54
|
OK -- so the word alignment seems to have failed. Generally that is because
of invalid word-boundary information. That file is indexed by phones, not
words. Issues can include a mismatch in phone set; words that don't have any
phones in them; or phones that have only one state in their topology (this is
a bug that was recently fixed; those should work now if you update and
recompile). That program should generally not output any warnings if all is
OK. Try running utils/validate_lang.pl to make sure your g300_lang/ directory
is OK.

Dan

On Thu, Jul 11, 2013 at 12:06 AM, Nathan Dunn <nd...@me...> wrote:
>
> Sorry, and it ends with this:
>
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 98.cut1
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 98.cut2
> LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 132 to best path, 0 had errors.
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 98.cut3
> LOG (lattice-align-words:main():lattice-align-words.cc:104) Successfully aligned 0 lattices; 132 had errors.
> LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 132 linear lattices to ctm format; 0 had errors.
> ndunn:childspeech%
>
> Nathan
>
> On Jul 10, 2013, at 9:06 PM, Nathan Dunn wrote:
>
> The std err output is this:
>
> ndunn:childspeech% lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
> lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|' ark:-
> lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:-
> nbest-to-ctm ark:- -
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut1
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut2
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut3
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut1
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut2
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut3
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
>
> Nathan Dunn, Ph.D.
> Scientific Programmer
> College of Arts and Science IT
> 541-221-2418
> nd...@ca...
>
> On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>
> Can you provide the logging output, at least some representative lines from it? Are there any warnings?
> Dan
>
> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>
> I'm trying to get word timing information out of a successfully trained language model that I've already decoded successfully, following these instructions:
>
> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>
> This is the command I've run:
>
> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>
> The problem is that I only get one entry per transcript (these transcripts are one minute long), and the timings don't seem to bear any relation to the word input:
>
> 02.cut1 1 0.00 67.11 I
> 02.cut2 1 0.00 62.44 HIS
> 02.cut3 1 0.00 65.76 MOUNT
> 03.cut1 1 0.00 62.62 I
> 03.cut2 1 0.00 62.41 WHO
> 03.cut3 1 0.00 63.72 I
> 06.cut1 1 0.00 62.13 STANDING
> 06.cut2 1 0.00 57.95 A
> 06.cut3 1 0.00 66.78 I
> . . .
>
> What I want is an entry for each word:
>
> 02.cut1 1 0.00 43.7 YOU
> 02.cut1 1 81.2 121.3 ARE
> 02.cut1 1 145.4 163.8 STANDING
> . . .
>
> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>
> 1 nonword
> 2 begin
> 3 end
> 4 internal
> 5 singleton
> 6 nonword
> 7 begin
> 8 end
> . . .
>
> Any help is much appreciated.
>
> Thanks,
>
> Nathan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 04:07:05
|
Sorry, and it ends with this:

WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 98.cut1
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 98.cut2
LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 132 to best path, 0 had errors.
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 98.cut3
LOG (lattice-align-words:main():lattice-align-words.cc:104) Successfully aligned 0 lattices; 132 had errors.
LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 132 linear lattices to ctm format; 0 had errors.
ndunn:childspeech%

Nathan

On Jul 10, 2013, at 9:06 PM, Nathan Dunn wrote:
>
> The std err output is this:
>
> ndunn:childspeech% lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
> lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|' ark:-
> lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:-
> nbest-to-ctm ark:- -
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut1
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut2
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut3
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut1
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut2
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut3
> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
>
> Nathan Dunn, Ph.D.
> Scientific Programmer
> College of Arts and Science IT
> 541-221-2418
> nd...@ca...
>
> On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>
>> Can you provide the logging output, at least some representative lines from it? Are there any warnings?
>> Dan
>>
>> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>
>>> I'm trying to get word timing information out of a successfully trained language model that I've already decoded successfully, following these instructions:
>>>
>>> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>>>
>>> This is the command I've run:
>>>
>>> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>>>
>>> The problem is that I only get one entry per transcript (these transcripts are one minute long), and the timings don't seem to bear any relation to the word input:
>>>
>>> 02.cut1 1 0.00 67.11 I
>>> 02.cut2 1 0.00 62.44 HIS
>>> 02.cut3 1 0.00 65.76 MOUNT
>>> 03.cut1 1 0.00 62.62 I
>>> 03.cut2 1 0.00 62.41 WHO
>>> 03.cut3 1 0.00 63.72 I
>>> 06.cut1 1 0.00 62.13 STANDING
>>> 06.cut2 1 0.00 57.95 A
>>> 06.cut3 1 0.00 66.78 I
>>> . . .
>>>
>>> What I want is an entry for each word:
>>>
>>> 02.cut1 1 0.00 43.7 YOU
>>> 02.cut1 1 81.2 121.3 ARE
>>> 02.cut1 1 145.4 163.8 STANDING
>>> . . .
>>>
>>> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>>>
>>> 1 nonword
>>> 2 begin
>>> 3 end
>>> 4 internal
>>> 5 singleton
>>> 6 nonword
>>> 7 begin
>>> 8 end
>>> . . .
>>>
>>> Any help is much appreciated.
>>>
>>> Thanks,
>>>
>>> Nathan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 04:06:35
|
The std err output is this:

ndunn:childspeech% lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|' ark:-
lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:-
nbest-to-ctm ark:- -
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut1
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut2
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 02.cut3
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut1
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut2
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]
LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting partial lattice for 03.cut3
WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) Invalid word at end of lattice [partial lattice, forced out?]

Nathan Dunn, Ph.D.
Scientific Programmer
College of Arts and Science IT
541-221-2418
nd...@ca...

On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>
> Can you provide the logging output, at least some representative lines from it? Are there any warnings?
> Dan
>
> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>
>> I'm trying to get word timing information out of a successfully trained language model that I've already decoded successfully, following these instructions:
>>
>> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>>
>> This is the command I've run:
>>
>> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>>
>> The problem is that I only get one entry per transcript (these transcripts are one minute long), and the timings don't seem to bear any relation to the word input:
>>
>> 02.cut1 1 0.00 67.11 I
>> 02.cut2 1 0.00 62.44 HIS
>> 02.cut3 1 0.00 65.76 MOUNT
>> 03.cut1 1 0.00 62.62 I
>> 03.cut2 1 0.00 62.41 WHO
>> 03.cut3 1 0.00 63.72 I
>> 06.cut1 1 0.00 62.13 STANDING
>> 06.cut2 1 0.00 57.95 A
>> 06.cut3 1 0.00 66.78 I
>> . . .
>>
>> What I want is an entry for each word:
>>
>> 02.cut1 1 0.00 43.7 YOU
>> 02.cut1 1 81.2 121.3 ARE
>> 02.cut1 1 145.4 163.8 STANDING
>> . . .
>>
>> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>>
>> 1 nonword
>> 2 begin
>> 3 end
>> 4 internal
>> 5 singleton
>> 6 nonword
>> 7 begin
>> 8 end
>> . . .
>>
>> Any help is much appreciated.
>>
>> Thanks,
>>
>> Nathan
|
|
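The stderr quoted above can be summarised mechanically. Here is a small sketch (assuming only the log format visible in this thread) that collects the utterances lattice-align-words gave up on, which is useful when "Successfully aligned 0 lattices; 132 had errors" sends you looking for the failing cases:

```python
# Sketch: tally lattice-align-words stderr to list utterances that only
# got a partial (failed) word alignment. The log format is taken from the
# messages quoted in this thread.
import re

def tally_alignment_log(lines):
    """Return the utterance ids for which a partial lattice was output."""
    failed = []
    for line in lines:
        m = re.search(r"Outputting partial lattice for (\S+)", line)
        if m:
            failed.append(m.group(1))
    return failed

log = [
    "WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541) "
    "Invalid word at end of lattice [partial lattice, forced out?]",
    "LOG (lattice-align-words:main():lattice-align-words.cc:89) "
    "Outputting partial lattice for 02.cut1",
    "LOG (lattice-align-words:main():lattice-align-words.cc:89) "
    "Outputting partial lattice for 02.cut2",
]
print(tally_alignment_log(log))  # ['02.cut1', '02.cut2']
```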
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 03:45:23
|
Can you provide the logging output, at least some representative lines from
it? Are there any warnings?
Dan

On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User Communication
and Updates <kal...@li...> wrote:
>
> I'm trying to get word timing information out of a successfully trained language model that I've already decoded successfully, following these instructions:
>
> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>
> This is the command I've run:
>
> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>
> The problem is that I only get one entry per transcript (these transcripts are one minute long), and the timings don't seem to bear any relation to the word input:
>
> 02.cut1 1 0.00 67.11 I
> 02.cut2 1 0.00 62.44 HIS
> 02.cut3 1 0.00 65.76 MOUNT
> 03.cut1 1 0.00 62.62 I
> 03.cut2 1 0.00 62.41 WHO
> 03.cut3 1 0.00 63.72 I
> 06.cut1 1 0.00 62.13 STANDING
> 06.cut2 1 0.00 57.95 A
> 06.cut3 1 0.00 66.78 I
> . . .
>
> What I want is an entry for each word:
>
> 02.cut1 1 0.00 43.7 YOU
> 02.cut1 1 81.2 121.3 ARE
> 02.cut1 1 145.4 163.8 STANDING
> . . .
>
> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>
> 1 nonword
> 2 begin
> 3 end
> 4 internal
> 5 singleton
> 6 nonword
> 7 begin
> 8 end
> . . .
>
> Any help is much appreciated.
>
> Thanks,
>
> Nathan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-11 03:39:05
|
I'm trying to get word timing information out of a successfully trained
language model that I've already decoded successfully, following these
instructions:

https://sourceforge.net/mailarchive/message.php?msg_id=30729903

This is the command I've run:

lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt

The problem is that I only get one entry per transcript (these transcripts
are one minute long), and the timings don't seem to bear any relation to the
word input:

02.cut1 1 0.00 67.11 I
02.cut2 1 0.00 62.44 HIS
02.cut3 1 0.00 65.76 MOUNT
03.cut1 1 0.00 62.62 I
03.cut2 1 0.00 62.41 WHO
03.cut3 1 0.00 63.72 I
06.cut1 1 0.00 62.13 STANDING
06.cut2 1 0.00 57.95 A
06.cut3 1 0.00 66.78 I
. . .

What I want is an entry for each word:

02.cut1 1 0.00 43.7 YOU
02.cut1 1 81.2 121.3 ARE
02.cut1 1 145.4 163.8 STANDING
. . .

The words.txt is 116K, but word_boundary.int has only 316 entries like this:

1 nonword
2 begin
3 end
4 internal
5 singleton
6 nonword
7 begin
8 end
. . .

Any help is much appreciated.

Thanks,

Nathan
|
|
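For reference, the output being produced here is CTM: one line per word with utterance id, channel, start time, and duration. A minimal parser sketch follows, with the field layout inferred from the examples in this thread; a healthy run has many lines per utterance, while the failed alignment above collapses each utterance to a single line spanning the whole recording:

```python
# Sketch: parse CTM lines (utt channel start duration word), the format
# produced by the nbest-to-ctm | int2sym.pl pipeline quoted above.
def parse_ctm(lines):
    entries = []
    for line in lines:
        utt, chan, start, dur, word = line.split()
        entries.append((utt, int(chan), float(start), float(dur), word))
    return entries

ctm = [
    "02.cut1 1 0.00 67.11 I",
    "02.cut2 1 0.00 62.44 HIS",
]
for utt, chan, start, dur, word in parse_ctm(ctm):
    # The fourth field is a duration, not an end time, so a 67-second "I"
    # on a one-minute recording is the symptom of the failed alignment.
    print(utt, word, "ends at", round(start + dur, 2))
```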
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-08 16:39:50
|
Hi all,

If you do not want to read the instructions from the mail, they are in a
more pleasant form at
https://github.com/oplatek/pykaldi/blob/master/src/python-kaldi-decoding/pykaldi/binutils/README.md
The README.md should stay up to date.

Ondra

On Mon, Jul 8, 2013 at 6:25 PM, ondrej platek <ond...@se...> wrote:
>
> Hi all,
>
> I would like to thank you for implementing the Kaldi compilation to shared
> libraries and merging it to trunk. It allows me to easily build Python
> bindings for the Kaldi decoders using the cffi library
> (http://cffi.readthedocs.org/en/latest/).
>
> So far, I have managed to set up a decoding example based on the Voxforge
> online demo. All the C++ Kaldi functionality is called from Python via cffi.
>
> To try it, follow the steps below:
>
> # 1.
> svn checkout svn+ssh://op...@sv.../p/kaldi/code/sandbox/oplatek2  # Change your username
>
> # 2. Install portaudio and cffi.
> # For portaudio:
> cd oplatek2/tools; ./install_portaudio.sh
> # For cffi you have options a) or b):
> # a) Go to http://cffi.readthedocs.org/en/latest/ and, following the
> #    instructions, install cffi system-wide (recommended).
> #    Read the Requirements section!
> # b) Go to oplatek2/tools and install cffi locally using install_cffi.sh.
> #    After a successful installation the script prompts you to add the
> #    installation directory to PYTHONPATH. Do it; it will be needed in step 7.
>
> # 3.
> cd oplatek2/src
>
> # 4. Configure with the --shared flag
> ./configure --fst-root=`pwd`/../tools/openfst --shared
>
> # 5. Build Kaldi. Clean and test it to be sure it is not corrupted.
> make clean; make depend && make ext_depend && make && make ext && make test && make ext_test
>
> # 6. Change to the directory with the example
> cd python-kaldi-decoding/pykaldi/binutils/
>
> # 7. Run make test; it should compile and download everything needed
> make test
>
> # 8. Check the results! My results for python-online-wav-gmm-decode-faster are:
>
> python-compute-wer --config=configs/wer.config ark:work/reference.txt ark:work/online.trans.compact
> %WER 15.03 [ 55 / 366, 6 ins, 15 del, 34 sub ]
> %SER 100.00 [ 3 / 3 ]
> Scored 3 sentences, 0 not present in hyp.
>
> Any feedback is welcome!
>
> I am committing to https://github.com/oplatek/pykaldi .
> To svn.code.sf.net/p/kaldi/code/sandbox/oplatek2 I will commit just major
> updates which should not break things.
>
> Cheers,
>
> Ondra
>
> On Mon, Jul 8, 2013 at 10:59 AM, ondrej platek <ond...@se...> wrote:
>>
>> I just checked the results for my modified Voxforge-like recipe.
>> Everything worked: training, decoding, evaluation.
>>
>> My configuration: Ubuntu 10.04, using OpenBLAS and the shared flag:
>> ./configure --openblas-root=`pwd`/../tools/OpenBLAS/install --fst-root=`pwd`/../tools/openfst --shared
>>
>> Ondra
>>
>> On Mon, Jul 8, 2013 at 7:54 AM, Ho Yin Chan <ric...@gm...> wrote:
>>>
>>> Simulated mode of the online decoding demo runs fine on CentOS too.
>>>
>>> Ricky
>>>
>>> On Sun, Jul 7, 2013 at 10:07 PM, Vassil Panayotov <vas...@gm...> wrote:
>>>>
>>>> The compilation (including "make ext") is working OK for me too on Ubuntu 10.04.
>>>> Only tried to run the online decoders (voxforge/online_demo) so far --
>>>> everything seems to be fine with them.
>>>>
>>>> Vassil
>>>>
>>>> On Sun, Jul 7, 2013 at 5:31 AM, Daniel Povey <dp...@gm...> wrote:
>>>> > Everyone,
>>>> > I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej
>>>> > Platek and others have been working on different build scripts that
>>>> > now support a shared-library option. If anyone can test it and make
>>>> > sure it still works for them it would be great.
>>>> > If people have made local changes to their Makefiles they may get conflicts.
>>>> > Dan
|
|
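The pykaldi bindings described above use cffi; the underlying idea -- load the shared libraries produced by `./configure --shared` and call their exported C symbols from Python -- can be sketched with the stdlib ctypes module instead. Everything Kaldi-specific below is hypothetical (library and function names invented for illustration); the runnable part calls into the already-loaded C library as a stand-in, which assumes a Unix-like platform:

```python
# Sketch of calling C symbols from Python, the mechanism behind bindings
# like the cffi-based ones in this thread (shown here with stdlib ctypes).
import ctypes

libc = ctypes.CDLL(None)            # handle to the already-loaded C library
libc.abs.restype = ctypes.c_int     # declare the C signature before calling
libc.abs.argtypes = [ctypes.c_int]
print(libc.abs(-42))                # 42

# A Kaldi wrapper would load the built shared library the same way, e.g.
#   kaldi = ctypes.CDLL("libkaldi-decoder.so")   # hypothetical name
# and declare argtypes/restype for each exported C function it uses.
```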
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-08 16:26:04
|
Hi all,

I would like to thank you for implementing the Kaldi compilation to shared
libraries and merging it to trunk. It allows me to easily build Python
bindings for the Kaldi decoders using the cffi library
(http://cffi.readthedocs.org/en/latest/).

So far, I have managed to set up a decoding example based on the Voxforge
online demo. All the C++ Kaldi functionality is called from Python via cffi.

To try it, follow the steps below:

# 1.
svn checkout svn+ssh://oplatek@svn.code.sf.net/p/kaldi/code/sandbox/oplatek2  # Change your username

# 2. Install portaudio and cffi.
# For portaudio:
cd oplatek2/tools; ./install_portaudio.sh
# For cffi you have options a) or b):
# a) Go to http://cffi.readthedocs.org/en/latest/ and, following the
#    instructions, install cffi system-wide (recommended).
#    Read the Requirements section!
# b) Go to oplatek2/tools and install cffi locally using install_cffi.sh.
#    After a successful installation the script prompts you to add the
#    installation directory to PYTHONPATH. Do it; it will be needed in step 7.

# 3.
cd oplatek2/src

# 4. Configure with the --shared flag
./configure --fst-root=`pwd`/../tools/openfst --shared

# 5. Build Kaldi. Clean and test it to be sure it is not corrupted.
make clean; make depend && make ext_depend && make && make ext && make test && make ext_test

# 6. Change to the directory with the example
cd python-kaldi-decoding/pykaldi/binutils/

# 7. Run make test; it should compile and download everything needed
make test

# 8. Check the results! My results for python-online-wav-gmm-decode-faster are:

python-compute-wer --config=configs/wer.config ark:work/reference.txt ark:work/online.trans.compact
%WER 15.03 [ 55 / 366, 6 ins, 15 del, 34 sub ]
%SER 100.00 [ 3 / 3 ]
Scored 3 sentences, 0 not present in hyp.

Any feedback is welcome!

I am committing to https://github.com/oplatek/pykaldi .
To svn.code.sf.net/p/kaldi/code/sandbox/oplatek2 I will commit just major
updates which should not break things.

Cheers,

Ondra

On Mon, Jul 8, 2013 at 10:59 AM, ondrej platek <ond...@se...> wrote:
>
> I just checked the results for my modified Voxforge-like recipe.
> Everything worked: training, decoding, evaluation.
>
> My configuration: Ubuntu 10.04, using OpenBLAS and the shared flag:
> ./configure --openblas-root=`pwd`/../tools/OpenBLAS/install --fst-root=`pwd`/../tools/openfst --shared
>
> Ondra
>
> On Mon, Jul 8, 2013 at 7:54 AM, Ho Yin Chan <ric...@gm...> wrote:
>>
>> Simulated mode of the online decoding demo runs fine on CentOS too.
>>
>> Ricky
>>
>> On Sun, Jul 7, 2013 at 10:07 PM, Vassil Panayotov <vas...@gm...> wrote:
>>>
>>> The compilation (including "make ext") is working OK for me too on Ubuntu 10.04.
>>> Only tried to run the online decoders (voxforge/online_demo) so far --
>>> everything seems to be fine with them.
>>>
>>> Vassil
>>>
>>> On Sun, Jul 7, 2013 at 5:31 AM, Daniel Povey <dp...@gm...> wrote:
>>> > Everyone,
>>> > I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej
>>> > Platek and others have been working on different build scripts that
>>> > now support a shared-library option. If anyone can test it and make
>>> > sure it still works for them it would be great.
>>> > If people have made local changes to their Makefiles they may get conflicts.
>>> > Dan
|
|
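The %WER line in the quoted results is just edit distance over reference words: (insertions + deletions + substitutions) / reference length. A sketch of the computation (standard Levenshtein over word lists; this is not the actual compute-wer implementation):

```python
# Sketch: word error rate as normalized Levenshtein distance over words.
def wer(ref, hyp):
    """WER = min(edits to turn hyp into ref) / len(ref)."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[-1][-1] / len(ref)

# The quoted run reports 6 ins + 15 del + 34 sub = 55 errors over 366
# reference words:
print(round(100 * 55 / 366, 2))   # 15.03, matching "%WER 15.03"
```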
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-08 16:21:35
|
Thanks, everyone!
Dan

On Mon, Jul 8, 2013 at 4:59 AM, ondrej platek <ond...@se...> wrote:
>
> I just checked the results for my modified Voxforge-like recipe.
> Everything worked: training, decoding, evaluation.
>
> My configuration: Ubuntu 10.04, using OpenBLAS and the shared flag:
> ./configure --openblas-root=`pwd`/../tools/OpenBLAS/install --fst-root=`pwd`/../tools/openfst --shared
>
> Ondra
>
> On Mon, Jul 8, 2013 at 7:54 AM, Ho Yin Chan <ric...@gm...> wrote:
>>
>> Simulated mode of the online decoding demo runs fine on CentOS too.
>>
>> Ricky
>>
>> On Sun, Jul 7, 2013 at 10:07 PM, Vassil Panayotov <vas...@gm...> wrote:
>>>
>>> The compilation (including "make ext") is working OK for me too on Ubuntu 10.04.
>>> Only tried to run the online decoders (voxforge/online_demo) so far --
>>> everything seems to be fine with them.
>>>
>>> Vassil
>>>
>>> On Sun, Jul 7, 2013 at 5:31 AM, Daniel Povey <dp...@gm...> wrote:
>>> > Everyone,
>>> > I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej
>>> > Platek and others have been working on different build scripts that
>>> > now support a shared-library option. If anyone can test it and make
>>> > sure it still works for them it would be great.
>>> > If people have made local changes to their Makefiles they may get conflicts.
>>> > Dan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-08 08:59:36
|
I just checked the results for my modified Voxforge-like recipe.
Everything worked: training, decoding, evaluation.

My configuration: Ubuntu 10.04, using OpenBLAS and the shared flag:

./configure --openblas-root=`pwd`/../tools/OpenBLAS/install --fst-root=`pwd`/../tools/openfst --shared

Ondra

On Mon, Jul 8, 2013 at 7:54 AM, Ho Yin Chan <ric...@gm...> wrote:
> Simulated mode on the online decoding demo runs fine on CentOS too.
>
> Ricky
> [...]
|
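For anyone reproducing this, the configure call above slots into the usual Kaldi build sequence roughly as follows. This is a sketch that assumes OpenBLAS has already been compiled into tools/OpenBLAS/install; the exact steps may differ by revision:

```shell
# Build the bundled dependencies (OpenFst etc.) first.
cd tools && make

# Configure and build Kaldi itself with shared libraries and OpenBLAS.
cd ../src
./configure --openblas-root=`pwd`/../tools/OpenBLAS/install \
            --fst-root=`pwd`/../tools/openfst --shared
make depend && make
```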
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-08 05:54:56
|
Simulated mode on the online decoding demo runs fine on CentOS too.

Ricky

On Sun, Jul 7, 2013 at 10:07 PM, Vassil Panayotov <vas...@gm...> wrote:
> The compilation (including "make ext") is working OK for me too on Ubuntu 10.04.
> Only tried to run the online decoders (voxforge/online_demo) so far -
> everything seems to be fine with them.
>
> Vassil
> [...]
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-07 14:07:24
|
The compilation (including "make ext") is working OK for me too on Ubuntu 10.04.
Only tried to run the online decoders (voxforge/online_demo) so far -
everything seems to be fine with them.

Vassil

On Sun, Jul 7, 2013 at 5:31 AM, Daniel Povey <dp...@gm...> wrote:
> Everyone,
> I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej
> Platek and others have been working on different build scripts that
> now support a shared-library option. If anyone can test it and make
> sure it still works for them it would be great.
> If people have made local changes to their Makefiles they may get conflicts.
> Dan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-07 11:31:36
|
Successfully built on MacOS 10.8 and the commands are working. I haven't
tried training a system yet.

Paul

On 7 July 2013 11:31, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> Everyone,
> I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej
> Platek and others have been working on different build scripts that
> now support a shared-library option. If anyone can test it and make
> sure it still works for them it would be great.
> If people have made local changes to their Makefiles they may get conflicts.
> Dan
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
>
> http://p.sf.net/sfu/windows-dev2dev
> _______________________________________________
> Kaldi-users mailing list
> Kal...@li...
> https://lists.sourceforge.net/lists/listinfo/kaldi-users
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-07 02:31:13
|
Everyone,
I have just merged from ^/sandbox/sharedlibs, where Jan Trmal, Ondrej
Platek and others have been working on different build scripts that
now support a shared-library option. If anyone can test it and make
sure it still works for them it would be great.
If people have made local changes to their Makefiles they may get conflicts.
Dan
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-02 13:51:45
|
Thanks.

On Tue, Jul 2, 2013 at 9:38 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> Hi Lahiru,
> I already fixed this issue in the trunk; the PdfPrior is now activated
> only when the option --class-frame-counts is present.
>
> Karel
>
> Dne 2.7.2013 9:22, Mailing list used for User Communication and Updates napsal(a):
>> Sorry, I was wrong. It selects the GPU automatically.
>>
>> I found the error in the exp/tri4b_pretrain-dbn/log/cmvn_glob_fwd.log file:
>>
>> ERROR (nnet-forward:PdfPrior():nnet-pdf-prior.cc:26) --class-frame-counts
>> is empty: Cannot initialize priors without the counts.
>> ERROR (nnet-forward:main():nnet-forward.cc:196) ERROR
>> (nnet-forward:PdfPrior():nnet-pdf-prior.cc:26) --class-frame-counts is
>> empty: Cannot initialize priors without the counts.
>>
>> Thanks,
>> Lahiru
>>
>> On Tue, Jul 2, 2013 at 9:10 PM, Lahiru Samarakoon <lah...@gm...> wrote:
>>> Hi All,
>>>
>>> When running DNN training on GPUs, I am getting the following error.
>>>
>>> Log file: exp/tri4b_pretrain-dbn/_pretrain_dbn.log
>>>
>>> # PRE-TRAINING RBM LAYER 1
>>> Initializing 'exp/tri4b_pretrain-dbn/1.rbm.init'
>>> Traceback (most recent call last):
>>>   File "utils/nnet/gen_rbm_init.py", line 40, in ?
>>>     dimL.append(int(dimStrL[i]))
>>> ValueError: invalid literal for int():
>>>
>>> I am running this in a GPU cluster which assigns the job to a GPU
>>> dynamically, so I cannot configure "gpu_id=" (which manually selects the
>>> GPU id to run on; -1 disables the GPU). Can this be the cause?
>>>
>>> Thanks,
>>> Lahiru
>>>
>>> On Fri, Jun 28, 2013 at 11:06 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>> It's not the same as that. Each machine does SGD separately and,
>>>> periodically, the parameters are averaged across machines.
>>>> Dan
>>>>
>>>> On Fri, Jun 28, 2013 at 11:03 AM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>>> Wow, nice.
>>>>> Is the implementation similar to Jeff Dean's paper "Large Scale
>>>>> Distributed Deep Networks"
>>>>> (http://www.cs.toronto.edu/~ranzato/publications/DistBeliefNIPS2012_withAppendix.pdf)?
>>>>> Does Kaldi use asynchronous SGD?
>>>>>
>>>>> Please give me a brief description.
>>>>>
>>>>> Thanks,
>>>>> Lahiru
>>>>>
>>>>> On Fri, Jun 28, 2013 at 10:28 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>>>> It's on multiple machines and also multiple threads per machine.
>>>>>> Dan
>>>>>>
>>>>>> On Fri, Jun 28, 2013 at 2:05 AM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>>>>> Thanks guys :-)
>>>>>>>
>>>>>>> Dan, is your setup for distributed training? Or does it only
>>>>>>> parallelize within a single machine?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Lahiru
>>>>>>>
>>>>>>> On Fri, Jun 28, 2013 at 5:29 AM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>>>>>> In my setup there is RBM pre-training:
>>>>>>>> http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf
>>>>>>>> followed by per-frame cross-entropy training and sMBR training:
>>>>>>>> http://www.danielpovey.com/files/2013_interspeech_dnn.pdf
>>>>>>>>
>>>>>>>> Dne 27.6.2013 13:21, Mailing list used for User Communication and Updates napsal(a):
>>>>>>>>> There are basically two setups there: Karel's setup, generally called
>>>>>>>>> run_dnn.sh or run_nnet.sh, which is for GPUs, and my setup, called
>>>>>>>>> run_nnet_cpu.sh, which is for CPUs in parallel. Karel's setup may
>>>>>>>>> have an ICASSP paper; Karel can tell you. Mine is mostly unpublished.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> On Thu, Jun 27, 2013 at 5:31 AM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> I am in the process of running the wsj/s5 recipe. Now I am about to
>>>>>>>>>> run the DNN experiments and am specifically interested in the DNN
>>>>>>>>>> training. I am planning to look into the DNN code for more
>>>>>>>>>> understanding. Since there are many DNN variants, could anyone tell
>>>>>>>>>> me which papers the Kaldi DNN implementation follows?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Lahiru
|
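Dan's description in the thread above (each machine runs SGD independently and the parameters are periodically averaged across machines) can be illustrated with a toy sketch. This is my own illustration of the general idea, not Kaldi's actual code:

```python
def sgd_step(params, grad, lr=0.1):
    """One local SGD step on a single worker (gradient supplied by caller)."""
    return [p - lr * g for p, g in zip(params, grad)]

def average_parameters(worker_params):
    """Periodic synchronization: element-wise mean over all workers' models."""
    n = len(worker_params)
    return [sum(vals) / n for vals in zip(*worker_params)]

# Two workers start from a shared model, take different local steps...
start = [0.0, 0.0, 0.0]
w1 = sgd_step(start, [1.0, 0.0, 0.0])
w2 = sgd_step(start, [0.0, 1.0, 0.0])
# ...and are then merged into one model that both resume from.
merged = average_parameters([w1, w2])
print(merged)  # [-0.05, -0.05, 0.0]
```

The averaging step is what distinguishes this from asynchronous SGD in the DistBelief paper Lahiru mentions: there is no shared parameter server, only occasional model averaging.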
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-02 13:38:24
|
Hi Lahiru,
I already fixed this issue in the trunk; the PdfPrior is now activated
only when the option --class-frame-counts is present.

Karel

Dne 2.7.2013 9:22, Mailing list used for User Communication and Updates napsal(a):
> Sorry, I was wrong. It selects the GPU automatically.
>
> I found the error in the exp/tri4b_pretrain-dbn/log/cmvn_glob_fwd.log file:
>
> ERROR (nnet-forward:PdfPrior():nnet-pdf-prior.cc:26)
> --class-frame-counts is empty: Cannot initialize priors without the
> counts.
> ERROR (nnet-forward:main():nnet-forward.cc:196) ERROR
> (nnet-forward:PdfPrior():nnet-pdf-prior.cc:26) --class-frame-counts is
> empty: Cannot initialize priors without the counts.
>
> Thanks,
> Lahiru
> [...]
|
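For context on Karel's fix: the --class-frame-counts file feeds the prior term that is subtracted in the log domain to turn DNN posteriors into scaled likelihoods for decoding, which is why it cannot be initialized from an empty value. A minimal sketch of that idea, with hypothetical function names of my own (not Kaldi's API):

```python
import math

def log_priors_from_counts(frame_counts, floor=1e-10):
    """Per-pdf log-priors estimated from training frame counts.
    Zero counts are floored so log() stays defined."""
    total = sum(frame_counts)
    return [math.log(max(c / total, floor)) for c in frame_counts]

def posteriors_to_scaled_loglikes(log_posteriors, log_priors):
    """log p(obs|state), up to a constant: log p(state|obs) - log p(state)."""
    return [post - prior for post, prior in zip(log_posteriors, log_priors)]

counts = [100, 300]                      # toy frame counts for two pdfs
priors = log_priors_from_counts(counts)  # [log 0.25, log 0.75]
# A uniform posterior gets boosted for the rare pdf, penalized for the common one.
scaled = posteriors_to_scaled_loglikes([math.log(0.5), math.log(0.5)], priors)
```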