
Pocketsphinx KWS tuning

2015-07-07
  • Hugo Duport

    Hugo Duport - 2015-07-07

    Hi
    I'm running some tests with the PocketSphinx (5 prealpha) keyword spotting tool, using the following configuration:
    - acoustic model: cmusphinx-en-us-5.2
    - dictionary: cmudict.hub4.06d.dict
    - keyword list: en.kws (1263 words)
    - audio file: conversation.wav

    I use this command line:

    pocketsphinx_continuous -dict cmudict.hub4.06d.dict -hmm cmusphinx-en-us-5.2/ -kws en.kws -kws_threshold 75 -time yes -infile conversation.wav

    This gives me the following stderr output, with these spotted keywords:

    BULL 0.430 0.640 1.004511
    CLOTHE 0.640 0.860 1.009546
    ZOO 1.180 1.370 1.018471
    RESUME 1.980 2.290 1.022860
    SICK 2.860 2.980 1.012276
    HIKE 3.210 3.350 1.008739
    MUSTARD 4.090 4.400 1.004310
    OST 4.620 4.770 1.010557
    DOG 5.300 5.690 1.015521
    ABBA 6.480 6.740 1.008941
    PORK 7.170 7.350 1.006623
    CULTURE 7.350 7.670 1.007731
    VILLA 7.810 8.040 1.006421
    OBOE 8.140 8.390 1.010455
    VILLA 8.580 8.810 1.010152
    FILM 8.930 9.210 1.012276
    HEN 9.820 10.040 1.013086
    FLU 10.350 10.470 1.005516
    RENT 10.870 11.050 1.013593
    SICK 11.270 11.380 1.013086
    EGG 11.430 11.600 1.005818
    ABBA 12.390 12.540 1.014303
    HIKE 13.300 13.470 1.014709
    VEST 13.530 13.900 1.004209
    DISEASE 14.870 15.280 1.006321
    TEA 15.310 15.410 1.007025
    TI 15.310 15.410 1.007025
    BANK 15.500 15.690 1.004913
    DISEASE 16.130 16.440 1.014607
    REGGAE 17.040 17.240 1.007328
    COATS 20.170 20.400 1.020306
    ROOM 20.520 20.640 1.016842
    WINDOW 20.680 20.930 1.025421
    AYO 20.930 21.080 1.005717
    POEM 21.700 21.950 1.004209
    SAAB 21.950 22.390 1.010658
    BIRTH 23.020 23.260 1.006723
    EGG 23.330 23.400 1.007025
    EGG 23.430 23.510 1.011568
    KIA 24.780 24.990 1.008739
    FORD 25.100 25.260 1.024396
    HONG-KONG 25.400 25.800 1.011062
    RACE 26.410 26.620 1.013188
    RICE 26.410 26.620 1.008739
    BULL 27.020 27.160 1.006019
    NOVEL 26.870 27.160 1.006421
    DOWNTOWN 27.310 27.750 1.018165
    BIRTH 28.230 28.440 1.005114
    WINES 29.080 29.310 1.007025
    GOOSE 30.110 30.270 1.007831
    BULL 30.530 30.850 1.005717
    HAIR 30.980 31.120 1.005918
    ALLAH 31.190 31.350 1.005818
    SCHOOL 31.380 31.590 1.007126
    AYO 31.760 31.890 1.005114
    BEER 32.090 32.370 1.008739
    HAIR 32.150 32.370 1.012377
    MARE 32.160 32.370 1.012276
    HIKE 33.330 33.480 1.005114
    BEER 33.530 33.710 1.009445
    FORD 33.810 33.980 1.006723
    BEER 34.000 34.190 1.012479
    KIA 33.990 34.190 1.011264
    BULL 34.530 34.660 1.012276
    CAR 34.680 34.850 1.011972
    WORKING 34.850 35.080 1.016334
    LADA 35.400 35.570 1.006421
    PIANO 35.340 35.570 1.005717
    VEAL 36.280 36.460 1.007328

    SODA 37.200 37.380 1.004712
    MOVIE 37.790 38.030 1.007227
    LOFT 38.170 38.450 1.014810
    LORIE 38.560 38.830 1.018165
    FORD 39.420 39.610 1.005415
    WART 39.470 39.610 1.006623
    SOW 39.930 40.060 1.005818
    SICK 40.550 40.840 1.005516
    WINE 41.810 41.960 1.009344
    GOAT 42.800 42.950 1.004410
    HIKE 43.600 43.730 1.016232
    BULL 43.800 44.020 1.004611
    GOAT 43.800 44.020 1.011365
    BUNS 44.060 44.260 1.005516
    BEER 44.810 44.960 1.008739
    DUCK 46.480 46.680 1.004712
    KID 46.690 46.830 1.023269
    RUGBY 47.080 47.470 1.004209
    HAT 47.950 48.570 1.004812
    HAT 48.800 49.030 1.004511
    HOUSE 48.570 49.030 1.012175
    HAT 49.030 49.480 1.006321
    HAT 49.520 49.720 1.006421
    HAT 50.040 50.230 1.004410
    HAT 51.660 51.910 1.004209
    BULL 52.140 52.280 1.005818
    WRITER 52.990 53.220 1.006522
    KID 53.650 53.840 1.005013
    WATCH 53.990 54.170 1.004611
    RAT 54.350 54.540 1.012580
    EGG 54.580 54.690 1.004310
    SOAP 55.520 55.700 1.005214
    WATCH 55.700 55.880 1.004712
    RAT 56.300 56.560 1.005013
    VEAL 57.300 57.500 1.013897
    MARK 57.910 58.120 1.004913
    PARK 57.930 58.120 1.004812
    ARBOR 58.340 58.530 1.005315
    MUSE 58.720 58.940 1.005516
    BECK 59.570 59.700 1.005818
    LADA 59.340 59.700 1.010658
    HAT 60.720 60.990 1.005013
    OST 62.120 62.280 1.005315
    OPEL 63.520 63.730 1.005315
    PIZZA 63.970 64.390 1.008436
    HAT 65.040 65.250 1.005516
    HATS 65.040 65.250 1.009042
    RATS 65.080 65.250 1.006321
    BECK 65.340 65.460 1.009445
    HIKE 65.570 65.710 1.010860
    DUFFY 65.850 66.040 1.010557
    SURF 67.480 67.940 1.005717

    As you can see, the spotted keywords are not the expected ones (they do not occur in the wav file).
    So I would like to know what is wrong with my configuration and what I can tune to get fewer false positives.

    Any help would be greatly appreciated :)

    Thanks

     
    • Nickolay V. Shmyrev

      You are using the wrong mode.

      To look for more than 1000 keywords, you need to transcribe the audio and then simply check for the words in the output.

      Keyword spotting mode is meant to be used with 10-20 keyphrases, each at least 3 syllables long.
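
The "transcribe, then check the words" step can be sketched as follows. This is only a minimal illustration, assuming the transcript has already been produced by a separate full-decoding run and is available as plain text; the function name `find_keywords` and the sample transcript are hypothetical and not part of PocketSphinx.

```python
import re


def find_keywords(transcript, keywords):
    """Return the keywords that occur as whole words in the transcript."""
    # Lower-case the text and split it into word tokens; apostrophes and
    # hyphens are kept so entries such as HONG-KONG stay in one piece.
    words = set(re.findall(r"[a-z'-]+", transcript.lower()))
    # Whole-word comparison, so "HAT" does not match inside "hatch".
    return sorted(k for k in keywords if k.lower() in words)


# Hypothetical decoder output and a small slice of the keyword list.
transcript = "i left my hat in the car before the rugby match"
keywords = ["HAT", "CAR", "RUGBY", "BULL", "HONG-KONG"]
print(find_keywords(transcript, keywords))  # ['CAR', 'HAT', 'RUGBY']
```

Matching on whole tokens rather than substrings is a deliberate choice here: with a list of short words, substring search would reintroduce many of the false positives seen above.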

       
  • Hugo Duport

    Hugo Duport - 2015-07-07

    Oh, okay
    Thank you for the quick reply anyway!

     
