Hi CMUSphinx coders.
I built the Long Audio Aligner from the source obtained here:
https://svn.code.sf.net/p/cmusphinx/code/branches/long-audio-aligner/Aligner/
My sample audio file is 3 minutes long (180 seconds), but the Aligner
reports this:
Total Time Audio: 360.02s
Sample audio with text: http://www.listen-to-english.com/index.php?id=582
I converted the MP3 to 1411 kbps, 44.1 kHz PCM WAV format.
I ran the aligner with the following command:
java -ms400m -mx1500m -jar bin\aligner.jar ShortPodcast.wav ShortPodcast.txt > out.txt
The recognized words are as follows:
--------------- Summary statistics ---------
Total Time Audio: 360.02s Proc: 17.18s Speed: 0.05 X real time
<unk>(0.0,19.41) here(23.0,23.65) in(23.65,23.71) britain(23.71,26.35)
have(27.38,27.59) been(27.62,27.9) a(27.9,29.02) not(30.19,31.6)
the(33.74,33.83) of(33.83,36.43) a(36.43,36.64) person(37.45,37.84)
but(37.84,38.81) the(39.08,39.16) of(39.16,39.6) a(40.1,40.74)
one(40.74,40.92) and(40.92,42.13) years(44.93,45.1) ago(45.1,46.93)
in(47.15,47.72) january(50.21,51.0) the(51.14,51.46)
underground(51.65,52.42) in(52.42,52.56) the(53.71,53.97)
carried(54.03,55.14) first(56.31,56.87) it(58.34,58.9) ran(58.9,59.01)
for(59.29,59.51) kilometres(59.54,60.9) paddington(61.07,61.56)
london(61.56,61.9) to(61.9,62.84) a(62.92,63.29) close(71.01,71.79)
the(71.79,72.5) which(72.5,72.95) is(73.07,73.55) the(74.04,74.35)
name(74.66,74.95) call(75.12,75.57) londonâs(75.57,76.33)
business(76.36,77.37) the(77.8,78.04) new(78.91,79.19) was(81.25,81.44)
and(81.44,81.54) unpopular(81.54,84.08) with(84.3,84.53) many(84.53,85.0)
the(86.49,86.79) men(87.3,87.75) the(87.75,87.84) railway(88.15,89.13)
dug(89.13,91.21) the(93.37,93.47) and(93.47,94.53) knocked(95.09,95.35)
down(95.35,96.0) and(97.47,97.78) other(97.81,98.03) they(98.03,98.13)
a(98.13,99.2) deep(100.02,100.18) and(100.18,100.51) the(100.54,100.78)
track(100.78,101.32) the(103.26,104.06) bottom(104.09,104.78)
then(104.78,105.14) they(105.14,105.54) over(105.59,106.0)
the(106.03,106.09) new(106.09,106.25) and(106.31,106.73) the(106.73,107.08)
of(107.76,108.07) the(108.07,108.38) not(108.79,109.29) the(109.35,109.88)
construction(109.88,110.81) caused(110.95,111.54) in(113.27,113.39)
for(114.7,115.0) many(115.29,115.45) months(115.45,116.07)
steam(117.34,118.51) engines(118.51,119.6) pulled(120.84,122.21)
the(122.63,122.83) underground(123.87,125.09) although(127.65,128.22)
the(128.32,128.52) tunnels(129.4,131.33) had(131.7,131.85) in(131.85,132.0)
the(132.0,132.26) to(132.4,134.08) let(134.13,134.52) the(135.18,135.58)
smoke(138.45,139.3) they(139.3,140.0) still(140.0,140.57)
full(140.65,141.29) of(142.92,143.01) and(143.01,143.64)
steam(143.75,144.12) the(144.12,144.32) railway(146.86,147.28)
company(147.31,147.92) said(147.92,148.28) that(148.28,149.56)
atmosphere(149.56,151.1) was(153.63,153.75) and(153.75,153.86)
particularly(153.86,156.0) good(156.0,156.23) for(156.64,157.04)
with(157.64,158.09) i(158.21,158.51) that(158.77,159.23) it(159.23,159.67)
have(159.67,160.36) been(160.36,160.6) unpleasant(161.46,162.99)
from(162.99,163.46) the(164.42,164.78) very(164.81,165.05)
day(165.05,167.17) the(167.62,167.85) was(168.35,168.63)
with(169.05,169.26) who(172.24,173.6) to(173.65,173.84) to(173.84,174.22)
their(174.52,175.29) in(175.29,175.77) london(179.19,179.82)
about(179.82,182.71) people(183.98,184.41) used(184.41,185.33)
the(185.48,185.82) every(185.94,186.22) day(187.64,187.84)
in(187.84,192.49) its(192.49,192.58) first(196.65,197.49)
months(197.69,198.16) of(198.9,198.97) more(199.66,200.06)
underground(200.06,200.75) lines(201.27,201.57) in(201.57,201.67)
the(202.47,202.75) following(203.1,203.8) the(203.8,204.29)
companies(204.95,205.35) found(205.71,206.07) new(207.69,207.86)
to(207.86,208.09) and(208.09,209.36) them(209.36,211.26) of(211.26,211.35)
digging(216.55,217.28) huge(217.28,219.63) trenches(219.91,220.66)
in(221.2,221.3) the(221.3,221.62) they(221.65,221.83) bored(222.29,223.6)
holes(223.6,224.56) under(226.5,227.43) the(227.43,228.26)
city(228.51,229.17) people(231.89,232.6) called(235.36,236.99)
these(236.99,237.46) underground(238.8,239.92) lines(239.95,240.78)
because(243.0,243.35) the(243.39,243.72) had(243.72,245.41)
a(246.96,247.14) circular(247.2,248.06) shape(248.13,248.52)
like(251.39,251.55) nowadays(251.55,252.96) we(253.99,254.05)
âœthe(254.08,254.14) to(254.14,254.68) all(254.68,255.18)
the(255.18,256.03) london(257.08,257.67) underground(259.05,259.92)
it(259.92,260.47) of(260.47,260.53) impossible(260.53,261.38)
to(261.38,261.51) steam(261.51,263.05) on(263.05,263.23) the(264.06,264.39)
deep(266.99,267.43) tube(269.19,269.56) lines(269.56,269.97)
they(270.64,270.85) had(270.85,271.39) trains(271.73,272.35)
by(273.63,273.79) the(275.66,275.83) of(275.83,278.61) the(281.72,281.84)
20th(282.76,282.79) electricity(282.79,283.72) had(283.75,284.55)
replaced(284.55,285.33) on(285.33,285.85) all(286.24,286.58)
the(286.58,286.77) underground(287.32,289.16) to(289.39,289.93)
the(290.04,290.24) 150th(290.27,290.33) of(290.45,290.58)
the(290.69,290.77) underground(290.77,292.47) one(293.61,293.7)
of(293.7,294.44) the(294.98,295.17) old(296.07,296.17)
engines(296.17,296.84) came(297.06,297.3) out(297.68,297.96)
of(297.96,298.06) retirement(300.63,301.19) home(301.19,302.07)
a(302.07,302.11) museum(302.11,303.93) pull(303.96,304.48) a(305.35,305.4)
underground(305.4,306.36) train(306.36,306.55) the(306.55,306.97)
office(307.31,307.99) issued(308.39,309.25) some(309.29,309.71)
new(309.71,310.02) to(310.91,311.59) mark(311.66,312.76) the(318.92,319.01)
anniversary(319.01,320.31)
The first word 'here' is reported at position (23.0,23.65), which is wrong.
The timings of the remaining words are also incorrect and, as you can see,
many of the words are missing.
I made no changes to config.xml.
How can I fix these issues and improve accuracy? My transcription is 100%
accurate.
-
"Mathematics is the queen of the sciences and number theory is the queen of
mathematics."
--Gauss
Audio files must be 16 kHz, not 44.1 kHz. Convert the audio to the proper format.
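A quick way to confirm what format a WAV file actually has before running the aligner is to read its header with the standard javax.sound.sampled API. The sketch below is only an illustration: the class name WavFormatCheck is made up for this example, and the full target format it checks (16 kHz, mono, 16-bit signed PCM, little-endian) is an assumption that goes beyond the 16 kHz requirement stated above.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Sphinx: reports how a file's format
// differs from the assumed target (16 kHz, mono, 16-bit signed PCM, little-endian).
final class WavFormatCheck {

    /** Returns one message per mismatch; an empty list means the format matches. */
    static List<String> problems(AudioFormat f) {
        List<String> out = new ArrayList<>();
        if (f.getSampleRate() != 16000f) {
            out.add("sample rate is " + f.getSampleRate() + " Hz, expected 16000");
        }
        if (f.getChannels() != 1) {
            out.add("channel count is " + f.getChannels() + ", expected 1 (mono)");
        }
        if (f.getSampleSizeInBits() != 16) {
            out.add("sample size is " + f.getSampleSizeInBits() + " bits, expected 16");
        }
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())) {
            out.add("encoding is " + f.getEncoding() + ", expected PCM_SIGNED");
        }
        if (f.isBigEndian()) {
            out.add("byte order is big-endian, expected little-endian");
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Reads only the file header; the audio data itself is not decoded.
        AudioFormat format = AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
        List<String> found = problems(format);
        if (found.isEmpty()) {
            System.out.println("Format looks fine: " + format);
        } else {
            found.forEach(System.out::println);
        }
    }
}
```

Run it as "java WavFormatCheck ShortPodcast.wav"; any line it prints points at a property that would need converting.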
Thank you @Nickolay for your suggestion. The endianness of the file was a problem, too.
I converted it using the following SoX command:
sox input.wav -r 16000 -c 1 -L output.wav
and the output timing is just perfect up to second 98, where it stops completely.
What could cause this? Silence? There is no silence after the word 'about' in the speech.
--------------- Summary statistics ---------
Total Time Audio: 180.01s Proc: 3.34s Speed: 0.02 X real time
here(0.24,0.58) in(0.58,0.8) britain(0.8,1.25) we(1.33,1.54) have(1.54,1.68) been(1.68,1.9) celebrating(1.9,2.75) a(2.93,3.04) birthday(3.12,3.69) not(5.15,5.36) the(5.39,5.51) birthday(5.51,5.93) of(5.93,6.18) a(6.18,6.41) person(6.44,6.7) however(6.7,7.32) but(7.61,7.88) the(7.91,8.03) birthday(8.03,8.53) of(8.56,8.85) a(8.89,9.07) railway(9.07,9.64) one(11.47,11.69) hundred(11.69,12.03) and(12.03,12.21) fifty(12.21,12.54) years(12.54,12.87) ago(12.87,13.22) in(13.65,13.93) january(13.93,14.66) 1863(14.66,16.13) the(16.71,16.96) first(16.96,17.34) underground(17.34,18.08) railway(18.08,18.57) in(18.57,18.72) the(18.72,18.83) world(18.83,19.34) carried(19.37,19.96) its(19.96,20.18) first(20.18,20.63) passengers(20.63,21.35) it(22.29,22.57) ran(22.57,22.89) for(22.89,23.19) 6(23.19,23.56) kilometres(23.56,24.37) from(24.69,25.05) paddington(25.08,25.75) in(25.75,25.97) london(25.97,26.45) to(26.75,26.88) a(26.88,26.99) place(26.99,27.31) close(27.31,27.69) to(27.69,28.05) the(28.1,28.27) city(28.27,28.83) which(29.5,29.84) is(29.84,29.97) the(29.97,30.05) name(30.05,30.34) we(30.34,30.45) call(30.45,30.94) londonâs(30.94,31.59) main(31.59,32.05) business(32.08,32.56) district(32.56,33.11) the(35.47,35.61) new(35.61,35.85) railway(35.85,36.26) was(36.26,36.46) controversial(36.46,37.61) and(37.7,37.92) unpopular(37.92,38.66) with(38.66,38.85) many(38.85,39.14) people(39.14,39.64) the(40.59,40.7) men(40.7,41.07) building(41.07,41.46) the(41.46,41.59) railway(41.59,42.0) dug(42.0,42.3) up(42.3,42.49) the(42.49,42.57) streets(42.57,43.29) and(43.37,43.59) knocked(43.63,44.05) down(44.05,44.31) houses(44.31,44.87) and(44.87,45.02) other(45.02,45.25) buildings(45.25,45.81) they(46.64,46.85) dug(46.85,47.07) a(47.07,47.24) deep(47.24,47.66) trench(47.66,48.1) and(48.72,48.9) put(48.9,49.1) the(49.1,49.21) railway(49.21,49.66) track(49.66,49.99) at(49.99,50.12) the(50.12,50.2) bottom(50.2,50.78) then(51.6,51.82) they(51.82,52.03) covered(52.03,52.42) over(52.42,52.73) 
the(52.73,52.87) new(52.87,53.06) railway(53.06,53.62) and(53.87,54.15) remade(54.15,54.68) the(54.68,54.78) surface(54.78,55.3) of(55.3,55.43) the(55.43,55.53) street(55.53,56.02) not(57.36,57.57) surprisingly(57.57,58.41) the(58.55,58.75) construction(58.75,59.53) work(59.53,59.89) caused(59.92,60.31) chaos(60.34,61.01) in(61.01,61.3) london(61.3,61.73) for(61.83,62.05) many(62.05,62.32) months(62.32,62.85) steam(64.68,65.25) engines(65.25,65.75) pulled(65.75,66.19) the(66.19,66.29) first(66.29,66.73) underground(66.73,67.46) trains(67.49,68.08) although(69.22,69.55) the(69.55,69.67) tunnels(69.67,70.06) had(70.06,70.23) vents(70.23,70.79) in(70.79,70.95) the(70.95,71.04) roof(71.04,71.35) to(71.38,71.51) let(71.51,71.76) the(71.79,71.87) smoke(71.87,72.24) escape(72.24,72.77) they(73.11,73.51) were(73.51,73.64) still(73.64,73.99) full(73.99,74.26) of(74.26,74.44) soot(74.44,74.89) and(74.89,75.11) steam(75.11,75.84) the(77.23,77.38) railway(77.38,77.79) company(77.79,78.28) bravely(78.28,78.7) said(78.7,79.08) that(79.27,79.5) the(79.5,79.62) atmosphere(79.62,80.18) was(80.18,80.39) invigorating(80.68,81.73) and(82.2,82.4) particularly(82.4,83.1) good(83.1,83.39) for(83.39,83.55) people(83.55,83.89) with(83.89,84.08) asthma(84.12,84.7) i(86.09,86.29) think(86.29,86.54) that(86.54,86.7) it(86.7,86.78) must(86.78,87.01) have(87.01,87.1) been(87.1,87.3) very(87.3,87.51) unpleasant(87.51,88.24) nonetheless(89.57,90.29) from(90.41,90.68) the(90.68,90.77) very(90.77,91.02) first(91.02,91.45) day(91.45,91.73) the(91.88,92.1) railway(92.1,92.48) was(92.48,92.63) popular(92.63,93.36) with(93.43,93.59) people(93.59,94.0) who(94.0,94.11) needed(94.11,94.64) to(94.82,94.98) travel(95.02,95.49) to(95.49,95.69) their(95.69,95.88) work(95.88,96.18) in(96.57,96.83) london(96.83,97.25) about(98.31,179.98)
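A stall like the final "about(98.31,179.98)" entry is easy to miss in a long output. As a rough aid, the following sketch scans aligner output in the word(start,end) form shown above and lists every word whose duration exceeds a threshold. The class name TimingCheck and the 5-second threshold are assumptions made for this illustration; they are not part of the aligner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags implausibly long words in aligner output.
final class TimingCheck {

    // Matches tokens such as "about(98.31,179.98)": word, start time, end time.
    private static final Pattern TOKEN =
            Pattern.compile("(\\S+)\\((\\d+(?:\\.\\d+)?),(\\d+(?:\\.\\d+)?)\\)");

    /** Returns every token whose duration (end - start) is longer than maxSeconds. */
    static List<String> suspicious(String output, double maxSeconds) {
        List<String> flagged = new ArrayList<>();
        Matcher m = TOKEN.matcher(output);
        while (m.find()) {
            double start = Double.parseDouble(m.group(2));
            double end = Double.parseDouble(m.group(3));
            if (end - start > maxSeconds) {
                flagged.add(m.group());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        String tail = "in(96.57,96.83) london(96.83,97.25) about(98.31,179.98)";
        // The threshold of 5 seconds is an arbitrary choice for this example.
        suspicious(tail, 5.0).forEach(System.out::println);
    }
}
```

Applied to the tail of the output above, only the last token is reported, which is the point where the alignment stopped following the audio.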
I just replaced liveCMN with BatchCMN in config.xml and got the following full-length timed output.
--------------- Summary statistics ---------
Total Time Audio: 180.01s Proc: 3.42s Speed: 0.02 X real time
here(0.24,0.58) in(0.58,0.8) britain(0.8,1.26) we(1.33,1.49) have(1.49,1.69) been(1.69,1.9) celebrating(1.9,2.75) a(2.93,3.09) birthday(3.09,3.69) not(5.16,5.4) the(5.4,5.51) birthday(5.51,5.93) of(5.93,6.18) a(6.18,6.41) person(6.44,6.69) however(6.69,7.54) but(7.6,7.88) the(7.92,8.03) birthday(8.03,8.47) of(8.47,8.57) a(8.57,9.08) railway(9.08,9.64) one(11.44,11.67) hundred(11.67,12.03) and(12.03,12.2) fifty(12.2,12.53) years(12.53,12.87) ago(12.87,13.22) in(13.62,13.92) january(13.92,14.66) 1863(14.66,16.13) the(16.85,16.95) first(16.95,17.32) underground(17.32,18.08) railway(18.08,18.57) in(18.57,18.72) the(18.72,18.84) world(18.84,19.44) carried(19.52,19.94) its(19.94,20.18) first(20.18,20.63) passengers(20.63,21.35) it(22.29,22.57) ran(22.57,22.86) for(22.86,23.18) 6(23.22,23.56) kilometres(23.56,24.37) from(24.7,25.03) paddington(25.06,25.75) in(25.75,25.97) london(25.97,26.45) to(26.75,26.88) a(26.88,26.97) place(27.0,27.31) close(27.31,27.69) to(27.69,28.06) the(28.1,28.27) city(28.27,28.83) which(29.63,29.84) is(29.84,29.97) the(29.97,30.05) name(30.05,30.35) we(30.35,30.45) call(30.45,30.94) londonâs(30.94,31.59) main(31.59,32.05) business(32.05,32.56) district(32.56,33.12) the(35.46,35.6) new(35.6,35.83) railway(35.83,36.26) was(36.26,36.46) controversial(36.46,37.62) and(37.7,37.92) unpopular(37.92,38.66) with(38.66,38.83) many(38.83,39.14) people(39.14,39.64) the(40.59,40.7) men(40.7,41.05) building(41.05,41.46) the(41.46,41.59) railway(41.59,42.02) dug(42.02,42.3) up(42.3,42.49) the(42.49,42.57) streets(42.57,43.3) and(43.37,43.59) knocked(43.63,44.05) down(44.05,44.3) houses(44.3,44.86) and(44.86,45.01) other(45.01,45.25) buildings(45.25,45.86) they(46.65,46.84) dug(46.84,47.07) a(47.07,47.22) deep(47.22,47.66) trench(47.66,48.21) and(48.7,48.9) put(48.9,49.1) the(49.1,49.21) railway(49.21,49.66) track(49.66,49.98) at(49.98,50.12) the(50.12,50.21) bottom(50.21,50.78) then(51.58,51.82) they(51.82,52.03) covered(52.03,52.41) over(52.41,52.73) 
the(52.73,52.87) new(52.87,53.06) railway(53.06,53.65) and(53.86,54.15) remade(54.15,54.68) the(54.68,54.82) surface(54.82,55.3) of(55.3,55.43) the(55.43,55.53) street(55.53,56.12) not(57.36,57.5) surprisingly(57.53,58.41) the(58.55,58.75) construction(58.75,59.53) work(59.53,59.89) caused(59.92,60.35) chaos(60.38,61.01) in(61.01,61.3) london(61.3,61.73) for(61.83,62.09) many(62.09,62.32) months(62.32,62.84) steam(64.68,65.24) engines(65.24,65.75) pulled(65.75,66.19) the(66.19,66.29) first(66.29,66.73) underground(66.73,67.46) trains(67.49,68.09) although(69.21,69.55) the(69.55,69.67) tunnels(69.67,70.07) had(70.07,70.23) vents(70.23,70.79) in(70.79,70.95) the(70.95,71.05) roof(71.05,71.35) to(71.38,71.52) let(71.52,71.75) the(71.78,71.87) smoke(71.87,72.24) escape(72.24,72.77) they(73.35,73.52) were(73.52,73.64) still(73.64,73.99) full(73.99,74.26) of(74.26,74.44) soot(74.44,74.89) and(74.89,75.11) steam(75.11,75.84) the(77.2,77.38) railway(77.38,77.79) company(77.79,78.27) bravely(78.27,78.7) said(78.7,79.08) that(79.28,79.5) the(79.5,79.62) atmosphere(79.62,80.18) was(80.18,80.7) invigorating(80.7,81.73) and(82.18,82.4) particularly(82.4,83.11) good(83.11,83.39) for(83.39,83.55) people(83.55,83.89) with(83.89,84.09) asthma(84.13,84.8) i(86.09,86.29) think(86.29,86.54) that(86.54,86.7) it(86.7,86.78) must(86.78,87.0) have(87.0,87.09) been(87.09,87.3) very(87.3,87.5) unpleasant(87.5,88.24) nonetheless(89.58,90.29) from(90.41,90.68) the(90.68,90.77) very(90.77,91.02) first(91.02,91.4) day(91.4,91.7) the(91.7,92.1) railway(92.1,92.48) was(92.48,92.63) popular(92.63,93.34) with(93.37,93.6) people(93.6,94.01) who(94.01,94.11) needed(94.11,94.68) to(94.81,94.94) travel(94.94,95.49) to(95.49,95.69) their(95.69,95.88) work(95.88,96.44) in(96.59,96.83) london(96.83,97.25) about(98.31,99.19) 26000(99.19,100.4) people(100.4,100.84) used(101.13,101.49) the(101.49,101.61) railway(101.61,102.2) every(102.46,102.82) day(102.82,103.14) in(103.73,103.88) its(103.88,104.06) 
first(104.06,104.5) six(104.5,104.76) months(104.76,105.12) of(105.12,105.32) operation(105.32,106.0) more(108.54,108.89) underground(108.89,109.53) railway(109.53,109.91) lines(109.91,110.41) opened(110.41,111.04) in(111.04,111.14) the(111.14,111.21) following(111.21,111.77) years(111.77,112.33) the(113.23,113.35) railway(113.35,113.73) companies(113.73,114.19) found(114.19,114.62) new(114.62,114.84) ways(114.84,115.37) to(115.37,115.49) build(115.49,115.89) and(115.89,116.02) operate(116.02,116.48) them(116.48,116.73) instead(117.52,118.0) of(118.0,118.17) digging(118.17,118.69) huge(118.76,119.34) trenches(119.34,119.96) in(119.96,120.1) the(120.1,120.25) streets(120.25,120.88) they(121.43,121.69) bored(121.69,122.09) holes(122.09,122.94) deep(123.11,123.62) under(123.65,123.96) the(123.96,124.07) city(124.07,124.57) people(125.66,126.07) called(126.07,126.4) these(126.4,126.64) deep(126.64,126.97) underground(126.97,127.73) lines(127.73,128.18) âœtubesâ(128.39,129.08) because(129.5,129.85) the(129.85,130.04) tunnels(130.04,130.45) had(130.45,130.66) a(130.66,130.74) circular(130.74,131.29) shape(131.29,131.7) like(131.99,132.27) tubes(132.3,132.9) nowadays(133.85,134.44) we(134.44,134.6) say(134.6,134.93) âœthe(134.93,135.01) tubeâ(135.34,135.64) to(135.64,136.07) mean(136.88,137.29) all(137.82,137.97) of(137.97,138.16) the(138.16,138.28) london(138.28,138.76) underground(138.76,139.37) system(139.37,139.94) it(141.36,141.53) was(141.53,141.73) of(141.73,141.85) course(141.85,142.15) impossible(142.15,142.95) to(143.02,143.14) use(143.14,143.44) steam(143.44,143.91) engines(143.91,144.56) on(144.67,144.85) the(144.85,144.94) deep(144.94,145.32) tube(145.32,145.71) lines(145.71,146.23) they(146.69,146.94) had(146.94,147.33) electric(147.33,147.9) trains(147.9,148.51) instead(148.51,149.22) by(150.28,150.43) the(150.43,150.55) beginning(150.55,150.99) of(150.99,151.12) the(151.12,151.2) 20th(151.2,151.27) electricity(151.27,152.33) had(152.67,153.2) 
replaced(153.2,154.45) steam(154.45,155.05) on(155.51,155.75) all(155.75,156.11) the(156.11,156.25) underground(156.25,156.86) lines(156.86,157.39) to(159.44,159.61) celebrate(159.61,160.32) the(160.35,161.27) 150th(161.27,161.41) of(161.41,161.47) the(161.47,161.73) underground(161.73,163.88) one(164.44,164.73) of(164.73,164.81) the(164.81,165.06) old(165.06,165.54) steam(165.54,166.0) engines(166.0,166.55) came(166.87,167.27) out(167.27,167.51) of(167.51,167.62) its(167.62,167.86) retirement(167.86,168.5) home(168.5,168.69) in(168.69,168.87) a(168.9,168.99) museum(168.99,169.75) to(170.26,170.42) pull(170.42,170.82) a(170.82,170.94) special(170.94,171.52) underground(171.55,172.19) train(172.28,172.74) the(174.0,174.27) post(174.3,174.69) office(174.69,175.15) issued(175.27,175.69) some(175.69,175.97) new(175.97,176.17) stamps(176.17,176.86) to(177.02,177.23) mark(177.23,177.7) the(177.7,177.87) anniversary(177.87,178.7)
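For context on why the two settings can behave differently: cepstral mean normalization (CMN) subtracts a mean from every dimension of each feature frame. A batch variant computes that mean over the whole recording, while a live variant has to estimate it from the frames seen so far. The sketch below illustrates the idea only. It is not the Sphinx-4 code, the class name CmnSketch is made up, and the running variant uses a plain cumulative mean, which is a simplification of what a real live component does.

```java
// Hypothetical illustration of batch versus running mean normalization.
final class CmnSketch {

    /** Batch CMN: subtract the per-dimension mean computed over all frames. */
    static double[][] batch(double[][] frames) {
        if (frames.length == 0) {
            return new double[0][];
        }
        int dims = frames[0].length;
        double[] mean = new double[dims];
        for (double[] frame : frames) {
            for (int d = 0; d < dims; d++) {
                mean[d] += frame[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            mean[d] /= frames.length;
        }
        double[][] out = new double[frames.length][dims];
        for (int t = 0; t < frames.length; t++) {
            for (int d = 0; d < dims; d++) {
                out[t][d] = frames[t][d] - mean[d];
            }
        }
        return out;
    }

    /** Running CMN: subtract the mean of the frames seen so far, current frame included. */
    static double[][] running(double[][] frames) {
        if (frames.length == 0) {
            return new double[0][];
        }
        int dims = frames[0].length;
        double[] sum = new double[dims];
        double[][] out = new double[frames.length][dims];
        for (int t = 0; t < frames.length; t++) {
            for (int d = 0; d < dims; d++) {
                sum[d] += frames[t][d];
                out[t][d] = frames[t][d] - sum[d] / (t + 1);
            }
        }
        return out;
    }
}
```

The two functions give different values for the same frames, especially early in a recording, because the running estimate has seen little data at that point.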
We are working on a new, more robust alignment algorithm, but it's not there yet.
@Nickolay V.Shmyrev, may I know the approximate date of such a release?
Hello!
What's wrong with my audio file? I cannot get a good result.
https://dl.dropboxusercontent.com/u/87225845/part01.wav
https://dl.dropboxusercontent.com/u/87225845/part01.txt
In any case I do not get all of the words (which is not a big problem), but the position is incorrect for more than half of the words.
Thanks for your help!
Your audio file is OK, but your text file contains punctuation and is not in lower case, as it should be.
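The clean-up described here can be done with a few string operations. This is a minimal sketch under assumptions: the class name TranscriptNormalizer is invented for the example, and keeping apostrophes (so that words such as "london's" survive) and digits is a guess about what the dictionary expects rather than a documented rule.

```java
import java.util.Locale;

// Hypothetical helper: lower-cases a transcript and strips punctuation.
final class TranscriptNormalizer {

    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT)
                // Map the typographic apostrophe to a plain one before filtering.
                .replace('\u2019', '\'')
                // Replace everything except letters, digits, apostrophes and whitespace.
                .replaceAll("[^\\p{L}\\p{Nd}'\\s]", " ")
                // Collapse runs of whitespace left behind by the removal.
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Here in Britain, we have been celebrating a birthday."));
    }
}
```

Typographic quotes around a word are dropped by the same filter, so only the bare word is left in the transcript.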
Hello!
Thanks for your answer!
I changed the text file, but the result is the same. As I understand it, the code itself converts the source text file into the required format.
With liveCMN I get good positions for the words at the end; with BatchCMN I get good positions for the words at the beginning. But in both cases, most of the words in the middle do not land in their correct positions.
What other settings can you recommend changing to improve the result?
Hello Dima and Mohsen,
The work on the long audio aligner is in progress. Unfortunately it's a fairly complex task and we need some time to implement it properly.
We have just committed a new algorithm implementation to our subversion trunk; it should work much better than the branch. You can try it out this way:
java -cp sphinx4-samples/target/sphinx4-samples-1.0-SNAPSHOT-jar-with-dependencies.jar edu.cmu.sphinx.demo.aligner.AlignerDemo part01.wav part01.txt en-us-generic cmudict-5prealpha.dict cmudict-5prealpha.fst.ser
The result is attached. It looks OK, but probably needs some more love.
Hello Nickolay.
I see that pocketsphinx has alignment functionality (test_state_align.c). How can I test it in my case (from the command line, or some other way)? I don't know how to compile this code in Xcode.
test_state_align is for short utterances only and is not part of the public API; you should not use it.
I split a large file into parts that contain only dialogue. This improved the result. But for a few files I get either an error or no result.
This is the exception for the files in the attachment:
java.lang.NegativeArraySizeException
at edu.cmu.sphinx.frontend.feature.LiveCMN.initMeansSums(LiveCMN.java:130)
at edu.cmu.sphinx.frontend.feature.LiveCMN.getData(LiveCMN.java:161)
at edu.cmu.sphinx.frontend.feature.AbstractFeatureExtractor.getNextData(AbstractFeatureExtractor.java:124)
at edu.cmu.sphinx.frontend.feature.AbstractFeatureExtractor.getData(AbstractFeatureExtractor.java:97)
at edu.cmu.sphinx.frontend.feature.FeatureTransform.getData(FeatureTransform.java:85)
at edu.cmu.sphinx.frontend.FrontEnd.getData(FrontEnd.java:219)
at edu.cmu.sphinx.decoder.scorer.SimpleAcousticScorer.getNextData(SimpleAcousticScorer.java:100)
at edu.cmu.sphinx.decoder.scorer.SimpleAcousticScorer.startRecognition(SimpleAcousticScorer.java:127)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.startRecognition(WordPruningBreadthFirstSearchManager.java:261)
at edu.cmu.sphinx.decoder.Decoder.decode(Decoder.java:62)
at edu.cmu.sphinx.recognizer.Recognizer.recognize(Recognizer.java:109)
at edu.cmu.sphinx.recognizer.Recognizer.recognize(Recognizer.java:125)
at edu.cmu.sphinx.alignment.SpeechAligner.align(SpeechAligner.java:132)
at edu.cmu.sphinx.alignment.SpeechAligner.align(SpeechAligner.java:80)
at edu.cmu.sphinx.alignment.SpeechAligner$align.call(Unknown Source)
Last edit: Dima Kruk 2014-07-31
And this file gives no result.