sphinx4-1.0beta5-src on Windows 7
compiled with Ant
jdk1.6.0_24
I modified Aligner.java and its config.xml to use the french_f0 dictionary, a
sample wav, and a sample sentence. The result is poor. Is this the best I can
expect or should I be doing this differently?
/*
 * Copyright 1999-2004 Carnegie Mellon University.
 * Portions Copyright 2004 Sun Microsystems, Inc.
 * Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
 * All Rights Reserved.  Use is subject to license terms.
 *
 * See the file "license.terms" for information on usage and
 * redistribution of this file, and for a DISCLAIMER OF ALL
 * WARRANTIES.
 *
 */

package edu.cmu.sphinx.demo.aligner;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import edu.cmu.sphinx.linguist.language.grammar.TextAlignerGrammar;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.IOException;
import java.net.URL;

/**
 * A simple example that shows how to align speech to existing transcription to
 * get times.
 */
public class Aligner {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException {
        ConfigurationManager cm = new ConfigurationManager(
                "src/sphinx4/edu/cmu/sphinx/config/aligner.xml");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        TextAlignerGrammar grammar = (TextAlignerGrammar) cm.lookup("textAlignGrammar");
        grammar.setText("Dans le faubourg une rue assourdissante populeuse où du matin au soir les vitres tremblaient au fracas des camions et des omnibus tout le monde connaissait estimait et respectait la petite papetière");
        recognizer.addResultListener(grammar);

        /* allocate the resource necessary for the recognizer */
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(new URL("file:src/apps/edu/cmu/sphinx/demo/transcriber/10001-90210-01803.wav"), null);

        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getTimedBestResult(false, true);
            System.out.println(resultText);
        }
    }
}
The French model uses AGC, so you need to include the BatchAGC component in the
frontend pipeline. You can search this forum for details.
It's also recommended to use sphinx4-1.0 beta6 rather than beta5.
Anonymous
-
2011-03-03
It's not any better but it might help if I showed you the results using the
correct audio file (I thought those results were weird).
Still, none of the words are correctly located. I also tried with an audio
file containing someone counting from 0-9 in French. It was bad except the
zero was perfect.
Anonymous
-
2011-03-04
I saw that as well. I didn't assume that I needed BatchCMN, but I've added it
as you have suggested and the result remains exactly the same. To make sure I
understand correctly, I will post my files again. I very much appreciate your
attention.
ALIGNER.JAVA
/*
 * Copyright 1999-2004 Carnegie Mellon University.
 * Portions Copyright 2004 Sun Microsystems, Inc.
 * Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
 * All Rights Reserved.  Use is subject to license terms.
 *
 * See the file "license.terms" for information on usage and
 * redistribution of this file, and for a DISCLAIMER OF ALL
 * WARRANTIES.
 *
 */

package edu.cmu.sphinx.demo.aligner;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import edu.cmu.sphinx.linguist.language.grammar.TextAlignerGrammar;
import edu.cmu.sphinx.frontend.feature.BatchCMN;
import edu.cmu.sphinx.frontend.feature.BatchAGC;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.IOException;
import java.net.URL;

/**
 * A simple example that shows how to align speech to existing transcription to
 * get times.
 */
public class Aligner {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException {
        ConfigurationManager cm = new ConfigurationManager(
                "src/sphinx4/edu/cmu/sphinx/config/aligner.xml");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        TextAlignerGrammar grammar = (TextAlignerGrammar) cm.lookup("textAlignGrammar");
        grammar.setText("zero un deux trois quatre cinq six sept huit neuf");
        recognizer.addResultListener(grammar);

        /* allocate the resource necessary for the recognizer */
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(new URL("file:src/apps/edu/cmu/sphinx/demo/0-9.wav"), null);

        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getTimedBestResult(false, true);
            System.out.println(resultText);
        }
    }
}
Anonymous
-
2011-03-04
I just learned that the placement of BatchAGC is important. With the frontend
configured in the following way, I get mostly correct results, a lot of junk,
and one missing word (quatre). Using BatchCMN makes things worse, so I removed
it.
A sample from the end of the results. Now I need to figure out whether I can
filter the results like LiveCMN did, and why quatre is not recognized. If you
could throw me a bone in that respect, I would appreciate it.
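One possible approach to the filtering question above, as a rough sketch only: post-process the string returned by getTimedBestResult and keep only the words that occur in the transcript. The token format word(start,end) is an assumption here and should be checked against your own output; the class and method names (TimedResultFilterSketch, keepTranscriptWords) are hypothetical and not part of Sphinx4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only, not Sphinx4 code. */
final class TimedResultFilterSketch {

    /** One aligned word with its start and end time in seconds. */
    static final class TimedWord {
        final String word;
        final double start;
        final double end;

        TimedWord(String word, double start, double end) {
            this.word = word;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return word + "(" + start + "," + end + ")";
        }
    }

    // ASSUMPTION: each token looks like word(start,end), separated by whitespace.
    private static final Pattern TOKEN =
            Pattern.compile("(\\S+?)\\(([0-9.]+),([0-9.]+)\\)");

    /** Keeps only tokens whose word occurs somewhere in the transcript. */
    static List<TimedWord> keepTranscriptWords(String timedResult, String transcript) {
        Set<String> wanted = new HashSet<String>(
                Arrays.asList(transcript.trim().toLowerCase(Locale.ROOT).split("\\s+")));
        List<TimedWord> kept = new ArrayList<TimedWord>();
        Matcher m = TOKEN.matcher(timedResult);
        while (m.find()) {
            String word = m.group(1);
            if (wanted.contains(word.toLowerCase(Locale.ROOT))) {
                kept.add(new TimedWord(word,
                        Double.parseDouble(m.group(2)),
                        Double.parseDouble(m.group(3))));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up input in the assumed format, not real recognizer output.
        String timed = "<sil>(0.0,0.52) zero(0.52,0.91) <sil>(0.91,1.4) un(1.4,1.7)";
        System.out.println(keepTranscriptWords(timed, "zero un deux"));
    }
}
```

This only drops tokens that are not transcript words (silence markers, fillers). It cannot recover a word the aligner skipped, such as quatre in the run described above.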
Anonymous
-
2011-03-04
Unfortunately for my longer example, which is a story rather than a simple
count from 0-9, the recognition is horribly poor, despite being so accurate
for the count example. It seems that there may be some deeply embedded tuning
here for the English language which doesn't work for the French language
model.
I believe there is possibly some tuning that could be done for French, and I
would be interested in learning from those that have already blazed the trail.
Please let me know if you can provide some insight.
Please be more accurate and try to understand how things work. Your frontend
pipeline is wrong. The proper pipeline is cited in the forum thread we
referenced; you just need to read it carefully. The proper pipeline is:
Anonymous
-
2011-03-04
It's clear that you also did not read post number 10, where I indicated that I
did have BatchCMN in the pipeline before BatchAGC and it made things worse. Go
back up in the thread and see for yourself. Just in case I was wrong, I
double-checked using your recommended pipeline. It's worse. The BatchAGC-only
pipeline is so far the best result I have seen (as I already wrote). Reread
post 10.
Simply put: what you have suggested is not the solution to this problem,
although the BatchAGC did improve matters. Don't blame me for a lack of
information or for lack of attention, I read everything carefully, which you
would know if you had read carefully yourself.
I have given you all of the information you need to reproduce the situation on
your end with a simple copy and paste. It would be one thing if you had this
set up and it was working for you but I understand that this is some educated
guesswork. I've given you the complete contents of all of my files and the
result. The one thing that is missing is the audio file.
If you have the inclination, you could use the file and code I provided to
reproduce the problem. I just modified the included Aligner.java, aligner.xml
files and added french_f0 into the mix.
My guess is that this is not enough to provide for accurate French recognition
and that some further tuning is needed. It seems that you believe otherwise,
but maybe you or an experienced user are willing to reproduce the problem
using the information I have provided and show me that it is just a simple
tweak as you have indicated.
I tried your audio, and indeed the results are not very good. There is an
issue with the long silences between digits; if you remove them, everything
gets much better. The result with the silences cut is:
It seems the aligner algorithm needs some work to deal with this particular
case. But that doesn't change the proper frontend configuration listed above,
since that configuration is based on prior knowledge, not on experiments. If
the experiments were run on a larger amount of data, they could show which
configuration performs better.
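The silence removal mentioned above can be tried offline before the audio reaches the aligner. The sketch below assumes 16-bit PCM samples already loaded into a short[]; the class name, frame length, and RMS threshold are illustrative choices and not part of Sphinx4. It shortens every run of low-energy frames to at most maxSilentFrames frames.

```java
import java.util.Arrays;

/** Illustrative only, not Sphinx4 code. */
final class SilenceTrimSketch {

    /** Root-mean-square level of samples[from, to). */
    static double rms(short[] samples, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += (double) samples[i] * samples[i];
        }
        return Math.sqrt(sum / (to - from));
    }

    /**
     * Copies the input frame by frame, keeping at most maxSilentFrames
     * consecutive frames whose RMS is below rmsThreshold.
     * frameLen must be positive.
     */
    static short[] shortenSilences(short[] samples, int frameLen,
                                   double rmsThreshold, int maxSilentFrames) {
        short[] out = new short[samples.length];
        int written = 0;
        int silentRun = 0;
        for (int start = 0; start < samples.length; start += frameLen) {
            int end = Math.min(start + frameLen, samples.length);
            boolean silent = rms(samples, start, end) < rmsThreshold;
            silentRun = silent ? silentRun + 1 : 0;
            if (!silent || silentRun <= maxSilentFrames) {
                System.arraycopy(samples, start, out, written, end - start);
                written += end - start;
            }
        }
        return Arrays.copyOf(out, written);
    }

    public static void main(String[] args) {
        // Synthetic signal: 2 loud frames, 5 silent frames, 2 loud frames (4 samples each).
        short[] samples = new short[36];
        Arrays.fill(samples, 0, 8, (short) 1000);
        Arrays.fill(samples, 28, 36, (short) 1000);
        short[] trimmed = shortenSilences(samples, 4, 100.0, 2);
        System.out.println(samples.length + " samples in, " + trimmed.length + " samples out");
    }
}
```

Note that any word times the aligner reports afterwards refer to the shortened audio, not to the original recording.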
According to previous discussions on this forum, problems with silence
classification might be caused by wrong positioning of the dataBlocker
component in the pipeline. Placing this component after the VAD (after
nonSpeechDataFilter, to be more exact) or removing it (if applicable) may
solve this problem (see
https://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/3894779/index/page/2)
CONFIG:
ALIGNER.XML
RESULT:
The three aligned words are completely wrong.
Thank you for the reply. sphinx4-1.0beta6 with the following addition to the
config.xml:
The results on the counting test are poor:
You need to add it to the frontend pipeline, not just to the list of
components.
Do I understand correctly that to add BatchAGC into the frontend pipeline, I
make the following changes to the configuration?
and
I also added the following line to Aligner.java:
I also scoured these forums for "BatchAGC" but did not see anything more
detailed than what I have written above.
I searched the forums and found:
"you just need to add BatchAGC into the frontend pipeline after the BatchCMN"
(https://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/3360385?message=7556457)
So according to this search result, the BatchCMN component is missing from
your pipeline and BatchAGC is misplaced.
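To see why the order of the two components can matter, here is a small standalone sketch of what batch CMN and batch AGC are generally understood to do to cepstral frames: CMN subtracts each dimension's mean over the utterance, and AGC (as assumed here) subtracts the utterance maximum of c0 from every frame's c0. This is not Sphinx4's code. The class and method names are hypothetical, and the real components run inside the frontend pipeline rather than on plain arrays.

```java
import java.util.Arrays;

/** Illustrative only, not Sphinx4 code. Rows are frames; column 0 is c0. */
final class CepstralNormSketch {

    /** Batch CMN: subtract each dimension's mean over the whole utterance. */
    static double[][] batchCmn(double[][] frames) {
        if (frames.length == 0) {
            return frames;
        }
        int dims = frames[0].length;
        double[] mean = new double[dims];
        for (double[] frame : frames) {
            for (int d = 0; d < dims; d++) {
                mean[d] += frame[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            mean[d] /= frames.length;
        }
        double[][] out = new double[frames.length][];
        for (int i = 0; i < frames.length; i++) {
            out[i] = frames[i].clone();
            for (int d = 0; d < dims; d++) {
                out[i][d] -= mean[d];
            }
        }
        return out;
    }

    /** Batch AGC (assumed behaviour): subtract the utterance maximum of c0 from every c0. */
    static double[][] batchAgc(double[][] frames) {
        double maxC0 = Double.NEGATIVE_INFINITY;
        for (double[] frame : frames) {
            maxC0 = Math.max(maxC0, frame[0]);
        }
        double[][] out = new double[frames.length][];
        for (int i = 0; i < frames.length; i++) {
            out[i] = frames[i].clone();
            out[i][0] -= maxC0;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up two-dimensional "cepstra" for three frames.
        double[][] frames = {{1.0, 0.5}, {3.0, -0.5}, {2.0, 0.0}};
        System.out.println("CMN then AGC: " + Arrays.deepToString(batchAgc(batchCmn(frames))));
        System.out.println("AGC then CMN: " + Arrays.deepToString(batchCmn(batchAgc(frames))));
    }
}
```

With CMN followed by AGC the largest c0 ends up at zero; in the reverse order the mean c0 ends up at zero. The two orders therefore hand different c0 values to the decoder, which is one reason a model trained with one arrangement may behave differently with the other.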
ALIGNER.XML
RESULT (counting from 0-9 in French):
The audio is here:
about.com
It works reasonably well with this pipeline (BatchAGC only):