Adaptation recordings were made with no problems until this point, which I do
not understand:
The -agc none parameter is very important. Make sure the arguments here match
the parameters in the feat.params file inside the acoustic model folder. Please
note that not all the parameters from feat.params are supported by bw, only a few
of them. bw, for example, doesn't support upperf or other feature extraction
params. But those which are supported should match.
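As a rough aid for the comparison described above, here is a minimal sketch that reads one option from feat.params so it can be checked against the arguments given to bw. It assumes feat.params holds one "-name value" pair per line; the helper name feat_param is made up for this illustration.

```shell
# feat_param FILE NAME: print the value stored for option NAME (e.g. -agc) in FILE.
# Assumes each line of FILE looks like "-name value".
feat_param() {
    awk -v name="$2" '$1 == name { print $2 }' "$1"
}

# Example (path taken from the commands in this thread):
#   feat_param hub4wsj_sc_8k/feat.params -agc
```

If the printed value differs from what is passed to bw for the same option, that mismatch is the first thing to fix.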
So I proceeded to MLLR which failed, but I did not record the errors.
needed to be entered every time the program was run. So I created a file
called local.conf in the subdirectory /etc/ld.so.conf.d containing just the
line /usr/local/lib. That is,
Contents of /etc/ld.so.conf.d/local.conf:
/usr/local/lib
This works every time.
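For anyone repeating this, a minimal sketch of the same step as a shell function. The function name write_ld_conf and its optional directory argument are only for illustration; the file name and its one-line content are the ones given above. Writing to /etc requires root, and ldconfig has to be run afterwards to refresh the linker cache.

```shell
# write_ld_conf [DIR]: create DIR/local.conf containing the single line /usr/local/lib.
# DIR defaults to /etc/ld.so.conf.d (root is needed to write there).
write_ld_conf() {
    conf_dir="${1:-/etc/ld.so.conf.d}"
    printf '%s\n' /usr/local/lib > "$conf_dir/local.conf"
}

# As root:
#   write_ld_conf && ldconfig
```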
I also have reason to suspect the onboard sound card and inexpensive desktop
microphone are causing problems with noise and limited input level. The system
does work except only about 2% of all results are accurate.
Is there a way to compensate for the substandard equipment such as "beam
tuning" and such? Increasing the gain caused so much noise the engine thought
there was input.
Any help appreciated.
Adaptation recordings were made with no problems until this point, which I do
not understand:
This paragraph was corrected. I hope it's more clear now.
I then tried MAP and got this:
To resolve the issue, you just need to read the output of
the command. It told you that you didn't specify the command correctly. The command
must be
The command shouldn't have redirection symbols inside. You can learn more
about shell commands and options by reading the shell manual.
The system does work except only about 2% of all results are accurate. Is
there a way to compensate for the substandard equipment such as "beam tuning"
and such? Increasing the gain caused so much noise the engine thought there
was input.
I don't think your hypothesis about noise or substandard equipment
matters. The real issue is somewhere else. To improve accuracy you need to
provide more information about what you are trying to do: what command you are
running, what speech you are trying to recognize, and what results you get. You
need to be as precise as possible; it will help you get to the solution
quickly.
I am recognizing English of the Northeast United States. The kind that
pronounces Rs hard. In other words, not NYC and not Massachusetts, and
certainly not British.
Here are typical, consistent results. The spoken words are listed first
and the result second:
Hello 000000000: both
What is wrong with you
INFO: ngram_search.c(474): Resized score stack to 200000 entries
INFO: ngram_search.c(466): Resized backpointer table to 10000 entries
INFO: ngram_search.c(466): Resized backpointer table to 20000 entries
INFO: ngram_search.c(466): Resized backpointer table to 40000 entries
INFO: ngram_search.c(466): Resized backpointer table to 80000 entries
INFO: ngram_search.c(474): Resized score stack to 400000 entries
INFO: ngram_search.c(466): Resized backpointer table to 160000 entries
INFO: ngram_search.c(474): Resized score stack to 800000 entries
INFO: ngram_search.c(466): Resized backpointer table to 320000 entries
pocketsphinx_continuous: feat.c:362: feat_array_alloc: Assertion `nfr > 0' failed.
Aborted
(This happens very infrequently and sometimes returns to the "READY" prompt
after a minute or two)
Restart.........
What time is it 000000000: couple times it
Are you having trouble 000000001: are to keep having trouble bit
Inconsistent 000000002: it is that a a systems to have
And so on......
I tried the MAP commands above and now come up with this:
Here are typical, consistent results. The spoken words are listed first
and the result second:
Please run
pocketsphinx_continuous -rawlogdir .
Note the dot (current dir) after rawlogdir. It will also dump the audio it's
trying to recognize to files. Pack the files into an archive, upload them to
a public file sharing site, and post a link here.
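A minimal sketch of the packing step, assuming the dumped files end in .raw and sit in the directory given to -rawlogdir. The function name pack_rawlogs and the default archive name are made up for this example.

```shell
# pack_rawlogs [DIR] [ARCHIVE]: bundle every .raw file in DIR into a gzip-compressed tarball.
# DIR defaults to the current directory, ARCHIVE to rawlogs.tar.gz.
pack_rawlogs() {
    dir="${1:-.}"
    archive="${2:-rawlogs.tar.gz}"
    # Run tar from inside DIR so the archive stores short relative names.
    ( cd "$dir" && tar -czf - ./*.raw ) > "$archive"
}

# Example, after running: pocketsphinx_continuous -rawlogdir .
#   pack_rawlogs . rawlogs.tar.gz
```

The resulting archive is what should be uploaded; screenshots of the waveform are not a substitute for the raw audio.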
I am recognizing English of the Northeast United States. The kind that
pronounces Rs hard. In other words, not NYC and not Massachusetts, and
certainly not British.
It doesn't matter. The most probable reason is that the default language model
provided is not really suitable for the text you are trying to
recognize. You need to build your own language model or download a
generic one like lm_giga.
./map_adapt \ -meanfn hub4wsj_sc_8k/means
In the shell, \ is used to escape characters and to join lines. This way you pass
the argument " -meanfn" to the command (note the space, which is escaped
by the backslash). When I provided the command in my previous post, it was meant to be
a multiline command. If you want to enter it as a single line use:
I have not run any adaptation scripts yet. These are from the v. 0.7 release,
out of the box. The .raw format had to be converted to PNG-24 in Photoshop on
Windows 7 since nothing on openSUSE could do it. Let me know if they need
improvement.
Thank you for the CLI pointers.
Hm, looking at your files, it seems that adaptation will not help you.
You have some issue with the driver or with the pocketsphinx sound input API. The
audio is not recorded properly; it contains skips and jumps. Most likely it is an
issue with the driver.
Can you record audio at all, outside pocketsphinx? Can you record audio through the ALSA API with arecord? Can you record through pulseaudio with parecord?
Which pocketsphinx/sphinxbase version are you using?
In the sphinxbase snapshot we implemented the pulseaudio API. Can you try it?
I have successfully recorded with Audacity for the adaptation recordings and
they played back well, but the recording level required speaking directly into
the mic; otherwise it was weak. I will have to check the others. I do not like the
ALSA driver and mixer. There just seems to be something funny about it.
openSUSE is notorious for sound difficulties, in general. No help from ALSA,
there is nobody home, so to speak.
pocketsphinx 0.7 sphinxbase 0.7
I do not know what sphinxbase snapshot is or what to do with it if I found it.
Here is my system info in case it helps:
OS: openSUSE 11.4 x86_64
Kernel: Linux 2.6.37.6-0.5-desktop
Desktop: KDE 4.6.00 rel 6
Machine: HP xw9400 AMD 64 Opteron
Chipset: nVidia nForce Pro 3600 and 3050 (proprietary Tyan Thunder)
Drive: OCZ Vertex 60 GB dedicated system--single boot
RAM: 4 GB ECC
Video: nVidia GT200 (GeForce 210) 512 MB
2D Driver: nouveau
3D Driver: swarst (no 3D acceleration) (7.10)
Audio: Onboard "card 0": nVidia MCP55 Analog Stereo
Onboard "card 1": nVidia Corporation Digital Stereo
Chip: Realtek ALC262
Alsa Driver: v. 1.0.23
Ran ALSA diagnostics. Some problems seem to be present, notably ACPI, which I may
have to disable in the BIOS. Also: clocksource tsc unstable, ALC262 codec not
ready, etc. I will have to research further. Disabling ACPI is easy, and ACPI has
been known to cause problems in Linux distros if left activated. Maybe IRQ
adjustments, too. I noticed there was a swap, which may be part of the
diagnostic routine, but if not, there should be no swapping since the system
has 4GB RAM.
Changes were made to ALSA and Pulseaudio and recognition has improved greatly.
Would you please check some new rawlogdir files at:
ftp:\ftp.earlybirdmaintenance.net
user: earlybir
pass: a123
Also the MAP commands were corrected but it returns:
Changes were made to ALSA and Pulseaudio and recognition has improved
greatly. Would you please check some new rawlogdir files at:
ftp:\ftp.earlybirdmaintenance.net
No, the recordings are still bad. See how it looks
Error (403) It seems you don't belong here! You should probably try logging
in?
But I remember that when making my adaptation recordings, there were level
problems. I had to speak directly on top of the mic to get a fat waveform, or it
would be so thin that the recordings were obviously no good.
Ok, so you were right, it is not the program, it is the OS or drivers. I may
have to obtain a real audio card that has ALSA/Linux support, since SUSE or
anyone else cannot help.
My distro's sound involves KDE desktop, ALSA, Pulseaudio, and a player. A
change in any one of them could have drastic, systemwide effects.
It failed to create the file gauden_counts on previous step
Is there something wrong here and can it be fixed?
I tried MLLR again this time being careful with the commands. First this
returned:
Then mllr_solve was copied to my "working directory" which is the adaptation
folder where everything related to the adaptation procedure is located. Re-
running returned:
linux-2i5i:/home/vince/adaptation # ./mllr_solve -meanfn /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means -varfn /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances -outmllrfn mllr_matrix -accumdir .
INFO: cmd_ln.c(559): Parsing command line:
./mllr_solve \
    -meanfn /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means \
    -varfn /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances \
    -outmllrfn mllr_matrix \
    -accumdir .
Current configuration:
[NAME]      [DEFLT]  [VALUE]
-accumdir            .,
-cb2mllrfn  .1cls.   .1cls.
-cdonly     no       no
-example    no       no
-fullvar    no       no
-help       no       no
-meanfn              /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means
-mllradd    yes      yes
-mllrmult   yes      yes
-moddeffn
-outmllrfn           mllr_matrix
-varfloor   1e-3     1.000000e-03
-varfn               /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances
INFO: main.c(387): -- 1. Read input mean, (var) and accumulation.
WARN: "s3io.c", line 256: Unable to open /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means for reading; No such file or directory
FATAL_ERROR: "main.c", line 397: Couldn't read /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means
So it is looking for installation files in /usr/local/share. In the tutorial
it was unclear to me what the "working directory" actually was and where it
should be located. At least everything is located in one place. I did
successfully create the recordings with the corresponding .mfc files. All the
other files are there as instructed as well.
I am aware that adaptation is not going to solve my accuracy problem. I just
want to be ready after the driver/software problem is corrected.
Then, after it is proved the engine works acceptably, I will invest the time
and effort in my application. This appears to be what amounts to Dragon for
Linux, or hopefully IBM's Watson (I may need some help here).
Thank you very much for your help. There seems to be hope!!
with no errors or complaints. Recognition seems slightly improved, except at
the initial utterance. The beginning of a sentence usually returns bizarre
results, as always, while the latter part is fairly good. But "excuse me" almost
invariably comes back perfectly. I have never seen "hello" by itself come back.
It always returns "both" or something else.
Further research into the OS/driver issue on my machine is pointing to
possible APIC issues, although it may still be the driver as well.
Success!! A Creative X-Fi Titanium Fatal1ty Pro was installed and solved the
mic input level and distortion problem. Pocketsphinx works with a guesstimated
90% accuracy OOTB!!
For the record, I did not have the linking correct. For openSUSE create a text
file called local.conf in the subdirectory /etc/ld.so.conf.d containing just
the lines:
/usr/local/lib
include /etc/ld.so.conf.d/*.conf
Then as root run:
ldconfig
I reinstalled Pocketsphinx after this and the install proceeded without any
warnings and make check approved everything.
Running
pkg-config --cflags --libs pocketsphinx sphinxbase
to confirm installation, returned the proper information according to the Building
Application notes.
It appears this modification to ld.so.conf.d is to be made BEFORE Pocketsphinx
is installed.
Proceeding to my application, the literature and code in Sphinx4 appear to be
where to go, since the goal is passing the Total Turing Test. Sphinx4 could be
made faster with a fast processor and a fast SSD on a dedicated system board (or
two).
I will start a new thread on programming after further research in a language
processing textbook and learning some Java.
If you have any pointers here, kindly let me know............
Thank you so much for your patience and help!!
Adaptation recordings were made with no problems until this point, which I do
not understand:
The -agc none parameter is very important. Make sure the arguments here match
the parameters in the feat.params file inside the acoustic model folder. Please
note that not all the parameters from feat.params are supported by bw, only a few
of them. bw, for example, doesn't support upperf or other feature extraction
params. But those which are supported should match.
So I proceeded to MLLR which failed, but I did not record the errors.
I then tried MAP and got this:
Note: These lines
needed to be entered every time the program was run. So I created a file
called local.conf in the subdirectory /etc/ld.so.conf.d containing just the
line /usr/local/lib. That is,
Contents of /etc/ld.so.conf.d/local.conf:
This works every time.
I also have reason to suspect the onboard sound card and inexpensive desktop
microphone are causing problems with noise and limited input level. The system
does work except only about 2% of all results are accurate.
Is there a way to compensate for the substandard equipment such as "beam
tuning" and such? Increasing the gain caused so much noise the engine thought
there was input.
Any help appreciated.
This paragraph was corrected. I hope it's more clear now.
To resolve the issue, you just need to read the output of
the command. It told you that you didn't specify the command correctly. The command
must be
And not
The command shouldn't have redirection symbols inside. You can learn more
about shell commands and options by reading the shell manual.
I don't think your hypothesis about noise or substandard equipment
matters. The real issue is somewhere else. To improve accuracy you need to
provide more information about what you are trying to do: what command you are
running, what speech you are trying to recognize, and what results you get. You
need to be as precise as possible; it will help you get to the solution
quickly.
Thank you for responding.
I use
to start the program.
I am recognizing English of the Northeast United States. The kind that
pronounces Rs hard. In other words, not NYC and not Massachusetts, and
certainly not British.
Here are typical, consistent results. The spoken words are listed first
and the result second:
Hello 000000000: both
What is wrong with you
INFO: ngram_search.c(474): Resized score stack to 200000 entries
INFO: ngram_search.c(466): Resized backpointer table to 10000 entries
INFO: ngram_search.c(466): Resized backpointer table to 20000 entries
INFO: ngram_search.c(466): Resized backpointer table to 40000 entries
INFO: ngram_search.c(466): Resized backpointer table to 80000 entries
INFO: ngram_search.c(474): Resized score stack to 400000 entries
INFO: ngram_search.c(466): Resized backpointer table to 160000 entries
INFO: ngram_search.c(474): Resized score stack to 800000 entries
INFO: ngram_search.c(466): Resized backpointer table to 320000 entries
pocketsphinx_continuous: feat.c:362: feat_array_alloc: Assertion `nfr > 0' failed.
Aborted
(This happens very infrequently and sometimes returns to the "READY" prompt
after a minute or two)
Restart.........
What time is it 000000000: couple times it
Are you having trouble 000000001: are to keep having trouble bit
Inconsistent 000000002: it is that a a systems to have
And so on......
I tried the MAP commands above and now come up with this:
Again, much thanks for your interest.
Please run
Note the dot (current dir) after rawlogdir. It will also dump the audio it's
trying to recognize to files. Pack the files into an archive, upload them to
a public file sharing site, and post a link here.
It doesn't matter. The most probable reason is that the default language model
provided is not really suitable for the text you are trying to
recognize. You need to build your own language model or download a
generic one like lm_giga.
In the shell, \ is used to escape characters and to join lines. This way you pass
the argument " -meanfn" to the command (note the space, which is escaped
by the backslash). When I provided the command in my previous post, it was meant to be
a multiline command. If you want to enter it as a single line use:
Without backslashes.
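To make the difference visible, here is a small sketch that prints each argument the shell actually hands to a command. The helper show_args is made up for this demonstration and nothing from SphinxTrain is executed; the option and path are the ones from the command discussed above.

```shell
# Print each argument on its own line, in brackets so that spaces are visible.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Backslash followed by a space: the space becomes part of the next word.
show_args ./map_adapt \ -meanfn hub4wsj_sc_8k/means
# [./map_adapt]
# [ -meanfn]
# [hub4wsj_sc_8k/means]

# Backslash at the very end of a line: the two lines are joined into one command.
show_args ./map_adapt \
    -meanfn hub4wsj_sc_8k/means
# [./map_adapt]
# [-meanfn]
# [hub4wsj_sc_8k/means]
```

In the first call the program receives an option named " -meanfn" with a leading space, which it does not recognize.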
Ok. 5 utterances @ http://www.flickr.com/photos/56063668@N07/
I have not run any adaptation scripts yet. These are from the v. 0.7 release,
out of the box. The .raw format had to be converted to PNG-24 in Photoshop on
Windows 7 since nothing on openSUSE could do it. Let me know if they need
improvement.
Thank you for the CLI pointers.
Sorry: http://www.flickr.com/photos/56063668@N07/
Sorry, I wanted to get your raw files. What am I supposed to do with that
flickr link?
A file sharing resource is, for example, http://dropbox.com
ftp://ftp.earlybir.w06.winhost.com/
username: earlybir
passwd: a123
Hm, looking at your files, it seems that adaptation will not help you.
You have some issue with the driver or with the pocketsphinx sound input API. The
audio is not recorded properly; it contains skips and jumps. Most likely it is an
issue with the driver.
I have successfully recorded with Audacity for the adaptation recordings and
they played back well, but the recording level required speaking directly into
the mic; otherwise it was weak. I will have to check the others. I do not like the
ALSA driver and mixer. There just seems to be something funny about it.
openSUSE is notorious for sound difficulties, in general. No help from ALSA,
there is nobody home, so to speak.
pocketsphinx 0.7 sphinxbase 0.7
I do not know what sphinxbase snapshot is or what to do with it if I found it.
Here is my system info in case it helps:
OS: openSUSE 11.4 x86_64
Kernel: Linux 2.6.37.6-0.5-desktop
Desktop: KDE 4.6.00 rel 6
Machine: HP xw9400 AMD 64 Opteron
Chipset: nVidia nForce Pro 3600 and 3050 (proprietary Tyan Thunder)
Drive: OCZ Vertex 60 GB dedicated system--single boot
RAM: 4 GB ECC
Video: nVidia GT200 (GeForce 210) 512 MB
2D Driver: nouveau
3D Driver: swarst (no 3D acceleration) (7.10)
Audio: Onboard "card 0": nVidia MCP55 Analog Stereo
Onboard "card 1": nVidia Corporation Digital Stereo
Chip: Realtek ALC262
Alsa Driver: v. 1.0.23
Update:
Reinstalled ALSA drivers..no improvement.
Ran ALSA diagnostics. Some problems seem to be present, notably ACPI, which I may
have to disable in the BIOS. Also: clocksource tsc unstable, ALC262 codec not
ready, etc. I will have to research further. Disabling ACPI is easy, and ACPI has
been known to cause problems in Linux distros if left activated. Maybe IRQ
adjustments, too. I noticed there was a swap, which may be part of the
diagnostic routine, but if not, there should be no swapping since the system
has 4GB RAM.
ALSA dmesg available here: http://www.alsa-project.org/db/?f=e88fdf992b5297289437ddfa9e4e6206fcce27b8
It is quite extensive.
Update: Ran an arecord test .wav recording. The playback was terrible; I could
hardly understand it, and it sounded broken up!
Try to build sphinxbase with OSS or pulseaudio support. Maybe those systems
will work better for you.
I will try rebuilding sphinxbase but I do not know how to add support for OSS
or Pulseaudio.
I did try
with both of the only two devices, 0 and 1 and it returns
for example.
Changes were made to ALSA and Pulseaudio and recognition has improved greatly.
Would you please check some new rawlogdir files at:
ftp:\ftp.earlybirdmaintenance.net
user: earlybir
pass: a123
Also the MAP commands were corrected but it returns:
Thank you for your help as it appears progress is being made.
Sorry the link is: ftp://ftp.earlybirdmaintenance.net
No, the recordings are still bad. See how it looks
https://dl-web.dropbox.com/get/Public/a.png?w=27168a88
Compare to proper audio
http://cmusphinx.sourceforge.net/wiki/tutorialconcepts
It failed to create the file gauden_counts on previous step
The site asked for a login which I do not have. https://dl-web.dropbox.com/get/Public/a.png?w=27168a88 giving:
But I remember that when making my adaptation recordings, there were level
problems. I had to speak directly on top of the mic to get a fat waveform, or it
would be so thin that the recordings were obviously no good.
Ok, so you were right, it is not the program, it is the OS or drivers. I may
have to obtain a real audio card that has ALSA/Linux support, since SUSE or
anyone else cannot help.
My distro's sound involves KDE desktop, ALSA, Pulseaudio, and a player. A
change in any one of them could have drastic, systemwide effects.
Is there something wrong here and can it be fixed?
I tried MLLR again this time being careful with the commands. First this
returned:
Then mllr_solve was copied to my "working directory" which is the adaptation
folder where everything related to the adaptation procedure is located. Re-
running returned:
So it is looking for installation files in /usr/local/share. In the tutorial
it was unclear to me what the "working directory" actually was and where it
should be located. At least everything is located in one place. I did
successfully create the recordings with the corresponding .mfc files. All the
other files are there as instructed as well.
I am aware that adaptation is not going to solve my accuracy problem. I just
want to be ready after the driver/software problem is corrected.
Then, after it is proved the engine works acceptably, I will invest the time
and effort in my application. This appears to be what amounts to Dragon for
Linux, or hopefully IBM's Watson (I may need some help here).
Thank you very much for your help. There seems to be hope!!
Ran:
with no errors or complaints. Recognition seems slightly improved, except at
the initial utterance. The beginning of a sentence usually returns bizarre
results, as always, while the latter part is fairly good. But "excuse me" almost
invariably comes back perfectly. I have never seen "hello" by itself come back.
It always returns "both" or something else.
Further research into the OS/driver issue on my machine is pointing to
possible APIC issues, although it may still be the driver as well.
Try this link
http://dl.dropbox.com/u/26073448/a.png
Or just use wavesurfer to explore your raw files:
https://sourceforge.net/projects/wavesurfer/
As for missing files, they are indeed missing. You need to specify the path
properly, then the file will be successfully processed.
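A quick way to catch this before running mllr_solve is to confirm that the files named by -meanfn and -varfn are really there. The sketch below is only an illustration; the function name check_model_files is made up, and it assumes the model directory holds files called means and variances, as in the commands earlier in this thread.

```shell
# check_model_files DIR: succeed only if DIR/means and DIR/variances are readable.
check_model_files() {
    model_dir="$1"
    missing=0
    for name in means variances; do
        if [ ! -r "$model_dir/$name" ]; then
            echo "missing: $model_dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, run from the adaptation folder:
#   check_model_files hub4wsj_sc_8k && echo "model files found"
```

If a file is reported missing, point -meanfn and -varfn at the folder where the acoustic model was actually unpacked.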
Success!! A Creative X-Fi Titanium Fatal1ty Pro was installed and solved the
mic input level and distortion problem. Pocketsphinx works with a guesstimated
90% accuracy OOTB!!
For the record, I did not have the linking correct. For openSUSE create a text
file called local.conf in the subdirectory /etc/ld.so.conf.d containing just
the lines:
Then as root run:
I reinstalled Pocketsphinx after this and the install proceeded without any
warnings and make check approved everything.
Running
to confirm installation, returned the proper information according to the Building
Application notes.
It appears this modification to ld.so.conf.d is to be made BEFORE Pocketsphinx
is installed.
Proceeding to my application, the literature and code in Sphinx4 appear to be
where to go, since the goal is passing the Total Turing Test. Sphinx4 could be
made faster with a fast processor and a fast SSD on a dedicated system board (or
two).
I will start a new thread on programming after further research in a language
processing textbook and learning some Java.
If you have any pointers here, kindly let me know............
Thank you so much for your patience and help!!