Activity for Nickolay V. Shmyrev

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "It shouldn't be using OSS." OSS is selected during sphinxbase configuration. You need to install the ALSA development headers, reconfigure sphinxbase and make sure ALSA is selected, then reinstall sphinxbase, then reinstall pocketsphinx. I also recommend trying more powerful boards for voice experiments.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Try Vosk https://github.com/alphacep/vosk-api/blob/master/java/demo/src/main/java/org/vosk/demo/DecoderDemo.java

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Ok, you can try to remove -static-libgcc from the flags, it might help. Otherwise you'd better recompile pocketsphinx/sphinxbase with MinGW too, or use MSVC to compile your binary, since the libraries were built with MSVC. In general the rule is to use a single compiler for all the libraries and binaries in a project. On Windows they are not easily cross-compatible due to different runtimes.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    I asked whether you used MinGW or Visual Studio.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Did you compile pocketsphinx_batch.exe and pocketsphinx.dll with MinGW or with Visual Studio?

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Then you need to check the compilation parameters. It is something about the DLL, not about your code. It crashes as soon as it calls the first sphinxbase functions.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Looks good now, thank you! Attention to detail will help you in your programming career. As for the exit after start, it exits because it fails to find the DLLs (sphinxbase and pocketsphinx). They must be in the same folder where you run the program. You can use a DLL explorer to make sure the DLLs are present.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Please click the edit button in your original post and format the code properly.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Can you format the code properly? You can use Markdown syntax or just the WYSIWYG editor.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    You can call speech.ad.stop_recording before calling Festival: https://github.com/bambocher/pocketsphinx-python/blob/769492da47e41b71e3dd57a6b033fbba79e57032/swig/sphinxbase/ad_base.i#L77

  • Nickolay V. Shmyrev posted a comment on discussion Help

    You can improve recognition accuracy with a custom language model: https://cmusphinx.github.io/wiki/tutoriallm/ and also adapt the model to your audio with acoustic model adaptation: https://cmusphinx.github.io/wiki/tutorialadapt/ In general you can get much better results, even without adaptation, with more advanced and modern toolkits than pocketsphinx (Vosk).

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "I'm doing some work with Sphinxtrain." It is very outdated technology these days. "Does the training process automatically search both spellings to find the best fit?" No. "Or must the transcript file be coded to the correct pronunciation (let's assume the subject enunciated the second pronunciation)?" You can point to the proper pronunciation variant in the transcription file. You can also run a forced alignment stage; it will try to select the proper pronunciation, but the accuracy is not guaranteed.

  • Nickolay V. Shmyrev posted a comment on discussion Sphinx4 Help

    For example you can check jigasi transcription with vosk server: https://community.jitsi.org/t/jigasi-open-source-alternative-of-google-speech-to-text/20739

  • Nickolay V. Shmyrev posted a comment on discussion Sphinx4 Help

    JSAPI has been dead for a long time. You can use Vosk: https://github.com/alphacep/vosk-api

  • Nickolay V. Shmyrev posted a comment on discussion Sphinx4 Help

    "Is this project still being supported?" No.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "If there is a resource that we can reference that it does not store data then that would be awesome." The source code is our documentation.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    https://cmusphinx.github.io/wiki/tutoriallmadvanced/

  • Nickolay V. Shmyrev posted a comment on discussion Speech Recognition Theory

    That one is very basic. You can try https://github.com/alphacep/vosk-api and https://alphacephei.com/vosk/models/vosk-model-small-pt-0.3.zip; it should work fine.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Pocketsphinx is very old technology and not very accurate. Consider Vosk: https://github.com/alphacep/vosk-api. You can use Vosk with Qt on Android; there is no problem building the library and linking it to Qt.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "frate: default is 100. This means the hop_length is 10 milliseconds, so the frames are generated at 0, 10, 20, ..., 990 ms, right? With -samprate 16000, the hop_length is 160 samples. Is this correct?" Yes. "In each frame, what is the number of samples? The parameter wlen = 0.025625. I interpret this as a frame size of 0.025625 seconds, that is, 25.625 ms = 410 samples (at a 16 kHz sampling rate). Is this correct?" Yes. "Or is it the nfft=512 parameter that defines the frame size as 512 samples?" No...
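    These relations can be checked with a short sketch (plain Python, no pocketsphinx required; the values mirror the parameters discussed above):

    ```python
    # Frame parameters at a 16 kHz sample rate, matching the discussion above.
    samprate = 16000     # -samprate: samples per second
    frate = 100          # -frate: frames per second (default)
    wlen = 0.025625      # -wlen: analysis window length in seconds

    hop_samples = samprate // frate          # 160 samples = 10 ms hop
    window_samples = round(wlen * samprate)  # 410 samples = 25.625 ms window

    print(hop_samples, window_samples)  # 160 410
    ```

    The nfft=512 value is the FFT size: the 410-sample window is zero-padded to 512 points before the FFT, so it does not change the frame size itself.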

  • Nickolay V. Shmyrev posted a comment on discussion Help

    You can check https://montreal-forced-aligner.readthedocs.io/en/latest/

  • Nickolay V. Shmyrev posted a comment on discussion Sphinx4 Help

    There is voice activity detection which removes frames; you can add -remove_silence no to see the remaining ones.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "Where can I ask about vosk?" On GitHub or in the Telegram group.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Arduino is too slow; it cannot run any serious AI. Ubuntu is OK, though on a Raspberry Pi you usually use the specialized distro called Raspbian. An RPi 3 is OK, you can run Vosk on it and get good accuracy. For more advanced applications you'd better get an RPi 4 though.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    You have a nice application that should fit Vosk's capabilities. What is your problem then? You can download the library and use it.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Pocketsphinx is very old technology; it doesn't provide enough accuracy by modern standards. You can try Vosk instead (https://github.com/alphacep/vosk-api) with daanzu's English model.

  • Nickolay V. Shmyrev posted a comment on discussion Speech Recognition Theory

    "Does the language model restrict the recognition to its contents (words used to build the language model)?" Yes. "Or can words outside of the LM (but included in the dictionary) still be recognized?" No. "Would it be better to build my own LM (myLM) and to use it with the en-us dictionary, or build a dictionary (myDict) from the same corpus and use the combination myLM and myDict?" You have to update both the LM and the dictionary.
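    Keeping the LM and dictionary in sync can be checked mechanically; the following is a minimal sketch with made-up vocabularies, not part of any CMUSphinx tool:

    ```python
    # Every word in the LM vocabulary needs a pronunciation in the dictionary:
    # words outside the LM are never hypothesized, and LM words missing from
    # the dictionary cannot be decoded.
    lm_vocab = {"hello", "world", "goodbye"}   # hypothetical myLM vocabulary
    dictionary = {                             # hypothetical myDict entries
        "hello": "HH AH L OW",
        "world": "W ER L D",
    }

    missing = sorted(lm_vocab - dictionary.keys())
    print(missing)  # ['goodbye'] -> needs a dictionary entry before decoding
    ```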

  • Nickolay V. Shmyrev posted a comment on discussion Help

    https://stackoverflow.com/questions/4727480/how-do-i-make-my-perl-scripts-act-like-normal-programs-on-windows

  • Nickolay V. Shmyrev posted a comment on discussion Sphinx4 Help

    Also https://github.com/jimregan/wolnelektury-audio-corpus

  • Nickolay V. Shmyrev posted a comment on discussion Sphinx4 Help

    Check https://github.com/danijel3/ClarinStudioKaldi

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "I'd like to have the ability to write to a log file without all the fluff, just the capture. And I don't see any way to get pocketsphinx to stop listening, like pause on command, or take any commands once it's running." This has to be done through the API, e.g. the Python API.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    You can write a script in Python

  • Nickolay V. Shmyrev posted a comment on discussion Help

    https://stackoverflow.com/questions/4727480/how-do-i-make-my-perl-scripts-act-like-normal-programs-on-windows

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Try thresholds 1e-20, 1e-30, 1e-40. I already replied to you on Stack Overflow.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "Is it possible to train a model within 10 days for 1400 hours of speech?" Yes, it is perfectly possible on a machine like the one above.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "Thanks for your reply. I would prefer to adapt my model since I have less time. Do you think it would be possible to adapt the model with 1400 hours of audio?" It is possible to adapt, but the accuracy will not be the best.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    For 1400 hours it is better to train from scratch. 4 GB of memory is pretty low; you want at least 16 GB, better 64 GB. You also need a GPU for modern algorithms, at least a GTX 1080.

  • Nickolay V. Shmyrev posted a comment on ticket #492

    Try https://github.com/alphacep/vosk-api, it supports Portuguese.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Use vosk https://github.com/alphacep/vosk-api

  • Nickolay V. Shmyrev posted a comment on discussion Help

    For such short words as "yes" it is impossible to do keyword spotting, because the false alarm rate is too high. You have to implement a full LVCSR recognizer with speaker separation and search in the results. The CMUSphinx tutorial is here: https://cmusphinx.github.io/wiki/tutorial

  • Nickolay V. Shmyrev posted a comment on discussion Help

    In probability calculations it is important to properly describe the probability spaces. Say you have word position 1, which would be space A1, and the next word position 2, which would be space A2. You can write P("how are") = P(are|how) * P(how), and you can reduce it to P(how|are) * P(are) by Bayes' rule, but here you need to be careful: in P(how|are) the first word "how" is still from space A1 and the second word "are" is still from space A2, so you can not really replace it with P("are how"),...

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "Thanks Nickolay. So if I want log P(are | how) I should type lm.log_p("how are") - lm.log_p("how"), correct?" No. P("how are") is about the order, like I wrote above. So it is not simply P("how" & "are") but more like P("how" & "are" & "are follows how"), so lm.log_p("how are") - lm.log_p("how") is P(are | how & "are follows how"), not simply P(are | how). There is an extra term that must balance P("how are") and P("are how"). "I couldn't find the documentation that explained this. Can you point me to it?"...

  • Nickolay V. Shmyrev posted a comment on discussion Help

    And even the last thing is wrong, since the order of words is important; you need to adjust this with the probability of "are going after how" vs "how going after are".

  • Nickolay V. Shmyrev posted a comment on discussion Help

    It doesn't work like that: lm.log_p("how are") is not really log P(are | how) but rather an estimate of the probability of seeing both words together in a text corpus, i.e. log P(are | how) + log P(how).
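    The distinction can be illustrated with a toy bigram model (pure Python with made-up counts; real LMs add smoothing and backoff, so this is only a sketch of the idea):

    ```python
    import math

    # Hypothetical corpus counts.
    unigrams = {"how": 10, "are": 8, "you": 12}
    bigrams = {("how", "are"): 6, ("are", "how"): 1}
    total = sum(unigrams.values())

    def log_p(word):
        # log P(word)
        return math.log(unigrams[word] / total)

    def log_p_given(w2, w1):
        # log P(w2 | w1), estimated from counts
        return math.log(bigrams[(w1, w2)] / unigrams[w1])

    # The "score of the phrase" is the joint probability:
    # log P("how are") = log P(how) + log P(are | how),
    # not log P(are | how) alone.
    log_p_how_are = log_p("how") + log_p_given("are", "how")
    log_p_are_how = log_p("are") + log_p_given("how", "are")

    # Word order matters: the two joint probabilities differ.
    print(log_p_how_are != log_p_are_how)  # True
    ```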

  • Nickolay V. Shmyrev modified a comment on discussion Help

    Well, if you are serious about this project you need a neural spotter, otherwise it's not going to work reliably. You can probably try https://github.com/hyperconnect/TC-ResNet; it supports TFLite and should easily work on mobile. If you are not serious, just select a longer keyword and train a better Italian model. Still, you need Linux.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Well, if you are serious about this project you need a neural spotter, otherwise it's not going to work reliably. You can probably try https://github.com/hyperconnect/TC-ResNet; it supports TFLite and should easily work on mobile. If you are not serious, just select a longer keyword and train a better Italian model. Still, you need Linux.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Adaptation doesn't improve the accuracy of keyword detection. For best detection a keyword should have 3-5 syllables; yours has 2. If you still want to keep your keyword, you'd better adopt something like Mycroft Precise, but you will need to record many more keyword samples. Windows is not suitable for any kind of speech work.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    What are you trying to achieve overall? Do you want an English model?

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "Yeah sure, thanks for your suggestion, I'm going to try using Vosk, but is it possible to make my own LM and dict with Vosk like with pocketsphinx?" Yes, sure, you can use the Kaldi toolkit for that.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    The Raspberry Pi CPU is too slow to decode with such a configuration. You need to use a smaller acoustic and language model, or you can try the more modern Vosk library (https://github.com/alphacep/vosk-api); Vosk is much faster at recognizing large vocabularies.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    It is not very easy but somewhat doable; you can check for details: http://vpanayotov.blogspot.com/2012/06/kaldi-decoding-graph-construction.html

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "How's a person supposed to learn it? Just keep coming here and asking questions?" Yes. Many people can read code too; it is the best documentation. "So little information, and no tutorials. That's a bummer! I'd think there should be some good tutorials around on how to use and modify vosk." Eventually there will be some tutorials, but for now the speed of development of the technology makes it very hard to create extensive documentation. You can also check https://groups.google.com/d/topic/kaldi...

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "All the ALSA lib comments are coming from Pyaudio. So I'll need to look into how I can turn those off." You need to clean up the ALSA config, see https://stackoverflow.com/questions/7088672/pyaudio-working-but-spits-out-error-messages-each-time "Do I then need to dig that result out of there to put it into a string I can actually use?" It is JSON, you can parse it with json.loads: import json; result = json.loads(rec.Result()); text = result['text'] "Is there any documented source code for Vosk?" No.
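    Expanded into a self-contained sketch (the JSON string here is hard-coded to stand in for the output of a Vosk recognizer's rec.Result() call):

    ```python
    import json

    # In real code this string would come from rec.Result() or
    # rec.FinalResult() of a vosk KaldiRecognizer; hard-coded here.
    raw = '{"text": "turn on the lights"}'

    result = json.loads(raw)  # parse the JSON payload into a dict
    text = result["text"]     # the recognized utterance
    print(text)  # turn on the lights
    ```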

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "How do I shut off all the other messages?" Add config.set_string('-logfn', '/dev/null'); see also https://stackoverflow.com/questions/17825820/how-do-i-turn-off-e-info-in-pocketsphinx "I'd really like to find documentation on the PocketSphinx source code so I can see what the methods I call actually do, and what parameters I can send them to do things like telling them not to print messages." There is C documentation here: https://cmusphinx.github.io/doc/pocketsphinx/files.html. There is no Python documentation,...

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "ImportError: libgfortran.so.3: cannot open shared object file: No such file or directory" You need to install libgfortran.so.3 with sudo apt-get install libgfortran3 "Is there anyone still alive who can answer questions about PocketSphinx? I'd rather be using pocketsphinx to be honest. It was looking really promising for my specific project." You are welcome to ask.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "What exactly is vosk, and why is it needed? Isn't Kaldi supposed to be the SRE?" If you simply want to use a speech recognizer from Python, you can use Vosk's prepackaged wheels and models. Kaldi is more a system for speech researchers, with a complex install, API and usage.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Try pip3 install https://github.com/alphacep/vosk-api/releases/download/0.3.3/vosk-0.3.3-cp36-cp36m-linux_aarch64.whl

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "This feels like going back to square one moving over to vosk-kaldi. I hope it's worth it." Absolutely! Let me know if you have further questions.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    "I don't even know what it is, or how to use it." You are welcome to ask. "Does vosk use pocketsphinx?" No. "If not, what exactly is vosk? And where do I find detailed information on it beyond that github page, especially in terms of tutorials?" It is a software library to recognize speech, just like pocketsphinx. "I'm already having close to 100% accuracy with PocketSphinx. It's been decoding everything I throw at it with near perfection. Perhaps it likes the way I speak?" If it is perfect already, what...

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Try https://github.com/alphacep/vosk-api, it is much more accurate.

  • Nickolay V. Shmyrev posted a comment on discussion Sphinx4 Help

    Modern DNN recognizers mostly use log-mel filterbanks.

  • Nickolay V. Shmyrev posted a comment on discussion Sphinx4 Help

    No. And LPC is not that good for ASR unfortunately because a lot of information is in the residual.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Good. Try vosk-api in the meantime.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    It is a crash due to a runtime mismatch, as described in https://cmusphinx.github.io/wiki/faq/#q-pocketsphinx-crashes-on-windows-in-_lock_file You need to check how your Visual Studio updated the project files. It most likely screwed things up.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    I recommend trying vosk-api, a modern toolkit with higher accuracy: http://github.com/alphacep/vosk-api The installation on an RPi 4 is simple: pip3 install vosk Code samples are here: https://github.com/alphacep/vosk-api/tree/master/python/example

  • Nickolay V. Shmyrev posted a comment on discussion Sphinx4 Help

    "As far as I can tell, -DMODELDIR invokes QT" It doesn't. It just computes the header path and passes it to the compiler. You can run pkg-config --cflags --libs pocketsphinx sphinxbase to see what it outputs. "My next step was to try to compile gcc -o hello_ps hello.c" You can't do that; you need the header path from pkg-config. You can substitute the pkg-config result yourself though.

  • Nickolay V. Shmyrev posted a comment on discussion Sphinx4 Help

    Please provide more information on the problem to get help (steps, logs, errors, environment).

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Vosk is faster than pocketsphinx on compatible hardware.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    It's not about a config value, more about the code. Overall, for Asterisk the vosk-api will be much more accurate; I recommend trying it.

  • Nickolay V. Shmyrev modified ticket #491

    start detection threshold setting

  • Nickolay V. Shmyrev posted a comment on ticket #491

    See the answer at https://sourceforge.net/p/cmusphinx/discussion/help/thread/e241c19421/?

  • Nickolay V. Shmyrev posted a comment on discussion Help

    This? -vad_startspeech 10: the number of speech frames that triggers the VAD transition from silence to speech.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    It does

  • Nickolay V. Shmyrev posted a comment on discussion Help

    sphinx4 is very old, try vosk-api: https://github.com/alphacep/vosk-api/tree/master/java

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Offline model update: https://github.com/alphacep/vosk-api/blob/master/doc/model.md Online words list: https://github.com/alphacep/vosk-api/blob/master/python/example/test_words.py

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Your audio is 8 kHz; you can try the 8 kHz model to get good recognition accuracy.
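    A quick way to check a file's bandwidth before choosing between an 8 kHz and a 16 kHz model; a minimal sketch using only the standard library wave module:

    ```python
    import wave

    def needs_8khz_model(path):
        """Return True if the WAV file is narrowband (8 kHz telephony audio)."""
        with wave.open(path, "rb") as wav:
            return wav.getframerate() <= 8000
    ```

    For compressed formats you would need an external tool such as soxi or ffprobe to read the sample rate instead.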

  • Nickolay V. Shmyrev posted a comment on discussion Help

    On Windows, install vosk-api with pip3 install https://github.com/alphacep/vosk-api/releases/download/0.3.3/vosk-0.3.3-cp37-cp37m-win_amd64.whl To get help on accuracy, share an audio file recorded from your microphone to reproduce the accuracy issues.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    You can share audio samples, but most likely your Bluetooth recordings have 8 kHz bandwidth and require the 8 kHz model. You can also get much better accuracy with vosk-api instead of LiveSpeech.

  • Nickolay V. Shmyrev posted a comment on discussion Sphinx4 Help

    "I have successfully run the dialogue demo on my Mac, but I don't quite understand the result of the demo. Can it recognize the speech I speak into the microphone? I do speak, by the way, but nothing happens (haha, a bit awkward, I speak several times and nothing happens)." Probably your microphone is muted or malfunctioning. "Should I do some audio check?" Yes.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Install Linux.

  • Nickolay V. Shmyrev posted a comment on discussion Sphinx4 Help

    quick_lm only creates the LM. For the dictionary you need a custom tool, since g2p doesn't work for Chinese. You can probably get some inspiration from https://cc-cedict.org/wiki/

  • Nickolay V. Shmyrev modified a comment on discussion Help

    Speaker identification is not supported in pocketsphinx. We recently added speaker ID to the Vosk library; you can use it instead: https://github.com/alphacep/vosk-api/blob/master/python/example/test_speaker.py

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Yes, you can. To install it on an RPi, simply type pip3 install vosk

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Speaker identification is not supported in pocketsphinx. We recently added speaker ID to the Vosk library; you can use it instead: https://github.com/alphacep/vosk-api/blob/master/python/example/test_local_speaker.py

  • Nickolay V. Shmyrev modified ticket #490

    Hypothesis.text empty or whitespace

  • Nickolay V. Shmyrev posted a comment on ticket #490

    If you listen to the mp3 file you'll hear it is clearly corrupted by sample rate conversions. So are the other files. Same for the wav file. The raw file you shared is 44100 Hz; with the proper sample rate configuration it decodes fine. You can try with the command line first instead of TLSphinx.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Same as https://sourceforge.net/p/cmusphinx/bugs/490/

  • Nickolay V. Shmyrev posted a comment on discussion Help

    If you want to reject other words, you need keyword spotting mode, not an LM. You need to tune the thresholds in the keyphrase.list file for reliable detection, but I doubt you will be able to do it with such similar keywords. For more accurate recognition you might try vosk-api: https://github.com/alphacep/vosk-api
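    For reference, a keyphrase.list file holds one phrase per line with its detection threshold between slashes; the phrases below are made up:

    ```
    turn on the light /1e-20/
    oh mighty computer /1e-40/
    ```

    Thresholds are tuned per phrase; the CMUSphinx keyword spotting tutorial suggests larger values (e.g. 1e-50) for longer phrases.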

  • Nickolay V. Shmyrev posted a comment on discussion Help

    The ps_get_hyp function returns the current result, final or intermediate, whenever you call it: https://cmusphinx.github.io/doc/pocketsphinx/pocketsphinx_8h.html#ada74b12d71e9d4db5d959b94004ff812

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Ok then. Timeout is perfectly possible, you just have to implement it yourself in your code. I do not see any problem here, just count the bytes processed and return if you got enough bytes.
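    The byte-counting approach can be sketched like this (pure Python; read_chunk is a hypothetical stand-in for whatever capture call your code uses, such as an ad_read or stream read):

    ```python
    def record_with_timeout(read_chunk, sample_rate=16000, sample_width=2,
                            timeout_seconds=5.0):
        """Collect raw audio until enough bytes for timeout_seconds arrive.

        read_chunk: callable returning the next buffer of raw audio bytes
        (a stand-in for the real capture call).
        """
        limit = int(sample_rate * sample_width * timeout_seconds)
        collected = bytearray()
        while len(collected) < limit:
            collected.extend(read_chunk())
        return bytes(collected)

    # Example with a fake source that yields 1024 silent bytes per call.
    audio = record_with_timeout(lambda: b"\x00" * 1024, timeout_seconds=0.1)
    print(len(audio) >= 3200)  # True: 16000 Hz * 2 bytes * 0.1 s = 3200 bytes
    ```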

  • Nickolay V. Shmyrev posted a comment on discussion Help

    I'm sorry, it is hard to understand the purpose of the system and give you advice. You simply experience accuracy issues. Asterisk is a bad choice here since it is limited to 8 kHz, which is less accurate than 16 kHz. Another option would be a more accurate system based on neural networks: https://github.com/alphacep/vosk-api should work on an RPi if that is your embedded system. The German model is here: https://github.com/alphacep/kaldi-android-demo/releases/download/2020-01/alphacep-model-android-de-zamia-0.3.tar.gz...

  • Nickolay V. Shmyrev posted a comment on discussion Help

    You can create the dictionary before alignment with g2p library like phonetisaurus and then align.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Depends on your problem; please elaborate on what exactly you are trying to achieve.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Case matters: "James" should be lowercase, since our dictionary is all lowercase. Unfortunately you can not recognize a word if it is missing from the dictionary, not with pocketsphinx at least. Some modern recognizers allow vocabulary-free recognition though.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Yes, without a dataset of the size specified in the tutorial it will not work. You can probably record more data and use fewer parameters in the model. There is a lot of research on low-resourced languages; I am just not sure you'll be able to apply it. For example, you can also add English data to training as a helper dataset and use a common phoneset.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Care to say what is your super secret language?

  • Nickolay V. Shmyrev posted a comment on discussion Help

    Not necessarily, you can take arbitrary annotated recordings.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    No, not until you have at least 50 hours of data.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    You are seeing instability of results due to the small amount of training data.

  • Nickolay V. Shmyrev posted a comment on discussion Help

    The audio is UK English, not US English, and includes noise and 8 kHz telephony speech. Phonetic recognition of this would be problematic without an accurate model. Even with the best neural network models, phonetic recognition is not very accurate, because it is hard to recognize very loosely defined phonemes in continuous speech. If you want to deal with audio like this, you'd better recognize with a UK large vocabulary model and then convert to phonemes.
