I'm looking for a program that can detect voice in my recordings. I have hundreds of recordings in different formats, and in most of them the only valuable part is the voice part. However, many of them are hours long and only contain a few minutes, or even seconds, of voice.
Until now I have been using an audio editor to detect the voice manually. I open the file in a spectrogram view and look for voice patterns by eye. This method works, but it's very time-consuming. I'm looking for software that does this automatically and marks the voice parts, or something similar.
The closest things I've found so far are these Sphinx systems, but I'm not sure whether they include a program that does what I'm looking for. Can you give me some feedback on this?
You can use https://github.com/wiseman/py-webrtcvad
Sorry for the late reply. Can you tell me a bit more? I just checked webrtc.org and I didn't find anything about VAD.
Sure, if you ask more specific questions.
The link above does not point to webrtc.org; it's a separate GitHub project.
Hi again. Once again, it's been a long time since my last post. I just don't know how to set up my account to send me a notification when there is a reply. For the time being I'll just create a reminder in my calendar.
My specific question is: How can I use the link you provided to achieve the results I'm looking for? As I said, I'm looking for a program (or any form of software) that lets me detect voice in my recordings. Ideally, the program would take a recording as input and produce as output an audio file containing only the parts of the original recording that have voice activity. However, anything that lets me detect voice activity automatically will do.
Link points to a program that allows you to detect voice in your recordings.
https://github.com/wiseman/py-webrtcvad/blob/master/example.py does exactly that. You run
It creates chunked WAV files with voice. You can modify it to suit your further needs.
Obviously that doesn't work. I wonder if you're assuming I'm a Python programmer. I am not. When I run that command in my system (Windows) it just says it doesn't know what python is. Anyway, I installed Python, copied that code, saved it to a file, ran the command you said and, unsurprisingly, it didn't work either (ImportError: No module named 'webrtcvad'). So, my specific question is: How do I make that program work?
Note: Needless to say, the instructions in https://github.com/wiseman/py-webrtcvad don't work in my system.
OK, maybe Audacity will work better for you then:
http://manual.audacityteam.org/man/truncate_silence.html
You can download it here:
http://www.audacityteam.org/download/
Audacity doesn't do VAD. I already discussed it with an Audacity developer: http://forum.audacityteam.org/viewtopic.php?f=21&t=10485&start=10. Nor does Adobe Audition. Truncate Silence doesn't work for me because the noise-to-signal ratio is too high in my recordings.
What I need are instructions on how to run the VAD program that I can understand and follow. I am a former programmer, so I can figure out some things for myself (like I figured out how to run Python code). However, the instructions in https://github.com/wiseman/py-webrtcvad are cryptic to me.
When I say those instructions don't work in my system, what I mean is, for instance, that when I run
pip install webrtcvad
in my command prompt I just receive an error message because Windows doesn't recognize the pip command. A bit of googling suggests that those instructions are not system commands but Python code (meant to be written in a Python program I create, I guess). Is that correct? If so, how do I go about creating such a program and making it work (hopefully in Windows), keeping in mind that I've never written a line of Python?
http://stackoverflow.com/questions/4750806/how-do-i-install-pip-on-windows
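A note for anyone who hits the same wall: pip install webrtcvad is a command for the command prompt, not Python code. When Windows doesn't recognize pip but does recognize python, running pip through the interpreter (python -m pip ...) is the usual workaround. The small script below is only a sketch of that idea. The helper names is_installed and install_command are made up for this illustration and are not part of pip or py-webrtcvad. It checks whether a module can be imported and, if not, prints the install command tied to the same Python that runs the script.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if `import module_name` would find a top-level module."""
    return importlib.util.find_spec(module_name) is not None


def install_command(package):
    """Build the pip command for the interpreter that is running this script."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    name = "webrtcvad"
    if is_installed(name):
        print(name, "is already importable")
    else:
        # Hypothetical helper output: copy this line into the command prompt.
        print("Not installed. Run this in the command prompt:")
        print(" ".join(install_command(name)))
```

Saving this as, say, check.py and running python check.py shows which interpreter would receive the package, which helps when more than one Python is installed.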
OK, I ran
python -m pip install webrtcvad
in the Windows shell and I got this error:
Any ideas?
This problem is covered in the link above; you just need to read it to the end:
http://stackoverflow.com/a/12476379/432021
Actually, it is not covered in that link. As you can see, I'm using Python 3.5. That link covers the problem for versions up to 3.3, and the solutions it offers don't work with 3.5. You know, I've worked as a programmer for several years, so I know that most programmers have very weak communication skills. Still, I would ask you to make an effort to communicate if you want to help me with this.
Anyway, I found a page that seems to address the issue for all versions: https://blogs.msdn.microsoft.com/pythonengineering/2016/04/11/unable-to-find-vcvarsall-bat/. Following its advice, I installed Visual C++ Build Tools 2015, uninstalled the version of setuptools that came with my Python installation (v20), and installed the latest version of setuptools (v30). Now I get this error message:
Alonso, I think if you try it without installing the latest version of setuptools it should work.
I left this project for a long time and picked it up again today. It works now. Thank you, John!
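For later readers, here is a rough sketch of the kind of processing asked for earlier in the thread: take a recording and write out only the parts flagged as voice. It is not the code from example.py. It assumes a 16-bit mono WAV file, cuts it into 30 ms frames, and asks a detector function about each frame. The function names are hypothetical, and the energy_detector included here is only a crude loudness check so that the script runs without any extra packages; it is not WebRTC VAD and will do poorly on noisy recordings.

```python
import struct
import wave

FRAME_MS = 30  # py-webrtcvad accepts frames of 10, 20 or 30 ms


def split_frames(pcm, sample_rate, frame_ms=FRAME_MS):
    """Yield fixed-size frames of 16-bit mono PCM; a short tail is dropped."""
    size = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes per sample
    for start in range(0, len(pcm) - size + 1, size):
        yield pcm[start:start + size]


def energy_detector(threshold=500):
    """Stand-in detector (NOT WebRTC VAD): mean absolute amplitude > threshold."""
    def is_speech(frame):
        samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
        return sum(abs(s) for s in samples) / len(samples) > threshold
    return is_speech


def keep_voiced(pcm, sample_rate, is_speech):
    """Concatenate the frames that the detector flags as speech."""
    return b"".join(f for f in split_frames(pcm, sample_rate) if is_speech(f))


def filter_wav(src_path, dst_path, is_speech):
    """Copy only the flagged frames of a 16-bit mono WAV file to dst_path."""
    with wave.open(src_path, "rb") as src:
        assert src.getnchannels() == 1 and src.getsampwidth() == 2
        rate = src.getframerate()
        pcm = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(keep_voiced(pcm, rate, is_speech))
```

With py-webrtcvad installed, the stand-in can be swapped for the real detector: create vad = webrtcvad.Vad(2) and, for a 16000 Hz file, pass lambda frame: vad.is_speech(frame, 16000) as is_speech. The library expects 16-bit mono PCM at 8000, 16000, 32000 or 48000 Hz, so recordings in other formats would need converting first.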