CMU Sphinx / Forums / Speech Recognition Theory: Best software for detecting human voices in a large group of files

Oliver Carden - 2018-10-23

Apologies in advance if this is a topic that has been covered in great detail already.

I work for a small market research firm that conducts face-to-face interviews in developing countries. These interviews are carried out on tablet computers, which can record portions of the interview for quality control purposes. Quality control is one of our biggest challenges: at the moment we employ people to go through the collected files manually and note whether they can hear the interview being conducted or not.

A typical project for us has around 2000-3000 interviews, which makes this process very time-consuming. The files typically have background noise and can contain music or other distractions, so we know that we would still have to do some manual checking. However, any software that could do a basic assessment of VOICE vs NO VOICE would be very helpful, as it would massively reduce the number of interviews we have to check manually: those with no voice could be flagged immediately. It would also have to work across a wide variety of languages. Many thanks for your help.
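The VOICE vs NO VOICE triage described above amounts to scoring each file and flagging the ones that score too low. Below is a minimal sketch, assuming the recordings are exported as 16-bit mono PCM WAV files. It uses a plain energy threshold as a crude stand-in for a real speech activity detector, so it cannot tell a voice from music or background noise and will only catch recordings that are silent or nearly so. The thresholds (`FRAME_MS`, `ENERGY_DB`, `MIN_ACTIVE_FRACTION`) and the function names are hypothetical; in practice `active_fraction` would be replaced by per-frame speech decisions from a trained model.

```python
import array
import math
import wave
from pathlib import Path

# Hypothetical thresholds; tune them on a hand-labelled sample of your own files.
FRAME_MS = 30               # analysis frame length in milliseconds
ENERGY_DB = -40.0           # frames louder than this (dBFS) count as "active"
MIN_ACTIVE_FRACTION = 0.05  # below this fraction of active frames, flag as NO VOICE


def frame_levels_db(path):
    """Yield the RMS level (dBFS) of each frame of a 16-bit mono PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError(f"{path}: expected 16-bit mono PCM")
        frame_len = int(wav.getframerate() * FRAME_MS / 1000)
        while True:
            raw = wav.readframes(frame_len)
            if not raw:
                break
            samples = array.array("h", raw)  # signed 16-bit samples
            rms = math.sqrt(sum(s * s for s in samples) / len(samples))
            # Clamp to avoid log10(0) on digital silence.
            yield 20 * math.log10(max(rms, 1e-9) / 32768)


def active_fraction(path):
    """Fraction of frames whose energy exceeds ENERGY_DB (crude stand-in for SAD)."""
    levels = list(frame_levels_db(path))
    if not levels:
        return 0.0
    return sum(1 for db in levels if db > ENERGY_DB) / len(levels)


def triage(folder):
    """Split the WAV files in a folder into (no_voice, needs_review) name lists."""
    no_voice, needs_review = [], []
    for path in sorted(Path(folder).glob("*.wav")):
        if active_fraction(path) < MIN_ACTIVE_FRACTION:
            no_voice.append(path.name)
        else:
            needs_review.append(path.name)
    return no_voice, needs_review
```

For example, `no_voice, review = triage("recordings/")` returns the file names to flag immediately and the ones to pass on for manual checking. Keeping the detector behind a single function makes it possible to swap in a neural model later without changing the triage logic.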

- Nickolay V. Shmyrev - 2018-10-23
  
  There are many implementations around. You are looking for neural-network-based audio tagging or speech activity detection trained on music, noise and speech. The solutions vary in complexity.
  
  It all depends on whether your developer will be able to set it up, and on your programming language/OS requirements. I would simply use
  
  http://kaldi-asr.org/models/m4
  
  but there are also more advanced solutions at https://www.kaggle.com/c/freesound-audio-tagging/leaderboard and a pretty OK baseline at https://github.com/DCASE-REPO/dcase2018_baseline/tree/master/task2
  
  There are also projects on github like https://github.com/pyannote/pyannote-audio/tree/master/tutorials/speech-activity-detection
  
  - Nickolay V. Shmyrev - 2018-10-23
    
    If you are looking for something plug-and-play, you can try any SaaS service, like https://www.speechmatics.com/; it will properly detect the speech and return the transcript too.
    

Oliver Carden - 2018-10-26

Thanks for the replies!
