CMU Sphinx / Forums / Speech Recognition Theory: From Speech Recognition to Speaker Recognition?

Travis Banger - 2015-03-20

There must be folks around here who know a lot about voice identification. Actually, the preferred term seems to be speaker identification.

Can anybody please help with the following project?

The subject, Mr. Bill O'Reilly, is the undisputed star of the main news network in the US: Fox News.

The incident occurred in 1977. The subject has written (*) books in which he maintains (and still doubles down on it before his national audience) that he was in Palm Beach, Florida, when he heard the gunshot with which a Russian baron committed suicide.

The unexpected, fascinating development is that a cassette containing a phone conversation has turned up. That audio material - IF verified by you experts - will demonstrate that the subject -barring time travel- is not telling the truth. Some contend that he called from Dallas, Texas, located 1,250 miles away, and was informed about the dramatic suicide, which made international news, prompting a Congressional investigation that lasted several years, and cost millions of dollars.

In short, we have a number of clips of his voice, some from 30+ years ago, so changes in his voice over the years should not be an issue. The task is to determine whether the voice played in the CNN interview -from the cassette just discovered- is the same as the myriad samples of his voice that are available.

Spasibo

(*) and narrated.

Last edit: Travis Banger 2015-03-20

Travis Banger - 2015-03-20

A previous post of mine, "From Speech Recognition to Lie Recognition?", gave the incorrect impression that we are trying to analyze voice in order to detect lies based on the signal content, i.e. some sort of "Voice-based Lie Detector" contraption.

That is not it at all!

What we need is simpler, and is already widely used in biometric systems. For the Sphinx crowd it should be well within their domain of expertise: do these two sets of voice samples come from the same speaker? Is the unknown person in the cassette recordings indeed Mr. Bill O'Reilly? A numeric degree of certainty would be important.
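To make "a numeric degree of certainty" concrete, here is a minimal sketch of one common way to compare two recordings: reduce each to a fixed-length feature vector and score the pair with cosine similarity. The vectors below are made-up placeholders, and the step that turns audio into such a vector (for example averaged spectral features or a speaker embedding) is assumed, not shown. The score is a similarity, not a calibrated probability.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical per-recording feature vectors (illustrative numbers only).
cassette_vec = [0.20, 0.70, 0.10]
reference_vec = [0.25, 0.65, 0.05]

score = cosine_similarity(cassette_vec, reference_vec)
print(f"similarity score: {score:.3f}")
```

A score close to 1 means the two vectors point in nearly the same direction. Deciding what score counts as "same speaker" requires a threshold tuned on recordings whose speakers are known.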

Last edit: Travis Banger 2015-03-22

Travis Banger - 2015-03-20

Relevant data files and background information are located here:

http://goo.gl/T06BJx
[For those unfamiliar with Google Drive, you only click ONCE]

Make sure to take a look at the contents of the folder named "Learning Material".

I need more material for that folder!! Suggestions are welcome, especially papers/material for laymen and for those -like me- with a technical/engineering background, but without a Masters/PhD.

Last edit: Travis Banger 2015-03-20

Travis Banger - 2015-03-20

Basic question: I named a folder "Utterances" - Is that the proper term?

Travis Banger - 2015-03-20

More questions:

My basic hunt has been for common single words; sometimes I also found pairs of words:

"tomorrow morning"

"down there"

Should I look for "interword" sounds, i.e. the end of one word as it is bridged/enunciated into the next?

Travis Banger - 2015-03-20

Obviously, in a forum of this nature, there are participants from all over the world.

The cool thing about American politics is that pretty much the whole planet can opine about them, even domestic matters: That's a necessary consequence of being the Rome of today's world, or close.

What I am trying to say is: do not allow reservations to prevent you from helping.

If the US gave you the Internet, you should return the favor.

Well, enough of politics! Let's do some technical investigations!

Last edit: Travis Banger 2015-03-22

- Nickolay V. Shmyrev - 2015-03-21
  
  Speaker identification is an interesting and popular subject, and we might add support for it in the future. It helps not only in biometric applications but also in standard decoding, where it assists speaker adaptation.
  
  However, it is worth noting that the certainty of detection is low and the confusion rate is high compared to other biometric solutions, so this method can only assist in identification and cannot be used as the sole evidence.
  
  This is quite an active area of research; you can read the following publication for an introduction:
  
  http://www.cstr.ed.ac.uk/downloads/publications/2014/chapter7_anti-spoofing.pdf
  
  Basically, speaker verification is easily fooled; it is too uncertain to rely on.
  
  Last edit: Nickolay V. Shmyrev 2015-03-21
  

Travis Banger - 2015-03-22

Thanks, Nick!

In my Internet research, I bumped into some documentation: the FBI claims to have a voice/speaker identification system with a false positive rate of some 0.43% and a false negative rate of some 0.56% (those are NOT necessarily the exact numbers that I read! I am quoting from memory. I need to find that page again, but both failure rates are lower than 1%).
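For readers unfamiliar with these two error rates, here is a minimal sketch of how they are computed for a verification system at a chosen decision threshold. The scores, labels, and threshold are made-up illustrative values, not data from the FBI system or from this project.

```python
def error_rates(scores, same_speaker, threshold):
    """Return (false_positive_rate, false_negative_rate) at a threshold.

    A trial is accepted as "same speaker" when its score >= threshold.
    """
    genuine = [s for s, same in zip(scores, same_speaker) if same]
    impostor = [s for s, same in zip(scores, same_speaker) if not same]
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor trial")
    # False positive: a different speaker was accepted.
    false_pos = sum(1 for s in impostor if s >= threshold)
    # False negative: the true speaker was rejected.
    false_neg = sum(1 for s in genuine if s < threshold)
    return false_pos / len(impostor), false_neg / len(genuine)


# Hypothetical trial scores with ground-truth labels (illustrative only).
scores = [0.91, 0.85, 0.40, 0.78, 0.30, 0.66, 0.20, 0.55]
same_speaker = [True, True, True, True, False, False, False, False]

fpr, fnr = error_rates(scores, same_speaker, threshold=0.6)
print(f"false positive rate: {fpr:.2%}, false negative rate: {fnr:.2%}")
```

Raising the threshold trades false positives for false negatives, so any quoted pair of rates only makes sense for one particular operating point and test set.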

Additionally, in my case, I have these advantages:

(1) The individual didn't know that his friend (a Congressional Investigator doing the business of We The People) wisely recorded all his investigation-related phone calls.

IOW: I am interested in the specific case in which there is no malice involved. Read on:

(2) The subject had no way of knowing that 38 years later, his voice statements would become the focus of national attention. He was an entry-level reporter who has risen to stardom, becoming The Most Trusted Spokesperson, aka "King of Cable News" (*) in The Most Trusted News Network.

Spasibo.

(*) Can't resist repeating a little joke that somebody replied in Usenet's comp.dsp:

"Ramon, you don't need no fancy lie detection algorithms with Bill! Just listen to him for 10 minutes!" :-)

Last edit: Travis Banger 2015-03-22

- Nickolay V. Shmyrev - 2015-03-23
  
  "Ramon, you don't need no fancy lie detection algorithms with Bill! Just listen to him for 10 minutes!" :-)
  
  Fair enough.
  