What is noise power modulating speech power in PNCC feature extraction?

Forum: Speech Recognition Theory

Created: 2018-12-30

Updated: 2018-12-30

ethan - 2018-12-30

I am reading the latest (2016) PNCC paper. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7439789

There are too many blocks in it to follow, and I am getting lost in the explanation.

Specifically, I want to know how and why the medium-time power calculation is used to modify the short-time power calculation.

It says in the paper (page 7, left column, bottom) that

"The time-averaged, frequency-averaged transfer function S̃[m, l] is used to modulate the original short-time power P[m, l]"

I don't understand why they use the term "modulate". I thought they were going to subtract the noise power from the speech signal power, so why are they multiplying the two?
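To make the question concrete, here is a minimal sketch of how the multiplication might work, based on my reading of the paper rather than on any reference code. The idea seems to be that the noise is estimated on the smoother medium-time power, the result is turned into a gain between 0 and 1 (cleaned power divided by original power), and that gain then scales the short-time power so the short-time resolution is kept. With a gain of (Q - noise) / Q, multiplying gives P * (1 - noise / Q), which is a subtraction written as a scaling. The function names, the edge handling, and especially `toy_noise_suppress` are my own assumptions; the last one is a crude stand-in for the paper's actual noise suppression stages, not a reproduction of them.

```python
def medium_time_power(P, M=2):
    """Average short-time power P[m][l] over frames m-M..m+M (window truncated at the edges)."""
    n_frames, n_chan = len(P), len(P[0])
    Q = []
    for m in range(n_frames):
        lo, hi = max(0, m - M), min(n_frames - 1, m + M)
        Q.append([sum(P[k][l] for k in range(lo, hi + 1)) / (hi - lo + 1)
                  for l in range(n_chan)])
    return Q


def toy_noise_suppress(Q, floor_fraction=0.1):
    """Hypothetical stand-in: subtract each channel's minimum medium-time power, with a floor."""
    n_chan = len(Q[0])
    noise = [min(row[l] for row in Q) for l in range(n_chan)]
    return [[max(q - noise[l], floor_fraction * q) for l, q in enumerate(row)]
            for row in Q]


def smoothed_gain(Q, R, N=4):
    """Per-channel gain R/Q, averaged over channels l-N..l+N (window truncated at the edges)."""
    S = []
    for q_row, r_row in zip(Q, R):
        ratio = [r / q for q, r in zip(q_row, r_row)]  # assumes every q > 0
        n_chan = len(ratio)
        row = []
        for l in range(n_chan):
            lo, hi = max(0, l - N), min(n_chan - 1, l + N)
            row.append(sum(ratio[lo:hi + 1]) / (hi - lo + 1))
        S.append(row)
    return S


def modulate(P, S):
    """Scale the short-time power by the gain, element by element."""
    return [[p * s for p, s in zip(p_row, s_row)] for p_row, s_row in zip(P, S)]


# Made-up short-time power: 6 frames x 3 channels, a burst in frames 2 and 3.
P = [[1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0],
     [5.0, 9.0, 1.0],
     [5.0, 9.0, 1.0],
     [1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0]]

Q = medium_time_power(P)   # medium-time power
R = toy_noise_suppress(Q)  # cleaned medium-time power (toy version)
S = smoothed_gain(Q, R)    # gain between 0 and 1
T = modulate(P, S)         # scaled short-time power

for p_row, t_row in zip(P, T):
    print(p_row, "->", [round(t, 3) for t in t_row])
```

In this sketch the gain never exceeds 1, so the multiplication can only reduce the short-time power, which would explain why the paper calls S̃[m, l] a transfer function.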
