Why are 8 STFT vectors used for the predictor input in the "Denoise Speech Using Deep Learning Networks" example?

In the MATLAB example on denoising speech with deep learning, I have a hard time grasping why 8 STFT segments are used for the predictor input.
It is stated and underlined in that section of the example.
Could anyone please explain the reasoning?

Answers (1)

Sahil Jain on 1 Sep 2021
Hi Daniel. The example states: "The predictor input consists of 8 consecutive noisy STFT vectors, so that each STFT output estimate is computed based on the current noisy STFT and the 7 previous noisy STFT vectors". The likely reason is that the authors of this approach expect the 7 previous noisy STFT vectors to give the network useful temporal context, and therefore better performance than the current segment alone. I would suggest going through the research articles listed in the references at the end of the example to understand the motivation further. You can also train the network using only the current segment as input and compare its performance against the 8-segment version.
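To make the quoted sentence concrete, here is a minimal sketch in plain Python (not the example's MATLAB code) of how each target frame can be paired with the current noisy frame plus the 7 before it. The function name `build_predictors`, the toy frame sizes, and the choice to simply skip the first frames that lack a full history are all assumptions made for illustration; the example may handle the start of the signal differently.

```python
def build_predictors(noisy_frames, clean_frames, num_segments=8):
    """Pair each clean STFT frame with the current and previous noisy frames.

    Hypothetical helper for illustration only. Frames are lists of
    per-frequency-bin magnitudes, ordered in time.
    """
    if len(noisy_frames) != len(clean_frames):
        raise ValueError("noisy and clean sequences must have the same length")
    predictors, targets = [], []
    # Assumption: frames without a full history are skipped rather than padded.
    for t in range(num_segments - 1, len(noisy_frames)):
        # Oldest frame first, current frame last.
        context = noisy_frames[t - num_segments + 1 : t + 1]
        predictors.append(context)
        targets.append(clean_frames[t])
    return predictors, targets


# Toy data: 10 frames with 3 frequency bins each (placeholder values).
noisy = [[float(t)] * 3 for t in range(10)]
clean = [[float(t) + 0.5] * 3 for t in range(10)]

X8, y8 = build_predictors(noisy, clean, num_segments=8)  # 8-segment input
X1, y1 = build_predictors(noisy, clean, num_segments=1)  # current segment only

# Number of examples, frames per example, bins per frame.
print(len(X8), len(X8[0]), len(X8[0][0]))
print(len(X1), len(X1[0]), len(X1[0][0]))
```

Setting `num_segments=1` gives the single-segment baseline suggested above, so the same routine can feed both variants of the comparison.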

Release

R2019a