Why are 8 STFT vectors used for the predictor input in the "Denoise Speech Using Deep Learning Networks" example?
Sahil Jain on 1 Sep 2021
Hi Daniel. The example states: "The predictor input consists of 8 consecutive noisy STFT vectors, so that each STFT output estimate is computed based on the current noisy STFT and the 7 previous noisy STFT vectors". The most likely reason is that the authors of this approach expect temporal context to improve the estimate: consecutive STFT frames of speech are correlated, so the 7 previous noisy vectors carry information about the current one that a single vector does not. The example itself does not justify the specific choice of 8, so I would suggest going through the research articles listed in the references at the end of the example to understand the motivation. You can also test this yourself: train the network using only the current segment as input and compare its performance against the 8-segment version.
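To make the input layout concrete, here is a minimal sketch of how consecutive STFT vectors could be grouped into predictors. It is written in Python rather than MATLAB and is not the example's own code: the function name `make_predictors`, the toy data, and the choice to pad the start by repeating the first frame are all assumptions for illustration.

```python
def make_predictors(frames, num_segments=8):
    """Group STFT frames into sliding windows of `num_segments` frames.

    frames: list of per-frame magnitude vectors (each a list of floats).
    Returns one predictor per frame; predictor t holds the current frame
    and the num_segments - 1 frames before it, oldest first.
    """
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")
    if not frames:
        return []
    # Assumption: repeat the first frame so that early frames still get a
    # full history. Other padding schemes are possible.
    padded = [frames[0]] * (num_segments - 1) + list(frames)
    return [padded[t:t + num_segments] for t in range(len(frames))]


# Toy data: 10 frames with 3 frequency bins each (real STFT frames have
# many more bins). Frame t is filled with the value t so it is easy to track.
frames = [[float(t)] * 3 for t in range(10)]

predictors_8 = make_predictors(frames, num_segments=8)  # with 7 frames of context
predictors_1 = make_predictors(frames, num_segments=1)  # current frame only

# The last vector in each window is the current frame, i.e. the one whose
# clean STFT vector the network would be asked to estimate.
print(len(predictors_8), len(predictors_8[0]), len(predictors_8[0][0]))
print([frame[0] for frame in predictors_8[9]])
```

Changing `num_segments` from 8 to 1 gives the single-segment input for the comparison suggested above; the network's input size would need to change accordingly.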