speaker recognition by using MFCC
Hi all, I am about to start a speaker recognition project in MATLAB. I am a beginner with MATLAB projects, so please forgive any tedious questions. I have done some research on speaker recognition and have some idea of how to proceed.
Overall, what I am planning to do is:
- Get sound input
- Normalize the signal with mapminmax() to standardize the volume of the sound
- Apply pre-emphasis with filter() to boost the energy in the high frequencies
- Frame blocking, windowing and conversion to the mel cepstrum by using melcepst()
- Get the feature vectors by using kmeans(), as I am going to apply a neural network.
- Create a feed-forward backpropagation network by using newff().
- Neural network training.
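A minimal sketch of the normalization and pre-emphasis steps in the plan above. It assumes a synthetic test tone in place of recorded speech and a pre-emphasis coefficient of 0.97 (a common choice, not something fixed by the plan). Peak normalization is written out by hand here instead of mapminmax(), so the snippet needs no toolbox; the two are not identical, since mapminmax() maps the minimum and maximum to -1 and +1.

```matlab
% Synthetic stand-in for a recorded utterance (assumption: 16 kHz mono).
fs = 16000;                          % sample rate in Hz
t  = (0:fs-1)' / fs;                 % one second of sample times
x  = 0.3*sin(2*pi*200*t) + 0.05*sin(2*pi*3000*t);

% Peak normalization: scale so the largest absolute sample is 1.
xn = x / max(abs(x));

% Pre-emphasis: y(n) = xn(n) - alpha*xn(n-1), a first-order high-pass FIR filter.
alpha = 0.97;                        % assumed coefficient
y = filter([1 -alpha], 1, xn);
```

The filter attenuates the 200 Hz component far more than the 3 kHz one, which is the "boost" of high frequencies relative to low ones that pre-emphasis is meant to provide.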
My questions are:
- Is it necessary to set the frame length to 256? Why?
- Is it necessary to apply a Fast Fourier Transform? Why? As far as I know, its purpose is to transform the signal from the time domain into the frequency domain.
- Does the melcepst() function perform a Fast Fourier Transform? As far as I know, melcepst() already combines frame blocking and windowing.
- If I use melcepst(), where should the Fast Fourier Transform go?
Answers (2)
Walter Roberson
on 6 Dec 2017
"Do we necessary set the frame length to 256?"
No, that is just convenient: 256 is a power of two, which suits the FFT. Coefficients can be constructed for other frame lengths.
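To illustrate that 256 is not special, here is a rough sketch that blocks a signal into frames of an arbitrary length and takes an FFT of each. The 400-sample frame, the 160-sample hop and the random test signal are assumptions chosen for illustration; the Hamming window is computed by hand so that no toolbox is required.

```matlab
fs       = 16000;                    % assumed sample rate in Hz
x        = randn(fs, 1);             % placeholder signal, one second long
frameLen = 400;                      % 25 ms at 16 kHz, not a power of two
hop      = 160;                      % 10 ms step between frame starts
nfft     = 2^nextpow2(frameLen);     % optional zero-padding, here to 512 points

% Hamming window of length frameLen.
w = 0.54 - 0.46*cos(2*pi*(0:frameLen-1)'/(frameLen-1));

% Split the signal into overlapping windowed frames, one frame per column.
nFrames = 1 + floor((numel(x) - frameLen) / hop);
frames  = zeros(frameLen, nFrames);
for k = 1:nFrames
    first = (k-1)*hop + 1;
    frames(:, k) = x(first:first+frameLen-1) .* w;
end

% fft() works column-wise and zero-pads each frame to nfft points.
X = fft(frames, nfft);
P = abs(X(1:nfft/2+1, :)).^2;        % one-sided power spectrum per frame
```

Zero-padding to a power of two is a speed convenience only; fft(frames) without the second argument would also work on 400-sample frames.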
"Do we necessary to apply Fast Fourier Transform?"
No, there are other approaches that can be taken in theory.
"Do melcepst() function provides Fast Fourier Transform function?"
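Yes, and not just in theory: practical melcepst() routines, such as the one in the VOICEBOX toolbox, compute an FFT of each windowed frame internally, so you do not need to call fft() yourself.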
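For orientation, here is a rough sketch of where the FFT sits in a textbook MFCC computation for a single frame: window, FFT, mel filterbank, log, DCT. It is not the code of melcepst() itself. The sample rate, the filter and coefficient counts, and the random input frame are all assumptions, and the window and DCT are written out by hand to avoid toolbox dependencies.

```matlab
fs    = 16000;                       % assumed sample rate in Hz
N     = 256;                         % frame length in samples
nMel  = 20;                          % assumed number of mel filters
nCep  = 12;                          % assumed number of cepstral coefficients
frame = randn(N, 1);                 % placeholder for one frame of speech

% 1) Window the frame (Hamming window written out by hand).
w  = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));
xw = frame .* w;

% 2) FFT: this is the time-to-frequency step.
X = fft(xw);
P = abs(X(1:N/2+1)).^2;              % one-sided power spectrum, N/2+1 bins

% 3) Triangular mel filterbank, with edges equally spaced on the mel scale.
hz2mel = @(f) 2595*log10(1 + f/700);
mel2hz = @(m) 700*(10.^(m/2595) - 1);
edges  = mel2hz(linspace(0, hz2mel(fs/2), nMel + 2));   % edge frequencies in Hz
fbin   = (0:N/2)' * fs / N;          % frequency of each FFT bin in Hz
H = zeros(nMel, N/2 + 1);
for m = 1:nMel
    lo = edges(m); mid = edges(m+1); hi = edges(m+2);
    rising  = (fbin - lo) / (mid - lo);
    falling = (hi - fbin) / (hi - mid);
    H(m, :) = max(0, min(rising, falling))';
end

% 4) Log of the mel-band energies (eps avoids log(0)).
logE = log(H*P + eps);

% 5) DCT-II of the log energies gives the cepstral coefficients.
n = (0:nMel-1) + 0.5;
D = cos(pi * (1:nCep)' * n / nMel);  % nCep-by-nMel DCT-II basis, c0 dropped
c = D * logE;                        % MFCC vector for this frame
```

In this layout the FFT is step 2, sandwiched between windowing and the mel filterbank, which is why a routine that already does the framing and windowing has no sensible place for a separate FFT call before or after it.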
"If I apply melcepst function, then where to put the Fast Fourier Transform?"
2 Comments
Walter Roberson
on 6 Dec 2017
"if I don't apply FFT, how could I transform it from time domain to frequency domain?"
You have a signal that can be assumed to be 0 for negative time. Because of that, you can potentially apply theory involving Laplace transforms instead of Fourier transforms. There are likely other mathematical approaches you could use.
Let me put it this way: a couple of months ago I was playing a computer game in which there were planned ways to get through various stages. But I discovered that in places if I did the equivalent of jamming my fingertips into cracks, I could slowly force my way up walls and then very carefully inch my way along the game model's line between rooms, go hand-over-hand along the ceiling, and drop on the other side of the barrier, skipping over four puzzles in doing so. "Do you have to grab a rock and knock this wall down, taking about 5 minutes to go through the section? No, you do not have to -- you can spend 45 minutes exploiting the limits of the game mechanics to get around the walls instead."
It is not always necessary to do things the obvious way, but it can certainly be a lot easier if you do so.