Mel frequency cepstral coefficient feature extraction that closely matches HTK's HCopy.

Computes mel frequency cepstral coefficient (MFCC) features from a given speech signal. The speech signal is first preemphasised using a first-order FIR filter with a given preemphasis coefficient. The preemphasised signal then undergoes short-time Fourier transform (STFT) analysis with a specified frame duration, frame shift and analysis window function. Next, the magnitude spectrum is computed, and a filterbank of M triangular filters is designed, uniformly spaced on the mel scale between the lower and upper frequency limits. The filterbank is applied to the magnitude spectrum values to produce filterbank energies (FBEs). The log-compressed FBEs are then decorrelated using the discrete cosine transform (DCT) to produce cepstral coefficients. The final step applies a sinusoidal lifter to produce liftered MFCCs that closely match those produced by HTK. Demo scripts are included.
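The first stages of that pipeline (pre-emphasis, framing, windowing) can be sketched in plain Python, using only the standard library so it runs without MATLAB. This is an illustrative sketch, not the submission's MATLAB code. The function names are hypothetical. The sketch assumes a zero initial filter state, so the first pre-emphasised sample equals the first input sample, and it takes frames without padding, so a tail shorter than one frame is dropped.

```python
import math

def preemphasis(x, alpha=0.97):
    """First-order FIR pre-emphasis y[n] = x[n] - alpha * x[n-1], with x[-1] taken as 0."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def hamming(n_w):
    """Symmetric Hamming window of length n_w."""
    if n_w == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_w - 1)) for n in range(n_w)]

def make_frames(x, n_w, n_s, window):
    """Overlapping frames of n_w samples, advanced by n_s samples, each multiplied by window."""
    count = (len(x) - n_w) // n_s + 1 if len(x) >= n_w else 0
    return [[x[i * n_s + n] * window[n] for n in range(n_w)] for i in range(count)]

# Example: 25 ms frames with a 10 ms shift at fs = 8000 Hz (200 and 80 samples).
fs = 8000
signal = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]  # 1 s test tone
frames = make_frames(preemphasis(signal), 200, 80, hamming(200))
```

Each row of `frames` would then go through an FFT magnitude, the triangular filterbank, a log, a DCT and the lifter.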

Kamil Wojcicki (2021). HTK MFCC MATLAB (https://www.mathworks.com/matlabcentral/fileexchange/32849-htk-mfcc-matlab), MATLAB Central File Exchange. Retrieved .

hanif omar: Can it run on R2019b?

maysaa lamari: Hello Mr. Kamil,

I would like to ask you about the frames matrix in your code. What could I change if I had a vector of frames rather than a matrix? I would be really grateful if you could answer my question. Thanks for your cooperation.

Esmail yahya: Hello, can anyone help me, please?

Error using dtw

Expected input number 1, X, to be non-NaN.

Error in dtw (line 87)

validateattributes(x,{'single','double'},{'nonnan','2d','finite'},'dtw','X',1);

Error in test (line 40)

dis(1) = dtw(tMFCCs, MFCCs1) % find the Euclidean distance using the "dtw" function and store it in an array

kc: @Kamil Hi, I'm using MATLAB's mfcc function to calculate the coefficients. It says the default number of coefficients is 13, but the result has 14 columns. Please help. Also, do I need to do pre-emphasis, framing, overlapping, windowing or filtering myself, or is that built into the code?

Emre Mutlu: Hello, can anyone help me, please?

I have a 2-second voice signal with 16000 samples and I want to do speech recognition with a mel filterbank. I divided it into 40 frames of 560 samples each, applied a Hamming window and took the power of the signal. Next I want to apply the triangular filters, but I am not sure which frequency range I should use. When I choose 0-8000 Hz, I get a matrix error. I am using the frequency range 0 to framesize/2.

Can you help me with this problem? Thanks
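The sketch below shows how a filterbank's frequency range relates to the FFT bins. It is written in plain Python with hypothetical helper names and is not the submission's trifbank code. For an nfft-point FFT, the one-sided bins k = 0..nfft/2 lie at k*fs/nfft Hz. The filterbank range is therefore given in Hz and can extend at most to fs/2. By contrast, framesize/2 is a bin count, not a frequency. If 16000 samples span 2 seconds, then fs is 8000 Hz and the upper limit can be at most 4000 Hz, which may explain the error with 0-8000 Hz.

```python
def bin_frequencies(nfft, fs):
    """Frequency in Hz of each one-sided FFT bin k = 0..nfft//2."""
    return [k * fs / nfft for k in range(nfft // 2 + 1)]

def hz_to_bin(f_hz, nfft, fs):
    """Index of the FFT bin nearest to f_hz."""
    return int(round(f_hz * nfft / fs))

def check_range(lo_hz, hi_hz, fs):
    """Raise if the filterbank range does not lie within 0..fs/2 Hz."""
    if not 0 <= lo_hz < hi_hz <= fs / 2:
        raise ValueError(f"range {lo_hz}-{hi_hz} Hz must lie within 0-{fs / 2} Hz")

fs = 16000 // 2                    # 16000 samples over 2 seconds -> 8000 Hz
nfft = 1024                        # assumed: next power of two above a 560-sample frame
freqs = bin_frequencies(nfft, fs)  # 513 bins from 0 Hz to 4000 Hz
check_range(0, fs / 2, fs)         # accepted; check_range(0, 8000, fs) would raise
```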

Kamil Wojcicki: Iqbal, example.m shows you how you can extract MFCC features from an audio file. You can use that for whatever projects you want, but those are outside the scope of this submission.

Iqbal Aziz: Any advice? I am trying to do speaker recognition with MFCCs. I want to use your code with my own data and classify it with a method such as LDR or vector quantization.

Kamil Wojcicki: Hi Iqbal, the example.m script shows you how to call the mfcc function. It may be helpful to have a look at an introduction to MATLAB tutorial. Best, Kamil

Iqbal Aziz: I just don't understand this:

[ MFCCs, FBEs, frames ] = ...

mfcc( speech, fs, Tw, Ts, alpha, @hamming, [LF HF], M, C+1, L );

It says "too many input arguments". However, if I run example.m directly, it executes.

Kamil Wojcicki: Hi Labiod, please see example.m. Only MFCCs are included in this submission. Best, Kamil

labiod asma: Hello Mr. Kamil, thanks for sharing. Could you please explain how to extract the following parameters: MFCC, PLC? I'm just a beginner in MATLAB, and all I did was read the wav file with:

fid=fopen('filename','r');

x=fread(fid,'int16');

plot(x);soundsc(x,88200);

salvador florido llorens: Hello Kamil,

Your code is clean and concise, congratulations! I am working with HTK, and specifically I am trying to generate my own features in MATLAB to train an HMM with HTK. My issue is this: when I used your code to generate MFCCs without ENERGY as the first coefficient, I realized that the HTK MFCC structure and yours are very different. As a consequence, when I train with your MFCC features, the accuracy is much lower than with features built by HTK. Do you know which steps of your implementation differ from HTK's and could cause such a large difference in accuracy?

Thank you for your contribution!

yasir hussain: Hi,

Could anybody email me working MFCC code? I am getting an error. Thanks in advance.

email: L175117@lhr.nu.edu.pk

yasir hussain: Yes sir, I have the 2018 version of MATLAB and I have also replaced wavread with audioread.

Kamil Wojcicki: Hi Yasir,

What do you mean by "the code"? Do you mean mfcc.m? Have a look at example.m on how to use the function, or type "help mfcc" at MATLAB prompt.

Also, for newer versions of MATLAB, you'll have to replace calls to wavread in examples with audioread, since the former are no longer supported.

Best, Kamil

yasir hussain: Hi, when I run the code I get the following error:

Error using vec2frames (line 79)

usage: [ frames, indexes ] = vec2frames( vector, frame_length, frame_shift, direction, window, padding );

Can anyone help me?

Kamil Wojcicki: Hi Bipasha, it looks like an issue with how the audioread function is being called. Could you post that snippet of your code? Best, Kamil

bipasha k: Hi Kamil,

When I try to run the code with your example speech file, the system crashes & I get the following error:

------------------------------------------------------------------------------------------------------

Error using audioread (line 135)

Out of memory. The likely cause is an infinite recursion within the program.

------------------------------------------------------------------------------------------------------

Could you please help me with this?

Regards,

Bipasha

Kamil Wojcicki: Hi Tariq, have you tried plotting the input to the mfcc function? Are there any zeros or NaNs in it? You could also try stepping through the code with the MATLAB IDE debugger to find where variables become NaN. Best, Kamil

Tariq Anwar: All the column vectors of my MFCC output are NaN. Any help on where I've gone wrong?

Kamil Wojcicki: Lee, could you share the error message you are getting?

Lee Misko: It doesn't work on MATLAB 2017.

Kamil Wojcicki: Hi Nurul, it looks like it failed to write the PDF file (with the figure) to disk. It could be a permissions issue or something else. In any case, you can comment out the print line. Best, Kamil

Nurul Ikhlas Septiai: Hello Mr. Kamil,

I'm new to this; thanks for sharing. It really helps.

I'm trying to run your program and I got this in example.m:

Error using name (line 102)

Cannot create output file '.\example.pdf'.

Error in print (line 85)

pj = name( pj );

Error in example (line 78)

print('-dpdf', sprintf('%s.pdf', mfilename));

Can you help me understand what this means? The MFCC figure still appears when I run it.

Kamil Wojcicki: For LPCCs, you'll want to calculate LPCs and then apply a recursion to convert them to LPCCs; see for example: https://www.mathworks.com/help/dsp/ref/lpctofromcepstralcoefficients.html

For your second question, see for example: http://www.speech.cs.cmu.edu/15-492/slides/03_mfcc.pdf
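The LPC-to-cepstrum recursion can be sketched in plain Python. The sketch assumes an all-pole model written as H(z) = G / (1 - sum_k a_k z^-k), so the a_k are predictor coefficients. MATLAB's lpc returns [1, a(2), ..., a(p+1)] for A(z) = 1 + a(2) z^-1 + ..., so those values would need their signs flipped before use. The function name is hypothetical, and c_0 (the log of the gain) is left out.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of G / (1 - sum_k a_k z^-k).

    a[0..p-1] hold the predictor coefficients a_1..a_p.
    c_n = a_n + sum_{k=max(1, n-p)}^{n-1} (k / n) * c_k * a_{n-k}, with a_n = 0 for n > p.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

# One-pole check: 1 / (1 - 0.5 z^-1) has cepstrum c_n = 0.5**n / n.
ceps = lpc_to_cepstrum([0.5], 4)
```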

zohaib: Thanks for your response. OK, I will look at this again.

If we want LPCC feature vectors, which changes need to be made to this code? And what information do we get from the cepstrum plots?

Kamil Wojcicki: Hi Zohaib, please have a closer look at mfcc.m and the references therein. Best, Kamil

zohaib: Hello Kamil, I have a few questions.

What does the L = 22 parameter tell us? What information do we get from the log filterbank energies and mel cepstrum plots shown in your results? I want to plot the MFCC feature vector: where do we get it in the code, and how can I plot it as shown in this link: https://www.researchgate.net/figure/Histogram-representation-of-the-second-coefficient-of-MFCC-features-and-the-Gaussian-fit_fig2_47465970? What changes do we have to make to your code if we want LPCC feature vectors?

Waiting for your help.

Thanks
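In HTK's formulation, which this submission emulates, L is the cepstral sine lifter parameter (CEPLIFTER in HTK, default 22). The n-th cepstral coefficient is multiplied by w_n = 1 + (L/2) sin(pi*n/L). As a result, c_0 is unchanged and the higher coefficients are boosted, up to a weight of 1 + L/2 at n = L/2. The weights can be sketched in plain Python (the helper names are hypothetical):

```python
import math

def sine_lifter(n_ceps, L=22):
    """HTK-style lifter weights w_n = 1 + (L/2) * sin(pi * n / L) for n = 0..n_ceps-1."""
    return [1 + 0.5 * L * math.sin(math.pi * n / L) for n in range(n_ceps)]

def apply_lifter(ceps, L=22):
    """Multiply each cepstral coefficient c_n by its lifter weight w_n."""
    return [c * w for c, w in zip(ceps, sine_lifter(len(ceps), L))]

weights = sine_lifter(13, 22)   # for c_0..c_12; the largest weight (12.0) is at n = 11
```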

Kamil Wojcicki: Hi Akbar, the output dimensions will depend on (a) the number of features per frame (which in turn depends on the parameters selected) and (b) the length of the input signal. The former will be fixed for the set of selected parameter values. The latter will vary depending on how much audio you feed in. Best, Kamil
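That dependence can be sketched in plain Python. The sketch assumes, as an illustration, that the frame length and shift are rounded to whole samples and that no padding is applied; the function name is hypothetical. The number of features per frame stays fixed by the parameters, while the number of frames grows with the input length:

```python
def frame_count(n_samples, fs, tw_ms, ts_ms):
    """Number of full frames of tw_ms milliseconds, advanced by ts_ms milliseconds."""
    n_w = round(1e-3 * tw_ms * fs)   # frame length in samples
    n_s = round(1e-3 * ts_ms * fs)   # frame shift in samples
    if n_samples < n_w:
        return 0
    return (n_samples - n_w) // n_s + 1

# At fs = 8000 Hz with 25 ms frames and a 10 ms shift:
one_second = frame_count(8000, 8000, 25, 10)    # 98 frames
two_seconds = frame_count(16000, 8000, 25, 10)  # 198 frames
```

So for a feature matrix with one column per frame, only the number of columns changes with the audio length.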

Akbar Fatkhi: Hi Mr. Kamil, thanks for sharing. Your code works very well.

Could you please help me make the outputs always have the same matrix dimensions?

Kamil Wojcicki: Hi Jonathan,

In short yes - please have a look at what parameters are supported by the MFCC function's API, either by using the "help mfcc" command, or by opening mfcc.m. For anything else, you'll have to add params to the function API and expose them that way.

HTH,

Kamil

Jonathan Ziegler: Hello Kamil, thanks for sharing! But how can we change the settings for alpha, R, M and L? The help file lists ‘WindowLength’, XX, ‘OverlapLength’, XX and so on, but what about the ‘pre-emphasis coefficient’, ‘frequency range’, ‘number of filterbank channels’ and ‘cepstral sine lifter parameter’?

Best,

Hen

Kamil Wojcicki: Thanks, Ahmed, for sharing your solution with others.

Kamel: Hello again, I figured it out.

The "audioread" function returns an N x 2 array: the left channel is in the first column and the right channel is in the second. The values are so close to each other that you can take their average to get a vector that can be used in "vec2frames".
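Both options suggested in this thread turn an N x 2 array into a single vector: averaging the two channels, or keeping just one of them. A plain-Python sketch, with each sample stored as a [left, right] row (the helper names are hypothetical):

```python
def average_channels(samples):
    """Mean across channels for each sample row, giving one mono vector."""
    return [sum(row) / len(row) for row in samples]

def keep_channel(samples, channel=0):
    """Keep only one channel (0 = left, 1 = right)."""
    return [row[channel] for row in samples]

stereo = [[1.0, 3.0], [0.0, 0.5]]
mono = average_channels(stereo)    # [2.0, 0.25]
right = keep_channel(stereo, 1)    # [3.0, 0.5]
```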

Kamil Wojcicki: Sam, ensure that the density modeling is working fine. You can start with other (simpler) features, e.g., LPCs. Then, once that is working, compare the performance of different features. You could also try different modeling/classification backends. Beyond this, it is difficult to help without actual hands-on work, and this is perhaps not the best forum for it either. Good luck. Kamil

Kamil Wojcicki: Maysaa, the MFCC function expects a vector. Does the audio that you are loading have one or more channels? If more than one, just keep one channel. Kamil

maysaa lamari: @Olessya Medvedeva, did you find a solution to your problem? I have the same one.

maysaa lamari: Hi Mr. Kamil,

Thanks for the suggestion. When I use [speech, fs] = audioread(wav_file); I get:

Error using vec2frames (line 83)

usage: [ frames, indexes ] = vec2frames( vector, frame_length, frame_shift, direction, window, padding );

Error in mfcc (line 151)

frames = vec2frames( speech, Nw, Ns, 'cols', window, false );

Error in Untitled11 (line 41)

[ MFCCs, FBEs, frames ] = ...

I hope that you can help me.

Yiyang Zhou: Hi Kamil,

Your code works very well and thanks for your sharing!

I am a beginner in this and I would like to use your code for speech recognition. I was trying to use your MFCCs output in Gaussian component densities as suggested by (https://www.mathworks.com/company/newsletters/articles/developing-an-isolated-word-recognition-system-in-matlab.html). But it did not work...

Could you please help me with this?

Best,

Sam

Kamil Wojcicki: Hi Maysaa,

The API for audioread is a little different from that of wavread; please refer to:

https://www.mathworks.com/help/matlab/ref/audioread.html

You'll want to use:

[speech, fs] = audioread(wav_file);

I hope that helps.

Kamil

maysaa lamari: Hi Mr. Kamil,

First, thank you for this File Exchange submission.

The audio file format is WMA, not WAV. When I use audioread instead of wavread, I get:

Error using audioread

Too many output arguments.

Error in Untitled11 (line 30)

[ speech, fs, nbits ] = audioread( wav_file );

Can you please help me?

Kamil Wojcicki: Hi Alan,

See example.m and mfcc.m.

You may have to change wavread/write to audioread/write as the former are no longer supported.

Best,

Kamil

Alan Ling: May I know how to run the code?

Vineel Pratap Konduru: Well written. Thanks!

zohaib: How can we extract features for multiple audio signals and plot them? Could anybody please help me with this code?

zohaib: @Jie Xie

I need to classify the feature vectors we get from this code using an SVM. Can you help me with this? I found a program related to this, but it's not working: https://github.com/mosamdabhi/Voice-Based-Digit-Recognition-Speech-Recognition-System-Machine-Learning- Please help me with this.

Jie Xie: @SUNHIT KAKKAR

I fixed this problem by changing the function 'spec2cep.m'. Add '+eps' as follows: 'cep = dctm*log(spec+eps)'.
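The following explains why NaN appears and why adding eps helps. In MATLAB, log(0) is -Inf. Once a -Inf log filterbank energy is multiplied by DCT weights of both signs and summed, the result contains -Inf + Inf, which is NaN. Adding eps keeps every log finite. A plain-Python sketch (the helper name is hypothetical; sys.float_info.epsilon has the same value as MATLAB's eps, 2^-52):

```python
import math
import sys

EPS = sys.float_info.epsilon  # 2**-52, the same value as MATLAB's eps

def safe_log(values):
    """Natural log of each energy, offset by EPS so that zero energies stay finite."""
    return [math.log(v + EPS) for v in values]

# Without the offset, a silent band gives log(0) = -Inf in MATLAB (in Python,
# math.log(0.0) raises ValueError instead). Combining -inf with DCT weights of
# opposite sign then produces nan:
neg_inf = float("-inf")
mixed = neg_inf * 0.5 + neg_inf * -0.5   # -inf + inf -> nan
```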

SUNHIT KAKKAR: Thanks for the code. It runs properly for an audio file. However, when I try to run it on a txt file, it doesn't work. Here's what I did:

speech=dlmread('Dolom1_test.txt'); % the text file contains a column matrix with 110 rows

fs = 0.6e+17; %sampling rate

Tw = 8.3; % analysis frame duration (E-17)

Ts = 1.6; % analysis frame shift (E-17)

alpha = 0.97; % pre-emphasis coefficient

R = [ 12e+13 15e+13 ]; % frequency range to consider

M = 20; % number of filterbank channels

N = 13; % number of cepstral coefficients i.e K

L = 22; % cepstral sine lifter parameter

[ CC, FBE, frames ] = mfcc( speech, fs, Tw, Ts, alpha, window, R, M, N, L );

The code gets executed but I get NaN values in my CC matrix. I have narrowed it down to a problem in the trifbank function. Actually the trifbank function is returning a null matrix.

Can anyone suggest how to approach this problem? The parameters can be changed, but I need the code to work.

Please help as soon as possible.

zohaib: Kamil, thanks for your response. The code works for me now. Please clarify: is only MFCC feature extraction done in this code, and not training on the feature vectors? Is there any tutorial for this code that explains what is happening in its functions?

If we replace the audio file with other sound files, such as the sound of a plane, will this code give accurate results?

Kamil Wojcicki: Zohaib, it may be that you have created a script called hamming.m that collides with MATLAB's built-in hamming function. Rename your script and you'll be good to go.

zohaib: Please help. When I run example.m, I get these errors:

Attempt to execute SCRIPT hamming as a function:

C:\Users\hp\Documents\MATLAB\hamming.m

Error in vec2frames (line 164)

window = window( Nw );

Error in mfcc (line 151)

frames = vec2frames( speech, Nw, Ns, 'cols', window, false );

Error in example (line 34)

mfcc( speech, fs, Tw, Ts, alpha, @hamming, [LF HF], M, C+1, L );

Mohamed Gamal: Dear All,

How can I run the program, please? Can somebody send me the manual, please?

mohgamal1@yahoo.com

Meriem M'sallem: Hello, I have to extract MFCC coefficients for a school project, but when I ran example.m, the output was just the plots. How can I get the coefficients, please? Thank you for your help.

Kamil Wojcicki: Glad you got it resolved, Deep. Thanks for sharing the solution here!

Deep: Hi Kamil! Thanks a lot for the great support while resolving this issue. It is resolved now.

It was related to the AMD A8-7410 APU, which is prone to crashing during matrix multiplication. Finally, setting the environment variables "MKL_CBWR=AVX" and "MKL_DEBUG_CPU_TYPE=4" helped (answer by Chestor Gillon). Thanks a lot, MathWorks!

Kamil Wojcicki: Hi Deep. Is the whole of MATLAB crashing, or do you get an error? What are the inputs N and M set to? When you type "dctm" by itself at the MATLAB prompt, what do you get?

Deep: Hi. My MATLAB R2014a crashes every time I reach the line "DCT = dctm( N, M );" in the given example.

Some help, please.

Deep: Thanks, Kamil, for a quick and sure-shot answer.

Kamil Wojcicki: Fizza, you are getting 13 coefficients for every audio frame. To compute delta coefficients, refer to the HTK book. If you don't want to implement this in MATLAB yourself, just export the MFCCs to HTK format and then use HTK to append delta and delta-delta coefficients.
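The HTK book gives the delta (regression) coefficients as d_t = sum_{θ=1..Θ} θ (c_{t+θ} - c_{t-θ}) / (2 sum_{θ=1..Θ} θ²). Below is a plain-Python sketch of that formula. It assumes a window Θ = 2, and it assumes that frame indices beyond either end are replaced by the first or last frame. Applying the same function to the deltas gives the delta-delta coefficients. The function name is hypothetical.

```python
def deltas(frames, window=2):
    """Regression deltas of a list of feature vectors (one vector per frame).

    d_t = sum_{th=1..W} th * (c_{t+th} - c_{t-th}) / (2 * sum_{th=1..W} th**2),
    with out-of-range frame indices clamped to the first/last frame.
    """
    T = len(frames)
    denom = 2 * sum(th * th for th in range(1, window + 1))
    out = []
    for t in range(T):
        acc = [0.0] * len(frames[t])
        for th in range(1, window + 1):
            nxt = frames[min(t + th, T - 1)]
            prv = frames[max(t - th, 0)]
            for j in range(len(acc)):
                acc[j] += th * (nxt[j] - prv[j])
        out.append([a / denom for a in acc])
    return out

ramp = [[float(t)] for t in range(10)]   # one coefficient rising by 1 per frame
d = deltas(ramp)                         # interior frames give 1.0
dd = deltas(d)                           # delta-delta: interior frames give 0.0
```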

Deep, does "help vec2frames" produce error for you? If so, ensure _all_ helper functions included in the submission are in MATLAB path.


Deep: Hi, I am unable to run the code using MATLAB R2014a on my Windows 7 PC. I get an error saying that function vec2frames is not defined for variable type double, and likewise for type single. I tried both audioread('filename', 'native') and audioread('filename', 'double').

Fizza Ghulam Nabi: Dear Kamil,

Thank you for this File Exchange submission.

I only need 13 coefficients for the 100-1600 Hz range.

Now it is giving FBE 13*93, CC=256*93, frames 256*93.

Ts = 10;

alpha = 0.97;

R = [ 100 1600 ];

M = 13;

C = 13;

L = 22;

What should I do?

Secondly, can you help me calculate delta and double-delta coefficients?

Kamil Wojcicki: Hi Wilmer, have you looked at example.m? There, the MFCCs are computed in this line:

% Feature extraction (feature vectors as columns)

[ MFCCs, FBEs, frames ] = mfcc( speech, fs, Tw, Ts, alpha, @hamming, [LF HF], M, C+1, L );

Remember to update wavread to audioread as described in the comment below.

HTH,

Kamil

wilmer sarango: I am lost. In which part can I obtain the mel coefficients? I suppose there are 12? Please help me.

salvatore lupo: Thanks, you have been really helpful!

Kamil Wojcicki: Hi Salvatore,

If the audio file has multiple channels, then yes, you'll get a matrix after loading the file into MATLAB. For example:

>> [x,fs]=audioread('speech.wav');

>> size(x)

ans =

46417 2

This means that there are two channels of audio, each with 46417 samples. It really depends on your task what you do with it. For example, if you:

>> x=x(:);

>> size(x)

ans =

92834 1

then you are essentially concatenating the samples of the two channels, i.e., samples from the second channel are appended after the last sample of the first channel. You can verify this by plotting the signal waveform and/or spectrogram. If you then pass this concatenated x vector to the mfcc() function, it will extract the features as expected.

One alternative would be to loop over the channels and pass one channel at a time to the mfcc() function, getting the features for only that channel each time.
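The two approaches are sketched below in plain Python, with an N x C matrix stored as a list of rows (the helper names are hypothetical). Flattening in column-major order, as x(:) does in MATLAB, places all of channel 1 before channel 2. Splitting instead yields one vector per channel, and each vector can be passed to the feature extractor separately.

```python
def concat_channels(x):
    """Column-major flatten of an N x C matrix, like MATLAB's x(:)."""
    n_channels = len(x[0])
    return [row[c] for c in range(n_channels) for row in x]

def split_channels(x):
    """One vector per channel; each can be processed on its own."""
    n_channels = len(x[0])
    return [[row[c] for row in x] for c in range(n_channels)]

stereo = [[0.1, -0.1], [0.2, -0.2], [0.3, -0.3]]   # N = 3 samples, C = 2 channels
flat = concat_channels(stereo)       # [0.1, 0.2, 0.3, -0.1, -0.2, -0.3]
per_channel = split_channels(stereo) # [[0.1, 0.2, 0.3], [-0.1, -0.2, -0.3]]
```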

HTH,

Kamil

salvatore lupo: Thanks for the answer! I have another problem. I read that "speech" has to be a vector, not a matrix. When I compute it I get a matrix, so I changed it into a vector with speech1 = speech(:). Do you think that is a correct transformation for obtaining the MFCC coefficients?

Kamil Wojcicki: MATLAB removed the wavread and wavwrite functions some releases ago (and I haven't got around to revising the File Exchange submissions). They were replaced with the audioread and audiowrite functions, so you'll have to use those. Note that the API is somewhat different (e.g., audioread does not return nbits as the third output).

So in the case you are pointing out, you would replace the wavread line with:

[ speech, fs ] = audioread( 'trial.wav' );

HTH,

Kamil

salvatore lupo: [ speech, fs, nbits ] = wavread( 'trial.wav' );

Undefined function or variable 'wavread'.

Why? I installed the toolbox.

Jovan Galic: Once again, thank you very much!

Kamil Wojcicki: Hi Jovan,

Unfortunately there isn't.

One possibility would be to cite the HTK documentation and to provide the link to the implementation in a footnote, e.g.:

"The MFCC features were extracted according to [1] using MATLAB.^#"

# The MATLAB-based MFCC routines can be found at: http://www.mathworks.com/matlabcentral/fileexchange/32849-htk-mfcc-matlab

[1] Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P., 2006. The HTK Book (HTK Version 3.4.1). Engineering Department, Cambridge University.

Alternatively, you could cite the original paper by Davis and Mermelstein (1980).

Unless you are doing something like assessing various implementations (e.g., Ganchev et al., 2005), I would say that the footnote is optional given how standard this task is. I would just make sure to state the relevant settings used in your task in the methods section.

Best,

Kamil

Jovan Galic: Is there a conference or journal paper to cite for this MATLAB code?

Regards,

Jovan

Jovan Galic: Kamil, thank you for the clear explanation.

Regards,

Jovan

Kamil Wojcicki: Hi Jovan,

>> And how the code would be if warping function is between mel and linear for example warp = 0.5 * hz?

Well, 0.5*hz is linear. You'll need a pair of nonlinear forward/backward warping functions instead if you want the filters to be non-uniformly spaced on the Hz scale.

>> Is it enough to change only mel2hz and hz2mel?

Yes. Simply assign function handles to these backward and forward warping functions, respectively.

See the documentation for the trifbank function. Use the example provided there to visualize the triangular filterbanks for the different warping functions you may want to try.
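The forward/backward pair can be sketched in plain Python. The mel formulas are the ones quoted from mfcc.m further down this page, and the helper names are hypothetical. The sketch takes the filter peaks as points equally spaced on the warped scale and maps them back to Hz. This is an assumption about how trifbank places them. Because any linear warp a*hz maps equal spacing to equal spacing, 0.5*hz gives the same filter positions as the identity warp, i.e., plain linear spacing.

```python
import math

def hz2mel(hz):
    return 1127 * math.log(1 + hz / 700)      # forward warp (Hz -> mel)

def mel2hz(mel):
    return 700 * math.exp(mel / 1127) - 700   # backward warp (mel -> Hz)

def half(hz):
    return 0.5 * hz                           # a linear forward warp

def half_inv(w):
    return 2.0 * w                            # its backward warp

def lin(x):
    return x                                  # identity warp

def peak_frequencies(lo_hz, hi_hz, M, fwd, bwd):
    """M + 2 points equally spaced on the fwd() scale, mapped back to Hz with bwd()."""
    a, b = fwd(lo_hz), fwd(hi_hz)
    return [bwd(a + (b - a) * i / (M + 1)) for i in range(M + 2)]

mel_pts = peak_frequencies(300, 3700, 20, hz2mel, mel2hz)   # spacing grows with frequency
half_pts = peak_frequencies(300, 3700, 20, half, half_inv)  # same as lin_pts
lin_pts = peak_frequencies(300, 3700, 20, lin, lin)         # uniform spacing in Hz
```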

HTH,

Kamil

Jovan Galic: And what would the code look like if the warping function were between mel and linear, for example warp = 0.5 * hz?

Is it enough to change only mel2hz and hz2mel?

Regards.

Jovan Galic: Dear Kamil,

Thank you very much for the very helpful and quick reply!

Best regards.

Kamil Wojcicki: Hi Jovan,

Modify mfcc.m as follows:

add:

lin = @(x)(x);

replace:

hz2mel = @( hz )( 1127*log(1+hz/700) ); % Hertz to mel warping function

mel2hz = @( mel )( 700*exp(mel/1127)-700 ); % mel to Hertz warping function

with:

hz2mel = lin;

mel2hz = lin;
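With hz2mel and mel2hz both set to the identity, the triangles become equally wide in Hz, which gives the LFCC variant. Below is a plain-Python sketch of a triangular filterbank evaluated on the FFT bin frequencies. It is an illustration, not the trifbank code itself: the names are hypothetical and the construction of the edge points is an assumption.

```python
import math

def hz2mel(hz):
    return 1127 * math.log(1 + hz / 700)

def mel2hz(mel):
    return 700 * math.exp(mel / 1127) - 700

def lin(x):
    return x

def tri_filterbank(M, nfft, fs, lo_hz, hi_hz, fwd, bwd):
    """M rows of triangular weights over the one-sided FFT bins k = 0..nfft//2.

    The M + 2 edge/peak points are equally spaced on the fwd() scale and mapped
    back to Hz with bwd(); filter m rises from point m-1 to m and falls to m+1.
    """
    a, b = fwd(lo_hz), fwd(hi_hz)
    c = [bwd(a + (b - a) * i / (M + 1)) for i in range(M + 2)]
    freqs = [k * fs / nfft for k in range(nfft // 2 + 1)]
    bank = []
    for m in range(1, M + 1):
        row = []
        for f in freqs:
            if c[m - 1] <= f <= c[m]:            # rising edge
                row.append((f - c[m - 1]) / (c[m] - c[m - 1]))
            elif c[m] < f <= c[m + 1]:           # falling edge
                row.append((c[m + 1] - f) / (c[m + 1] - c[m]))
            else:
                row.append(0.0)
        bank.append(row)
    return bank, c

lfcc_bank, lin_pts = tri_filterbank(20, 512, 16000, 0, 8000, lin, lin)
mfcc_bank, mel_pts = tri_filterbank(20, 512, 16000, 0, 8000, hz2mel, mel2hz)
```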

HTH,

Kamil

Jovan Galic: Hello!

How and where can I modify the code to get LFCC feature vectors, where the triangular filters are uniformly distributed over the linear (not mel) frequency scale?

In some applications, LFCCs show greater robustness.

Regards!

Kamil Wojcicki: Hi Donal.

Yes, the zeroth coefficient is included in the output from mfcc.m. This function emulates HTK's MFCC_0 feature computation, which includes the zeroth coefficient (i.e., the _0 modifier).

Note, however, that for plotting purposes in the included example, the zeroth coefficient was discarded. See example.m, and specifically the following lines:

subplot( 313 );

imagesc( time_frames, [1:C], MFCCs(2:end,:) ); % HTK's TARGETKIND: MFCC

%imagesc( time_frames, [1:C+1], MFCCs ); % HTK's TARGETKIND: MFCC_0
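The DCT itself shows which rows make up MFCC_0 and which make up MFCC. Below is a plain-Python sketch of an HTK-style DCT-II, c_j = sqrt(2/M) * sum_{i=1..M} log(FBE_i) * cos(pi*j*(i-0.5)/M) for j = 0..C; the helper names are hypothetical. Keeping all C+1 outputs corresponds to MFCC_0. Dropping c_0, as the MFCCs(2:end,:) line does, corresponds to MFCC.

```python
import math

def dct_matrix(n_ceps, M):
    """Rows j = 0..n_ceps-1 of an HTK-style DCT-II over M log filterbank energies."""
    return [[math.sqrt(2.0 / M) * math.cos(math.pi * j * (i - 0.5) / M)
             for i in range(1, M + 1)] for j in range(n_ceps)]

def cepstra(log_fbes, n_ceps):
    """Apply the DCT matrix to one frame of log filterbank energies."""
    D = dct_matrix(n_ceps, len(log_fbes))
    return [sum(d * e for d, e in zip(row, log_fbes)) for row in D]

C = 12
log_fbes = [math.log(1.0 + m) for m in range(20)]  # made-up frame, M = 20 filters
mfcc_0 = cepstra(log_fbes, C + 1)   # 13 values c_0..c_12 (like HTK's MFCC_0)
mfcc = mfcc_0[1:]                   # 12 values c_1..c_12 (like HTK's MFCC)
```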

HTH,

Kamil

Donal O Sullivan: Hi, does this MFCC calculation include c(0) (the first MFCC) in the output?

Kamil Wojcicki: Hi Mehmet, the triangular filterbank implementation is based on information from a speech processing book, a reference to which is included in trifbank.m. HTH, Kamil

Mehmet Kazanç: Hi, where is your article? It is mentioned in the trifbank script.

I am trying to build a mel filterbank.

Thanks

Kamil Wojcicki: Hi Ankur,

Could you elaborate on what you mean?

If you are wondering how to load audio from a file and extract features using the mfcc function, take a look at example.m.

Note that for newer MATLAB releases you may want to replace wavread with audioread, i.e.,

[ x, fs ] = audioread( wav_file );

Hope that helps.

Ankur Kalita: Is there any specification for the input audio file?

Kamil Wojcicki: Hi Yi,

In general you don't really have to do that. It is just that here we are trying to match the output of HTK feature extractor when it reads in audio data as 16-bit signed shorts. With this, you can compare directly the output features generated using this MATLAB routine with the corresponding features extracted with HTK (as demonstrated in compare.m).

Beyond that, i.e., if you are not comparing with HTK and are just looking to extract features for some task, you can drop this scaling.
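The scaling can be dropped because of how it propagates through the pipeline. Every step before the log (pre-emphasis, windowing, FFT magnitude, filterbank sums) is linear in the signal. Multiplying the speech by g = 2^15 therefore multiplies every filterbank energy by g, and the log adds the same constant log(g) to each band. The DCT rows for j >= 1 sum to zero across the bands, so only c_0 changes, by sqrt(2/M)*M*log(g). The sine lifter leaves c_0 unweighted. A plain-Python check with made-up energies and a hypothetical helper:

```python
import math

def cepstra(log_fbes, n_ceps):
    """HTK-style DCT-II of one frame of log filterbank energies."""
    M = len(log_fbes)
    return [sum(math.sqrt(2.0 / M) * math.cos(math.pi * j * (i + 0.5) / M) * e
                for i, e in enumerate(log_fbes))
            for j in range(n_ceps)]

fbes = [0.3, 1.2, 4.0, 2.5, 0.8, 0.1]   # made-up filterbank energies, M = 6
g = 2 ** 15                              # the 16-bit scaling factor
c = cepstra([math.log(v) for v in fbes], 6)
c_scaled = cepstra([math.log(g * v) for v in fbes], 6)
# c_scaled[1:] matches c[1:]; only c_scaled[0] differs, by sqrt(2/6) * 6 * log(g).
```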

HTH,

Kamil

yi wu: I am confused about this line:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

% Explode samples to the range of 16 bit shorts

if( max(abs(speech))<=1 ), speech = speech * 2^15; end;

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Why does the speech need to be multiplied by 2^15 before calculating the STFT? Can someone kindly help answer this? Thank you!

Kamil Wojcicki: Yibo, the overlap used is as defined in:

Huang, X., Acero, A., Hon, H., 2001. Spoken Language Processing: A guide to theory, algorithm, and system development. Prentice Hall, Upper Saddle River, NJ, USA (pp. 314-315).
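A common construction places the filter peaks equally spaced on the mel scale. Each triangle rises from the previous filter's peak to its own peak and falls to the next filter's peak. With this construction, neighbouring filters already share half of each triangle's support on the mel scale, i.e., they overlap by 50%. The plain-Python sketch below measures this overlap. It uses a hypothetical width factor that scales each triangle's half-width relative to the peak spacing. A width of 1 reproduces the usual construction. Other values illustrate how the overlap might be tweaked; width is not an option of trifbank.

```python
def triangle_support(peaks, m, width=1.0):
    """Support (lo, hi) of triangle m on the warped (e.g. mel) scale.

    width = 1.0 means the triangle reaches from the previous peak to the next one.
    """
    half = width * (peaks[1] - peaks[0])   # peaks assumed equally spaced
    return peaks[m] - half, peaks[m] + half

def overlap_fraction(peaks, m, width=1.0):
    """Share of triangle m's support that it has in common with triangle m + 1."""
    lo1, hi1 = triangle_support(peaks, m, width)
    lo2, hi2 = triangle_support(peaks, m + 1, width)
    shared = max(0.0, min(hi1, hi2) - max(lo1, lo2))
    return shared / (hi1 - lo1)

mel_peaks = [100.0 * i for i in range(10)]   # equally spaced peaks on the mel scale
usual = overlap_fraction(mel_peaks, 3)       # 0.5 for the usual construction
wider = overlap_fraction(mel_peaks, 3, 2.0)  # 0.75 with twice the half-width
```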

Kamil Wojcicki: Olessya, what is the dimensionality of your input vector, i.e., what is size(speech)? It must be a vector, not a matrix.

Olessya Medvedeva: Hi, I am trying to use your code, but it gives me this usage error:

[ MFCCs, FBEs, frames ] = ...

mfcc( speech, fs, Tw, Ts, alpha, @hamming, [LF HF], M, C+1, L );

Error using vec2frames (line 83)

usage: [ frames, indexes ] = vec2frames( vector, frame_length, frame_shift, direction, window, padding );

Error in mfcc (line 151)

frames = vec2frames( speech, Nw, Ns, 'cols', window, false );

Do you have any idea what the problem might be? Thank you

Yibo Yang: Quick question, Kamil: how can I tweak the trifbank code so that it generates triangular filters with, say, 50% overlap on the mel scale?

Thanks for your work!

Kamil Wojcicki: Brittany, are you using the provided example.m with sp10.wav, or your own audio files? If the audio file you are using happens to have long sections of zero-valued samples, that could explain the NaN MFCC values. If that is the case, you could add some very low-level noise to your audio samples, e.g.,

speech = speech + randn(size(speech))*1E-10;

Hope this helps.
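Below is a plain-Python equivalent of that one-liner, in which random.gauss plays the role of randn. The helper name and the fixed seed are assumptions, the seed being there only for reproducibility. The added noise is tiny compared with the signal, but it keeps every sample away from exactly zero. As a result, the filterbank energies of silent frames stay positive and their logs stay finite.

```python
import random

def dither(x, level=1e-10, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `level` to every sample."""
    rng = random.Random(seed)
    return [v + level * rng.gauss(0.0, 1.0) for v in x]

silence = [0.0] * 1000
noisy = dither(silence)   # no exact zeros left, magnitudes around 1e-10
```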

Brittany Davis: I get some NaN values in the MFCC variable. Why is that?

clarissa yong: Does anyone know which file I should run to get the final outcome? Please help, thanks!

Adnan Farooq: In the case of a sequence of images, how can we use MFCCs? First, we convert each 2D/3D frame to a 1D vector, but I am confused about how to use these parameters:

"fs, Tw, Ts, alpha, window, R, M, N, L"
