Sir, I tried to extract features from a speech signal using mel-frequency cepstral coefficients (MFCC), but the code now shows an error. I don't know how to fix it. Could you please help me rectify this error?

The code is given below,
[audio, fs1] = audioread('cryrumble.wav');
%sound(x,fs1);
ts1=1/fs1;
N1=length(audio);
Tmax1=(N1-1)*ts1;
t1=(0:ts1:Tmax1);
figure;
plot(t1,audio),xlabel('Time'),title('Original audio');
% fs2 = (20/441)*fs1;
% y=resample(audio,2000,44100);
% %sound(y,fs2);
% ts2=1/fs2;
% N2=length(y);
% Tmax2=(N2-1)*ts2;
% t2=(0:ts2:Tmax2);
% figure;
% plot(t2,y),xlabel('Time'),title('resampled audio');
%Step 1: Pre-Emphasis
a=[1];
b=[1 -0.95];
z=filter(b,a,audio);
subplot(413),plot(t1,z),xlabel('Time'),title('Signal After High Pass Filter - Time Domain');
subplot(414),plot(fs1,fftshift(abs(fft(z)))),xlabel('Freq (Hz)'),title('Signal After High Pass Filter - Frequency Spectrum');
nchan = size(audio,2);
for chan = 1 : nchan
%subplot(1, nchan, chan)
spectrogram(y(:,chan), 256, [], 25, 2000, 'yaxis');
title( sprintf('spectrogram of resampled audio ' ) );
end
% Step 2: Frame Blocking
frameSize=1000;
% frameOverlap=128;
% frames=enframe(y,frameSize,frameOverlap);
% NumFrames=size(frames,1);
frame_duration=0.03;
frame_len = frame_duration*fs1;
framestep=0.01;
framestep_len=framestep*fs1;
% N = length (x);
num_frames =floor(N2/frame_len);
% new_sig =zeros(N,1);
% count=0;
% frame1 =x(1:frame_len);
% frame2 =x(frame_len+1:frame_len*2);
% frame3 =x(frame_len*2+1:frame_len*3);
frames=[];
for j=1:num_frames
frame=z((j-1)*framestep_len + 1: ((j-1)*framestep_len)+frame_len);
% frame=x((j-1)*frame_len +1 :frame_len*j);
% identify the silence by finding frames with max amplitude less than
% 0.025
max_val=max(frame);
if (max_val>0.025)
% count = count+1;
% new_sig((count-1)*frame_len+1:frame_len*count)=frames;
frames=[frames;frame];
end
end
% Step 3: Hamming Windowing
NumFrames=size(frames,1);
hamm=hamming(1000)';
windowed = bsxfun(@times, frames, hamm);
% Step 4: FFT
% Taking only the positive values in the FFT that is the first half of the frame after being computed.
ft = abs( fft(windowed,500, 2) );
plot(ft);
% Step 5: Mel Filterbanks
Lower_Frequency = 100;
Upper_Frequency = fs1/2;
% With a total of 22 points we can create 20 filters.
Nofilters=20;
lowhigh=[300 fs/2];
%Here logarithm is of base 'e'
lh_mel=1125*(log(1+lowhigh/700));
mel=linspace(lh_mel(1),lh_mel(2),Nofilters+2);
figure;
plot(mel);
xlabel('frequency in Hertz');ylabel('mels');
title('melscale');
melinhz=700*(exp(mel/1125)-1);
%Converting to frequency resolution
fres=floor(((frameSize)+1)*melinhz/fs2);
%Creating the filters
for m =2:length(mel)-1
for k=1:frameSize/2
if k<fres(m-1)
H(m-1,k) = 0;
elseif (k>=fres(m-1)&&k<=fres(m))
H(m-1,k)= (k-fres(m-1))/(fres(m)-fres(m-1));
elseif (k>=fres(m)&&k<=fres(m+1))
H(m-1,k)= (fres(m+1)-k)/(fres(m+1)-fres(m));
elseif k>fres(m+1)
H(m-1,k) = 0;
end
end
end
%H contains the 20 filterbanks, we now apply it to the processed signal.
for i=1:NumFrames
for j=1:Nofilters
bankans(i,j)=sum((ft(i,:).*H(j,:)).^2);
end
end
figure;
plot(bankans(i,j));
figure;
plot(H);
xlabel('Frequency');ylabel('Magnitude');
title('Mel-Frequency Filter bank');
% Step 6: Natural Log and DCT
% pkg load signal
%Here logarithm is of base '10'
logged=log10(bankans);
for i=1:NumFrames
mfcc(i,:)=dct2(logged(i,:));
end
%plotting the MFCC
figure
hold on
for i=1:NumFrames
plot(mfcc(i,1:13));
title('mfcc');
end
hold off
% save c5 mfcc
i= mfcc;
save i i
load i.mat
X=i;
k=1;
[IDXi,ci] = kmeans(X,k);
save c41i ci
The error is showing like this:
>> mfccfinal
Error using bsxfun
Non-singleton dimensions of the two input arrays must match each other.
Error in mfccfinal (line 70)
windowed = bsxfun(@times, frames, hamm);

Accepted Answer

Walter Roberson on 1 Feb 2019
Edited: Walter Roberson on 2 Feb 2019
You did not set the Hamming window size to match either the frame size or the number of frames.
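A minimal sketch of the window-size point, not necessarily the exact fix intended above: the question builds the window with hamming(1000), while frame_len is computed as frame_duration*fs1, so the two lengths agree only by coincidence. The 44100 Hz sample rate and the random stand-in frames are assumptions for illustration, and hamming requires the Signal Processing Toolbox.

```matlab
% Assumed sample rate; the real value comes from audioread.
fs1 = 44100;
frame_duration = 0.03;
frame_len = round(frame_duration * fs1);   % integer number of samples per frame

% Stand-in data (assumption): 5 frames stored one per row.
frames = rand(5, frame_len);

% Build the window from frame_len instead of hard-coding 1000.
hamm = hamming(frame_len)';                % 1-by-frame_len row vector
windowed = bsxfun(@times, frames, hamm);   % every row is scaled by the window

disp(size(windowed));
```

With the window derived from frame_len, the non-singleton dimensions seen by bsxfun match as long as frames really holds one frame per row.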
Also, your frames variable is probably a column vector. You construct frame by indexing a column vector with a row vector. When you index a vector with a vector, the result has the same orientation as the vector being indexed, which is a column vector in this case. Therefore frame is a column vector, and you vertcat those together, which gives you a column vector result.
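The orientation rule can be checked on a toy vector. This is only an illustration: z here is a made-up 10-element column standing in for the filtered signal, and frames_rows is a hypothetical name for the row-stacked alternative.

```matlab
z = (1:10)';                  % column vector, like a mono signal from audioread

frame = z(1:3);               % index is a row vector, result is still a column
disp(size(frame));

% Vertically concatenating column frames yields one long column.
frames = [];
frames = [frames; z(1:3)];
frames = [frames; z(4:6)];
disp(size(frames));

% Transposing each frame first stacks them as rows instead.
frames_rows = [];
frames_rows = [frames_rows; z(1:3)'];
frames_rows = [frames_rows; z(4:6)'];
disp(size(frames_rows));
```

Here frames ends up 6-by-1 while frames_rows is 2-by-3, which is the shape the row-wise window multiplication in the question expects.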
I recommend that you use buffer() instead of breaking up the array yourself.
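A hedged sketch of how buffer() could replace the manual framing loop, assuming a mono signal. The 16 kHz sample rate and the random signal are stand-ins; the 30 ms frame, 10 ms step, and 0.025 silence threshold are taken from the question. Note that buffer() returns one frame per column and zero-pads the last frame, and that both buffer and hamming come from the Signal Processing Toolbox.

```matlab
fs1 = 16000;                               % assumed sample rate
z = randn(fs1, 1);                         % stand-in for the pre-emphasized signal

frame_len = round(0.03 * fs1);             % 30 ms frame
step_len  = round(0.01 * fs1);             % 10 ms hop
overlap   = frame_len - step_len;          % samples shared by adjacent frames

% One frame per COLUMN; 'nodelay' starts the first frame at z(1)
% instead of prepending zeros.
frames = buffer(z, frame_len, overlap, 'nodelay');

% Drop "silent" frames whose peak does not exceed the threshold.
keep = max(frames, [], 1) > 0.025;
frames = frames(:, keep);

% hamming returns a column, which matches frames stored as columns.
hamm = hamming(frame_len);
windowed = bsxfun(@times, frames, hamm);

disp(size(windowed));
```

Because the frames are columns here, a later FFT would run along dimension 1 rather than dimension 2 as in the question.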
  4 Comments
Romody Momoto Sogavo on 7 Jun 2020
Sir, regarding this code: I tried running it on different .WAV audio files (each less than 5 seconds long), but frames came up empty. My workspace shows "frame [ ]", while hamm is a 1x1000 double. I think the problem is with frames, not hamm. If so, what should I do to rectify this?
