Find peaks from noisy data

I am trying to find the method of locating the gaussian peaks.
I used findpeaks function to find peaks. The solid line is created by sgolayfilt.
hgcs = sgolayfilt(hgc, 10, 41);
findpeaks(hgc, 'MinPeakDistance', 20)
Question1: Are there any functions or algorithms that can determine the number of peaks and the locations?
Question2: After 160, there are small features. Some of them may be still peaks following the first four peaks. At least, we know that the locations of peaks are quasi-periodic. Could we also find more peaks?

 Accepted Answer

I gave up on estimating the width parameter, and just went for the amplitude and location with a uniform width.
Try this:
D = load('hgc.mat');
hgc = D.hgc;
x = linspace(0, numel(hgc), numel(hgc));
hgcs = sgolayfilt(hgc, 10, 41);
figure
plot(x,hgc)
hold on
plot(x,hgcs)
hold off
grid
[pks,locs,w] = findpeaks(hgc, 'MinPeakDistance', 20, 'MinPeakProminence',5);
gausfcn = @(b,x) b(1).*exp(-(x-b(2)).^2 * 0.05);
for k = 1:numel(locs)
B(:,k) = lsqcurvefit(gausfcn, [pks(k); x(locs(k))], x, hgcs);
end
figure
plot(x,hgc)
hold on
for k = 1:numel(locs)
plot(x, gausfcn(B(:,k),x), 'LineWidth',2)
end
hold off
grid
.

8 Comments

An even beter filtering approach would be to use a lowpass filter:
D = load('hgc.mat');
hgc = D.hgc;
x = linspace(0, numel(hgc), numel(hgc));
L = numel(hgc);
Fs = 1;
Fn = Fs/2;
FThgc = fft(hgc - mean(hgc))/L;
Fv = linspace(0, 1, fix(L/2)+1)*Fn;
Iv = 1:numel(Fv);
figure
plot(Fv, abs(FThgc(Iv))*2)
grid
title('Fourier Transform')
xlabel('Frequency')
ylabel('Amplitude')
xlim([0 0.1])
hgcs = lowpass(hgc, 0.055, Fs);
figure
plot(x,hgc)
hold on
plot(x,hgcs, 'LineWidth',1.5)
hold off
grid
[pks,locs,w] = findpeaks(hgcs, 'MinPeakDistance', 20, 'MinPeakProminence',1.5);
gausfcn = @(b,x) b(1).*exp(-(x-b(2)).^2 * 0.05);
for k = 1:numel(locs)
B(:,k) = lsqcurvefit(gausfcn, [pks(k); x(locs(k))], x, hgcs);
end
figure
plot(x,hgc)
hold on
for k = 1:numel(locs)
plot(x, gausfcn(B(:,k),x), 'LineWidth',2)
end
hold off
grid
I chose the cutoff frequency for the filter on the basis of the Fourier transform. (Subtracting the mean first makes the other peaks more easily visible.) I still have no way of fitting the widths correctly, since even the filtered the signal is a bit difficult to work with.
EDIT — (16 Feb 2021 at 15:38)
A slightly different approach:
D = load('hgc.mat');
hgc = D.hgc;
x = linspace(0, numel(hgc), numel(hgc));
L = numel(hgc);
Fs = 1;
Fn = Fs/2;
FThgc = fft(hgc - mean(hgc))/L;
Fv = linspace(0, 1, fix(L/2)+1)*Fn;
Iv = 1:numel(Fv);
figure
plot(Fv, abs(FThgc(Iv))*2)
grid
title('Fourier Transform')
xlabel('Frequency')
ylabel('Amplitude')
xlim([0 0.1])
hgcs = lowpass(hgc, 0.055, Fs);
figure
plot(x,hgc)
hold on
plot(x,hgcs, 'LineWidth',1.5)
hold off
grid
[pks,locs,w] = findpeaks(hgcs, 'MinPeakDistance', 20, 'MinPeakProminence',1.5);
gausfcn = @(b,x) b(1).*exp(-(x-b(2)).^2 * b(3));
hf = figure;
nl = numel(locs);
frm = 15;
for k = 1:nl
subplot(nl,1,k)
idxrng = max(1,locs(k)-frm):min(numel(hgcs),locs(k)+frm);
B(:,k) = lsqcurvefit(gausfcn, [hgcs(k); x(locs(k)); 5E-3], x(idxrng), hgcs(idxrng)-min(hgcs(idxrng)),zeros(1,3));
plot(x(idxrng), hgcs(idxrng))
hold on
plot(x(idxrng), gausfcn(B(:,k),x(idxrng))+min(hgcs(idxrng)), 'LineWidth',2)
hold off
grid
GausAUC(k) = trapz(x(idxrng),gausfcn(B(:,k),x(idxrng)));
title(sprintf('AUC = %.3f',GausAUC(k)))
text(B(2,k), min(hgcs(idxrng)), sprintf('$y = %5.2f e^{\\frac{-(x - %6.2f)^2}{%5.2f}}$', B(1:2,k),1/B(3,k)), 'HorizontalAlignment','center', 'VerticalAlignment','bottom', 'Interpreter','latex', 'FontSize',14)
end
outpos = hf.OuterPosition;
hf.OuterPosition = outpos + [0 -800 0 800];
That has the advantage of being able to fit every peak, including the widths. It also computes the peak area for the index range (‘idxrng’) for each peak as plotted in the last figure.
That’s easy enough to do.
Add this assignment:
TFidx = find(islocalmin(abs(FThgc(Iv))*2, 'FlatSelection','first'));
and change the lowpass call to:
hgcs = lowpass(hgc, Fv(TFidx(1)), Fs);
so the full code is now:
D = load('hgc.mat');
hgc = D.hgc;
x = linspace(0, numel(hgc), numel(hgc));
L = numel(hgc);
Fs = 1;
Fn = Fs/2;
FThgc = fft(hgc - mean(hgc))/L;
Fv = linspace(0, 1, fix(L/2)+1)*Fn;
Iv = 1:numel(Fv);
TFidx = find(islocalmin(abs(FThgc(Iv))*2, 'FlatSelection','first'));
figure
plot(Fv, abs(FThgc(Iv))*2)
hold on
plot(Fv(TFidx), abs(FThgc(Iv(TFidx)))*2, 'p')
hold off
grid
title('Fourier Transform')
xlabel('Frequency')
ylabel('Amplitude')
xlim([0 0.1])
hgcs = lowpass(hgc, Fv(TFidx(1)), Fs);
figure
plot(x,hgc)
hold on
plot(x,hgcs, 'LineWidth',1.5)
hold off
grid
[pks,locs,w] = findpeaks(hgcs, 'MinPeakDistance', 20, 'MinPeakProminence',1.5);
gausfcn = @(b,x) b(1).*exp(-(x-b(2)).^2 * b(3));
hf = figure;
nl = numel(locs);
frm = 15;
for k = 1:nl
subplot(nl,1,k)
idxrng = max(1,locs(k)-frm):min(numel(hgcs),locs(k)+frm);
B(:,k) = lsqcurvefit(gausfcn, [hgcs(k); x(locs(k)); 5E-3], x(idxrng), hgcs(idxrng)-min(hgcs(idxrng)),zeros(1,3));
plot(x(idxrng), hgcs(idxrng))
hold on
plot(x(idxrng), gausfcn(B(:,k),x(idxrng))+min(hgcs(idxrng)), 'LineWidth',2)
hold off
grid
GausAUC(k) = trapz(x(idxrng),gausfcn(B(:,k),x(idxrng)));
title(sprintf('AUC = %.3f',GausAUC(k)))
text(B(2,k), min(hgcs(idxrng)), sprintf('$y = %5.2f e^{\\frac{-(x - %6.2f)^2}{%5.2f}}$', B(1:2,k),1/B(3,k)), 'HorizontalAlignment','center', 'VerticalAlignment','bottom', 'Interpreter','latex', 'FontSize',14)
end
outpos = hf.OuterPosition;
hf.OuterPosition = outpos + [0 -800 0 800];
This has somewhat improved results.
Note that this is highly dependent on the Fourier transform result. It may not work as desired for every data set, if others are significantly different from this one.
Thank you!
Reset the 'MinPeakProminence' value to get additional peaks:
[pks,locs,w] = findpeaks(hgcs, 'MinPeakDistance', 20, 'MinPeakProminence',0.15);
and change the frame value to get a different fit window:
frm = 10;
These are both necessary for the second data set.
Experiment with them to get different results.
This approach appears to work reasonably well.
With respect to findpeaks, one of the outputs can be prominence, so perhaps using sort (with two outputs, the second being the index into the original vector, and the 'descend' option) could be appropriate with respect to identifying a specifc number of peaks and their prominences.
With respect to filtering, using the first minimum of the Fourier transform result would likely be appropriate, as well.
Another possibility would be to create a function from my code, then return whatever you want from it. At least one argument would be the file name.
Beyond that, I have no specific suggestions at present.
That is not what it is designed to do. It is intended at this point to remove noise.
Filters assume periodic data, since they are based on the discrete Fourier transform. To remove specific peaks, it would likely be necessary to use a second filter with specific characteristics. It would likely not simply remove that peak, and would probably remove others as well.
This could be where the ‘sorting-by-prominence’ approach I described could be beneficial, That smaller peak would have a significantly lower prominence that the peaks on either side of it, and its index would likely not be returned in the output of the sorting step. (Note that this would be indexing into the ‘locs’ vector, so it would be a somewhat more complicated process.)
If you were to do something like this:
[pks,locs,w,prom] = findpeaks(hgcs, 'MinPeakDistance', 20, 'MinPeakProminence',0.15);
[sprom,idx] = sort(prom, 'descend');
the peak at 81 wouild be the fourth peak returned in the ‘idx’ vector. That might still not do what you want, however it would be a start on devising an appropriate approach.
I have no idea what that represents.
In my code, if you want to enlarge ‘idxrng’ to include whatever values you want, and you know approximately where the peaks should be, use those peak indices, and change ‘frm’ (the framelength for the parameter estimation and to fit the Gaussian function) to be whatever you want it to be in order to include the appropriate values.
photoon
photoon on 17 Feb 2021
Edited: photoon on 25 Aug 2021
For the fitting, this was closer to what I was thinking about.
nl = numel(locs);
gnl = strcat('gauss', num2str(nl));
gaoptions = fitoptions(gnl);
gaoptions.Upper = [];
Thanks,
I don’t have the Curve Fitting Toolbox (I don’t need it for what I do), so I can’t run that code snippet, and I can’t help you with it. (However I essentially understand what it’s doing.)
The only other suggestion I have with respect to identifying the peaks is to detrend the filtered signal. That would perform an additional sort of highpass filtering on the filtered signal, eliminating the baseline variations, and could make it easier to isolate the peaks you want.

Sign in to comment.

More Answers (1)

Image Analyst
Image Analyst on 16 Feb 2021
I'm attaching a demo where you can specify how many Gaussians are in your signal and it will find them. If you don't know how many there are then you can maybe specify more and take the 4 with the largest area under the curve. Adap the demo to use 4 instead of 6, and to read in your actual data instead of using made up data.

5 Comments

Hi Image Analyst,
smoothdata(hgc, 'movmean') closely shows the number of gaussian peaks in terms of my eyes. Are there any functions that can determine the number and locations?
If you have nice peaks already, then you can use findpeaks() in the Signal Processing Toolbox. It's kind of complicated but with the right selection of parameters you should be able to home in on the peaks you want.
You might want to use sgolayfilt() instead of movmean(). Use a polynomial order of 2 then try narrowing and widening the framelength. Wider gives more smoothing and narrower will let it follow your data more closely. The Saivtzky-Golay filter is like movmean() but it fits the data to a sliding polynomial instead of a line (which is what movmean does by taking the average). This will keep the peaks sharp and not blunt their tops or bottoms.
I tried sgolayfilt and movmean. But it turns out that lowpass combined with FFT works best for identifying the peaks of interest.
If using a linear filter like movmean() or fft() followed by a low pass filter, works better than a non-linear filter like sgolayfilt() that's fine (though I'm surprised).
But you still need to find the locations, regardless of what method you use to smooth the signal. So you can either use findpeaks(), or my Gaussian fitting code. findpeaks() will give you every peak and you don't need to know how many there are in advance, while the Guassian fitting code requires you to specify how many Gaussians your signal should be fit to.
findpeaks() is what I am using. However, I have to change parameters for each data set in order to choose the peaks of interest. Are there better ways of making this autonomous?

Sign in to comment.

Asked:

on 15 Feb 2021

Edited:

on 25 Aug 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!