After importing data from Excel and plotting it, I need to subtract the noise, extract only the five peaks, and finally find the area under the peaks. Can anyone tell me what function to use?
I have attached a sample plot. The left plot was produced in MATLAB, and I need to convert it into something like the plot on the right by reducing the noise. I then need to find the area under the five peaks.
Accepted Answer
Ryan Takatsuka
on 20 Jul 2018
You can probably apply a highpass filter to the data to isolate the peaks. This should remove the low frequency/offset of the plot, while allowing the quickly changing peaks to pass through unchanged.
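As a rough illustration of the same idea without a filter-design toolbox, the sketch below subtracts a moving median from the signal. This is a swapped-in technique, not a true high-pass filter: it assumes the baseline changes slowly and that the window is several times wider than any peak. The data and the window length are made up for the example.

```matlab
% Synthetic data: a slow quadratic trend plus two narrow peaks (assumed shapes).
x = (0:0.01:20)';
baseline = 5 + 0.3 * x - 0.01 * x.^2;
peaks = 4 * exp(-(x - 6).^2 / (2 * 0.1^2)) + 4 * exp(-(x - 14).^2 / (2 * 0.1^2));
y = baseline + peaks;

% Window length in samples (hypothetical): several times the peak width,
% so that the median follows the baseline rather than the peaks.
window = 301;

% Estimate the slowly varying part and subtract it.
trend = movmedian(y, window);
y_flat = y - trend;

% The peaks now sit on a roughly zero baseline. Values near the two ends
% of the record are less reliable because the window is truncated there.
plot(x, y_flat)
grid on
```

A wider window follows the baseline more smoothly but reacts more slowly to real baseline changes, so the window length usually needs some tuning against the actual data.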
Alternatively, you can locate the peaks of the data with something like:
[pks, locs] = findpeaks(data);
Because the peaks seem to have a consistent width, you can divide the data into small "subsections" and plot each individual subsection.
To find the area under the curve, you can use a trapezoidal approximation using one of the following:
cumtrapz();
trapz();
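To show the difference between the two functions, here is a small sketch on a synthetic Gaussian peak whose area is known analytically. The peak height, width, and sampling are assumptions chosen for the example.

```matlab
% Synthetic single peak: a Gaussian with a known analytic area.
x = linspace(0, 10, 2001)';      % sample positions (column vector)
sigma = 0.4;                     % assumed peak width parameter
height = 3;                      % assumed peak height
y = height * exp(-(x - 5).^2 / (2 * sigma^2));

% trapz returns a single number: the area over the whole x range.
total_area = trapz(x, y);

% cumtrapz returns the running integral, one value per sample,
% so its last element equals the trapz result.
running_area = cumtrapz(x, y);

% Analytic area of a Gaussian: height * sigma * sqrt(2*pi).
expected_area = height * sigma * sqrt(2 * pi);
fprintf('trapz: %.5f, analytic: %.5f\n', total_area, expected_area);
```

Use trapz when only the final area is needed, and cumtrapz when you want to see how the area builds up along x.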
19 Comments
Joswin Leslie
on 20 Jul 2018
data = xlsread('P540.xlsx','Line1-5','B1:C9001');
x = data(:,1);
y = data(:,2);
plot(x,y,'r')
I used this function to import data and plot in MATLAB. I tried your code and I'm getting an error.
Also, my peaks will not be always of consistent width. I need to locate where the peaks are, and find the area under these peaks.
Can you please tell me why the error occurs?
Ryan Takatsuka
on 20 Jul 2018
It's not super clean, but this seems to be a successful way of isolating the peaks in the data.
Essentially there are a few steps:
- Identify the trend in the data and subtract it out
- Find the peaks by detecting when the signal goes above a certain threshold. Assuming 'relatively' similar peak widths, the peaks can be extracted. This should be an okay assumption because the data is shifted to a y=0 mean.
- Numerically integrate the data
Hopefully this helps:
%%Import the data
data = xlsread('P540.xlsx', 'Line1-5', 'B1:C9001');
x = data(:,1);
y = data(:,2);
%%Get the trend
% Find the derivative of the signal
dy_dx = diff(y) ./ diff(x); % derivative
% append a value to the end of the vector so the lengths remain the same
dy_dx = [dy_dx; dy_dx(end)].^2;
% Find the points where the derivative is high (these are points where the signal is changing
% significantly, and should not be used to calculate the trend)
ind = find(dy_dx>2e9);
% Get new x and y variables without the high derivative points
x_new = x;
y_new = y;
x_new(ind) = [];
y_new(ind) = [];
% Fit a polynomial to the data (This defines the trend)
p = polyfit(x_new, y_new, 2);
y2 = polyval(p, x);
y2 = y-y2;
%%Find the peak
MORE_PEAKS = true;
peak_end_ind = 1;
k = 1;
while MORE_PEAKS
    peak_start_ind = find(y2(peak_end_ind:end)>500, 1) - 50 + peak_end_ind;
    peak_end_ind = peak_start_ind + 400;
    i_peak_start(k) = peak_start_ind;
    i_peak_end(k) = peak_end_ind;
    if isempty(find(y2(peak_end_ind:end)>500,1))
        MORE_PEAKS = false;
    end
    k = k+1;
end
%%Create the clean version of the vectors
y_clean = zeros(size(y2));
for i=1:length(i_peak_start)
    y_clean(i_peak_start(i):i_peak_end(i)) = y2(i_peak_start(i):i_peak_end(i)) - y2(i_peak_start(i));
end
%%Calculate integral
y_int = cumtrapz(x, y_clean);
fprintf('The total area under the peaks: %0.5f \n', y_int(end))
%%PLOTS
figure
hold on
plot(x, y)
title('Original Data')
grid on
figure
plot(x, y_clean)
title('Peak-Only Data')
grid on
figure
plot(x, y_int)
title('Integrated-peak Data')
grid on
Joswin Leslie
on 20 Jul 2018
Thank you! This code really helped me. Instead of finding the total area under the curves, how can I find the area under each peak?
Image Analyst
on 20 Jul 2018
Edited: Image Analyst
on 20 Jul 2018
If you have the Image Processing Toolbox it's trivial. Just sum the values of the array in each peak. Untested code:
binarySignal = y_clean > someSmallValue; % Threshold.
props = regionprops(binarySignal, y_clean, 'PixelValues');
for k = 1 : length(props)
    peakAreas(k) = sum(props(k).PixelValues); % Sum only the samples belonging to peak k.
end
By the way, I work daily with several spectroscopists doing this sort of signal and image processing.
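If the Image Processing Toolbox is not available, roughly the same grouping can be sketched in base MATLAB by looking for rising and falling edges in the thresholded signal. This also copes with peaks of different widths. The signal and the threshold below are made up for the example, and trapz is used so that the x spacing is taken into account.

```matlab
% Synthetic baseline-corrected signal with two peaks of different widths.
x = (0:0.01:10)';
y_clean = 2 * exp(-(x - 3).^2 / (2 * 0.2^2)) + 1 * exp(-(x - 7).^2 / (2 * 0.5^2));

threshold = 0.05;                 % hypothetical threshold
above = y_clean > threshold;      % logical mask of samples inside a peak

% Pad with zeros so peaks touching either end are still closed off,
% then look for 0 -> 1 (start) and 1 -> 0 (end) transitions.
edges = diff([0; above; 0]);
run_start = find(edges == 1);
run_end = find(edges == -1) - 1;

% Integrate each run separately.
peak_areas = zeros(numel(run_start), 1);
for k = 1:numel(run_start)
    idx = run_start(k):run_end(k);
    peak_areas(k) = trapz(x(idx), y_clean(idx));
end
disp(peak_areas)
```

With noisy data a single peak can cross the threshold several times and be split into several runs, so some smoothing or a minimum run length may be needed first.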
Ryan Takatsuka
on 21 Jul 2018
Because the peaks have already been separated, you can find the area under each peak by looping through all the peak start values (i_peak_start)
for i=1:length(i_peak_start)
    peak_areas(i) = trapz(x(i_peak_start(i):i_peak_end(i)), y_clean(i_peak_start(i):i_peak_end(i)));
end
This should create a vector, peak_areas, that contains the area under each peak.
Joswin Leslie
on 23 Jul 2018
Thanks Ryan. The above code really helped me to find the area of five peaks. I have one more problem.
I have attached the Excel sheet. I was able to specify the number of rows in the code, but that number depends on how many rows have the value "0" in column A, and it will not always be constant. Similarly, I need to plot 9 more graphs for the values in column A ranging from 1 to 9. I need to find the area of all the peaks in all 10 plots.
In each of these 10 plots, there will be five peaks. I also need to find the average of the 1st peak, the 2nd peak, and so on up to the 5th peak.
Ryan Takatsuka
on 23 Jul 2018
You can import the entire Excel spreadsheet, and then split the data up based on the value in column 1. The following should replace the first 4 lines of the original example code I provided. Essentially, it puts all of the data in a cell array, with each cell equal to one graph. By changing the value in graph_number different parts of the file will be plotted and analyzed. To fully automate this, the entire code can be placed in a for loop that iterates through each cell in the data variable.
%%Import the data
raw_data = xlsread('P540.xlsx', 'Line1-5');
% Find the row number where the value in column 1 changes
for i=1:9
    new_graph_index(i+1) = find(raw_data(:,1)==i,1);
end
% manually add the row number for the beginning of the data, and one past the
% last row so that the final graph keeps its last data point
new_graph_index(1) = 1;
new_graph_index(11) = size(raw_data,1) + 1;
% Split the raw data into a cell array for each graph
for k = 1:length(new_graph_index)-1
    data{k} = raw_data(new_graph_index(k):new_graph_index(k+1)-1,2:3);
end
% Select which graph to analyze
graph_number = 5;
% Set the x and y variables from the cell array
x = data{graph_number}(:,1);
y = data{graph_number}(:,2);
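The for loop over the cell array could look roughly like the sketch below, which also averages each peak's area across the graphs. It is only an outline under stated assumptions: the cell array is filled with synthetic data, and the detrending and peak-finding steps are replaced by a fixed window around hypothetical known peak centers (centers). In practice, the trend removal and peak detection from the earlier code would go in their place.

```matlab
% Synthetic stand-in for the cell array "data": 3 graphs with 5 peaks each.
centers = [2 4 6 8 10];            % hypothetical, known peak positions
n_graphs = 3;
x = (0:0.01:12)';
data = cell(1, n_graphs);
for g = 1:n_graphs
    y = zeros(size(x));
    for c = centers
        y = y + g * exp(-(x - c).^2 / (2 * 0.1^2));   % peak height grows with g
    end
    data{g} = [x, y];
end

% One row per graph, one column per peak.
areas = zeros(n_graphs, numel(centers));
for g = 1:n_graphs
    xg = data{g}(:, 1);
    yg = data{g}(:, 2);
    for k = 1:numel(centers)
        % Placeholder for the real peak detection: a fixed window around each center.
        in_peak = abs(xg - centers(k)) <= 0.5;
        areas(g, k) = trapz(xg(in_peak), yg(in_peak));
    end
end

% Average area of the 1st, 2nd, ... 5th peak across all graphs.
mean_areas = mean(areas, 1);
disp(mean_areas)
```

Storing the areas as a graphs-by-peaks matrix keeps the averaging to a single call: mean(areas, 1) averages down each column, that is, over the graphs for each peak.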
Joswin Leslie
on 24 Jul 2018
Thanks. But I don't necessarily need to plot the graphs (plotting is also fine). I need to find the area of all the peaks from all graphs and automate this. Then I need to find the average area of the peaks. For example, in this plot there are 5 peaks in each graph. I need to find the average area of the first peak across all 10 plots, and similarly for the other 4 peaks.
Joswin Leslie
on 9 Aug 2018
Hey Ryan! I was trying this code on a different file. I have attached the Excel file. I am getting an error on the line "data{k} = raw_data(new_graph_index(k):new_graph_index(k+1)-1,2:3);".
Can you please tell me why this error occurs? I have attached my complete code.
Ryan Takatsuka
on 9 Aug 2018
This is because it is trying to access the index "0" in the vector new_graph_index. Change the way the last point in the variable is calculated (lines 8-14) using something like this:
% Find the row number where the value in column 1 changes
for i=1:4
    new_graph_index(i+1) = find(raw_data(:,1)==i,1);
end
% manually add the row number for the beginning of the data, and one past the
% last row so that the final graph keeps its last data point
new_graph_index(1) = 1;
new_graph_index(end+1) = size(raw_data,1) + 1;
This solves the index problem, but there seems to be a problem with the data itself. For example, look at line 958 in the excel file. It looks like this data point is 10000X bigger than the rest of the data, causing the script to fail when analyzing it.
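One way to flag isolated points like that before analysis is a median-based outlier test. The sketch below is only an illustration on made-up numbers: it marks samples more than three scaled median absolute deviations from the median (the factor of 3 is an assumption) and replaces them by linear interpolation between their neighbors. Whether replacing the points is acceptable depends on what the data represents.

```matlab
% Synthetic column with one value roughly 10000 times larger than the rest.
y = [10 12 11 13 12 1.2e5 11 10 12 13]';

% Scaled median absolute deviation (1.4826 makes it comparable to a
% standard deviation for normally distributed data).
med = median(y);
scaled_mad = 1.4826 * median(abs(y - med));

% Flag points far from the median.
is_bad = abs(y - med) > 3 * scaled_mad;

% Replace the flagged points by interpolating between the good neighbors.
good = find(~is_bad);
y_fixed = y;
y_fixed(is_bad) = interp1(good, y(good), find(is_bad), 'linear', 'extrap');

fprintf('Flagged rows: %s\n', mat2str(find(is_bad)'));
```

This only makes sense for a few isolated bad points. A long stretch of bad values cannot be reconstructed this way.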
Joswin Leslie
on 9 Aug 2018
I'm sorry. I believe I attached the wrong excel file. I am attaching the new file now. I'm getting the following error:
Warning: Polynomial is badly conditioned. Add points with distinct X values, reduce the degree of the polynomial, or try centering and scaling as described in HELP POLYFIT.
> In polyfit (line 79)
In samples (line 45)
Array indices must be positive integers or logical values.
Error in samples (line 70)
y_clean(i_peak_start(i):i_peak_end(i)) = y2(i_peak_start(i):i_peak_end(i)) - y2(i_peak_start(i));
Can you please help me with this?
Ryan Takatsuka
on 10 Aug 2018
In the distance column, around lines 2597-2738, the numbers get extremely large.
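Large x values are also what trigger the "badly conditioned" warning from polyfit. The centering and scaling that the warning suggests can be sketched as below, on synthetic data with an assumed large offset in x. Note that this only addresses the conditioning of the fit; it does not repair data points that are themselves wrong.

```matlab
% Synthetic data with very large x values and a quadratic trend.
x = linspace(1e6, 1e6 + 1e3, 200)';
y = 3 + 2e-3 * (x - 1e6) + 5e-6 * (x - 1e6).^2;

% Requesting the third output makes polyfit fit against the centered and
% scaled variable (x - mu(1)) / mu(2), where mu = [mean(x); std(x)].
[p, ~, mu] = polyfit(x, y, 2);

% The same mu must be passed to polyval when evaluating the fit.
trend = polyval(p, x, [], mu);
y_detrended = y - trend;

fprintf('Largest residual after detrending: %g\n', max(abs(y_detrended)));
```

The coefficients in p then refer to the scaled variable, not to x itself, so they should only be used together with mu.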
Joswin Leslie
on 10 Aug 2018
They will. Is there any way possible to process it with these large numbers?
Ryan Takatsuka
on 10 Aug 2018
It'll be difficult to get any useful area information without a clean height vs. distance plot (it's hard to find the area under the curve without a reasonable looking curve).