How can I restrict data arrays to do a linear regression between 2 points?
Lexie Wilson
on 19 Jul 2016
Commented: Star Strider
on 20 Jul 2016
I would like to do a linear regression using polyfit, but only on part of the dataset. I have 2 arrays, Wavelength (x axis) and Flux (y axis). I would like to regress the data in the range Wavelength > 1515 & Wavelength < 1750, and then find the slope of the trend line through the fluxes (y values) in this range. I do not know how to restrict my data set in this way (without importing the data again!). I tried rescaling my axes, but the polyfit function still considered all values in my dataset.
Here is what I have so far:
%%Initialize variables.
filename = '/Users/lexiwilson/Documents/SURF/DataIrradiance/DEC/WSD_26DEC/WAIS1226201500166.asd.irr.pco.txt';
delimiter = {'\t',' '};
startRow = 39;
datetime = strcat('/Users/lexiwilson/Documents/SURF/DataIrradiance/DEC/WSD_26DEC/','122615_','00:33:56');
%%Read columns of data as strings:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%s%s%[^\n\r]';
%%Open the text file.
fileID = fopen(filename,'r');
%%Read columns of data according to format string.
% This call is based on the structure of the file used to generate this
% code. If an error occurs for a different file, try regenerating the code
% from the Import Tool.
textscan(fileID, '%[^\n\r]', startRow-1, 'ReturnOnError', false);
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'ReturnOnError', false);
%%Close the text file.
fclose(fileID);
%%Convert the contents of columns containing numeric strings to numbers.
% Replace non-numeric strings with NaN.
raw = [dataArray{:,1:end-1}];
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2]
    % Converts strings in the input cell array to numbers. Replaced non-numeric
    % strings with NaN.
    rawData = dataArray{col};
    for row=1:size(rawData, 1);
        % Create a regular expression to detect and remove non-numeric prefixes and
        % suffixes.
        regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
        try
            result = regexp(rawData{row}, regexstr, 'names');
            numbers = result.numbers;
            % Detected commas in non-thousand locations.
            invalidThousandsSeparator = false;
            if any(numbers==',');
                thousandsRegExp = '^\d+?(\,\d{3})*\.{0,1}\d*$';
                if isempty(regexp(thousandsRegExp, ',', 'once'));
                    numbers = NaN;
                    invalidThousandsSeparator = true;
                end
            end
            % Convert numeric strings to numbers.
            if ~invalidThousandsSeparator;
                numbers = textscan(strrep(numbers, ',', ''), '%f');
                numericData(row, col) = numbers{1};
                raw{row, col} = numbers{1};
            end
        catch me
        end
    end
end
%%Replace non-numeric cells with NaN
R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),raw); % Find non-numeric cells
raw(R) = {NaN}; % Replace non-numeric cells
%%Allocate imported array to column variable names
Wavelength = cell2mat(raw(:, 1));
Flux = cell2mat(raw(:, 2));
%%Plot wavelength vs irradiance
plot(Wavelength, Flux);
xlabel('Wavelength (nm)');
ylabel('Irradiance (W/m^2)');
%zoom to 1.6 micron window
plot(Wavelength, Flux);
xlabel('Wavelength (nm)');
ylabel('Irradiance (W/m^2)');
Ystartindx = find(Wavelength == 1515); %index of wavelength = 1515nm
Ystart = Flux(Ystartindx); %corresponding flux
Yendindx = find(Wavelength == 1750); %index of wavelength = 1750nm
Yend = Flux(Yendindx);%corresponding flux
hold on;
%make linear fit and print slope to console
waverange = find(Wavelength > 1515 & Wavelength < 1750);
fluxrange = find(Flux > Ystart & Flux < Yend);
P = polyfit(waverange,fluxrange,1);
fit = P(1)*waverange + P(2);
disp(P(1)); %print slope to console
%save plot in directory as jpeg
%%Clear temporary variables
clearvars filename delimiter startRow formatSpec fileID dataArray ans raw numericData col rawData row regexstr result numbers invalidThousandsSeparator thousandsRegExp me R;
The errors I get say that my arrays waverange and fluxrange are not the same size (which they aren't). How can I make them the same size, and restrict the X and Y values to a range in the middle of my data set?
Accepted Answer
Star Strider
on 20 Jul 2016
The waverange seems to be defining your data range, so use it for both, and use polyval to evaluate the fit:
%make linear fit and print slope to console
waverange = find(Wavelength > 1515 & Wavelength < 1750);
P = polyfit(Wavelength(waverange),Flux(waverange),1);
fit = polyval(P, Wavelength(waverange)); % MATLAB is case-sensitive: use P, as returned by polyfit
See if that works.
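For reference, here is a minimal self-contained sketch of the same idea. It uses a logical mask instead of find and made-up data, so it runs without the original file. The synthetic Wavelength and Flux values and the names inWindow, xFit, yFit and fitLine are assumptions for illustration only. The isnan test is also an assumption: it is there because the import code replaces non-numeric cells with NaN, and polyfit returns NaN coefficients if any input is NaN.

```matlab
% Synthetic stand-in data (hypothetical; replace with your own Wavelength and Flux).
Wavelength = (350:2500)';              % nm, column vector
Flux = 0.5 - 2e-4 * Wavelength;        % made-up linear trend
Flux(Wavelength < 1400) = 0.3;         % made-up plateau outside the window

% One logical mask selects the fitting window. Indexing both arrays with the
% same mask guarantees that x and y have the same length.
inWindow = Wavelength > 1515 & Wavelength < 1750 & ~isnan(Flux);
xFit = Wavelength(inWindow);
yFit = Flux(inWindow);

% First-degree fit: P(1) is the slope, P(2) is the intercept.
P = polyfit(xFit, yFit, 1);
fitLine = polyval(P, xFit);

fprintf('Slope: %g (flux units per nm)\n', P(1));

% Overlay the fitted segment on the full spectrum.
plot(Wavelength, Flux);
hold on;
plot(xFit, fitLine, 'r', 'LineWidth', 2);
hold off;
```

The original attempt failed because fluxrange was built from a separate condition on Flux, so it selected a different set of indices than waverange. Also note that both were index vectors from find, not the data values themselves. Naming the result fitLine rather than fit avoids shadowing the Curve Fitting Toolbox function fit, if that toolbox is installed.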