Fit of a function with a weighted sum
12 views (last 30 days)
Show older comments
Hi, I'm trying to fit an ACF of data (that you can find in the attached document) with this weighted sum where the coefficients are . I implemented a while cicle,to consider each time in the sum the previous fit coefficient, but is not working. Also I noticed that the intial values in the fit have a relevant role, so I tried to use the coefficients of an exponential fit as starting point in each trial.
I tried also to implement a chi-squared test to stop the fit.
load('wind_value_restricted.mat');
wind_speed=new_magnitude;
maxLag=100;
acf=autocorr(training_ws,'NumLag', maxLag);
lag=(0:maxLag)';
%evaluate the start point with an exponential fit
exp0=fit(lag,acf,'exp1');
coeff0=[coeffvalues(exp0),0];
y=0;
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
i=1;
% residual=1;
max_iterations=5;
threshold=1e-2;
iter=1;
h=1;
coeff=[0,0,0];
while h==1 && iter <= max_iterations
ft=fittype(@(w,a,b,x)w*exp(-a*x).*cos(b*x)+coeff(1)*exp(-coeff(2)*x).*cos(coeff(3)));
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
opts.StartPoint = coeff0;
f=fit(lag,acf,ft,opts);
coeff_component(i,:)=coeffvalues(f);
% coeff0=coeffvalues(f);
coeff=sum(coeff_component,1);
y=feval(f,lag);
[h, p, stats] = chi2gof(y, 'expected', acf);
i=i+1;
iter=iter+1;
end
figure
plot(lag,feval(f,lag),lag,acf)
1 Comment
Mathieu NOE
on 20 Dec 2023
hello
I am not sure to help you here but a few comments / suggestions
your data are not zero mean , so I was first looking at what is the mean signal here and I ended up founding a exponential decaying (with pulsation = 0 ) signal
so that should be the first term of your serie :
C = 2.27e+00 * e^{-2.218e-05*t}
to get to this result I needed to remove the first 7000 samples , otherwise the fit is quite wrong.
NB you have not supplied any time interval info / sampling rate so I assumed dt = 1 (unit ??)
once we have retrieved that first term,the residual signal can be analysed with fft to pick the dominant remaining frequencies. Those values could be used as IC for a (more robust) fit (I dont have the curve fitting toolbox, so that portion of work I leave it to you)
code :
load('wind_value_restricted.mat');
wind_speed=new_magnitude(6e3:end);
wind_speedS = smoothdata(wind_speed,'gaussian',1e4); % smooth heavily the data to get the DC trend (background)
% exp fit of wind_speedS (DC trend (background))
% model : y = b*exp(m*x);
t = (0:numel(wind_speed)-1)';
P = polyfit(t, log(wind_speed), 1);
m = P(1);
b = exp(P(2));
yfit = b*exp(t*m);
figure(1)
plot(t,wind_speed,t,wind_speedS, t, yfit, '--k')
TE = sprintf('C = %0.2e * e^{%0.3e*t}',b, m);
tt = text(t(round(0.75*end)), yfit(round(0.75*end))*5, TE);
set (tt,'FontSize',15)
legend('raw data','smoothed ','fit');
figure(2)
wind_speed2 = wind_speed-wind_speedS; % remove exponential DC term background
plot(t,wind_speed2)
% fft of the wind_speed2 signal
L = length(wind_speed2); % Length of signal
NFFT = 2^nextpow2(L); % Next power of 2 from length of y
Spectrum = fft(wind_speed2, NFFT)/L;
fs = 1; % as we have no time spacing value
f = fs/2*linspace(0,1,NFFT/2+1);
Spectrum = Spectrum(1:NFFT/2+1);
figure(3);
plot(f, abs(Spectrum));
title('FFT of wind speed2')
xlabel('Frequency (Hz)')
ylabel('Amplitude')
hold on
xlim([0 0.005*fs]);
ylim([0 max(abs(Spectrum))*1.5]);
% add findpeaks to display dominant frequencies
NP = 5;
[PKS,LOCS] = findpeaks(abs(Spectrum),f,'SortStr','descend','NPeaks',NP);
plot(LOCS,PKS,'dr');
for ck = 1:NP
ht = text(LOCS(ck),PKS(ck)*1.1,[sprintf('%.2e',LOCS(ck)) 'Hz']);
set (ht,'Rotation',90)
end
hold off
Answers (2)
arushi
on 27 Dec 2023
Hi Camilla,
I understand that you want to fit an ACF of data and also implement a chi-squared test to stop the fit. To fit an autocorrelation function (ACF) in MATLAB, you can use the built-in autocorr function to calculate the ACF of your data and then use the fit function from the Curve Fitting Toolbox to fit a model to the ACF data. Here are a few suggestions for the code you provided:
- Use of chi2gof: Assuming the data to be continuous in nature, the chi2gof function is meant for categorical data, not continuous data like an ACF. This function is not appropriate for the type of data you are working with.
- The fittype function inside the while loop references coeff, which is supposed to be updated in each iteration, but the way it's written, it will always use the initial value of coeff.Please make sure the values are getting updated correctly after each iteration.Also,the coeff array is initialized with zeros, which may not be a suitable starting point for the fit.
Hope this helps.
Thank you
0 Comments
Star Strider
on 31 Dec 2023
@Camilla Bandinelli — Responding to your email, first thank you for clarifying my concerns, especially with respect to the ‘training_ws’ vector. Regarding the results in the paper (that I have not read because I do not know what it is) you are attempting to reproduce, it appears that they are essentially doing something similar to the genetic algorithm, however using the goodness-of-fit to determine the fitness of a particular set of parameter estimates to the data actually determines if it is normally-distributed, not that those parameter estimates fit the data. (You can code a test to do that, however the one provided as chi2gof does not do that, at least as I read the documentation.) You can certainly use a statistical test to determine the best fir to the data, however I fail to understand the reasoning behind it. The ‘typical‘ fitness metric for a genetic algorithm (the Global Optimization Toolbox ga function) is the norm, essentially the square root of the sum of squares in this application of it. I doubt that any statistical test could improve on it with respect to determining the fittest individual. Although it would be relatively straightforward to use a statistical test as a fitness metric, it would likely slow the algorithm down considerably. So I am using the norm here.
I ran the ga code 40 times in a loop here (that limit is imposed by the 55-second time limit here, so you can expand it if you wish), and eventually came up with the fittest individual of those runs.
Try this —
load('wind_value_restricted.mat');
% whos
wind_speed=new_magnitude;
training_ws = wind_speed; % Detail Supplied By Camilla Bandinelli
maxLag=100;
acf=autocorr(training_ws,'NumLag', maxLag);
lag=(0:maxLag)';
%evaluate the start point with an exponential fit
exp0=fit(lag,acf,'exp1')
coeff0=[coeffvalues(exp0),0];
y=0;
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
i=1;
% residual=1;
max_iterations=5;
threshold=1e-2;
iter=1;
h=1;
coeff=[0,0,0];
while h==1 && iter <= max_iterations
ft=fittype(@(w,a,b,x)w*exp(-a*x).*cos(b*x)+coeff(1)*exp(-coeff(2)*x).*cos(coeff(3)));
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
opts.StartPoint = coeff0;
f=fit(lag,acf,ft,opts)
coeff_component(i,:)=coeffvalues(f);
% coeff0=coeffvalues(f);
coeff=sum(coeff_component,1);
y=feval(f,lag);
[h, p, stats] = chi2gof(y, 'expected', acf);
i=i+1;
iter=iter+1;
end
figure
plot(lag,feval(f,lag),lag,acf)
% ---------- GENETIC ALGORITHM SOLUTIONS ----------
objfcn = @(w,a,b,x)w*exp(-a*x).*cos(b*x)+coeff(1)*exp(-coeff(2)*x).*cos(coeff(3));
ftns = @(b) norm(acf - objfcn(b(1),b(2),b(3),lag));
PopSz = 500;
Parms = 3;
optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10); % Options Structure For 'Answers' Problems
tic
for kga = 1:40
t0 = clock;
fprintf('\n----------\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)
[theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],optsAns);
t1 = clock;
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)
GA_Time = etime(t1,t0)
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
fprintf('Fitness value at convergence = %.4f\nGenerations \t\t\t\t = %d\n\n',fval,output.generations)
GAdata(kga,:) = [fval theta(:).'];
ParamChr = ["w" "a" "b"];
fprintf(1,'\tParameters:\n')
for k1 = 1:length(theta)
fprintf(1, '\t\t%s = %8.5f\n', ParamChr(k1), theta(k1))
end
end
toc
format long
GAdata
[MaxFtns,idx] = min(GAdata(:,1)) % Best Fitness & Index
theta = GAdata(idx, 2:end) % Paramters: 'w', 'a', 'b'
t = lag;
c = acf;
tv = linspace(min(t), max(t));
Cfit = objfcn(theta(1),theta(2),theta(3), tv);
figure
hd = plot(t, c, 'p');
for k1 = 1:size(c,2)
CV(k1,:) = hd(k1).Color;
hd(k1).MarkerFaceColor = CV(k1,:);
end
hold on
hlp = plot(tv, Cfit);
for k1 = 1:size(c,2)
hlp(k1).Color = CV(k1,:);
end
hold off
grid
This is the best I can do with these data. The fittest individual (as defined by the norm of the rtesiduals) has a fitness value of about 2.909804496722381.
.
2 Comments
Star Strider
on 2 Jan 2024
All the parameters are adjusted individually, each on its own trajectory. I initially did not pick up on the fact that my code did not actually estimate all the necessary parameters. The objective function I wrote to use with ga needed to be changed to incorporate (and estimate) the ‘coeff’ parameters as well.
Then it works —
load('wind_value_restricted.mat');
% whos
wind_speed=new_magnitude;
training_ws = wind_speed; % Detail Supplied By Camilla Bandinelli
maxLag=100;
acf=autocorr(training_ws,'NumLag', maxLag);
lag=(0:maxLag)';
%evaluate the start point with an exponential fit
exp0=fit(lag,acf,'exp1')
coeff0=[coeffvalues(exp0),0];
y=0;
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
i=1;
% residual=1;
max_iterations=5;
threshold=1e-2;
iter=1;
h=1;
coeff=[0,0,0];
% while h==1 && iter <= max_iterations
% ft=fittype(@(w,a,b,x)w*exp(-a*x).*cos(b*x)+coeff(1)*exp(-coeff(2)*x).*cos(coeff(3)));
% opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
% opts.Display = 'Off';
% opts.StartPoint = coeff0;
% f=fit(lag,acf,ft,opts)
% coeff_component(i,:)=coeffvalues(f);
% % coeff0=coeffvalues(f);
% coeff=sum(coeff_component,1);
% y=feval(f,lag);
% [h, p, stats] = chi2gof(y, 'expected', acf);
% i=i+1;
% iter=iter+1;
% end
%
% figure
% plot(lag,feval(f,lag),lag,acf)
% ---------- GENETIC ALGORITHM SOLUTIONS ----------
objfcn = @(w,a,b,coeff1,coeff2,coeff3,x) w*exp(-a*x).*cos(b*x)+coeff1*exp(-coeff2*x).*cos(coeff3);
ftns = @(b) norm(acf - objfcn(b(1),b(2),b(3),b(4),b(5),b(6),lag));
PopSz = 500;
Parms = 6;
optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10); % Options Structure For 'Answers' Problems
tic
for kga = 1:9
t0 = clock;
fprintf('\n----------\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)
[theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],optsAns);
t1 = clock;
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)
GA_Time = etime(t1,t0)
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
fprintf('Fitness value at convergence = %.4f\nGenerations \t\t\t\t = %d\n\n',fval,output.generations)
GAdata(kga,:) = [fval theta(:).'];
ParamChr = ["w" "a" "b" "coeff(1)", "coeff(2)", "coeff(3)"];
fprintf(1,'\tParameters:\n')
for k1 = 1:length(theta)
fprintf(1, '\t\t%-8s = %8.5f\n', ParamChr(k1), theta(k1))
end
end
toc
format long
GAdata
[MaxFtns,idx] = min(GAdata(:,1)) % Best Fitness & Index
theta1 = GAdata(idx, 2:end) % Paramters: 'w', 'a', 'b'
ACFmodel = fitnlm(lag, acf, @(b,x)objfcn(b(1),b(2),b(3),b(4),b(5),b(6),x), theta) % Calculate & Return Statistics, Tweak Fit
theta2 = ACFmodel.Coefficients.Estimate
t = lag;
c = acf;
tv = linspace(min(t), max(t));
Cfit1 = objfcn(theta1(1),theta1(2),theta1(3),theta1(4),theta1(5),theta1(6), tv);
Cfit2 = objfcn(theta2(1),theta2(2),theta2(3),theta2(4),theta2(5),theta2(6), tv);
figure
hd = plot(t, c, 'p', 'DisplayName','Data');
for k1 = 1:size(c,2)
CV(k1,:) = hd(k1).Color;
hd(k1).MarkerFaceColor = CV(k1,:);
end
hold on
plot(tv, Cfit2, '-r', 'DisplayName','GA Tweaked');
hlp = plot(tv, Cfit1, 'DisplayName','GA');
for k1 = 1:size(c,2)
hlp(k1).Color = CV(k1,:);
end
hold off
grid
legend('Location','best')
I used the fitnlm function here to provide the statistics of the parameters, and also to tweak the parameters ga estimated, using a gradient descent approach. (This is commonly done, and ga actually provides a 'HybridFcn' option to do just that as part of the ga parameter estimation process. I did it separately here because I wanted the statistics.) The tweaking did not change the parameters or the fit noticably.
Except for ‘b(3)’ corresponding to ‘b’ in the original objective function, all the parameters are highly statistically significant, as is the fit.
I kept the number of iterations in the loop to 9 because of the 55-second time limit here. Increase it to get more ga runs and perhaps better parameter estimates.
.
See Also
Categories
Find more on Interpolation in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!