Parameter estimation of SEIHRDBP model

% I am back please. I have gotten the data correct data with date. The whole idea is that I want the parameter estimation for parameters for SEI. I.e., run the parameter estimation these classes (SEI). These are only the active human class. The parameters associated with classes are theta16, theta17, theta18, theta19, theta20, theta3, theta7 and theta8.
% I tried to run the new code use the "ga" algorithm but I got the error:
% Arrays have incompatible sizes for this operation.
% Arrays have incompatible sizes for this operation.
%
% Error in untitled10>SEIHRDBP_MODEL_PARAMETER (line 61)
% opts = optimoptions('ga','PopulationSize',PopSz,'InitialPopulationMatrix',randi(1E+4,PopSz,Parameter).*[ones(1,12)*1E-3 ones(1,7)*10],'MaxGenerations',2E3)%,'PlotFcn','gaplotbestf');
%
% Error in untitled10 (line 2)
% SEIHRDBP_MODEL_PARAMETER
% In fact, this is my new corrected codes and new data is attached.
SEIHRDBP_MODEL_PARAMETER
function SEIHRDBP_MODEL_PARAMETER
%7 ODEs to describe the model (S,E,I,H,R,D,B, P)
function M = SEIHRDBP(p,t)
%Initial conditions with population
M0 = p(21:28);
[~,m] = ode15s(@DifEq,t,M0);
function dM = DifEq(~,k)
dMdt = zeros(8,1);
dMdt(1) = p(1) - (p(16)*k(3) + p(17)*k(4) + p(18)*k(6)+ p(19)*k(7) + p(20)*k(8))*k(1)-p(2)*k(1);
dMdt(2) = (p(16)*k(3) + p(17)*k(4) + p(18)*k(6)+ p(19)*k(7) + p(20)*k(8))*k(1) - (p(2) + p(3))*k(2);
dMdt(3) = p(3)*k(2) - (p(2) + p(4) + p(5))*k(1);
dMdt(4) = p(5)*k(3) - (p(6) + p(7) + p(2))*k(4);
dMdt(5) = p(7)*k(4) - p(2)*k(5);
dMdt(6) = p(4)*k(3) + p(6)*k(4) -p(8)*k(6);
dMdt(7) = p(8)*k(6) - p(9)*k(7);
dMdt(8) = p(10) + p(14)*k(3) + p(15)*k(4) + p(11)*k(6) + p(12)*k(7) -p(13)*k(8);
dM = dMdt;
end
M = m;
end
%From day Aug to nov 20
t = [1
2
3
4
5
6
7
8
9
10];
%Official data for I, D for 10 days
k = [24 23
62 35
68 41
70 42
70 42
70 43
68 49
67 49
67 49
66 49];
% [theta,Rsdnrm,Rsd,ExFlg,OptmInfo,Lmda,Jmat]=lsqcurvefit(@kinetics,theta0,t,c);
function fv = ftns(p,t)
SEIHRDBPout = SEIHRDBP(p,t);
fv = norm(k-SEIHRDBPout(:,[3 6]));
end
% ftns = @(p) norm(k-EIDAGG2(p,t));
PopSz = 50;
% Parameter = 12;
Parameter = 28;
opts = optimoptions('ga','PopulationSize',PopSz,'InitialPopulationMatrix',randi(1E+4,PopSz,Parameter).*[ones(1,20)*1E-3 ones(1,8)*10],'MaxGenerations',2E3)%,'PlotFcn','gaplotbestf');
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n',t0)
[p,fval,exitflag,output] = ga(@(p)ftns(p,t),Parameter,[],[],[],[],zeros(Parameter,1),[ones(20,1); Inf(8,1)],[],[],opts);
t1 = clock;
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n',t1)
GA_Time = etime(t1,t0);
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
fprintf(1,'\n\tRate Constants:\n')
for k1 = 1:12
fprintf(1,'\t\tp(%2d) = %11.8f\n',k1,p(k1))
end
fprintf(['\n\tInitial Conditions:\n'])
for k1 = 1:8
fprintf('\t\tdMdt(%d) = %11.8f\n',k1,p(k1))
end
tv = linspace(min(t),max(t));
Mfit = SEIHRDBP(p,tv);
figure(1)
plot(t,k,'p')
hold on
hlp = plot(tv,Mfit);
hold off
grid
xlabel('time')
ylabel('Population')
legend(hlp,'S','E','I','H','R','D','B','P',Location','best')
end

1 Comment

I have been able to change this: [ones(1,20)*1E-3 ones(1,8)*10]. It seems to work but is taken long time. Please why is it so?

Sign in to comment.

 Accepted Answer

The time values are incorrect.
The complete table of those values should be —
Date Infected Death
____ ________ _____
0 24 13
16 62 35
30 68 41
34 70 42
37 70 42
43 70 43
50 68 49
60 67 49
66 67 49
80 66 49
When I experimented with a similar initial population matrix to:
'InitialPopulationMatrix',randi(1E+4,PopSz,Parameter).*[ones(1,20)*1E-3 ones(1,8)*10]
the code crashed, even using ode15s, because the differential equaiton system encountered a singularity and stopped.
The code I used —
T0 = readtable('observed.csv')
VN = T0.Properties.VariableNames % Use These Variable Names
T1 = readtable('mydata.csv');
T1{end-1,1} = datetime(2014,10,31) % Correct Typographical Error In 'Date' Entry
% figure
% plot(T1{:,1}, T1{:,2:end})
% grid
% return
VN1 = T1.Properties.VariableNames;
L = size(T1,1);
t = day(T1{:,1},'dayofyear'); % Time Vector (NECESSARY) Units = 'dayofyear'
t = t - t(1);
c = T1{:,2:end};
% Data = table(t, c(:,1),c(:,2), 'VariableNames',{'Date','Infected','Death'}) % Show Once Then Coomment Out
ftns = @(theta) norm(c-SEIHRDBP(theta,t));
PopSz = 500;
Parms = 28;
% optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-2, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10); % Options Structure For 'Answers' Problems
opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms).*[ones(1,20)*1E+3 ones(1,8)*0.01], 'MaxGenerations',1E2, 'FunctionTolerance',1E-10, 'PlotFcn',@gaplotbestf, 'PlotInterval',1);
% optshf = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10, 'HybridFcn',@fmincon, 'PlotFcn',@gaplotbestf, 'PlotInterval',1); % With 'HybridFcn'
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)
[theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],opts);
t1 = clock;
sprintf(['B = [' repmat('%.6f ',1,28) ']'],theta)
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)
GA_Time = etime(t1,t0)
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
fprintf('Fitness value at convergence = %.4f\nGenerations \t\t\t\t = %d\n\n',fval,output.generations)
fprintf(1,'\tRate Constants:\n')
for k1 = 1:length(theta)
fprintf(1, '\t\tTheta(%2d) = %8.5f\n', k1, theta(k1))
end
tv = linspace(min(t), max(t));
tv = t;
Cfit = SEIHRDBP(theta, tv)
figure
hd = plot(t, c, 'p');
for k1 = 1:size(c,2)
CV(k1,:) = hd(k1).Color;
hd(k1).MarkerFaceColor = CV(k1,:);
end
hold on
hlp = plot(tv, Cfit);
for k1 = 1:size(c,2)
hlp(k1).Color = CV(k1,:);
end
hold off
grid
Ax = gca;
Ax.YScale = 'log';
xlabel('Time')
ylabel('Concentration')
% legend(hlp, compose('C_%d',1:size(c,2)), 'Location','N')
legend(hlp, VN, 'Location','best')
function C=SEIHRDBP(theta,t)
c0=theta(end-7:end);
[T,Cv]=ode15s(@DifEq,t,c0);
%
function dC=DifEq(t,c)
dcdt=zeros(8,1);
dcdt(1) = theta(1) - (theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) + theta(20)*c(8))*c(1)-theta(2)*c(1);
dcdt(2) =(theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) + theta(20)*c(8))*c(1) - (theta(2) + theta(3))*c(2);
dcdt(3) = theta(3)*c(2) - (theta(2) + theta(4) + theta(5))*c(1);
dcdt(4) = theta(5)*c(3) - (theta(6) + theta(7) + theta(2))*c(4);
dcdt(5) = theta(7)*c(4) - theta(2)*c(5);
dcdt(6) = theta(4)*c(3) + theta(6)*c(4) -theta(8)*c(6);
dcdt(7) = theta(8)*c(6) - theta(9)*c(7);
dcdt(8) = theta(10) + theta(14)*c(3) + theta(15)*c(4) + theta(11)*c(6) + theta(12)*c(7) -theta(13)*c(8);
dC=dcdt;
end
C=Cv(:,[3 6]); % Return Only 'Infected' & 'Deaths'
end
(these differential equations use different variable names, however I believe the system is the same as in the initial question post) produced (after about 70 generations over about 4 minutes with MATLAB Online) what I consider to be insane parameter values for a kinetic system, those being:
Rate Constants:
Theta( 1) = 72614000.00000
Theta( 2) = 96602001.00000
Theta( 3) = 1055000.00010
Theta( 4) = 47392001.01295
Theta( 5) = 19519002.01651
Theta( 6) = 87092000.06967
Theta( 7) = 73074002.00018
Theta( 8) = 48324999.00702
Theta( 9) = 96660001.00649
Theta(10) = 9511000.00048
Theta(11) = 12759002.00011
Theta(12) = 73740000.00532
Theta(13) = 89201001.00000
Theta(14) = 31149000.05865
Theta(15) = 41402000.00567
Theta(16) = 31798000.00065
Theta(17) = 8765000.01877
Theta(18) = 5103001.50109
Theta(19) = 6618002.93761
Theta(20) = 23122001.00077
Theta(21) = 872.77225
Theta(22) = 875.78264
Theta(23) = 60.38500
Theta(24) = 280.22130
Theta(25) = 783.93000
Theta(26) = 44.87735
Theta(27) = 980.88375
Theta(28) = 344.53288
Then with those parameters, throwing this message:
Warning: Failure at t=1.870129e-05. Unable to meet integration tolerances without reducing the step size below the smallest value allowed (5.421011e-20) at time t.
and not going further.
So the code crashed again for essentially the same reason, and the last plot (showing the data and the estimated fit) did not display.
I suspect something is wrong with the system of differential equations, however I cannot determine what that problem is. They could be coded correctly from a published paper and the paper itself is in error. (That would not be the first time I would have encountered that problem, since I spent an entire semester in grad school working to replicate an interesting paper that used neural nets for EKG classification that when coded according to the description in the paper and using the online data provided, resulted in entirely different — and significantly conflicting — results than those published. My professor reviewed the paper and my code and my results and agreed with me. It was a learning experience for both of us.)
.

24 Comments

Thank you for your response. This Ebola Virus model, my collaborators have done other analysis, which seems to work. for I suspect the there still problem with the data. This is another data from March 2014-Oct 2016. That's 264 days
My pleasure!
I will assume that ‘Total Cases’ corresponds to ‘infected’ as before.
With the only changes to my code with these data being:
T1 = readtable('data1.csv', 'VariableNamingRule','preserve')
and:
t = days(T1{:,1} - T1{1,1});
with:
opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms).*[ones(1,20)*1E+3 ones(1,8)*0.01], 'MaxGenerations',1E2, 'FunctionTolerance',1E-10, 'PlotFcn',@gaplotbestf, 'PlotInterval',1);
I am still getting uncharacteristic parameter estimates for a kinetic system (it completely fails with intital estimates in the expected range of about 1 for the 20 kinetic parameters):
Rate Constants:
Theta( 1) = 2598017.75000
Theta( 2) = 7621020.75000
Theta( 3) = 67016.25000
Theta( 4) = 751013.50000
Theta( 5) = 8537014.75000
Theta( 6) = 345014.25000
Theta( 7) = 8775018.00000
Theta( 8) = 9188014.50000
Theta( 9) = 6872011.25000
Theta(10) = 8805015.00000
Theta(11) = 1680018.25000
Theta(12) = 6903007.75000
Theta(13) = 4194013.25000
Theta(14) = 4230015.75000
Theta(15) = 9147018.25000
Theta(16) = 6158018.00000
Theta(17) = 2797020.50000
Theta(18) = 594021.75000
Theta(19) = 4872018.50000
Theta(20) = 2149011.00000
Theta(21) = 43.54000
Theta(22) = 63.25000
Theta(23) = 143.63000
Theta(24) = 104.67000
Theta(25) = 49.45000
Theta(26) = 143.92000
Theta(27) = 91.53000
Theta(28) = 35.83000
It again crashes when attempting to calculate the plot, so no plot of the final values is created.
I still believe there is something significantly wrong with the differential equations, however I cannot determine what that is.
.
Thank you, maybe I have to check the equations
My pleasure.
It would also be helpful to post a PDF of the paper where the equations first appeared.
Thank you for your assistance. Is actually my master thesis and I working on publishing it. These are the equations.
I will let you parse them and code them, because I do not understand the notation. When you check them and re-wriite the differential equations used in your ‘SEIHRDBP’ function, I will again try it to see if I can estimate the parameters.
% I have spotted a mistake in the equations and I have uploaded a new data
% with reported dates (data1). Please use the data1. Is recorded for 264
% days.
T0 = readtable('data1.csv')
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T0 = 265x3 table
WHOReportDate infected deaths _____________ ________ ______ 25/03/2014 86 60 26/03/2014 86 59 27/03/2014 117 77 31/03/2014 120 76 01/04/2014 130 82 02/04/2014 135 88 07/04/2014 169 102 10/04/2014 179 115 17/04/2014 224 135 21/04/2014 230 142 23/04/2014 234 157 30/04/2014 242 147 05/05/2014 244 166 14/05/2014 245 168 23/05/2014 270 183 27/05/2014 271 187
VN = T0.Properties.VariableNames % Use These Variable Names
VN = 1x3 cell array
{'WHOReportDate'} {'infected'} {'deaths'}
T1 = readtable('mydata.csv');
Error using readtable
Unable to find or open 'mydata.csv'. Check the path and filename or file permissions.
T1{end-1,1} = datetime(2014,10,31) % Correct Typographical Error In 'Date' Entry
% figure
% plot(T1{:,1}, T1{:,2:end})
% grid
% return
VN1 = T1.Properties.VariableNames;
L = size(T1,1);
t = day(T1{:,1},'dayofyear'); % Time Vector (NECESSARY) Units = 'dayofyear'
t = t - t(1);
c = T1{:,2:end};
% Data = table(t, c(:,1),c(:,2), 'VariableNames',{'Date','Infected','Death'}) % Show Once Then Coomment Out
ftns = @(theta) norm(c-SEIHRDBP(theta,t));
PopSz = 500;
Parms = 28;
% optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-2, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10); % Options Structure For 'Answers' Problems
opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms).*[ones(1,20)*1E+3 ones(1,8)*0.01], 'MaxGenerations',1E2, 'FunctionTolerance',1E-10, 'PlotFcn',@gaplotbestf, 'PlotInterval',1);
% optshf = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10, 'HybridFcn',@fmincon, 'PlotFcn',@gaplotbestf, 'PlotInterval',1); % With 'HybridFcn'
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)
[theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],opts);
t1 = clock;
sprintf(['B = [' repmat('%.6f ',1,28) ']'],theta)
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)
GA_Time = etime(t1,t0)
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
fprintf('Fitness value at convergence = %.4f\nGenerations \t\t\t\t = %d\n\n',fval,output.generations)
fprintf(1,'\tRate Constants:\n')
for k1 = 1:length(theta)
fprintf(1, '\t\tTheta(%2d) = %8.5f\n', k1, theta(k1))
end
tv = linspace(min(t), max(t));
tv = t;
Cfit = SEIHRDBP(theta, tv)
figure
hd = plot(t, c, 'p');
for k1 = 1:size(c,2)
CV(k1,:) = hd(k1).Color;
hd(k1).MarkerFaceColor = CV(k1,:);
end
hold on
hlp = plot(tv, Cfit);
for k1 = 1:size(c,2)
hlp(k1).Color = CV(k1,:);
end
hold off
grid
Ax = gca;
Ax.YScale = 'log';
xlabel('Time')
ylabel('Concentration')
% legend(hlp, compose('C_%d',1:size(c,2)), 'Location','N')
legend(hlp, VN, 'Location','best')
function C=SEIHRDBP(theta,t)
c0=theta(end-7:end);
[T,Cv]=ode15s(@DifEq,t,c0);
%
function dC=DifEq(t,c)
dcdt=zeros(8,1);
dcdt(1) = theta(1) - (theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) + theta(20)*c(8))*c(1)-theta(2)*c(1);
dcdt(2) = (theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) + theta(20)*c(8))*c(1) - (theta(2) +theta(3))*c(2);
dcdt(3) = theta(3)*c(2) - (theta(2) + theta(4) + theta(5))*c(3);
dcdt(4) = theta(5)*c(3) - (theta(6) + theta(7) + theta(2))*c(4);
dcdt(5) = theta(7)*c(4) -theta(2)*c(5);
dcdt(6) = theta(4)*c(3) + theta(6)*c(4) -theta(8)*c(6);
dcdt(7) = theta(8)*c(6)-theta(9)*c(7);
dcdt(8) = theta(10) + theta(14)*c(3) + theta(15)*c(4) + theta(11)*c(6) + theta(12)*c(7) -theta(13)*c(8);
dC=dcdt;
end
C=Cv(:,[3 6]); % Return Only 'Infected' & 'Deaths'
end
There is an error in ‘dcdt(1)’ specifically:
dcdt(1) = theta(1) - (theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) + theta(20)*theta)*c(1)-theta(2)*c(1);
↑← HERE
I do not see any similar problems on a quick inspection.
University
University on 5 Mar 2024
Edited: University on 5 Mar 2024
Sorry, I have corrected it. Is supposed to be c(8). I think you can go ahead and try it now.
That seems to be working.
I started it with:
'InitialPopulationMatrix',randi(1E+4,PopSz,Parms).*[ones(1,20)*1E-3 ones(1,8)*0.01]
that should give reasonable magnitudes for the kinetic parameters, if they remain in that region, and let it go for 500 generations.
It did better, however the parameters are still not optiomal, nor is the fit to the data:
Elapsed Time: 9.596859670000000E+02 00:15:59.685
Fitness value at convergence = 371764.9840
Generations = 500
Rate Constants:
Theta( 1) = 182.78900
Theta( 2) = 0.10400
Theta( 3) = 59.95227
Theta( 4) = 0.07475
Theta( 5) = 0.00100
Theta( 6) = 24.84600
Theta( 7) = 35.69500
Theta( 8) = 0.00700
Theta( 9) = 32.74200
Theta(10) = 31.98400
Theta(11) = 23.42800
Theta(12) = 25.05000
Theta(13) = 26.40075
Theta(14) = 23.32913
Theta(15) = 28.59505
Theta(16) = 35.31950
Theta(17) = 30.12100
Theta(18) = 25.97900
Theta(19) = 31.91500
Theta(20) = 40.90413
Theta(21) = 89.95000
Theta(22) = 95.62000
Theta(23) = 118.36156
Theta(24) = 94.21344
Theta(25) = 100.94000
Theta(26) = 86.38000
Theta(27) = 118.12199
Theta(28) = 49.12000
These results are definitely not optimal, however they are better. (On the plot, the fit to the data is not good, so the differential equations may still need to be tweaked.) At the very least, consider letting it run for perhaps 5000 generations and see what that result is. If it is still not converging well on the data, take another look at the differential equations. Ideally, the final fitness value should be in the low thousands (probably less than 10000), considering the magnitude of the data. Also consider tweaking the final parameter estimates by using them as the initial estimates to the lsqcurvefit version of this.
.
Great, thank you so much. I will look into that…
Hi, please how can I get the lsqcurvefit version.
You quoted that in your initial opst in your other question.
Specifically here it would be:
T0 = readtable('observed.csv')
VN = T0.Properties.VariableNames % Use These Variable Names
T1 = readtable('mydata.csv');
T1{end-1,1} = datetime(2014,10,31) % Correct Typographical Error In 'Date' Entry
% figure
% plot(T1{:,1}, T1{:,2:end})
% grid
% return
VN1 = T1.Properties.VariableNames;
L = size(T1,1);
t = day(T1{:,1},'dayofyear'); % Time Vector (NECESSARY) Units = 'dayofyear'
t = t - t(1);
c = T1{:,2:end};
% Data = table(t, c(:,1),c(:,2), 'VariableNames',{'Date','Infected','Death'}) % Show Once Then Coomment Out
theta0 = % theta vector from the ga results
[theta,Rsdnrm,Rsd,ExFlg,OptmInfo,Lmda,Jmat]=lsqcurvefit(@SEIHRDBP,theta0,t,c,zeros(size(theta0)));
fprintf(1,'\tRate Constants:\n')
for k1 = 1:length(theta)
fprintf(1, '\t\tTheta(%2d) = %8.5f\n', k1, theta(k1))
end
tv = linspace(min(t), max(t));
tv = t;
Cfit = SEIHRDBP(theta, tv)
figure
hd = plot(t, c, 'p');
for k1 = 1:size(c,2)
CV(k1,:) = hd(k1).Color;
hd(k1).MarkerFaceColor = CV(k1,:);
end
hold on
hlp = plot(tv, Cfit);
for k1 = 1:size(c,2)
hlp(k1).Color = CV(k1,:);
end
hold off
grid
Ax = gca;
Ax.YScale = 'log';
xlabel('Time')
ylabel('Concentration')
% legend(hlp, compose('C_%d',1:size(c,2)), 'Location','N')
legend(hlp, VN, 'Location','best')
function C=SEIHRDBP(theta,t)
c0=theta(end-7:end);
[T,Cv]=ode15s(@DifEq,t,c0);
%
function dC=DifEq(t,c)
dcdt=zeros(8,1);
dcdt(1) = theta(1) - (theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) + theta(20)*c(8))*c(1)-theta(2)*c(1);
dcdt(2) =(theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) + theta(20)*c(8))*c(1) - (theta(2) + theta(3))*c(2);
dcdt(3) = theta(3)*c(2) - (theta(2) + theta(4) + theta(5))*c(1);
dcdt(4) = theta(5)*c(3) - (theta(6) + theta(7) + theta(2))*c(4);
dcdt(5) = theta(7)*c(4) - theta(2)*c(5);
dcdt(6) = theta(4)*c(3) + theta(6)*c(4) -theta(8)*c(6);
dcdt(7) = theta(8)*c(6) - theta(9)*c(7);
dcdt(8) = theta(10) + theta(14)*c(3) + theta(15)*c(4) + theta(11)*c(6) + theta(12)*c(7) -theta(13)*c(8);
dC=dcdt;
end
C=Cv(:,[3 6]); % Return Only 'Infected' & 'Deaths'
end
So it is essentially the same as the ga code without the ga call.
.
Hi, please can you get the time interval for me from this attached data?
I tried
VN1 = T1.Properties.VariableNames;
L = size(T1,1);
t = day(T1{:,1},'dayofyear');
Units = 'dayofyear'
t = t - t(1);
I do get negative values.
These data are quite different from the previous data sets. To begin with, they apply to different countries, so it is necessary to choose one country and then do the analysis with only that country.
The country itself is not important. For this analysis, I chose ‘Liberia’ simply because it has the most data. Even considering that, I needed to limit it at the beginning so that the data were consistent. Toward the end of the outbreak, the numbers of infected and dead fluctuated wildly. You may want to use all of them evventually, however for the time being, I filtered out the wild fluctuations.
The code I used for that is:
Tc = readtable('ebola.csv', 'VariableNamingRule','preserve')
% T1{end-1,1} = datetime(2014,10,31) % Correct Typographical Error In 'Date' Entry
[Cu,iv,ix] = unique(Tc.Country);
Nr = accumarray(ix, 1)
Summary = table(Cu,Nr)
[maxNr,idx] = max(Summary.Nr)
UseCountry = Summary(idx,:)
AllCountryCell = accumarray(ix,(1:numel(ix)).', [], @(x){Tc(x,:)})
Lv = cellfun(@(x)size(x,1)==maxNr, AllCountryCell);
CountryCell = AllCountryCell(Lv,:);
CountryData = CountryCell{:}
Lvm = diff([CountryData{1,3}; CountryData{:,3}]) < 15; % Detect Fluctuations
CountryData = CountryData(~Lvm,:) % Filter Out Fluctuations
figure
plot(CountryData{:,2}, CountryData{:,3:end})
grid
% return
T1 = CountryData(:,2:end)
Beyond that, the code is the same as previously. I am running it now, and will post back with an edit when it finishes. The fitness values still seem to me to be unacceptably high, even given the amplitude of the data.
EDIT — (7 Mar 2024 at 02:53)
This just in —
On the plot, ‘Deaths’ fits well, while ‘Infected’ (the third and sixth elements of ‘VN’) is quite far off, still using columns (differential equations) 3 and 6. Are those still appropriate here, or does that need to be changed?
Latest Results —
'B = [170.553500 0.104000 59.656500 0.122000 0.091000 16.443500 62.706000 0.015000 19.409000 29.011000 39.846000 23.242000 18.513000 21.040000 21.314000 28.976000 22.451000 25.906000 36.867000 32.707000 120.160000 102.580000 115.120000 113.150000 20.310000 98.300000 60.677500 41.287500 ]'
Stop Time: 2024-03-07 02:41:22.3062
GA_Time =
877.3838
Elapsed Time: 8.773837640000000E+02 00:14:37.383
Fitness value at convergence = 106445.9925
Generations = 500
Rate Constants:
Theta( 1) = 170.55350
Theta( 2) = 0.10400
Theta( 3) = 59.65650
Theta( 4) = 0.12200
Theta( 5) = 0.09100
Theta( 6) = 16.44350
Theta( 7) = 62.70600
Theta( 8) = 0.01500
Theta( 9) = 19.40900
Theta(10) = 29.01100
Theta(11) = 39.84600
Theta(12) = 23.24200
Theta(13) = 18.51300
Theta(14) = 21.04000
Theta(15) = 21.31400
Theta(16) = 28.97600
Theta(17) = 22.45100
Theta(18) = 25.90600
Theta(19) = 36.86700
Theta(20) = 32.70700
Theta(21) = 120.16000
Theta(22) = 102.58000
Theta(23) = 115.12000
Theta(24) = 113.15000
Theta(25) = 20.31000
Theta(26) = 98.30000
Theta(27) = 60.67750
Theta(28) = 41.28750
EDIT — (7 Mar 2024 at 03:42)
Letting it run for 1000 generations did not change the fit much, and produced these results —
Elapsed Time: 1.785278555000000E+03 00:29:45.278
Fitness value at convergence = 104320.7923
Generations = 1000
Rate Constants:
Theta( 1) = 307.01975
Theta( 2) = 0.31750
Theta( 3) = 103.02616
Theta( 4) = 0.08000
Theta( 5) = 0.01700
Theta( 6) = 18.11753
Theta( 7) = 107.94763
Theta( 8) = 0.01093
Theta( 9) = 34.96571
Theta(10) = 60.88569
Theta(11) = 44.23353
Theta(12) = 47.06145
Theta(13) = 38.39091
Theta(14) = 49.11388
Theta(15) = 57.49012
Theta(16) = 43.48404
Theta(17) = 50.10263
Theta(18) = 39.86377
Theta(19) = 32.73687
Theta(20) = 51.32770
Theta(21) = 91.68969
Theta(22) = 117.37641
Theta(23) = 148.15000
Theta(24) = 131.28027
Theta(25) = 89.34000
Theta(26) = 85.50352
Theta(27) = 128.46062
Theta(28) = 99.14187
The final fitness value improved only slightly.
Using the correct differential equation for ‘Infected’ would definitely improve that.
.
University
University on 7 Mar 2024
Edited: University on 7 Mar 2024
% these are the correct equations. The equation for the infected was incorrect. in the last term of the equation, it was c(1) which is not correct. Is suppose to be c(3). Can you post the code where the deaths fit well?
dcdt=zeros(8,1);
dcdt(1) = theta(1) - (theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) + theta(20)*c(8))*c(1)-theta(2)*c(1);
dcdt(2) =(theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) + theta(20)*c(8))*c(1) - (theta(2) + theta(3))*c(2);
dcdt(3) = theta(3)*c(2) - (theta(2) + theta(4) + theta(5))*c(3);
dcdt(4) = theta(5)*c(3) - (theta(6) + theta(7) + theta(2))*c(4);
dcdt(5) = theta(7)*c(4) - theta(2)*c(5);
dcdt(6) = theta(4)*c(3) + theta(6)*c(4) -theta(8)*c(6);
dcdt(7) = theta(8)*c(6) - theta(9)*c(7);
dcdt(8) = theta(10) + theta(14)*c(3) + theta(15)*c(4) + theta(11)*c(6) + theta(12)*c(7) -theta(13)*c(8);
dC=dcdt;
The ‘infected’ estimate is still not accurate (about of what it should be), however ‘deaths’ is reasonably close —
Elapsed Time: 1.907417319000000E+03 00:31:47.417
Fitness value at convergence = 97274.7981
Generations = 1000
Rate Constants:
Theta( 1) = 341.67850
Theta( 2) = 0.10400
Theta( 3) = 97.56275
Theta( 4) = 0.12200
Theta( 5) = 0.02850
Theta( 6) = 21.74038
Theta( 7) = 117.70600
Theta( 8) = 0.03844
Theta( 9) = 38.51837
Theta(10) = 57.51100
Theta(11) = 61.34600
Theta(12) = 48.53888
Theta(13) = 40.51300
Theta(14) = 47.10250
Theta(15) = 50.68900
Theta(16) = 45.28850
Theta(17) = 36.95100
Theta(18) = 55.03100
Theta(19) = 46.11700
Theta(20) = 59.70700
Theta(21) = 139.41000
Theta(22) = 123.70500
Theta(23) = 136.87000
Theta(24) = 125.90000
Theta(25) = 37.81000
Theta(26) = 102.40937
Theta(27) = 82.17750
Theta(28) = 70.53750
I ran this a few minutes ago. The fitness value is much better.
I still have it returning columns (equations) 3 and 6.
.
Thank you so much for this. Please how can get a reasonble time (maybe in days from this data). At the moment if I print t, there are negative values. For instance, t(50)=-172
Tc = readtable('ebola.csv', 'VariableNamingRule','preserve');
% T1{end-1,1} = datetime(2014,10,31) % Correct Typographical Error In 'Date' Entry
[Cu,iv,ix] = unique(Tc.Country);
Nr = accumarray(ix, 1);
Summary = table(Cu,Nr);
[maxNr,idx] = max(Summary.Nr);
UseCountry = Summary(idx,:);
AllCountryCell = accumarray(ix,(1:numel(ix)).', [], @(x){Tc(x,:)});
Lv = cellfun(@(x)size(x,1)==maxNr, AllCountryCell);
CountryCell = AllCountryCell(Lv,:);
CountryData = CountryCell{:};
Lvm = diff([CountryData{1,3}; CountryData{:,3}]) < 15; % Detect Fluctuations
CountryData = CountryData(~Lvm,:);
% Filter Out Fluctuations
figure
plot(CountryData{:,2}, CountryData{:,3:end})
grid
% return
T1 = CountryData(:,2:end);
t = day(T1{:,1},'dayofyear'); % Time Vector (NECESSARY) Units = 'dayofyear'
t = t - t(1)
t = 139×1
0 3 7 11 13 17 19 21 26 28
t(50)
ans = -172
The code I am using calculates ‘t’ as:
t = days(T1{:,1} - T1{1,1});
and there are no negative values.
I am needing to update my code to work with the differint data files. My current pre-processing step is:
Tc = readtable('ebola.csv', 'VariableNamingRule','preserve')
% T1{end-1,1} = datetime(2014,10,31) % Correct Typographical Error In 'Date' Entry
ebola = true;
if ebola
[Cu,iv,ix] = unique(Tc.Country); % Unique Countries
Nr = accumarray(ix, 1) % Count Country Data
Summary = table(Cu,Nr) % Present REsults
[maxNr,idx] = max(Summary.Nr)
UseCountry = Summary(idx,:)
AllCountryCell = accumarray(ix,(1:numel(ix)).', [], @(x){Tc(x,:)}) % Cell Array Of All The ‘Country’ ‘table’ Arrays
Lv = cellfun(@(x)size(x,1)==maxNr, AllCountryCell); % Country With Maximum Number Of Observations
CountryCell = AllCountryCell(Lv,:); % Table For Chosen Country
CountryData = CountryCell{:} % View ‘table’ For The Chosen Country
Lvm = diff([CountryData{1,3}; CountryData{:,3}]) < 15; % Detect Abrupt Changes
CountryData = CountryData(~Lvm,:) % Delete Them
figure
plot(CountryData{:,2}, CountryData{:,3:end}) % Plot Country Result Data
grid
else
% Other Data Options
end
% return
T1 = CountryData(:,2:end)
VN1 = T1.Properties.VariableNames;
L = size(T1,1);
t = days(T1{:,1} - T1{1,1});
[tb1,tb2] = bounds(t)
c = T1{:,2:end};
Data = table(t, c(:,1),c(:,2), 'VariableNames',{'Date','Infected','Death'})
figure
plot(Data{:,1}, Data{:,2:end})
grid
The ‘Summary’ table lists the countries in the file and the number of data associated with each of them. The ‘UseCountry’ step chooses the country using the ‘idx’ value (that can be any number from 1 to 10 since there are 10 countries in the file) and then further processes those values to present the data to the optimisation routine.
I intend to fill the else (or if necessary, elseif) section with procedures for reading and using other files as required.
.
Thank you so mcuh for your assistance. Please part of the code allows one to select country, assumming I want Nigeria
In the ‘Summary’ table, ‘Nigeria’ is country 5.
To choose a specific country, the if block changes to:
if ebola
[Cu,iv,ix] = unique(Tc.Country); % Unique Countries
Nr = accumarray(ix, 1) % Count Country Data
Summary = table(Cu,Nr) % Present REsults
% [maxNr,idx] = max(Summary.Nr)
ChooseCountry = 'Nigeria';
idx = cellfun(@(x)strmatch(x,ChooseCountry), Summary.Cu, 'Unif',0)
idx = cellfun(@(x)~isempty(x), idx, 'Unif',0)
idx = cell2mat(idx)
UseCountry = Summary(idx,:)
AllCountryCell = accumarray(ix,(1:numel(ix)).', [], @(x){Tc(x,:)}) % Cell Array Of All The ‘Country’ ‘table’ Arrays
% Lv = cellfun(@(x)size(x,1)==maxNr, AllCountryCell); % Country With Maximum Number Of Observations
CountryCell = AllCountryCell(idx,:); % Table For Chosen Country
CountryData = CountryCell{:} % View ‘table’ For The Chosen Country
Lvm = diff([CountryData{1,3}; CountryData{:,3}]) < 15; % Detect Abrupt Changes
% CountryData = CountryData(~Lvm,:) % Delete Them
figure
plot(CountryData{:,2}, CountryData{:,3:end}) % Plot Country Result Data
grid
xlabel(CountryData.Properties.VariableNames{2})
title(ChooseCountry)
legend(CountryData.Properties.VariableNames{3:end}, 'Location','best')
% Data = CountryData(:,2:end)
else
% Other Data Options
end
After the if block, the code continues as:
T1 = CountryData(:,2:end)
VN1 = T1.Properties.VariableNames;
L = size(T1,1);
t = days(T1{:,1} - T1{1,1});
[tb1,tb2] = bounds(t)
c = T1{:,2:end};
Data = table(t, c(:,1),c(:,2), 'VariableNames',{'Date','Infected','Death'})
figure
plot(Data{:,1}, Data{:,2:end})
grid
legend(CountryData.Properties.VariableNames{3:end}, 'Location','best')
These may change with subsequent data sets.
I am currently running the code for this data set. I will post back with an EDIT when it completes.
.
Thank you so very much for your assitance. I have been able to do it in Python using least square method and it fits well.
As always, my pleasure!
I need to spend some time with Python. A least squares (gradient descent) approach can work with relatively simple problems, however your system has 20 kinetic parameters (and in my approach 28 including the initial conditions for the differential equations), so gradient descent is less likely to be successful unless the initial parameter estimates are relatively close to the best values. For that reason, I chose the genetic algorithm approach. I am currently running it, and it is converging reasonably well after nearly an hour. (It seems to be stalling just now, so it may be close to converging.) it is experiencing a ‘graphics timeout’ so it is depriving me of progress update information right now. I will post the results when it provides them.
.

Sign in to comment.

More Answers (0)

Products

Release

R2023b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!