Parameter estimation of SEIHRDBP model

Question

0 votes

mydata.csv

% I am back please. I have gotten the data correct data with date. The whole idea is that I want the parameter estimation for parameters for SEI. I.e., run the parameter estimation these classes (SEI). These are only the active human class. The parameters associated with classes are theta16, theta17, theta18, theta19, theta20, theta3, theta7 and theta8. 
% I tried to run the new code use the "ga" algorithm but I got the error: 
% Arrays have incompatible sizes for this operation.
% Arrays have incompatible sizes for this operation.
% 
% Error in untitled10>SEIHRDBP_MODEL_PARAMETER (line 61)
% opts = optimoptions('ga','PopulationSize',PopSz,'InitialPopulationMatrix',randi(1E+4,PopSz,Parameter).*[ones(1,12)*1E-3 ones(1,7)*10],'MaxGenerations',2E3)%,'PlotFcn','gaplotbestf');
% 
% Error in untitled10 (line 2)
% SEIHRDBP_MODEL_PARAMETER
% In fact, this is my new corrected codes and new data is attached.
SEIHRDBP_MODEL_PARAMETER
function SEIHRDBP_MODEL_PARAMETER
%7 ODEs to describe the model (S,E,I,H,R,D,B, P)
    function M = SEIHRDBP(p,t)
        %Initial conditions with population 
        M0 = p(21:28);
        [~,m] = ode15s(@DifEq,t,M0);
        function dM = DifEq(~,k)
            dMdt = zeros(8,1);
            dMdt(1) = p(1) - (p(16)*k(3) + p(17)*k(4) + p(18)*k(6)+ p(19)*k(7) +  p(20)*k(8))*k(1)-p(2)*k(1);
            dMdt(2) = (p(16)*k(3) + p(17)*k(4) + p(18)*k(6)+ p(19)*k(7) +  p(20)*k(8))*k(1) - (p(2) + p(3))*k(2);
            dMdt(3) = p(3)*k(2) - (p(2) + p(4) + p(5))*k(1);
            dMdt(4) = p(5)*k(3) - (p(6) + p(7) + p(2))*k(4);
            dMdt(5) = p(7)*k(4) - p(2)*k(5);
            dMdt(6) = p(4)*k(3) + p(6)*k(4) -p(8)*k(6);
            dMdt(7) = p(8)*k(6) - p(9)*k(7);
            dMdt(8) = p(10) +  p(14)*k(3) + p(15)*k(4) + p(11)*k(6) + p(12)*k(7) -p(13)*k(8);
            dM = dMdt;
        end
        M = m;
    end
%From day Aug to nov 20
t = [1
    2
    3
    4
    5
    6
    7
    8
    9
    10];
%Official data for I, D for 10 days
k = [24 23 
    62 35 
    68 41 
    70 42 
    70 42 
    70 43 
    68 49 
    67 49
    67 49
    66 49]; 
% [theta,Rsdnrm,Rsd,ExFlg,OptmInfo,Lmda,Jmat]=lsqcurvefit(@kinetics,theta0,t,c);
function fv = ftns(p,t)
    SEIHRDBPout = SEIHRDBP(p,t);
    fv = norm(k-SEIHRDBPout(:,[3 6]));
end
% ftns = @(p) norm(k-EIDAGG2(p,t));
PopSz = 50;
% Parameter = 12;
Parameter = 28;
opts = optimoptions('ga','PopulationSize',PopSz,'InitialPopulationMatrix',randi(1E+4,PopSz,Parameter).*[ones(1,20)*1E-3 ones(1,8)*10],'MaxGenerations',2E3)%,'PlotFcn','gaplotbestf');
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n',t0)
[p,fval,exitflag,output] = ga(@(p)ftns(p,t),Parameter,[],[],[],[],zeros(Parameter,1),[ones(20,1); Inf(8,1)],[],[],opts);
t1 = clock;
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n',t1)
GA_Time = etime(t1,t0);
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
fprintf(1,'\n\tRate Constants:\n')
for k1 = 1:12
    fprintf(1,'\t\tp(%2d) = %11.8f\n',k1,p(k1))
end
fprintf(['\n\tInitial Conditions:\n'])
for k1 = 1:8
    fprintf('\t\tdMdt(%d) = %11.8f\n',k1,p(k1))
end
tv = linspace(min(t),max(t));
Mfit = SEIHRDBP(p,tv);
figure(1)
plot(t,k,'p')
hold on
hlp = plot(tv,Mfit);
hold off
grid
xlabel('time')
ylabel('Population')
legend(hlp,'S','E','I','H','R','D','B','P',Location','best')
end

1 Comment
Show -1 older comments Hide -1 older comments

University on 4 Mar 2024

I have been able to change this: [ones(1,20)*1E-3 ones(1,8)*10]. It seems to work but is taken long time. Please why is it so?

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Star Strider on 5 Mar 2024

Edited: Star Strider on 5 Mar 2024

Open in MATLAB Online

0 votes

The time values are incorrect.

The complete table of those values should be —

Date Infected Death

____ ________ _____

0 24 13

16 62 35

30 68 41

34 70 42

37 70 42

43 70 43

50 68 49

60 67 49

66 67 49

80 66 49

When I experimented with a similar initial population matrix to:

'InitialPopulationMatrix',randi(1E+4,PopSz,Parameter).*[ones(1,20)*1E-3 ones(1,8)*10]

the code crashed, even using ode15s, because the differential equaiton system encountered a singularity and stopped.

The code I used —

T0 = readtable('observed.csv')
VN = T0.Properties.VariableNames                           % Use These Variable Names
T1 = readtable('mydata.csv');
T1{end-1,1} = datetime(2014,10,31)                         % Correct Typographical Error In 'Date' Entry
% figure
% plot(T1{:,1}, T1{:,2:end})
% grid
% return
VN1 = T1.Properties.VariableNames;
L = size(T1,1);
t = day(T1{:,1},'dayofyear');                               % Time Vector (NECESSARY) Units = 'dayofyear'
t = t - t(1);
c = T1{:,2:end};
% Data = table(t, c(:,1),c(:,2), 'VariableNames',{'Date','Infected','Death'}) % Show Once Then Coomment Out
ftns = @(theta) norm(c-SEIHRDBP(theta,t));
PopSz = 500;
Parms = 28;
% optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-2, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10);                                                                  % Options Structure For 'Answers' Problems
opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms).*[ones(1,20)*1E+3 ones(1,8)*0.01], 'MaxGenerations',1E2, 'FunctionTolerance',1E-10, 'PlotFcn',@gaplotbestf, 'PlotInterval',1);
% optshf = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10, 'HybridFcn',@fmincon, 'PlotFcn',@gaplotbestf, 'PlotInterval',1);   % With 'HybridFcn'
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)
[theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],opts);
t1 = clock;
sprintf(['B = [' repmat('%.6f  ',1,28) ']'],theta)
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)
GA_Time = etime(t1,t0)
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
fprintf('Fitness value at convergence = %.4f\nGenerations \t\t\t\t = %d\n\n',fval,output.generations)
fprintf(1,'\tRate Constants:\n')
for k1 = 1:length(theta)
    fprintf(1, '\t\tTheta(%2d) = %8.5f\n', k1, theta(k1))
end
tv = linspace(min(t), max(t));
tv = t;
Cfit = SEIHRDBP(theta, tv)
figure
hd = plot(t, c, 'p');
for k1 = 1:size(c,2)
    CV(k1,:) = hd(k1).Color;
    hd(k1).MarkerFaceColor = CV(k1,:);
end
hold on
hlp = plot(tv, Cfit);
for k1 = 1:size(c,2)
    hlp(k1).Color = CV(k1,:);
end
hold off
grid
Ax = gca;
Ax.YScale = 'log';
xlabel('Time')
ylabel('Concentration')
% legend(hlp, compose('C_%d',1:size(c,2)), 'Location','N')
legend(hlp, VN, 'Location','best')
function C=SEIHRDBP(theta,t)
    c0=theta(end-7:end);
    [T,Cv]=ode15s(@DifEq,t,c0);
   %
    function dC=DifEq(t,c)
        dcdt=zeros(8,1);
        dcdt(1) = theta(1) - (theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) +  theta(20)*c(8))*c(1)-theta(2)*c(1); 
        dcdt(2) =(theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) +  theta(20)*c(8))*c(1) - (theta(2) + theta(3))*c(2);
        dcdt(3) = theta(3)*c(2) - (theta(2) + theta(4) + theta(5))*c(1);
        dcdt(4) = theta(5)*c(3) - (theta(6) + theta(7) + theta(2))*c(4);
        dcdt(5) = theta(7)*c(4) - theta(2)*c(5);
        dcdt(6) = theta(4)*c(3) + theta(6)*c(4) -theta(8)*c(6);
        dcdt(7) = theta(8)*c(6) - theta(9)*c(7);
        dcdt(8) = theta(10) +  theta(14)*c(3) + theta(15)*c(4) + theta(11)*c(6) + theta(12)*c(7) -theta(13)*c(8);
        dC=dcdt;
    end
    C=Cv(:,[3 6]);              % Return Only 'Infected' & 'Deaths'
end

(these differential equations use different variable names, however I believe the system is the same as in the initial question post) produced (after about 70 generations over about 4 minutes with MATLAB Online) what I consider to be insane parameter values for a kinetic system, those being:

Rate Constants:

Theta( 1) = 72614000.00000

Theta( 2) = 96602001.00000

Theta( 3) = 1055000.00010

Theta( 4) = 47392001.01295

Theta( 5) = 19519002.01651

Theta( 6) = 87092000.06967

Theta( 7) = 73074002.00018

Theta( 8) = 48324999.00702

Theta( 9) = 96660001.00649

Theta(10) = 9511000.00048

Theta(11) = 12759002.00011

Theta(12) = 73740000.00532

Theta(13) = 89201001.00000

Theta(14) = 31149000.05865

Theta(15) = 41402000.00567

Theta(16) = 31798000.00065

Theta(17) = 8765000.01877

Theta(18) = 5103001.50109

Theta(19) = 6618002.93761

Theta(20) = 23122001.00077

Theta(21) = 872.77225

Theta(22) = 875.78264

Theta(23) = 60.38500

Theta(24) = 280.22130

Theta(25) = 783.93000

Theta(26) = 44.87735

Theta(27) = 980.88375

Theta(28) = 344.53288

Then with those parameters, throwing this message:

Warning: Failure at t=1.870129e-05. Unable to meet integration tolerances without reducing the step size below the smallest value allowed (5.421011e-20) at time t.

and not going further.

So the code crashed again for essentially the same reason, and the last plot (showing the data and the estimated fit) did not display.

I suspect something is wrong with the system of differential equations, however I cannot determine what that problem is. They could be coded correctly from a published paper and the paper itself is in error. (That would not be the first time I would have encountered that problem, since I spent an entire semester in grad school working to replicate an interesting paper that used neural nets for EKG classification that when coded according to the description in the paper and using the online data provided, resulted in entirely different — and significantly conflicting — results than those published. My professor reviewed the paper and my code and my results and agreed with me. It was a learning experience for both of us.)

.

24 Comments
Show 22 older comments Hide 22 older comments

Star Strider on 5 Mar 2024

My pleasure!

I will assume that ‘Total Cases’ corresponds to ‘infected’ as before.

With the only changes to my code with these data being:

T1 = readtable('data1.csv', 'VariableNamingRule','preserve')

and:

t = days(T1{:,1} - T1{1,1});

with:

opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms).*[ones(1,20)*1E+3 ones(1,8)*0.01], 'MaxGenerations',1E2, 'FunctionTolerance',1E-10, 'PlotFcn',@gaplotbestf, 'PlotInterval',1);

I am still getting uncharacteristic parameter estimates for a kinetic system (it completely fails with intital estimates in the expected range of about 1 for the 20 kinetic parameters):

Rate Constants:

Theta( 1) = 2598017.75000

Theta( 2) = 7621020.75000

Theta( 3) = 67016.25000

Theta( 4) = 751013.50000

Theta( 5) = 8537014.75000

Theta( 6) = 345014.25000

Theta( 7) = 8775018.00000

Theta( 8) = 9188014.50000

Theta( 9) = 6872011.25000

Theta(10) = 8805015.00000

Theta(11) = 1680018.25000

Theta(12) = 6903007.75000

Theta(13) = 4194013.25000

Theta(14) = 4230015.75000

Theta(15) = 9147018.25000

Theta(16) = 6158018.00000

Theta(17) = 2797020.50000

Theta(18) = 594021.75000

Theta(19) = 4872018.50000

Theta(20) = 2149011.00000

Theta(21) = 43.54000

Theta(22) = 63.25000

Theta(23) = 143.63000

Theta(24) = 104.67000

Theta(25) = 49.45000

Theta(26) = 143.92000

Theta(27) = 91.53000

Theta(28) = 35.83000

It again crashes when attempting to calculate the plot, so no plot of the final values is created.

I still believe there is something significantly wrong with the differential equations, however I cannot determine what that is.

.

University on 5 Mar 2024

Edited: University on 5 Mar 2024

Open in MATLAB Online

data1.csv

% I have spotted a mistake in the equations and I have uploaded a new data
% with reported dates (data1). Please use the data1. Is recorded for 264
% days.
T0 = readtable('data1.csv')
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T0 = 265x3 table
    WHOReportDate    infected    deaths
    _____________    ________    ______

     25/03/2014         86         60  
     26/03/2014         86         59  
     27/03/2014        117         77  
     31/03/2014        120         76  
     01/04/2014        130         82  
     02/04/2014        135         88  
     07/04/2014        169        102  
     10/04/2014        179        115  
     17/04/2014        224        135  
     21/04/2014        230        142  
     23/04/2014        234        157  
     30/04/2014        242        147  
     05/05/2014        244        166  
     14/05/2014        245        168  
     23/05/2014        270        183  
     27/05/2014        271        187  
VN = T0.Properties.VariableNames                           % Use These Variable Names
VN = 1x3 cell array
    {'WHOReportDate'}    {'infected'}    {'deaths'}
T1 = readtable('mydata.csv');
Error using readtable
Unable to find or open 'mydata.csv'. Check the path and filename or file permissions.
T1{end-1,1} = datetime(2014,10,31)                         % Correct Typographical Error In 'Date' Entry
% figure
% plot(T1{:,1}, T1{:,2:end})
% grid
% return
VN1 = T1.Properties.VariableNames;
L = size(T1,1);
t = day(T1{:,1},'dayofyear');                               % Time Vector (NECESSARY) Units = 'dayofyear'
t = t - t(1);
c = T1{:,2:end};
% Data = table(t, c(:,1),c(:,2), 'VariableNames',{'Date','Infected','Death'}) % Show Once Then Coomment Out
ftns = @(theta) norm(c-SEIHRDBP(theta,t));
PopSz = 500;
Parms = 28;
% optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-2, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10);                                                                  % Options Structure For 'Answers' Problems
opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms).*[ones(1,20)*1E+3 ones(1,8)*0.01], 'MaxGenerations',1E2, 'FunctionTolerance',1E-10, 'PlotFcn',@gaplotbestf, 'PlotInterval',1);
% optshf = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10, 'HybridFcn',@fmincon, 'PlotFcn',@gaplotbestf, 'PlotInterval',1);   % With 'HybridFcn'
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)
[theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],opts);
t1 = clock;
sprintf(['B = [' repmat('%.6f  ',1,28) ']'],theta)
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)
GA_Time = etime(t1,t0)
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
fprintf('Fitness value at convergence = %.4f\nGenerations \t\t\t\t = %d\n\n',fval,output.generations)
fprintf(1,'\tRate Constants:\n')
for k1 = 1:length(theta)
    fprintf(1, '\t\tTheta(%2d) = %8.5f\n', k1, theta(k1))
end
tv = linspace(min(t), max(t));
tv = t;
Cfit = SEIHRDBP(theta, tv)
figure
hd = plot(t, c, 'p');
for k1 = 1:size(c,2)
    CV(k1,:) = hd(k1).Color;
    hd(k1).MarkerFaceColor = CV(k1,:);
end
hold on
hlp = plot(tv, Cfit);
for k1 = 1:size(c,2)
    hlp(k1).Color = CV(k1,:);
end
hold off
grid
Ax = gca;
Ax.YScale = 'log';
xlabel('Time')
ylabel('Concentration')
% legend(hlp, compose('C_%d',1:size(c,2)), 'Location','N')
legend(hlp, VN, 'Location','best')
function C=SEIHRDBP(theta,t)
    c0=theta(end-7:end);
    [T,Cv]=ode15s(@DifEq,t,c0);
   %
    function dC=DifEq(t,c)
        dcdt=zeros(8,1);
        dcdt(1) = theta(1) - (theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) +  theta(20)*c(8))*c(1)-theta(2)*c(1);
        dcdt(2) = (theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) +  theta(20)*c(8))*c(1) - (theta(2) +theta(3))*c(2);
        dcdt(3) = theta(3)*c(2) - (theta(2) + theta(4) + theta(5))*c(3);
        dcdt(4) = theta(5)*c(3) - (theta(6) + theta(7) + theta(2))*c(4);
        dcdt(5) = theta(7)*c(4) -theta(2)*c(5);
        dcdt(6) = theta(4)*c(3) + theta(6)*c(4) -theta(8)*c(6);
        dcdt(7) = theta(8)*c(6)-theta(9)*c(7);
        dcdt(8) = theta(10) + theta(14)*c(3) + theta(15)*c(4) + theta(11)*c(6) + theta(12)*c(7) -theta(13)*c(8);
        dC=dcdt;
    end
    C=Cv(:,[3 6]);              % Return Only 'Infected' & 'Deaths'
end

Star Strider on 5 Mar 2024

That seems to be working.

I started it with:

'InitialPopulationMatrix',randi(1E+4,PopSz,Parms).*[ones(1,20)*1E-3 ones(1,8)*0.01]

that should give reasonable magnitudes for the kinetic parameters, if they remain in that region, and let it go for 500 generations.

It did better, however the parameters are still not optiomal, nor is the fit to the data:

Elapsed Time: 9.596859670000000E+02 00:15:59.685

Fitness value at convergence = 371764.9840

Generations = 500

Rate Constants:

Theta( 1) = 182.78900

Theta( 2) = 0.10400

Theta( 3) = 59.95227

Theta( 4) = 0.07475

Theta( 5) = 0.00100

Theta( 6) = 24.84600

Theta( 7) = 35.69500

Theta( 8) = 0.00700

Theta( 9) = 32.74200

Theta(10) = 31.98400

Theta(11) = 23.42800

Theta(12) = 25.05000

Theta(13) = 26.40075

Theta(14) = 23.32913

Theta(15) = 28.59505

Theta(16) = 35.31950

Theta(17) = 30.12100

Theta(18) = 25.97900

Theta(19) = 31.91500

Theta(20) = 40.90413

Theta(21) = 89.95000

Theta(22) = 95.62000

Theta(23) = 118.36156

Theta(24) = 94.21344

Theta(25) = 100.94000

Theta(26) = 86.38000

Theta(27) = 118.12199

Theta(28) = 49.12000

These results are definitely not optimal, however they are better. (On the plot, the fit to the data is not good, so the differential equations may still need to be tweaked.) At the very least, consider letting it run for perhaps 5000 generations and see what that result is. If it is still not converging well on the data, take another look at the differential equations. Ideally, the final fitness value should be in the low thousands (probably less than 10000), considering the magnitude of the data. Also consider tweaking the final parameter estimates by using them as the initial estimates to the lsqcurvefit version of this.

.

Star Strider on 6 Mar 2024

You quoted that in your initial opst in your other question.

Specifically here it would be:

T0 = readtable('observed.csv')

VN = T0.Properties.VariableNames % Use These Variable Names

T1 = readtable('mydata.csv');

T1{end-1,1} = datetime(2014,10,31) % Correct Typographical Error In 'Date' Entry

% figure

% plot(T1{:,1}, T1{:,2:end})

% grid

% return

VN1 = T1.Properties.VariableNames;

L = size(T1,1);

t = day(T1{:,1},'dayofyear'); % Time Vector (NECESSARY) Units = 'dayofyear'

t = t - t(1);

c = T1{:,2:end};

% Data = table(t, c(:,1),c(:,2), 'VariableNames',{'Date','Infected','Death'}) % Show Once Then Coomment Out

theta0 = % theta vector from the ga results

[theta,Rsdnrm,Rsd,ExFlg,OptmInfo,Lmda,Jmat]=lsqcurvefit(@SEIHRDBP,theta0,t,c,zeros(size(theta0)));

fprintf(1,'\tRate Constants:\n')

for k1 = 1:length(theta)

fprintf(1, '\t\tTheta(%2d) = %8.5f\n', k1, theta(k1))

end

tv = linspace(min(t), max(t));

tv = t;

Cfit = SEIHRDBP(theta, tv)

figure

hd = plot(t, c, 'p');

for k1 = 1:size(c,2)

CV(k1,:) = hd(k1).Color;

hd(k1).MarkerFaceColor = CV(k1,:);

end

hold on

hlp = plot(tv, Cfit);

for k1 = 1:size(c,2)

hlp(k1).Color = CV(k1,:);

end

hold off

grid

Ax = gca;

Ax.YScale = 'log';

xlabel('Time')

ylabel('Concentration')

% legend(hlp, compose('C_%d',1:size(c,2)), 'Location','N')

legend(hlp, VN, 'Location','best')

function C=SEIHRDBP(theta,t)

c0=theta(end-7:end);

[T,Cv]=ode15s(@DifEq,t,c0);

%

function dC=DifEq(t,c)

dcdt=zeros(8,1);

dcdt(1) = theta(1) - (theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) + theta(20)*c(8))*c(1)-theta(2)*c(1);

dcdt(2) =(theta(16)*c(3) + theta(17)*c(4) + theta(18)*c(6)+ theta(19)*c(7) + theta(20)*c(8))*c(1) - (theta(2) + theta(3))*c(2);

dcdt(3) = theta(3)*c(2) - (theta(2) + theta(4) + theta(5))*c(1);

dcdt(4) = theta(5)*c(3) - (theta(6) + theta(7) + theta(2))*c(4);

dcdt(5) = theta(7)*c(4) - theta(2)*c(5);

dcdt(6) = theta(4)*c(3) + theta(6)*c(4) -theta(8)*c(6);

dcdt(7) = theta(8)*c(6) - theta(9)*c(7);

dcdt(8) = theta(10) + theta(14)*c(3) + theta(15)*c(4) + theta(11)*c(6) + theta(12)*c(7) -theta(13)*c(8);

dC=dcdt;

end

C=Cv(:,[3 6]); % Return Only 'Infected' & 'Deaths'

end

So it is essentially the same as the ga code without the ga call.

.

Star Strider on 7 Mar 2024

Edited: Star Strider on 7 Mar 2024

These data are quite different from the previous data sets. To begin with, they apply to different countries, so it is necessary to choose one country and then do the analysis with only that country.

The country itself is not important. For this analysis, I chose ‘Liberia’ simply because it has the most data. Even considering that, I needed to limit it at the beginning so that the data were consistent. Toward the end of the outbreak, the numbers of infected and dead fluctuated wildly. You may want to use all of them evventually, however for the time being, I filtered out the wild fluctuations.

The code I used for that is:

Tc = readtable('ebola.csv', 'VariableNamingRule','preserve')

% T1{end-1,1} = datetime(2014,10,31) % Correct Typographical Error In 'Date' Entry

[Cu,iv,ix] = unique(Tc.Country);

Nr = accumarray(ix, 1)

Summary = table(Cu,Nr)

[maxNr,idx] = max(Summary.Nr)

UseCountry = Summary(idx,:)

AllCountryCell = accumarray(ix,(1:numel(ix)).', [], @(x){Tc(x,:)})

Lv = cellfun(@(x)size(x,1)==maxNr, AllCountryCell);

CountryCell = AllCountryCell(Lv,:);

CountryData = CountryCell{:}

Lvm = diff([CountryData{1,3}; CountryData{:,3}]) < 15; % Detect Fluctuations

CountryData = CountryData(~Lvm,:) % Filter Out Fluctuations

figure

plot(CountryData{:,2}, CountryData{:,3:end})

grid

% return

T1 = CountryData(:,2:end)

Beyond that, the code is the same as previously. I am running it now, and will post back with an edit when it finishes. The fitness values still seem to me to be unacceptably high, even given the amplitude of the data.

EDIT — (7 Mar 2024 at 02:53)

This just in —

On the plot, ‘Deaths’ fits well, while ‘Infected’ (the third and sixth elements of ‘VN’) is quite far off, still using columns (differential equations) 3 and 6. Are those still appropriate here, or does that need to be changed?

Latest Results —

'B = [170.553500 0.104000 59.656500 0.122000 0.091000 16.443500 62.706000 0.015000 19.409000 29.011000 39.846000 23.242000 18.513000 21.040000 21.314000 28.976000 22.451000 25.906000 36.867000 32.707000 120.160000 102.580000 115.120000 113.150000 20.310000 98.300000 60.677500 41.287500 ]'

Stop Time: 2024-03-07 02:41:22.3062

GA_Time =

877.3838

Elapsed Time: 8.773837640000000E+02 00:14:37.383

Fitness value at convergence = 106445.9925

Generations = 500

Rate Constants:

Theta( 1) = 170.55350

Theta( 2) = 0.10400

Theta( 3) = 59.65650

Theta( 4) = 0.12200

Theta( 5) = 0.09100

Theta( 6) = 16.44350

Theta( 7) = 62.70600

Theta( 8) = 0.01500

Theta( 9) = 19.40900

Theta(10) = 29.01100

Theta(11) = 39.84600

Theta(12) = 23.24200

Theta(13) = 18.51300

Theta(14) = 21.04000

Theta(15) = 21.31400

Theta(16) = 28.97600

Theta(17) = 22.45100

Theta(18) = 25.90600

Theta(19) = 36.86700

Theta(20) = 32.70700

Theta(21) = 120.16000

Theta(22) = 102.58000

Theta(23) = 115.12000

Theta(24) = 113.15000

Theta(25) = 20.31000

Theta(26) = 98.30000

Theta(27) = 60.67750

Theta(28) = 41.28750

EDIT — (7 Mar 2024 at 03:42)

Letting it run for 1000 generations did not change the fit much, and produced these results —

Elapsed Time: 1.785278555000000E+03 00:29:45.278

Fitness value at convergence = 104320.7923

Generations = 1000

Rate Constants:

Theta( 1) = 307.01975

Theta( 2) = 0.31750

Theta( 3) = 103.02616

Theta( 4) = 0.08000

Theta( 5) = 0.01700

Theta( 6) = 18.11753

Theta( 7) = 107.94763

Theta( 8) = 0.01093

Theta( 9) = 34.96571

Theta(10) = 60.88569

Theta(11) = 44.23353

Theta(12) = 47.06145

Theta(13) = 38.39091

Theta(14) = 49.11388

Theta(15) = 57.49012

Theta(16) = 43.48404

Theta(17) = 50.10263

Theta(18) = 39.86377

Theta(19) = 32.73687

Theta(20) = 51.32770

Theta(21) = 91.68969

Theta(22) = 117.37641

Theta(23) = 148.15000

Theta(24) = 131.28027

Theta(25) = 89.34000

Theta(26) = 85.50352

Theta(27) = 128.46062

Theta(28) = 99.14187

The final fitness value improved only slightly.

Using the correct differential equation for ‘Infected’ would definitely improve that.

.

University on 7 Mar 2024

Open in MATLAB Online

ebola.csv

Thank you so much for this. Please how can get a reasonble time (maybe in days from this data). At the moment if I print t, there are negative values. For instance, t(50)=-172

Tc = readtable('ebola.csv', 'VariableNamingRule','preserve');

% T1{end-1,1} = datetime(2014,10,31) % Correct Typographical Error In 'Date' Entry

[Cu,iv,ix] = unique(Tc.Country);

Nr = accumarray(ix, 1);

Summary = table(Cu,Nr);

[maxNr,idx] = max(Summary.Nr);

UseCountry = Summary(idx,:);

AllCountryCell = accumarray(ix,(1:numel(ix)).', [], @(x){Tc(x,:)});

Lv = cellfun(@(x)size(x,1)==maxNr, AllCountryCell);

CountryCell = AllCountryCell(Lv,:);

CountryData = CountryCell{:};

Lvm = diff([CountryData{1,3}; CountryData{:,3}]) < 15; % Detect Fluctuations

CountryData = CountryData(~Lvm,:);

% Filter Out Fluctuations

figure

plot(CountryData{:,2}, CountryData{:,3:end})

grid

% return

T1 = CountryData(:,2:end);

t = day(T1{:,1},'dayofyear'); % Time Vector (NECESSARY) Units = 'dayofyear'

t = t - t(1)

t = 139×1

0 3 7 11 13 17 19 21 26 28

t(50)

ans = -172

Star Strider on 7 Mar 2024

The code I am using calculates ‘t’ as:

t = days(T1{:,1} - T1{1,1});

and there are no negative values.

I am needing to update my code to work with the differint data files. My current pre-processing step is:

Tc = readtable('ebola.csv', 'VariableNamingRule','preserve')

% T1{end-1,1} = datetime(2014,10,31) % Correct Typographical Error In 'Date' Entry

ebola = true;

if ebola

[Cu,iv,ix] = unique(Tc.Country); % Unique Countries

Nr = accumarray(ix, 1) % Count Country Data

Summary = table(Cu,Nr) % Present REsults

[maxNr,idx] = max(Summary.Nr)

UseCountry = Summary(idx,:)

AllCountryCell = accumarray(ix,(1:numel(ix)).', [], @(x){Tc(x,:)}) % Cell Array Of All The ‘Country’ ‘table’ Arrays

Lv = cellfun(@(x)size(x,1)==maxNr, AllCountryCell); % Country With Maximum Number Of Observations

CountryCell = AllCountryCell(Lv,:); % Table For Chosen Country

CountryData = CountryCell{:} % View ‘table’ For The Chosen Country

Lvm = diff([CountryData{1,3}; CountryData{:,3}]) < 15; % Detect Abrupt Changes

CountryData = CountryData(~Lvm,:) % Delete Them

figure

plot(CountryData{:,2}, CountryData{:,3:end}) % Plot Country Result Data

grid

else

% Other Data Options

end

% return

T1 = CountryData(:,2:end)

VN1 = T1.Properties.VariableNames;

L = size(T1,1);

t = days(T1{:,1} - T1{1,1});

[tb1,tb2] = bounds(t)

c = T1{:,2:end};

Data = table(t, c(:,1),c(:,2), 'VariableNames',{'Date','Infected','Death'})

figure

plot(Data{:,1}, Data{:,2:end})

grid

The ‘Summary’ table lists the countries in the file and the number of data associated with each of them. The ‘UseCountry’ step chooses the country using the ‘idx’ value (that can be any number from 1 to 10 since there are 10 countries in the file) and then further processes those values to present the data to the optimisation routine.

I intend to fill the else (or if necessary, elseif) section with procedures for reading and using other files as required.

.

Star Strider on 7 Mar 2024

In the ‘Summary’ table, ‘Nigeria’ is country 5.

To choose a specific country, the if block changes to:

if ebola

[Cu,iv,ix] = unique(Tc.Country); % Unique Countries

Nr = accumarray(ix, 1) % Count Country Data

Summary = table(Cu,Nr) % Present REsults

% [maxNr,idx] = max(Summary.Nr)

ChooseCountry = 'Nigeria';

idx = cellfun(@(x)strmatch(x,ChooseCountry), Summary.Cu, 'Unif',0)

idx = cellfun(@(x)~isempty(x), idx, 'Unif',0)

idx = cell2mat(idx)

UseCountry = Summary(idx,:)

AllCountryCell = accumarray(ix,(1:numel(ix)).', [], @(x){Tc(x,:)}) % Cell Array Of All The ‘Country’ ‘table’ Arrays

% Lv = cellfun(@(x)size(x,1)==maxNr, AllCountryCell); % Country With Maximum Number Of Observations

CountryCell = AllCountryCell(idx,:); % Table For Chosen Country

CountryData = CountryCell{:} % View ‘table’ For The Chosen Country

Lvm = diff([CountryData{1,3}; CountryData{:,3}]) < 15; % Detect Abrupt Changes

% CountryData = CountryData(~Lvm,:) % Delete Them

figure

plot(CountryData{:,2}, CountryData{:,3:end}) % Plot Country Result Data

grid

xlabel(CountryData.Properties.VariableNames{2})

title(ChooseCountry)

legend(CountryData.Properties.VariableNames{3:end}, 'Location','best')

% Data = CountryData(:,2:end)

else

% Other Data Options

end

After the if block, the code continues as:

T1 = CountryData(:,2:end)

VN1 = T1.Properties.VariableNames;

L = size(T1,1);

t = days(T1{:,1} - T1{1,1});

[tb1,tb2] = bounds(t)

c = T1{:,2:end};

Data = table(t, c(:,1),c(:,2), 'VariableNames',{'Date','Infected','Death'})

figure

plot(Data{:,1}, Data{:,2:end})

grid

legend(CountryData.Properties.VariableNames{3:end}, 'Location','best')

These may change with subsequent data sets.

I am currently running the code for this data set. I will post back with an EDIT when it completes.

.

University on 7 Mar 2024

Thank you so very much for your assitance. I have been able to do it in Python using least square method and it fits well.

Star Strider on 7 Mar 2024

As always, my pleasure!

I need to spend some time with Python. A least squares (gradient descent) approach can work with relatively simple problems, however your system has 20 kinetic parameters (and in my approach 28 including the initial conditions for the differential equations), so gradient descent is less likely to be successful unless the initial parameter estimates are relatively close to the best values. For that reason, I chose the genetic algorithm approach. I am currently running it, and it is converging reasonably well after nearly an hour. (It seems to be stalling just now, so it may be close to converging.) it is experiencing a ‘graphics timeout’ so it is depriving me of progress update information right now. I will post the results when it provides them.

.

Sign in to comment.

Parameter estimation of SEIHRDBP model

1 Comment
Show -1 older comments Hide -1 older comments

Accepted Answer

24 Comments
Show 22 older comments Hide 22 older comments

More Answers (0)

Categories

Products

Release

Tags

Community Treasure Hunt

Parameter estimation of SEIHRDBP model

1 Comment Show -1 older comments Hide -1 older comments

Accepted Answer

24 Comments Show 22 older comments Hide 22 older comments

More Answers (0)

Categories

Products

Release

Tags

See Also

Community Treasure Hunt

1 Comment
Show -1 older comments Hide -1 older comments

24 Comments
Show 22 older comments Hide 22 older comments