Effect of Humidity on Electrochemical Sensor Signal and Seeking Correction Approaches

Hi everyone,
I've uploaded my dataset, which consists of a timetable with a model predictions, REF values (which my model should ideally match closely), humidity, and temperature readings. I've also included the plots I've generated based on this data.
The dataset I'm working with represents corrected signals from an electrochemical sensor. I've applied regression modeling and temperature correction to this data. Additionally, for the purpose of this analysis, I've removed outliers to ensure the data's reliability. Upon examining the plots, it's evident that as humidity levels fluctuate, they influence the overall signal. I believe there's a correlation here, especially since the impact spans over several hours, necessitating hours of data monitoring for accurate prediction.
While the sensor's output closely follows the trend of the reference signal, noticeable discrepancies arise in line with humidity changes. Specifically, the signal tends to overestimate when humidity increases and underestimate as it decreases. I've attempted to address this using the Regression Learner App and by comparing the error with humidity readings. However, I've not yet been successful in deriving a suitable correction factor or model. The challenge seems to be the need to assess prolonged data periods and discern if the humidity is rising or falling during those times.
Could anyone provide guidance or suggest an approach to tackle this issue?

 Accepted Answer

Relative humidity will of course be affected by temperature, inbcreases as the temperature decreases, and it can never exceed 100%. It might be better to express humidity in terms of the dewpoint (the environmental temperature can never fall below the dewpoint) or absolute humidity, although this is also a function of temperature.
Plotting the data (‘Temperature’, ‘Humidity’, and the difference between ‘Predicted_DATA’ and ‘REF’) fails (to me,a t least) to reveal any patterns I might use to model them.
LD = load('REG_Model_Data.mat');
T1 = LD.cleaned_timetable_REG
T1 = 5305×5 table
Time Predicted_DATA REF Humidity Temperature ____________________ ______________ ____ ________ ___________ 16-Jan-2023 13:00:00 29.283 18.2 84.169 6.0208 16-Jan-2023 14:00:00 25.142 23.9 81.138 5.8618 16-Jan-2023 15:00:00 26.77 22.3 82.871 5.2763 16-Jan-2023 16:00:00 32.312 21.8 85.829 3.8196 16-Jan-2023 17:00:00 38.172 25.6 86.509 2.4975 16-Jan-2023 18:00:00 42.949 34.4 87.315 1.4005 16-Jan-2023 19:00:00 46.67 40.3 87.347 0.57429 16-Jan-2023 20:00:00 48.916 43.5 80.602 0.0084591 16-Jan-2023 21:00:00 53.266 47.8 80.225 -0.74979 16-Jan-2023 22:00:00 55.718 56.3 81.496 -1.2985 16-Jan-2023 23:00:00 55.289 54.3 82.048 -1.6424 17-Jan-2023 00:00:00 58.95 64.5 87.314 -1.5495 17-Jan-2023 01:00:00 59.48 66.4 91.808 -1.9128 17-Jan-2023 02:00:00 60.96 65.5 93.916 -2.2129 17-Jan-2023 03:00:00 58.813 62.3 95.008 -2.4581 17-Jan-2023 04:00:00 59.082 59.2 95.457 -2.5598
VN = T1.Properties.VariableNames;
delta = T1{:,3}-T1{:,2};
Lv = delta >= 0;
figure
scatter3(T1{Lv,4}, T1{Lv,5}, delta(Lv), '.g')
hold on
scatter3(T1{~Lv,4}, T1{~Lv,5}, delta(~Lv), '.r')
patch([xlim flip(xlim)], [[1 1]*min(ylim) [1 1]*max(ylim)], zeros(1,4), [1 1 1]*0.5, 'FaceAlpha',0.25)
hold off
grid on
xlabel(VN{4})
ylabel(VN{5})
zlabel([VN{3} ' - ' strrep(VN{2},'_','\_')])
view(-60,30)
figure
scatter(T1.Temperature, T1.Humidity,'.')
grid
xlabel('Temperature (°C)')
ylabel('Relative Humidity (%)')
% delta = TT1.Predicted_DATA - TT1.REF;
% pred = [TT1.Humidity TT1.Temperature];
% mdl = fitlm(pred, delta)
I would include humidity as an additional independent variable in your model (if you have not already) and proceed from there. I doubt that there is any way to predict it, other than noting the rate and direction of change in both humidity and temperature.
.

23 Comments

The pattern i am have observed is from the plots i sent, where when humdity rises it under estimates and the opposite when humdity falls.
That most likely has to do with the physical dynamics of the sensor. It might be possible to model the humidity changes in real time, and then have the sensor do minor corrections based on that, however it is not obvious to me how to incorporate that otherwise. Perhaps using a state-space representation of the system, where inputs are the instantaneous humidity and NOx concentrations, and the output is a correct NOx concentration could work. The state-space model would take into consideration the changing dynamics and correct for them.
Beyond that, I have no suggestions other than that the System Identification Toolbox could be helpful in estimating that system from the known inputs and known output.
EDIT — (25 Oct 2023 at 01:22)
I gave some thought to the state space model of temperature and humidity to predict the ‘REF’ data and came up with this preliminary result. The model is large for such models (this one is order 8, and a smaller system might do reasonably well — your decision, not mine) however if you are interested in this sort of approach, this should get you started.
See the documentation for iddata, ssest, and compare for details on those functions.
The idea here is to account for the dynamics of the sytem as a function of time, not simply the instantaneous values of the input variables. Since you want to track the changes in humidity, similar approaches (such as a Kalman filter) might also be appropriate to estimate the directions of the various inputs to your system. (I do not have sufficient experience with Kalman filters or knowledge of your system and instrumentation to attempt that.)
The inputs here are ‘Humidity’ and ‘Temperature’ (in that order), and the output is.‘REF’.
A preliminary approach —
LD = load('REG_Model_Data.mat');
T1 = LD.cleaned_timetable_REG
T1 = 5305×5 table
Time Predicted_DATA REF Humidity Temperature ____________________ ______________ ____ ________ ___________ 16-Jan-2023 13:00:00 29.283 18.2 84.169 6.0208 16-Jan-2023 14:00:00 25.142 23.9 81.138 5.8618 16-Jan-2023 15:00:00 26.77 22.3 82.871 5.2763 16-Jan-2023 16:00:00 32.312 21.8 85.829 3.8196 16-Jan-2023 17:00:00 38.172 25.6 86.509 2.4975 16-Jan-2023 18:00:00 42.949 34.4 87.315 1.4005 16-Jan-2023 19:00:00 46.67 40.3 87.347 0.57429 16-Jan-2023 20:00:00 48.916 43.5 80.602 0.0084591 16-Jan-2023 21:00:00 53.266 47.8 80.225 -0.74979 16-Jan-2023 22:00:00 55.718 56.3 81.496 -1.2985 16-Jan-2023 23:00:00 55.289 54.3 82.048 -1.6424 17-Jan-2023 00:00:00 58.95 64.5 87.314 -1.5495 17-Jan-2023 01:00:00 59.48 66.4 91.808 -1.9128 17-Jan-2023 02:00:00 60.96 65.5 93.916 -2.2129 17-Jan-2023 03:00:00 58.813 62.3 95.008 -2.4581 17-Jan-2023 04:00:00 59.082 59.2 95.457 -2.5598
figure
plot3(T1.Humidity, T1.Temperature, T1.REF)
grid on
xlabel('Humidity')
ylabel('Temperature')
zlabel('REF')
figure
plot(T1.Time, T1{:,[4 5 3]})
grid
legend('Humidity','Temperature','REF', 'Location','best')
SSData = iddata(T1{:,3}, T1{:,[4 5]}, 1, 'TimeUnit','hours')
SSData = Time domain data set with 5305 samples. Sample time: 1 hours Outputs Unit (if specified) y1 Inputs Unit (if specified) u1 u2
sys = ssest(SSData, 8)
sys = Continuous-time identified state-space model: dx/dt = A x(t) + B u(t) + K e(t) y(t) = C x(t) + D u(t) + e(t) A = x1 x2 x3 x4 x5 x6 x7 x8 x1 -0.1013 0.032 0.2238 -0.186 0.2464 -0.2537 0.4956 -0.5676 x2 0.01809 -0.02022 -0.4512 0.1267 -0.05051 0.1583 -0.3409 0.3828 x3 -0.0647 0.4163 -0.08732 0.3096 -0.3656 0.101 -0.2997 0.3619 x4 -0.006939 -0.02292 -0.1742 -0.05125 0.8283 -0.1018 0.2725 -0.1963 x5 -0.03864 0.02864 0.06362 -0.2127 0.2571 0.8126 -2.245 1.823 x6 -0.002079 0.0209 0.05216 -0.02102 0.2297 0.4657 -2.605 -0.1297 x7 -0.03093 0.0767 -0.04126 -0.05281 1.005 2.616 -0.9631 0.827 x8 -0.002164 -0.02393 -0.04675 0.01628 -0.1768 0.283 0.1163 -0.2318 B = u1 u2 x1 -9.759e-05 -1.763e-05 x2 5.337e-05 -0.0003011 x3 0.0001201 0.0002809 x4 -0.0001012 -0.0003922 x5 0.0002635 0.001872 x6 0.0002405 0.001803 x7 0.0004984 0.001369 x8 6.601e-05 -8.003e-05 C = x1 x2 x3 x4 x5 x6 x7 x8 y1 -789.5 46.67 -29.02 -27.38 -7.18 -11.43 -0.5373 8.824 D = u1 u2 y1 0 0 K = y1 x1 -0.001253 x2 0.0001764 x3 -5.457e-05 x4 -6.103e-05 x5 -2.063e-05 x6 -5.142e-05 x7 -8.944e-05 x8 3.302e-05 Parameterization: FREE form (all coefficients in A, B, C free). Feedthrough: none Disturbance component: estimate Number of free coefficients: 96 Use "idssdata", "getpvec", "getcov" for parameters and their uncertainties. Status: Estimated using SSEST on time domain data "SSData". Fit to estimation data: 64.69% (prediction focus) FPE: 21.82, MSE: 21.56
figure
compare(SSData, sys)
grid
.
Thank you; I will review the suggested documents.
In terms of humidity, its effects differ from temperature as they are not immediate but may take a few hours to manifest. This delay depends on whether the humidity is increasing or decreasing and at what intensity. Consequently, incorporating humidity into the standard model doesn't significantly impact, based on my experience. Visually comparing the rise and fall of humidity over a span of several hours reveals a correlation. To illustrate this more clearly, I have attached a screenshot of the data collected over a few days.
I am not certain that this analysis adds much, however it might be of interest —
LD = load('REG_Model_Data.mat');
T1 = LD.cleaned_timetable_REG
T1 = 5305×5 table
Time Predicted_DATA REF Humidity Temperature ____________________ ______________ ____ ________ ___________ 16-Jan-2023 13:00:00 29.283 18.2 84.169 6.0208 16-Jan-2023 14:00:00 25.142 23.9 81.138 5.8618 16-Jan-2023 15:00:00 26.77 22.3 82.871 5.2763 16-Jan-2023 16:00:00 32.312 21.8 85.829 3.8196 16-Jan-2023 17:00:00 38.172 25.6 86.509 2.4975 16-Jan-2023 18:00:00 42.949 34.4 87.315 1.4005 16-Jan-2023 19:00:00 46.67 40.3 87.347 0.57429 16-Jan-2023 20:00:00 48.916 43.5 80.602 0.0084591 16-Jan-2023 21:00:00 53.266 47.8 80.225 -0.74979 16-Jan-2023 22:00:00 55.718 56.3 81.496 -1.2985 16-Jan-2023 23:00:00 55.289 54.3 82.048 -1.6424 17-Jan-2023 00:00:00 58.95 64.5 87.314 -1.5495 17-Jan-2023 01:00:00 59.48 66.4 91.808 -1.9128 17-Jan-2023 02:00:00 60.96 65.5 93.916 -2.2129 17-Jan-2023 03:00:00 58.813 62.3 95.008 -2.4581 17-Jan-2023 04:00:00 59.082 59.2 95.457 -2.5598
mdl = fitlm(T1.Humidity, T1.REF)
mdl =
Linear regression model: y ~ 1 + x1 Estimated Coefficients: Estimate SE tStat pValue ________ ________ _______ ___________ (Intercept) -1.4102 0.7868 -1.7923 0.073146 x1 0.26959 0.010118 26.644 6.7542e-147 Number of observations: 5305, Error degrees of freedom: 5303 Root Mean Squared Error: 12.4 R-squared: 0.118, Adjusted R-Squared: 0.118 F-statistic vs. constant model: 710, p-value = 6.75e-147
figure
plot(mdl) % x1 = Humidity, y = REF
grid
[r, lags] = xcorr(T1.Humidity, T1.REF, 'coeff'); % Return Correlation Coefficient
[rmax,idx] = max(r);
Delay = lags(idx)
Delay = 1
figure
stairs(lags, r)
grid
xlabel('Hours')
ylabel('R_{x,y}')
text(lags(idx), rmax, sprintf('\\leftarrow R_{x,y} = %.3f\n \\Delta = %d hours', rmax, Delay), 'Horiz','left', 'Vert','top')
There does appear to be a correlation, and no significant delay.
If these are not the variables you want to examine, substitute the correct variables.
.
Thanks i will give that a go.
For your information this what is mentioned on the datasheet. But as humdity is change constantly, this transient issue is much longer.
My pleasure.
The state space approach should have a way to deal with the transients, however it needs to have them as part of its data in order to model them effectively. That (or some version of it) may do what you want.
Hi
I have cleaned up my original signal (per minute data sample) as mentioned in my previous post.MATLAB Tools to Evaluate and Mitigate Sensor Noise - MATLAB Answers - MATLAB Central (mathworks.com)
For your reference, I have attached both the figure and raw data. The raw data comprises the REF data (hourly), raw data (per minute), and temperature/humidity (hourly). The goal is to correct the raw data so that, in principle, it looks similar to the REF signal; I am not expecting it to match 100%. I am currently evaluating the original 1-minute signal before it undergoes any type of regression modeling. Previously, I had provided the hourly averaged signal, but it was altered by a regression model. This time, it's the raw signal that I aim to correct due to humidity. Please see the plots below. In the plot, the blue represents REF data, black represents the raw data, and cyan represents humidity.
As you can observe, the raw signal decreases as humidity increases, and vice versa: it increases as humidity decreases. The intensity of these changes also correlates with the fluctuations in the raw signal.
To address this issue, I am exploring methods to correct the signal using a compensation function rather than creating a regression model at this stage. What approach would you recommend for developing such a compensation function? Ideally using the REF as abousulte signal, but use it as reference shaping signal
I belive after this correction, we could so a simple linear regression to the REF data for final value.
I do not see any new files. Are they supposed to be here, or simply the plot images? (I just need to know.)
I have been looking at this for a while (interrupted by computer problems), and note that ‘smooth_difference’ lags ‘REF’ by 15 hours, and leads ‘RH’ by 9.
In estimating this using the System Identification Toolbox (which is the only approach I consider appropriate here), what should the input (or inputs) be, and what should the output be? I do not completely understand what you want to do in that respect.
I used retime to resample (interpolate) ‘smooth_diffference’ to the time vectors shared by all the other signals (I checked to be sure that they all share the same time, so I could use it with the other signals), and created a new structure for it
My analysis so far —
LD = load('plotData_needs...orrection.mat')
LD = struct with fields:
plot1: [1×1 struct] plot2: [1×1 struct] plot3: [1×1 struct] plot4: [1×1 struct]
plot1 = LD.plot1
plot1 = struct with fields:
Time: [2112×1 datetime] RH: [2112×1 double] DisplayName: 'Humidity'
plot2 = LD.plot2
plot2 = struct with fields:
Time: [129599×1 datetime] smooth_difference: [129599×1 double] DisplayName: 'RAW Signal WE-Baseline'
plot3 = LD.plot3
plot3 = struct with fields:
Time: [2112×1 datetime] REF: [2112×1 double] DisplayName: 'REF DATA'
plot4 = LD.plot4
plot4 = struct with fields:
Time: [2112×1 datetime] T: [2112×1 double] DisplayName: 'Temperture'
P2 = timetable(plot2.Time, plot2.smooth_difference); % Resample 'plot2' To Common 'Time'
P2r = retime(P2, plot1.Time);
plot2r.Time = P2r.Time;
plot2r.smooth_difference = P2r{:,1} % Create New Structure
plot2r = struct with fields:
Time: [2112×1 datetime] smooth_difference: [2112×1 double]
figure
hold on
plot(plot1.Time, plot1.RH, 'DisplayName',plot1.DisplayName)
plot(plot2r.Time, plot2r.smooth_difference, 'DisplayName',plot2.DisplayName)
plot(plot3.Time, plot3.REF, 'DisplayName',plot3.DisplayName)
plot(plot4.Time, plot4.T, 'DisplayName',plot4.DisplayName)
hold off
grid
legend('Location','northoutside', 'NumColumns',2)
D1 = finddelay(plot1.RH, plot2r.smooth_difference) % Units: Hours
D1 = -9
D2 = finddelay(plot3.REF, plot2r.smooth_difference) % Units: Hours
D2 = 15
D3 = finddelay(plot3.REF, plot1.RH) % Units: Hours
D3 = 1
% maxlag = 24;
% Dn = finddelay(plot3.REF, [plot1.RH; plot2r.smooth_difference; plot4.T], maxlag)
figure
hold on
% plot(plot1.Time, plot1.RH, 'DisplayName',plot1.DisplayName)
plot(plot2r.Time, plot2r.smooth_difference*4, 'DisplayName',plot2.DisplayName)
plot(plot3.Time, plot3.REF, 'DisplayName',plot3.DisplayName)
% plot(plot4.Time, plot4.T, 'DisplayName',plot4.DisplayName)
hold off
grid
xlim(plot1.Time(1)+[0 caldays(14)])
legend('Location','northoutside', 'NumColumns',2)
I believe it to be appropriate to model this as a dynamic system in state space using system identification, based on my understanding of what you are doing (that may not be correct).
.
Thanks, In regards to your question.
I am currently working on a project where my primary objective is to utilize the REF signal as a reference to correct the smooth_difference signal. I have observed that the peaks of these signals rise and fall, and my aim is to align their general trends and shapes. While they do not need to be identical, achieving a close resemblance would be ideal.
The smooth_difference is derived from my sensor’s output, while REF serves as a reference sensor. Both sensors are simultaneously monitoring the same elements. Over a 24-hour period, one can observe fluctuations in humidity, with corresponding changes in the smooth_difference signal. Specifically, as humidity increases, the smooth_difference tends to decrease, and vice versa. By visually inspecting the data, it becomes apparent that the behavior of smooth_difference is inversely related to humidity levels.
My goal is to correct the discrepancies in the amplitude of the smooth_difference signal, particularly during periods of rising and falling humidity. To accurately assess and address this issue, a detailed analysis is required, potentially focusing on small increments of data and considering the rate of change of humidity, rather than its absolute value.
For data alignment purposes, I am considering averaging the smooth_difference signal over one-hour intervals, ensuring that it matches the time samples of the REF signal.
Do you also see this pattern?
I am having a difficult time visualising this. It seems that the input to the system are all four variables collected simultaneously, and thge output is a correction signal that is (‘REF’ - ‘smooth_difference’).
Testing that hypothesis —
LD = load('plotData_needs...orrection.mat')
LD = struct with fields:
plot1: [1×1 struct] plot2: [1×1 struct] plot3: [1×1 struct] plot4: [1×1 struct]
plot1 = LD.plot1
plot1 = struct with fields:
Time: [2112×1 datetime] RH: [2112×1 double] DisplayName: 'Humidity'
plot2 = LD.plot2
plot2 = struct with fields:
Time: [129599×1 datetime] smooth_difference: [129599×1 double] DisplayName: 'RAW Signal WE-Baseline'
plot3 = LD.plot3
plot3 = struct with fields:
Time: [2112×1 datetime] REF: [2112×1 double] DisplayName: 'REF DATA'
plot4 = LD.plot4
plot4 = struct with fields:
Time: [2112×1 datetime] T: [2112×1 double] DisplayName: 'Temperture'
P2 = timetable(plot2.Time, plot2.smooth_difference); % Resample 'plot2' To Common 'Time'
P2r = retime(P2, plot1.Time);
plot2r.Time = P2r.Time;
plot2r.smooth_difference = P2r{:,1} % Create New Structure
plot2r = struct with fields:
Time: [2112×1 datetime] smooth_difference: [2112×1 double]
figure
hold on
plot(plot1.Time, plot1.RH, 'DisplayName',plot1.DisplayName)
plot(plot2r.Time, plot2r.smooth_difference, 'DisplayName',plot2.DisplayName)
plot(plot3.Time, plot3.REF, 'DisplayName',plot3.DisplayName)
plot(plot4.Time, plot4.T, 'DisplayName',plot4.DisplayName)
hold off
grid
legend('Location','northoutside', 'NumColumns',2)
SysData = iddata((plot2r.smooth_difference-plot3.REF), [plot1.RH plot2r.smooth_difference plot3.REF plot4.T], 1, 'TimeUnit','hours')
SysData = Time domain data set with 2112 samples. Sample time: 1 hours Outputs Unit (if specified) y1 Inputs Unit (if specified) u1 u2 u3 u4
SSsys = ssest(SysData, 12)
SSsys = Continuous-time identified state-space model: dx/dt = A x(t) + B u(t) + K e(t) y(t) = C x(t) + D u(t) + e(t) A = x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x1 0.5106 -0.1781 1.967 -0.245 -0.5576 -1.439 -0.03361 1.26 -0.00693 -0.2496 0.2962 0.847 x2 -8.465 0.7301 -0.4308 -1.135 -1.487 -1.381 -1.127 1.713 -1.455 0.04107 -0.6515 2.224 x3 -14.24 1.996 -7.355 0.05811 0.7366 0.6082 -0.4565 -1.35 -1.55 1.612 -1.279 0.004257 x4 -12.51 2.84 -7.793 -1.042 1.185 -0.4785 -0.6123 0.7043 -1.058 1.428 -0.5808 1.231 x5 -7.215 0.4767 0.7689 -1.871 -1.207 1.017 -1.271 0.8841 -0.3676 -0.1248 -0.2788 1.105 x6 7.603 -5.861 0.2214 -0.6416 -3.373 -3.481 -0.2554 2.188 0.838 1.164 1.457 0.7437 x7 -0.7314 1.012 1.612 0.3901 -0.8523 -1.465 -0.4914 1.613 -0.8073 -0.5706 -0.7314 0.24 x8 -8.92 2.512 -2.153 -0.7739 -0.4705 0.5273 -1.402 -0.3049 -1.267 1.061 -0.6433 0.132 x9 0.9532 4.903 4.468 -0.01212 -1.665 -0.008972 0.4311 1.978 -0.4806 -2.714 -0.2778 0.2569 x10 14.27 -1.472 6.738 0.6501 1.599 1.075 2.017 -1.492 3.412 -1.762 1.585 -2.498 x11 4.806 1.286 8.059 1.536 1.223 0.5706 1.755 -0.03797 1.173 -2.143 0.07518 -1.906 x12 4.752 -1.638 3.204 1.102 1.532 1.441 1.512 -0.7283 1.423 -0.6985 0.6571 -1.974 B = u1 u2 u3 u4 x1 -2.647e+13 9.436e+13 2.432e+12 -3.549e+13 x2 -3.951e+13 1.797e+14 7.539e+12 -3.443e+13 x3 -2.706e+13 1.581e+14 1.199e+12 -1.967e+12 x4 -6.722e+13 3.155e+14 -1.277e+13 -3.144e+13 x5 -1.247e+09 -3.182e+13 3.339e+12 3.047e+12 x6 -1.203e+14 5.267e+14 -4.952e+12 -1.563e+14 x7 1.129e+13 -6.149e+13 1.477e+13 -1.052e+13 x8 -1.674e+13 5.739e+12 5.641e+12 -5.855e+13 x9 9.862e+13 -3.782e+14 1.003e+12 1.09e+14 x10 6.192e+13 -3.121e+14 -8.459e+12 7.024e+13 x11 1.032e+14 -4.889e+14 1.467e+13 9.936e+13 x12 5.611e+13 -2.488e+14 3.851e+12 7.662e+13 C = x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 y1 -8.233e-14 -2.502e-13 1.17e-14 -4.361e-14 -9.762e-14 1.047e-13 -8.7e-14 -3.142e-14 1.083e-14 4.99e-14 -3.908e-14 3.084e-14 D = u1 u2 u3 u4 y1 0 0 0 0 K = y1 x1 -8.228e+11 x2 2.444e+12 x3 3.085e+12 x4 2.816e+11 x5 1.674e+12 x6 -4.18e+12 x7 7.485e+11 x8 2.66e+12 x9 1.516e+12 x10 -4.087e+12 x11 2.65e+12 x12 3.619e+11 Parameterization: FREE form (all coefficients in A, B, C free). Feedthrough: none Disturbance component: estimate Number of free coefficients: 216 Use "idssdata", "getpvec", "getcov" for parameters and their uncertainties. Status: Estimated using SSEST on time domain data "SysData". Fit to estimation data: 48.47% (prediction focus) FPE: 15.83, MSE: 14.79
figure
compare(SysData, SSsys)
grid
Tsim = 1:height(plot1.Time);
correction_vector_output = lsim(SSsys, [plot1.RH plot2r.smooth_difference plot3.REF plot4.T], Tsim);
smooth_difference_corrected = plot3.REF + correction_vector_output;
figure
plot(plot2r.Time, plot2r.smooth_difference, 'DisplayName','Original')
hold on
plot(plot2r.Time, smooth_difference_corrected, 'DisplayName','Corrected')
plot(plot3.Time, plot3.REF, 'DisplayName','REF')
hold off
grid
legend('Location','best')
figure
plot(plot2r.Time, plot2r.smooth_difference, 'DisplayName','Original')
hold on
plot(plot2r.Time, smooth_difference_corrected, 'DisplayName','Corrected')
plot(plot3.Time, plot3.REF, 'DisplayName','REF')
hold off
grid
legend('Location','best')
xlim([0 caldays(14)]+plot3.Time(1))
This actually appears to calculate the correction signal ‘(smooth_difference - REF)’ reasonably well. To get the corrected value of ‘smooth_difference’ add the ‘REF’ vector to it —
syms sd REF
correction_vector_output = sd - REF
correction_vector_output = 
smooth_difference_corrected = REF + correction_vector_output
smooth_difference_corrected = 
sd
I was hoping that the corrected signal would come out closer to ‘REF’ than to the original signal. I may have missed something in your description, so experiment with this approach, since you understand the problem much better than I do.
.
Please see the plot below. This is the trend I am referring to.
This is a zoomed-in view.
You can observe that the reference signal (in blue) is fairly flat. Then, there's the humidity signal (in cyan), which rises and falls throughout the day, affecting the smooth_difference signal. You'll notice that the smooth_difference signal increases its amplitude during the periods when the humidity drops, and does the opposite when it rises.
This is the impact of humidity. I have also included temperature data in case it provides any additional insights, but temperature can be ignored for now.
Additionally, I will review the steps you outlined to see if they help address the issue.
Is this somthing we would need to correct based in the time stamps of the peaks or is a correction function which could be used? This is a signal fluctuation issue
It is difficult for me to understandconsider how to approach this. I still believe that a dynamic system approach is appropriate, however I am just not certain how to construct it. I do not have a clear understanding of what the inputs and outputs should be, since I do not understand the system you want to model.
Well, my inputs will be my Predicted_DATA signal and Humidity. The ref signal is simply there to provide a bit of visual guidance. The Predicted_DATA fluctuates when humidity changes or when the rate of humidity change varies, and this is what I am trying to correct.
Is it possible to just analyze all the rising and falling peaks, calculate the rate of change between the key points, and then adjust a block of samples from Predicted_DATA to be either multiplied or divided by a factor, so it proportionally compensates for what humidity has done?
I had to run this offline because it takes longer than the allotted 55 seconds (it needs about 4½ times that).
load('REG_Model_Data.mat')
cleaned_timetable_REG
% NrMissing = nnz(ismissing(cleaned_timetable_REG))
% cleaned_timetable_REG = fillmissing(cleaned_timetable_REG, 'nearest')
cleaned_timetable_REG_TT = table2timetable(cleaned_timetable_REG);
cleaned_timetable_REG_TT = retime(cleaned_timetable_REG_TT, 'hourly');
cleaned_timetable_REG_TTs = cleaned_timetable_REG_TT(:,[1 3 4 2])
Data = iddata(cleaned_timetable_REG_TTs)
Data = misdata(Data)
SSsys = ssest(Data, 8)
figure
compare(Data, SSsys)
grid
It produces this system:
SSsys =
Continuous-time identified state-space model:
dx/dt = A x(t) + B u(t) + K e(t)
y(t) = C x(t) + D u(t) + e(t)
A =
x1 x2 x3 x4 x5 x6 x7 x8
x1 -0.09591 0.1753 0.1319 0.1004 -0.07117 -0.1091 -0.01476 -0.1967
x2 0.09113 -0.1586 -0.2084 -0.09554 0.1261 0.1156 0.02182 0.2023
x3 -0.1483 0.2734 -0.06624 -0.4684 0.2209 0.154 0.02348 0.1978
x4 0.06936 -0.09895 0.4544 -0.08761 0.2892 0.1671 0.06083 0.2402
x5 0.04081 -0.06385 -0.0145 -0.02059 -0.178 -0.9404 0.07352 -1.255
x6 0.04111 -0.04807 -0.141 -0.1098 0.7128 -0.1866 -1.424 -0.09405
x7 0.01679 -0.01909 0.01431 -0.03315 -0.04821 1.374 0.005255 -1.766
x8 0.02773 -0.04175 -0.09524 -0.08299 0.7382 -0.4315 1.609 -0.3404
B =
Predicted_DA Humidity Temperature
x1 -4.047e-05 3.267e-05 0.0001242
x2 3.706e-05 -2.963e-05 -0.0001252
x3 4.425e-05 -8.677e-06 5.426e-05
x4 6.102e-05 -2.236e-05 5.346e-05
x5 2.801e-05 -1.11e-05 -2.314e-07
x6 -6.455e-06 3.922e-06 1.297e-05
x7 4.199e-06 -1.922e-06 -1.146e-06
x8 -8.82e-07 1.541e-06 1.093e-05
C =
x1 x2 x3 x4 x5 x6 x7 x8
REF -1751 160.9 -33.37 43.05 20.1 10.11 9.504 2.154
D =
Predicted_DA Humidity Temperature
REF 0 0 0
K =
REF
x1 -0.0005337
x2 0.0002469
x3 -0.0001914
x4 0.0002226
x5 0.0001757
x6 5.587e-06
x7 0.0001112
x8 4.167e-05
Parameterization:
FREE form (all coefficients in A, B, C free).
Feedthrough: none
Disturbance component: estimate
Number of free coefficients: 104
Use "idssdata", "getpvec", "getcov" for parameters and their uncertainties.
Status:
Estimated using SSEST on time domain data "Data".
Fit to estimation data: 87.19% (prediction focus)
FPE: 19.05, MSE: 18.78
Model Properties
Elapsed time is 231.098728 seconds.
The inputs are:
Predicted_DATA
Humidity
Temperature
and the output is:
REF
The order 8 system does a reasonably accurate approximation.
What would you want from a model such as this?
.
I will give this a try tomorrow. Will this process output some type of model or coefficients so that I can use it to predict future sensor data?
It will output something approximating ‘REF’ since that is how I designed it. My impression is that is what you want, given those inputs. It does not output a compensation signal, although I attempted exactly that earlier, although with ‘REF’ also as an input.
I can experiment with the same sort of difference output however not with ‘REF’ as an input if that makes sense. The ‘Predicted_DATA’ would still be an input however, since that is what you specified. The output would be the difference signal vector, as previously. The only difference from the previous approach would be that ‘REF’ would not be an input.
Well the in future REF will not be there, would it be able to predict or correct the Predicted_DATA signal based on the additional inputs of Humidity and Temperature? Does this output some form of coeffients?
It can output a vector of differences, similar to the approach in my earlier Comment. That approach models the time-related dynamics of the system, including the transients.
If the ranges of all the variables are accounted for in the data available here, and if ‘REF’ will not always be available, then the regression approach might be the best option. The next step would be to determine the relationships abetween the variables. If you want to model them all as linear, then that would be straightforward (multiple linear regression) and would return coefficients for each variable, for example —
If the relationships were nonlinear (and that depends on the physics and chemistry involved, and could be a polynomial relation or a more complicated functional relation), that would have to be modeled for each of them and combined into a single function. If you have a mathematical description of these relationships, estimating the parameters would be straightforward. Those models and parameters could then be incorporated into a full model and given the inputs, would produce a single output that would (within a specific error tolerance that could also be estimated and modeled), be the result. This approach might not model the transients well, however it could model a steady-state system.
Thanks I will be expirment with your code. What i found it that using Regression Leaner App with GPR model is if i was to convert my Humidity data as Rate differnce as shown in the code below, it gives a little more insight into the effect Humditity has. I aim to feed this data also into your example.
timeIntervals = [1, 2, 3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]
% Ensure the timetable is sorted in ascending order by time
temptime_table = sortrows(temptime_table);
% Loop through each time interval and calculate the rate of change
for i = 1:length(timeIntervals)
interval = timeIntervals(i);
% Initialize a column for the rate of change
rateChange = NaN(height(temptime_table), 1);
% Calculate the rate of change for each row
for j = 1:height(temptime_table)
% Find the time point exactly 'interval' hours before the current time point
targetTime = temptime_table.Time(j) - hours(interval);
% Find the closest previous data point to the target time
previousIndex = find(temptime_table.Time <= targetTime, 1, 'last');
if ~isempty(previousIndex) && all(~isnan(temptime_table.Humidity([previousIndex, j])))
rateChange(j) = (temptime_table.Humidity(j) - temptime_table.Humidity(previousIndex)) / interval;
end
end
% Add the new column to the timetable
new_column_name = sprintf('HumidityRate_%dhr', interval);
temptime_table.(new_column_name) = rateChange;
end
It might also be possible to use the gradient function with each of the associated variables as an additional input (covariate). It computes the numeric derivative (in this instance with respect to time), however like all high-pass filters (and the gradient or derivative is one such), it accentuates the noise, so noise has to be minimised first by smoothing the signal. (I prefer using the sgolayfilt function for this, using a 3-order polynomial and varying the ‘framelen’ until I get the result I want. The smoothdata function is also an option.)
If the data are not regularly sampled or the sampling interval is not known, compute the derivative as:
dydt = gradient(y) ./ gradient(t);
(assuming vector arguments of the same dimensions) and proceed from there.
Plotting the variables and their derivatives as functions of time will demonstrate if this is a suitable option.
It might also be appropriate to calculate and plot the derivatives of the viariables with respect to each other. This might provide insight into their relationships, even if those results are not included in the model.
.

Sign in to comment.

More Answers (0)

Products

Release

R2023a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!