You are now following this question
- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.
MATLAB Tools to Evaluate and Mitigate Sensor Noise
12 views (last 30 days)
Show older comments
Hi,
I possess a dataset representing the output of an electronic NO2 chemical sensor, which I've compared against reference data. I've been utilizing the Regression Learning App in MATLAB to devise an appropriate model for predictions, and it has proven effective. However, like many electronic sensors, there's noticeable noise, particularly evident when the gas concentration is minimal, e.g., 10-15 ppb.
I'm eager to further enhance my overall process beyond just using the model from the Regression Learning App. Are there any additional tools or features in MATLAB that could help me evaluate and possibly mitigate this noise when concentration levels are low? I'm open to specific tools or any machine learning tools that might discern patterns in the data, subsequently refining my results.
5 Comments
Star Strider
on 15 Oct 2023
It could help if you share a representative data set and an example of the result you want from processing it. Are you fitting a specific mathematical model to it?
Also, what is the nature of the noise? Is it thermal (‘pink’) noise, electrical interference (broadband or band-limited), or something else?
Dharmesh Joshi
on 18 Oct 2023
Thanks. Would you prefer a year's worth of data, or just a few months? Can I provide the data as a simple attachment, or does the code need to be executed directly on this forum?
The issue I'm facing involves raw sensor data that is recorded every minute. This data is then averaged to an hourly rate before being processed by a regression model. However, due to the presence of noise, characteristics like peaks get averaged out. As a result, this part of the information is lost before the data reaches the regression model.
I have tried removing outliers, but this does not really solve the issue, so woundering if some signal processing might be needed,
Star Strider
on 18 Oct 2023
If resampling the data removes some of the necessary information, then it is likely best not to resamjple it before processing it.
One effective way of dealing with noise while retaining the underlying signal is to use the sgolayfilt function. There is not any one good way of dealing with noise, however that would be my first choice, especially since I do not know the details.
Another way of reducing the noise while keeping the details is to create specific segments of the data called ensembles, ideally centred on the peaks, and then processing them individually or by averaging them first and then processing them.
Since I am not certain what you want from the data, that is all I can suggest at present.
Dharmesh Joshi
on 19 Oct 2023
Thanks
I will give that a go. For your reference i have attached the data.
There is the raw data, hour average, and ref data.
I need to improve the hour average data to include the peaks. Using the ref data, you can get an idea where the peaks would be in the raw data.
Accepted Answer
Star Strider
on 19 Oct 2023
I am not certain that I understand the sort of processing you are doing. Some of the signal noise is broadband, however much of it can be eliminiated either using a Savitzky-Golay filter or a simple lowpass filter.
I did both here —
LD = load('dataFilter.mat')
LD = struct with fields:
hour_average: [5554×1 timetable]
raw_min_data: [333239×1 timetable]
refdata: [5555×1 timetable]
rmd = LD.raw_min_data
rmd = 333239×1 timetable
Time Var1
____________________ _______
16-Jan-2023 13:02:00 2.9424
16-Jan-2023 13:03:00 2.1115
16-Jan-2023 13:04:00 1.7951
16-Jan-2023 13:05:00 2.072
16-Jan-2023 13:06:00 2.9067
16-Jan-2023 13:07:00 1.5364
16-Jan-2023 13:08:00 2.0293
16-Jan-2023 13:09:00 1.6482
16-Jan-2023 13:10:00 1.6591
16-Jan-2023 13:11:00 5.8567
16-Jan-2023 13:12:00 0.32883
16-Jan-2023 13:13:00 0.62192
16-Jan-2023 13:14:00 0.915
16-Jan-2023 13:15:00 3.7207
16-Jan-2023 13:16:00 0.5806
16-Jan-2023 13:17:00 0.2533
Ts = mean(diff(rmd.Time))
Ts = duration
00:01:00
[h,m,s] = hms(Ts);
Fs = 1/60; % Hz
Fn = Fs/2;
CheckSampling = nnz(diff(rmd.Time) > minutes(1))
CheckSampling = 0
sgf = sgolayfilt(rmd.Var1, 3, 501);
figure
plot(rmd.Time, rmd.Var1, 'DisplayName','Original Signal')
hold on
plot(rmd.Time, sgf, '-r', 'DisplayName','Filtered Signal (‘sgolayfilt’)')
hold off
grid
xlim([rmd.Time(1) rmd.Time(1)+caldays(7)])
xlabel('Time (minutes)')
ylabel('Value')
legend('Location','best')
L = size(rmd,1);
NFFT = 2^nextpow2(L);
FTs = fft((rmd.Var1-mean(rmd.Var1)).*hann(L), NFFT)/L;
Fv = linspace(0, 1, NFFT/2+1).'*Fn;
Iv = 1:numel(Fv);
[pks,locs] = findpeaks(abs(FTs(Iv))*2, 'MinPeakProminence', 0.2);
Major_Peaks = table(Fv(locs), pks, 1./(Fv(locs)*3600*24), 'VariableNames',{'Frequency (Hz)','Peak Amplitudes','Period (Days/Cycle)'})
Major_Peaks = 3×3 table
Frequency (Hz) Peak Amplitudes Period (Days/Cycle)
______________ _______________ ___________________
6.3578e-08 0.73208 182.04
1.1603e-05 0.37121 0.9975
2.3142e-05 0.30513 0.50012
figure
plot(Fv, abs(FTs(Iv))*2)
hold on
plot(Fv(locs), abs(FTs(locs))*2, '^r')
hold off
grid
xlim([0 0.00005])
xlabel('Frequency (Hz)')
ylabel('Magnitude)')
xline(3E-5, '--k', 'Lowpass Filter Cutoff Frequency')
Var1_filt = lowpass(rmd.Var1, 3E-5, Fs, 'ImpulseResponse','iir');
figure
plot(rmd.Time, rmd.Var1, 'DisplayName','Original Signal')
hold on
plot(rmd.Time, Var1_filt, '-r', 'DisplayName','Filtered Signal (‘lowpass’)')
hold off
grid
xlim([rmd.Time(1) rmd.Time(1)+caldays(7)])
xlabel('Time (minutes)')
ylabel('Value')
legend('Location','best')
Ths sort of processing you do depends on the details you want to model. I hope my analysis provides some insight.
.
22 Comments
Dharmesh Joshi
on 20 Oct 2023
The results, produced by the filters, are similar to the results if was to average the day our an hour.
What i am trying to achive is gain some of the peak characterstics, which are shown in the ref data.
Star Strider
on 20 Oct 2023
The ‘ref’ signal has fewer details and a different scale than the original data. I calculated the Fourier transform earlier and got significant details from it with respect to the signal characteristics.
What do you want to get from the data?
LD = load('dataFilter.mat')
LD = struct with fields:
hour_average: [5554×1 timetable]
raw_min_data: [333239×1 timetable]
refdata: [5555×1 timetable]
rmd = LD.raw_min_data
rmd = 333239×1 timetable
Time Var1
____________________ _______
16-Jan-2023 13:02:00 2.9424
16-Jan-2023 13:03:00 2.1115
16-Jan-2023 13:04:00 1.7951
16-Jan-2023 13:05:00 2.072
16-Jan-2023 13:06:00 2.9067
16-Jan-2023 13:07:00 1.5364
16-Jan-2023 13:08:00 2.0293
16-Jan-2023 13:09:00 1.6482
16-Jan-2023 13:10:00 1.6591
16-Jan-2023 13:11:00 5.8567
16-Jan-2023 13:12:00 0.32883
16-Jan-2023 13:13:00 0.62192
16-Jan-2023 13:14:00 0.915
16-Jan-2023 13:15:00 3.7207
16-Jan-2023 13:16:00 0.5806
16-Jan-2023 13:17:00 0.2533
rd = LD.refdata
rd = 5555×1 timetable
refTime refVar1
____________________ _______
16-Jan-2023 13:00:00 18.8
16-Jan-2023 14:00:00 24.5
16-Jan-2023 15:00:00 22.9
16-Jan-2023 16:00:00 22.5
16-Jan-2023 17:00:00 26.3
16-Jan-2023 18:00:00 35.2
16-Jan-2023 19:00:00 41.2
16-Jan-2023 20:00:00 44.4
16-Jan-2023 21:00:00 48.8
16-Jan-2023 22:00:00 57.5
16-Jan-2023 23:00:00 55.4
17-Jan-2023 00:00:00 65.8
17-Jan-2023 01:00:00 67.6
17-Jan-2023 02:00:00 66.6
17-Jan-2023 03:00:00 63.5
17-Jan-2023 04:00:00 60.3
figure
plot(rd.refTime, rd.refVar1)
grid
xlim([rd.refTime(1) rd.refTime(1)+caldays(7)])
Ts = mean(diff(rmd.Time))
Ts = duration
00:01:00
[h,m,s] = hms(Ts);
Fs = 1/60; % Hz
Fn = Fs/2;
CheckSampling = nnz(diff(rmd.Time) > minutes(1))
CheckSampling = 0
sgf = sgolayfilt(rmd.Var1, 3, 501);
figure
plot(rmd.Time, rmd.Var1, 'DisplayName','Original Signal')
hold on
plot(rmd.Time, sgf, '-r', 'DisplayName','Filtered Signal (‘sgolayfilt’)')
plot(rd.refTime, rd.refVar1, '-g', 'DisplayName','‘ref’ Signal')
hold off
grid
xlim([rmd.Time(1) rmd.Time(1)+caldays(7)])
xlabel('Time (minutes)')
ylabel('Value')
legend('Location','best')
figure
yyaxis left
plot(rmd.Time, sgf, '-r', 'DisplayName','Filtered Signal (‘sgolayfilt’)')
yyaxis right
plot(rd.refTime, rd.refVar1, '-g', 'DisplayName','‘ref’ Signal')
xlim([rmd.Time(1) rmd.Time(1)+caldays(7)])
legend('Location','best')
return % STOP HERE
L = size(rmd,1);
NFFT = 2^nextpow2(L);
FTs = fft((rmd.Var1-mean(rmd.Var1)).*hann(L), NFFT)/L;
Fv = linspace(0, 1, NFFT/2+1).'*Fn;
Iv = 1:numel(Fv);
[pks,locs] = findpeaks(abs(FTs(Iv))*2, 'MinPeakProminence', 0.2);
Major_Peaks = table(Fv(locs), pks, 1./(Fv(locs)*3600*24), 'VariableNames',{'Frequency (Hz)','Peak Amplitudes','Period (Days/Cycle)'})
figure
plot(Fv, abs(FTs(Iv))*2)
hold on
plot(Fv(locs), abs(FTs(locs))*2, '^r')
hold off
grid
xlim([0 0.00005])
xlabel('Frequency (Hz)')
ylabel('Magnitude)')
xline(3E-5, '--k', 'Lowpass Filter Cutoff Frequency')
Var1_filt = lowpass(rmd.Var1, 3E-5, Fs, 'ImpulseResponse','iir');
figure
plot(rmd.Time, rmd.Var1, 'DisplayName','Original Signal')
hold on
plot(rmd.Time, Var1_filt, '-r', 'DisplayName','Filtered Signal (‘lowpass’)')
hold off
grid
xlim([rmd.Time(1) rmd.Time(1)+caldays(7)])
xlabel('Time (minutes)')
ylabel('Value')
legend('Location','best')
.
Dharmesh Joshi
on 20 Oct 2023
Thanks, I will read and try the script in detail.
In regard to your question,
I have sensor, which i am trying to replicate its output with some form of correction( filtering and regession modeling) to the ref data. I am not expecting 100% correlation.
Yes there is a differnce in scale, i normally correct this during my regresion model.
The ref data, is hourly averaged, while my raw day is per min, but eventually my data needs to be averaged hourly as well.
Star Strider
on 20 Oct 2023
The Signal Processing Toolbox resample function could make that easier. I would let it do the ‘heavy lifting’, here dong a simple interpolation and calculating the mean for every hour of the original ‘raw_min_data’ signal.
LD = load('dataFilter.mat')
LD = struct with fields:
hour_average: [5554×1 timetable]
raw_min_data: [333239×1 timetable]
refdata: [5555×1 timetable]
rmd = LD.raw_min_data
rmd = 333239×1 timetable
Time Var1
____________________ _______
16-Jan-2023 13:02:00 2.9424
16-Jan-2023 13:03:00 2.1115
16-Jan-2023 13:04:00 1.7951
16-Jan-2023 13:05:00 2.072
16-Jan-2023 13:06:00 2.9067
16-Jan-2023 13:07:00 1.5364
16-Jan-2023 13:08:00 2.0293
16-Jan-2023 13:09:00 1.6482
16-Jan-2023 13:10:00 1.6591
16-Jan-2023 13:11:00 5.8567
16-Jan-2023 13:12:00 0.32883
16-Jan-2023 13:13:00 0.62192
16-Jan-2023 13:14:00 0.915
16-Jan-2023 13:15:00 3.7207
16-Jan-2023 13:16:00 0.5806
16-Jan-2023 13:17:00 0.2533
rd = LD.refdata
rd = 5555×1 timetable
refTime refVar1
____________________ _______
16-Jan-2023 13:00:00 18.8
16-Jan-2023 14:00:00 24.5
16-Jan-2023 15:00:00 22.9
16-Jan-2023 16:00:00 22.5
16-Jan-2023 17:00:00 26.3
16-Jan-2023 18:00:00 35.2
16-Jan-2023 19:00:00 41.2
16-Jan-2023 20:00:00 44.4
16-Jan-2023 21:00:00 48.8
16-Jan-2023 22:00:00 57.5
16-Jan-2023 23:00:00 55.4
17-Jan-2023 00:00:00 65.8
17-Jan-2023 01:00:00 67.6
17-Jan-2023 02:00:00 66.6
17-Jan-2023 03:00:00 63.5
17-Jan-2023 04:00:00 60.3
ha = LD.hour_average
ha = 5554×1 timetable
Time Var1
____________________ _________
16-Jan-2023 14:00:00 0.90148
16-Jan-2023 15:00:00 -0.020871
16-Jan-2023 16:00:00 0.34794
16-Jan-2023 17:00:00 1.7695
16-Jan-2023 18:00:00 2.5165
16-Jan-2023 19:00:00 3.1385
16-Jan-2023 20:00:00 3.9222
16-Jan-2023 21:00:00 3.9126
16-Jan-2023 22:00:00 4.835
16-Jan-2023 23:00:00 5.3765
17-Jan-2023 00:00:00 5.123
17-Jan-2023 01:00:00 6.2571
17-Jan-2023 02:00:00 6.432
17-Jan-2023 03:00:00 6.5127
17-Jan-2023 04:00:00 6.2456
17-Jan-2023 05:00:00 6.2705
figure
yyaxis left
plot(rd.refTime, rd.refVar1, 'DisplayName','refdata')
yyaxis right
plot(ha.Time, ha.Var1, 'DisplayName','hour\_average')
grid
legend('Location','best')
xlim([rd.refTime(1) rd.refTime(1)+caldays(7)])
Ts1 = mean(diff(rmd.Time))
Ts1 = duration
00:01:00
[h1,m1,s1] = hms(Ts1);
sec1 = m1*60+s1
sec1 = 60
Fs1 = 1/sec1 % Hz
Fs1 = 0.0167
Fn1 = Fs1/2;
Ts2 = mean(diff(rd.refTime))
Ts2 = duration
00:59:59
[h2,m2,s2] = hms(Ts2);
sec2 = m2*60+s2
sec2 = 3.5994e+03
Fs2 = 1/sec2 % Hz
Fs2 = 2.7783e-04
Fn2 = Fs2/2;
CheckSampling = nnz(diff(rmd.Time) > minutes(1))
CheckSampling = 0
sgf = sgolayfilt(rmd.Var1, 3, 501);
figure
plot(rmd.Time, rmd.Var1, 'DisplayName','Original Signal')
hold on
plot(rmd.Time, sgf, '-r', 'DisplayName','Filtered Signal (‘sgolayfilt’)')
plot(rd.refTime, rd.refVar1, '-g', 'DisplayName','‘ref’ Signal')
hold off
grid
xlim([rmd.Time(1) rmd.Time(1)+caldays(7)])
xlabel('Time (minutes)')
ylabel('Value')
legend('Location','best')
figure
yyaxis left
plot(rmd.Time, sgf, '-r', 'DisplayName','Filtered Signal (‘sgolayfilt’)')
yyaxis right
plot(rd.refTime, rd.refVar1, '-g', 'DisplayName','‘ref’ Signal')
xlim([rmd.Time(1) rmd.Time(1)+caldays(7)])
legend('Location','best')
Size_rmd = size(rmd) % Original Table
Size_rmd = 1×2
333239 1
rmdh = retime(rmd, 'hourly'); % Use 'retime' To Interpolate 'raw_min_data' To Hourly
Size_rmdh = size(rmdh)
Size_rmdh = 1×2
5555 1
rmdhm = retime(rmd, 'hourly','mean'); % Use 'retime' To 'mean' 'raw_min_data' To Hourly
Size_rmdhm = size(rmdhm)
Size_rmdhm = 1×2
5555 1
figure
plot(rmd.Time, rmd.Var1, 'DisplayName','Original Signal')
hold on
plot(rmdh.Time, rmdh.Var1, '-r', 'DisplayName','Hourly Interpolated Signal', 'LineWidth',2)
plot(rmdhm.Time, rmdhm.Var1, '-g', 'DisplayName','Hourly Interpolated & ‘mean’ Signal', 'LineWidth',2)
hold off
grid
xlim([rmd.Time(1) rmd.Time(1)+caldays(7)])
xlabel('Time (minutes)')
ylabel('Value')
legend('Location','best')
return % STOP HERE
% L = size(rmd,1);
% NFFT = 2^nextpow2(L);
% FTs = fft((rmd.Var1-mean(rmd.Var1)).*hann(L), NFFT)/L;
% Fv = linspace(0, 1, NFFT/2+1).'*Fn;
% Iv = 1:numel(Fv);
%
% [pks,locs] = findpeaks(abs(FTs(Iv))*2, 'MinPeakProminence', 0.2);
%
% Major_Peaks = table(Fv(locs), pks, 1./(Fv(locs)*3600*24), 'VariableNames',{'Frequency (Hz)','Peak Amplitudes','Period (Days/Cycle)'})
%
% figure
% plot(Fv, abs(FTs(Iv))*2)
% hold on
% plot(Fv(locs), abs(FTs(locs))*2, '^r')
% hold off
% grid
% xlim([0 0.00005])
% xlabel('Frequency (Hz)')
% ylabel('Magnitude)')
% xline(3E-5, '--k', 'Lowpass Filter Cutoff Frequency')
%
% Var1_filt = lowpass(rmd.Var1, 3E-5, Fs, 'ImpulseResponse','iir');
%
% figure
% plot(rmd.Time, rmd.Var1, 'DisplayName','Original Signal')
% hold on
% plot(rmd.Time, Var1_filt, '-r', 'DisplayName','Filtered Signal (‘lowpass’)')
% hold off
% grid
% xlim([rmd.Time(1) rmd.Time(1)+caldays(7)])
% xlabel('Time (minutes)')
% ylabel('Value')
% legend('Location','best')
I believe that the retime results are the most accurate and representative ways to interpolate, and if desired, average the data. I am still not certain what you want to do with the data, however the retime results would likely be the best options to use to compare with the hourly-acquired data.
.
Dharmesh Joshi
on 20 Oct 2023
Thanks, I've been trying your code; it has provided additional insight. I did not want to open a new case, so I am asking here. When we run the following code:
plot(rd.refTime, rd.refVar1, 'DisplayName','refdata')
yyaxis right
plot(ha.Time, ha.Var1, 'DisplayName','hour\_average')
Generally, even though the data has two different scales, there is a common trend. However, you can notice that over time, ha.Var1 is drifting away. What I mean by this is that the value is much higher as time progresses, which needs to be corrected. Is there a MATLAB function to correct this drift, or can we produce a linear or polynomial correction factor? When you compress the figure a little, you do notice the drift, as shown below. For the purpose of demonstration, I pushed the second plot a little higher so you can see it more clearly.
Star Strider
on 20 Oct 2023
I did not use ‘refdata’ because it clearly did not match the original data. Instead, I used retime to create ‘rmdh’ and ‘rmdhm’.
I do not know how ‘refdata’ was calculated, however it is possible that it is integrating a constant, and that would account for the trend that does not appear in the data produced by retime. One option would be to use the detrend function with ‘refdata’.
I do not see the trend when I plot it, however I encourage you to experiment until you are happy with the result.
Example —
load('dataFilter.mat','refdata');
rd = refdata
rd = 5555×1 timetable
refTime refVar1
____________________ _______
16-Jan-2023 13:00:00 18.8
16-Jan-2023 14:00:00 24.5
16-Jan-2023 15:00:00 22.9
16-Jan-2023 16:00:00 22.5
16-Jan-2023 17:00:00 26.3
16-Jan-2023 18:00:00 35.2
16-Jan-2023 19:00:00 41.2
16-Jan-2023 20:00:00 44.4
16-Jan-2023 21:00:00 48.8
16-Jan-2023 22:00:00 57.5
16-Jan-2023 23:00:00 55.4
17-Jan-2023 00:00:00 65.8
17-Jan-2023 01:00:00 67.6
17-Jan-2023 02:00:00 66.6
17-Jan-2023 03:00:00 63.5
17-Jan-2023 04:00:00 60.3
rd.refVar1 = fillmissing(rd.refVar1,'linear');
refVar1det = detrend(rd.refVar1, 1);
figure
plot(rd.refTime, rd.refVar1, 'b', 'DisplayName','Original')
hold on
plot(rd.refTime, refVar1det, 'r', 'DisplayName','Detrended')
hold off
grid
legend('Location','best')
.
Dharmesh Joshi
on 21 Oct 2023
Thanks, I will give that function a go. The dataset with the drift is hour_average
Star Strider
on 21 Oct 2023
Edited: Star Strider
on 22 Oct 2023
It may need to be recalculated, since the same approach with retime doess not have that problem.
I would use the retime result instead.
EDIT — (22 Oct 2023 at 02:01)
I took another look at this, and the original data (‘rmd’), the derivative data (‘rmdh’) and the hourly-averaged data computed by retime (‘rmdhm’). The trend appears to be real, however it only becomes visible when the data are downsampled. Computing the mean has no obvious effect on the trend.
Performing lilnear regression on the original and derivative data:
B1 = polyfit((0:numel(rmd.Time)-1)/60, rmd.Var1, 1)
B1 =
0.0006 1.6300
rmdh.Var1 = fillmissing(rmdh.Var1, 'linear');
B2 = polyfit((0:numel(rmdh.Time)-1), rmdh.Var1, 1)
B2 =
0.0006 1.5272
B3 = polyfit((0:numel(rmdhm.Time)-1), rmdhm.Var1, 1)
B3 =
0.0006 1.6302
rd.refVar1 = fillmissing(rd.refVar1, 'linear');
B4 = polyfit((0:numel(rd.refTime)-1), rd.refVar1, 1)
B4 =
-0.0048 33.6167
B5 = polyfit((0:numel(ha.Time)-1), ha.Var1, 1)
B5 =
0.0006 1.6229
The ‘refdata’ has problems, however I will leave that to you to resolve. The others appear to be reasonably consistent.
The trend is in the original data and persists in the derivative data, so it is inherent in the data and is nothing to be concerned about. I do not know what you want to do about ‘refdata’ (I do not know what it is or how it was calculated) however the others are ready for analysis.
EDIT — (22 Oct 2023 at 17:52)
I am not certain how you want to compare ‘refdata’ and ‘hourly_average’, however one approach might be to simply regress them against each other (with due allowances for their differeing lengths and incorporated NaN values) —
LD = load('dataFilter.mat')
LD = struct with fields:
hour_average: [5554×1 timetable]
raw_min_data: [333239×1 timetable]
refdata: [5555×1 timetable]
rmd = LD.raw_min_data
rmd = 333239×1 timetable
Time Var1
____________________ _______
16-Jan-2023 13:02:00 2.9424
16-Jan-2023 13:03:00 2.1115
16-Jan-2023 13:04:00 1.7951
16-Jan-2023 13:05:00 2.072
16-Jan-2023 13:06:00 2.9067
16-Jan-2023 13:07:00 1.5364
16-Jan-2023 13:08:00 2.0293
16-Jan-2023 13:09:00 1.6482
16-Jan-2023 13:10:00 1.6591
16-Jan-2023 13:11:00 5.8567
16-Jan-2023 13:12:00 0.32883
16-Jan-2023 13:13:00 0.62192
16-Jan-2023 13:14:00 0.915
16-Jan-2023 13:15:00 3.7207
16-Jan-2023 13:16:00 0.5806
16-Jan-2023 13:17:00 0.2533
rd = LD.refdata
rd = 5555×1 timetable
refTime refVar1
____________________ _______
16-Jan-2023 13:00:00 18.8
16-Jan-2023 14:00:00 24.5
16-Jan-2023 15:00:00 22.9
16-Jan-2023 16:00:00 22.5
16-Jan-2023 17:00:00 26.3
16-Jan-2023 18:00:00 35.2
16-Jan-2023 19:00:00 41.2
16-Jan-2023 20:00:00 44.4
16-Jan-2023 21:00:00 48.8
16-Jan-2023 22:00:00 57.5
16-Jan-2023 23:00:00 55.4
17-Jan-2023 00:00:00 65.8
17-Jan-2023 01:00:00 67.6
17-Jan-2023 02:00:00 66.6
17-Jan-2023 03:00:00 63.5
17-Jan-2023 04:00:00 60.3
ha = LD.hour_average
ha = 5554×1 timetable
Time Var1
____________________ _________
16-Jan-2023 14:00:00 0.90148
16-Jan-2023 15:00:00 -0.020871
16-Jan-2023 16:00:00 0.34794
16-Jan-2023 17:00:00 1.7695
16-Jan-2023 18:00:00 2.5165
16-Jan-2023 19:00:00 3.1385
16-Jan-2023 20:00:00 3.9222
16-Jan-2023 21:00:00 3.9126
16-Jan-2023 22:00:00 4.835
16-Jan-2023 23:00:00 5.3765
17-Jan-2023 00:00:00 5.123
17-Jan-2023 01:00:00 6.2571
17-Jan-2023 02:00:00 6.432
17-Jan-2023 03:00:00 6.5127
17-Jan-2023 04:00:00 6.2456
17-Jan-2023 05:00:00 6.2705
figure
yyaxis left
plot(rd.refTime, rd.refVar1, 'DisplayName','refdata')
yyaxis right
plot(ha.Time, ha.Var1, 'DisplayName','hour\_average')
grid
legend('Location','best')
xlim([rd.refTime(1) rd.refTime(1)+caldays(7)])
rd2 = rd(1:end-1,:);
rd2.refVar1 = fillmissing(rd2.refVar1, 'linear');
% B = [rd2.refVar1 ones(size(rd2.refVar1))] \ ha.Var1
mdl = fitlm(rd2.refVar1, ha.Var1) % Regress 'refdata' ('x') Against 'hour_average' ('y')
mdl =
Linear regression model:
y ~ 1 + x1
Estimated Coefficients:
Estimate SE tStat pValue
________ _________ ______ ___________
(Intercept) 2.2439 0.050464 44.465 0
x1 0.04646 0.0020001 23.229 5.4851e-114
Number of observations: 5554, Error degrees of freedom: 5552
Root Mean Squared Error: 2.23
R-squared: 0.0886, Adjusted R-Squared: 0.0884
F-statistic vs. constant model: 540, p-value = 5.49e-114
figure
plot(mdl)
I am somewhat surprised that the 95% confidence intervals on the fit are as small (narrow) as they are (almost invisible on the plot). If that is the ‘noise’ that is causing problems, I would simply ignore it since it does not appear to significantly affect the linear regression, even though the R² values are near zero.
..
Star Strider
on 22 Oct 2023
I demonstrated several different ways of filtering it.
I am apparently not able to provide the result you want, so I will delete my Answer in a few hours.
Dharmesh Joshi
on 24 Oct 2023
should i open a new post, as I have noticed a correlation with my data error and humdity(rate changing), but its related to this issue. I could post the updated data if need.
Star Strider
on 24 Oct 2023
A new post would probably be preferable.
Please explain in some detail what the data are and what information you want to get from them. Include a link to this thread if doing so is relevant to your new Question.
Dharmesh Joshi
on 24 Oct 2023
After processing my data and establishing the model, I've observed significant discrepancies caused by humidity when compared to the reference dataset. The influence of humidity on the signal isn't strictly linear. Instead, it manifests over a span of several hours, suggesting that a linear gauge might not be the best way to address its effects. I'll be starting a new post to delve deeper into this issue, and will share the link once it's ready.
Dharmesh Joshi
on 24 Oct 2023
Dharmesh Joshi
on 28 Oct 2023
Sorry to ask this question here, but since it relates to noise correction, I thought it might be relevant. Using Matlab, is it possible to create a function or a process that, when we create a scatter plot (e.g., sensor data vs. ref data), removes all the samples until a good correlation is achieved? The aim is to extract all the samples that have a negative effect to see if there is a correlation that can help predict noise.
William Rose
on 28 Oct 2023
Just my opinion - I recommend you post that as a separate question. I suspect you will get some interesting opinions and questions about what you are proposing to do. What theory about your data and the noise in the data do yoou have that will guide this process? If the noise on the y-values or the x-values or both?
Star Strider
on 28 Oct 2023
The only option that comes to mind is the rmoutliers function, with the removal processes as described in the method section (specifically the last two of them). The ThresholdFactor as described in Outlier Detection Options can help ‘tune’ them. The Data Smoothing and Outlier Detection documentation section on describes this in more detail. If you have a model f the process you are modeling to fit the data,
Subtracting the fitted model from the data to get the residuals can also be helpful, since you can then see what the residuals look like, and how best to deal with the most distant of them.. That can also provide information on the model itself, and whether it accounts for the most important trends in the data.
Dharmesh Joshi
on 30 Oct 2023
Thank you, I have removed the outliers and applied some smoothing using the code below. This has cleaned the original signal, and now the effects of humidity are much more apparent. The next step is to explore how this can be corrected using MATLAB functions. My other post is:
Should I continue my questions on that post since it is still the same issue, or would you prefer that I open a new case with the new datasets?
difference = rawdata.iot_no2_we_voltage - new_data;
[outliers_removed, tf] = filloutliers(rawdata.iot_no2_we_voltage, 'linear', 'gesd');
difference_outliers = outliers_removed - new_data;
smooth_difference = smoothdata(difference_outliers,"rlowess",50);
% Count the number of outliers
num_outliers = sum(tf);
Star Strider
on 30 Oct 2023
If it is the same issue, it would likely be best to continue it there. I will watch for it.
Please explain where we are in this effort now, since I have lost track.
Dharmesh Joshi
on 30 Oct 2023
Edited: Dharmesh Joshi
on 30 Oct 2023
sorry i have posted it in my other post
More Answers (0)
See Also
Categories
Find more on AI for Signals in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!An Error Occurred
Unable to complete the action because of changes made to the page. Reload the page to see its updated state.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)
Asia Pacific
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)