Removing Error Data from Table
Dharmesh Joshi
on 4 Dec 2023
Commented: Star Strider
on 7 Dec 2023
Hi,
I am working on a project that involves plotting Temperature vs. Sensor Data on a scatter plot. My goal is to calculate the coefficients of a trend line for future predictions. However, there are instances where my sensor outputs false results. Instead of manually removing this data, I am in search of an outlier function that can automatically eliminate these anomalies and provide coefficients. These coefficients would be useful for defining the boundaries of valid data in future datasets.
I am aware of the standard outlier removal functions. However, the scatter plot I've attached here illustrates my challenge. You can observe the general pattern of valid data, interspersed with some errors. Although there's no trend line in the plot, it's possible to visualize how it would look and identify the inaccurate data points.
My question is: Is there a way to configure an outlier function to take into account both temperature and sensor output in identifying false results? Additionally, can this function reference previous data to aid in this evaluation? I have posted the plots below and also attached data samples of some sensor output for your reference.
Accepted Answer
Star Strider
on 5 Dec 2023
Edited: Star Strider
on 5 Dec 2023
This is definitely an interesting problem, and one I have thought about at other times.
I am not certain that I have completely solved it, since it still may require a bit of interactivity, depending on the data. For these two data sets, it seems to work reasonably well with these parameters. The scatterhist calls are just there as a way to illustrate the data, and are not absolutely necessary.
Try this —
LD = load('scatterData.mat')
% dataTable4 ————————————————————————————————————————————————————————————————————————————————
T4 = LD.dataTable4;
T4 = rmmissing(T4)
VN4 = T4.Properties.VariableNames;
figure
hsh4 = scatterhist(T4{:,1}, T4{:,2}, 'NBins',50, 'Marker','.')
idx4 = clusterdata(table2array(T4), 'MaxClust',15, 'Criterion','distance');
[Uidx,~,uix] = unique(idx4); % Unique Cluster Numbers & Indices
ClustCounts = accumarray(idx4, 1); % Count Cluster Members
T4_ClusterCounts = table(Uidx, ClustCounts) % Results Summary Table
[MaxCounts,MaxCidx] = max((T4_ClusterCounts{:,2})) % Largest Cluster & Index
cm = colormap(turbo(numel(unique(idx4))));
hold on
Ax = hsh4(1);
% get(Ax)
Ax.UserData
scatter(Ax.UserData{2}, Ax.UserData{3}, 10, idx4)
colormap(turbo(numel(unique(idx4))))
hold off
colorbar
Lv4 = idx4 == MaxCidx;
figure
scatter(Ax.UserData{2}(Lv4), Ax.UserData{3}(Lv4), 10, '.')
grid
% dataTable5 ————————————————————————————————————————————————————————————————————————————————
T5 = LD.dataTable5;
T5 = rmmissing(T5)
VN5 = T5.Properties.VariableNames;
figure
hsh5 = scatterhist(T5{:,1}, T5{:,2}, 'NBins',50, 'Marker','.')
idx5 = clusterdata(table2array(T5), 'MaxClust',10, 'Criterion','distance');
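[Uidx,~,uix] = unique(idx5); % Unique Cluster Numbers & Indices
ClustCounts = accumarray(idx5, 1); % Count Cluster Members
T5_ClusterCounts = table(Uidx, ClustCounts) % Results Summary Table
[MaxCounts,MaxCidx] = max((T5_ClusterCounts{:,2})) % Largest Cluster & Index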
cm = colormap(turbo(numel(unique(idx5))));
hold on
Ax = hsh5(1);
% get(Ax)
Ax.UserData
scatter(Ax.UserData{2}, Ax.UserData{3}, 10, idx5)
colormap(turbo(numel(unique(idx5))))
hold off
colorbar
Lv5 = idx5 == MaxCidx;
figure
scatter(Ax.UserData{2}(Lv5), Ax.UserData{3}(Lv5), 10, '.')
grid
The clusterdata function works here with 'MaxClust' (the maximum number of clusters) set to 15 for dataTable4 and 10 for dataTable5. However, those values may not generalise to all data sets, and more clusters may be necessary in some instances. That is where the ‘interactivity’ requirement may arise.
It would likely be possible to create a function from this code, with or without the plots, although I believe the plots are necessary to illustrate the data and how the code works.
I have been working on this for a few hours and will ponder that in the morning.
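As a rough illustration of how the steps above might be collected without the plots, here is a minimal sketch. It runs on synthetic data rather than scatterData.mat. The variable names (temp, sensor, maxClust, keep, p), the injected errors, and the first-order polyfit trend line are all assumptions made for the example, not part of the code above. It keeps the largest cluster and then fits the coefficients the question asks about.

```matlab
% Sketch only. The synthetic data below are hypothetical stand-ins for the
% temperature and sensor columns; they are not the data from scatterData.mat.
rng(0)                                     % reproducible demo data
temp   = linspace(0, 40, 200).';           % hypothetical temperatures
sensor = 2*temp + 5 + 0.5*randn(200,1);    % hypothetical linear sensor response
sensor(1:10) = sensor(1:10) + 200;         % inject 10 obviously false readings

maxClust = 2;                              % assumption: needs tuning per data set
idx     = clusterdata([temp sensor], 'MaxClust',maxClust);  % hierarchical clustering
counts  = accumarray(idx, 1);              % number of points in each cluster
[~,big] = max(counts);                     % label of the largest cluster
keep    = idx == big;                      % logical mask of points treated as valid

p = polyfit(temp(keep), sensor(keep), 1);  % assumption: first-order trend line
fprintf('Kept %d of %d points; slope %.3f, intercept %.3f\n', ...
    nnz(keep), numel(keep), p(1), p(2))
```

With a table such as T4, the same mask could be used as T4(keep,:) to keep only the rows in the largest cluster. As noted above, the value of 'MaxClust' that separates valid data from errors is likely to differ between data sets, so checking the result against a plot is still advisable.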
EDIT — (5 Dec 2023 at 04:35)
Streamlined code, added more explanation text.