FILTER BIG DATA SET
Dear All,
I have survey data for six years, each year containing 26 variables and more than 2 million rows (a 2533835x28 table).
I would like to filter the entire dataset on the variable c33, keeping the rows where c33 has one of the following values:
[1111 1112 1113 1114 1115 1116 1117 1118 1119 1121 1122 1123 1124 1131 1132 1133 1134 1135 1136 1137 1139 1140 1150 1161 1162 1169 1191 1193 1199 1210 1221 1222 1223 1224 1225 1229 1231 1232 1233 1239 1241 1242 1243 1249 1251 1252 1259 1261 1262 1269 1271 1272 1273 1279 1281 1282 1283 1284 1285 1286 1287 1291 1292 1293 1299 1301 1302 1309 1411 1412 1413 1420 1430 1441 1442 1450 1461 1462 1463 1491 1492 1493 1499 1500 1611 1612 1619 1620 1631 1632 1633 1639 1640 1700 2101 2102 2109 2201 2202 2203 2209 2301 2302 2303 2309 2401 2402 3111 3112 3113 3121 3122 3211 3212 3214 3215 3219 3221 3222 3223 3229 ].
Can anyone show me how to filter the data?
KSSV
on 30 Aug 2024
What do you mean by filtering the data? You can use logical indexing with relational operators such as ==, >, and <.
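As a small illustration of that suggestion: a relational operator applied to a table variable returns one logical value per row, & and | combine conditions, and the resulting vector indexes the rows. This is a sketch on a made-up table; only the variable name c33 comes from the question, and the values and the second variable "other" are invented for illustration.

```matlab
% Small made-up table; only the variable name c33 comes from the question
T = table([1111; 1500; 2101; 3300; 1249], [10; 20; 30; 40; 50], ...
    'VariableNames', {'c33', 'other'});

% Relational operators give one logical value per row; & combines them
inRange = T.c33 >= 1111 & T.c33 <= 1700;

% Logical indexing keeps only the rows where the condition is true
T_range = T(inRange, :);
disp(T_range)
```

A single range test like this cannot express an arbitrary list of codes such as the one in the question; for that, ismember is the usual tool.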
Answers (2)
Star Strider
on 30 Aug 2024
Your question is a bit ambiguous.
If you want to match the elements of the data you posted to elements of your matrix, one option is to use the ismember function (they all appear to be integers; if they are actually floating-point numbers instead, use ismembertol with a similar calling syntax).
Try something like this:
V = [1111 1112 1113 1114 1115 1116 1117 1118 1119 1121 1122 1123 1124 1131 1132 1133 1134 1135 1136 1137 1139 1140 1150 1161 1162 1169 1191 1193 1199 1210 1221 1222 1223 1224 1225 1229 1231 1232 1233 1239 1241 1242 1243 1249 1251 1252 1259 1261 1262 1269 1271 1272 1273 1279 1281 1282 1283 1284 1285 1286 1287 1291 1292 1293 1299 1301 1302 1309 1411 1412 1413 1420 1430 1441 1442 1450 1461 1462 1463 1491 1492 1493 1499 1500 1611 1612 1619 1620 1631 1632 1633 1639 1640 1700 2101 2102 2109 2201 2202 2203 2209 2301 2302 2303 2309 2401 2402 3111 3112 3113 3121 3122 3211 3212 3214 3215 3219 3221 3222 3223 3229 ];
size(V)
A = array2table(randi([1000 3300], 10, 12)) % Create Data (Matrix Of Random Integers)
Aa = table2array(A);
Lm = ismember(Aa, V) % Logical Matrix Of Locations
[r,c] = find(Lm); % Return Numeric Indices
rc = [r c] % Row & Column Indices Of Matching Values
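For the case in the question, where the codes live in a single table variable, the logical vector from ismember can index the rows directly, keeping whole rows instead of returning positions. A minimal sketch, assuming the data are in a table named T (a hypothetical name) with a numeric variable c33; a small random table stands in for the 2533835x28 survey table, and V is shortened here (use the full list in practice).

```matlab
% Codes to keep: a shortened subset of the list in the question
V = [1111 1112 1113 1114 1115 2101 2102 3229];

% Hypothetical stand-in for the survey table: numeric c33 plus one other variable
rng(0)                                  % reproducible random data
n = 1000;
T = table(randi([1000 3300], n, 1), rand(n, 1), ...
    'VariableNames', {'c33', 'x'});

% One logical value per row: true where c33 is one of the codes in V
keep = ismember(T.c33, V);

% Keep every variable of the matching rows
Tf = T(keep, :);
fprintf('%d of %d rows kept\n', height(Tf), height(T));
```

If c33 is stored as a string variable rather than as numbers, comparing against string(V) keeps the types consistent.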
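The ismembertol variant mentioned in this answer can be sketched as follows, using made-up values with small floating-point noise. The tolerance of 1e-6 is an assumption to tune to your data; passing 'DataScale', 1 makes it an absolute tolerance instead of one scaled by the magnitude of the inputs.

```matlab
% Codes of interest (a short subset of the list in the question)
V = [1111 1112 1113 2101 3229];

% Made-up data: some entries carry tiny floating-point noise
A = [1111.0000000001; 1500; 2101; 3229 - 1e-10; 1700];

% Exact matching misses the noisy values
exact = ismember(A, V);

% Tolerant matching; 'DataScale', 1 makes tol an absolute tolerance
tol = 1e-6;
approx = ismembertol(A, V, tol, 'DataScale', 1);

disp([exact approx])
```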
Subhajyoti
on 30 Aug 2024
Edited: Subhajyoti
on 30 Aug 2024
To filter your dataset in MATLAB based on specific entries in a particular variable, you can use logical indexing.
In the following code, I generate a dummy data table and perform filtering operations on numeric and string variables.
% Create a data table of size 10000000x4
% Columns x1, x2, x3, x4 of data type string, double, double, logical
num_rows = 10000000;
t = table;
% random data
x1 = string(randi([1, 10], num_rows, 1));
x2 = randi([1, 10], num_rows, 1);
x3 = randi([1, 10], num_rows, 1);
x4 = randi([0, 1], num_rows, 1);
% assign data to table
t.x1 = x1;
t.x2 = x2;
t.x3 = x3;
t.x4 = logical(x4);
- Use logical indexing to filter data where 'x3' is less than 5:
tic
%---------------------------------------%
filtered_data1 = t(t.x3 < 5, :);
%---------------------------------------%
elapsed = toc;
disp("Time taken to filter data using logical indexing: " + elapsed + " seconds")
- Use 'ismember' to keep the rows whose 'x1' value is a member of a given array:
tic
%---------------------------------------%
filterValues = ["1", "2", "3", "4", "5"];
filtered_data2 = t(ismember(t.x1, filterValues), :);
%---------------------------------------%
elapsed = toc;
disp("Time taken to filter data using ismember: " + elapsed + " seconds")
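A single tic/toc pair measures one run, which can be noisy. MATLAB's timeit function runs a zero-input function handle several times and returns a typical time in seconds. A sketch, assuming a smaller made-up table (1e5 rows) so it runs quickly; the numbers it prints depend on the machine and release.

```matlab
% Smaller made-up table so the sketch runs quickly
num_rows = 1e5;
t = table(string(randi([1, 10], num_rows, 1)), randi([1, 10], num_rows, 1), ...
    'VariableNames', {'x1', 'x3'});
filterValues = ["1", "2", "3", "4", "5"];

% timeit calls each handle several times and returns a typical time
tLogical = timeit(@() t(t.x3 < 5, :));
tIsmember = timeit(@() t(ismember(t.x1, filterValues), :));

fprintf('logical indexing: %.4g s\n', tLogical);
fprintf('ismember:         %.4g s\n', tIsmember);
```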
- For more complex numeric conditions, extracting the variable into a plain array (for example with 'table2array') can sometimes speed up the operations.
c33Array = table2array(t(:,'x3'));
% Keep the rows where the square of x3 is less than 27
% (named 'mask' rather than 'filter' to avoid shadowing the built-in filter function)
mask = c33Array.^2 < 27;
filtered_data5 = t(mask, :);
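Dot indexing (t.x3) returns the same numeric column as table2array(t(:,'x3')), and conditions on several variables can be combined with & before indexing the table once. A sketch on a small made-up table reusing the variable names above; it does not measure whether either form is faster on a given release.

```matlab
% Small made-up table with the same variable names as the example above
t = table(string([1; 2; 3; 4; 5; 6]), [3; 8; 1; 5; 2; 9], ...
    'VariableNames', {'x1', 'x3'});

% Dot indexing and table2array give the same column vector here
a1 = t.x3;
a2 = table2array(t(:, 'x3'));
assert(isequal(a1, a2))

% Combine conditions on two variables, then index the rows once
mask = (t.x3.^2 < 27) & ismember(t.x1, ["1", "2", "3"]);
filtered = t(mask, :);
disp(filtered)
```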
For more about ‘table’ in MATLAB, see the following MathWorks documentation:
- ‘table’: https://www.mathworks.com/help/matlab/ref/table.html
- Access Data in Tables: https://www.mathworks.com/help/matlab/matlab_prog/access-data-in-a-table.html
- Filtering Elements: https://www.mathworks.com/help/matlab/matlab_prog/find-array-elements-that-meet-a-condition.html
I hope this helps.