Indexing geochemical data arrays with different numbers of elements
I have a table of data collected from an instrument that makes 6 measurements for each sample. At the end of the analysis I'm left with a CSV file containing 6 rows of data for every sample. For example, if I analyze 100 samples, I have a CSV file with 600 rows. I have written code to process the data, and I use only the last three measurements (the rows where the array 'injection' is 4-6) for each sample. Here's how I read the data and create the arrays:
T = readtable("data", 'VariableNamingRule','preserve');
%define the variables
line = table2array(T(:,1));
d18O_raw = table2array(T(:,4));
port = table2array(T(:,2));
injection = table2array(T(:,3));
%select the last three measurements
%use only injections 4-6 for each sample
line = line(injection>3);
d18O_raw = d18O_raw(injection>3);
port = port(injection>3);
injection = injection(injection>3);
I average the three measurements for each sample, so I am left with one value of each variable per sample. Importantly, I also subsample the "port" variable, which identifies each sample, so that I can match each averaged value with the corresponding sample later.
d18O_raw = reshape(d18O_raw, [3, numel(line)/3]);
average_d18O = transpose(mean(d18O_raw));
port_reshaped = port(1:3:end,:);
Here's where my issue arises. Sometimes the machine has an error and measures a sample only 5 times instead of 6. In the sample data included, the first sample has been measured only 5 times, but in theory this could happen at any point in the analysis. Currently I have to fix the file manually (or change my code) whenever a sample has only 5 measurements. I want my code to handle a sample that has EITHER 5 or 6 measurements: always skip the first 3 measurements, automatically select the remaining 2 or 3, average them, and index the corresponding port whether there are 2 or 3 measurements.
My current way of handling this issue is clunky and doesn't make the script easy to share with others, which is the goal.
Thank you in advance for your help.
2 Comments
Siddharth Bhutiya
on 27 Apr 2023
Star Strider has already answered the question below, but I'll add one note. For lines of code like the following:
line = table2array(T(:,1));
A simpler way to extract the entire variable is to use dot indexing, as follows:
line = T.Line;
% OR
line = T.(1);
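As a small illustration of the equivalence mentioned in this comment, here is a sketch on a made-up table. The variable names 'Line' and 'Port' follow the thread; the name 'd18O raw' and all values are assumptions chosen only to show how a preserved, non-identifier column name can be reached (this relies on R2019b or later, where arbitrary table variable names are allowed).

```matlab
% Toy table standing in for the instrument file (values are made up).
T = table((1:3).', ["A"; "A"; "B"], [1.5; 2.5; 3.5], ...
    'VariableNames', {'Line', 'Port', 'd18O raw'});

a = table2array(T(:,1));   % approach used in the question
b = T.Line;                % dot indexing by name
c = T.(1);                 % dot indexing by position
d = T{:,1};                % brace indexing

% A name that is not a valid MATLAB identifier needs the parenthesized form.
e = T.("d18O raw");

sameColumn = isequal(a, b, c, d);   % true when all four forms agree
```

Dot indexing by name tends to be more robust than a positional index if the column order of the CSV ever changes.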
Accepted Answer
Star Strider
on 26 Apr 2023
Edited: Star Strider
on 26 Apr 2023
This can be done in a relatively straightforward way by first separating the sub-matrices into individual cells using the accumarray function, and then using the cellfun function to calculate the mean of rows (4:end) of column 4 in each cell, where ‘end’ (the last row of that sub-matrix) adapts to however many measurements the sample has.
T = readtable("data", 'VariableNamingRule','preserve')
%define the variables
line = table2array(T(:,1));
d18O_raw = table2array(T(:,4));
port = table2array(T(:,2));
injection = table2array(T(:,3));
% %select the last three measurements
% %use only injections 4-6 for each sample
% line = line(injection>3);
% d18O_raw = d18O_raw(injection>3);
% port = port(injection>3);
% injection = injection(injection>3);
[G,ID] = findgroups(T.Port); % Use 'Port' To Define The Groups
A = accumarray(G, T{:,1}, [], @(x){T{x,:}}) % Accumulate Sub-Matrices According To 'G'
A{1} % Display Intermediate Results (Optional)
A{end} % Display Intermediate Results (Optional)
Outc = cellfun(@(x)mean(x(4:end,4)), A, 'Unif',0) % Calculate The 'mean' Of Rows 4:end In Each Sub-Matrix
Outn = cell2mat(Outc) % Convert The 'cell' Array To A Numeric Array
% The 'Check' Variable Can Be Deleted, Since It Simply Shows How The Code Works, And Checks The Results
Check = [mean(A{1}(4:end,4)) mean(A{2}(4:end,4)) mean(A{3}(4:end,4)) mean(A{4}(4:end,4)) mean(A{5}(4:end,4)) mean(A{6}(4:end,4)) mean(A{7}(4:end,4)) mean(A{8}(4:end,4)) mean(A{9}(4:end,4))].'
EDIT (26 Apr 2023 at 21:48)
Changed the second accumarray argument to choose the correct data. (I did not catch that earlier.)
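For comparison, the same "skip the first three injections, then average whatever is left" idea can also be sketched with a logical mask followed by findgroups and splitapply. This is an alternative, not the code from the answer above. The toy table, its values, and the column names 'Injection' and 'd18O' are assumptions for illustration; the sketch also assumes that each port value identifies exactly one sample.

```matlab
% Toy stand-in for the instrument table (values are made up).
% Sample "A" has only 5 injections; sample "B" has the usual 6.
Port      = [repmat("A", 5, 1); repmat("B", 6, 1)];
Injection = [(1:5).'; (1:6).'];
d18O      = [10 11 12 1 3, 20 21 22 4 5 6].';
T = table(Port, Injection, d18O);

keep = T.Injection > 3;                   % always skip the first 3 injections
[G, portID] = findgroups(T.Port(keep));   % one group per port
average_d18O = splitapply(@mean,  T.d18O(keep), G);   % mean of 2 or 3 values
nUsed        = splitapply(@numel, T.d18O(keep), G);   % how many were averaged

result = table(portID, average_d18O, nUsed);
```

Because the filter works on the injection number rather than on row position, it does not depend on the rows within each sample being in order, and nUsed records which samples were averaged from 2 measurements instead of 3.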