
Indexing geochemical data arrays with different numbers of elements

I have a table of data collected from an instrument that makes 6 measurements for each sample. At the end of the analysis I'm left with a CSV file containing 6 rows of data for every sample. For example, if I analyze 100 samples, I have a CSV file with 600 rows. I have written code to process the data, and I use only the last three measurements for each sample (the rows where 'injection' is 4 to 6). Here's how I read the data and create the arrays:
T = readtable("data", 'VariableNamingRule','preserve');
%define the variables
line = table2array(T(:,1));
d18O_raw = table2array(T(:,4));
port = table2array(T(:,2));
injection = table2array(T(:,3));
%select the last three measurements
%use only injections 4-6 for each sample
line = line(injection>3);
d18O_raw = d18O_raw(injection>3);
port = port(injection>3);
injection = injection(injection>3);
I average the three measurements for each sample so that I am left with one value per sample. I also reduce the "port" variable to one entry per sample, which identifies each sample so that I can match each averaged value with the corresponding sample later.
d18O_raw = reshape(d18O_raw, [3, numel(line)/3]);
Error using reshape
Size arguments must be real integers.
average_d18O = transpose(mean(d18O_raw));
port_reshaped = port(1:3:end,:);
Here's where my issue arises. Sometimes the machine has an error and measures a sample only 5 times instead of 6. In the sample data included, the first sample has been measured only 5 times, but in theory this could happen at any point in the analysis. Currently I have to fix the file manually (or change my code) whenever a sample has only 5 measurements. I want my code to handle a sample that has EITHER 5 or 6 measurements: always skip the first 3 measurements, average the remaining 2 or 3, and index the corresponding port whether there are 2 or 3 measurements left.
My current way of handling this issue is clunky and doesn't make the script easy to share with others, which is the goal.
Thank you in advance for your help.
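The reshape error shown above follows from the number of retained rows no longer being a multiple of 3 once one sample has only 5 injections. The sketch below, which uses made-up toy vectors rather than the attached file, is one way to count how many measurements survive the `injection > 3` filter for each port. The variable names `portID` and `nKept` are illustrative.

```matlab
% Toy data (hypothetical values): port 1 has 5 injections, port 2 has 6.
port      = [1 1 1 1 1 2 2 2 2 2 2].';
injection = [1 2 3 4 5 1 2 3 4 5 6].';

keep = injection > 3;                    % drop the first three injections
[G, portID] = findgroups(port(keep));    % one group number per port
nKept = accumarray(G, 1);                % measurements kept per port

disp(table(portID, nKept))

% nnz(keep) is 5 here, which is not a multiple of 3, so
% reshape(x, [3, numel(x)/3]) cannot be applied to the filtered data.
```

A check like this can also be used to flag samples with an unexpected number of injections before any averaging is done.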
  2 Comments
Siddharth Bhutiya on 27 Apr 2023
Star Strider has already answered the question below, but I'll add one note. For lines of code like the following:
line = table2array(T(:,1));
A simpler way to extract the entire variable is to use dot indexing:
line = T.Line;
% OR
line = T.(1);
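Because the file is read with 'VariableNamingRule','preserve', some column names are not valid MATLAB identifiers. As a minimal sketch, assuming R2019b or later and the column names shown in the table display in the answer, such columns can be reached with a parenthesized name after the dot. The toy table below uses only a few illustrative rows.

```matlab
% Toy table (a few illustrative rows) with the same column names as the file.
T = table([1; 2; 3], [1; 1; 1], [1; 2; 3], [-39; -39.973; -39.527], ...
    'VariableNames', {'Line', 'Port', 'Inj Nr', 'd(18_16)Mean'});

port      = T.Port;              % valid identifier: plain dot indexing
injection = T.("Inj Nr");        % name contains a space
d18O_raw  = T.("d(18_16)Mean");  % name contains parentheses
```

Each result is an ordinary numeric column vector, the same as what table2array(T(:,k)) returns for a single column.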

Sign in to comment.

Accepted Answer

Star Strider on 26 Apr 2023
Edited: Star Strider on 26 Apr 2023
This can be done in a relatively straightforward way by first separating the sub-matrices into individual cells using the accumarray function, and then using the cellfun function to calculate the mean of rows (4:end) of column 4 in each cell, where ‘end’ (the last row of the sub-matrix) can be at any position.
T = readtable("data", 'VariableNamingRule','preserve')
T = 53×4 table
    Line    Port    Inj Nr    d(18_16)Mean
    ____    ____    ______    ____________
      1      1        1         -39
      2      1        2         -39.973
      3      1        3         -39.527
      4      1        4         -40.579
      5      1        5         -40.9
      6      2        1         -33.315
      7      2        2         -33.008
      8      2        3         -32.989
      9      2        4         -33.028
     10      2        5         -33.03
     11      2        6         -33.021
     12      3        1          NaN
     13      3        2         -10.256
     14      3        3          -9.766
     15      3        4          -9.658
     16      3        5          -9.644
%define the variables
line = table2array(T(:,1));
d18O_raw = table2array(T(:,4));
port = table2array(T(:,2));
injection = table2array(T(:,3));
% %select the last three measurements
% %use only injections 4-6 for each sample
% line = line(injection>3);
% d18O_raw = d18O_raw(injection>3);
% port = port(injection>3);
% injection = injection(injection>3);
[G,ID] = findgroups(T.Port); % Use 'Port' To Define The Groups
A = accumarray(G, T{:,1}, [], @(x){T{x,:}}) % Accumulate Sub-Matrices According To 'G'
A = 9×1 cell array
    {5×4 double}
    {6×4 double}
    {6×4 double}
    {6×4 double}
    {6×4 double}
    {6×4 double}
    {6×4 double}
    {6×4 double}
    {6×4 double}
A{1} % Display Intermediate Results (Optional)
ans = 5×4
    1.0000    1.0000    1.0000  -39.0000
    2.0000    1.0000    2.0000  -39.9730
    3.0000    1.0000    3.0000  -39.5270
    4.0000    1.0000    4.0000  -40.5790
    5.0000    1.0000    5.0000  -40.9000
A{end} % Display Intermediate Results (Optional)
ans = 6×4
   48.0000    9.0000    1.0000   -6.6190
   49.0000    9.0000    2.0000   -6.5460
   50.0000    9.0000    3.0000   -6.5450
   51.0000    9.0000    4.0000   -6.6010
   52.0000    9.0000    5.0000   -6.6250
   53.0000    9.0000    6.0000   -6.6470
Outc = cellfun(@(x)mean(x(4:end,4)), A, 'Unif',0) % Calculate The 'mean' Of Rows 4:end In Each Sub-Matrix
Outc = 9×1 cell array
    {[-40.7395]}
    {[-33.0263]}
    {[ -9.6240]}
    {[  0.3067]}
    {[ -9.4157]}
    {[ -7.2603]}
    {[ -5.6343]}
    {[ -6.4563]}
    {[ -6.6243]}
Outn = cell2mat(Outc) % Convert The 'cell' Array To A Numeric Array
Outn = 9×1
  -40.7395
  -33.0263
   -9.6240
    0.3067
   -9.4157
   -7.2603
   -5.6343
   -6.4563
   -6.6243
% The 'Check' Variable Can Be Deleted, Since It Simply Shows How The Code Works, And Checks The Results
Check = [mean(A{1}(4:end,4)) mean(A{2}(4:end,4)) mean(A{3}(4:end,4)) mean(A{4}(4:end,4)) mean(A{5}(4:end,4)) mean(A{6}(4:end,4)) mean(A{7}(4:end,4)) mean(A{8}(4:end,4)) mean(A{9}(4:end,4))].'
Check = 9×1
  -40.7395
  -33.0263
   -9.6240
    0.3067
   -9.4157
   -7.2603
   -5.6343
   -6.4563
   -6.6243
EDIT (26 Apr 2023 at 21:48):
Changed the second accumarray argument to choose the correct data. (I did not catch that earlier.)
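As a hedged alternative to the cell-array route, the same per-port averages can be sketched with findgroups and splitapply after filtering on the injection number. This is not the code from the answer: it selects rows by `injection > 3` rather than by row position (4:end), so the two agree only when injections are numbered consecutively from 1 within each port. It also avoids using the Line column as row indices, which the accumarray call above appears to rely on. The toy vectors are made up for illustration.

```matlab
% Toy data (hypothetical values): port 1 has 5 injections, port 2 has 6.
port      = [1 1 1 1 1 2 2 2 2 2 2].';
injection = [1 2 3 4 5 1 2 3 4 5 6].';
d18O_raw  = [-39 -40 -39.5 -40 -41 -33 -33 -33 -32 -33 -34].';

keep = injection > 3;                     % always skip the first 3 injections
[G, port_id] = findgroups(port(keep));    % group the kept rows by port
average_d18O = splitapply(@mean, d18O_raw(keep), G);   % mean of 2 or 3 values

result = table(port_id, average_d18O);
disp(result)
```

Here port_id lists each port once, in the same order as average_d18O, so the two stay matched without a reshape. If NaN values can occur among the kept injections, `@(x) mean(x, 'omitnan')` could be passed instead of `@mean`.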
