Using accumarray to organize large CSV

1 view (last 30 days)
Emily
Emily on 21 Sep 2023
Commented: dpb on 22 Sep 2023
I have a CSV with 4 columns, containing several thousands of rows of data. I have attached a sample data sheet with a small amount of data as an example.
I'm trying to write a code to import the data in a way that will be easy to create scatter plots/visualize the data, as well as calculate some basic stats (mean, median, std. dev., etc).
I want to group the data based on TWO of the columns: SITE NUMBER and DEPTH
The other two columns, d18O and date are what I need to plot/visualize
Essentially, my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH
I started using accumarray to accomplish this but I am having issues with the arrays, they are not showing up with all of the unique rows of data
T = readtable("data1.csv", 'VariableNamingRule','preserve');
[G,ID] = findgroups(T.("Site number"));
data = accumarray(G, T{:,4},[], @(x){T{x,:}});

Accepted Answer

dpb
dpb on 22 Sep 2023
Edited: dpb on 22 Sep 2023
tT=readtable("data1.csv");
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
tT.Properties.VariableNames(2)={'Site'}; % shorten to be more convenient to use
head(tT)
d18O Site Date Depth _____ ____ _____ _____ -1.3 1004 44204 20 -1.9 1004 44263 20 -1.2 1004 44295 20 -1.7 1004 44567 20 -1.5 1004 44235 20 -9.3 1009 44323 50 -7.72 1009 44352 50 -7.07 1009 44445 50
G=grpstats(tT,{'Site','Depth'},{'mean','median','std'},'DataVars',{'Date','d18O'})
G = 6×9 table
Site Depth GroupCount mean_Date median_Date std_Date mean_d18O median_d18O std_d18O ____ _____ __________ _________ ___________ ________ _________ ___________ ________ 1004_20 1004 20 5 44313 44263 146.03 -1.52 -1.5 0.28636 1004_80 1004 80 6 44404 44489 339.01 -8.5533 -8.695 0.30137 1009_50 1009 50 11 44441 44412 158.55 -8.0855 -8.22 1.0068 1012_40 1012 40 3 44381 44263 235.02 -5.81 -4.38 2.6254 1014_20 1014 20 5 43936 43984 142.16 -6.326 -6.22 0.38546 1023_20 1023 20 2 44067 44067 237.59 -4.39 -4.39 0.39598
hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O'},'OutputFormat','uniform');
function h=doit(x,y)
figure
h=scatter(x,y,'filled');
xlabel('Date'),ylabel('d18O')
end
  2 Comments
Emily
Emily on 22 Sep 2023
this produces what I need, thank you!
dpb
dpb on 22 Sep 2023
NOTA BENE that you can get much more creative with the user function...
tT=readtable("data1.csv");
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
tT.Properties.VariableNames(2)={'Site'}; % shorten to be more convenient to use
hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O','Site','Depth'},'OutputFormat','uniform');
function h=doit(x,y,s,d)
figure
h=scatter(x,y,'filled');
xlabel('Date'),ylabel('d18O')
legend(compose('Site %3d Depth %3d',s(1),d(1)))
end

Sign in to comment.

More Answers (1)

Voss
Voss on 21 Sep 2023
Edited: Voss on 21 Sep 2023
"my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH"
Something like this?
T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
d18O Site number Date Depth _____ ___________ _____ _____ -1.3 1004 44204 20 -1.9 1004 44263 20 -1.2 1004 44295 20 -1.7 1004 44567 20 -1.5 1004 44235 20 -9.3 1009 44323 50 -7.72 1009 44352 50 -7.07 1009 44445 50 -9.88 1009 44412 50 -6.79 1009 44264 50 -6.58 1009 44538 50 -8.51 1009 44748 50 -8.34 1009 44652 50 -8.31 1009 44385 50 -8.22 1009 44235 50 -8.22 1009 44501 50 -4.38 1012 44263 40 -4.21 1012 44229 40 -8.84 1012 44652 40 -6.87 1014 43984 20 -6.54 1014 43803 20 -6.22 1014 44102 20 -6.12 1014 43773 20 -5.88 1014 44018 20 -4.67 1023 44235 20 -4.11 1023 43899 20 -8.83 1004 44593 80 -8.74 1004 44385 80 -8.75 1004 44685 80 -8.65 1004 44719 80 -8.24 1004 44204 80 -8.11 1004 43838 80
gvars = ["Site number","Depth"];
vars = setdiff(T.Properties.VariableNames,gvars);
T_summary = groupsummary(T,gvars,@(x){x});
T_summary = removevars(T_summary,"GroupCount");
T_summary = renamevars(T_summary,"fun1_"+vars,vars)
T_summary = 6×4 table
Site number Depth d18O Date ___________ _____ _____________ _____________ 1004 20 { 5×1 double} { 5×1 double} 1004 80 { 6×1 double} { 6×1 double} 1009 50 {11×1 double} {11×1 double} 1012 40 { 3×1 double} { 3×1 double} 1014 20 { 5×1 double} { 5×1 double} 1023 20 { 2×1 double} { 2×1 double}
T_summary.Date{1}
ans = 5×1
44204 44263 44295 44567 44235
T_summary.d18O{1}
ans = 5×1
-1.3000 -1.9000 -1.2000 -1.7000 -1.5000
  1 Comment
Voss
Voss on 22 Sep 2023
Or maybe it's more convenient just to sort the table so that rows with the same Site number and Depth are together:
T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
d18O Site number Date Depth _____ ___________ _____ _____ -1.3 1004 44204 20 -1.9 1004 44263 20 -1.2 1004 44295 20 -1.7 1004 44567 20 -1.5 1004 44235 20 -9.3 1009 44323 50 -7.72 1009 44352 50 -7.07 1009 44445 50 -9.88 1009 44412 50 -6.79 1009 44264 50 -6.58 1009 44538 50 -8.51 1009 44748 50 -8.34 1009 44652 50 -8.31 1009 44385 50 -8.22 1009 44235 50 -8.22 1009 44501 50 -4.38 1012 44263 40 -4.21 1012 44229 40 -8.84 1012 44652 40 -6.87 1014 43984 20 -6.54 1014 43803 20 -6.22 1014 44102 20 -6.12 1014 43773 20 -5.88 1014 44018 20 -4.67 1023 44235 20 -4.11 1023 43899 20 -8.83 1004 44593 80 -8.74 1004 44385 80 -8.75 1004 44685 80 -8.65 1004 44719 80 -8.24 1004 44204 80 -8.11 1004 43838 80
T_sorted = sortrows(T,["Site number","Depth"]);
disp(T_sorted);
d18O Site number Date Depth _____ ___________ _____ _____ -1.3 1004 44204 20 -1.9 1004 44263 20 -1.2 1004 44295 20 -1.7 1004 44567 20 -1.5 1004 44235 20 -8.83 1004 44593 80 -8.74 1004 44385 80 -8.75 1004 44685 80 -8.65 1004 44719 80 -8.24 1004 44204 80 -8.11 1004 43838 80 -9.3 1009 44323 50 -7.72 1009 44352 50 -7.07 1009 44445 50 -9.88 1009 44412 50 -6.79 1009 44264 50 -6.58 1009 44538 50 -8.51 1009 44748 50 -8.34 1009 44652 50 -8.31 1009 44385 50 -8.22 1009 44235 50 -8.22 1009 44501 50 -4.38 1012 44263 40 -4.21 1012 44229 40 -8.84 1012 44652 40 -6.87 1014 43984 20 -6.54 1014 43803 20 -6.22 1014 44102 20 -6.12 1014 43773 20 -5.88 1014 44018 20 -4.67 1023 44235 20 -4.11 1023 43899 20

Sign in to comment.

Categories

Find more on Tables in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!