Building tall table from tall arrays generates error
3 views (last 30 days)
Show older comments
clear
dataFile = 'data.csv';
ds = tabularTextDatastore(dataFile, FileExtensions='.csv');
ds.ReadVariableNames = true;
ds.Delimiter = ',';
ds.SelectedVariableNames = ["hash", "count"];
ds.SelectedFormats = {'%s', '%f'};
data = tall(ds);
[g, THash] = findgroups(data.hash);
TCount = splitapply(@(x) {x}, data.count, g);
%% This works but cannot use it because actual data file is far larger than memory
hash = gather(THash);
count = gather(TCount);
T1 = table(hash, count);
%% This is the intended code but doesn't work
TT = table(THash,TCount);
write(fullfile(pwd,'data'),TT,FileType="parquet");
0 Comments
Answers (1)
Oguz Kaan Hancioglu
on 15 Mar 2023
Your code wasn't work because "gather(TCount)" returns cell array for each element. Therefore you are trying to write double array in to one single cell. You can find the length of each array into the cell. I hope this solves your problem.
%% This works but cannot use it because actual data file is far larger than memory
hash = gather(THash);
count = gather(TCount);
cellsz = cellfun(@size,count,'uni',false);
newCount = cellfun(@(x) x(1),cellsz,'UniformOutput',false)
T1 = table(hash, newCount);
See Also
Categories
Find more on Analysis of Big Data with Tall Arrays in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!