- Have your loadPrc return a 4 × 1483 × 2824 numeric matrix (rather than a cell array)
- Your corresponding tall array t will then be 25000 × 1483 × 2824
- Instead of the for loop, simply call prctile along dimension 1
big data 2d matrix percentile calculation using tall
I'm trying to calculate a percentile across a large number of files (25000 or even more), each containing a 4x1 cell array that holds 4 maps, i.e. 4 matrices of size 1483x2824.
I'm using tall arrays, following the documentation example "Percentiles of Tall Matrix Along Different Dimensions":
tic
%start local pool for multithreading
c=parcluster('local');
c.NumWorkers=20;
parpool(c, c.NumWorkers);
folder='/home/temporal2/dsantos/mat/*.mat'; %more than 25000 files
A=ones(1483,2824,2);%aux matrix to establish the prctile data type
y=tall(A);
%datastore of files, each containing a 4x1 cell of 1483x2824 maps
ds=fileDatastore(folder,'ReadFcn',@loadPrc,'FileExtensions','.mat','UniformRead', true)
t=tall(ds);
%fill the aux tall array with each map in the correct format
for i=1:25000
y(:,:,i)=t(1+(i-1)*1483:1483*i,:);
end
%calculate the percentile
p90_1=prctile(y,90,3)
P90_1=gather(p90_1);
save('/home/temporal2/dsantos/p90_1.mat','P90_1','-v7.3');
toc
But it seems that tall arrays won't work for this because I get the error:
Warning: Error encountered during preview of tall array 'p90_1'. Attempting to
gather 'p90_1' will probably result in an error. The error encountered was:
Requested 500025x500025 (1862.8GB) array exceeds maximum array size preference.
Creation of arrays greater than this limit may take a long time and cause
MATLAB to become unresponsive. See <a href="matlab: helpview([docroot
'/matlab/helptargets.map'], 'matlab_env_workspace_prefs')">array size limit</a>
or preference panel for more information.
> In tall/display (line 21)
p90_1 =
MxNx... tall array
? ? ? ...
? ? ? ...
? ? ? ...
: : :
: : :
>> Error using digraph/distances (line 72)
Internal problem while evaluating tall expression. The problem was:
Requested 500028x500028 (1862.9GB) array exceeds maximum array size preference.
Creation of arrays greater than this limit may take a long time and cause
MATLAB to become unresponsive. See <a href="matlab: helpview([docroot
'/matlab/helptargets.map'], 'matlab_env_workspace_prefs')">array size limit</a>
or preference panel for more information.
Error in
matlab.bigdata.internal.lazyeval.LazyPartitionedArray>iGenerateMetadata (line
756)
allDistances = distances(cg.Graph);
Error in
matlab.bigdata.internal.lazyeval.LazyPartitionedArray>iGenerateMetadataFillingPart
itionedArrays
(line 739)
[metadatas, partitionedArrays] = iGenerateMetadata(inputArrays,
executorToConsider);
Error in ...
Error in tall/gather (line 50)
[varargout{:}] = iGather(varargin{:});
Caused by:
Error using matlab.internal.graph.MLDigraph/bfsAllShortestPaths
Requested 500028x500028 (1862.9GB) array exceeds maximum array size
preference. Creation of arrays greater than this limit may take a long time
and cause MATLAB to become unresponsive. See <a href="matlab:
helpview([docroot '/matlab/helptargets.map'],
'matlab_env_workspace_prefs')">array size limit</a> or preference panel for
more information.
Any clue on how to solve this problem?
All the best
Answers (2)
Edric Ellis
on 13 Aug 2019
That error is an internal one, and it arises because your tall array expression is simply too large: it contains too many operations. tall arrays work by building up a symbolic representation of all the expressions you've evaluated and then running them together when you call gather. Because you have a for loop over 25000 elements, this symbolic representation becomes too large to evaluate; the roughly 500028x500028 array in the error message is the distance matrix that MATLAB tries to compute over that expression graph. tall arrays are not designed to be looped over in this way. Instead, you need to express your program as a small number of vectorised operations.
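To make the deferred evaluation concrete, here is a minimal sketch on a small in-memory matrix. The data and sizes are made up for illustration, and mapreducer(0) is only there so the example runs in the client session without a parallel pool. For a tall array that is not a column vector, prctile along dimension 1 needs 'Method','approximate' (a T-Digest based estimate), so that option is used here.

```matlab
% Sketch with synthetic data: one vectorised call instead of a loop.
mapreducer(0);                 % evaluate tall expressions in the client session

X = rand(200, 6);              % stand-in data: 200 observations of 6 values
t = tall(X);                   % tall array backed by the in-memory matrix

% This line only records the operation; nothing is computed yet.
q = prctile(t, 90, 1, 'Method', 'approximate');
stillDeferred = istall(q);     % q is still a tall (unevaluated) array

% gather triggers the actual evaluation of the recorded expression.
Q = gather(q);                 % 1 x 6 row of approximate 90th percentiles
```

The expression graph here holds a single prctile node, whereas the loop in the question adds thousands of indexing nodes before gather is ever called.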
I would proceed in the following manner (I can't be more specific since your problem statement isn't executable - see this page on tips regarding making a minimal reproduction):
ds = fileDatastore();
t = tall(ds);
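p90_1=prctile(t,90,1,'Method','approximate'); % along dim 1, a tall array that is not a column vector needs the approximate (T-Digest) method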
P90_1=gather(p90_1);
% and then perhaps
P90_1 = shiftdim(P90_1, 1)
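The outline above leaves the read function open, so the following is only a sketch of one way to fill it in, run on small synthetic files. Everything named here is an assumption for illustration: the helper loadMapRow, the variable name maps, the file names and the tiny map size. It also deviates from the 3-D layout on purpose: each file contributes the first map flattened to a single row, so t stays an ordinary tall matrix and the result is reshaped afterwards. The percentile along dimension 1 is computed with 'Method','approximate', so the values are T-Digest estimates rather than exact percentiles.

```matlab
% Sketch on synthetic data; sizes and names are illustrative only.
mapreducer(0);                        % run in the client session, no pool
nFiles = 12; nRows = 3; nCols = 5;    % tiny stand-ins for 25000 files of 1483x2824

% Write a few MAT-files, each holding a 4x1 cell of maps.
folder = tempname; mkdir(folder);
for k = 1:nFiles
    maps = {rand(nRows,nCols); rand(nRows,nCols); ...
            rand(nRows,nCols); rand(nRows,nCols)};
    save(fullfile(folder, sprintf('f%03d.mat', k)), 'maps');
end

% With UniformRead, the rows returned per file are concatenated vertically.
ds = fileDatastore(fullfile(folder, '*.mat'), 'ReadFcn', @loadMapRow, ...
    'FileExtensions', '.mat', 'UniformRead', true);
t = tall(ds);                         % nFiles x (nRows*nCols)

% One vectorised call along dimension 1, then restore the map shape.
p90 = prctile(t, 90, 1, 'Method', 'approximate');
P90 = reshape(gather(p90), nRows, nCols);

rmdir(folder, 's');                   % remove the temporary files

function row = loadMapRow(filename)
% Hypothetical ReadFcn: assumes the file holds one variable, a 4x1 cell.
s = load(filename);
names = fieldnames(s);
maps = s.(names{1});
row = reshape(maps{1}, 1, []);        % first map as a 1 x (nRows*nCols) row
end
```

When running on a parallel pool, it is probably safer to save loadMapRow in its own file on the path so that the workers can find it.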