tall
Create tall array
Syntax
t = tall(ds)
t = tall(A)
Description
t = tall(ds) creates a tall array on top of datastore ds.
If ds is a datastore for tabular data (so that the read and readall methods of the datastore return tables or timetables), then t is a tall table or tall timetable, depending on what the datastore is configured to return. Tabular data is data that is arranged in a rectangular fashion with each row having the same number of entries.
Otherwise, t is a tall cell array.
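As an illustration of the non-tabular case, the following sketch builds a tall array on a fileDatastore. The temporary folder, the file names, and the use of fileread as the read function are assumptions made only for this example.

```matlab
% Create a temporary folder with two small text files (hypothetical sample data).
folder = tempname;
mkdir(folder);
names = {'a.txt', 'b.txt'};
for k = 1:numel(names)
    fid = fopen(fullfile(folder, names{k}), 'w');
    fprintf(fid, 'sample text %d', k);
    fclose(fid);
end

% Each file is read as one character vector, so the datastore is not tabular.
fds = fileDatastore(folder, 'ReadFcn', @fileread, 'FileExtensions', '.txt');
tc = tall(fds);

% The class query is also a tall result, so gather it into memory.
cls = gather(classUnderlying(tc));
```

Because the datastore does not return tables or timetables, cls is expected to report a cell array here.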
t = tall(A) converts the in-memory array A into a tall array. The underlying data type of t is the same as class(A). This syntax is useful when you need to quickly create a tall array, such as for debugging or prototyping algorithms.
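A minimal sketch of this syntax, using an arbitrary int8 matrix chosen for illustration, checks that the tall array keeps the class of the in-memory array:

```matlab
% Small in-memory array with a non-default class (values are arbitrary).
A = int8([1 2; 3 4; 5 6]);

% Convert to a tall array.
t = tall(A);

% istall confirms the conversion; classUnderlying reports the stored class.
isTall = istall(t);
cls = gather(classUnderlying(t));

% Compare against the class of the original array.
sameClass = strcmp(cls, class(A));
```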
In R2019b and later, you can cast in-memory arrays into tall arrays for more efficient operations on the array. After you convert into a tall array, MATLAB® avoids making temporary copies of the whole array and works on the data in smaller blocks. This enables you to perform a wider range of operations on the array without running out of memory.
Examples
Create Tall Array
Convert a datastore into a tall array.
First, create a datastore for the data set. You can specify either a full or relative file location for the data set using datastore(location) to create the datastore. The location argument can specify:
- A single file, such as 'airlinesmall.csv'
- Several files with the same extension, such as '*.csv'
- An entire folder of files, such as 'C:\MyData'
tabularTextDatastore also has several options to specify file and text format properties when you create the datastore.
Create a datastore for the airlinesmall.csv data set. Treat 'NA' values as missing data so that they are replaced with NaN values. Select a small subset of the variables to work with.
varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);
Use tall to create a tall array for the data in the datastore. Since the data in ds is tabular, the result is a tall table. If the data is not tabular, then tall creates a tall cell array instead.
T = tall(ds)
T =
  Mx4 tall table

    ArrDelay    DepDelay    Origin     Dest
    ________    ________    _______    _______
    8           12          {'LAX'}    {'SJC'}
    8           1           {'SJC'}    {'BUR'}
    21          20          {'SAN'}    {'SMF'}
    13          12          {'BUR'}    {'SJC'}
    4           -1          {'SMF'}    {'LAX'}
    59          63          {'LAX'}    {'SJC'}
    3           -2          {'SAN'}    {'SFO'}
    11          -1          {'SEA'}    {'LAX'}
    :           :           :          :
    :           :           :          :
You can use many common MATLAB® operators and functions to work with tall arrays. To see if a function works with tall arrays, check the Extended Capabilities section at the bottom of the function reference page.
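As a brief sketch of working with a tall table using ordinary functions, the code below computes two summary values from the airlinesmall.csv data. The 15-minute threshold and the variable names meanDelay and numLate are illustrative choices, not part of the original example.

```matlab
% Recreate the datastore with only the delay variables.
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay', 'DepDelay'});
T = tall(ds);

% Standard functions and operators build deferred tall results.
meanDelayT = mean(T.ArrDelay, 'omitnan');   % mean arrival delay, ignoring NaN
numLateT = sum(T.ArrDelay > 15);            % flights more than 15 minutes late

% Gather both results together so the data is read only once.
[meanDelay, numLate] = gather(meanDelayT, numLateT);
```

Passing both deferred results to a single gather call lets MATLAB combine the work into as few passes through the data as possible.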
Calculate Size of Tall Array
Convert a datastore into a tall table, calculate its size using a deferred calculation, and then perform the calculation and return the result in memory.
First, create a datastore for the airlinesmall.csv data set. Treat 'NA' values as missing data so that they are replaced with NaN values. Set the text format of a few columns so that they are read as a cell array of character vectors. Convert the datastore into a tall table.
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedFormats{strcmp(ds.SelectedVariableNames, 'TailNum')} = '%s';
ds.SelectedFormats{strcmp(ds.SelectedVariableNames, 'CancellationCode')} = '%s';
T = tall(ds)
T =
  Mx29 tall table

    Year  Month  DayofMonth  DayOfWeek  DepTime  CRSDepTime  ArrTime  CRSArrTime  UniqueCarrier  FlightNum  TailNum  ActualElapsedTime  CRSElapsedTime  AirTime  ArrDelay  DepDelay  Origin   Dest     Distance  TaxiIn  TaxiOut  Cancelled  CancellationCode  Diverted  CarrierDelay  WeatherDelay  NASDelay  SecurityDelay  LateAircraftDelay
    ____  _____  __________  _________  _______  __________  _______  __________  _____________  _________  _______  _________________  ______________  _______  ________  ________  _______  _______  ________  ______  _______  _________  ________________  ________  ____________  ____________  ________  _____________  _________________
    1987  10     21          3          642      630         735      727         {'PS'}         1503       {'NA'}   53                 57              NaN      8         12        {'LAX'}  {'SJC'}  308       NaN     NaN      0          {'NA'}            0         NaN           NaN           NaN       NaN            NaN
    1987  10     26          1          1021     1020        1124     1116        {'PS'}         1550       {'NA'}   63                 56              NaN      8         1         {'SJC'}  {'BUR'}  296       NaN     NaN      0          {'NA'}            0         NaN           NaN           NaN       NaN            NaN
    1987  10     23          5          2055     2035        2218     2157        {'PS'}         1589       {'NA'}   83                 82              NaN      21        20        {'SAN'}  {'SMF'}  480       NaN     NaN      0          {'NA'}            0         NaN           NaN           NaN       NaN            NaN
    1987  10     23          5          1332     1320        1431     1418        {'PS'}         1655       {'NA'}   59                 58              NaN      13        12        {'BUR'}  {'SJC'}  296       NaN     NaN      0          {'NA'}            0         NaN           NaN           NaN       NaN            NaN
    1987  10     22          4          629      630         746      742         {'PS'}         1702       {'NA'}   77                 72              NaN      4         -1        {'SMF'}  {'LAX'}  373       NaN     NaN      0          {'NA'}            0         NaN           NaN           NaN       NaN            NaN
    1987  10     28          3          1446     1343        1547     1448        {'PS'}         1729       {'NA'}   61                 65              NaN      59        63        {'LAX'}  {'SJC'}  308       NaN     NaN      0          {'NA'}            0         NaN           NaN           NaN       NaN            NaN
    1987  10     8           4          928      930         1052     1049        {'PS'}         1763       {'NA'}   84                 79              NaN      3         -2        {'SAN'}  {'SFO'}  447       NaN     NaN      0          {'NA'}            0         NaN           NaN           NaN       NaN            NaN
    1987  10     10          6          859      900         1134     1123        {'PS'}         1800       {'NA'}   155                143             NaN      11        -1        {'SEA'}  {'LAX'}  954       NaN     NaN      0          {'NA'}            0         NaN           NaN           NaN       NaN            NaN
    :     :      :           :          :        :           :        :           :              :          :        :                  :               :        :         :         :        :        :         :       :        :          :                 :         :             :             :         :              :
    :     :      :           :          :        :           :        :           :              :          :        :                  :               :        :         :         :        :        :         :       :        :          :                 :         :             :             :         :              :
The display of the tall table indicates that MATLAB® does not yet know how many rows of data are in the table.
Calculate the size of the tall table. Since calculating the size of a tall array requires a full pass through the data, MATLAB does not immediately calculate the value. Instead, like most operations with tall arrays, the result is an unevaluated tall array whose values and size are currently unknown.
s = size(T)
s =
  1x2 tall double row vector

    ?    ?
Use the gather function to perform the deferred calculation and return the result in memory. The result returned by size is a trivially small 1-by-2 vector, which fits in memory.
sz = gather(s)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.68 sec
Evaluation completed in 0.87 sec
sz = 1×2
123523 29
If you use gather on an unreduced tall array, then the result might not fit in memory. If you are unsure whether the result returned by gather can fit in memory, use gather(head(X)) or gather(tail(X)) to bring only a small portion of the calculation result into memory.
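The following sketch shows the head and tail pattern on a small tall column vector. The vector X is a stand-in created from in-memory data purely for illustration; with a real data set X would usually be backed by a datastore.

```matlab
% Stand-in tall array built from an in-memory column vector.
X = tall((1:1000)');

% head and tail return the first and last 8 rows by default.
firstRows = gather(head(X));
lastRows = gather(tail(X));
```

Here firstRows holds the values 1 through 8 and lastRows holds 993 through 1000, so only 16 values are ever brought into memory.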
Convert In-Memory Arrays to Tall Arrays
Create an in-memory array of random numbers, and then convert it into a tall array. Creating tall arrays from in-memory arrays in this manner is useful for debugging or prototyping new programs. The in-memory array is still bound by normal memory constraints, and even after it is converted into a tall array it cannot grow beyond the limits of memory.
A = rand(100,4); tA = tall(A)
tA =
  100x4 tall double matrix

    0.8147    0.1622    0.6443    0.0596
    0.9058    0.7943    0.3786    0.6820
    0.1270    0.3112    0.8116    0.0424
    0.9134    0.5285    0.5328    0.0714
    0.6324    0.1656    0.3507    0.5216
    0.0975    0.6020    0.9390    0.0967
    0.2785    0.2630    0.8759    0.8181
    0.5469    0.6541    0.5502    0.8175
    :         :         :         :
    :         :         :         :
In R2019b and later releases, when you convert in-memory arrays into tall arrays, you can perform calculations on the array without requiring extra memory for temporary copies of the data. For example, this code normalizes the data in a large matrix and then calculates the sum of all the rows and columns. An in-memory version of this calculation needs to not only store the array but also have enough memory available to create temporary copies of the array.
N = 5000;
tA = tall(rand(N));
tB = tA - mean(tA);
S = gather(sum(tB, [1,2]))
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 0.37 sec
- Pass 2 of 2: Completed in 0.38 sec
Evaluation completed in 1.3 sec
S = -1.0004e-11
If you adjust the value of N so that there is enough memory to store tA, but not enough memory for copies, the calculation still executes successfully.
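To check that the tall version of this calculation agrees with the ordinary in-memory version, a small comparison such as the sketch below can help while prototyping. The matrix size of 200, the fixed random seed, and the tolerance are arbitrary choices for illustration.

```matlab
% Small matrix so that both versions fit in memory comfortably.
rng(0);
A = rand(200);

% Tall version: center each column, then sum every element.
tA = tall(A);
S_tall = gather(sum(tA - mean(tA), [1,2]));

% In-memory version of the same calculation.
S_mem = sum(A - mean(A), [1,2]);

% The two results should differ only by floating-point round-off.
difference = abs(S_tall - S_mem);
```

Both sums are close to zero because the column means have been subtracted, so the comparison uses an absolute tolerance rather than exact equality.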
Input Arguments
ds — Input datastore
datastore object
Input datastore, specified as a datastore object. See Datastore for more information on creating a datastore object for your data set.
Tall arrays work only with datastores that are deterministic. That is, if you use read on the datastore, reset the datastore with reset, and then read the datastore again, then the data returned must be the same in both cases. Tall array calculations involving a datastore that is not deterministic can produce unpredictable results. See Select Datastore for File Format or Application for more information.
Example: ds = tabularTextDatastore('airlinesmall.csv') specifies a single file.
Example: ds = tabularTextDatastore('*.csv') specifies a collection of .csv files.
Example: ds = spreadsheetDatastore('C:\MyData') specifies a folder of spreadsheet files.
Example: ds = datastore('hdfs:///data/') specifies a data set in an HDFS file system.
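The determinism requirement described for ds can be spot-checked with a read, reset, read sequence. This is only a sketch: it compares a single block of data, and the selected variable names are an illustrative choice.

```matlab
% Datastore over the sample file, restricted to two numeric variables.
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay', 'DepDelay'});

% Read the first block, rewind the datastore, and read the first block again.
firstRead = read(ds);
reset(ds);
secondRead = read(ds);

% isequaln treats NaN values as equal, which matters for missing data.
isDeterministic = isequaln(firstRead, secondRead);
```

A file-based datastore like this one returns the same block both times. A datastore with a custom read function that depends on random numbers or external state might not.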
A — In-memory variable
array
In-memory variable, specified as an array.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | table | timetable | string | cell | categorical | datetime | duration | calendarDuration
Complex Number Support: Yes
Output Arguments
t — Tall array
array
Tall array, returned as one of these types:
- When converting a datastore, t is a tall table or tall timetable for tabular datastores. Otherwise, t is a tall cell array.
- When converting an in-memory array, the underlying data type of t is the same as class(A).
See Lazy Evaluation of Tall Arrays for information about how to effectively work with tall arrays.
Tips
See Extend Tall Arrays with Other Products for information on how to use tall arrays with:
- Statistics and Machine Learning Toolbox™
- Parallel Computing Toolbox™
- MATLAB Parallel Server™
- Database Toolbox™
- MATLAB Compiler™
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
If you have Parallel Computing Toolbox installed, then when you use tall, MATLAB automatically opens a parallel pool of workers on your local machine. MATLAB runs the computations across the available workers. Control parallel behavior with the parallel preferences, including scaling up to a cluster.
For details, see Use Tall Arrays on a Parallel Pool (Parallel Computing Toolbox).
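If the automatic parallel pool is not wanted, for example while prototyping on small data, one option is to set the execution environment to the local MATLAB session with mapreducer(0) before creating tall arrays. The sketch below assumes that this setting suits your workflow. The small vector is illustrative only.

```matlab
% Use the local MATLAB session for tall array calculations
% instead of opening a parallel pool.
mapreducer(0);

% Tall arrays created afterward are evaluated in the local session.
tX = tall((1:10)');
total = gather(sum(tX));
```

With the values 1 through 10, total is 55.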
Version History
Introduced in R2016b