Datastore

Read large collections of data

The datastore function creates a datastore, which is a repository for collections of data that are too large to fit in memory. A datastore allows you to read and process data stored in multiple files on a disk, a remote location, or a database as a single entity. If the data is too large to fit in memory, you can manage the incremental import of data, create a tall array to work with the data, or use the datastore as an input to mapreduce for further processing. For more information, see Getting Started with Datastore.
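The incremental import described above can be sketched as follows. This is a minimal illustration rather than a recommended workflow: it assumes R2020b or later and uses a small in-memory array with arrayDatastore as a stand-in for a collection too large to load at once. The variable names and the ReadSize value are hypothetical choices for the example.

```matlab
% Hypothetical stand-in for a large collection: ten rows held in memory.
data = (1:10)';

% Read up to 4 rows per call; "same" returns the rows as a numeric array
% rather than wrapping them in a cell array.
ds = arrayDatastore(data, "ReadSize", 4, "OutputType", "same");

total = 0;
numReads = 0;
while hasdata(ds)
    chunk = read(ds);              % next block of at most 4 rows
    total = total + sum(chunk);    % process the block, then discard it
    numReads = numReads + 1;
end

% Return the datastore to its initial state so it can be read again.
reset(ds);
```

Only one block is held in the loop at a time, which is the property that makes this pattern usable for data that does not fit in memory. With file-based datastores the loop has the same shape; only the constructor call changes.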

Functions

datastore - Create datastore for large collections of data
tabularTextDatastore - Datastore for tabular text files
spreadsheetDatastore - Datastore for spreadsheet files
imageDatastore - Datastore for image data
parquetDatastore - Datastore for collection of Parquet files (Since R2019a)
fileDatastore - Datastore with custom file reader
arrayDatastore - Datastore for in-memory data (Since R2020b)
read - Read data in datastore
readall - Read all data in datastore
preview - Preview subset of data in datastore
hasdata - Determine if data is available to read
reset - Reset datastore to initial state
writeall - Write datastore to files (Since R2020a)
subset - Create subset of datastore or FileSet (Since R2019a)
isSubsettable - Determine whether datastore is subsettable (Since R2022b)
shuffle - Shuffle all data in datastore
isShuffleable - Determine whether datastore is shuffleable (Since R2020a)
numpartitions - Number of datastore partitions
partition - Partition a datastore
isPartitionable - Determine whether datastore is partitionable (Since R2020a)
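
As a rough sketch of how the partitioning, subsetting, and shuffling functions listed above fit together, the example below applies them to a small arrayDatastore. It assumes R2020b or later; the data and variable names are hypothetical, and the exact split between partitions is left unchecked because it is decided by the datastore.

```matlab
% Hypothetical in-memory data; any datastore that supports these
% operations could be used in place of arrayDatastore.
data = (1:10)';
ds = arrayDatastore(data, "OutputType", "same");

% Default number of partitions the datastore suggests.
n = numpartitions(ds);

% Split the datastore into two parts and read each part separately.
part1 = partition(ds, 2, 1);
part2 = partition(ds, 2, 2);
rejoined = sort([readall(part1); readall(part2)]);

% Select specific rows by index.
firstThree = readall(subset(ds, 1:3));

% Read every row in a random order.
shuffled = readall(shuffle(ds));
```

Together the two partitions cover every row of the original data, which is why rejoined matches data after sorting. The order of shuffled varies from run to run.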

Functions

combine - Combine data from multiple datastores (Since R2019a)
transform - Transform datastore (Since R2019a)
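
A minimal sketch of combine and transform used together, assuming R2020b or later so that arrayDatastore is available. The two input arrays and the anonymous function are hypothetical; they only illustrate that each read of a combined datastore concatenates the reads of its underlying datastores horizontally, and that transform applies a function to each read.

```matlab
% Two hypothetical datastores with the same number of rows.
dsA = arrayDatastore((1:3)', "OutputType", "same");
dsB = arrayDatastore((10:10:30)', "OutputType", "same");

% Each read of dsC returns one row from dsA next to one row from dsB.
dsC = combine(dsA, dsB);

% Apply a function to every block that dsC returns: here, a row sum.
dsT = transform(dsC, @(x) sum(x, 2));

% Read all transformed rows into a single array.
out = readall(dsT);
```

Nothing is computed when combine or transform is called. The work happens when the resulting datastore is read, so the same pattern applies to datastores backed by files.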

Objects

CombinedDatastore - Datastore to combine data read from multiple underlying datastores (Since R2019a)
SequentialDatastore - Sequentially read data from multiple underlying datastores (Since R2022b)
TransformedDatastore - Datastore to transform underlying datastore (Since R2019a)
KeyValueDatastore - Datastore for key-value pair data for use with mapreduce
TallDatastore - Datastore for checkpointing tall arrays

Classes

matlab.io.Datastore - Base datastore class
matlab.io.datastore.Partitionable - Add parallelization support to datastore
matlab.io.datastore.Subsettable - Add subset and fine-grained parallelization support to datastore (Since R2022b)
matlab.io.datastore.HadoopLocationBased - Add Hadoop support to datastore (Since R2019a)
matlab.io.datastore.Shuffleable - Add shuffling support to datastore
matlab.io.datastore.DsFileSet - File-set object for collection of files in datastore
matlab.io.datastore.DsFileReader - File-reader object for files in a datastore
matlab.io.datastore.FileWritable - Add file writing support to datastore (Since R2020a)
matlab.io.datastore.FoldersPropertyProvider - Add Folder property support to datastore (Since R2020a)
matlab.io.datastore.FileSet - File-set for collection of files in datastore (Since R2020a)
matlab.io.datastore.BlockedFileSet - Blocked file-set for collection of blocks within file (Since R2020a)

Topics