Work with Deep Learning Data in Azure

This example shows how to set up, write to, and read from Azure® Blob Storage.

Before you can train your deep neural network in the cloud, you need to upload your data to the cloud. This example shows how to set up a cloud storage resource, upload a data set of labeled images to the cloud, and read that data from the cloud into MATLAB®. The example uses the CIFAR-10 data set, which is a labeled image data set commonly used for benchmarking image classification networks.

Download Data Set to Local Machine

Specify a local directory in which to download the data set. The following code creates a folder in your current directory containing all the images in the data set.

directory = pwd; 
[trainDirectory,testDirectory] = downloadCIFARToFolders(directory);
Downloading CIFAR-10 data set...done.
Copying CIFAR-10 to folders...done.
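The downloadCIFARToFolders helper returns the paths to the new training and test folders. As a quick sanity check, you can list the subfolders it creates (a minimal sketch; it assumes the helper creates one subfolder per class, which the folder-name labeling used later in this example relies on):

```matlab
% List the per-class subfolders created under the training directory.
contents = dir(trainDirectory);
classFolders = contents([contents.isdir] & ~ismember({contents.name},{'.','..'}));
disp({classFolders.name})
```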

Upload Local Data Set to Azure Blob Storage

To work with data in the cloud, you can upload it to Azure Blob Storage and then access the data from your local MATLAB session or from workers in your cluster. The following steps describe how to set up cloud storage and upload the CIFAR-10 data set from your local machine to an Azure Blob Container.

1. Log in to your Microsoft® Azure account. For information on creating an account, see Microsoft Azure.

2. For efficient file transfers to and from Azure Blob Storage, download and install the Azure Command Line Interface tool from How to install the Azure CLI.

3. Log in to Azure at your system's command prompt.

az login

4. Create a resource group, specifying a name for the resource group and the geographic location.

az group create --name <your resource group name> --location <your storage location>

A resource group is a container that holds related resources for an Azure solution. To see a list of available locations, use the command az account list-locations. Any value in the returned "name" fields, for example eastus, can be passed as the location.

5. Create a storage account in your resource group, specifying a name for the storage account.

az storage account create --name <your storage account name> --resource-group <your resource group name>

An Azure storage account contains all of your Azure storage data objects, including blobs, file shares, queues, tables, and disks.

6. Create a storage container in your storage account, specifying a name for the storage container.

az storage container create --name <your storage container name> --account-name <your storage account name>

7. Upload the CIFAR-10 data to the container, specifying the source directory. Use the --recursive flag to upload files within subdirectories of the source directory.

az storage fs directory upload --file-system <your storage container name> --account-name <your storage account name> --source "path/to/CIFAR10/on/the/local/machine" --recursive

Access Data Set in MATLAB

By default, MATLAB does not have permission to access data stored in your Azure Blob Storage. You can grant MATLAB access to the data by generating a shared access signature (SAS) token and providing it to MATLAB.

At your system's command prompt, generate a SAS token. You can vary the permissions the token grants and its expiry date using the --permissions and --expiry parameters. For example, this command generates a SAS token that grants read, write, and list permissions until the specified date.

az storage container generate-sas --account-name  <your storage account name> --name <your storage container name> --permissions rwl --expiry YYYY-MM-DD

Copy the generated SAS token and, in MATLAB, set the environment variable MW_WASB_SAS_TOKEN using the generated token.

SASToken = "<your generated SAS Token>";
setenv("MW_WASB_SAS_TOKEN",SASToken);

Changes to environment variables do not persist between MATLAB sessions. To specify an environment variable permanently, set it in your user or system environment. When you offload work to workers in a cluster, the client MATLAB session and the workers have different environment variables. For information on how to copy environment variables from the client to the workers so that the workers can access cloud storage, see Set Environment Variables on Workers (Parallel Computing Toolbox).
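If you use a process-based pool on your local machine, one way to make the token available on the workers is the EnvironmentVariables option of parpool (available in recent Parallel Computing Toolbox releases). This is a sketch of one approach, not the only one:

```matlab
% Set the SAS token on the client, then copy it to the pool workers.
setenv("MW_WASB_SAS_TOKEN","<your generated SAS Token>");
pool = parpool("Processes",EnvironmentVariables="MW_WASB_SAS_TOKEN");
```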

You can read or write data from cloud storage using MATLAB functions and objects, such as file I/O functions and some datastore objects. When you specify the location of the data, you must specify the full path to the files or folders using a uniform resource locator (URL) of the form wasbs://container@account/path_to_file/file.ext.

URL = "wasbs://<your storage container name>@<your storage account name>.blob.core.windows.net/cifar10/train";

Create an image datastore pointing to the URL of the training data and show the number of images in each category.

ds = datastore(URL, ...
    Type="image", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");

countEachLabel(ds)
ans=10×2 table
      Label       Count
    __________    _____

    airplane      5000 
    automobile    5000 
    bird          5000 
    cat           5000 
    deer          5000 
    dog           5000 
    frog          5000 
    horse         5000 
    ship          5000 
    truck         5000 
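With the datastore in place, the usual datastore workflow applies; for example, you can read individual images and their labels directly from cloud storage (a quick sketch):

```matlab
% Read the next image and its label from the cloud-backed datastore.
[img,info] = read(ds);
imshow(img)
title(string(info.Label))
```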

With the CIFAR-10 data set now stored in Azure Blob Storage, you can try any of the examples in Parallel and Cloud that show how to use the data set in different situations. Note that training a network is faster if you have locally hosted training data.
