Import HDF5 Files

You can read data from an HDF5 file in several ways. Programmatically, you can use the MATLAB® high-level HDF5 functions or the HDF5 library namespaces of low-level functions. Interactively, you can use the Import Data Live Editor task or (in MATLAB Online™) the Import Tool app.

Hierarchical Data Format, Version 5 (HDF5) is a general-purpose, machine-independent standard for storing scientific data in files. It was originally developed at the National Center for Supercomputing Applications (NCSA) and is now maintained by The HDF Group. Engineering and scientific fields use HDF5 as a standard way to store data so that it can be shared. For more information about the HDF5 file format, read the HDF5 documentation available at The HDF Group website (https://www.hdfgroup.org).

Note

For information about importing HDF4 files, which have a separate, incompatible format, see Import HDF4 Files Using High-Level Functions.

Import Data Using High-Level HDF5 Functions

MATLAB includes several functions that you can use to examine the contents of an HDF5 file and import data from the file into the MATLAB workspace.

  • h5disp — View the contents of an HDF5 file.

  • h5info — Create a structure that contains all the metadata defining an HDF5 file.

  • h5read — Read data from a dataset in an HDF5 file.

  • h5readatt — Read data from an attribute associated with a dataset in an HDF5 file or with the file itself (a global attribute).

For details about how to use these functions, see their reference pages, which include examples. The following sections illustrate some common usage scenarios.
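As a minimal sketch of the attribute-reading function in the list above, the following reads the global attribute attr2 from the sample file example.h5 that is included with MATLAB. The attribute name is taken from the h5disp listing of that file; the variable name attrValue is only illustrative.

```matlab
% Read a global attribute, that is, one attached to the root group "/".
attrValue = h5readatt("example.h5","/","attr2");

% The h5disp listing of the sample file reports attr2 as a 2x2 integer attribute.
disp(size(attrValue))
```

To read an attribute of a dataset instead, pass the full path of the dataset as the second argument.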

Determine Contents of HDF5 File

HDF5 files can contain data and metadata, called attributes. HDF5 files organize the data and metadata in a hierarchical structure similar to the hierarchical structure of a UNIX® file system.

In an HDF5 file, the directories in the hierarchy are called groups. A group can contain other groups, datasets, attributes, links, and data types. A dataset is a collection of data, such as a multidimensional numeric array or string. An attribute is any data that is associated with another entity, such as a dataset. A link is similar to a UNIX file system symbolic link. Links are a way to reference objects without having to make a copy of the object.

A data type describes the data in a dataset or attribute and tells how to interpret that data.

To get a quick view into the contents of an HDF5 file, use the h5disp function.

h5disp("example.h5")
HDF5 example.h5 
Group '/' 
    Attributes:
        'attr1':  97 98 99 100 101 102 103 104 105 0 
        'attr2':  2x2 H5T_INTEGER
    Group '/g1' 
        Group '/g1/g1.1' 
            Dataset 'dset1.1.1' 
                Size:  10x10
                MaxSize:  10x10
                Datatype:   H5T_STD_I32BE (int32)
                ChunkSize:  []
                Filters:  none
                Attributes:
                    'attr1':  49 115 116 32 97 116 116 114 105 ...
                    'attr2':  50 110 100 32 97 116 116 114 105 ...
            Dataset 'dset1.1.2' 
                Size:  20
                MaxSize:  20
                Datatype:   H5T_STD_I32BE (int32)
                ChunkSize:  []
                Filters:  none
        Group '/g1/g1.2' 
            Group '/g1/g1.2/g1.2.1' 
                Link:  'slink'
                    Type:  'soft link'
                    Target:  'somevalue'
    Group '/g2' 
        Dataset 'dset2.1' 
            Size:  10
            MaxSize:  10
            Datatype:   H5T_IEEE_F32BE (single)
            ChunkSize:  []
            Filters:  none
        Dataset 'dset2.2' 
            Size:  5x3
            MaxSize:  5x3
            Datatype:   H5T_IEEE_F32BE (single)
            ChunkSize:  []
            Filters:  none
					.
					.
					.
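When the full listing is longer than you need, h5disp also accepts a location in the file and a display mode. The following is a brief sketch on the same sample file, not a complete tour of the options: the "min" mode limits the output to group and dataset names, and omitting the mode gives the default listing for the specified location.

```matlab
% List only the group and dataset names below /g1.
h5disp("example.h5","/g1","min")

% Show the default listing for a single dataset.
h5disp("example.h5","/g2/dset2.1")
```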

To explore the hierarchical organization of an HDF5 file, use the h5info function. The h5info function returns a structure that contains various information about the HDF5 file, including the name of the file.

info = h5info("example.h5")
info = 

  struct with fields:

      Filename: 'matlabroot\matlab\toolbox\matlab\demos\example.h5'
          Name: '/'
        Groups: [4×1 struct]
      Datasets: []
     Datatypes: []
         Links: []
    Attributes: [2×1 struct]

By looking at the Groups and Attributes fields, you can see that the file contains four groups and two attributes. The Datasets, Datatypes, and Links fields are all empty, indicating that the root group does not contain any datasets, data types, or links. To explore the contents of the sample HDF5 file further, examine one of the structures in Groups. This example shows the contents of the second structure in this field.

level2 = info.Groups(2)
level2 = 

  struct with fields:

          Name: '/g2'
        Groups: []
      Datasets: [2×1 struct]
     Datatypes: []
         Links: []
    Attributes: []

In the sample file, the group named /g2 contains two datasets. This figure illustrates this part of the sample HDF5 file organization.

[Figure: Schematic showing structure of HDF5 data in example.h5]
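Rather than inspecting each element of Groups by hand, you can loop over the structure array. This sketch prints a one-line summary of every top-level group in the sample file; the loop variable names are illustrative.

```matlab
info = h5info("example.h5");

% Summarize each group directly below the root group.
for k = 1:numel(info.Groups)
    grp = info.Groups(k);
    fprintf("%s: %d subgroup(s), %d dataset(s)\n", ...
        grp.Name,numel(grp.Groups),numel(grp.Datasets));
end
```

The same pattern applies one level down, for example by looping over grp.Groups, because every group structure has the same fields.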

To get information about a dataset, such as its name, dimensions, and data type, look at either of the structures returned in the Datasets field.

dataset1 = level2.Datasets(1)
dataset1 = 

  struct with fields:

          Name: 'dset2.1'
      Datatype: [1×1 struct]
     Dataspace: [1×1 struct]
     ChunkSize: []
     FillValue: []
       Filters: []
    Attributes: []
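The Datatype and Dataspace fields are themselves structures. As a short sketch, you can also request the metadata for one dataset directly by passing its full path to h5info, and then read the dimensions and the data type class from the nested fields. The variable names here are illustrative.

```matlab
% Request metadata for a single dataset by its full path.
dsInfo = h5info("example.h5","/g2/dset2.1");

dims = dsInfo.Dataspace.Size;       % dimensions of the dataset
typeClass = dsInfo.Datatype.Class;  % HDF5 data type class

fprintf("%s: size [%s], class %s\n",dsInfo.Name,num2str(dims),typeClass);
```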

Import Data from HDF5 File

To read data from a dataset in an HDF5 file, use the h5read function. As arguments, specify the name of the HDF5 file and the name of the dataset. To read the value of an attribute, use h5readatt instead.

To illustrate, this example reads the dataset /g2/dset2.1 from the HDF5 sample file example.h5.

data = h5read("example.h5","/g2/dset2.1")
data =

  10×1 single column vector

    1.0000
    1.1000
    1.2000
    1.3000
    1.4000
    1.5000
    1.6000
    1.7000
    1.8000
    1.9000
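You do not have to read a whole dataset. The h5read function also accepts start, count, and stride arguments, which is useful when a dataset is too large to load at once. The following sketch reads two subsets of the 10-element dataset shown above; the variable names are illustrative.

```matlab
filename = "example.h5";
dsName = "/g2/dset2.1";

% Read 4 elements, starting at element 3 (indices are 1-based).
block = h5read(filename,dsName,3,4);

% Read 5 elements, starting at element 1 and taking every second element.
everyOther = h5read(filename,dsName,1,5,2);
```

For a multidimensional dataset, specify start, count, and stride as vectors with one element per dimension.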

Map HDF5 Data Types to MATLAB Data Types

When the h5read function reads data from an HDF5 file into the MATLAB workspace, it maps HDF5 data types to MATLAB data types, as shown in this table.

HDF5 Data Type | h5read Output
Bitfield | Array of packed 8-bit integers
Float | MATLAB single and double types, provided that they occupy 64 bits or fewer
Integer types, signed and unsigned | Equivalent MATLAB integer types, signed and unsigned
Opaque | Cell array of uint8 values
Reference | Cell array of the actual data pointed to by the reference, not the value of the reference
Strings, fixed-length and variable-length | String arrays
Enums | Cell array of character vectors, where each enumerated value is replaced by the corresponding member name
Compound | 1-by-1 structure array; the dimensions of the dataset are expressed in the fields of the structure
Arrays | Array of values using the same data type as the HDF5 array. For example, if the array is of signed 32-bit integers, the MATLAB array will be of type int32.

The example HDF5 file included with MATLAB includes examples of all these data types.
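One quick way to confirm the mapping for a particular dataset is to check the class of the value that h5read returns. This sketch uses two datasets from the sample file whose HDF5 data types appear in the h5disp listing earlier on this page.

```matlab
filename = "example.h5";

intData = h5read(filename,"/g1/g1.1/dset1.1.1");  % stored as H5T_STD_I32BE
floatData = h5read(filename,"/g2/dset2.1");       % stored as H5T_IEEE_F32BE

disp(class(intData))
disp(class(floatData))
```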

For example, the dataset /g3/string is a string.

h5disp("example.h5","/g3/string")
HDF5 example.h5 
Dataset 'string' 
    Size:  2
    MaxSize:  2
    Datatype:   H5T_STRING
        String Length: 3
        Padding: H5T_STR_NULLTERM
        Character Set: H5T_CSET_ASCII
        Character Type: H5T_C_S1
    ChunkSize:  []
    Filters:  none
    FillValue:  '   '

Now read the data from the file. MATLAB returns it as a string array.

s = h5read("example.h5","/g3/string")
s = 

  2×1 string array

    "ab "
    "de "

The compound data types are always returned as structures. The dimensions of the dataset are expressed in the fields of the structure. For example, the dataset /g3/compound2D is a compound data type.

h5disp("example.h5","/g3/compound2D")
HDF5 example.h5 
Dataset 'compound2D' 
    Size:  2x3
    MaxSize:  2x3
    Datatype:   H5T_COMPOUND
        Member 'a':  H5T_STD_I8LE (int8)
        Member 'b':  H5T_IEEE_F64LE (double)
    ChunkSize:  []
    Filters:  none
    FillValue:  H5T_COMPOUND

Now read the data from the file. MATLAB returns it as a structure.

data = h5read("example.h5","/g3/compound2D")
data = 

  struct with fields:

    a: [2×3 int8]
    b: [2×3 double]
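Because each field holds a full 2-by-3 array, a single element of the compound dataset is spread across the fields. As a small sketch, you can reassemble one element by indexing every field at the same position; the row, col, and element names are illustrative.

```matlab
data = h5read("example.h5","/g3/compound2D");

% Collect members a and b of the element at row 1, column 2.
row = 1;
col = 2;
element = struct("a",data.a(row,col),"b",data.b(row,col));

disp(element)
```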

Import Data Using Low-Level HDF5 Functions

MATLAB provides direct access to dozens of functions in the HDF5 library with low-level functions that correspond to the functions in the HDF5 library. In this way, you can access the features of the HDF5 library from MATLAB, such as reading and writing complex data types and using the HDF5 subsetting capabilities. For more information, see Export Data Using MATLAB Low-Level HDF5 Functions.
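For comparison with h5read, here is a minimal sketch of reading one dataset with the low-level functions. It opens the sample file read-only, reads /g2/dset2.1 with default settings, and closes the identifiers in reverse order. It leaves out error handling, so treat it as an outline of the call sequence rather than production code.

```matlab
% Open the file read-only with the default file access property list.
fileId = H5F.open('example.h5','H5F_ACC_RDONLY','H5P_DEFAULT');

% Open the dataset and read all of its data.
dsetId = H5D.open(fileId,'/g2/dset2.1');
lowLevelData = H5D.read(dsetId);

% Release the identifiers, dataset first and then the file.
H5D.close(dsetId);
H5F.close(fileId);
```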

Interactively Read Data from HDF5 File

This example shows how to use the Import Data task to explore the structure of an HDF5 file, import data from the file, and then analyze and visualize the data.

Explore and Import Data

The HDF5 file included with this example contains a swath of precipitation probability estimates from the NASA Precipitation Processing System (PPS).

Open the Import Data task in the Live Editor by selecting Task > Import Data on the Live Editor tab. Enter the name of the HDF5 file, nasa_pps.h5, in the File field. Use the task to explore the structure of the data, including the groups, datasets, and attributes. The file contains a group S1, which contains three datasets. The probabilityOfPrecip dataset contains estimates of the probability of precipitation at given locations; the Latitude and Longitude datasets contain the locations of these probability estimates. The attributes of the probabilityOfPrecip dataset include units, which has value 'percent'.

Select and import the data and attributes from the three datasets.

To see the code that this task generates, expand the task display by clicking Show code at the bottom of the task parameter area.

% Create a structure to store imported HDF5 data
nasa_pps = struct();

filename = "nasa_pps.h5";

nasa_pps.Groups(1).Name = "S1";

nasa_pps.Groups(1).Datasets(1).Name = "Latitude";
nasa_pps.Groups(1).Datasets(1).Value = h5read(filename, "/S1/Latitude");

nasa_pps.Groups(1).Datasets(2).Name = "Longitude";
nasa_pps.Groups(1).Datasets(2).Value = h5read(filename, "/S1/Longitude");

nasa_pps.Groups(1).Datasets(3).Name = "probabilityOfPrecip";
nasa_pps.Groups(1).Datasets(3).Value = h5read(filename, "/S1/probabilityOfPrecip");

nasa_pps.Groups(1).Datasets(1).Attributes(1).Name = "units";
nasa_pps.Groups(1).Datasets(1).Attributes(1).Value = h5readatt(filename, "/S1/Latitude", "units");

nasa_pps.Groups(1).Datasets(1).Attributes(2).Name = "DimensionNames";
nasa_pps.Groups(1).Datasets(1).Attributes(2).Value = h5readatt(filename, "/S1/Latitude", "DimensionNames");

nasa_pps.Groups(1).Datasets(1).Attributes(3).Name = "Units";
nasa_pps.Groups(1).Datasets(1).Attributes(3).Value = h5readatt(filename, "/S1/Latitude", "Units");

nasa_pps.Groups(1).Datasets(1).Attributes(4).Name = "_FillValue";
nasa_pps.Groups(1).Datasets(1).Attributes(4).Value = h5readatt(filename, "/S1/Latitude", "_FillValue");

nasa_pps.Groups(1).Datasets(1).Attributes(5).Name = "CodeMissingValue";
nasa_pps.Groups(1).Datasets(1).Attributes(5).Value = h5readatt(filename, "/S1/Latitude", "CodeMissingValue");

nasa_pps.Groups(1).Datasets(2).Attributes(1).Name = "units";
nasa_pps.Groups(1).Datasets(2).Attributes(1).Value = h5readatt(filename, "/S1/Longitude", "units");

nasa_pps.Groups(1).Datasets(2).Attributes(2).Name = "DimensionNames";
nasa_pps.Groups(1).Datasets(2).Attributes(2).Value = h5readatt(filename, "/S1/Longitude", "DimensionNames");

nasa_pps.Groups(1).Datasets(2).Attributes(3).Name = "Units";
nasa_pps.Groups(1).Datasets(2).Attributes(3).Value = h5readatt(filename, "/S1/Longitude", "Units");

nasa_pps.Groups(1).Datasets(2).Attributes(4).Name = "_FillValue";
nasa_pps.Groups(1).Datasets(2).Attributes(4).Value = h5readatt(filename, "/S1/Longitude", "_FillValue");

nasa_pps.Groups(1).Datasets(2).Attributes(5).Name = "CodeMissingValue";
nasa_pps.Groups(1).Datasets(2).Attributes(5).Value = h5readatt(filename, "/S1/Longitude", "CodeMissingValue");

nasa_pps.Groups(1).Datasets(3).Attributes(1).Name = "units";
nasa_pps.Groups(1).Datasets(3).Attributes(1).Value = h5readatt(filename, "/S1/probabilityOfPrecip", "units");

nasa_pps.Groups(1).Datasets(3).Attributes(2).Name = "_FillValue";
nasa_pps.Groups(1).Datasets(3).Attributes(2).Value = h5readatt(filename, "/S1/probabilityOfPrecip", "_FillValue");

nasa_pps.Groups(1).Datasets(3).Attributes(3).Name = "DimensionNames";
nasa_pps.Groups(1).Datasets(3).Attributes(3).Value = h5readatt(filename, "/S1/probabilityOfPrecip", "DimensionNames");

nasa_pps.Groups(1).Datasets(3).Attributes(4).Name = "CodeMissingValue";
nasa_pps.Groups(1).Datasets(3).Attributes(4).Value = h5readatt(filename, "/S1/probabilityOfPrecip", "CodeMissingValue");

nasa_pps.Groups(1).Datasets(3).Attributes(5).Name = "Units";
nasa_pps.Groups(1).Datasets(3).Attributes(5).Value = h5readatt(filename, "/S1/probabilityOfPrecip", "Units");

clear filename
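The generated code names every dataset and attribute explicitly. If you prefer a more compact form, the same information can be gathered in a loop driven by h5info. The following is a sketch of that alternative, not the code that the task generates: the imported structure and its field layout are assumptions made for illustration, and the datasets appear in the order that h5info reports them, which might differ from the order above.

```matlab
filename = "nasa_pps.h5";   % file included with this example
groupPath = "/S1";

groupInfo = h5info(filename,groupPath);
imported = struct("Name",{},"Value",{},"Attributes",{});

for k = 1:numel(groupInfo.Datasets)
    dsMeta = groupInfo.Datasets(k);
    dsPath = groupPath + "/" + dsMeta.Name;

    imported(k).Name = string(dsMeta.Name);
    imported(k).Value = h5read(filename,dsPath);

    % Read every attribute that h5info reports for this dataset.
    attrs = struct("Name",{},"Value",{});
    for j = 1:numel(dsMeta.Attributes)
        attrName = dsMeta.Attributes(j).Name;
        attrs(j).Name = string(attrName);
        attrs(j).Value = h5readatt(filename,dsPath,attrName);
    end
    imported(k).Attributes = attrs;
end
```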

Organize and Prepare Data

Extract data from the three datasets as column vectors using the colon operator (:).

lats = nasa_pps.Groups.Datasets(1).Value(:);
lons = nasa_pps.Groups.Datasets(2).Value(:);
probs = nasa_pps.Groups.Datasets(3).Value(:);

Clean up the data in preparation for plotting. Valid indices in these three datasets conform to these conditions:

  • The value in probs is in the interval [0, 100].

  • The value in lats is in the interval [-90, 90].

  • The value in lons is in the interval [-180, 180].

Keep only the values at the valid indices in the three datasets by using logical indexing. This step removes invalid latitude and longitude values as well as invalid percent values for the probability estimates.

isValid = 0<=probs & probs<=100 & abs(lats)<=90 & abs(lons)<=180;

probs = probs(isValid);
lats = lats(isValid);
lons = lons(isValid);
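To see how the mask behaves, it can help to apply the same conditions to a few hand-written values. The numbers below are synthetic and chosen only for illustration; they are not taken from the file, and the toy variable names are hypothetical.

```matlab
% Synthetic values for illustration only.
toyProbs = [50; -9999; 20; 101; 75];
toyLats = [10; 20; -9999; 30; -45];
toyLons = [100; 50; 60; 70; -170];

% An index is kept only when all three values fall in their valid ranges.
keep = 0<=toyProbs & toyProbs<=100 & abs(toyLats)<=90 & abs(toyLons)<=180;

% Only the first and last rows pass all of the conditions.
disp([toyLats(keep) toyLons(keep) toyProbs(keep)])
```

Applying one shared mask to all three vectors keeps the values aligned, so each remaining probability still corresponds to its own latitude and longitude.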

Plot Data

Create a geographic density plot for the valid data in the datasets. The "warmer" colors represent greater probabilities of precipitation.

g = geodensityplot(lats,lons,probs,FaceColor="interp");
geobasemap grayland
geolimits([-80 80],[-180 -90])
title("Probability of Precipitation vs. Location")
colormap jet

Credits

Global precipitation data provided by NASA Precipitation Processing System, from their website at https://gpm.nasa.gov/data/directory.

Read HDF5 Dataset Using Dynamically Loaded Filters

MATLAB supports reading and writing HDF5 datasets using dynamically loaded filters. The HDF Group maintains a list of registered filters on their website.

To read a dataset that has been written using a user-defined, third-party filter, follow these steps:

  1. Install the HDF5 filter plugin on your system as a shared library or DLL.

  2. Set the HDF5_PLUGIN_PATH environment variable to point to the local installation of the plugin, and then start MATLAB:

    • On Windows® — Set the environment variable using System Properties > Advanced > Environment Variables, and then start MATLAB.

    • On Linux® and Mac — Set the environment variable from the terminal, and then start MATLAB from the same terminal.

After you complete these steps, you can use the high-level or low-level MATLAB HDF5 functions to read and access datasets that have been compressed using the third-party filter. For more information, see HDF5 Dynamically Loaded Filters on The HDF Group website.
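If a read fails after you follow these steps, one thing worth checking is whether the MATLAB session actually inherited the environment variable. The following sketch prints the value that MATLAB sees and reports whether each listed folder exists. It assumes that multiple folders, if present, are separated by the platform path separator, and it does not verify that a plugin in the folder is loadable.

```matlab
pluginPath = getenv("HDF5_PLUGIN_PATH");
isSet = strlength(pluginPath) > 0;

if ~isSet
    disp("HDF5_PLUGIN_PATH is not set in this MATLAB session.")
else
    % Assumption: several folders are separated by the platform path separator.
    folders = split(string(pluginPath),pathsep);
    for k = 1:numel(folders)
        fprintf("%s (folder exists: %d)\n",folders(k),isfolder(folders(k)));
    end
end
```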

Linux Users Only: Rebuild Filter Plugins Using MATLAB HDF5 Shared Library

Starting in R2021b, in certain cases, Linux users using a filter plugin with callbacks to core HDF5 library functions must rebuild the plugin using the shipping MATLAB HDF5 shared library, /matlab/bin/glnxa64/libhdf5.so.x.x.x. If you do not rebuild the plugin using this version of the shared library, you might experience issues ranging from undefined behavior to crashes. For more information, see Build HDF5 Filter Plugins on Linux Using MATLAB HDF5 Shared Library or GNU Export Map.
