read and divide HDF5 data into chunks

nlm on 12 Oct 2018
Edited: nlm on 15 Oct 2018
I have more than 1000 HDF5 files, each containing an 1800-by-3600 matrix. I want to divide each 1800-by-3600 matrix into 4 chunks and store each chunk, together with an ID, in an array, and then repeat this for all 1000+ files. Can someone explain how to use H5P.set_chunk or H5S.select_hyperslab for this? I used H5S.select_hyperslab to read a single slab, but how do I repeat the process?

Accepted Answer

Dinesh Iyer on 12 Oct 2018
Edited: Dinesh Iyer on 12 Oct 2018
H5P.set_chunk specifies the chunk dimensions of a dataset, i.e. the size of each chunk when the dataset is stored in the file. H5S.select_hyperslab specifies the portion of the dataset that you want to read. If you are reading only a portion of the data from a dataset, this is probably what you need.
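To make the distinction concrete, here is a minimal sketch using the high-level functions. The temporary file and the dataset name '/mydataset' are assumptions for illustration only. The 'ChunkSize' option of h5create sets the storage layout (the high-level counterpart of H5P.set_chunk), while the start and count arguments of h5read select the region to read.

```matlab
% Hypothetical file and dataset names, used only for illustration
fname = [tempname '.h5'];
dset  = '/mydataset';

% 'ChunkSize' controls how the dataset is laid out on disk (storage chunks)
h5create(fname, dset, [1800 3600], 'ChunkSize', [1800 900]);
h5write(fname, dset, reshape(1:1800*3600, 1800, 3600));

% The start and count arguments select the region to read;
% the region does not have to line up with the storage chunks
block = h5read(fname, dset, [1 901], [1800 900]);

info = h5info(fname, dset);
disp(info.ChunkSize)   % storage chunk dimensions
disp(size(block))      % size of the portion that was read

delete(fname);
```

The storage chunk size mainly affects I/O performance; the region passed to h5read can be any valid block of the dataset.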
When you say that you want to store each chunk with an ID into an array, do you mean you want to read it into MATLAB or do you want to store it again into another HDF5 file?
For starters, you can use the high-level h5read function to read a portion of the dataset. I am not sure how you want to divide the data into 4 chunks but I am going to assume that each chunk is 1800x900. This does not impact the code.
The code below provides an idea on how you can do this.
fileNames = dir('*.h5');
fileNames = {fileNames.name}';
numChunks = 4;
chunkSize = [1800 900];
s = struct();   % created once so that the chunks from every file are kept
for fileIdx = 1:numel(fileNames)
    fileToRead = fileNames{fileIdx};
    for cnt = 1:numChunks
        % Field name: sanitized file name plus the chunk number
        ID = sprintf('%s_Chunk_%02d', matlab.lang.makeValidName(fileToRead), cnt);
        % Each chunk covers all 1800 rows and the next 900 columns
        startLoc = [1 chunkSize(2)*(cnt-1)+1];
        s.(ID) = h5read(fileToRead, '/mydataset', startLoc, chunkSize);
    end
end
I have not run the above code and so apologies for any errors but it does give an idea of how you can do this.
If you want to use the low-level functions such as H5D.read, you have to loop and update the h5_start input argument of H5S.select_hyperslab so that it points to the part of the dataset that you want to read.
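A rough sketch of that low-level loop is shown below. It builds its own demo file, and the file and dataset names are assumptions for illustration. Note that the low-level functions use zero-based offsets and C-style dimension ordering, so the MATLAB sizes are flipped with fliplr before being passed in.

```matlab
% Demo file with a hypothetical dataset name, created only for illustration
fname = [tempname '.h5'];
data  = reshape(1:1800*3600, 1800, 3600);
h5create(fname, '/mydataset', [1800 3600]);
h5write(fname, '/mydataset', data);

numChunks = 4;
chunkSize = [1800 900];            % MATLAB order: rows, columns
chunks    = cell(1, numChunks);

fid       = H5F.open(fname, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
dsetId    = H5D.open(fid, '/mydataset');
fileSpace = H5D.get_space(dsetId);

% Low-level calls expect C-ordered dimensions, hence fliplr
h5_block  = fliplr(chunkSize);
memSpace  = H5S.create_simple(2, h5_block, []);

for cnt = 1:numChunks
    % Zero-based offset of this chunk, flipped to C ordering
    h5_start = fliplr([0 chunkSize(2)*(cnt-1)]);
    H5S.select_hyperslab(fileSpace, 'H5S_SELECT_SET', h5_start, [], [], h5_block);
    chunks{cnt} = H5D.read(dsetId, 'H5ML_DEFAULT', memSpace, fileSpace, 'H5P_DEFAULT');
end

H5S.close(memSpace);
H5S.close(fileSpace);
H5D.close(dsetId);
H5F.close(fid);
delete(fname);
```

Only h5_start changes between iterations; the memory dataspace and the block size are reused for every chunk.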
  3 Comments
Dinesh Iyer on 12 Oct 2018
The code that I have provided should help you get started. It results in 4 chunks because I have taken a chunk size of [1800 900]. You can modify this.
If you want to speed up the operation, you can use PARFOR loops to process the files in parallel.
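One way this could look is sketched below, assuming the Parallel Computing Toolbox is available (without it, parfor runs like an ordinary for loop). The demo files, the folder, and the dataset name '/mydataset' are hypothetical. The results go into a cell array indexed by the loop variable, because parfor handles sliced outputs of that kind more readily than dynamically named struct fields.

```matlab
% Build two small demo files in a temporary folder (hypothetical names)
folder = tempname;
mkdir(folder);
for k = 1:2
    f = fullfile(folder, sprintf('demo%d.h5', k));
    h5create(f, '/mydataset', [1800 3600]);
    h5write(f, '/mydataset', k*ones(1800, 3600));
end

listing   = dir(fullfile(folder, '*.h5'));
fileNames = fullfile(folder, {listing.name}');
numChunks = 4;
chunkSize = [1800 900];
results   = cell(numel(fileNames), 1);   % one entry per file

parfor fileIdx = 1:numel(fileNames)
    fileChunks = cell(1, numChunks);
    for cnt = 1:numChunks
        startLoc = [1 chunkSize(2)*(cnt-1)+1];
        fileChunks{cnt} = h5read(fileNames{fileIdx}, '/mydataset', startLoc, chunkSize);
    end
    results{fileIdx} = fileChunks;       % sliced output, safe in parfor
end

rmdir(folder, 's');
```

After the loop, results{i}{j} holds chunk j of file i.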
nlm on 15 Oct 2018
Edited: nlm on 15 Oct 2018
It results in only 4 chunks in total, even though there are 400 files. How do I get 4 chunks for each of the 400 HDF5 files and store them in a structure?
I modified your code, and I get empty arrays:
fileNames = dir('*.HDF5');
fileNames = {fileNames.name}';
numChunks = 4;
chunkSize = [1800 900];
for cnt1 = 1:numel(fileNames)
    fileToRead = fileNames{cnt1};
    s = struct();
    for cnt = 1:numChunks
        ID = sprintf('C_%d', cnt);
        startLoc = [1 chunkSize(2)*(cnt-1)+1]
        s(cnt1).(ID) = h5read(fileToRead, '/Grid/HQprecipitation', startLoc, chunkSize);
    end
end
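A likely cause of the empty arrays in the modified code is that s = struct() sits inside the outer loop. It resets s for every file, so when s(cnt1) is assigned afterwards, all earlier elements are recreated as empty and only the last file keeps its data. A minimal sketch of one possible fix follows, with s created once before the loop. The demo files and the temporary folder are assumptions added so that the example runs on its own.

```matlab
% Build three small demo files (hypothetical names) with the same dataset path
folder = tempname;
mkdir(folder);
for k = 1:3
    f = fullfile(folder, sprintf('demo%d.HDF5', k));
    h5create(f, '/Grid/HQprecipitation', [1800 3600]);
    h5write(f, '/Grid/HQprecipitation', k*ones(1800, 3600));
end

listing   = dir(fullfile(folder, '*.HDF5'));
fileNames = fullfile(folder, {listing.name}');
numChunks = 4;
chunkSize = [1800 900];

s = struct();                    % created once, before the file loop
for cnt1 = 1:numel(fileNames)
    for cnt = 1:numChunks
        ID = sprintf('C_%d', cnt);
        startLoc = [1 chunkSize(2)*(cnt-1)+1];
        % s(cnt1) holds the four chunks C_1..C_4 of file number cnt1
        s(cnt1).(ID) = h5read(fileNames{cnt1}, '/Grid/HQprecipitation', startLoc, chunkSize);
    end
end

rmdir(folder, 's');
```

With this layout, s is a 1-by-N struct array and s(i).C_j is chunk j of file i.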
