- Read Method: Implement the read method so that it returns frames from randomly shuffled positions in the binary file. Use "fseek" to move the file position pointer to the byte offset of each frame, (frameIndex-1)*FrameSize, following the shuffled frame order.
- Shuffle Method: Implement a custom "shuffle" method that generates a random permutation of the frame indices. This method only changes the order in which frames are accessed during reading; it does not alter the binary file.
Shuffle method on custom datastore written for a single binary file
I am writing a custom datastore and am seeking some assistance. My datasets consist of stacks of 2D images (frames) stored sequentially in a single binary file. While it's very straightforward to read in the binary stream using fread, each full dataset can easily be on the order of 50+ GB, making it infeasible to load everything at once on the hardware I have available. This was my original motivation for exploring the use of a datastore.
In addition to the need for managing out-of-memory data, I also would like to partition the data into chunks where each chunk contains a random collection of frames from this binary file. If possible, I would like to use the shuffle method for the datastore superclass to accomplish this, as this seems to be the "proper" approach (although I'm very open to alternatives).
The problem I am currently having is that the default datastore shuffle method appears only to randomize the order of files in a datastore directory. Since I only have one (very large) binary file, it doesn't seem to "shuffle" anything at all: running readall on the shuffled datastore returns exactly the same data as running it on the original datastore. What I need is for it to "shuffle" the frames within the binary file. Presumably, if I were to save each frame as an individual image file on disk, I could get this to work using imageDatastore or fileDatastore. However, I would then have to go through all my files and save them to disk again as individual files, which seems rather silly.
I have written code to load a chunk of the data manually by jumping around the file using fseek. However, then I lose access to the datastore object as well as its built-in functionality. So I thought I would throw this question out there to see if anyone could offer some help.
Answers (1)
Sanjana
on 6 Oct 2024
Hi,
You can implement a custom datastore in MATLAB to shuffle frames within a single large binary file while maintaining the benefits of a datastore.
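Custom Datastore class: Create a custom datastore class that extends the matlab.io.Datastore class. Because matlab.io.Datastore is a handle class, methods such as read and shuffle can update the object's state in place. The class below reads and shuffles frames within a single binary file.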
Implementing Custom Read and Shuffle methods:
Example Custom Datastore class definition:
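classdef CustomFrameDatastore < matlab.io.Datastore
    properties
        FileName      % path to the binary file
        FrameSize     % size of one frame in bytes (data is read as uint8)
        TotalFrames   % number of frames stored in the file
        CurrentIndex  % position of the next read within FrameOrder
        FrameOrder    % permutation of 1:TotalFrames that sets the read order
    end
    methods
        function ds = CustomFrameDatastore(fileName, frameSize, totalFrames)
            ds.FileName = fileName;
            ds.FrameSize = frameSize;
            ds.TotalFrames = totalFrames;
            ds.CurrentIndex = 1;
            ds.FrameOrder = randperm(totalFrames);
        end
        function [data, info] = read(ds)
            % Read the next frame in the shuffled order.
            if ~hasdata(ds)
                error('No more data to read.');
            end
            fid = fopen(ds.FileName, 'r');
            if fid == -1
                error('Could not open %s.', ds.FileName);
            end
            frameIndex = ds.FrameOrder(ds.CurrentIndex);
            % Seek to the first byte of the requested frame.
            fseek(fid, (frameIndex-1)*ds.FrameSize, 'bof');
            % 'uint8=>uint8' keeps the data as uint8 instead of converting to double.
            data = fread(fid, ds.FrameSize, 'uint8=>uint8');
            fclose(fid);
            info.FrameIndex = frameIndex;
            ds.CurrentIndex = ds.CurrentIndex + 1;
        end
        function reset(ds)
            ds.CurrentIndex = 1;
        end
        function tf = hasdata(ds)
            tf = ds.CurrentIndex <= ds.TotalFrames;
        end
        function shuffle(ds)
            % Draw a new random read order; the file itself is not changed.
            ds.FrameOrder = randperm(ds.TotalFrames);
            ds.CurrentIndex = 1;
        end
    end
    methods (Hidden = true)
        function frac = progress(ds)
            % Fraction of frames read so far; matlab.io.Datastore requires this method.
            frac = (ds.CurrentIndex - 1) / ds.TotalFrames;
        end
    end
end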
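If you would rather have shuffle behave as it does for the built-in datastores (returning a new, shuffled datastore and leaving the original untouched), one option is to also inherit from the matlab.io.datastore.Shuffleable mixin. The following is only a minimal sketch under some assumptions: the file is raw uint8 data, and the class name ShuffleableFrameDatastore and the property name FrameBytes are made up for illustration.

```matlab
classdef ShuffleableFrameDatastore < matlab.io.Datastore & matlab.io.datastore.Shuffleable
    % Hypothetical sketch: frame-wise datastore over one raw uint8 binary file.
    properties
        FileName      % path to the binary file
        FrameBytes    % bytes per frame (assumed uint8 data)
        TotalFrames   % number of frames in the file
        FrameOrder    % order in which frames are read
        CurrentIndex  % position of the next read within FrameOrder
    end
    methods
        function ds = ShuffleableFrameDatastore(fileName, frameBytes, totalFrames)
            ds.FileName = fileName;
            ds.FrameBytes = frameBytes;
            ds.TotalFrames = totalFrames;
            ds.FrameOrder = 1:totalFrames;   % sequential until shuffled
            reset(ds);
        end
        function tf = hasdata(ds)
            tf = ds.CurrentIndex <= ds.TotalFrames;
        end
        function [data, info] = read(ds)
            if ~hasdata(ds)
                error('No more data to read.');
            end
            frameIndex = ds.FrameOrder(ds.CurrentIndex);
            fid = fopen(ds.FileName, 'r');
            if fid == -1
                error('Could not open %s.', ds.FileName);
            end
            % Jump to the first byte of the requested frame and read it.
            fseek(fid, (frameIndex - 1) * ds.FrameBytes, 'bof');
            data = fread(fid, ds.FrameBytes, 'uint8=>uint8');
            fclose(fid);
            info.FrameIndex = frameIndex;
            ds.CurrentIndex = ds.CurrentIndex + 1;
        end
        function reset(ds)
            ds.CurrentIndex = 1;
        end
        function dsNew = shuffle(ds)
            % Return a copy with a permuted read order; ds itself is unchanged.
            dsNew = copy(ds);
            dsNew.FrameOrder = ds.FrameOrder(randperm(ds.TotalFrames));
            reset(dsNew);
        end
    end
    methods (Hidden = true)
        function frac = progress(ds)
            % Fraction of frames read so far (between 0 and 1).
            frac = (ds.CurrentIndex - 1) / ds.TotalFrames;
        end
    end
end
```

With this variant you would call dsShuffled = shuffle(ds); and then read from dsShuffled, while ds keeps its original order.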
Here is example code that uses the CustomFrameDatastore class:
% Initialize datastore
frameSize = 1024 * 1024; % Example frame size in bytes (one byte per uint8 pixel)
totalFrames = 50000; % Example total number of frames
ds = CustomFrameDatastore('largefile.bin', frameSize, totalFrames);
% Shuffle and read frames one at a time
ds.shuffle();
while hasdata(ds)
    frameData = ds.read();
    % Process frameData
end
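Since you mentioned wanting chunks that each contain a random collection of frames, the seek-and-read step can also be sketched as a standalone function that reads several frames in one call. This is only an illustration under assumptions: the data is raw uint8, and the helper name readFrames and the tiny demo file are made up here so that the script runs on its own.

```matlab
% Demo: write a tiny binary file of 6 frames, 4 bytes each (frame k is filled with the value k).
frameBytes = 4;
numFrames = 6;
fileName = [tempname '.bin'];
fid = fopen(fileName, 'w');
fwrite(fid, repelem(uint8(1:numFrames), frameBytes), 'uint8');
fclose(fid);

% Pick a random chunk of 3 distinct frames and read them.
chunkIdx = randperm(numFrames, 3);
chunk = readFrames(fileName, frameBytes, chunkIdx);
disp(chunk(1, :));   % first byte of each frame, in the same order as chunkIdx
delete(fileName);

function frames = readFrames(fileName, frameBytes, frameIdx)
% Read the frames listed in frameIdx (1-based) from a raw uint8 file.
% Returns a frameBytes-by-numel(frameIdx) uint8 matrix, one frame per column.
    fid = fopen(fileName, 'r');
    if fid == -1
        error('readFrames:openFailed', 'Could not open %s.', fileName);
    end
    closeFile = onCleanup(@() fclose(fid));   % close the file even if a read fails
    frames = zeros(frameBytes, numel(frameIdx), 'uint8');
    for k = 1:numel(frameIdx)
        offset = (frameIdx(k) - 1) * frameBytes;   % byte offset of this frame
        if fseek(fid, offset, 'bof') ~= 0
            error('readFrames:seekFailed', 'Seek failed for frame %d.', frameIdx(k));
        end
        frames(:, k) = fread(fid, frameBytes, 'uint8=>uint8');
    end
end
```

Opening the file once per chunk, rather than once per frame, avoids repeated fopen/fclose calls when a chunk contains many frames.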
I hope this helps!