Custom datastore - why can't I just have a datastore with doubles?

Hello!
I'm trying to create a machine learning model based on a multilayer perceptron. The model takes in four inputs and has two outputs. All inputs and outputs are doubles between 0 and 1. There is a series of shared hidden layers, then the model splits and there is a series of hidden layers specific to each output. I have created the model using dlnetwork as shown here:
layers1 = [
featureInputLayer(4, "Name", "input")
fullyConnectedLayer(24, "Name", "fc1")
reluLayer('name', 'relu1')
fullyConnectedLayer(24, "Name", "fc2")
reluLayer('name', 'relu2')
fullyConnectedLayer(24, "Name", "fc3")
reluLayer('name', 'relu3')
];
layersA = [ fullyConnectedLayer(16, "Name", "fc4a")
reluLayer('name', 'relu4a')
fullyConnectedLayer(16, "Name", "fc5a")
reluLayer('name', 'relu5a')
fullyConnectedLayer(8, "Name", "fc6a")
reluLayer('name', 'relu6a')
fullyConnectedLayer(8, "Name", "fc7a")
reluLayer('name', 'relu7a')
fullyConnectedLayer(4, "Name", "fc8a")
reluLayer('name', 'relu8a')
fullyConnectedLayer(4, "Name", "fc9a")
reluLayer('name', 'relu9a')
fullyConnectedLayer(4, "Name", "fc10a")
reluLayer('name', 'relu10a')
fullyConnectedLayer(4, "Name", "fc11a")
reluLayer('name', 'relu11a')
softmaxLayer
];
layersB = [ fullyConnectedLayer(16, "Name", "fc4b")
reluLayer('name', 'relu4b')
fullyConnectedLayer(16, "Name", "fc5b")
reluLayer('name', 'relu5b')
fullyConnectedLayer(8, "Name", "fc6b")
reluLayer('name', 'relu6b')
fullyConnectedLayer(8, "Name", "fc7b")
reluLayer('name', 'relu7b')
fullyConnectedLayer(4, "Name", "fc8b")
reluLayer('name', 'relu8b')
fullyConnectedLayer(4, "name", "fc9b")
reluLayer('name', 'relu9b')
fullyConnectedLayer(4, "Name", "fc10b")
reluLayer('name', 'relu10b')
fullyConnectedLayer(4, "Name", "fc11b")
reluLayer('name', 'relu11b')
softmaxLayer
];
net = dlnetwork;
net = addLayers(net, layers1);
net = addLayers(net, layersA);
net = connectLayers(net, "relu3", "fc4a");
net = addLayers(net, layersB);
net = connectLayers(net, "relu3", "fc4b");
[trainedNet, info] = trainnet(xTrain', yTrain', net, "mse", opts);
However, when running trainnet, I get the following error message:
Error using trainnet (line 46)
For networks with multiple inputs or outputs, data must be a datastore.
Error in trainRutileModel_matlabEdits (line 172)
[trainedNet, info] = trainnet(xTrain', yTrain', net, "mse", opts);
I followed the instructions for creating a custom datastore, since Matlab (for whatever reason) doesn't have a datastore for doubles. I took my input and output data and combined them into a cell array formatted as follows:
>> rutileDatastore{2}
ans =
1×6 cell array
{'0.080157'} {'0.38585'} {'0.21999'} {'0.61851'} {'0'} {'0.010101'}
Which I believe is in line with the "Datastores for deep learning" support page. The first four entries are inputs, the last two are outputs. I then found out that Matlab doesn't appear to natively handle datastores of just plain old doubles - for some reason, the most basic data type is neglected. So I had to go to the "Develop Custom Datastore" support page, followed the instructions there, made the myDatastore class, and then went on to try to validate the datastore using the "Testing Guidelines for Custom Datastores" support page. Unfortunately, I can't get this to work with any of the datastore types on the custom datastore page - the datastore of cells that reads out correctly isn't one of the 'Type' values listed on the "Datastores" support page.
I managed to get it working as a tabular text datastore, but read(ds) on a tabular text datastore outputs tables, whereas according to the "Datastores for deep learning" support page, read(ds) needs to output cell arrays.
What gives? Does Matlab support datastores that are just numbers? How can I go about a) fulfilling the requirement of trainnet that my training data is in datastore format, and b) having the datastore store just plain old numbers that are output in 1D cell array format?

Answers (2)

Walter Roberson on 7 Aug 2024
Use arrayDatastore (since R2020b)
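As a minimal sketch of what this suggestion looks like in practice (the two-row matrix `data` below is hypothetical sample data, not the asker's actual training set): by default, arrayDatastore iterates over the first dimension and wraps each row it reads in a cell.

```matlab
% Hypothetical sample data: each row is one observation,
% laid out as [in1 in2 in3 in4 out1 out2].
data = [0.080157 0.38585  0.21999  0.61851  0 0.010101
        0.123456 0.654321 0.111111 0.222222 0 0.333333];

% In-memory datastore over the rows of the matrix (no file needed).
ds = arrayDatastore(data);

% With the default settings, each read returns one row wrapped in a cell:
% a 1-by-1 cell array holding a 1-by-6 double.
row = read(ds);
```

On its own this does not yet split each row into the separate predictor and target columns that trainnet expects, so it is only a starting point.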
  1 Comment
Matthew on 8 Aug 2024
I appreciate you taking the time to comment on this. I may be misunderstanding something, but arrayDatastore reads out each row of the array as a cell, so you end up with a series of nested cells that contain data of the appropriate type for a MIMO MLP network according to the trainnet function. You could probably apply some sort of transform as in Garmit's response, but arrayDatastore is really just step one.
Either way, I appreciate your time. I spent about an hour plugging away at arrayDatastore and learned a lot about it!


Garmit Pant on 7 Aug 2024
Hello Matthew,
You have followed the correct workflow to create a datastore for a model with multiple inputs and outputs. Such a datastore should store each input and output as a separate cell, so the output of the “read” method is an N x M cell array, where N is the total number of data items and M is the total number of inputs and outputs (6, in your case).
For numerical ‘double’ data, you can create a “FileDatastore” by specifying a read function to read the file contents. To achieve this, you need to follow these steps:
  1. Convert Training Data into Individual Cells: Each row in your data matrix will be split into six separate cells.
  2. Save the Converted Data: Save the cell arrays into a MAT-file.
  3. Load and Transform the Data: Use 'datastore' and 'transform' to load and reformat the data.
You can use the following code snippet to create the datastore:
% Assuming your data is in a matrix called `data`
% Each row of `data` is [input1, input2, input3, input4, output1, output2]
data = [
0.080157, 0.38585, 0.21999, 0.61851, 0, 0.010101;
0.123456, 0.654321, 0.111111, 0.222222, 0, 0.333333;
% Add more rows as needed
];
% Convert each row to a cell array of six separate cells
numRows = size(data, 1);
combinedCells = cell(numRows, 6);
for i = 1:numRows
    combinedCells(i, :) = num2cell(data(i, :));
end
% Save the combined data
save('trainingData.mat', 'combinedCells');
% Create a FileDatastore
filedatastore = datastore('trainingData.mat', 'Type', 'file', 'ReadFcn', @load);
% Transform the datastore to extract combinedCells
trainingDatastore = transform(filedatastore, @rearrangeData);
% Test reading from the datastore
dataOut = read(trainingDatastore)
dataOut = 2x6 cell array
    {[0.0802]}    {[0.3859]}    {[0.2200]}    {[0.6185]}    {[0]}    {[0.0101]}
    {[0.1235]}    {[0.6543]}    {[0.1111]}    {[0.2222]}    {[0]}    {[0.3333]}
function out = rearrangeData(ds)
    out = ds.combinedCells;
end
For further understanding, kindly refer to the following MathWorks documentation:
  • Refer to the ‘Input Arguments’ section to understand how to correctly create different types of datastores: MathWorks Documentation.
I hope you find the above explanation and suggestions useful!
  1 Comment
Matthew on 8 Aug 2024
So I guess I've got a follow-on question: Now that I have the transformed datastore, I can't read just one line per the "Develop Custom Datastore" support page. Using the read(ds) function spits out the whole datastore rather than a single 1x6 line:
>> read(val)
ans =
1000×6 cell array
{[0.0950]} {[0.4054]} {[ 0.2365]} {[0.6412]} {[ 0]} {[0.2222]}
{[0.0980]} {[0.4090]} {[ 0.2395]} {[0.6450]} {[ 0]} {[0.2626]}
{[0.1010]} {[0.4139]} {[ 0.0012]} {[0.2438]} {[ 0]} {[0.3131]}
Which is not valid input for trainnet. Attempting to train this throws the following error:
Training stopped: Error occurred
Error using trainnet (line 46)
Layer 'input': Invalid input data. Invalid size of channel dimension. Layer expects input with channel dimension size 4 but received input with size 1.
And attempting to set the read size of the datastore throws an error because the datastore has been transformed:
>> val.ReadSize = 1
Unrecognized property 'ReadSize' for class 'matlab.io.datastore.TransformedDatastore'.
Setting ReadSize in the original datastore before transformation doesn't transfer to the transformed datastore.
So it appears that this puts me back at square 1, where I need a 1 x n cell array that just has doubles in it.
I wonder why MathWorks made this so hard to do?
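One possible way out, sketched under some assumptions: the error above says the 'input' layer expects a channel dimension of 4 but received 1, which suggests that the four features should arrive together as one array in the first column rather than as four separate scalar cells. Since the network has a single featureInputLayer and two output branches, the sketch below builds a datastore whose read returns a 1-by-3 cell array (one predictor column, two target columns) by combining three arrayDatastore objects. The matrix `data` and the variable names are hypothetical, and the order of the target columns is assumed to follow the order of the network's outputs.

```matlab
% Hypothetical sample data: each row is [in1 in2 in3 in4 out1 out2].
data = [0.080157 0.38585  0.21999  0.61851  0 0.010101
        0.123456 0.654321 0.111111 0.222222 0 0.333333];

% One datastore per network input or output. Transposing and iterating
% over dimension 2 makes each read yield a column vector per observation.
dsX  = arrayDatastore(data(:, 1:4)', IterationDimension=2); % 4-by-1 features
dsT1 = arrayDatastore(data(:, 5)',   IterationDimension=2); % first target
dsT2 = arrayDatastore(data(:, 6)',   IterationDimension=2); % second target

% combine concatenates the reads horizontally, one observation at a time.
dsTrain = combine(dsX, dsT1, dsT2);

% Each read now returns a 1-by-3 cell array:
% {4-by-1 double, scalar double, scalar double}
sample = read(dsTrain);
reset(dsTrain);   % rewind before handing the datastore to training

% Assumed usage, with net and opts defined as in the question:
% [trainedNet, info] = trainnet(dsTrain, net, "mse", opts);
```

This only addresses the datastore format. Whether each target has the size that the corresponding output branch produces is a separate question that would need checking against the network itself.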


Release

R2024a
