Using Sparse Matrix in Series Network

Rachel Bennett on 7 Oct 2022
Commented: Shivam Malviya on 12 Oct 2022
I'm trying to use a SeriesNetwork for a classification problem, but the data I'm currently using is far too large. To deal with this, I've been using the sparse function to convert the data to sparse matrices, but I keep getting the error message "Invalid input data type". I wrote a smaller script with a simpler data file to test this, and even switched to a regression problem to check whether the issue was the way I had set up the classification problem. Even when I get this simple problem to run, converting the input to sparse and passing it to the training function causes errors. Is it possible to use sparse matrices in series networks, or am I out of luck?
I've attached a test file containing a very simple example script. The dataset it uses is the "accidents.mat" file that ships with MATLAB.

Answers (1)

Shivam Malviya on 11 Oct 2022
Hi Rachel,
I understand that you want to work with a large dataset and that you are converting it to a sparse matrix to reduce its size.
"Is it possible to use sparse matrices in series networks?"
No, trainNetwork does not support sparse matrices. That said, I have reported this to the relevant team.
To handle large datasets with neural networks, you can use datastores. I have attached a sample script that loads one row at a time from the hwydata variable in the accidents.mat file.
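The attached script is not reproduced in this thread, so the following is only a minimal sketch of the datastore route, not the script itself. It assumes the last column of hwydata is the response and the remaining columns are the predictors (a hypothetical split chosen for illustration), and it uses in-memory arrayDatastore objects purely to show the format that trainNetwork expects from a datastore. For data that does not fit in memory, a file-backed datastore would take their place.

```matlab
% Sketch only: the column split below is an assumption, not the original script.
data = load('accidents.mat', 'hwydata');
X = data.hwydata(:, 1:end-1);     % predictors (hypothetical choice of columns)
Y = data.hwydata(:, end);         % response   (hypothetical choice of column)
numFeatures = size(X, 2);

% For feature input, each read should return a numFeatures-by-1 column vector,
% so iterate over the columns of the transposed predictor matrix.
dsX = arrayDatastore(X', 'IterationDimension', 2);
dsY = arrayDatastore(Y);          % one scalar response per read
ds  = combine(dsX, dsY);          % each read returns {predictors, response}

% Inspect one observation, then rewind the datastore before training.
sample = read(ds);
reset(ds);

layers = [
    featureInputLayer(numFeatures)
    fullyConnectedLayer(1)
    regressionLayer];

options = trainingOptions('adam', ...
    'MiniBatchSize', 16, ...
    'MaxEpochs', 5, ...
    'Shuffle', 'never', ...
    'Verbose', false);

net = trainNetwork(ds, layers, options);
```

The point of the sketch is the shape of the data: trainNetwork receives one datastore whose reads pair a predictor column vector with its response, rather than full in-memory matrices.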
Please refer to the following links for more information:
I hope the above information helps.
  2 Comments
Rachel Bennett on 11 Oct 2022
This is very helpful, thank you, though I'm sorry to learn that I cannot use sparse matrices. I tested the file you sent and it works, although on a much larger problem it seems to take a long time to run. Additionally, what if I change it back to a classification problem (like my original problem)? When I do that, I get the error "Invalid training data. The output size (2) of the last layer does not match the number of classes of the response (1)." Is this because we're reading in only one line of data at a time? I've attached my sample code, which is only slightly changed from your original example.
Shivam Malviya on 12 Oct 2022
Hi Rachel,
"On a much larger problem, it seems to take a long time to run."
  • You can improve performance by making the batch size and the read size equal. I have updated the script accordingly; please find it attached.
I could not execute the attached script because I don't have "preTexas.mat". Could you please share that file?
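Without that file the cause cannot be confirmed here, but one thing worth checking (an assumption, not a diagnosis of the attached script) is how the response is converted to categorical. If the conversion happens on a single row at a time, the categorical value only knows about the one class present in that row, which would produce a class count of 1. Passing the full set of classes explicitly avoids this. The label values below are hypothetical.

```matlab
% Hypothetical labels for a two-class problem; replace with your own classes.
classNames = [0 1];
rawLabel   = 1;                   % a single response value read from one row

% Categories inferred from the data alone: only the class that is present.
yNarrow = categorical(rawLabel);

% Categories fixed up front: both classes are known even for a single row.
yFull = categorical(rawLabel, classNames);

numNarrow = numel(categories(yNarrow));   % 1
numFull   = numel(categories(yFull));     % 2
```

If the response is built the second way, the number of classes seen by the network matches the output size of the last layer regardless of how many rows are read at once.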
Also, is it using less memory now?
  • If not, make sure that your MAT-file is version 7.3, because other versions do not support partial loading.
  • To check the version, use the following workflow:
  • Execute the following command:
type <MAT-File>.mat
  • Check the comment at the beginning of the output.
  • Make sure it looks like the one below:
MATLAB 7.3 MAT-file
  • If it does not, re-save the file as version 7.3 as follows:
load('<MAT-File>.mat');
save('mycopy.mat','-v7.3');
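The version check above can also be done programmatically by reading the text header at the start of the file. The sketch below is an illustration under stated assumptions: the file name and the stand-in hwydata matrix are hypothetical and exist only to make the example self-contained. It also shows a partial read of one row through matfile.

```matlab
% Hypothetical file name and stand-in data, used only to make the sketch runnable.
matFileName = 'mycopy.mat';
hwydata = rand(51, 17);           % placeholder values, not the real dataset
save(matFileName, 'hwydata', '-v7.3');

% A MAT-file begins with a 116-byte descriptive text header.
fid = fopen(matFileName, 'r');
header = fread(fid, 116, '*char')';
fclose(fid);

% Version 7.3 files start with this text.
isV73 = startsWith(header, 'MATLAB 7.3 MAT-file');

% With a version 7.3 file, matfile can read part of a variable
% without loading the whole variable into memory.
m = matfile(matFileName);
firstRow = m.hwydata(1, :);
```

Reading a single row this way is the kind of access that a row-at-a-time datastore relies on.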
I hope this helps!

Release

R2022a
