Effect of splitting data when training a neural network

Hi, I would like to know the effect of splitting the data when training a neural network.
When I use a 504 x 20000 training matrix X (20000 samples, each with 504 dimensions) to train the NN, what happens if I split it into, say, 100 parts and iterate the "train" function?
Compared to training on all the data at once, what would the result be? I mean like this:
for n = 1:100
    dataind = (n-1)*200+1:n*200;
    net = train(net, x(:,dataind), y(:,dataind));
end
instead of training on all the data with a single call to "train", like:
dataind = 1:20000;
net = train(net, x(:,dataind), y(:,dataind));
I'm asking this because I ran into memory problems when I tried to train the NN on all the data at once. Thank you in advance.

 Accepted Answer

Point-by-point training is done with the function adapt and is called adaptation.
help adapt
doc adapt
However, as with training, success very often depends on the initial random weights. Note below that initial weights that are successful for train can be disastrous for adapt.
close all, clear all, clc
[x,t] = simplefit_dataset;
[I, N] = size(x)              % [ 1 94 ]
[O, N] = size(t)              % [ 1 94 ]
vart1 = var(t,1)              % Reference MSE = 8.3378
plot(x,t)
% The 4 local extrema in the plot suggest H = 4
% is optimal; the error depends on the initial weights
tic
nett = fitnet(4);             % H = 4 (the default is H = 10)
rng(0)                        % For replication
[nett, tr, yt, et] = train(nett,x,t);
% yt = nett(x); et = t - yt;
NMSEt = mse(et)/vart1         % 3.3966e-04
Rsqt = 1 - NMSEt              % 0.9997
trainingtime = toc            % 3.3184
tic
neta = fitnet(4);
rng(0)
[neta, ya, ea, xf, af, ar] = adapt(neta,x,t);
NMSEa = mse(ea)/vart1         % 2.7851
NMSEa = ar.perf/vart1         % 2.7851
Rsqa = 1 - NMSEa              % -1.7851
adaptationtime = toc          % 0.634
numtimesteps = ar.timesteps   % 1
The difference in times is misleading because adapt was aborted after 1 timestep.
I think you have enough to do your own experimentation.
If you run into anything interesting, please post.
Hope this helps.
Thank you for formally accepting my answer
Greg

1 Comment

Sorry, my first answer, though interesting, does not directly address your question.
More later.
Greg


More Answers (1)

Sorry, my first answer, though interesting, does not directly address your basic problem.
1. Distributions are well represented when the number of data points, N, is sufficiently larger than their dimension, I. A rule of thumb in statistics is N > 30*I; however, sometimes a factor of 10 or 15 is sufficient. Your data has a ratio of ~40, so there is no problem there. In fact, you might consider reducing N.
2. However, before reducing N, consider that I = 504 might be ridiculously large. So I think your first order of business is to try to reduce I. There are a variety of ways to do this. I do not favor PCA because it replaces the inputs with linear combinations, which makes it difficult to ascertain which inputs are the most important.
3. Determining the best inputs is easier if BOTH INPUTS & TARGETS are standardized to zero-mean/unit-variance (ZSCORE or MAPSTD).
4. I would be satisfied with ranking the inputs w.r.t. their weights in a LINEAR MODEL when both inputs and outputs are standardized. MATLAB's PLSREGRESS might be a good choice.
5. Rather than trying to use all of your data to train one classifier, design an ensemble of nets and either average their outputs or design a linear classifier on the outputs.
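As a minimal sketch of points 3 and 4, one possible way to rank the inputs, assuming x is I x N and t is O x N as in the question (the variable names and the number of PLS components here are illustrative choices, not a tested recipe):

% Sketch: standardize inputs and targets, fit a linear model with
% PLSREGRESS, then rank inputs by the magnitude of their coefficients.
xs = zscore(x')';                 % standardize each of the I input variables
ts = zscore(t')';                 % standardize each target variable
ncomp = 10;                       % number of PLS components (tune for your data)
[~,~,~,~,beta] = plsregress(xs', ts', ncomp);    % PLSREGRESS expects N x I, N x O
w = beta(2:end,:);                % drop the intercept row
[~, inputrank] = sort(sum(abs(w),2), 'descend'); % most important inputs first

Inputs near the bottom of inputrank would be the first candidates for removal before training the net.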
Hope this helps.
Thank you for formally accepting my answer
Greg

1 Comment

I am sorry for the late reply. Thank you for your answer.
As for dimensionality, I need at least 500 dimensions, although you suggested reducing it. Some tasks, like a de-noising autoencoder (DAE), need a large number of dimensions: a DAE needs high-dimensional inputs and outputs (the size corresponds to the FFT size). I do not know how to deal with a situation where the inputs and outputs must be high-dimensional.



Asked: Ted on 13 Sep 2016
Commented: Ted on 7 Oct 2016
