What is the difference between test data and training data?

Question

syikin md radi on 12 May 2015

https://au.mathworks.com/matlabcentral/answers/216336-what-are-the-different-between-test-data-and-training-data

Commented: NN on 9 Mar 2021

What is the difference between test data and training data?

Answer 1

Thomas Koelen on 12 May 2015

https://au.mathworks.com/matlabcentral/answers/216336-what-are-the-different-between-test-data-and-training-data#answer_178767

A training set is the part of a dataset used to build (fit) a model, while a test (or validation) set is used to check the model that was built. Data points in the training set are excluded from the test (validation) set. Usually a dataset is divided either into a training set and a validation set (which some people call the 'test set'), or into a training set, a validation set and a test set. When the split is repeated, the division is made anew in each iteration.
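As a rough illustration of such a division, here is a minimal Python sketch that shuffles row indices and cuts them into three non-overlapping groups. The function name `split_indices` and the 60/20/20 proportions are assumptions chosen for the example, not something prescribed in this thread.

```python
import random

def split_indices(n, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the indices 0..n-1 and cut them into three disjoint groups.

    The fractions are illustrative defaults; whatever is left after the
    training and validation groups becomes the test group.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # seeded so the split is repeatable
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(10)
print(len(train_idx), len(val_idx), len(test_idx))
```

Changing `seed` gives a different division, which is one way to use a new split in each iteration.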

1 Comment

ALAMGIR SARDAR on 27 Aug 2019

Thanks

Answer 2

Walter Roberson on 12 May 2015

https://au.mathworks.com/matlabcentral/answers/216336-what-are-the-different-between-test-data-and-training-data#answer_178776

To expand on this a small bit:

You run calculations on the training set to determine various coefficients.

You can then use the testing set to check how well the predictions do on a wider set of data, and that gives you information about false positives and false negatives.
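To make the false positive and false negative figures concrete, the following Python sketch counts them for a two-class problem. The labels and predictions are made-up toy values, and `confusion_counts` is a hypothetical helper name used only for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # predicted positive, really positive
        elif pred == positive:
            fp += 1          # predicted positive, really negative
        elif truth == positive:
            fn += 1          # predicted negative, really positive
        else:
            tn += 1          # predicted negative, really negative
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

# Toy test-set labels and the model's predictions for them (invented values).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
print(counts, accuracy)
```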

You can use those accuracy figures to go back and re-train. You do not need to use the same division of training and test data each time: there is a common technique called "leave one out" where you deliberately drop one item at a time from the training set and re-calculate, in case that item was an outlier that was preventing a good overall result.
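A minimal Python sketch of leave-one-out follows. In the usual cross-validation form of the technique, the dropped item also serves as the single test point for the model trained on all the others, and that is the form shown here. The one-dimensional nearest-centroid classifier, the helper names and the toy data (including one deliberately odd point) are all assumptions made for illustration.

```python
def fit_centroids(points):
    """points: list of (x, label). Return {label: mean of x for that label}."""
    sums, counts = {}, {}
    for x, label in points:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Assign x to the label whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def leave_one_out(points):
    """Hold out each point in turn, train on the rest, test on the held-out one."""
    misses = []
    for i, (x, label) in enumerate(points):
        rest = points[:i] + points[i + 1:]    # everything except point i
        centroids = fit_centroids(rest)       # re-calculate without it
        if predict(centroids, x) != label:
            misses.append(i)
    accuracy = 1 - len(misses) / len(points)
    return accuracy, misses

# Toy data: the point at index 3 is labelled "a" but sits among the "b" values.
data = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (3.1, "a"),
        (3.0, "b"), (3.3, "b"), (2.9, "b")]

loo_accuracy, missed = leave_one_out(data)
print(loo_accuracy, missed)
```

The indices in `missed` point at items the rest of the data cannot explain, which is one way a possible outlier shows up.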

There is a nasty problem in doing classification called "overtraining" (also known as overfitting): the calculations might fit the data you have on hand extremely well but be useless for anything else. Dividing into training and testing reduces this risk: if the algorithm has not seen a bunch of data in its calculations, then it is not going to adjust itself to be exactly right for that data and bad for other things. Using all of your data to train with is therefore not a good idea.
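The effect can be shown with a deliberately contrived Python sketch. A "model" that simply memorizes every training example scores perfectly on the training data but only at chance level on held-out data, while a simple threshold rule learned from the same training data carries over. The data, the even/odd split and both models are assumptions invented for this illustration.

```python
def true_label(x):
    """Ground truth for the toy problem: class 1 when x is 50 or more."""
    return 1 if x >= 50 else 0

# Even numbers are used for training, odd numbers are held out for testing.
train_set = [(x, true_label(x)) for x in range(0, 100, 2)]
test_set = [(x, true_label(x)) for x in range(1, 100, 2)]

# Overtrained model: a lookup table of the training data, guessing 0 otherwise.
memory = dict(train_set)

def memorizer(x):
    return memory.get(x, 0)

# Simpler model: one threshold halfway between the two classes seen in training.
low = max(x for x, y in train_set if y == 0)
high = min(x for x, y in train_set if y == 1)
threshold = (low + high) / 2

def threshold_rule(x):
    return 1 if x > threshold else 0

def accuracy(model, data):
    """Fraction of (x, label) pairs the model gets right."""
    return sum(model(x) == y for x, y in data) / len(data)

for name, model in [("memorizer", memorizer), ("threshold", threshold_rule)]:
    print(name, accuracy(model, train_set), accuracy(model, test_set))
```

Both models look equally good if you only measure them on the training data. Only the held-out test data reveals the difference.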

After the program has gone back and forth between training sets and validation sets and has settled on the best coefficients (the stage in which the data was allowed to affect the algorithm), it is time to run it on the remaining data and produce a report. The remaining data might or might not have a known classification. If the classifications are known, then the programmer, on looking at the report, might decide it is time to change the program. Or might not.

The report is the kind of thing that gets written up in a paper: we did this and that, and with a limited subset of data to train and test with, we did this well on real data. Or perhaps you send it to the people designing the equipment and experiments so they can see what needs to be improved on their end.

Eventually you publish the paper or write a report or the like, and other people read it and want to use your program too. But they aren't going to do that unless you have established evidence that it is not over-trained on the particular data you gave it, and seeing how well it did on data that was not used to design the details of the algorithm is such evidence.

2 Comments

Isabel Hostettler on 15 Feb 2017

I've just read your answer. Can I ask for advice or help with a question? I've come across the sentence: "quality of prediction was estimated to be good if the difference between the training and test dataset was <5 and acceptable if it was <10%". My question is: how did the person choose these differences as good and acceptable, respectively? Is that the difference one always takes, or is there a rule? Is there a reference to relate to? Advice would be much appreciated. Isabel

NN on 9 Mar 2021

About the leave-one-out part: how is it done? Is it by leaving out one data point and taking the rest again as test data?
