Choosing Machine Learning Algorithms

Which Machine Learning Algorithm Is Right for You?

You have data and an application, but which algorithm should you try first? There are tradeoffs no matter what you choose. Here are some basic principles to get you started.

Table of Contents

Datasets
Training Speed
Interpretability
Tuning

Size of Your Dataset

Which algorithm works best depends heavily on how much data you have. No absolute rule dictates which algorithm to use for datasets under 50 MB or over 1 TB, but the lists below suggest where to start for a given amount of data, assuming your sample dataset is balanced.

Small

Decision trees
Linear models (including logistic regression and linear discriminant)

Medium

(Nonlinear) SVM
Naïve Bayes
Nearest neighbor
Neural network (shallow)

Large

Deep nets
Ensembles
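The starting points above assume a balanced dataset, so it can help to check class balance before picking an algorithm. The following is a minimal sketch using only the Python standard library; the function names `class_proportions` and `imbalance_ratio` are hypothetical, and what ratio counts as "imbalanced" is a judgment call that depends on your application.

```python
from collections import Counter


def class_proportions(labels):
    """Return the fraction of samples that belong to each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Return largest class count divided by smallest (1.0 means perfectly balanced)."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


if __name__ == "__main__":
    labels = ["spam"] * 9 + ["ham"] * 1  # illustrative labels, not real data
    print(class_proportions(labels))
    print(imbalance_ratio(labels))
```

A ratio far above 1.0 suggests the balanced-data assumption does not hold, and the size-based suggestions may need adjusting.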

Training Speed

Training speed is how long a new model takes to build and train on a given computational resource. Factors such as the algorithm’s architecture and complexity, among others, affect how quickly the model trains. Here are algorithms to consider if your project is very sensitive to training speed and you don’t have acceleration hardware.

Very fast

Decision trees
Linear models (including logistic regression and linear discriminant)
Naïve Bayes

Moderately fast

Ensembles
Nearest neighbor
Neural network (shallow)

Moderately slow

(Nonlinear) SVM

Very slow

Deep nets
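The ratings above are general tendencies; on your own data and hardware, one way to compare candidates is to time each training call directly. This is a minimal sketch using only the Python standard library. `time_training` is a hypothetical helper, and `train_mean_model` is a toy stand-in for a real training routine, included only so the example runs on its own.

```python
import time
from typing import Any, Callable, Tuple


def time_training(train_fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call train_fn once and return (model, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    model = train_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return model, elapsed


def train_mean_model(targets):
    """Toy stand-in for training: the 'model' is just the mean of the targets."""
    return sum(targets) / len(targets)


if __name__ == "__main__":
    model, seconds = time_training(train_mean_model, [1.0, 2.0, 3.0, 4.0])
    print(f"model={model}, trained in {seconds:.6f} s")
```

A single measurement is noisy, so in practice you would likely repeat the call several times on the same data and compare medians.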

Interpretability

Machine learning models can be non-intuitive and difficult to understand. Interpretability refers to how transparent the algorithm’s decision-making process is. However, interpretability often comes at the expense of power and accuracy. Different industries and applications can also have specific requirements around interpretability. To get you started, here are some basic ratings of how easy or difficult each algorithm is to interpret.

Easy to interpret

Decision trees
Linear models (including logistic regression and linear discriminant)

In the middle

Nearest neighbor
Neural network (shallow)
Naïve Bayes

Difficult to interpret

(Nonlinear) SVM
Ensembles
Deep nets
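The dataset size, training speed, and interpretability ratings in this article can be combined into a simple lookup to narrow down candidates. The sketch below transcribes those lists into a table; the `shortlist` function and its parameter names are hypothetical, and the result is a list of starting points rather than a recommendation.

```python
# Ratings transcribed from the lists in this article.
# Each entry: (dataset size, training speed, interpretability).
RATINGS = {
    "Decision trees": ("small", "very fast", "easy"),
    "Linear models": ("small", "very fast", "easy"),
    "Naive Bayes": ("medium", "very fast", "middle"),
    "Nearest neighbor": ("medium", "moderately fast", "middle"),
    "Neural network (shallow)": ("medium", "moderately fast", "middle"),
    "(Nonlinear) SVM": ("medium", "moderately slow", "difficult"),
    "Ensembles": ("large", "moderately fast", "difficult"),
    "Deep nets": ("large", "very slow", "difficult"),
}

# Ordered from most to least favorable.
SPEED_ORDER = ["very fast", "moderately fast", "moderately slow", "very slow"]
INTERPRETABILITY_ORDER = ["easy", "middle", "difficult"]


def shortlist(size=None, slowest_speed="very slow", hardest_to_interpret="difficult"):
    """Return algorithms matching the size (if given) and within both limits."""
    speed_limit = SPEED_ORDER.index(slowest_speed)
    interp_limit = INTERPRETABILITY_ORDER.index(hardest_to_interpret)
    return [
        name
        for name, (rated_size, speed, interp) in RATINGS.items()
        if (size is None or rated_size == size)
        and SPEED_ORDER.index(speed) <= speed_limit
        and INTERPRETABILITY_ORDER.index(interp) <= interp_limit
    ]


if __name__ == "__main__":
    print(shortlist(size="medium", slowest_speed="moderately fast"))
```

Treating the size rating as an exact match is a simplification; an algorithm listed under one size can still be worth trying on another.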

Tuning

Tuning means optimizing the parameters or hyperparameters of a specific model to get the best result. Some algorithms offer little to tune: they limit the number of parameters or hyperparameters you can change to optimize the model for your application. After you choose a particular type of model to train, you can automatically search over the parameters that most strongly affect its performance. How much tuning do you want to be able to perform?
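To make automatic tuning concrete, here is a minimal sketch of a grid search over one hyperparameter, the number of neighbors `k` in a nearest neighbor classifier, scored on a held-out validation set. It uses only the Python standard library; the function names, the one-dimensional toy data, and the candidate values of `k` are all assumptions chosen for illustration.

```python
from collections import Counter

# Toy 1-D data as (feature, label) pairs; (0.16, "b") is a deliberately noisy point.
TRAIN = [(0.0, "a"), (0.1, "a"), (0.2, "a"), (0.16, "b"),
         (1.0, "b"), (1.1, "b"), (1.2, "b")]
VALIDATION = [(0.05, "a"), (0.15, "a"), (1.05, "b"), (1.15, "b")]


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def validation_accuracy(train, validation, k):
    """Fraction of validation points classified correctly for a given k."""
    correct = sum(knn_predict(train, x, k) == label for x, label in validation)
    return correct / len(validation)


def grid_search(train, validation, candidate_ks):
    """Try every candidate k and return (best_k, best_accuracy).

    Ties go to the candidate that appears first in candidate_ks.
    """
    scores = {k: validation_accuracy(train, validation, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]


if __name__ == "__main__":
    best_k, accuracy = grid_search(TRAIN, VALIDATION, [1, 3, 5])
    print(f"best k = {best_k}, validation accuracy = {accuracy}")
```

With this toy data, `k = 1` is misled by the noisy point while larger values of `k` are not, which is the kind of effect tuning is meant to uncover. Models with more hyperparameters enlarge the grid quickly, which is one reason the amount of tuning an algorithm allows is itself a tradeoff.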