Convolutional Neural Network

A convolutional neural network (CNN or ConvNet) is an architecture commonly used for deep learning. CNNs are often used to recognize objects and scenes, and to perform object detection and segmentation. They learn directly from image data, eliminating the need for manual feature extraction. The use of CNNs for deep learning has become increasingly popular due to three important factors:

  • CNNs eliminate the need for manual feature extraction; the features are learned directly by the network.
  • CNNs produce state-of-the-art recognition results.
  • CNNs can be retrained for new recognition tasks, which lets you build on pre-existing networks.

Deep learning workflow. Images are passed to the CNN, which automatically learns features and classifies objects.

How CNNs Work

A convolutional neural network can have tens or hundreds of layers that each learn to detect different features of an image. Filters are applied to each training image at different resolutions, and the output of each convolved image is used as the input to the next layer. The filters can start as very simple features, such as brightness and edges, and increase in complexity to features that uniquely define the object as the layers progress.

Example of a network with many convolutional layers.
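The layer-to-layer flow described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how a production CNN is implemented: the function names (`conv2d`, `relu`), the tiny 5x5 image, and both filters are assumptions chosen for the example. In a real CNN the filter values are learned during training rather than picked by hand.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    return the map of filter responses, as a CNN layer computes it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[v if v > 0 else 0 for v in row] for row in feature_map]


# Hypothetical 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-picked filters for illustration only; a trained CNN learns these.
vertical_edge = [[-1, 1],
                 [-1, 1]]
average = [[0.25, 0.25],
           [0.25, 0.25]]

# Layer 1 responds only where brightness jumps from dark to bright.
layer1 = relu(conv2d(image, vertical_edge))   # each row equals [0, 0, 2, 0]

# Layer 2 takes the output of layer 1 as its input.
layer2 = relu(conv2d(layer1, average))        # each row equals [0, 1, 1]

print(layer1)
print(layer2)
```

The first layer reacts to a simple feature (a vertical edge), and the second layer operates on that response rather than on the raw pixels, which is the mechanism that lets deeper layers represent more complex features.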

Hardware Acceleration with GPUs

A convolutional neural network is trained on hundreds, thousands, or even millions of images. When working with large amounts of data and complex network architectures, GPUs can significantly reduce the time it takes to train a model. Once a CNN is trained, it can be used in real-time applications, such as pedestrian detection in advanced driver assistance systems (ADAS).

CNNs for Object Recognition

One method for creating a convolutional neural network to perform object recognition is to train a network from scratch. The network architect must define the number of layers, the learning weights, and the number of filters, along with other tunable parameters. Training an accurate model from scratch also requires massive amounts of data, on the order of millions of samples, and training on that much data can take an immense amount of time.
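To give a feel for how these design choices add up, the sketch below counts the learnable parameters in a small stack of convolutional layers. The architecture (three layers of 3x3 filters on a three-channel image) and the helper name `conv_layer_params` are hypothetical, chosen only for illustration. The counting rule itself is standard: each filter has one weight per kernel position per input channel, plus one bias.

```python
def conv_layer_params(in_channels, num_filters, kernel_h, kernel_w):
    """Learnable parameters in one convolutional layer:
    each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return num_filters * (kernel_h * kernel_w * in_channels + 1)


# Hypothetical architecture: (number of filters, square kernel size) per layer.
layers = [(16, 3), (32, 3), (64, 3)]

in_channels = 3          # assumed RGB input image
per_layer = []
for num_filters, k in layers:
    per_layer.append(conv_layer_params(in_channels, num_filters, k, k))
    in_channels = num_filters   # this layer's output feeds the next layer

total = sum(per_layer)
print(per_layer)   # [448, 4640, 18496]
print(total)       # 23584
```

Even this toy network has more than twenty thousand values to learn, and networks used in practice are far larger, which is one reason training from scratch needs so much data and time.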

A common alternative to training a CNN from scratch is to use a pretrained model to automatically extract features from a new data set. This method, called transfer learning, is a convenient way to apply deep learning without a huge data set and long computation and training time.
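The idea can be sketched in plain Python under some loud assumptions: the two hand-written filters in `PRETRAINED_FILTERS` stand in for filters that a real pretrained network would have learned, the 4x4 images are synthetic, and all function names are hypothetical. The filters stay frozen, and only a small new classifier (logistic regression here) is trained on the features they extract.

```python
import math


def conv2d(image, kernel):
    """Filter responses with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


# Stand-ins for filters learned by a pretrained network (assumption).
PRETRAINED_FILTERS = [
    [[-1, 1], [-1, 1]],    # responds to dark-to-bright vertical edges
    [[-1, -1], [1, 1]],    # responds to dark-to-bright horizontal edges
]


def extract_features(image):
    """Frozen feature extractor: strongest positive response per filter."""
    return [
        max(max(0, v) for row in conv2d(image, kernel) for v in row)
        for kernel in PRETRAINED_FILTERS
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(features, labels, lr=0.5, epochs=200):
    """Train only the new classifier; the filters are never updated."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            err = sigmoid(sum(w * x for w, x in zip(weights, f)) + bias) - y
            weights = [w - lr * err * x for w, x in zip(weights, f)]
            bias -= lr * err
    return weights, bias


def predict(image, weights, bias):
    f = extract_features(image)
    return 1 if sigmoid(sum(w * x for w, x in zip(weights, f)) + bias) > 0.5 else 0


def vertical(split):      # 4x4 image with a vertical edge at column `split`
    return [[0] * split + [1] * (4 - split) for _ in range(4)]


def horizontal(split):    # 4x4 image with a horizontal edge at row `split`
    return [[0] * 4 for _ in range(split)] + [[1] * 4 for _ in range(4 - split)]


# Small new data set: label 1 = vertical edge, label 0 = horizontal edge.
train_images = [vertical(1), vertical(2), horizontal(1), horizontal(2)]
train_labels = [1, 1, 0, 0]

weights, bias = train_classifier(
    [extract_features(img) for img in train_images], train_labels
)

# Images not seen during training.
print(predict(vertical(3), weights, bias))
print(predict(horizontal(3), weights, bias))
```

Because the feature extractor is reused as-is, only three numbers (two weights and a bias) are learned from the new data, which is why this approach can work with a small data set and a short training time.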

With MATLAB and Neural Network Toolbox, you can train your own convolutional neural network from scratch or use a pretrained model to perform transfer learning.

Products that support using CNNs for image analysis include MATLAB®, Computer Vision System Toolbox™, Statistics and Machine Learning Toolbox™, and Neural Network Toolbox™.

See also: Statistics and Machine Learning Toolbox, Neural Network Toolbox, Computer Vision System Toolbox, deep learning, machine learning, object detection, object recognition, feature extraction, image recognition, pattern recognition, predictive analytics, data analytics, MATLAB GPU computing