A long short-term memory (LSTM) network is a type of recurrent neural network (RNN). LSTMs are predominantly used to learn, process, and classify sequential data because they can learn long-term dependencies between time steps of data.
How LSTMs Work
LSTMs and RNNs
LSTM networks are a specialized form of the RNN architecture. RNNs use past information to improve the performance of a neural network on current and future inputs. They contain a hidden state and loops, which allow the network to store past information in the hidden state and operate on sequences. RNNs have two sets of weights: one for the hidden state vector and one for the inputs. During training, the network learns both sets of weights. At each time step, the output is based on the current input and on the hidden state, which in turn is based on previous inputs.
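The recurrence described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation of any particular library: the function name `rnn_step`, the tanh activation, and all weight values are assumptions chosen for the example.

```python
import math

def rnn_step(x, h_prev, W_x, W_h, b):
    """One step of a simple RNN: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_new = []
    for i in range(len(b)):
        total = b[i]
        # Contribution of the current input (input weights).
        total += sum(W_x[i][j] * x[j] for j in range(len(x)))
        # Contribution of the past, carried by the hidden state (recurrent weights).
        total += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

# Hypothetical weights: 1 input channel, 2 hidden units.
W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.0]

sequence = [[1.0], [0.5], [-1.0]]  # three time steps
h = [0.0, 0.0]                     # initial hidden state
states = []
for x_t in sequence:
    h = rnn_step(x_t, h, W_x, W_h, b)
    states.append(h)
```

The same two weight sets are reused at every time step; only the hidden state changes as the sequence is consumed.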
In practice, simple RNNs are limited in their capacity to learn longer-term dependencies. RNNs are commonly trained through backpropagation through time, in which gradients are propagated back across many time steps and may either vanish or explode. As a result, the weight updates become either negligibly small or excessively large, limiting effectiveness in applications that require the network to learn long-term relationships.
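A rough way to see why this happens is to consider a simplified linear recurrence h_t = w * h_(t-1), where the gradient of the last state with respect to the first is a product of one factor of w per time step. The sketch below assumes this scalar model (real networks also multiply by activation derivatives and use weight matrices), and the name `gradient_through_time` is made up for illustration.

```python
def gradient_through_time(w, steps):
    """Gradient of h_T with respect to h_0 for the recurrence h_t = w * h_(t-1)."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # one factor of w per time step
    return grad

# Recurrent weight below 1: the gradient shrinks toward zero (vanishing).
small = gradient_through_time(0.5, 50)

# Recurrent weight above 1: the gradient grows rapidly (exploding).
large = gradient_through_time(1.5, 50)
```

After 50 steps, the first gradient is far too small to drive learning and the second is large enough to destabilize it.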
LSTM Layer Architecture
LSTM layers use additional gates to control what information in the hidden state is passed to the output and to the next hidden state. In addition to the hidden state found in traditional RNNs, an LSTM block typically has a memory cell, an input gate, an output gate, and a forget gate. These gates address the difficulty that simple RNNs have with long-term dependencies and enable the network to learn long-term relationships in the data more effectively. Because LSTMs are less sensitive to the length of the gap between relevant time steps, they are better suited than simple RNNs to analyzing sequential data. In the figure below, you can see the LSTM architecture and data flow at time step t.
The weights and biases of the input gate control the extent to which a new value flows into the LSTM unit. Similarly, the weights and biases of the forget gate control the extent to which a value remains in the unit, and those of the output gate control the extent to which the value in the unit is used to compute the output activation of the LSTM block.
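The role of each gate can be sketched with the commonly used LSTM update equations. This is an illustrative sketch under stated assumptions rather than the exact formulation of any specific library: the helper names (`affine`, `lstm_step`, `lstm_layer`, `gate`), the parameter layout, and all numeric values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, R, b, x, h):
    """Compute W x + R h + b (input weights W, recurrent weights R, bias b)."""
    return [
        b[i]
        + sum(W[i][j] * x[j] for j in range(len(x)))
        + sum(R[i][k] * h[k] for k in range(len(h)))
        for i in range(len(b))
    ]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps each gate name to a (W, R, b) tuple."""
    i = [sigmoid(z) for z in affine(*params["input"], x, h_prev)]        # how much new value flows in
    f = [sigmoid(z) for z in affine(*params["forget"], x, h_prev)]       # how much old value remains
    g = [math.tanh(z) for z in affine(*params["candidate"], x, h_prev)]  # proposed new value
    o = [sigmoid(z) for z in affine(*params["output"], x, h_prev)]       # how much of the cell is output
    c = [f[n] * c_prev[n] + i[n] * g[n] for n in range(len(c_prev))]     # memory cell update
    h = [o[n] * math.tanh(c[n]) for n in range(len(c))]                  # output activation
    return h, c

def lstm_layer(sequence, params, num_hidden):
    """Unroll the cell over a sequence and return one output vector per time step."""
    h = [0.0] * num_hidden
    c = [0.0] * num_hidden
    outputs = []
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, params)
        outputs.append(h)
    return outputs

def gate(bias):
    """Hypothetical parameters for one gate: 1 input channel, 2 hidden units."""
    W = [[0.5], [-0.5]]
    R = [[0.1, 0.0], [0.0, 0.1]]
    return (W, R, [bias, bias])

params = {"input": gate(0.0), "forget": gate(1.0),
          "candidate": gate(0.0), "output": gate(0.0)}
outputs = lstm_layer([[1.0], [0.0], [-1.0]], params, num_hidden=2)

# Extreme biases make the gate roles visible: a closed input gate and an
# open forget gate leave the stored cell value almost unchanged.
zero_W, zero_R = [[0.0]], [[0.0]]
keep = {"input": (zero_W, zero_R, [-20.0]), "forget": (zero_W, zero_R, [20.0]),
        "candidate": (zero_W, zero_R, [0.0]), "output": (zero_W, zero_R, [20.0])}
h_keep, c_keep = lstm_step([1.0], [0.0], [0.8], keep)
```

In this sketch, each entry of `outputs` has as many channels as there are hidden units, and `c_keep` stays close to the stored value 0.8 because the forget gate is open while the input gate is closed.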
The following diagram illustrates the data flow through an LSTM layer with multiple time steps. The number of channels in the output matches the number of hidden units in the LSTM layer.
LSTM Network Architecture
LSTMs work well with sequence and time-series data for classification and regression tasks. LSTMs also work well on videos because videos are essentially a sequence of images. Similar to working with signals, it helps to perform feature extraction before feeding the sequence of images into the LSTM layer. For example, use a convolutional neural network (CNN), such as GoogLeNet, to extract features from each frame. The following figure shows how to design an LSTM network for different tasks.
Bidirectional LSTM
A bidirectional LSTM (BiLSTM) learns bidirectional dependencies between time steps of time-series or sequence data. These dependencies can be useful when you want the network to learn from the complete time series at each time step. Because a BiLSTM processes the input sequence twice, once in each direction, every time step has access to both past and future context, which can increase the performance of your network.
A BiLSTM consists of two LSTM components: the forward LSTM and the backward LSTM. The forward LSTM operates from the first time step to the last time step. The backward LSTM operates from the last time step to the first time step. After passing the data through the two LSTM components, the operation concatenates the outputs along the channel dimension.
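The forward/backward wiring and the channel-wise concatenation can be sketched as follows. To keep the example short, a simple tanh recurrence stands in for each LSTM component, so this illustrates only the bidirectional data flow, not the LSTM gating itself. The names `run_recurrent` and `bidirectional` and all weight values are assumptions made for the example.

```python
import math

def run_recurrent(sequence, w_x, w_h):
    """Stand-in recurrent pass (NOT a full LSTM): one tanh unit per hidden channel.

    Returns one hidden vector per time step, in the order the steps were processed.
    """
    h = [0.0] * len(w_h)
    outputs = []
    for x_t in sequence:
        h = [math.tanh(w_x[n] * x_t + w_h[n] * h[n]) for n in range(len(h))]
        outputs.append(h)
    return outputs

def bidirectional(sequence, fwd_params, bwd_params):
    # Forward component: first time step to last.
    forward = run_recurrent(sequence, *fwd_params)
    # Backward component: last time step to first, then re-aligned to input order.
    backward = run_recurrent(sequence[::-1], *bwd_params)[::-1]
    # Concatenate the two outputs along the channel dimension at each time step.
    return [f + b for f, b in zip(forward, backward)]

sequence = [0.2, -0.4, 0.9, 0.1]        # four time steps, one input channel
fwd_params = ([0.5, -0.3], [0.1, 0.2])  # hypothetical weights, 2 hidden units
bwd_params = ([0.7, 0.4], [0.3, -0.1])  # hypothetical weights, 2 hidden units
outputs = bidirectional(sequence, fwd_params, bwd_params)
```

With two hidden units per direction, each time step of `outputs` has four channels: the first two come from the forward pass and the last two from the backward pass.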
LSTM Applications
LSTMs are particularly effective for working with sequential data, which can vary in length, and learning long-term dependencies between time steps of that data. Common LSTM applications include sentiment analysis, language modeling, speech recognition, and video analysis.
Broad LSTM Applications
RNNs are a key technology in applications such as:
- Signal processing. Signals are naturally sequential data, as they are often collected from sensors over time. Automatic classification and regression on large signal data sets allow prediction in real time. Raw signal data can be fed into deep networks or preprocessed to focus on specific features, such as frequency components. Feature extraction can greatly improve network performance.
- Natural language processing (NLP). Language is naturally sequential, and pieces of text vary in length. LSTMs are a great tool for natural language processing tasks, such as text classification, text generation, machine translation, and sentiment analysis, because they can learn to contextualize words in a sentence.
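As a rough illustration of the frequency-based preprocessing mentioned for signals, the sketch below splits a signal into overlapping frames and computes the magnitude spectrum of each frame, producing a sequence of feature vectors that could be fed to a sequence model. The function names, frame length, hop size, and test signal are all assumptions chosen for the example, and a plain discrete Fourier transform is used for clarity rather than speed.

```python
import cmath
import math

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames of frame_len samples, hop samples apart."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins, normalized by frame length."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s) / n)
    return mags

# Hypothetical signal: an 8 Hz sine wave sampled at 64 Hz for 2 seconds.
sample_rate = 64
signal = [math.sin(2 * math.pi * 8 * t / sample_rate) for t in range(128)]

frames = frame_signal(signal, frame_len=32, hop=16)
features = [dft_magnitudes(f) for f in frames]  # one feature vector per frame
```

With 32-sample frames at 64 Hz, the bins are 2 Hz apart, so the 8 Hz component shows up as the dominant value in bin 4 of every feature vector.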
Vertical LSTM Applications
As deep learning applications continue to expand, LSTMs are also used in industry-specific (vertical) applications. The following automotive case study is one example.
Using LSTM Networks to Estimate NOx Emissions
Renault engineers used LSTMs in developing next-generation technology for zero-emissions vehicles (ZEVs).
They obtained their training data from tests conducted on an actual engine. During these tests, the engine was put through common drive cycles. The captured data, which included engine torque, engine speed, coolant temperature, and gear number, provided the input to the LSTM network. After iterations on the design of the LSTM architecture, the final version of the LSTM achieved 85–90% accuracy in predicting NOx levels.
LSTMs with MATLAB
Using MATLAB® with Deep Learning Toolbox™ enables you to design, train, and deploy LSTMs. Using Text Analytics Toolbox™ or Signal Processing Toolbox™ allows you to apply LSTMs to text or signal analysis.
Design and Train Networks
You can design and train LSTMs programmatically with a few lines of code. Use LSTM layers, bidirectional LSTM layers, and LSTM projected layers to build LSTMs. You can also design, analyze, and modify LSTMs interactively using the Deep Network Designer app.
Deploy Networks
Deploy your trained LSTM on embedded systems, enterprise systems, or the cloud:
- Automatically generate optimized C/C++ code and CUDA code for deployment to CPUs and GPUs.
- Generate synthesizable Verilog® and VHDL® code for deployment to FPGAs and SoCs.