
Why is the input dimension different from the output dimension in the R2022b MATLAB documentation of the multihead self-attention mechanism?

How to turn the multihead self-attention mechanism in wav2vec 2.0 into a deep learning layer?

Answers (1)

Himanshu on 31 Mar 2023
Hello Jie,
As per my understanding, you want to know why the input dimensions of the multi-head self-attention mechanism differ from the output dimensions in MATLAB, and how to convert the multi-head self-attention mechanism in wav2vec 2.0 into a deep learning layer.
The input and output dimensions of the multi-head self-attention mechanism can differ because the input features are projected into several heads, and each head attends to the sequence in its own lower-dimensional subspace. The attended features from all heads are then concatenated and passed through an output projection, and the size of that projection is a design choice. Therefore, the output dimension may not always be the same as the input dimension.
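To make the dimension bookkeeping concrete, here is a minimal MATLAB sketch. All sizes (numHeads, headDim, inputDim, seqLen) are illustrative assumptions rather than values from the documentation, and the heads are collapsed into a single block for brevity:
numHeads = 8;                              % number of attention heads (assumed)
headDim  = 80;                             % per-head projection size (assumed)
inputDim = 512;                            % input feature dimension (assumed)
seqLen   = 100;                            % sequence length (assumed)

X  = rand(inputDim, seqLen);               % input sequence: 512-by-100
WQ = rand(numHeads*headDim, inputDim);     % query projection
WK = rand(numHeads*headDim, inputDim);     % key projection
WV = rand(numHeads*headDim, inputDim);     % value projection

Q = WQ*X;  K = WK*X;  V = WV*X;            % each 640-by-100

scores = (Q.'*K) / sqrt(headDim);          % 100-by-100 attention scores
A = exp(scores - max(scores, [], 2));      % row-wise softmax, written out
A = A ./ sum(A, 2);
H = V*A.';                                 % attended features: 640-by-100

% The concatenated heads have numHeads*headDim = 640 rows, not the
% original 512, so the output projection WO decides the final size.
outputDim = 512;                           % free choice; need not match inputDim
WO = rand(outputDim, numHeads*headDim);
Z  = WO*H;                                 % 512-by-100 only because we chose it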
To convert the multi-head self-attention mechanism in wav2vec 2.0 into a deep learning layer, you can create a custom deep learning layer in MATLAB. You can follow the steps below:
  1. Define a class for the multi-head self-attention layer that inherits from the "nnet.layer.Layer" class. This class contains properties for the layer parameters (such as the number of heads and the projection weights) and methods for the layer's forward and backward passes.
  2. Implement the "predict" method using the multiheadSelfAttention function. You might need to adapt the function to work as a method within the custom layer class.
  3. Implement the "backward" method to compute the gradients of the loss with respect to the layer's inputs and learnable parameters. Note that if "predict" is composed only of functions that support dlarray, you can omit "backward" and MATLAB derives the gradients automatically.
  4. Use Deep Learning Toolbox functions such as "layerGraph" and "assembleNetwork" to include the layer in your deep learning model, as shown in the usage sketch after the skeleton below.
A skeleton of such a layer might look like this:
classdef MultiheadSelfAttentionLayer < nnet.layer.Layer
    properties
        % Layer hyperparameters, such as the number of attention heads.
        NumHeads
    end
    properties (Learnable)
        % Learnable projection weights (query, key, value, output).
        WQ
        WK
        WV
        WO
    end
    methods
        function layer = MultiheadSelfAttentionLayer(name, numHeads)
            % Set the layer name and store the hyperparameters; the
            % learnable weights must be initialized before training.
            layer.Name = name;
            layer.NumHeads = numHeads;
        end
        function Z = predict(layer, X)
            % Implement the forward pass of the layer here, for example
            % by adapting the multiheadSelfAttention function provided
            % in the documentation.
        end
        function [dLdX, dLdWQ, dLdWK, dLdWV, dLdWO] = backward(layer, X, Z, dLdZ, memory)
            % Implement the backward pass of the layer to compute
            % gradients. This method is optional if predict uses only
            % functions that support dlarray.
        end
    end
end
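Once the methods are filled in, the layer can be validated and used like any built-in layer. The sizes, layer names, and surrounding layers below are illustrative assumptions, not part of the original answer:
layer = MultiheadSelfAttentionLayer("mhsa", 8);

% checkLayer runs validity tests against a sample input size; the
% 512-by-100 size and observation dimension here are assumptions.
checkLayer(layer, [512 100], 'ObservationDimension', 3);

% Include the layer in a network alongside built-in layers.
layers = [
    sequenceInputLayer(512)
    MultiheadSelfAttentionLayer("mhsa", 8)
    fullyConnectedLayer(10)
    softmaxLayer
    ];
lgraph = layerGraph(layers);
% assembleNetwork(lgraph) can then assemble the network for inference
% once all learnable weights have been initialized.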
You can refer to the MATLAB documentation on defining custom deep learning layers to learn more.
