Real Burst Asynchronous Matrix Solve Using Q-less QR Decomposition
Compute the value of X in the equation A'AX = B for real-valued matrices using asynchronous Q-less QR decomposition
Since R2022b
Libraries:
Fixed-Point Designer HDL Support / Matrices and Linear Algebra / Linear System Solvers
Description
The Real Burst Asynchronous Matrix Solve Using Q-less QR Decomposition block solves the system of linear equations A'AX = B using asynchronous Q-less QR decomposition, where A and B are real-valued matrices.
When Regularization parameter is nonzero, the Real Burst Asynchronous Matrix Solve Using Q-less QR Decomposition block solves the matrix equation
$$(\lambda^2 I_n + A'A)X = B$$
where λ is the regularization parameter, A is an m-by-n matrix, and $I_n$ = eye(n).
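The following Python sketch illustrates the math only, not the block's hardware. It solves the regularized system (λ²I_n + A'A)X = B in double precision. It assumes one common construction: a Q-less QR of the stacked matrix [λI_n; A] gives an upper-triangular R with R'R = λ²I_n + A'A. Forward substitution then solves R'Y = B, and backward substitution solves RX = Y. The Givens-rotation update and the names qless_qr, forward_backward, and solve_regularized are hypothetical. They are not a MathWorks API.

```python
import math


def qless_qr(rows, n):
    """Return the n-by-n upper-triangular R for the given rows without forming Q.

    Each row is absorbed with Givens rotations, mimicking row-by-row streaming.
    """
    R = [[0.0] * n for _ in range(n)]
    for row in rows:
        x = [float(v) for v in row]
        for k in range(n):
            a, b = R[k][k], x[k]
            r = math.hypot(a, b)
            if r == 0.0:
                continue  # nothing to rotate in this column
            c, s = a / r, b / r
            for j in range(k, n):
                rkj, xj = R[k][j], x[j]
                R[k][j] = c * rkj + s * xj   # updated row k of R
                x[j] = -s * rkj + c * xj     # drives x[k] to zero
    return R


def forward_backward(R, B):
    """Solve R'R X = B: forward-substitute R'Y = B, then back-substitute RX = Y."""
    n, p = len(R), len(B[0])
    Y = [[0.0] * p for _ in range(n)]
    X = [[0.0] * p for _ in range(n)]
    for col in range(p):
        for i in range(n):  # R' is lower triangular: (R')[i][k] = R[k][i]
            acc = B[i][col] - sum(R[k][i] * Y[k][col] for k in range(i))
            Y[i][col] = acc / R[i][i]
        for i in reversed(range(n)):  # R is upper triangular
            acc = Y[i][col] - sum(R[i][k] * X[k][col] for k in range(i + 1, n))
            X[i][col] = acc / R[i][i]
    return X


def solve_regularized(A, B, lam=0.0):
    """Solve (lam^2 I + A'A) X = B via a Q-less QR of the stacked rows [lam*I; A]."""
    n = len(A[0])
    reg_rows = [[lam if j == i else 0.0 for j in range(n)] for i in range(n)]
    R = qless_qr(reg_rows + [list(r) for r in A], n)
    return forward_backward(R, B)


A = [[4.0, 1.0], [2.0, 3.0], [1.0, 1.0]]  # hypothetical A, m = 3, n = 2
B = [[1.0], [2.0]]                         # n-by-p with p = 1
print(solve_regularized(A, B))             # solves A'A X = B
print(solve_regularized(A, B, lam=0.5))    # solves (0.25 I + A'A) X = B
```

With λ = 0 the extra rows are all zero, so the sketch reduces to the unregularized solve of A'AX = B.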
This block operates asynchronously. The Q-less QR decomposition and the forward and backward substitution run independently, and the substitution always uses the latest available R and B matrices.
Examples
Implement Hardware-Efficient Real Burst Asynchronous Matrix Solve Using Q-less QR Decomposition
Implement a hardware-efficient solution to the real-valued matrix equation A'AX=B using the Real Burst Asynchronous Matrix Solve Using Q-less QR Decomposition block.
Ports
Input
A(i,:) — Rows of real matrix A
vector
Rows of real matrix A, specified as a vector. A is an m-by-n matrix where m ≥ 2 and m ≥ n. If B is single or double, A must be the same data type as B. If A is a fixed-point data type, A must be signed, use binary-point scaling, and have the same word length as B. Slope-bias representation is not supported for fixed-point data types.
Data Types: single | double | fixed point
B — Rows of real matrix B
vector
Rows of real matrix B, specified as a vector. B is an n-by-p matrix where n ≥ 2. If A is single or double, B must be the same data type as A. If B is a fixed-point data type, B must be signed, use binary-point scaling, and have the same word length as A. Slope-bias representation is not supported for fixed-point data types.
Data Types: single | double | fixed point
validInA — Whether A(i,:) input is valid
Boolean scalar
Whether A(i,:) input is valid, specified as a Boolean scalar. This control signal indicates when the data from the A(i,:) input port is valid. When this value is 1 (true) and the readyA value is 1 (true), the block captures the values at the A(i,:) input port. When this value is 0 (false), the block ignores the input samples.
After sending a true validInA signal, there may be some delay before readyA is set to false. To ensure all data is processed, you must wait until readyA is set to false before sending another true validInA signal.
Data Types: Boolean
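The rule about waiting for readyA to fall can be illustrated with a small cycle-level simulation in Python. Everything about the block model is an assumption made for illustration. readyA is assumed to keep reading 1 for 2 cycles after a capture, and the block is then assumed busy for 5 more cycles. Capture happens in the same cycle rather than the next time step. Rows offered during the stale-ready window are dropped. The names ToyReadyModel and drive are hypothetical.

```python
class ToyReadyModel:
    """Hypothetical cycle model of the readyA behavior (not the real block).

    A row is captured when validInA = 1 while readyA = 1 and the block is idle.
    readyA keeps reading 1 for `lag` cycles after a capture even though the
    block is already busy; rows offered in that window are dropped.
    """

    def __init__(self, lag=2, busy=5):
        self.lag, self.busy = lag, busy
        self.since = None  # cycles since the last capture, None when idle
        self.accepted, self.dropped = [], []

    def ready(self):
        return self.since is None or self.since < self.lag

    def step(self, row):
        """Advance one clock cycle; row is None when validInA = 0."""
        if row is not None and self.ready():
            if self.since is None:
                self.accepted.append(row)
                self.since = 0
                return
            self.dropped.append(row)  # stale readyA: block was already busy
        if self.since is not None:
            self.since += 1
            if self.since >= self.lag + self.busy:
                self.since = None  # idle again, readyA = 1


def drive(rows, wait_for_ready_low, cycles=200):
    """Upstream source; optionally waits for readyA to fall after each true validInA."""
    blk = ToyReadyModel()
    i, waiting = 0, False
    for _ in range(cycles):
        ready = blk.ready()
        row = None
        if waiting:
            waiting = ready  # keep validInA low until readyA has fallen
        elif ready and i < len(rows):
            row, i = rows[i], i + 1
            waiting = wait_for_ready_low
        blk.step(row)
    return blk


rows = ["A(1,:)", "A(2,:)", "A(3,:)"]
print(drive(rows, True).accepted)   # ['A(1,:)', 'A(2,:)', 'A(3,:)']
print(drive(rows, False).dropped)   # ['A(2,:)', 'A(3,:)']
```

In this toy model, a source that sends whenever readyA reads 1 loses the rows it offers during the lag. A source that first waits for readyA to fall delivers every row.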
validInB — Whether B input is valid
Boolean scalar
Whether B input is valid, specified as a Boolean scalar. This control signal indicates when the data from the B input port is valid. When this value is 1 (true) and the readyB value is 1 (true), the block captures the values at the B input port. When this value is 0 (false), the block ignores the input samples.
After sending a true validInB signal, there may be some delay before readyB is set to false. To ensure all data is processed, you must wait until readyB is set to false before sending another true validInB signal.
Data Types: Boolean
restart — Whether to clear internal states
Boolean scalar
Whether to clear internal states, specified as a Boolean scalar. When this value is 1 (true), the block stops the current calculation and clears all internal states. When this value is 0 (false) and the value at validIn is 1 (true), the block begins a new subframe.
Data Types: Boolean
Output
X — Matrix X
matrix | vector
Matrix X, returned as a matrix or vector.
Data Types: single | double | fixed point
validOut — Whether output data is valid
Boolean scalar
Whether the output data is valid, returned as a Boolean scalar. This control signal indicates when the data at output port X is valid. When this value is 1 (true), the block has successfully computed the matrix X. When this value is 0 (false), the output data is not valid.
Data Types: Boolean
readyA — Whether block is ready for input A(i,:)
Boolean scalar
Whether the block is ready for input A(i,:), returned as a Boolean scalar. This control signal indicates when the block is ready for new input data. When this value is 1 (true) and validInA is 1 (true), the block accepts input data in the next time step. When this value is 0 (false), the block ignores input data in the next time step.
After sending a true validInA signal, there may be some delay before readyA is set to false. To ensure all data is processed, you must wait until readyA is set to false before sending another true validInA signal.
Data Types: Boolean
readyB — Whether block is ready for input B
Boolean scalar
Whether the block is ready for input B, returned as a Boolean scalar. This control signal indicates when the block is ready for new input data. When this value is 1 (true) and validInB is 1 (true), the block accepts input data in the next time step. When this value is 0 (false), the block ignores input data in the next time step.
After sending a true validInB signal, there may be some delay before readyB is set to false. To ensure all data is processed, you must wait until readyB is set to false before sending another true validInB signal.
Data Types: Boolean
Parameters
Number of rows in matrix A — Number of rows in matrix A
4 (default) | positive integer-valued scalar
Number of rows in matrix A, specified as a positive integer-valued scalar.
Programmatic Use
Block Parameter: m
Type: character vector
Values: positive integer-valued scalar
Default: 4
Number of columns in matrix A and rows in matrix B — Number of columns in matrix A and rows in matrix B
4 (default) | positive integer-valued scalar
Number of columns in matrix A and rows in matrix B, specified as a positive integer-valued scalar.
Programmatic Use
Block Parameter: n
Type: character vector
Values: positive integer-valued scalar
Default: 4
Number of columns in matrix B — Number of columns in matrix B
1 (default) | positive integer-valued scalar
Number of columns in matrix B, specified as a positive integer-valued scalar.
Programmatic Use
Block Parameter: p
Type: character vector
Values: positive integer-valued scalar
Default: 1
Regularization parameter — Regularization parameter
0 (default) | real nonnegative scalar
Regularization parameter, specified as a real nonnegative scalar. Small, positive values of the regularization parameter can improve the conditioning of the problem and reduce the variance of the estimates. Although the regularized estimate is biased, its reduced variance often results in a smaller mean squared error than the least-squares estimate.
Programmatic Use
Block Parameter: regularizationParameter
Type: character vector
Values: real nonnegative scalar
Default: 0
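The following Python sketch makes the conditioning claim concrete. It compares the 2-norm condition number of A'A with that of λ²I_n + A'A, assuming the regularized system matrix has that form. The input is a hypothetical 3-by-2 A with nearly collinear columns. The sketch uses the closed-form eigenvalues of a symmetric 2-by-2 matrix, so it handles only n = 2. The helper names are hypothetical.

```python
import math


def gram(A):
    """Return A'A for a matrix given as a list of rows."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]


def cond_2x2_spd(G):
    """2-norm condition number of a symmetric positive definite 2-by-2 matrix."""
    mean = (G[0][0] + G[1][1]) / 2
    radius = math.hypot((G[0][0] - G[1][1]) / 2, G[0][1])
    return (mean + radius) / (mean - radius)  # largest / smallest eigenvalue


def regularized_cond(A, lam):
    """Condition number of lam^2 I + A'A (n = 2 only in this sketch)."""
    G = gram(A)
    for i in range(2):
        G[i][i] += lam ** 2
    return cond_2x2_spd(G)


# Hypothetical A with nearly collinear columns
A = [[1.0, 1.0], [1.0, 1.001], [1.0, 0.999]]
print(regularized_cond(A, 0.0))  # roughly 6e6
print(regularized_cond(A, 0.1))  # roughly 6e2
```

Adding λ² to the diagonal shifts every eigenvalue of A'A up by λ². This barely changes the largest eigenvalue but moves the smallest one well away from zero, so the condition number drops sharply.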
Output datatype — Data type of output matrix X
fixdt(1,18,14) (default) | double | single | fixdt(1,16,0) | <data type expression>
Data type of the output matrix X, specified as fixdt(1,18,14), double, single, fixdt(1,16,0), or a user-specified data type expression. You can specify the type directly or express it as a data type object such as Simulink.NumericType.
Programmatic Use
Block Parameter: OutputType
Type: character vector
Values: 'fixdt(1,18,14)' | 'double' | 'single' | 'fixdt(1,16,0)' | '<data type expression>'
Default: 'fixdt(1,18,14)'
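The fixdt arguments are (Signed, WordLength, FractionLength), so the default fixdt(1,18,14) is a signed 18-bit type with 14 fraction bits. The Python sketch below computes the range and resolution of such binary-point-scaled types and quantizes a value. The rounding mode (Python's round, which rounds half to even) and the saturation on overflow are assumptions. They are not necessarily what the block uses. The function names are hypothetical.

```python
def fixdt_range(signed, word_length, fraction_length):
    """Min, max, and resolution of a binary-point-scaled fixdt(signed, wl, fl) type."""
    eps = 2.0 ** -fraction_length
    if signed:
        lo_int, hi_int = -(2 ** (word_length - 1)), 2 ** (word_length - 1) - 1
    else:
        lo_int, hi_int = 0, 2 ** word_length - 1
    return lo_int * eps, hi_int * eps, eps


def quantize(x, signed, word_length, fraction_length):
    """Round to the nearest step and saturate on overflow (assumed modes)."""
    lo, hi, eps = fixdt_range(signed, word_length, fraction_length)
    return min(max(round(x / eps) * eps, lo), hi)


print(fixdt_range(1, 18, 14))     # (-8.0, 7.99993896484375, 6.103515625e-05)
print(fixdt_range(1, 16, 0))      # (-32768.0, 32767.0, 1.0)
print(quantize(0.1, 1, 18, 14))   # 0.0999755859375
print(quantize(10.0, 1, 18, 14))  # 7.99993896484375 (saturated)
```

The same arithmetic applies to sfix16_En14, which is fixdt(1,16,14).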
Algorithms
Choosing the Implementation Method
Systolic implementations prioritize computation speed over area, while burst implementations minimize area at the expense of computation speed. The following table summarizes the tradeoffs between the implementations available for matrix decompositions and solving systems of linear equations.
Implementation | Throughput | Latency | Area |
---|---|---|---|
Systolic | C | O(n) | O(mn²) |
Partial-Systolic | C | O(m) | O(n²) |
Partial-Systolic with Forgetting Factor | C | O(n) | O(n²) |
Burst | O(n) | O(mn) | O(n) |
In this table, C is a constant proportional to the word length of the data, m is the number of rows in matrix A, and n is the number of columns in matrix A.
For additional considerations in selecting a block for your application, see Choose a Block for HDL-Optimized Fixed-Point Matrix Operations.
Synchronous vs Asynchronous Implementation
The Matrix Solve Using QR Decomposition blocks operate synchronously. These blocks first decompose the input A and B matrices into R and C matrices using a QR decomposition block. Then, a back-substitution block solves RX = C for X. The input A and B matrices propagate through the system in parallel, in a synchronized way.
The Matrix Solve Using Q-less QR Decomposition blocks operate asynchronously. First, Q-less QR decomposition is performed on the input A matrix, and the resulting R matrix is stored in a buffer. Then, a Forward Backward Substitute block uses the input B matrix and the buffered R matrix to solve R'RX = B for X. With zero regularization, R'R = A'A, so X also solves A'AX = B. Because the R and B matrices are stored in separate buffers, the upstream Q-less QR Decomposition block and the downstream Forward Backward Substitute block can run independently. The Forward Backward Substitute block starts processing when the first R and B matrices are available. It then runs continuously using the latest buffered R and B matrices, regardless of the status of the Q-less QR Decomposition block. For example, if the upstream source stops providing A and B matrices, the Forward Backward Substitute block continues to generate the same output using the last pair of R and B matrices.
The Burst Matrix Solve Using Q-less QR Decomposition blocks come in both synchronous and asynchronous variants. The asynchronous variants include "Asynchronous" in the block name.
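The buffering described above can be sketched as a toy cycle model in Python. All latencies here are made up; the real cycle counts are in the timing table under Block Timing. To keep the arithmetic exact, A has a single column, so R = sqrt(sum of a²) and X = B / R². The names run_async, qr_1col, and solve_1col are hypothetical.

```python
import math


def run_async(a_arrivals, b_arrivals, qr, solve, qr_latency, solve_cycles, cycles):
    """Toy cycle model of the buffered, asynchronous structure (hypothetical timing).

    a_arrivals / b_arrivals map a cycle number to a new A or B. Each R lands in
    its buffer qr_latency cycles after its A arrives; B lands immediately.
    Once both buffers hold data, the substitution stage restarts every
    solve_cycles cycles on the latest R and B and emits one X per run.
    """
    r_buf = b_buf = latched = done_at = None
    in_flight = {}  # finish cycle -> R produced by the QR stage
    outputs = []    # (cycle, X) for each validOut pulse
    for t in range(cycles):
        if t in a_arrivals:
            in_flight[t + qr_latency] = qr(a_arrivals[t])
        if t in in_flight:
            r_buf = in_flight.pop(t)  # overwrite with the newest R
        if t in b_arrivals:
            b_buf = b_arrivals[t]     # overwrite with the newest B
        if t == done_at:
            outputs.append((t, solve(*latched)))
            done_at = None
        if done_at is None and r_buf is not None and b_buf is not None:
            latched, done_at = (r_buf, b_buf), t + solve_cycles
    return outputs


def qr_1col(a):
    """R for a single-column A: the column's 2-norm."""
    return math.sqrt(sum(v * v for v in a))


def solve_1col(r, b):
    """Solve r * r * x = b."""
    return b / (r * r)


out = run_async({0: [3.0, 4.0], 20: [1.0, 2.0, 2.0]}, {2: 225.0},
                qr_1col, solve_1col, qr_latency=5, solve_cycles=4, cycles=40)
print(out)
```

The last A arrives at cycle 20, yet outputs keep coming every 4 cycles. X switches from 9.0 to 25.0 only after the new R lands in its buffer and the in-progress solve finishes.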
AMBA AXI Handshake Process
This block uses the AMBA AXI handshake protocol [1]. The valid/ready handshake process is used to transfer data and control information. This two-way control mechanism allows both the manager and the subordinate to control the rate at which information moves between them. A valid signal indicates when data is available. The ready signal indicates that the block can accept the data. Data transfers only when both the valid and ready signals are high.
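The Python sketch below simulates one valid/ready channel cycle by cycle. The word labels and the ready pattern are made up. The rule that a source holding valid keeps valid and its data unchanged until the transfer completes comes from the AXI specification [1], not from this page. The name handshake is hypothetical.

```python
def handshake(words, ready_pattern):
    """Simulate a valid/ready channel, one list entry per clock cycle.

    The source (manager) raises valid whenever it has a word and keeps valid
    and the data unchanged until the sink's (subordinate's) ready is also high.
    """
    i, received, trace = 0, [], []
    for ready in ready_pattern:
        valid = i < len(words)
        data = words[i] if valid else None
        trace.append((valid, bool(ready), data))
        if valid and ready:  # a transfer happens only when both are high
            received.append(data)
            i += 1           # the source advances to the next word
    return received, trace


received, trace = handshake(["w0", "w1", "w2"], [0, 1, 0, 0, 1, 1, 1])
for cycle, (valid, ready, data) in enumerate(trace):
    print(cycle, int(valid), int(ready), data)
```

Word w1 stays on the bus for three cycles, because the sink deasserts ready for two of them. No word is lost or duplicated.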
Block Timing
The Burst Asynchronous Matrix Solve Using Q-less QR Decomposition blocks accept matrix A row-by-row and matrix B as a single vector. After accepting the first valid pair of A and B matrices, the block outputs the X matrices row by row continuously. The matrix is output from the first row to the last row.
For example, assume that the input A matrix is 3-by-3. Additionally, assume that validIn asserts before ready, meaning that the upstream data source is faster than the QR decomposition.
In the timing diagram for this example, A1r1 is the first row of the first A matrix, A1r2 is the second row of the first A matrix, and so on. The diagram marks these intervals:
validIn to ready — From a successful A row input to the block being ready to accept the next row.
validOut to validOut — Because the Forward Backward Substitution block runs continuously, it generates output at a constant rate. This interval is the delay between two adjacent valid outputs.
Last row validIn to validOut — From the input of the last (mth) row to the block starting to output the solution.
This block is always ready to accept B matrices, so readyB is always asserted.
The following table provides details of the timing for the Real Burst Asynchronous Matrix Solve Using Q-less QR Decomposition block. Latency depends on the size of matrix A and the data types of the A and B matrices. In the table:
n is the number of columns in matrix A.
wl represents the word length of the input data in matrix A.
Input Data Type | validIn to ready (cycles) | validOut to validOut (cycles) | Last Row validIn to validOut (cycles) |
---|---|---|---|
Fixed point fi | (wl + 5)*n + 2 + (n + 1) | 4*n^2 + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) | 4*n^2 + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) + (wl + 5)*n + n |
Scaled double fi | (wl + 5)*n + 2 + (n + 1) | 4*n^2 + 23*n + 5 + 2*n*wl | 4*n^2 + 24*n + 5 + 2*n*wl + (wl + 5)*n |
double | 59*n + 3 | 4*n^2 + 21*n + 5 | 4*n^2 + 80*n + 5 |
single | 30*n + 3 | 4*n^2 + 21*n + 5 | 4*n^2 + 51*n + 5 |
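The table formulas can be evaluated directly. The Python sketch below encodes them in a hypothetical function named burst_async_timing, with string labels for the data types. It reimplements nextpow2 for positive integers (the smallest p with 2^p ≥ wl), assuming that matches MATLAB's nextpow2 for these inputs.

```python
def nextpow2(k):
    """Smallest integer p with 2**p >= k, for positive integers k."""
    return (k - 1).bit_length()


def burst_async_timing(n, data_type, wl=None):
    """Cycle counts from the timing table.

    Returns (validIn to ready, validOut to validOut, last row validIn to validOut).
    wl is the word length of A and is required for the fixed-point rows.
    """
    if data_type == "fixed":
        to_ready = (wl + 5) * n + 2 + (n + 1)
        out_to_out = 4 * n**2 + 25 * n + 5 + 2 * n * wl + 2 * n * nextpow2(wl)
        last_row = out_to_out + (wl + 5) * n + n
    elif data_type == "scaled double":
        to_ready = (wl + 5) * n + 2 + (n + 1)
        out_to_out = 4 * n**2 + 23 * n + 5 + 2 * n * wl
        last_row = 4 * n**2 + 24 * n + 5 + 2 * n * wl + (wl + 5) * n
    elif data_type == "double":
        to_ready, out_to_out, last_row = 59 * n + 3, 4 * n**2 + 21 * n + 5, 4 * n**2 + 80 * n + 5
    elif data_type == "single":
        to_ready, out_to_out, last_row = 30 * n + 3, 4 * n**2 + 21 * n + 5, 4 * n**2 + 51 * n + 5
    else:
        raise ValueError(f"unknown data type: {data_type}")
    return to_ready, out_to_out, last_row


print(burst_async_timing(16, "fixed", wl=16))  # (355, 2069, 2421)
print(burst_async_timing(16, "double"))        # (947, 1365, 2309)
```

For n = 16 and a 16-bit fixed-point input, as in the Hardware Resource Utilization configuration, 2069 cycles at a 250 MHz clock (4 ns per cycle) would be about 8.3 μs between consecutive solutions.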
Hardware Resource Utilization
This block supports HDL code generation using the Simulink® HDL Workflow Advisor. For examples, see HDL Code Generation and FPGA Synthesis from Simulink Model (HDL Coder) and Implement Digital Downconverter for FPGA (DSP HDL Toolbox).
This example data was generated by synthesizing the block on a Xilinx® Zynq® UltraScale+™ RFSoC ZCU111 evaluation board. The synthesis tool was Vivado® v.2020.2 (win64).
The following parameters were used for synthesis.
Block parameters: m = 16, n = 16, p = 1
Matrix A dimension: 16-by-16
Matrix B dimension: 16-by-1
Input data type: sfix16_En14
Target frequency: 250 MHz
The following tables show the post place-and-route resource utilization results and timing summary, respectively.
Resource | Usage | Available | Utilization (%) |
---|---|---|---|
CLB LUTs | 16131 | 425280 | 3.79 |
CLB Registers | 21469 | 850560 | 2.52 |
DSPs | 4 | 4272 | 0.09 |
Block RAM Tile | 0 | 1080 | 0.00 |
URAM | 0 | 80 | 0.00 |
Timing | Value |
---|---|
Requirement | 4 ns |
Data Path Delay | 3.544 ns |
Slack | 0.437 ns |
Clock Frequency | 280.66 MHz |
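The percentages and clock frequency in these tables can be cross-checked with simple arithmetic. The Python sketch below assumes that utilization is usage divided by availability. It also assumes that the reported clock frequency is 1/(requirement − slack). Under those assumptions it reproduces 3.79 %, 2.52 %, 0.09 %, and 280.66 MHz. The helper names are hypothetical.

```python
def utilization_pct(used, available):
    """Percentage of a device resource consumed."""
    return 100.0 * used / available


def implied_fmax_mhz(requirement_ns, slack_ns):
    """Highest clock implied by a period constraint and its worst slack."""
    return 1e3 / (requirement_ns - slack_ns)


for name, used, avail in [("CLB LUTs", 16131, 425280),
                          ("CLB Registers", 21469, 850560),
                          ("DSPs", 4, 4272)]:
    print(f"{name}: {utilization_pct(used, avail):.2f} %")
print(f"{implied_fmax_mhz(4.0, 0.437):.2f} MHz")
```

Here 4 − 0.437 = 3.563 ns, slightly more than the 3.544 ns data path delay. The reported frequency therefore appears to be derived from the slack rather than from the data path delay alone.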
References
[1] "AMBA AXI and ACE Protocol Specification Version E." https://developer.arm.com/documentation/ihi0022/e/AMBA-AXI3-and-AXI4-Protocol-Specification/Single-Interface-Requirements/Basic-read-and-write-transactions/Handshake-process
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
Slope-bias representation is not supported for fixed-point data types.
HDL Code Generation
Generate VHDL, Verilog and SystemVerilog code for FPGA and ASIC designs using HDL Coder™.
HDL Coder™ provides additional configuration options that affect HDL implementation and synthesized logic.
This block has one default HDL architecture.
General | Description |
---|---|
ConstrainedOutputPipeline | Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is 0. |
InputPipeline | Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is 0. |
OutputPipeline | Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is 0. |
Supports fixed-point data types only.
Fixed-Point Conversion
Design and simulate fixed-point systems using Fixed-Point Designer™.
Version History
Introduced in R2022b