Hardware-Efficient Linear Algebra for Radar and 5G - MATLAB & Simulink

    Hardware-Efficient Linear Algebra for Radar and 5G

    Overview

    This seminar demonstrates how to design and implement the minimum-variance distortionless-response (MVDR) adaptive beamforming algorithm on the Xilinx® Zynq UltraScale+™ RFSoC platform. Following a Model-Based Design strategy, we show how the MVDR beamformer can be expressed in MATLAB, Simulink, fixed point, and HDL. We further demonstrate a prototyping workflow by deploying the beamformer to the ZCU111 RFSoC evaluation board for run-time testing, debugging, and visualization.

    Highlights

    • Live demo – MVDR beamformer running on the ZCU111 RFSoC board
    • Theory – QR vs. Cholesky Matrix Decomposition
    • Implementation – Latency/area tradeoffs

    About the Presenters

    Tom Mealey is a Senior Application Engineer at MathWorks. He supports customers in adopting Model-Based Design to accelerate the design and deployment of complex control and algorithmic applications including communications, radar, computer vision, audio, and deep learning. He is an expert on the design and implementation of applications targeting the Xilinx Zynq platform and other SoC devices. Prior to working at MathWorks, Tom worked at the Air Force Research Laboratory’s Sensors Directorate, where he was involved in projects implementing digital signal processing on FPGAs and SoCs. Tom earned his B.S. in Computer Engineering from the University of Notre Dame and his M.S. in Electrical Engineering from the University of Dayton.

    Tom Bryan has been a developer at MathWorks since 1997, where he created Fixed-Point Toolbox. He worked at Raytheon Electromagnetics Systems Division in Goleta, California from 1984 to 1996, where he worked on radar applications, and received a PhD in Electrical and Computer Engineering at the University of California, Santa Barbara on a Raytheon Miccioli Scholarship. Tom has been awarded 10 patents and is the inventor of the fi object, cast(a,'like',b), and the datatype visualization features that are used in this workshop.

    Recorded: 22 Jun 2021

    Hello, welcome to this webinar, Hardware-Efficient Linear Algebra for Radar and 5G. My name is Tom Mealey. I'm an application engineer at MathWorks, and I'm joined today by my colleague Tom Bryan from development. Today we're going to cover how to implement complex algorithms that require linear algebra on FPGA or ASIC hardware, using HDL Coder and Fixed-Point Designer.

    So just briefly, an agenda of what we're going to talk about today. We'll do a quick overview of some of the applications that require linear algebra: radar, comms, and wireless. Then we're going to show a hardware demo for RFSoC, implementing a beamforming algorithm. Then Tom Bryan is going to cover a bit of the theory and implementation behind this beamforming algorithm, MVDR, and how we implemented it for FPGA.

    And then, finally, I'm going to go into a deep dive of the Simulink models that actually show the HDL coder implementation. We'll talk about the final resource mapping and utilization for the RFSoC demo. So the algorithm specifically that we're talking about today is an adaptive beamformer. And so what separates adaptive beamforming from traditional fixed-weight beamforming is that the algorithm considers the received data in real time and selects the optimal weights, based on receive statistics.

    And so what it does is place nulls in the beam pattern automatically at the interference angles, wherever we're receiving energy not in the direction of our steering angle. And so some of the applications that we see this in would be, of course, radar, right? You are focusing on one target and you have interference coming from another direction. You want to be able to suppress that interference automatically, and increase your angular resolution on the target in doing so.

    In the wireless space, beamforming is actually part of the 5G spec. So being able to beamform and communicate with a multitude of users ultimately improves the throughput and coverage of your system. So now I'm going to dive into the hardware demo here that I have for an RFSoC board.

    And specifically this is the ZCU111 Gen 1 eval kit. In the system that I'm showing today, we are transmitting a signal of interest, which is going to be a QPSK signal. I'm also transmitting an interferer, and I'm applying steering weights to both of those signals, then summing them and transmitting them out of four channels. I loop that back on my board into four receive channels, and then I apply my MVDR adaptive beamformer on the receive channels to recover my signal of interest and null out the interferer.

    And the way that I'm going to show this running is directly in MATLAB. So I have a little MATLAB app we're going to see in a second. And I'm using a feature of HDL Coder called FPGA I/O to communicate with the hardware and configure registers on the device.

    So here I'm showing the adaptive beamformer running live on my RFSoC hardware. All right, so what we're looking at here, this thick blue line is the beam pattern produced by the adaptive beamformer, the MVDR beamformer. And this dashed line is the pattern that would be produced by a fixed beamformer.

    All right, so we see here that we've got a null right in the direction of the interferer. And my signal right here at about negative 45 degrees azimuth lines up with my steering angle, and so we get about 0 dB of gain in that direction. All right? So let's see what happens now if we steer the interferer to the middle of one of these side lobes. I'm going to try and go at about 30 degrees here.

    So normally, with a fixed weight beamformer, right, we'd be getting about negative 6 dB off from the main lobe here. But here we're seeing the effect of the adaptive beamformer, and the power of the algorithm, right? We can automatically place a null in the direction of the interference, right, where the gain in our target direction is still maximized.

    OK, and so now I'm going to go ahead and steer my beamformer here to, let's see, about 20 degrees. All right, so we see what happens now is both the interferer and my signal are nulled out, right? So now we see the constellation diagram from my QPSK signal breaks, right?

    So then, when I steer this back so the signal lines up with the steering angle of the beamformer, we see that the constellation diagram gets recovered, all right? So there we go. Again, we have maximized the response in our target direction and automatically placed a null in the direction of this interferer, right? Now I'm going to pass it to Tom Bryan. He's going to talk about the theory behind the adaptive beamformer, some of the linear algebra involved, and how we can implement it for FPGA.

    Hi, I'm Tom Bryan. I've been a developer at MathWorks for 24 years. And I'll be talking about the algorithm and implementation of the beamformer. The beamformer that we implemented today is the one described in this 1996 paper by Charles Rader. A simple setup for the beamformer problem is to use a linear array with equally spaced antenna elements. If a sinusoidal signal impinges on the antenna array, then the signal arrives at each antenna element offset in phase.

    The phase offset is described by these formulas. The measurements are made in phase and quadrature to make a complex signal. The in-phase measurement is the real part, and the quadrature, or 90-degree phase offset, measurement is the imaginary part. To attenuate noise, we may take many more samples than there are antenna elements. The measurements form a column vector, and the complex conjugate transposes of the measurement vectors form the rows of the measurement matrix A.

    Complex conjugate transpose is also called Hermitian transpose, after Charles Hermite, a French mathematician who lived in the 1800s. That is why the letter H is used for that transpose. In MATLAB, the single quote is the Hermitian transpose for complex matrices and the transpose for real matrices. The notation we will use today: d is the element spacing, theta is the direction of arrival, lambda is the wavelength, and phi is the random phase of the source signal.
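    The slide formulas themselves aren't captured in the transcript, but with this notation the phase offset at element n of a uniform linear array is the standard expression:

        phase(n) = 2*pi*n*d*sin(theta)/lambda + phi,   for n = 0, 1, ..., N-1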

    Often you see short, fat measurement matrices, organized so each column is one sample reading. To unify with standard linear algebra notation for the least squares problems, we transpose the data matrix, so that it is tall and skinny. Each row is one sample reading. There are many more rows than columns. In other words, there are many more readings than there are antenna elements.

    A transpose A is the covariance matrix. For a uniformly spaced linear array, this is the steering vector: a vector of complex exponentials pointing in the direction of the angle of interest. Phased Array System Toolbox has a steering vector function, steervec, that computes this for different configurations of antenna arrays.
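    As a concrete sketch for the uniform linear array case (the element count, spacing, and angle here are illustrative values, not from the demo):

        % Steering vector for an N-element uniform linear array (a sketch)
        N = 4;                 % number of antenna elements
        d = 0.5;               % element spacing in wavelengths (d/lambda)
        theta = -45;           % steering angle in degrees
        b = exp(1j*2*pi*d*(0:N-1).'*sind(theta));   % complex exponentials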

    Given the measurement matrix A and steering vector b, you can compute the minimum variance distortionless response beamformer with these formulas. Solve for x in the covariance matrix formula, then compute the weight vector by normalizing the solution x with the inner product of the steering vector b and x. The MVDR response is the inner product of the weight vector and a new reading from the antenna array.

    We have working examples that Tom Mealey and I will show you in a few minutes. This beamformer passes signals in the direction of the steering vector and attenuates signals from all other directions. You can compute the MVDR beamformer in MATLAB with these three lines of code. Form the covariance matrix A transpose A, and use the backslash or left matrix divide to solve for x.

    The next two steps are simple inner products and a divide. Another way to solve for x is to compute the upper triangular economy size QR factor of A, and forward and backward substitute. I'll show you why it's sometimes called Cholesky in a couple of slides. This matrix solution is the hard part in hardware.
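    As a sketch of the code being described (the data, steering vector, and new reading here are stand-ins following the notation above):

        % Stand-in data: A is the tall measurement matrix, b the steering
        % vector, u a new array reading (illustrative values)
        A = randn(100,4) + 1j*randn(100,4);
        b = exp(1j*pi*(0:3).'*sind(-45));
        u = A(end,:).';

        x = (A'*A)\b;       % solve the covariance equation with backslash
        w = x/(b'*x);       % normalize by the inner product of b and x
        y = w'*u;           % MVDR response to the new reading

        % Equivalent solve via the economy-size qless QR of A:
        [~,R] = qr(A,0);    % R is upper triangular; Q is not needed
        x2 = R\(R'\b);      % forward substitution, then backward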

    When solving using the covariance matrix, resist the urge to use the inverse, even though that's what you always see in formulas and books. With built-in floating point types, always use backslash in MATLAB, which is smart about recognizing the form of the matrix and using the most efficient and numerically robust method. Using inverse is the equivalent of computing the reciprocal and multiplying for scalar problems.

    Here are three examples of the problems with inverse. First, it introduces unnecessary round-off errors. In solving 3x equals 21, you can divide 21 by 3 and get 7 exactly. If you instead compute the inverse, 0.333, then you would need an infinite number of digits to get an exact result. Wherever you cut it off, you get unnecessary round-off errors.
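    In MATLAB terms (values taken from the slide):

        x1 = 3\21       % left divide: exactly 7
        x2 = 0.333*21   % via a truncated inverse: 6.993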

    You can see that this is not 7. Another problem: it can introduce huge differences in dynamic range. In the next two problems, the answer is 2, and the left and right hand sides have similar dynamic range. If you compute the inverse of a big number, then it will be a small number, and it may underflow. And even if it doesn't underflow, you still have dynamic range issues in fixed point.

    If you compute the inverse of a small number, then it gets big. It may overflow. And even if it doesn't overflow, you still have dynamic range issues in fixed point. In solving a matrix equation with a symmetric positive definite matrix, which A transpose A is, MATLAB will recognize this and compute the Cholesky factorization. The Cholesky factorization of A transpose A is an upper triangular matrix R, where R transpose R is equal to A transpose A.

    So to solve the system of equations A transpose A x equals b, you can compute the Cholesky factorization of A transpose A. Then the solution is computed with forward and backward substitution. It's called backward substitution because R is upper triangular, and you start dividing from the lower right corner and substituting up, like you did in high school after zeroing out the sub-diagonal elements of a system of equations.
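    A minimal sketch of backward substitution for an upper-triangular system R*x = c (R and c here are stand-ins; forward substitution is the same loop run from the other corner):

        R = triu(magic(4)) + 4*eye(4);   % stand-in upper-triangular matrix
        c = ones(4,1);
        n = size(R,1);
        x = zeros(n,1);
        for i = n:-1:1
            % subtract the already-solved unknowns, then divide by the pivot
            x(i) = (c(i) - R(i,i+1:n)*x(i+1:n)) / R(i,i);
        end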

    It's called forward substitution, because R transpose is lower triangular, and you start dividing from the upper left corner. When solving with the covariance matrix A transpose A, then you can use this identity. The R from the Cholesky factor of A transpose A is equal to the R from the QR factorization of A alone. R equals the Cholesky factor of A transpose A means that A transpose A equals R transpose R.

    And QR equal to the QR factorization of A means that Q times R equals A, and so A transpose A is equal to R transpose Q transpose QR. Q is orthogonal, and so Q transpose Q is the identity. And so it is absorbed in A transpose A equals R transpose R. And so you can see that they are the same, theoretically.

    But there will be numerical differences using a finite computer, which they all are. Computing A transpose A squares the condition number of A. Squaring makes big numbers get bigger and small numbers get smaller. But Tom Mealey was able to get faster updates in hardware using A transpose A in the example we will show you in a few minutes. So more efficient hardware made it worth doing.

    Computing R directly from A has the advantage that it's better conditioned. You can see from this example that QR and Cholesky are identical up to four decimal places. We're using the fixed-point qless QR function here. We also have fixed-point functions for computing Q if you need it. In this problem, we don't need the orthogonal factor Q.
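    A quick way to see the identity numerically, as a sketch with random real data (QR's rows can differ in sign from Cholesky's, so the diagonal is normalized to be positive before comparing):

        A = randn(100,4);                  % tall-and-skinny stand-in data
        Rc = chol(A'*A);                   % Cholesky factor of the covariance
        [~,Rq] = qr(A,0);                  % economy-size qless QR of A alone
        Rq = diag(sign(diag(Rq)))*Rq;      % fix the sign ambiguity in the rows
        max(abs(Rc(:) - Rq(:)))            % tiny, but not exactly zero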

    So, to put it all together, here's the covariance matrix equation to solve. In MATLAB you can do the QR decomposition. The tilde means that you don't use the Q output; you just use the upper triangular factor R. The 0 means that it's the economy-size QR, so R is square, n by n. Then forward and backward solve for x, using R.

    In fixed point, you can use the qless QR matrix solve function, which puts it all together. This function supports both fixed point and floating point, if you want to switch back and forth while designing a system. And it supports fixed-point C code generation.
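    A usage sketch: the function being described ships in Fixed-Point Designer as fixed.qlessQRMatrixSolve, but the exact signature shown here, with an output-type argument, is my assumption, so check the doc page for your release:

        % Sketch: one-call qless QR solve of (A'*A)*x = b, with A and b
        % as in the earlier example (output-type argument is assumed)
        T = numerictype(1, 24, 20);            % example fixed-point type
        x = fixed.qlessQRMatrixSolve(A, b, T);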

    This is the equivalent Simulink block for the equations that we just saw. The simulation behavior is bit faithful to the generated HDL running on hardware. Bit faithful is why I named the fixed-point type in MATLAB fi: it's the first two letters of fixed point, after the notation used by J.H. Wilkinson.

    J.H. Wilkinson was the founding father of numerical computing. The name is also after the Marine Corps motto, Semper Fi, which is Latin for always faithful. This block runs in Simulink with the same latency as the generated HDL running on hardware. The block has valid in, valid out, and ready lines for hardware.

    In applications like a beamformer, you can continuously stream data in as it arrives from the sensor array, without waiting for a certain number of rows. For computing an answer, each row is factored in as it arrives at this block. So that the magnitude doesn't grow without bound, a forgetting factor is applied after each factorization.

    So old data has less and less effect as new data arrives. The qless QR factorization with forgetting factor is used inside this matrix solve block. For fixed-point C code generation and system design, we have MATLAB functions that are equivalent to the blocks: solving a matrix solve problem with QR, updating the QR, and computing the QR directly.
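    Conceptually, the streaming update behaves like this sketch, in double precision, with MATLAB's built-in qr standing in for the CORDIC-based hardware:

        % Streaming qless QR with forgetting factor (conceptual sketch)
        A = randn(300,4);             % rows arrive one at a time
        n = size(A,2);
        alpha = 0.99;                 % forgetting factor, 0 < alpha < 1
        R = zeros(n);                 % running upper-triangular factor
        for k = 1:size(A,1)
            % scale down the old factor, then fold in the new row
            [~,R] = qr([alpha*R; A(k,:)], 0);
        end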

    Today we just talked about one of the linear system solvers. We have a suite of them for different applications. They're in the Simulink Library Browser and Quick Insert. In a Simulink model, if you start typing the name of one of the blocks, Quick Insert gives you a selection of blocks to insert. These are the matrix factorizations that are used in the linear system solvers. They are also in the Simulink Library Browser and Quick Insert.

    You can find articles that say that it's not possible to solve linear systems of equations with fixed point. So that got me wondering why so many people have been doing it successfully for so many years. Charles Rader demonstrated it in 1996 in the paper we showed at the beginning of my presentation.

    So that got me thinking about why it works. Jenna Warren and I found methods to determine fixed-point types that can be proven to have a given precision and a very high probability that they won't overflow. We filed a patent for it, wrote functions that do it, and published our methods in the MathWorks documentation. These functions are new in R2021b, which came out in prerelease this month, June 2021.

    So now let's show it in action, in an example in the documentation. You can find this example by searching the documentation for MVDR and then selecting the first example with fixed point in the title. All of this is what we've been talking about today.

    When you run the Simulink model in the example, you can see this: when the signal of interest, indicated by this green line, is in the same direction as the steering angle, indicated by the blue line, they overlay each other. Then you have a gain of 0 dB in that direction. We have interfering noise signals coming from the directions indicated by these red lines here.

    And you can see that they're nulled. We move the noise signal around, and we can see that, as soon as the system settles, that noise signal is again attenuated. It's very much like you're at a party: you're facing someone, your two ears are your sensor array, and the person we're interested in walks away.

    Here, the green line is that person walking away from the direction we're facing. So we're still listening in this direction, but now we're listening to basically background noise. It still has a gain of 0 dB.

    But the signal that used to be of interest to us has walked away. It's no longer in the direction that we're facing. And it's being considered noise at this point, and it's being completely nulled. So let's dive down into this model and look at how it's constructed.

    These blocks form the equations that we've just been looking at. This is the matrix solve; the weight update and the system response are just inner products. Those are fairly easy to implement, so we won't look at them. What we are interested in is how this matrix solve is done. We've implemented it as a white box, in that you can see into how we've implemented it.

    If you click on this little down arrow here, Look Inside Mask (we're using the shortcut key for that), you can see that it's implemented just like we've shown, with a QR decomposition on this side. It's a complex partial-systolic implementation with the forgetting factor, factoring in each row of A as it arrives. Then we do the forward and backward substitution over here.

    We can then dive down in here and see further how it's implemented. The computation to zero out below the diagonal has been pipelined inside this for-each block here. We want to keep drilling down at how it's implemented.

    It was implemented with CORDIC rotations. And if you look down inside that, you can see that the Simulink model has delays all along, so between each pair of delays we have a minimal amount of computation going on. In that way, we've been able to make it run at a very high clock speed in the hardware. If we keep drilling down into that, we can see here the MATLAB code that was used.

    And this is inside a MATLAB function block. This code we ship. You can drill down and look into it, and maybe borrow some of these techniques and use them in your own designs, if we don't ship exactly what you need.

    So we wanted to look at the actual thing that does the math here in this QR rotation. This is the basic processing element for all of this. We're zeroing out the sub-diagonal elements using CORDIC rotations, if you're familiar with those. It's just made up of shifts, and adds or subtracts, depending on which way we want to rotate.

    And so with this we're able to both reduce the complex values to real and zero out everything underneath the diagonal. CORDIC is a way to do Givens rotations, but there are no sines and cosines being computed to do them, which makes it more efficient in hardware.

    And we're not using any divisions to zero things out beneath the diagonal. So now I'm going to pass it back over to Tom Mealey, who's going to continue the talk with the HDL Coder implementation, with a communications example.

    All right. Thank you, Tom. Now for this last section, I'm going to do a little bit of a deep dive into the HDL coder implementation of the MVDR beamformer shown running on my RFSoC board. And we're going to walk through the Simulink model, point out a couple of details there, and then finally cover the resource mapping and utilization for that design.

    Now, if you are an algorithm engineer, or an FPGA engineer yourself, you know that implementing complex digital signal processing on FPGA is not an easy task, a lot harder than writing software, usually. Some of the challenges traditionally associated with that: fixed-point math, right? How do you take the floating-point algorithm and convert it to fixed point? Making trade-offs between performance and area, getting the design to fit on the chip while also meeting your performance requirements. Dealing with data rate versus clock rate, having to overclock your design potentially in order to save resources. And then, finally, dealing with your project timeline and schedule, making sure that, for these complex projects and designs, you're able to get something working, running, and verified, all within a usually tight timeline. Our answer to some of these challenges here at MathWorks is a workflow based on HDL Coder and Fixed-Point Designer.

    So we like to teach customers who are new to the tools and the workflow this five-step process. We start with MATLAB reference code for the algorithm you want to implement. Then we convert that to a Simulink design that reflects the underlying hardware architecture. We convert the design from floating point to fixed point, and then generate HDL code, sometimes making optimizations in order to get the design to use the right amount of resources. Then, finally, verification, and deploying and targeting your FPGA hardware. Throughout this process, we use the tools we have to do verification at every step of the way and maintain traceability between each iteration of the design. All right.

    So when I set out to develop this beamforming demo, I started with the process I just showed. I took some MATLAB code, right? So this code that I'm showing here implements the MVDR beamformer. I actually just looked up the MATLAB source code and found something like this.

    So we can see it's actually really simple, only four lines of code here. We're actually hiding some of the complexity, in that these variables here are vectors and matrices. But I'll get into that in a second.
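    The slide itself isn't captured in the transcript, but based on the line-by-line walkthrough that follows, the four lines look something like this reconstruction (variable names are illustrative):

        % Reconstruction of the four MVDR lines walked through below
        % (X is a block of received samples, b the steering vector,
        %  u a current sample vector -- all stand-ins)
        X = randn(1024,4) + 1j*randn(1024,4);
        b = exp(1j*pi*(0:3).'*sind(-45));
        u = X(end,:).';

        Ecov = (X'*X)/1024;   % line 1: form and average the covariance
        x = Ecov\b;           % line 2: matrix left divide -- the hard part
        w = x/(b'*x);         % line 3: matrix-vector multiply, scalar divide
        y = w'*u;             % line 4: apply the weights to the samples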

    And so, once I had this MATLAB code, I began to try to elaborate and understand really what was going on here. So we look at that first line. This is actually just a matrix-vector multiplication. And so in terms of hardware implementation, mapping that to an FPGA, that's just multiplies, adds, and taking the absolute value, right? That's really simple. So that's not too bad there.

    I'll skip the second line and go to this third line here. This is actually very similar: a matrix-vector multiply and a scalar divide. So here we have, again, multiply, add, absolute value. And then divide, traditionally, is kind of a scary thing for FPGA. But HDL Coder, I know, provides a reciprocal block, so in terms of implementation, that's pretty straightforward, right?

    So then the last line here, same thing, really just matrix-vector multiplication, all pretty straightforward. Now we get to the second line here, and this operator: this is our matrix division, right? This is a challenging task, traditionally not something that you can just whip up in an afternoon, let's say. But, thanks to a new library that we introduced in release R2020b, we actually now have a block that implements the matrix solve, in a variety of different ways, as Tom covered.

    And so, you know, just like these other blocks that I'm showing here, adds, multiplies, reciprocals even, we now have a block that implements this matrix left divide, matrix solver, that now I can just handle this with essentially a library block. So in terms of implementation, this was a huge timesaver, on my end. All right, so now I'm going to go into the model itself.

    And we're going to look at how I wired up some of those blocks that I just showed. So I have here my top-level model, which implements a test bench for my MVDR HDL block. All right? So I'm going to go ahead and run this. The way I have it set up is: I have a primary signal, which is QPSK, and an interfering signal, which go through a four-channel receiver.

    And now I have here three separate paths to look at the output. In one case, I'm just summing my four received channels. And so what we get here is basically the spectra superimposed on top of each other. So we have our interfering signal producing this kind of hump in the middle of the spectrum.

    And if we tried to decode the QPSK primary signal, it's not going to work, right? So we see here, in the constellation diagram, that's completely broken. Now in the second path, I'm implementing the beamformer here in MATLAB code. So this represents my reference algorithm, right?

    And you see here, this is what we expect. So we just get the primary signal, no interference, and our constellation diagram looks great. Now this third path is the HDL compatible version of the beamformer. So we see just visually, comparing these two, the spectrum looks the same. And our constellation diagram looks great, all right?

    So this is just a nice way to construct your Simulink model to be able to compare the HDL compatible subsystem with what we call the golden reference, or your floating point algorithm code. So this is just a MATLAB function block that has those exact same lines of code that I showed before. All right, let's dive now into the HDL subsystem.

    This is what I ultimately generated VHDL code for and mapped to my RFSoC board in that demo we saw before. At the top level, you can see that I've organized the model into five major subsystems, and these map up pretty nicely to the lines of MATLAB code that I showed before. Actually, the form-covariance-matrix and moving-average subsystems together correspond to the first line, and then the other three subsystems represent the other three lines of MATLAB code, right?

    And so let's just go through this from left to right. So here, forming the covariance matrix, right? I take my vector of four samples coming in, and I multiply it by its conjugate. And what we get here is a 4 by 4 matrix coming out, right? The nice thing to point out here is that HDL Coder can handle vector and matrix signals, right?

    So what I implemented here is just a matrix multiply, or really a vector-times-vector multiply that produces a matrix. And we don't have to play any special tricks in order to generate HDL code for that. So implementing this was pretty straightforward. Now, since we're streaming here, I have to take the moving average of that covariance matrix. The way I implement that is with a pretty common little trick. My window size for the moving average is 1,024 samples, so what I'm going to do is delay my input by 1,024 samples, to the end of the window.

    And I have this one-cycle delay, which represents my accumulator. So I'm just adding my input on every clock cycle, but then I'm always subtracting out the 1,024th sample of the window. So that implements effectively a moving average, using some pretty fundamental blocks here, just delays, adds, and subtracts, right?

    And then I'm playing a little trick here to implement a gain. I set my window size so that its square root is a power of 2. And then, instead of dividing or multiplying by a reciprocal, I can just do a shift right, and that effectively implements the scaling there.
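    In code, the accumulator-and-delay-line structure described here behaves like this sketch (the input stream is a stand-in; the final divide is written out explicitly, where the hardware uses a right shift for the power-of-two scale):

        % Sliding-window average via running accumulator (conceptual sketch)
        N = 1024;                         % window length
        x = randn(1,4096);                % stand-in input stream
        buf = zeros(1,N);                 % models the 1,024-sample delay line
        acc = 0;
        y = zeros(size(x));
        for k = 1:numel(x)
            idx = mod(k-1,N) + 1;
            acc = acc + x(k) - buf(idx);  % add newest sample, subtract oldest
            buf(idx) = x(k);
            y(k) = acc/N;                 % in hardware: a right shift
        end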

    All right, so now this compute-weight-vector subsystem. This, remember, was the second line of code that I pointed out, with that matrix solve or matrix left divide. This was initially kind of a scary line to look at, but I was able to drop in this library block that just does all the work for you. So it really made my life a lot easier.

    So we have here some control logic to feed that block. If you look at the documentation, there are examples for all of these fixed-point linear algebra blocks on how to feed the data into the block. But that was it, really; this does all the work for me. You can see it's completely parameterizable. In this case, this is just a four-channel beamformer.

    But I could scale this up easily. Actually, I could map this to MATLAB variables, and then change that outside of my Simulink model, and have it update automatically. So, again, I want to stress just how valuable this library and this block is for a problem like this, just being able to drop in a matrix solver into your design and know that it's going to generate efficient HDL, something that's going to meet timing. It's a really, really powerful tool to have. All right?

    So now, moving into this block here, which represents that third line of MATLAB code, right? Again, the initially scary part of this was maybe the division, or multiply by reciprocal. Here's actually a block that was released recently as well, Normalized Reciprocal HDL Optimized. This uses CORDIC under the hood, so again it's hardware-efficient.

    And so, again, I was just able to drop this into the design. It takes a scalar input and produces a scalar output, which is the reciprocal. And then I multiply that with my vector here, to produce effectively the scalar divide, OK?

    All right, then. Finally here, my wOut here represents the weights that I've computed for the beamformer, and this last subsystem is just applying those weights to the input channels. You can see here that I've delayed my input in parallel with the rest of the processing, so that, by the time this result is ready, I'm applying it to the sample it matches up with. OK? And so this is all pretty straightforward.

    We're just multiplying this four-element vector here with the four complex weights, right? I'm actually using a little trick here to implement the complex multipliers. We're using the for-each subsystem with this pipelined multiplier. You can see here that this fully fleshes out the complex multiply, which is composed of three multiplies and five adds. All right? So I'm using this for-each subsystem to scale that across the four channels and weights here.
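    For reference, the three-multiply, five-add complex product mentioned here is the standard identity, sketched with scalar stand-in operands:

        % (a + 1j*b)*(c + 1j*d) with 3 real multiplies, 5 real adds/subtracts
        a = 0.5; b = -0.25; c = 0.75; d = 1.0;   % stand-in operands
        k1 = c*(a + b);
        k2 = a*(d - c);
        k3 = b*(c + d);
        re = k1 - k3;    % equals a*c - b*d
        im = k1 + k2;    % equals a*d + b*c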

    And then this block here is summing the output, right? We have a single scalar beam channel coming out, and we're summing these four receive channels, or rather the beam channels after applying the weights. I'll just point out here an HDL Coder optimization that I'm using, called distributed pipelining.

    And so if you go to the HDL block properties for this subsystem, we see distributed pipelining turned on, OK? What that means is, if we look inside here, I have this Sum of Elements block, which just sums these four vector elements, and I have its architecture set to Tree, OK?

    So what that means is we're going to sum elements 1 and 2, and elements 3 and 4, and then sum those results. So rather than doing that all serially, we're forming a tree structure, right? And what distributed pipelining does is take this pipeline delay and distribute it between those stages, right?

    So I don't have to actually elaborate this whole tree structure and pipeline it manually. I can use the HDL Coder distributed pipelining feature to handle that during implementation, all right? All right, now, let's take a look at the implementation results here. Again, this is for the ZU28DR device on the ZCU111 RFSoC eval board.

    So for timing, we're getting a maximum frequency of just over 450 MHz, which is pretty good. Resource usage is looking pretty good as well across the board. I especially like to point out this DSP count here: only 92 DSP slices, or 3.5% of those available on the device. That leaves you plenty of room for all the filters, FFTs, and whatever else you need in your design.

    And this beamformer could easily be integrated into a larger design without worrying about taking up too many resources or running into timing issues. If you want to try this yourself, you can download it from either File Exchange or GitHub. I provide all the MATLAB code and Simulink files that you need to build this design and, out of the box, run it on the ZCU111.

    And so if you're interested in doing that, I'd encourage you to go check it out. If you have any questions, just get in touch with your account manager, who will get you set up with our application engineering team. So I'd like to thank everybody for your time. And, like I said, if you are interested in this topic, the best way to get in touch with us is to talk to your account manager. I hope to hear from a lot of you soon.