Integrating AI into Simulink for Simulation and Deployment
Gary Matson, MBDA Systems
Nicola Easton, MBDA Systems
There has been a recent explosion of research into artificial intelligence (AI), which has demonstrated immense promise across all industries, with levels of accuracy previously unobtainable now commonplace.
Training cutting-edge AI solutions benefits from a wide variety of rapidly evolving tools, such as MATLAB®, PyTorch®, and TensorFlow™, working together. However, to exploit these AI solutions beyond research and into products, it is critical that they can be used for inference “in the wild.”
See how MBDA engineers have successfully imported externally developed AI solutions into full system Simulink® models for testing and proving, and subsequently deployed these solutions onto embedded hardware.
Published: 10 Nov 2023
I'm Gary Matson. I'm here with my colleague Nikki Easton. And we're going to be talking to you about integrating AI, or Artificial Intelligence, into Simulink for simulation and deployment.
To start with, I'll just give you a bit of an introduction to who MBDA Systems are. So we're a missile systems company. And we were formed back in 2001 from a series of mergers across Europe. So British, French, and Italian companies came together in 2001. And since then, we've also merged with German and Spanish companies.
So we are the largest European company in the missile systems sector. And we're the only company that's able to satisfy all the needs of our domestic armed forces. So we have three shareholders, who are all very well represented today in the room, from Airbus, BAE Systems, and Leonardo.
If we focus on where we are, so we have over 4,500 people based in the UK, in Bristol, Bolton, and Stevenage. And then across the rest of continental Europe, we have over 6,000 people in France, mostly around the Paris area, over 1,000 in Germany, around Munich, nearly 2,000 in Italy, with then a small complement in Spain, Poland, and even over to the US. In total, we have over 14,000 people. And 60% of these are in technical or engineering functions.
So now focusing on us: we're based at our Filton office, which is just north of Bristol. And we're part of the UK Image Processing team. So our team are responsible for developing computer vision algorithms to provide our systems with situational awareness in order to enable us to navigate across terrain and ultimately to precisely guide the missiles to the target. So I'm Gary. I'm a technical expert. And as Chris said, I've been working at MBDA for over 15 years now. And--
I'm Nikki. I joined MBDA about four and a half years ago. And I'm a senior engineer in image processing.
So within MBDA in the UK, we've got a very well-established process of using MATLAB and Simulink for development and deployment of our algorithms. We've been doing it for well over a decade now, where we take a problem definition, identify some data that we're going to use to develop and test that algorithm, and then come to, normally, MATLAB, where we'll experiment and build up the basic algorithm design.
Then as that matures, we'll move it across into Simulink for modeling and testing and integration with all our other components, where we do our whole system testing. And then ultimately, we generate autocode, using Embedded Coder, for deployment on embedded processors.
So this technique has huge merit compared to the legacy techniques, in that we have a common model and embedded target processor code using that model-based design that we've heard a lot about this morning. And this reduces time and costs and increases robustness, because we don't have to cross-validate between a variety of different sources of our software.
So this topic was presented back in 2015 at MATLAB EXPO, specifically on accelerating FASGW/ANL image processing with model-based design. And purely because we are a missile systems company and there were no other pictures of missiles, there's a picture of FASGW.
So as I said, this is very well established. We've been using it for well over a decade. But we've had a bit of a curveball over recent years in the exploitation of AI. So I'm sure I don't need to tell you that there's been an explosion of AI research across everything. So certainly, within MBDA, we've done a lot of research. Within the wider defense industry, academia, it's been shown that AI can give us performance that we wouldn't have thought was achievable even just 10 years ago.
So some examples of where we've used AI-- on classification and detection of objects within images, pose estimation, tracking objects through very complex scenes, feature matching-- so for example, to match images between different modalities-- infrared and visible band. And this is just scratching the surface of what we've looked at.
But these algorithms are developed in a different way. They have a very different need. And so the challenge is, how can we exploit these? How can we take that pipeline that we've used for well over a decade and get these algorithms integrated onto software-- onto hardware?
So what does exploitation of AI mean to us? It essentially means we need the pipeline that you saw on the previous slide, but for AI now. And when we started researching this explosion of AI, we started looking at these algorithms in PyTorch. So we have a pretty well-developed pipeline of training and designing algorithms through use of data with PyTorch.
The question is, how do we go from our designed algorithms in PyTorch into the simulation and testing environment that we use and love so much, and then to deployment onto suitable hardware for AI? And what does suitable hardware even mean these days? So let's jump straight in, looking at getting it into the simulation environment.
Well, back when we started looking at this in 2019, the industry standard for this was a tool called ONNX. So ONNX stands for Open Neural Network Exchange. It's essentially a file format that lets you pass from lots of different products to lots of different products. In this particular case, it takes us from PyTorch and into MATLAB and Simulink.
It's important to note that you can now do this directly since R2022b. But I've kept ONNX up there for another reason, and I'll talk about that later. But it's still a useful pipeline.
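As a rough sketch of that import step in MATLAB-- the file names here are just placeholders, and the exact call depends on the release and support packages installed:

net = importONNXNetwork('detector.onnx');              % ONNX route (placeholder file name)
% From R2022b onwards, a traced PyTorch model can be imported directly instead:
net = importNetworkFromPyTorch('detector_traced.pt');  % may need an input layer adding before use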
So what do we do once we get this ONNX network that we've trained into MATLAB and Simulink? Well, actually you can interrogate it and look at it using a really cool tool called Deep Network Designer. And what this means is you can open your network up, look at all the layers and how they're put together, and validate that what you've put into Simulink is exactly what you think you put into Simulink. You can also do things like play around with the layers and take away dependencies you might have had in the import process using ONNX.
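As a minimal sketch, assuming net is the network imported in the previous step:

analyzeNetwork(net)       % summarise every layer and flag any issues from the import
deepNetworkDesigner(net)  % open the network in Deep Network Designer to inspect and edit layers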
And then so we've got it into MATLAB. And this is really useful. You can do lots of validation and testing here. But really, the important place to get it to for testing is Simulink. And this is really easy. So usually, I'd point to the screen. But you can see this AI network on the block. And that's the Predict block. And all you do is drag and drop this MATLAB network into this Predict block.
And what this means is we can now look at AI in a wider system model with all of your traditional algorithms going on in tandem and have the same robustness and testing properties that Gary mentioned earlier and all of our previous speakers have gone on about in Simulink. So to give you an example of this, we could have traditional detection passing into an AI classification and passing back out into postprocessing, running in a full system.
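One way this can be wired up, as a sketch with a placeholder file name, is to save the imported network to a MAT-file and point the Predict block at it:

save('classifierNet.mat', 'net');   % placeholder MAT-file name
% In the Predict block dialog, select "Network from MAT-file" and browse to classifierNet.mat.
% The block then runs inference on whatever signal is wired into it -- for example, image chips
% from a traditional detection front end feeding the AI classification stage described above.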
So basically, we've got this system that works really well, passing it into our modeling and testing environment. And we're pretty happy with that. But how do we go to the next step? Well, we need to go to the deployment. And really, we need to put it on hardware to even consider putting it in our products and utilizing the explosion of AI.
But the problem with AI is that it comes with a huge number of matrix operations. And in particular, CNNs, for our vision and image processing case, come with a huge number of convolutions. And these come with a huge number of matrix multiplications.
And what you'll note is that these aren't suitable for the regular CPUs we've run our traditional algorithms on, simply because of this huge number of operations. To give you an idea of the scale of this problem, ResNet-18-- that is quite a small CNN-- has about 11 million parameters. If you jump up to VGG-16, that quickly becomes roughly 10 times more parameters.
And you go to GPT-3, which is the backbone of ChatGPT, which admittedly isn't a vision network, but everyone knows what I'm talking about. It's at least 170 billion parameters. So AI scales up massively. And we need hardware to deal with these problems that we haven't faced before with more traditional algorithms.
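For a feel of that scale, a quick illustrative check on an imported dlnetwork is to count its learnable parameters:

numParams = sum(cellfun(@numel, net.Learnables.Value));   % net is a dlnetwork, e.g. from the import above
fprintf('Network has %.1f million learnable parameters\n', numParams/1e6);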
So what am I talking about? Well, here's some commercial examples. So there's the NVIDIA range of Jetson products, for example. So the picture you've got is an NVIDIA Jetson Nano. Other products are available, such as the Texas Instruments Jacinto 7, which we'll focus on in this presentation. But there's loads of different commercial options available. A couple are listed there.
And these can handle running loads of matrix operations really quickly, giving real-time throughput, as we'll discuss. So what have we done to try and get these algorithms onto these embedded hardware boards?
OK. So we started looking at this back in 2020. And at that point in time, Simulink didn't have support for GPU Coder. So we worked in partnership with MathWorks, and we beta tested the support for GPU Coder. And we selected NVIDIA hardware for this early demonstration purely because it was the best supported hardware to generate a benchtop demonstrator.
And what you see in the movie here is the example we chose. So we chose the MSTAR data set, which is a set of radar images of ground-based targets. And we did a classification problem to try and classify what is in those images.
And in the video, you can see that run in real time on the board, which is the small board in the bottom left. And that's been autocoded directly from Simulink. And that's it then running in real time. So this was a big success.
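Their route went through Simulink and the coder tools; as a rough MATLAB-level illustration of that kind of NVIDIA Jetson targeting with GPU Coder-- the entry-point function, image size, and build directory here are hypothetical:

% Hypothetical entry-point function wrapping the trained classifier:
%   function label = classifyChip(img)
%       net = coder.loadDeepLearningNetwork('classifierNet.mat');
%       label = predict(net, img);
%   end
cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';                 % hypothetical build directory on the board
cfg.GenerateExampleMain = 'GenerateCodeAndCompile';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg classifyChip -args {ones(224,224,1,'single')}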
So we then went to the next step: can we actually do this alongside traditional algorithms and fly it as well? So we took some different AI algorithms. We integrated them in Simulink with a traditional algorithm suite.
We autogenerated all of that code as one and put it onto the aeroplane you see in the bottom right. And we flew this around. It had another variant of the NVIDIA Jetson family on it and operated these systems in real time.
This was fantastic. We'd never done this before. And this highlighted the fact that it was a process that was possible. However, for a whole variety of different reasons, NVIDIA hardware is not suitable for use in our products. So we had to look elsewhere and see what other products were out there.
And so the product we chose for our initial experiments here was the Texas Instruments Jacinto. So this is a very, very powerful board. It's got a whole suite of features on it. The features we care about are the CPU, where it's got two ARM Cortex-A72s, and the DSP, the digital signal processor, which is a C7x, a very capable piece of silicon.
But there's a small subset of this that is an MMA-- a Matrix Multiply Accelerator. And this is the equivalent of NVIDIA's GPU. This is the bit that is designed to accelerate the matrix multiply operations that are so prevalent in CNNs, for example.
So this is a very capable processor. It's got a throughput of 8 teraoperations per second. And when you get this from TI, TI provide a whole suite of tools called the Texas Instruments Deep Learning Runtime library, or TIDL-RT, as we call it, which allow quick implementation onto the MMA. So there's a whole suite of examples, autonomous car examples, where you can deploy them onto the processor and see how powerful the chip is. And it is very, very powerful. It's very, very capable. And this allows supported networks to be run in real time.
So we've shown that we can deploy standard architectures using these TIDL-RT tools. But what about custom architectures? Well, I've shown that we can do the top route really easily into MATLAB and Simulink. But getting it onto hardware-- that step wasn't supported back in 2021, 2022, when we were looking at this.
So we tried going down the TIDL-RT route using the Texas Instruments libraries. But we hit a lot of challenges going through this process. I don't think we'd be here today if it was all plain sailing.
What are some of the challenges? Well, to start with, those standard architectures that Gary mentioned that run really quickly with these libraries use a list of supported layers. And if the particular network layer that you want isn't on this supported list, then it won't be accelerated on the MMA. It will fall back to the A72 CPUs.
What does this mean for us? To give you an example, batch norm in inference mode, if you know anything about AI, runs really quickly on the board. It will run at, say, 500 hertz for a ResNet with batch norm in it. If you suddenly take that out and put instance norm in-- which is a small change-- then, in a network with more than 12 layers in it, the whole thing will have to run on the A72 and run at 1 hertz. So we're getting a 500 times speed-up by using the MMA.
So there's a massive challenge there with us trying to implement our custom architectures and not wanting to be limited by our hardware choice. Other problems we've had are the dependencies. So for example, these libraries have a dependency on certain elements of a Linux OS. And we don't want our software to be limited by a particular operating system.
Other dependencies include hidden source code that we can't see because the software is proprietary. And in particular, the functions that accelerate onto the MMA, which we want to use, are essentially black box source code. And as soon as you say the words "black box," validation goes out the window, and you can't use them.
So we've got big problems there. And talking about validation, an extra step would be required between the MATLAB and Simulink stage and the deployment stage. And we don't want the extra overhead of having to validate that what actually came out of our complex testing matches what comes out on hardware.
So what do we actually want, looking for 2023 and the future? We want generic code generation that doesn't have any dependencies on operating systems or any black box code that we can't see. And we also don't want this extra validation step. And I really hope you know what's coming next.
And we've been collaborating with MathWorks and their deployment and consulting teams to try and enable this pipeline straight from the modeling and testing environment onto deployment. And this hopefully will solve all of these needs we have. So Gary will talk to you about the progress with these steps.
So this is live work that we're doing with MathWorks, with Steven Thomsett and his team, at the moment. So our current status is that we have successfully targeted the C7x. So there's the little schematic of what the Jacinto has, just to remind you.
And so we've successfully taken a network, used Simulink Embedded Coder to autogenerate the code, and deployed it onto the C7x. And further than that, we've succeeded in using the special instruction sets that the C7x has, things like single instruction, multiple data, which gives us about a five-times speed-up if we enable that option.
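The TI toolchain integration itself is specific to their setup, but the generic shape of that Embedded Coder step, with a hypothetical model name, is along these lines:

model = 'aiSystemModel';                           % hypothetical Simulink model name
load_system(model);
set_param(model, 'SystemTargetFile', 'ert.tlc');   % Embedded Coder target
set_param(model, 'GenerateCodeOnly', 'on');        % generate portable C/C++ only; the TI C7x toolchain then builds it
slbuild(model);                                    % autogenerate the code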
What we haven't yet achieved, and clearly is where we want to go, is we haven't yet targeted the MMA. So as I mentioned, the MMA is the dedicated piece of silicon that is there specifically for AI. So this is the step we're working to now, as to how do we push on that one step further.
Clearly, we also want to compare the runtime we can achieve with this process against what we can achieve using the TIDL-RT libraries, just to scope out what the art of the possible is. We would hope to get close to TIDL-RT, and we're willing to sacrifice some performance for the huge benefits that this process would give us.
And finally, this is just one example. Hopefully, this has highlighted the challenge of choosing some hardware other than NVIDIA. NVIDIA is very easy. And the Texas Instruments Jacinto was just one example. There are others. So Xilinx, NXP also make boards that are very, very suitable for this. How can we take that same process and autogenerate the code from Simulink to these boards? So that's all work for the future.
[APPLAUSE]