Model-Based Design for Multicore Embedded Targets - MATLAB & Simulink

    Model-Based Design for Multicore Embedded Targets

    Overview

    The design of data processing systems is suffering from a huge complexity increase in many industries. On the application side, for instance, trends such as autonomy, or the increasing challenges of guaranteeing safety and security, require huge datasets to be processed in real time or near real time. At the same time, the semiconductor industry is evolving towards the integration of heterogeneous processors into complex Systems on Chip (SoCs), typically including multicore clusters and complex interconnects. Designing and mapping complex applications onto these new embedded devices, while guaranteeing the necessary safety standards, is becoming a major bottleneck.

    In this seminar, we will learn how Model-Based Design supports aspects such as:

    • Partitioning applications into execution units
    • Simulating applications, including hardware and software effects
    • Mapping and scheduling tasks onto processors
    • Generating and testing code, and using profiling to compare performance

    About the Presenter

    Juan Valverde is the Aerospace and Defence Industry Manager for the EMEA region at MathWorks. His technical background is in the design of dependable embedded computing solutions for aerospace.

    Prior to MathWorks, Juan was a Principal Investigator for Embedded Computing at the Advanced Technology Centre for Collins Aerospace - Raytheon Technologies in Ireland. Juan has a PhD in Microelectronics and Computing Architectures from the Technical University of Madrid (Spain).

    Recorded: 8 Nov 2022

    The idea of today's seminar is to go over our proposed workflow to help you map applications onto multicore targets. But first, let's start with an introduction about what we see happening in the industry. This includes new applications, how technology is evolving, and what changing needs we are seeing from industry.

    In general, we see a huge increase in application complexity, of course. One of the main drivers for that is autonomy. We see that across all industries, from ADAS to assisted decision making in next-generation combat systems, industrial robots, et cetera. Autonomy requires huge amounts of data processing for situational awareness, for detection, et cetera.

    Another main driver of this complexity increase is distributed computing. This imposes hard requirements on communications and resource-constrained processing platforms. The end goal is having the right data at the right time in the right place. And I'm emphasizing the right data. So security is also crucial and imposes hard requirements on computing. This means zero-trust architectures, verifiability, et cetera.

    In conclusion, we have more applications where data is central, applications that are time constrained, and guaranteeing security and safety is even more important than before. Therefore, more processing power is required.

    At the same time, semiconductor companies have responded, and are responding, to this application complexity increase by offering highly integrated and heterogeneous systems on chip. The thing is that while software design methodologies are pushing for higher levels of abstraction, using service-oriented architectures, different levels of virtualization, automatic code generation, et cetera, hardware devices are evolving towards more specialized architectures, like vector processors, FPGAs, real-time cores, et cetera.

    So there seems to be a disconnect between these two trends. More complex applications require a proper handling of abstraction layers, while they also benefit from specialized processors. For that, Model-Based Design will help bridge this disconnect. Besides, most systems on chip are not designed for safety-critical applications, and this makes it very difficult to characterize their timing behavior.

    In other words, calculating worst-case execution time will be very complicated. For that reason, directly migrating from single-core to multicore is not a straightforward task. In conclusion, we need new workflows and proper abstraction management.

    So overall, what are we seeing in the industry? We see that this increase in application complexity, driven by the new capabilities I mentioned, plus the evolution of semiconductor technology, is causing a migration of complexity from systems to embedded systems. This means more integration, more teams working on the same platform, requirements for correct partitioning and isolation, and new challenges for hardware, software, and system certification. In conclusion, we need not only to improve our design methodologies, but also the way these teams collaborate.

    Today, I'm afraid we will not be able to address all the previous challenges, but we are going to focus on the following, which involve design methodologies that favor team collaboration and automation. These are: how to define and analyze software architectures, how to use different ways of partitioning application models to optimize their mapping onto multicore targets, and how to bring some hardware and software effects into our models to help us better design our applications.

    So let me show, with a very simple example, what I mean. Imagine that we have the following generic application. I need to capture some information from sensors. I need to somehow format and merge some of these data, using different algorithms. I need to process these data and use them to show information on a display, or to send command signals for actuation and control, for example, for a secondary flight control surface of an aircraft.

    Then, if I want to map these algorithms onto a multicore target, I need to perform a certain task decomposition of my application. This means partitioning my models into execution units that can be mapped onto the different processors.

    Then I can see how to map these tasks onto the different processors or cores, how I can exploit parallelism, et cetera. But for that, I need information about scheduling: not only where the tasks are executed, but when, in what order, possible offsets, et cetera. And in order to decide about scheduling and mapping, I need to have some information about timing and about the execution time of these tasks. So we can end up with a bit of a chicken-and-egg problem, because the timing behavior on multicore is highly influenced by mapping and scheduling. This is where modeling and simulating different scenarios can be very beneficial.

    But the thing is, this is not only about modeling where and how my tasks are executed; it is also about where my data goes. A data transfer means knowing where my data is coming from and where it is going to. So for that, we need some more detail.

    However, capturing all these details from the hardware platform and software abstraction layers, like operating systems, might not be the most practical thing to do at this level. Still, having some sense of how my data is transferred from one task to another, of the fact that I'm using certain peripherals, or that I'm using a DMA transfer, can be very beneficial for finding out early in the design process where, when, and how my tasks need to be executed.

    With that in mind, we are proposing the following high-level workflow. Assuming system analysis is done, and we have system requirements defined by system engineers, we can start with the high-level requirements allocated to software. Then, with these requirements, we can start defining the software architecture. The software architecture can be used for early analysis, identifying components and functions, interfaces, et cetera. And this can be done by a combination of functional experts and embedded system engineers.

    Then it is possible to go to the component implementation. This is the modeling of the different parts of my system for simulation, normally done at different levels, starting with algorithm designers and then refined in collaboration with embedded system experts. We will see how this partitioning can be done at this level. Then it is possible to merge the algorithms with the effects from hardware and software elements, using the associated blocks.

    With this, we will be able to simulate tasks, task-to-processor mapping, IPC channels, scheduling, et cetera. This will be more the responsibility of the embedded system engineer, keeping algorithm experts in the loop iteratively. Then this model can be used for code generation and deployment onto different targets, using the SoC Blockset and Embedded Coder, or directly from the component model using Embedded Coder. The products included in this workflow are System Composer, Simulink for the different parts of the modeling and simulation, including the SoC Blockset, and Embedded Coder.

    So let's see the different parts of the workflow in more detail, starting with the software architecture. We already mentioned the importance of properly capturing a software architecture. Many systems contain software compositions made of sets of software components that exchange data through sets of well-defined interfaces.

    Here, we are enabling a framework where software architectures can be quickly created, top down or bottom up, to compose embedded applications from reusable software components. These software components are normally modeled in Simulink. So now you can explicitly create a generic software architecture in System Composer, which is our primary product offering for Model-Based Systems Engineering.

    It is true that you can already model software compositions in Simulink, but moving this into an architecture framework gives you some important capabilities and a complete workflow, from high-level specification down to embedded code. It is now actually possible to go to New > Architecture, and you have the option to use the new architecture model template for authoring software architectures specifically.
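
    As a hedged illustration of authoring such an architecture programmatically rather than from the canvas, here is a minimal MATLAB sketch using the System Composer API. The model, component, and port names are hypothetical, and the sketch creates a generic architecture model rather than the dedicated software architecture type mentioned above.

        % Create an architecture model and add two connected components
        archModel = systemcomposer.createModel('FlightSwArchitecture');  % hypothetical name
        arch = archModel.Architecture;

        sensorComp = addComponent(arch, 'SensorFusion');
        ctrlComp   = addComponent(arch, 'FlightControl');

        outPort = addPort(sensorComp.Architecture, 'FusedData', 'out');
        inPort  = addPort(ctrlComp.Architecture,  'FusedData', 'in');
        connect(outPort, inPort);          % wire the interface between the components

        save(archModel);                   % persists FlightSwArchitecture.slx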

    A typical workflow we hear from different companies starts with a high-level description of your architecture that meets your stakeholders' needs. The architecture is usually defined in an architecture authoring tool, like System Composer. Then, either manually or by automatic import from an interface control document, the architecture captures the structure, in terms of components and interfaces, but also behavior, like functions, timing, and scheduling.

    Once your architecture meets your stakeholders' needs, you can annotate it with different requirements. As new requirements are created, the architecture can get more detailed, as this drives the definition of the architecture further. So this is an iterative process.
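
    As a hedged sketch of this annotation step, the snippet below captures one requirement with Requirements Toolbox and traces it to a component of the hypothetical architecture from the previous sketch. Requirement IDs and names are illustrative, and whether slreq.createLink accepts the component handle directly may depend on your release, so treat that call as an assumption to verify.

        rs  = slreq.new('FlightSwRequirements');                 % new requirement set
        req = add(rs, 'Id', 'SW-001', ...
                      'Summary', 'Fuse sensor data within 10 ms');
        save(rs);

        archModel = systemcomposer.loadModel('FlightSwArchitecture');
        comp = lookup(archModel, 'Path', 'FlightSwArchitecture/SensorFusion');
        slreq.createLink(comp, req);   % trace component to SW-001 (verify supported link sources)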

    You can, of course, also generate reports for communicating more effectively with different stakeholders, using different viewpoints. These can be customers, domain experts, or different types of engineers. This is especially important when your architecture is very complex and contains hundreds, or even thousands, of components.

    As part of the architecture definition, you often need to capture the characteristics of your software, such as performance, cost, size, power, weight, reliability, et cetera. These characteristics all need to be optimized via analysis and trade studies. Then you further elaborate your software architecture model by describing its intended behavior, in terms of sequencing and scheduling.
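
    A hedged sketch of how such characteristics can be captured as stereotype properties follows; the profile, stereotype, and property names are illustrative, not a MathWorks-provided profile.

        % Define a profile with properties for non-functional characteristics
        profile = systemcomposer.profile.Profile.createProfile('SwCharacteristics');
        stereo  = addStereotype(profile, 'SwComponent');
        addProperty(stereo, 'PowerMilliwatts',  'Type', 'double', 'DefaultValue', '0');
        addProperty(stereo, 'BinarySizeKB',     'Type', 'double', 'DefaultValue', '0');
        addProperty(stereo, 'WCETMicroseconds', 'Type', 'double', 'DefaultValue', '0');
        save(profile);

        % Apply the profile to the architecture and annotate one component
        archModel = systemcomposer.loadModel('FlightSwArchitecture');
        applyProfile(archModel, 'SwCharacteristics');
        comp = lookup(archModel, 'Path', 'FlightSwArchitecture/SensorFusion');
        applyStereotype(comp, 'SwCharacteristics.SwComponent');
        setProperty(comp, 'SwCharacteristics.SwComponent.WCETMicroseconds', '850');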

    Finally, the architecture often flows down into design and implementation, using Model-Based Design, with the components being modeled in Simulink and implemented by producing code from the model as the last part. In terms of product families and feature usage, we envision that Requirements Toolbox is the tool for capturing high-level requirements. Then System Composer is the platform for authoring, viewing, and analyzing architectures. And Simulink will be the one for component modeling and implementation, in conjunction with Embedded Coder in this case.

    So we are already addressing some of these workflows in System Composer, like I mentioned. I have already said that you can sketch your software architectures and allocate requirements. You can also add your own custom properties and metadata via profiles. In terms of views, you already have the component diagram and the architecture hierarchy view. And since this year, we also have a new class diagram available.

    In terms of analysis, you can use MATLAB analytics to perform different trade studies, for example on power consumption, latency, or execution time. You can link components to Simulink behavior models and schedule component functions. You can, of course, import and export your architectures from and to MATLAB.
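
    As a hedged example of such a trade study, this sketch walks the hypothetical architecture from the earlier snippets and totals one annotated property, assuming the SwComponent stereotype defined above has been applied to every component.

        archModel = systemcomposer.loadModel('FlightSwArchitecture');
        comps = archModel.Architecture.Components;

        totalWCET = 0;
        for k = 1:numel(comps)
            % getProperty assumes the stereotype is applied to each component
            val = getProperty(comps(k), 'SwCharacteristics.SwComponent.WCETMicroseconds');
            totalWCET = totalWCET + str2double(val);   % property values come back as text
        end
        fprintf('Summed WCET across %d components: %.0f us\n', numel(comps), totalWCET);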

    We already mentioned the creation of reports. And finally, you can describe the behavior of your software, also in terms of the sequencing of your components and messages. And this is done using sequence diagrams.

    And this is just one example showing how the way to start building your system is to create its architecture, either in the component diagram or in the sequence diagram view. The lifelines in the sequence diagram on the left, which are the vertical lines, are the components from the component diagram on the right. The messages between the lifelines describe the event transfer between consumer and producer components.

    As you see, the two editors are fully synchronized. So after describing the structure of the system, we ended up with an architecture made of five components that represent the station, the robots, the rotor, and the results combined.

    Another important part of the workflow we see is that, once you have the definition of your software architecture, you want to assign the software components to the processing elements of your hardware platform, to indicate the deployment strategy. To achieve that, you can create a hardware architecture, using a custom profile, with components representing the processors or ECUs of your hardware platform and their properties.

    Then you can use System Composer's allocation feature to assign the software components from your software architecture to the CPU components of your hardware architecture. The allocation editor uses a matrix representation where the rows represent the modeling elements of your software architecture, while the columns represent the modeling elements of the hardware architecture.

    Then, by double-clicking on a cell in the matrix, you can associate these modeling elements. You can also create multiple allocation scenarios, using different allocation sets, and compare them quantitatively, which is very useful. For example, you can determine if a processor has enough capacity to host all the different software components by summing up the binary sizes of the components allocated to that particular processor. So you can start doing this type of analysis.
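
    A hedged sketch of this allocation step using the System Composer allocation API; the model, component, and CPU names are hypothetical.

        swModel = systemcomposer.loadModel('FlightSwArchitecture');
        hwModel = systemcomposer.loadModel('FlightHwArchitecture');   % hypothetical HW architecture

        allocSet = systemcomposer.allocation.createAllocationSet( ...
            'SwToHwDeployment', swModel, hwModel);
        scenario = allocSet.Scenarios(1);            % default allocation scenario

        swComp = lookup(swModel, 'Path', 'FlightSwArchitecture/SensorFusion');
        hwCpu  = lookup(hwModel, 'Path', 'FlightHwArchitecture/CortexA53_Core0');
        allocate(scenario, swComp, hwCpu);           % software component -> CPU component

        save(allocSet);                              % stored as an .mldatx allocation set

    From there, the scenario can be queried (for instance with getAllocatedFrom, if available in your release) to sum the binary sizes of the components allocated to each processor, along the lines described above.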

    Great, so once we have captured requirements, we know how to capture the software architectures, and we have performed different types of analysis, we can now start with the implementation of the different components. That is, the modeling and simulation of the behavior of the different functions.

    Then it is important to highlight that information about this implementation can actually be used to enrich the previous architecture analysis and compare different implementations and architectures. This information can be included in the stereotypes I mentioned before. This way, it is possible to capture metrics, such as power usage, memory, execution time, et cetera, that will help in the mapping process later on. There are different ways to capture this information: from generating code and capturing data from execution on target or on a test machine, from complexity estimation, or by extracting code metrics from the generated code.
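
    A minimal sketch of one of those routes: capturing execution-time data from a software-in-the-loop run. The model name is hypothetical, and the sketch assumes the model is already configured for SIL (or PIL) simulation mode.

        mdl = 'SensorFusionComponent';     % hypothetical component model
        load_system(mdl);

        % Enable execution-time profiling for the SIL/PIL run
        set_param(mdl, 'CodeExecutionProfiling', 'on');
        set_param(mdl, 'CodeExecutionProfileVariable', 'executionProfile');
        set_param(mdl, 'CodeProfilingSaveOptions', 'AllData');

        sim(mdl);                          % run the SIL/PIL simulation
        report(executionProfile);          % open the coder.profile.ExecutionTime report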

    Then, most times, the partitioning of my model into execution units (tasks or threads, depending on the granularity) is not immediately reflected in the model. Domain experts will partition the model according to their needs, and this is not always the best option to optimize the mapping. The partitioning should match the software architecture already defined, but we are also open to improvements in later iterations.

    At this stage, with the behavior modeled, it is possible to start partitioning and performing some core allocation to run different scenarios in simulation. In general, Simulink supports three types of partitioning: implicit partitioning based on rates; explicit partitioning into tasks, that is, manually creating tasks and doing a manual assignment of blocks to tasks; or even automatic partitioning, using dataflow subsystems.

    For the implicit partitioning, we will have one task per sample rate when working in a multitasking configuration. These tasks will run concurrently on the target. For the explicit partitioning and mapping, you will partition your model using model references, functions, et cetera, and, using the user interface, you will allocate them to tasks. You will map those tasks to different processors, and so on. This way, it is also possible to configure the data transfer among tasks.
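
    As a hedged sketch of the first two routes, the snippet below enables rate-based multitasking and then converts the model for concurrent execution and adds an explicit task. The model name is hypothetical, and the 'CPU/PeriodicTrigger1' path is an assumption about the default tree the conversion creates, so confirm it against the Simulink.architecture documentation.

        mdl = 'FlightControlApp';          % hypothetical application model
        load_system(mdl);

        % Implicit partitioning: one task per sample rate in a multitasking configuration
        set_param(mdl, 'SolverType', 'Fixed-step');
        set_param(mdl, 'EnableMultiTasking', 'on');

        % Explicit partitioning: convert the configuration for concurrent execution,
        % then add a task; model partitions are mapped to it afterwards in the
        % Concurrent Execution dialog
        Simulink.architecture.config(mdl, 'Convert');
        Simulink.architecture.add('Task', [mdl '/CPU/PeriodicTrigger1/ControlTask']);
        Simulink.architecture.set_param([mdl '/CPU/PeriodicTrigger1/ControlTask'], ...
            'Period', '0.005');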

    Then if, for instance, you don't have different sample rates, for a dataflow model it is sometimes possible to do the partitioning automatically, as we will see in the next slide. This is the way to start putting together your architecture and run tests. You can go back and iterate on your partitioning. And doing it early in the design process will give you a lot of information about the possibilities for mapping and the complexity of your application.

    Then you can start answering questions like: does this partitioning make sense? Can I parallelize more? Should I run this piece on the FPGA or on one of my CPUs, et cetera?

    Then, like I mentioned, for dataflow subsystems there is support to perform automatic partitioning into tasks. This way, it will be possible to identify obvious task parallelism and create more opportunities for parallelism by breaking direct connections using pipelining, or even by using unfolding techniques. Then it is also possible to run task duration estimation, using software-in-the-loop and processor-in-the-loop, on both target devices and your laptop.
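
    A minimal sketch of putting a subsystem into the dataflow domain so that this automatic partitioning and the multicore analysis can be applied to it; the model and subsystem names are hypothetical, and the parameter names follow the documented dataflow workflow.

        load_system('FlightControlApp');
        subsys = 'FlightControlApp/Beamformer';        % hypothetical subsystem
        set_param(subsys, 'SetExecutionDomain', 'on'); % enable an explicit execution domain
        set_param(subsys, 'ExecutionDomainType', 'Dataflow');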

    Then, with the multicore analysis workflow, you can get more control over the partitioning of your applications. It is possible to use different options to determine the duration of your tasks, to specify partitioning constraints, such as the maximum number of threads, and, of course, to visualize the results in a friendly manner.

    But let me show you what the proposed workflow looks like. First, we select a method to calculate the task costs. This can be simulation, profiling, et cetera, and we do the calculation. Sometimes we will need to override previous values. Then we specify analysis constraints, such as the number of threads or time thresholds. Then we run the analysis and observe results, such as the possible speedup of your application, and then we iterate.

    So let us see an example of that. In this case, we have an acoustic beamforming example. We can set the execution domain to dataflow. Then we can select the cost calculation method and run the profiling; this can use cost estimation, SIL/PIL profiling, et cetera.

    After the execution, we should be able to see the costs of the different tasks. Then we can actually run the multicore analysis, and we will see how the different tasks are allocated to threads, along with some results of this analysis, such as the speedup of the application compared to a single-threaded option, et cetera.

    Then you can see that an additional latency can be proposed, which you can accept in this case. And by accepting the latency, we will see that it is added to the model, as we will see in a second.

    So we can see that this is adding latency. And then we can re-run the analysis and see if there have been improvements in terms of parallelism and acceleration. We see that this actually happened. You can also go and start playing with and overriding the different task costs, so that you can, for example, perform a sensitivity analysis by re-running the multicore analysis. This, of course, can be scripted.

    Now we have seen how to capture and analyze software architectures, including a mapping of the software architectures to multicore devices. We've seen how to start modeling the behavior of the different components, analyze partitioning and mapping, and see opportunities for parallelism. But in some cases, it will be very interesting to simulate some of the different components together, including possible effects that are introduced by hardware platforms or operating systems. Like I mentioned before, if I transfer data, I need to know where from and where to, what peripheral I'm using, whether I will have task overruns, et cetera.

    Using the SoC Blockset in Simulink, it is possible to allocate the previous tasks to processors, add inter-process communication channels, et cetera. The SoC Blockset semantics will allow you to represent processing units and the Task Manager to simulate your scheduling, add memory blocks and I/O peripherals, and continue using all the possibilities for plant modeling and simulation.

    So let me show you some examples of these features. For instance, using the Task Manager, you will be able to simulate task duration and preemption, and find overruns and task drops. You can set the core affinity of the tasks. You can create interrupts, et cetera. For example, for the task duration, it is possible to add stochasticity, which is very useful for sensitivity analysis in the presence of shared resources, for example.

    In this example, you can see different tasks that have been allocated to different cores, and the scheduling is simulated using the Task Manager. Then it is possible to visualize the concurrent execution, like we see in the image.

    The scheduling of the different partitions can be specified and visualized using the schedule editor, including, for example, periodic tasks, like we see. You can create the schedule one way, then visualize it, and you can modify the tasks.

    Once you have run your simulations, it is also possible to visualize the results using the task execution report to see profiling metrics, hardware usage, et cetera, and also to inspect overruns for more details about task execution.

    Another example is how you can model inter-process data exchanges, which we already mentioned. This way, you can have different ways of buffering, whether with blocking or non-blocking mechanisms, and transfer delays, and see the results in the execution reports, like I mentioned before, including the inter-process data channel statistics.

    In this way, task execution and data transfer can be modeled in a much more realistic way. So, on top of the actual algorithm and the different possibilities for mapping, we are adding a more realistic scenario.

    Great, so let's see an example including the effects of sporadic tasks and overruns. In the example model, we have a controller with three tasks, referring to current, speed, and position. We have a simplified model of the plant and some sensor interfaces. Here we can see that, under the Task Manager mask, we can configure different parameters of each task, such as the name and type, for instance, whether it is timer-driven or event-driven. In this case, you have the period, core affinity, et cetera.

    Here you can also select the option to drop tasks that are overrunning, and so on. We can also see how to capture task execution times as statistical distributions. Then, when we activate the Data Inspector, we can see more details about the execution of the tasks: when they are activated, whether they are dropped, et cetera.
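
    A small sketch of inspecting the logged task activity programmatically with the Simulation Data Inspector API after a run; the printed summary is only an example of what can be queried.

        Simulink.sdi.view;                            % open the Data Inspector UI

        runIDs  = Simulink.sdi.getAllRunIDs;          % all logged simulation runs
        lastRun = Simulink.sdi.getRun(runIDs(end));   % most recent run
        fprintf('Run "%s" contains %d logged signals\n', lastRun.Name, lastRun.SignalCount);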

    Then we can also launch the execution report, and this will show us a snapshot of the execution, including time, which task was dropped or overran, et cetera. Of course, it is also possible to go to the hardware settings, modify the number of cores, modify the core allocation of some of the tasks, and then rerun the execution and see how the different values are modified, like we will see.

    So I think that this is all for today. To finish the session, let me briefly review what we have seen and add some conclusions. Using System Composer, it is possible to capture and analyze software architectures before modeling behavior in detail. We saw that. Then, using the concurrent execution workflow and the multicore analysis in Simulink, it is possible to perform different types of partitioning of your models, and this will help you map them onto multicore devices by finding more opportunities for parallelism.

    Then, after the behavior of your components is modeled and the partitioning is done, we can capture information about hardware and software effects using the SoC Blockset. And this can be done in your behavior models early in the process, which is very beneficial for accelerating design, because you will be able to find issues even before moving to the target. Then, even though we didn't cover much today about code generation, it is, of course, possible to generate code using Embedded Coder for different targets.
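
    As a minimal, hedged sketch of that last step, here is how code generation could be kicked off for a hypothetical model, assuming it is otherwise configured for code generation.

        mdl = 'FlightControlApp';                        % hypothetical partitioned model
        load_system(mdl);
        set_param(mdl, 'SystemTargetFile', 'ert.tlc');   % select the Embedded Coder (ERT) target
        slbuild(mdl);                                    % generate and build the code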

    So thanks a lot for your time. Please do not hesitate to contact me if you have any questions.