Build Scalable AI Solutions with MATLAB Production Server in Kubernetes on Azure - MATLAB
    Björn Müller, Aerzen Digital Systems GmbH

    With more than 150 years of experience in the industry, the AERZEN Group is one of the top 3 application specialists for high-performance blowers. The compressors, blowers, and turbos are mainly used in wastewater treatment plants, in the process industry, and in oil-free pneumatic conveying of bulk materials. Sustainability, smart energy, resource usage, and reliability of machinery are important concerns for AERZEN’s customers when operating their plants all over the world. The Aerzen Digital Systems business unit is working on smart services so that users can operate the machines even more efficiently and reliably with the help of AI and machine learning.

    Aerzen Digital Systems designs AI models for forecasting, condition monitoring, and predictive maintenance for this purpose. Functions and models may be used interactively for data exploration as well as automatic processing of streaming IoT data. While planning to operationalize these models, several challenges surfaced and certain requirements for the platform were defined:

    • Flexibility and scalability for many unique plants and sets of machinery
    • MLOps to monitor and adapt AI models over decades of equipment lifetimes
    • Agile and quick deployment of new functionality as improved AI techniques become available
    • Integration with applications from different frameworks developed both in-house and by customers

    In this talk, we detail a sample solution to these challenges centered around MATLAB Production Server™ running in Kubernetes on Microsoft Azure. The Aerzen Digital Systems libraries of MATLAB functions and AI models are deployed to MATLAB Production Server through an MLOps pipeline. In the present case study, data from a large wastewater treatment plant is analyzed and an anomaly detection algorithm for a single blower is developed. This model is then uploaded to the cloud and trained individually for all blowers. During runtime execution, every model is monitored and automatically retrained if necessary.

    Published: 3 May 2023

    [AUDIO LOGO]

    Hello, my name is Björn Müller, and I am a process engineer and data scientist at AERZEN Digital Systems. Today I want to talk about how we build scalable AI solutions with MATLAB Production Server in Kubernetes on the Azure cloud. But before we start, let me briefly introduce myself.

    Education-wise, I'm a mechanical engineer with specializations in automation and system dynamics as well as thermodynamics. More recently, I worked for several years at the University of Kassel on research into data-based process controls. Some examples of this work are startup time optimization for cooling systems, operating optimization for machine groups, and stratified ventilation for transient heat loads.

    For all of these applications, it was very important to combine knowledge from several different domains. That's also what we're doing here at AERZEN Digital Systems, where we have split our development team into three different groups. The AI team develops the latest models and takes care of cloud deployment, while the automation team goes very deep into PLC programming and adds additional sensors, for example.

    And then there is our process team, taking care of plant simulation and every information flow that is required from the actual plant processes. We may be a relatively young company, but we originate from the AERZEN machine factory, which has been developing and manufacturing highly efficient blowers for more than 150 years now. The main applications of these machines are wastewater treatment plants; process gas supply in chemical and other process industries; as well as oil-free compressed air in the food industry.

    To digitalize these machines and make them more intelligent, AERZEN Digital Systems was founded in 2019 with two main business branches. The first is AERprogress, a cloud platform for condition and energy monitoring of AERZEN machines as well as third-party machines, with additional applications such as anomaly detection. The second branch of our business is individual consulting, where we help customers, for example, to integrate their machines and equipment into the cloud and make them IoT-ready. We also develop custom monitoring and optimization solutions as well as simulations of process plants.

    So what are our applications in detail? Let me show you. Here we have the wastewater treatment plant again, which is, in fact, one of the most important applications of AERZEN machines. You may all be familiar with how wastewater is generated, but let me briefly sketch how it is treated.

    First, when the influent comes in, it is freed from any floating debris like sand or larger particles. Then the wastewater enters the core of the plant, which is the activated sludge process. Here, in so-called aeration tanks, cultures of microorganisms degrade the organic waste. And to do their job, they need a constant supply of compressed air. This is provided by our machines, for example.

    But this is just a sketch of the process. In fact, there are many different types of aeration tanks and bioreactors. And the wastewater treatment plant itself is highly dependent on the composition of the influent and the amount of wastewater that is treated. That's why our machines are highly configurable, not only in terms of compression technology, like turbo or screw compressors; they are also available in different sizes and with different sensor equipment.

    So to deploy machine learning models to custom plants and with custom solutions, we obviously need high flexibility and efficiency during that process. Let's have a look at the examples that we want to deploy. I've picked two. The first one is called machine state classification. Here, we take a deeper look at how our machines are operated, because mostly they operate in groups of several machines; here are five identical machines, for example.

    And to make sure that the customer gets all the knowledge of how to operate our machines best for maximum efficiency, we made this application where we basically classify the machine states into different groups and then analyze them further to estimate the role of every machine. Often, one machine operates almost entirely at quasi-stationary operating points, while others are constantly changing their operating points. With this application, we can analyze this. We can also identify the set points of the control; that means when a machine is switched on and off, we can estimate this and then give the customer suggestions on what to improve in the control scheme to increase both efficiency and reliability of these machines.
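The distinction between quasi-stationary and constantly changing operation can be illustrated with a simple windowed variability check. This is only a sketch of the idea; the signal, window size, and threshold are hypothetical, and the actual classification uses more signals and richer models.

```python
from statistics import pstdev

def classify_states(power_readings, window=5, tol=0.5):
    """Label each window of a machine signal (e.g. power draw) as
    quasi-stationary or transient, based on its standard deviation.
    The window length and tolerance are illustrative values only."""
    labels = []
    for i in range(0, len(power_readings) - window + 1, window):
        chunk = power_readings[i:i + window]
        labels.append("quasi-stationary" if pstdev(chunk) < tol else "transient")
    return labels
```

Comparing such labels across the machines in a group then reveals which machine carries the steady base load and which ones follow the fluctuating demand.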

    All right, the next application is an online anomaly detection. Here, we visualize an anomaly score and raise alarms if the behavior of the machine has changed. For this, we built a simple LSTM neural network based on physically dependent variables and implemented it on the cloud, just like the previous application.
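One common way to turn model predictions into such an anomaly score is a smoothed prediction error compared against a threshold. The sketch below illustrates that pattern only; the smoothing factor and threshold are made-up values, and the predictions would come from the LSTM, which is not modeled here.

```python
def anomaly_scores(actual, predicted, alpha=0.3):
    """Exponentially smoothed absolute prediction error as an anomaly score.
    `predicted` stands in for the LSTM output; alpha is an illustrative value."""
    score, scores = 0.0, []
    for a, p in zip(actual, predicted):
        err = abs(a - p)
        score = alpha * err + (1 - alpha) * score
        scores.append(score)
    return scores

def alarms(scores, threshold=1.0):
    """Flag every time step whose smoothed score exceeds the threshold."""
    return [s > threshold for s in scores]
```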

    But how do we get the data? Here we see how this physical process is connected to the automation and control system. We have connected all sensors and actuators to a PLC here, for example. And to get the data off our machines most efficiently, we have a customized gateway which sends it to the cloud. From there, it's accessible via browser for the operator at the SCADA system.

    Since we are working with MATLAB code here, how do we deploy MATLAB functions on the Azure cloud? Well, we chose the MATLAB Compiler SDK package, which allows us to compile MATLAB code into a so-called CTF file. It's basically an archive which can then be hosted on MATLAB Production Server. That is fine so far for monitoring only.

    But when it comes to manipulating the process and actually using the results of any monitoring and analysis software, we need a little bit more. For the implementation of advanced process control, in many cases it's useful to test it in simulation first, and also to use this simulation to optimize and test the system before it's actually deployed on the real plant. When it does get deployed, we also have an edge device which can be connected to the gateway. And with the Compiler SDK package, we can not only compile our code to a CTF file; we can also dockerize it or even create an executable which can then be installed on a process computer, for example.

    Next, we come to the process of model generation. First of all, we start with a definition of the problem that we want to solve with a model. Then we gather data to achieve this. We start with exploring, pre-processing, and cleaning the data before we select features and the model architecture. The model is then, of course, trained, verified, and validated. This may be an iterative process, but at the end, we have a trained model that's ready for deployment.

    As previously mentioned, we have these pre-deployment stages. We take the assumption here that we also want to manipulate the process with a supervisory control. So we obviously deploy first into the Simulink simulation environment, where we can test our control under several different conditions. And once it has passed this, we are able to compile the same application to the archive file and deploy it on the cloud, so that the customer can access it and see the results of these monitoring solutions, fed with real data.

    Then we can compile and deploy to an edge device, maybe as a Docker container or even as C code directly; we are free with our setup in this way. But this is basically a batch process which yields one created and trained model. If we want to scale this and apply it to more plants, we also have to automate this process and re-execute it. The runtime application, however, is a bit different. When we deploy, we take the assumption that the customer cloud with database and dashboard already exists. This may also be AERprogress, our own product, but it doesn't have to be.

    When we want to access our applications, we use a REST API in combination with so-called metadata, which is an argument of our application that is then passed to our Kubernetes cluster. This cluster has an integrated file storage which is connected to our DevOps, so that the developer on site just needs to push the code to GitHub. From there, it's automatically directed through DevOps to the file storage and deployed on the MATLAB Production Server. Our application is hosted here as an ensemble, and it covers the whole runtime pipeline.

    Here, we also first extract data from the database, validate it, and pre-process it with the same methods we used previously to train the model. And then, since our systems in the process industry are often time-variant, we made the decision to also implement the capability of retraining the model. So first, we decide whether it's a blank model or a pre-trained model. If it's blank, it is initially trained, then executed at runtime and continuously monitored. And when we detect any changes, this can automatically trigger the retraining process under certain conditions.
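The decision flow just described (blank versus pre-trained model, continuous monitoring, conditional retraining) can be sketched roughly as follows. The state fields and the drift-count rule are hypothetical simplifications, not the actual retraining conditions used in production.

```python
def runtime_step(model_state, drift_detected, drift_count_limit=3):
    """Decide what to do at one runtime step: train a blank model once,
    retrain after repeated drift detections, otherwise just execute.
    `drift_count_limit` is an illustrative retraining condition."""
    if not model_state["trained"]:
        model_state["action"] = "initial_training"
        model_state["trained"] = True
    elif drift_detected:
        model_state["drift_count"] += 1
        if model_state["drift_count"] >= drift_count_limit:
            model_state["action"] = "retrain"
            model_state["drift_count"] = 0
        else:
            model_state["action"] = "execute"
    else:
        model_state["action"] = "execute"
    return model_state
```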

    In detail, it looks like this. Every application is designed as an ensemble of layers. On the first layer, we have the application, which is the actual function that is then called by a client. And on the second layer, we have all the data processing steps, like training and pre-processing. These are just fixed functions with defined inputs and outputs, and we only need to arrange the functions on the data processing layer. This gains us more flexibility and development speed, since every application is an ensemble of existing functions that have already been implemented.

    So how is this controlled? Well, on-prem during development, we can use the MATLAB client and pass this metadata via the MATLAB struct format. And when we go to the cloud, we can use a REST API to call it. With a simple one-liner of code, we can easily translate the struct format into a JSON file which can be understood by the REST API.
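From a client's point of view, such a cloud call could look roughly like the sketch below, following the JSON request shape of the MATLAB Production Server RESTful API (`nargout`/`rhs` in, `lhs` out). The host, port, archive name, and function name are placeholders, not the actual deployment.

```python
import json
from urllib import request

def build_mps_request(host, archive, func, metadata, nargout=1):
    """Build the URL and JSON body for a synchronous MATLAB Production
    Server request. All names are placeholders for illustration."""
    url = f"http://{host}:9910/{archive}/{func}"
    body = json.dumps({"nargout": nargout, "rhs": [metadata]}).encode()
    return url, body

def call_mps(url, body):
    """POST the request and return the function results (`lhs`)."""
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["lhs"]
```

On-prem, the same metadata would be passed as a MATLAB struct instead of a JSON body, so the deployed function sees the same arguments either way.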

    This metadata file is uniform, and it contains information like the actual query addresses for data sources and destinations. In the model part, we store all information regarding the model hyperparameters, like layers or a specific training set distribution, and so on. We can also store the address of specific initial training data if we so choose.
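A metadata document of this kind might look like the following sketch. The field names and values are purely illustrative, not the actual Aerzen schema; in MATLAB, `jsonencode` on the equivalent struct is the one-liner that produces the JSON string.

```python
import json

# Hypothetical metadata structure; every field name and value here is
# an illustrative assumption, not the real schema.
metadata = {
    "source": {"address": "db.example.com", "query": "blower_telemetry"},
    "destination": {"address": "dashboard.example.com"},
    "model": {
        "type": "lstm",
        "hyperparameters": {"layers": 2, "hidden_units": 64},
        "training": {"train_fraction": 0.8, "initial_data": None},
    },
}

payload = json.dumps(metadata)  # what the REST API would receive
```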

    In combination, this results in our business model. We start by acquiring a specific task. That means we are given a physical plant, a data set, and the technical specification of what the customer wants to develop. Then we start the actual development process, where we create a model and use Simulink for runtime testing. During development, we constantly make use of the locally accessible functions that we have already developed in previous projects. That means the more often you iterate this process, the bigger your library gets, and the more efficient you can become.

    And once an application is complete, we can just push it to the server. Then comes the cloud engineering part, where cloud engineers configure the client and make all the connections to data sources, and then it's ready for deployment. If the customer demands an already existing application, then of course we only need to configure the client and can skip the development process. And to scale our solution, we make use of Kubernetes, which allows us to scale our cluster and expand the computing power if needed. Besides visualization, our cloud applications also serve as a scalable API for integration into other code or programs.

    In summary, we can say that this system needs at minimum two experts: one on-prem developer for MATLAB and Simulink, and one cloud engineer to set up the application, on the assumption that the server is already installed. We have the advantage that our code is protected via these CTF archive files, and we can easily integrate our functions into third-party systems.

    Also, through the use of MATLAB, we are capable of testing the model lifecycle in Simulink or any other runtime environment. And the last advantage is that, with these compiler packages, we have the possibility to deploy to any device, or even to translate to some other languages. On the downside, building up such a system initially takes more time, of course, since every encapsulable part is made as a function, which takes more time and makes debugging a bit difficult. But in the end, once you have a certain number of functions in your library, you can start to draw actual benefit from this system.

    So how do we go on from here? We want, of course, to expand our library, develop the deployment method to the edge device via the cloud, and also integrate Simulink models into MATLAB Production Server. Further down the road, we want to even automate the model selection process over our predefined functions. And in the end, we want to automate digital twin creation and thereby accelerate monitoring and optimization solutions for the process industry. This was Björn Müller from AERZEN Digital Systems. Thank you for your attention.

    [AUDIO LOGO]