Low Code Data Analysis in MATLAB - MATLAB

    Low Code Data Analysis in MATLAB

    Data analysis is an integral part of many engineering and scientific workflows, but applying the right data analysis techniques typically takes lots of manual coding before getting useful results. It can be particularly challenging when you are not familiar with your data. MATLAB® Live Editor Tasks and apps such as Data Cleaner make it easy to explore, clean, and prepare data. These apps and tasks can also automatically generate the equivalent MATLAB code for you to build on. This demo shows how you can analyze data and generate reusable data analysis programs in MATLAB with minimal coding.

    Published: 5 May 2023

    Hello, everyone. Thanks for joining us today. My name is Onomitra. I'm a product manager here at MathWorks, and as a product manager for MATLAB, my primary area of focus is all the different data analysis and data science technologies that you'd find in MATLAB. And today for this talk I'm joined here by my colleague, Lola Davidson, from the MATLAB Data Analysis Group. I'll let Lola introduce herself.

    Thanks, Ono. I'm Lola and I've been with MathWorks for about five years now. I'm a developer on the Data Analysis team, so I work on low code tools that help users to clean and process their data.

    Thank you, Lola, and thanks again for joining in this talk called Low Code Data Analysis with MATLAB. And this is how we are going to do things today. We are actually going to show most of what we are going to talk about in the form of a demo. We're going to use a case study: a flight data set recorded by NASA. The flight data is stored in an Excel file and contains various sensor information like temperature, pressure, and so on.

    Now ideally what we'd like to do is be able to build a virtual sensor model using this data set and be able to predict something like the true air speed value. But quite often when we get this kind of data set directly from the field, it is not ready for modeling or machine learning. So as a part of this demo, we are going to show how we can use different low code tools in MATLAB to explore, analyze, and prepare this flight data set and get it ready for modeling.

    Now I've been tossing around the term low code a lot. What is low code? Well, low code is basically a software development technique by which you can rapidly develop software without having to do a lot of the programming or coding yourself. We are going to touch upon a lot of these tools today, but just to give you an example, probably many of us are familiar with the plotting command in MATLAB.

    Now, plot draws a line graph. There are many different ways we can annotate that plot, like adding labels or grids. What you're seeing on the screen is how we are using a simple point and click approach to add those labels and grids. And in the end, what we got is not only the chart that we wanted, but also the code that helped to draw that chart. And that is what low code is, and that's what some of the benefits are. Low code has a very shallow learning curve.
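
    As a rough idea of the kind of code such a point and click session might leave behind, here is a minimal sketch. The data, labels, and title are made up for illustration and are not taken from the demo.

```matlab
% Illustrative data only; the data shown in the demo is not reproduced here.
t = linspace(0, 2*pi, 100);
y = sin(t);

% Draw the line graph, then add the annotations that the
% point and click tools would otherwise add for you.
fig = figure("Visible", "off");   % hidden figure so the sketch also runs headless
p = plot(t, y);
xlabel("Time (s)")                % hypothetical axis label
ylabel("Amplitude")               % hypothetical axis label
title("Sample signal")            % hypothetical title
grid on
```

    Each interactive step corresponds to one short command, which is why the generated code can double as a way to learn the plotting functions.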

    We don't need to start by learning everything. In fact, it can help us-- it can teach us how to code, just like you saw in this simple plot example. It showed us the exact code that I needed. And last but not the least, low code not only gets you started, it can get your job done sooner. Quite often all we need is to get my simple analysis done, and low code can get us there. But if that's not enough, low code tools can also generate code and help me to build my program around that.

    So low code is not something just for beginners. For users like Lola and myself, who use MATLAB on a daily basis, we use low code to get started, and then when we need to code, we move on to that. With that, I'm going to hand it over to Lola to show some of these tools in action using the case study that I described before.

    Thanks, Ono. So we'll be using a live script today as we go through our workflow. Live scripts are interactive documents that combine MATLAB code with formatted text, equations, images. You can also add embedded interactive tools right in your script that can help generate code for you, and that's what we're going to focus on today.

    So the first step is to access our data. If we're not sure what to do, I can simply start by typing some related word, like import, into the live script, and the Live Editor will give me suggestions. So here we see one called import data. This little icon tells us that it's not a function, but it's actually a small app that we call a task that gets embedded right here into our script. Now the file I need to import is an Excel file, but this same interface can be used for many different file types, such as audio files or text files.

    Let me grab my Excel file and we'll get going. So we noticed that this live task automatically picked the timetable data type by default for us. So it knew that it was coming from a spreadsheet and that it started with a time based variable. We could have picked from one of these other data types but I think timetable is going to work well for us.

    Right.

    So if we look through our timetable a bit, we also see true air speed, oil pressure, oil temperature. A bunch of other variables that are describing what the aircraft is doing at each point in time throughout the flight. So what do you think, Ono?

    This looks good to me. I'm glad that we are able to directly see the data that we imported. By the way, this live task is actually running MATLAB code, right?

    That's right. So actually, live tasks are excellent tools for generating code. So if I click this arrow at the bottom of the task, I can see all the code that was just run to import this timetable. So I can either copy and paste this code to wherever I need it, or if I'm done with the interactive tool, I can come up here and I can convert the task to editable code. Or I can simply minimize this section and just look at the output. So I think we've accomplished our import step. We now have the data in MATLAB and we're ready to start our analysis.
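
    The code behind an import step like this one could look roughly like the sketch below. It assumes the readtimetable function is what does the reading, which the talk does not confirm. The file name, variable names, and values are hypothetical, and the sketch first writes a tiny stand-in spreadsheet so that it runs on its own.

```matlab
% Build a tiny stand-in spreadsheet so the sketch is self-contained.
% File name, variable names, and values are hypothetical.
Time = datetime(2023, 5, 5, 10, 0, 0) + seconds(0:4)';
TrueAirspeed = [0; 0; 85; 120; 118];
OilPressure = [60; 61; 0; 62; 61];
demo = timetable(Time, TrueAirspeed, OilPressure);

xlsxFile = fullfile(tempdir, "flight_demo.xlsx");
writetimetable(demo, xlsxFile);

% Import the spreadsheet as a timetable; the first time-like column
% becomes the row times.
flight = readtimetable(xlsxFile);
```

    Reading the file as a timetable keeps every sensor variable aligned with its timestamp, which is what the later plotting and cleaning steps rely on.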

    Let's do that. One of the things that I do quite often at the very beginning of any kind of analysis is to plot it. Can we do some plotting using a similar point and click approach?

    Of course. So there are a lot more of these tasks. So if I come up here to the Task Gallery, I can see that import data task we just used. But there are many, many more. We have tasks for all kinds of workflows, such as cleaning missing data, finding trends, joining tables. But here we have our Create Plot Task, so let's take a look at that. So you can see here that there are lots of types of graphs and plots here, but let's go with our simple line plot again. And we can take a look at our first variable plotted against time. So first we had that true air speed.

    Right. The one we want to actually model against.

    Yeah, and we see here at the beginning and the end, we actually have some zeros.

    Right. Probably the flight was just-- hadn't even started at that time. That's why we have the zeros. But we don't need them, right?

    That's right. So we could definitely clean up this data, but let's do that a little bit later. For now, let's go ahead and take a look at the next variable.

    Oh, wow. What in the world?

    Yeah, this one looks interesting too. This has a lot of these dropouts, and I bet all of these zeros are indicating that the sensor went out or something and not that the pressure was actually zero.

    Hopefully not.

    So we'll definitely have to clean up those outliers.

    Yes.

    So there are more variables to look at, but I think you get the idea. We can quickly visualize the different data variables in the live task, and we get this code automatically. We don't have to rewrite this code every single time.

    This is nice. I think this very quickly gives us a sense of what this data looks like, and as we said at the very beginning, it is not always ready for modeling. So how about we start cleaning up some of these problems in this data set? How about let's start with the zeros in the true air speed data?

    All right. So that's actually fairly easy for us to clean up right here in the live script. So if I just display the table in the live script, then I get a nice, rich display that I can actually interact with. So if I open this here, then I see I have got a simple sorting feature here. But I can also filter the rows of the variable based on its value, so I can rearrange this here. Or I could also choose to include or omit missing rows, rows where this variable is missing.

    And as I'm interacting with that pop out, I can see that there's some code that's getting generated here. And if I like it and I want to add it to my script, I can just click Update Code and we have finished filtering the data by air speed as soon as I run this section. And we can see the values for true air speed that were zero are now gone. So this value of 10 that I picked here, of course, that's kind of arbitrary.
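
    The filtering described here amounts to logical indexing on the timetable rows. The sketch below uses made-up values and a hypothetical variable name, TrueAirspeed, with the same arbitrary threshold of 10 mentioned in the talk.

```matlab
% Made-up values; TrueAirspeed is a hypothetical variable name.
Time = seconds(0:5)';
TrueAirspeed = [0; 0; 85; 120; 118; 0];
flight = timetable(Time, TrueAirspeed);

% Keep only the rows where the air speed exceeds the threshold.
minAirspeed = 10;   % arbitrary cutoff, as in the talk
flying = flight(flight.TrueAirspeed > minAirspeed, :);
```

    Keeping the cutoff in its own variable makes it easy to change later, for example by attaching it to a slider control.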

    Right.

    Right?

    It probably needs some kind of range to play around with.

    That's right. If we wanted to have an easy way to change this value in the future, sort of see what the results would look like with different input values, then we have a tool for that. This kind of tool is called a live control. So you can find this in the tool strip as well. Maybe here we want to grab a numeric slider. If I right click on the slider, I can update the properties. Here it's starting with a range of 0 to 20, but I can update that with a range of, say, 0 to 100.

    Or I could change how the execution works, meaning what is going to happen when I interact with this slider. So that's going to be really useful, especially if I want to share this script with my colleagues maybe in the future. So this way we've seen that we can do some really simple interactive data filtering right here on the table, and then use live controls to help us as well.

    This is nice. I like how you use the live controls to quickly make this whole script very exploratory. One can just slide that slider and see its impact. Can we also do something similar for those oil pressure dropouts and clean those up?

    Yeah. So those dropouts were sort of interspersed throughout the variable, so that's not something that we could do with basic thresholding. Maybe we want something a bit more specific. Maybe we want something a bit more sophisticated. We do have a Live Editor task specifically for cleaning outlier data, but we actually have a lot more variables to take a look at, and we don't know if we might need to clean some other variables as well.

    True.

    So in this case, let's take a look at the Data Cleaner app by clicking this Clean Data button. So the Data Cleaner app is an app that's going to let us visualize, explore, and clean the data interactively. I've taken the liberty of pre-selecting our variable here and using this panel to visualize all of the variables in the table.

    So to get a good sense of what these data look like, I can scroll through and see each variable plotted against time. So there's my true air speed, and we've gotten rid of those zeros, which is great. There is the oil pressure, where I'm noticing all of those dropouts. I also have oil temperature, which, unfortunately, looks very similar to the oil pressure. It also has all these dropouts. I don't think the temperature actually went to minus 400.

    Yeah.

    So as I scroll down, I'm noticing here on the right I've got some summary statistics. In each one I see, for example, the missing count is zero, which is great. That's telling me that none of these variables have any missing data, which is good. Oh. When I get down here, I'm noticing these two variables--
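
    A missing count like the one in that summary panel can also be computed in a script. This is a minimal sketch with made-up values and hypothetical variable names; unlike the demo data, it deliberately contains one missing value so that the count is not all zeros.

```matlab
% Made-up values; one NaN is included on purpose to show a nonzero count.
Time = seconds(0:4)';
OilPressure = [60; 61; NaN; 62; 61];
OilTemperature = [180; 181; 182; 181; 180];
flight = timetable(Time, OilPressure, OilTemperature);

% ismissing marks each missing entry; summing down the columns
% gives one count per variable.
missingCount = sum(ismissing(flight));

% summary prints per-variable statistics such as min, median, and max.
summary(flight)
```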

    Very noisy.

    Yes, they're quite noisy. So if we're going to use these variables for modeling a bit later, we probably want to reduce the noise. All right. But for now, let's go ahead and just focus on those variables with dropouts, the oil pressure and the oil temperature. So here in our cleaning method gallery I've got a clean outlier data method. So as I look here at the title, I'm noticing that it's not actually picking up these dropouts as outliers, so I'm going to need to come over here and define the outliers differently.

    So if I change the detection method and the threshold factor a little bit, then I can detect these values as outliers. And I can interact with this plot here. I can zoom in and I can see that it's filling this outlier with linear interpolation there, which is great. And I like that, so I'm going to go ahead and click Accept. And we can move on to those other two variables that were noisy and clean those up as well. This time we have a smooth data cleaning method I'll try out.
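
    The outlier step can be sketched with the filloutliers function. The talk does not say which detection method or threshold factor was finally chosen, so the "median" method and the factor of 3 below are assumptions, and the pressure signal is synthetic.

```matlab
% Synthetic pressure signal with simulated sensor dropouts (zeros).
t = (0:99)';
pressure = 60 + sin(t/10);
pressure([20 21 55 80]) = 0;

% Flag points far from the median (assumed method and threshold factor)
% and replace them by linear interpolation between their neighbors.
[cleaned, wasOutlier] = filloutliers(pressure, "linear", "median", ...
    "ThresholdFactor", 3);
```

    The second output, wasOutlier, marks which samples were replaced, which is handy for checking that only the dropouts were touched.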

    Oh, this looks nice.

    Yeah. So I actually really like this by default. If I wanted to smooth it a little bit more, I could just increase the smoothing factor here, or I could decrease it if I wanted to pick up a few more of the attributes from the original data. But I think I like this as well.
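
    The smoothing step can be sketched with the smoothdata function and its SmoothingFactor option. The moving mean method is an assumption, since the talk does not name the method, and the noisy signal is synthetic.

```matlab
% Synthetic noisy signal; fixed seed so the result is repeatable.
rng(0)
t = (0:199)';
noisy = sin(t/20) + 0.2*randn(size(t));

% SmoothingFactor ranges from 0 (least smoothing) to 1 (most smoothing).
smoothed = smoothdata(noisy, "movmean", "SmoothingFactor", 0.25);
smoother = smoothdata(noisy, "movmean", "SmoothingFactor", 0.75);
```

    Raising the factor widens the smoothing window, which removes more noise at the cost of flattening some of the original detail.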

    Yeah.

    So now I've got two steps in my history panel. Now these steps are mutually exclusive, but others may not be. So if we're interested in seeing how it's going to affect the results, we can use this panel to manipulate the order. Just click and drag and rearrange them. Or I can see what happens if I want to omit a certain step, but I think I'm pretty happy with these two.

    Yep. This looks good.

    We fixed the two problems that we uncovered and now we want to export our results. So I could just export the table, but I think I'm probably going to want to do these steps over and over again on new data sets coming in, so I'll go ahead and generate a function. Now all I have to do is rename the function, save it, and I'm ready to use it in my script.

    So if I just run the function on the data, I can see that I've got my cleaned table here. So we just saw how we can use the Data Cleaner app to clean our data, and the app provided us with an interface to identify the problems in the data and try out different techniques to clean it until we were happy with the results.
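
    A reusable cleaning function of this kind could look like the sketch below. It is a hand-written stand-in, not the code the app generates. The function name cleanFlightData and the variable names OilPressure and FuelFlow are hypothetical, and the methods are the assumed ones from the two cleaning steps.

```matlab
function cleaned = cleanFlightData(flight)
% cleanFlightData  Hypothetical stand-in for an app-generated function.
% Expects a timetable with variables OilPressure and FuelFlow
% (both names are assumptions made for this sketch).

    cleaned = flight;

    % Step 1: replace dropouts with linearly interpolated values.
    cleaned.OilPressure = filloutliers(cleaned.OilPressure, ...
        "linear", "median");

    % Step 2: reduce noise with a moving mean.
    cleaned.FuelFlow = smoothdata(cleaned.FuelFlow, "movmean");
end
```

    Saved as cleanFlightData.m, it can be called as cleaned = cleanFlightData(flight) on each new data set that arrives.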

    This looks really nice. I think the data is all clean. Can we quickly verify?

    Sure. So remember earlier when we were exploring the data with those plots, we were also generating the corresponding code.

    Right.

    So I have copied that plot code over here, and I can reuse that same plot code. All I have to do is run the section and I can see that my clean data function ran correctly.

    Yeah, no zeros.

    My zeros are gone, the dropouts in oil pressure are gone, and that worked great. So during this demo, we covered a few different ways to explore, analyze, and clean your data. We used several kinds of low code tools here. We used live tasks, table editing, live controls, and we even used an app today.

    If you need simple help adjusting parameters in your code, then we can use those live controls to do that. Live tasks are designed to help you with simple data analysis workflows while you are coding in your scripts, and we have apps that are designed for more sophisticated, self-contained workflows, like data cleaning, where you can try out different techniques before deciding on the correct one. So what do you think, Ono?

    This looks good to me. Thank you so much, Lola. So as you saw today, there's really no single ideal way to analyze and explore your data. It's more of an iterative process, and this is where low code tools can be really, really helpful. But a data analysis workflow is not just about accessing the data and cleaning it. There's a lot more to it. You can access data from different sources, like databases or hardware. During exploration you may be building models. You may be creating engineering applications, and then when it's ready you want to share.

    Now, what Lola showed you are some of the low code tools that are available just in MATLAB, but we've barely scratched the surface. There are a lot of other kinds of low code tools that are available in various toolboxes. For example, we talked about data access using the Import Data live task, but then there's also the Import Tool, which is another app that provides more flexibility and control as you import your data. If your data is in a database, you can actually use the Database Explorer app to build a SQL statement and fetch your data. If your data is coming directly from hardware, there are apps for those too.

    Moving on to the analysis and exploration part. Lola showed some simple techniques on how to create plots. She also talked about grouping, but there are a lot more. If you need to do optimization, there are live tasks for optimization. If you need to do some domain specific engineering applications, like maybe build a control algorithm or maybe analyze your signal data in the time and frequency domains, then you have apps for that as well, like the one that you see on the screen, which is the Signal Analyzer app.

    And then finally, if you're building some machine learning models, then there are apps that can help you to label the data. It can help you to train, test, and validate the data, as well as quantify it and deploy it. Speaking of deployment, there are various ways to share your results. Now, Lola was working with a script. Your script can be your analysis report. You saw how Lola was adding different text and code together. You can add many different richer options, richer text formatting options like hyperlinks or equations.

    And then once you are happy with that, you can share your results in the form of a PDF or an HTML file, or maybe a Word document. But if you want to actually take your code and put it in production, then there are ways to deploy that as well. The MATLAB Compiler app can help you to package your code and build a standalone executable. Or if you're looking to generate code and embed it in hardware, then the MATLAB Coder app can take your code, generate C code or HDL code or GPU code and put it in your hardware.

    In summary, what we'd like to say is low code is a simple way to just get you started, and many times it can get you to where you want to go. But if your need grows, then the MATLAB language can grow with you. Low code can get you started and then generate the code that you can take and build your program around. With that, I'd like to thank Lola and thank everyone for joining us today, and feel free to ask your questions in the chat window and we'll use the rest of the time for the talk to answer your questions on low code techniques in MATLAB.
