What Is Curve Fitting? Fitting Models to Data Made Easy with MATLAB
      Video length is 8:48

      Curve fitting is a technique for fitting mathematical models to your data, helping you understand the relationships between different factors in your data set. Learn how to apply several curve fitting techniques in MATLAB® to wind turbine data to understand how various factors influence power output.

      With MATLAB, you can:

      • Interactively fit curves and surfaces to your data using the Curve Fitter app
      • Explore higher-dimensional models through linear and nonlinear regression methods from Statistics and Machine Learning Toolbox™
      • Optimize fitted models by specifying bounds and constraints with the functionality from Optimization Toolbox™
      • Incorporate your curve fits from the Curve Fitter app as a lookup table for use in Simulink®

      Published: 10 Apr 2024

      Data is everywhere, being generated by all the technology in our lives every second of every day. But how do we take all this jumbled data and turn it into a model? We fit a curve to it. Of course, this is easier said than done. How can we take all these unconnected data points and turn them into a cohesive and reusable model? That's the question we'll be answering today.

      What is curve fitting, and how do we perform it in MATLAB? As you may well know, data in its raw form is largely unhelpful for an engineer who's just trying their best to run computations or draw conclusions from it. For data to become truly usable, and to put an overwhelmed engineer at ease, it must be processed. Curve fitting does this by finding the mathematical expression that best represents the trend in the raw data: literally fitting a curve to it.
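
"Best represents" usually means least squares: choose the coefficients that minimize the sum of squared errors (SSE) between the data and the curve. The video works entirely in MATLAB; what follows is only a minimal plain-Python sketch of that idea for the simplest case, a straight line y = a + b*x, where the minimizing coefficients have a closed form. The data values are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # The slope that minimizes the SSE is cov(x, y) / var(x).
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx
    a = y_mean - b * x_mean  # the best-fit line passes through the means
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared errors of the line a + b*x against the data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
a, b = fit_line(xs, ys)
print(f"y = {a:.3f} + {b:.3f} x, SSE = {sse(xs, ys, a, b):.4f}")
```

Any other choice of a and b gives a larger SSE on this data; that is what makes it the "best" line.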

      We can also apply techniques like preprocessing and data cleaning to make our curve fit more robust against pesky outliers and noisy data points. Curve fitting lets us see the story our data is trying to tell. By cutting through the noise and the individual data points to get at the overall trends, and by understanding the effects of different factors and parameters, we can draw real conclusions. This is essential when making sense of large data sets.

      Let's look at some different techniques for curve fitting in MATLAB and what these models tell us about the data. In this video, we'll see how to quickly and interactively fit a curve to our data, extend it to a nonlinear regression model to analyze the statistical significance of various predictors, optimize the model by constraining the bounds of its coefficients, and finally integrate our curve fit into a Simulink model.

      Now, I'm sure we've all seen data before. So let's look at some data which might just blow us away. It's data from wind turbines. Wind turbines rotate when the wind blows. This rotating motion then turns a generator, which creates power in the form of electricity. Based on this chain of events, we can plot out the sigmoidal relationship between wind speed and the power output of the turbine, representing wind turbine operation.

      We can assume this relationship is sigmoidal because the wind needs to reach a certain speed before it can push the wind turbine's blades, and above a certain higher speed, the generator maxes out how much electricity it can output. Other factors also affect how much power a wind turbine puts out, but we'll get to those in a bit.
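
One common way to write such an S-shaped curve is the logistic function sketched below in Python. This is an assumption for illustration: the video doesn't say which sigmoid form the app uses, and the parameter names (`p_rated`, `k`, `v_mid`) and example values are hypothetical. The lower plateau stands in for wind too slow to turn the blades, and the upper plateau for the generator's maximum output.

```python
import math


def power_curve(v, p_rated, k, v_mid):
    """Logistic (sigmoidal) power curve, a hypothetical stand-in for the fitted model.

    v       -- wind speed (e.g. m/s)
    p_rated -- upper plateau: the most power the generator can deliver
    k       -- steepness of the rise from low output to rated output
    v_mid   -- wind speed at which output reaches half of p_rated
    """
    z = k * (v - v_mid)
    # Two algebraically equal forms, chosen so math.exp never sees a large positive argument.
    if z >= 0:
        return p_rated / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return p_rated * ez / (1.0 + ez)


# Hypothetical turbine: 2000 kW rated output, half power at 8 m/s.
for speed in (0, 4, 8, 12, 16, 20):
    print(speed, round(power_curve(speed, 2000.0, 0.9, 8.0), 1))
```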

      So how do we get from data, the wind speed and power output of a turbine, with its fair share of outliers, to the sigmoidal curve? Well, curve fitting, of course. Let's dive in. Here we have a data set of the various factors affecting the power output of a wind turbine. Once we pull the data out of the spreadsheet and prepare to plot it in MATLAB, we can look at the important relationship between wind speed and power output.

      Before we do that, we can apply some basic preprocessing steps to clean our data and prepare it for curve fitting. We've already identified anomalous events from the performance logs, so we'll just remove those. For more specific data cleaning methods and techniques, you could use the Data Cleaner app. Let's now use the Curve Fitter app in MATLAB to not only plot the data points showing this relationship but also fit a curve to it.
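
Removing logged anomalies is essentially a filter over the records. Here is a minimal Python sketch, assuming each record is a dict with the hypothetical keys `time`, `wind_speed`, and `power`, and that the performance logs yield a set of flagged timestamps. It also drops incomplete readings, an extra cleaning step that the video doesn't describe.

```python
import math


def remove_anomalies(records, anomalous_times):
    """Drop records flagged as anomalous or with missing/non-finite readings.

    records         -- list of dicts with hypothetical keys 'time', 'wind_speed', 'power'
    anomalous_times -- set of 'time' values flagged in the performance logs
    """
    cleaned = []
    for rec in records:
        if rec["time"] in anomalous_times:
            continue  # flagged in the logs: not normal operation
        values = (rec.get("wind_speed"), rec.get("power"))
        # 'v is None' is checked first, so math.isfinite never receives None.
        if any(v is None or not math.isfinite(v) for v in values):
            continue  # incomplete reading
        cleaned.append(rec)
    return cleaned


log = [
    {"time": 1, "wind_speed": 5.0, "power": 300.0},
    {"time": 2, "wind_speed": 6.0, "power": 450.0},
    {"time": 3, "wind_speed": None, "power": 500.0},
    {"time": 4, "wind_speed": 7.0, "power": float("nan")},
    {"time": 5, "wind_speed": 9.0, "power": 1200.0},
]
print(remove_anomalies(log, anomalous_times={2}))
```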

      First, we select the data we want: wind speed on the x-axis and power output on the y-axis. Then we select the type of curve to fit to our data. Given the shape of the data, we want a sigmoidal curve. And there we go. In just a few steps, we have fit a curve that accurately captures the relationship between wind speed and power output.
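
Behind the scenes, fitting a sigmoid is a nonlinear least-squares problem: there is no closed-form answer, so a solver iterates from a start point. The Python sketch below uses Levenberg-Marquardt, a standard method for this kind of problem, on the logistic form `p / (1 + exp(-k*(v - m)))`. Both the method and the model form are assumptions; the video doesn't say what the Curve Fitter app runs internally. The example recovers known, made-up parameters from noise-free synthetic data.

```python
import math


def logistic(v, p, k, m):
    """Sigmoid p / (1 + exp(-k*(v - m))), written to avoid overflow in math.exp."""
    z = k * (v - m)
    if z >= 0:
        return p / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return p * ez / (1.0 + ez)


def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if aug[piv][col] == 0.0:
            raise ValueError("singular system")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (aug[r][3] - sum(aug[r][c] * x[c] for c in range(r + 1, 3))) / aug[r][r]
    return x


def sse(params, vs, ys):
    """Sum of squared errors of the logistic curve against the data."""
    total = 0.0
    for v, y in zip(vs, ys):
        r = y - logistic(v, *params)
        total += r * r  # r * r (not r ** 2) gives inf instead of raising on overflow
    return total


def fit_logistic(vs, ys, start, max_iter=200):
    """Levenberg-Marquardt fit of (p, k, m); returns (params, final SSE)."""
    params = list(start)
    cost = sse(params, vs, ys)
    lam = 1e-3  # damping: small -> Gauss-Newton step, large -> short gradient-like step
    for _ in range(max_iter):
        p, k, m = params
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for v, y in zip(vs, ys):
            s = logistic(v, 1.0, k, m)
            ds = s * (1.0 - s)  # derivative of the unit sigmoid w.r.t. its argument
            grad = (s, p * ds * (v - m), -p * ds * k)  # d(model)/d(p, k, m)
            resid = y - p * s
            for i in range(3):
                jtr[i] += grad[i] * resid
                for j in range(3):
                    jtj[i][j] += grad[i] * grad[j]
        # Marquardt scaling: inflate the diagonal of J^T J by (1 + lam).
        damped = [[jtj[i][j] * (1.0 + lam) if i == j else jtj[i][j] for j in range(3)]
                  for i in range(3)]
        try:
            step = solve3(damped, jtr)
        except ValueError:
            break
        trial = [q + d for q, d in zip(params, step)]
        trial_cost = sse(trial, vs, ys)
        if trial_cost < cost:  # accept the step and trust the quadratic model more
            params, cost = trial, trial_cost
            lam = max(lam / 10.0, 1e-12)
        else:  # reject the step and damp harder
            lam *= 10.0
            if lam > 1e12:
                break  # no further progress possible
    return tuple(params), cost


# Synthetic check: recover hypothetical parameters (2000, 0.9, 8) from noise-free data.
speeds = [float(v) for v in range(21)]
powers = [logistic(v, 2000.0, 0.9, 8.0) for v in speeds]
(p_fit, k_fit, m_fit), final_sse = fit_logistic(speeds, powers, start=(1800.0, 0.5, 7.0))
print(f"p={p_fit:.2f}, k={k_fit:.4f}, m={m_fit:.4f}, SSE={final_sse:.2e}")
```

The start point matters for any iterative fit; a poor one can slow convergence or land in a different local minimum.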

      Now, we could further refine this fit by excluding specific outlier points or modifying advanced fit options based on our problem requirements. But we have a pretty good fit here, so we'll work with this. Now that we've actually fit a curve to the data to show the sigmoidal relationship between wind speed and power output, let's look at the significance of the relationships between other wind factors and power output.
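
In the video, outlier points are excluded interactively in the app. As one automatic alternative (a swapped-in technique, not what the video does), a residual-based rule can flag points that sit far from the fitted curve. This minimal Python sketch uses the median absolute deviation (MAD) of the residuals as a robust spread estimate; the cutoff of 3 is a hypothetical default.

```python
import statistics


def exclude_outliers(xs, ys, predict, threshold=3.0):
    """Keep points whose residual lies within `threshold` robust standard deviations.

    predict   -- callable returning the fitted value at x (e.g. a fitted curve)
    threshold -- cutoff in units of the robust scale estimate (hypothetical default)
    """
    residuals = [y - predict(x) for x, y in zip(xs, ys)]
    med = statistics.median(residuals)
    # MAD scaled by 1.4826 estimates the standard deviation for normally distributed noise.
    mad = statistics.median(abs(r - med) for r in residuals) * 1.4826
    if mad == 0:
        return list(xs), list(ys)  # no spread to judge against: keep everything
    keep = [abs(r - med) <= threshold * mad for r in residuals]
    return ([x for x, k in zip(xs, keep) if k],
            [y for y, k in zip(ys, keep) if k])


# Made-up data around the line y = 2x, with one gross outlier at x = 7.
xs = list(range(10))
noise = [0.1, -0.1, 0.2, -0.2, 0.0, 0.1, -0.1, 50.0, 0.2, -0.2]
ys = [2.0 * x + e for x, e in zip(xs, noise)]
print(exclude_outliers(xs, ys, lambda x: 2.0 * x)[0])
```

The median and MAD are used instead of the mean and standard deviation because a single large outlier would inflate those and hide itself.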

      In addition to wind speed, there are also wind direction, air density, turbulence, and wind shear. To understand the effects of all these factors, we can create a nonlinear regression model using Statistics and Machine Learning Toolbox. Since we know the relationship between wind speed and power output is sigmoidal, we can model that term as a sigmoid, and for simplicity, we can assume a linear relationship for the other factors.
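
A model of that shape is just a function of the coefficients and the predictors. The Python sketch below shows one hypothetical way to write it: a logistic term in wind speed plus a linear term for each of the other factors. The coefficient names, their order, and the example values are assumptions; the video doesn't show the exact expression.

```python
import math


def turbine_power(coeffs, wind_speed, direction, density, turbulence, shear):
    """Hypothetical sigmoid-plus-linear model of turbine power output.

    coeffs = (p_rated, k, v_mid, b_dir, b_rho, b_ti, b_shear)
    The sigmoid captures the wind-speed effect; the b_* coefficients are
    linear contributions from the other factors, assumed linear for simplicity.
    """
    p_rated, k, v_mid, b_dir, b_rho, b_ti, b_shear = coeffs
    z = k * (wind_speed - v_mid)
    if z >= 0:
        sigmoid = 1.0 / (1.0 + math.exp(-z))
    else:  # equivalent form that avoids overflow for large negative z
        ez = math.exp(z)
        sigmoid = ez / (1.0 + ez)
    return (p_rated * sigmoid
            + b_dir * direction
            + b_rho * density
            + b_ti * turbulence
            + b_shear * shear)


# Made-up coefficients and one made-up observation, for illustration only.
example = (2000.0, 0.9, 8.0, 0.05, 150.0, -300.0, 1.0)
print(turbine_power(example, wind_speed=10.0, direction=180.0,
                    density=1.225, turbulence=0.1, shear=0.2))
```

A regression fit then searches for the coefficient tuple that minimizes the squared error between this function's output and the measured power.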

      We can see the impact of each factor in the plot slice window. As shown here, wind speed clearly has a big impact, while all the others have very small impacts, with their curves looking close to horizontal. If we look at wind shear specifically, its curve is almost exactly horizontal, and its p-value is far above 0.05. So we can conclude that wind shear does not have a statistically significant effect on power output, and we can exclude it from our model.
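
A coefficient's p-value comes from comparing its estimate with its standard error. The minimal Python sketch below uses a normal approximation to the t distribution, which is reasonable only when there are many more data points than coefficients; regression tools typically use the t distribution with the residual degrees of freedom instead. The coefficient estimates and standard errors are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, std_error):
    """Approximate p-value for the null hypothesis 'coefficient == 0'.

    Uses the standard normal distribution in place of the t distribution,
    which is only a good approximation for large samples.
    """
    t_stat = estimate / std_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


def is_significant(estimate, std_error, alpha=0.05):
    """True when the coefficient differs from zero at significance level alpha."""
    return two_sided_p_value(estimate, std_error) < alpha


# Hypothetical estimates and standard errors for two coefficients.
for name, est, se in [("wind speed term", 0.9, 0.02), ("wind shear", 0.3, 2.0)]:
    p = two_sided_p_value(est, se)
    print(f"{name}: p = {p:.3g}, significant: {is_significant(est, se)}")
```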

      Now that we've fit a first guess at a curve to our data and determined which factors are significant in our model, let's add bounds and constraints. This will create a more precise version of our model that more accurately reflects the physical limits of a wind turbine, since there's no point modeling beyond those limits.

      To add these bounds and constraints, let's use the functionality from Optimization Toolbox and perform curve fitting through optimization with the Optimize Live Editor task. We define the variables, or coefficients, we want to optimize and set their start points from the model we obtained using Statistics and Machine Learning Toolbox. Then we add lower and upper bounds to constrain the coefficients. We can frame the curve-fitting problem as minimizing an objective function: we rewrite the fitting equation we used earlier as an objective that computes the sum of squared errors, which the solver then minimizes.
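
Written out, the bounded curve-fitting problem has the form below. The notation is ours, not the video's: θ collects the model coefficients, f(x_i; θ) is the model evaluated at the i-th of n observations with measured power y_i, and ℓ and u are the lower and upper bound vectors, applied elementwise.

```latex
\min_{\theta}\; \mathrm{SSE}(\theta) = \sum_{i=1}^{n} \bigl( y_i - f(x_i;\theta) \bigr)^2
\qquad \text{subject to} \qquad \ell \le \theta \le u
```

Defining the residuals r_i(θ) = f(x_i; θ) - y_i gives SSE(θ) = ‖r(θ)‖², the squared Euclidean norm of the residual vector. In its solver-based form, `lsqnonlin` takes that residual vector and minimizes the sum of its squares, which is why it suits this problem.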

      If we wanted to, we could also add linear or nonlinear constraints, however complex, and choose the solver. Let's use lsqnonlin for this problem and set its options; the task then solves our curve-fitting problem as an optimization problem.

      OK, so now we have two slightly different versions of our curve-fitting model. We can compare them using statistical metrics such as R-squared, adjusted R-squared, root mean square error, and sum of squared errors. As we can see, the two models are quite similar, and either approach works, depending on the requirements and the problem to solve.
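
Those four metrics are straightforward to compute from the observations and each model's predictions. Below is a minimal Python sketch with toy numbers; `n_coeffs` is the number of fitted coefficients. This RMSE divides the SSE by the residual degrees of freedom n - p (the regression "standard error" convention); dividing by n instead gives the plain root mean square of the residuals, so check which convention a given tool reports before comparing numbers.

```python
import math


def goodness_of_fit(y, y_hat, n_coeffs):
    """Common goodness-of-fit statistics for a fitted model.

    y        -- observed values
    y_hat    -- model predictions at the same points
    n_coeffs -- number of fitted coefficients (including any intercept)
    """
    n = len(y)
    if n != len(y_hat) or n <= n_coeffs:
        raise ValueError("need matching lengths and more points than coefficients")
    mean_y = sum(y) / n
    sse = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))  # sum of squared errors
    sst = sum((obs - mean_y) ** 2 for obs in y)                  # total sum of squares
    r2 = 1.0 - sse / sst
    # Adjusted R^2 penalizes extra coefficients through the degrees of freedom.
    adj_r2 = 1.0 - (sse / (n - n_coeffs)) / (sst / (n - 1))
    # RMSE here divides by the residual degrees of freedom (n - n_coeffs).
    rmse = math.sqrt(sse / (n - n_coeffs))
    return {"SSE": sse, "R2": r2, "adjR2": adj_r2, "RMSE": rmse}


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(goodness_of_fit(observed, predicted, n_coeffs=2))
```

Adjusted R-squared is the fairer comparison when the two models have different numbers of coefficients.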

      All right, so we've made some pretty great models here, but what can we do with them now? We can use them to predict power output for new data, as long as the behavior we care about can be explained by a mathematical model. If you're doing more advanced modeling, you could use machine learning and deep learning techniques.

      If you have a Simulink design for your system, you could also use these models to calibrate your designs, subsystem blocks, and lookup tables that incorporate these factors and responses. Here we have a Simulink model representing our wind turbine. There are several subsystems, but the one we're interested in is the power controller subsystem, and inside it, the MPPT control system, which has two 1-D lookup tables.

      If you had the proper data, you could use this first lookup table showing the mpptOmega relationship. But what we care about today is the second lookup table, which represents the relationship between wind speed and power output. Remember, we modeled this way back at the start of the video using the Curve Fitter app. We can simply open that again and export this model as a Simulink lookup table and replace it here. Now, we have our subsystems and lookup tables calibrated based on the latest experimental data.
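
Exporting a fit as a lookup table amounts to sampling the fitted curve at a set of breakpoints and interpolating between them at run time. The Python sketch below mimics that idea. The fitted curve, its coefficients, and the 1 m/s breakpoint spacing are hypothetical, and holding the end values outside the breakpoint range is a choice made here; lookup-table blocks can be configured with other extrapolation settings.

```python
import bisect
import math


def make_lookup_table(curve, breakpoints):
    """Sample a fitted curve at breakpoints; returns (breakpoints, table values)."""
    bps = sorted(breakpoints)
    if any(a >= b for a, b in zip(bps, bps[1:])):
        raise ValueError("breakpoints must be strictly increasing")
    return bps, [curve(x) for x in bps]


def lookup_1d(bps, values, x):
    """Linear interpolation between breakpoints, holding the end values outside them."""
    if x <= bps[0]:
        return values[0]
    if x >= bps[-1]:
        return values[-1]
    i = bisect.bisect_right(bps, x) - 1  # index of the breakpoint at or below x
    x0, x1 = bps[i], bps[i + 1]
    y0, y1 = values[i], values[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def fitted_power(v):
    """Hypothetical fitted sigmoid: 2000 kW rated, half power at 8 m/s."""
    return 2000.0 / (1.0 + math.exp(-0.9 * (v - 8.0)))


bps, table = make_lookup_table(fitted_power, range(0, 26))  # 0..25 m/s in 1 m/s steps
print(lookup_1d(bps, table, 9.5), fitted_power(9.5))
```

Between breakpoints the table only approximates the curve, so denser breakpoints trade memory for accuracy where the curve bends most.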

      So there you have it. We've gone from data to fitted curves, to regression with multiple predictors, to optimization with constraints, and finally to a Simulink lookup table built from the original curve fit. That goes to show the wide range of goals you can achieve by using curve fitting in MATLAB to tell the mind-blowing story of your data.

      For more information on curve fitting in MATLAB, check out the product pages for the toolboxes we used today, including Optimization Toolbox, Statistics and Machine Learning Toolbox, and Curve Fitting Toolbox (which also has details on the Curve Fitter app), all linked in the description. Thanks for watching, and I'll see you in another video.