Feature selection in sequence to one regression

Michal Slezak on 26 Jun 2023

I have a dataset of about 3000 observations. Each observation consists of 28 time-series variables (pressure in particular areas of the cardiovascular system) and a single value (the resistance of the heart valve).

My goal is to train a model (neural network) that takes some of those time series as input and performs regression of that single-value parameter.

Now, the question is how to do feature selection so that I can choose, say, 3-6 of those 28 time series as inputs. I don't need finished code, just an idea or a clue.

If I had a sequence-to-sequence regression problem instead, I could simply use the Pearson correlation coefficient. If I had categorical data, I think I could use the chi-square technique. But I cannot figure out what to do in the case of a sequence-to-one regression problem.
Accepted Answer

Kautuk Raj on 27 Jun 2023

In the case of a sequence-to-one regression problem, where you have multiple time-series features and a single-valued target variable, there are several feature selection techniques you can try. Here are a few ideas; a rough sketch of each one follows the list.

- Correlation analysis: Calculate the Pearson correlation coefficient between each time-series feature and the target variable, and select the features with the highest absolute correlations. This identifies the features with the strongest linear relationship to the target, but it may miss more complex nonlinear relationships.
- Feature importance from a trained model: Train a model using all the available time-series features, then use feature importance techniques to determine which features matter most for its predictions. For example, you can use the importance scores from a random forest or gradient boosting model. This captures both linear and nonlinear relationships between the features and the target variable.
- Principal component analysis (PCA): PCA is a dimensionality reduction technique that identifies the directions explaining the most variance in the data. You can apply PCA to the time-series features and use the top principal components as inputs for your model. This is useful when there are high correlations among the time-series features.
- Forward feature selection: Starting from an empty feature set, iteratively add the most informative time-series feature at each step, based on a predefined criterion such as the improvement in model performance, until the desired number of features is reached. This can be computationally expensive but can lead to a more optimal feature set.
- Lasso regression: Lasso is a sparse regression technique that selects the most important features while also regularizing the coefficients, which reduces the risk of overfitting. It is particularly useful when there are many features and the number of observations is limited.
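A minimal sketch of the correlation idea, using Statistics and Machine Learning Toolbox functions. The names X (a 3000-by-28-by-T array of sequences), y (the 3000-by-1 target), and the choice to collapse each pressure trace to its mean are illustrative assumptions, not part of the original answer; richer summaries (peak, range, area under the curve) may suit pressure curves better.

% Sketch: rank channels by the absolute Pearson correlation between a
% scalar summary of each time series and the target. Assumed layout:
% X is 3000-by-28-by-T (observation x channel x time step), y is 3000-by-1.
summaryX = mean(X, 3);                 % 3000-by-28: one scalar per channel
rho = corr(summaryX, y);               % 28-by-1 Pearson coefficients
[~, order] = sort(abs(rho), 'descend');
topChannels = order(1:5)               % e.g. keep the 5 strongest channels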
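For the model-based importance idea, one possibility (again a sketch, reusing the hypothetical summaryX and y from above) is a bagged tree ensemble with out-of-bag permutation importance:

% Sketch: out-of-bag permutation importance from a bagged tree ensemble
% (Statistics and Machine Learning Toolbox), reusing summaryX and y.
mdl = fitrensemble(summaryX, y, 'Method', 'Bag');
imp = oobPermutedPredictorImportance(mdl);   % one score per channel
[~, order] = sort(imp, 'descend');
topChannels = order(1:5)                     % most important channels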
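The PCA idea, sketched under the same assumptions. Note that principal components are weighted mixtures of all 28 channels, so this reduces input dimensionality rather than picking 3-6 raw time series:

% Sketch: PCA on standardized summary features; the 95% variance
% cutoff is an arbitrary illustrative choice.
[~, score, ~, ~, explained] = pca(zscore(summaryX));
k = find(cumsum(explained) >= 95, 1);  % components covering 95% of variance
pcInputs = score(:, 1:k);              % use these columns as model inputs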
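Forward selection is available as sequentialfs. Retraining a neural network at every step would be slow, so this sketch scores candidate subsets with a cheap linear model as a stand-in; that substitution is my simplification, not something the answer prescribes:

% Sketch: forward selection with sequentialfs; the criterion is the
% cross-validated sum of squared errors of a linear model with intercept.
critfun = @(xtr, ytr, xte, yte) ...
    sum((yte - [ones(size(xte,1),1) xte] * ...
               ([ones(size(xtr,1),1) xtr] \ ytr)).^2);
opts = statset('Display', 'iter');
sel = sequentialfs(critfun, summaryX, y, 'NFeatures', 5, 'Options', opts);
find(sel)                              % indices of the selected channels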
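And the lasso idea, again on the assumed scalar summaries; channels whose coefficients survive the penalty are the selected ones:

% Sketch: cross-validated lasso; Index1SE picks the sparsest model
% within one standard error of the minimum-MSE fit.
[B, fitInfo] = lasso(summaryX, y, 'CV', 10);
selected = find(B(:, fitInfo.Index1SE) ~= 0)   % surviving channel indices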