Create Unique Interaction Variables

Efficiently adds N-way, custom-named, *unique* interaction variables to your dataset.
Updated 4 Apr 2012


function data_set = create_interaction_variables(data_set,vars,range_nways,separator,max_varname_length)
% create_interaction_variables
%
% for Matlab R13+
% version 1.1 (April 2012)
% (c) Brian Weidenbaum
% website: http://www.BrianWeidenbaum.com/.
%
%
% OUTPUT: your dataset (or a dataset built from your matrix), updated with new, aptly named *unique* interaction variables,
% with interaction orders ranging from 2 up to any number the user specifies
%
% INPUTS (** = OPTIONAL)
% input name: (input datatype/s) -- description
%
% data_set: (dataset OR matrix) -- the data you want to alter
%
% **vars: (cell array of chars/numbers, OR vector of numbers, OR 'ALL') --
% default: 'ALL'
% the names of the variables you want to interact; alternatively, the column numbers of the variables you want to interact
% OR, you can just say 'ALL' to include all variables automatically
%
% **range_nways: (vector OR 'MAX') --
% default: 2
% the numbers of variables to include in the interaction terms generated by this function
% alternatively, just type 'MAX' to include every order from 2 up to the maximum possible number of vars
%
% **separator: (char) --
% default: '_'
% the string used to separate the names of the variables
% that contributed to a new interaction variable.
% E.g., by default, an interaction between Var1 and Var2 will be
% named 'Var1_Var2'.
%
% **max_varname_length: (number) --
% default: 63
% the maximum length of the newly created interaction terms' variable names.
% Any dynamically generated variable names (e.g. 'Var1_Var2') that exceed this number will be excluded from the new dataset.
% You should set max_varname_length according to the system you plan to use with your data:
% e.g., if you only want to use this data in MATLAB, you should set max_varname_length=63 (the maximum length supported by the dataset class),
% but if you plan to export your data to Oracle, you should set it to around 30.
%
%
%
% EXAMPLES
%
% You have a dataset with 3 variables: a, b, and c.
% You want to create interaction terms, up to 3 ways, for all of your variables.
% You type:
% new_dataset = create_interaction_variables(old_dataset,'all','max');
% new_dataset will contain:
% a.*b, named 'a_b'
% a.*c, named 'a_c'
% b.*c, named 'b_c'
% and a.*b.*c, named 'a_b_c',
% plus all original variables.
% It will NOT contain b.*a, c.*a, etc because these are not unique combos.
%
% You have a dataset with 3 variables: a, b, and c.
% You want to create interaction terms, up to 2 ways, only for columns 2 and 3.
% You type:
% new_dataset = create_interaction_variables(old_dataset,[2 3],2);
% new_dataset will contain only b.*c,
% plus all original variables.
%
% You have a dataset with 3 variables: a, b, and c.
% You want to create interaction terms, up to 2 ways, only for 'a' and 'c'.
% You type:
% new_dataset = create_interaction_variables(old_dataset,{'a','c'},2);
% new_dataset will contain only a.*c,
% plus all original variables.
%
% You have a dataset with 3 variables: a, b, and c.
% You want to create interaction terms, ONLY 3 ways, for all vars.
% You type:
% new_dataset = create_interaction_variables(old_dataset,'all',3);
% new_dataset will contain only a .* b .* c,
% plus all original variables.
%
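
The first example in the help text above can be reproduced by hand with a few lines of base MATLAB. This is only a minimal sketch of the idea, not the function's actual implementation: it assumes a plain numeric matrix `X` and a cell array `names` (both hypothetical stand-ins for a dataset), and uses `nchoosek` to enumerate each combination exactly once, which is why `b_a` or `c_a` never appear.

```matlab
% Hypothetical stand-in data: two observations of three variables a, b, c.
X     = [1 2 3; 4 5 6];
names = {'a', 'b', 'c'};
sep   = '_';                                % separator between variable names

new_cols  = [];                             % interaction columns, appended one by one
new_names = {};                             % matching interaction names

for k = 2:numel(names)                      % interaction orders 2 .. max
    combos = nchoosek(1:numel(names), k);   % each row is one unique combination
    for r = 1:size(combos, 1)
        idx  = combos(r, :);
        name = names{idx(1)};
        for j = 2:k                         % join contributing names with the separator
            name = [name sep names{idx(j)}]; %#ok<AGROW>
        end
        new_cols(:, end+1) = prod(X(:, idx), 2); %#ok<AGROW> % element-wise product
        new_names{end+1}   = name;               %#ok<AGROW>
    end
end

disp(new_names);   % a_b, a_c, b_c, a_b_c
disp(new_cols);
```

Restricting the loop to a single value of `k` corresponds to passing one number as `range_nways`, as in the "ONLY 3 ways" example.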

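The `max_varname_length` rule described in the help text (generated names longer than the limit are excluded) can be illustrated in isolation. The following is a sketch under assumptions: the candidate names and the variable names `candidate_names`, `keep`, and `kept_names` are made up for illustration, and the function itself may apply the check differently.

```matlab
% Hypothetical generated interaction names; only the filtering step is shown.
candidate_names    = {'a_b', 'height_weight', 'a_very_long_name_another_long_name'};
max_varname_length = 30;    % e.g. the "around 30" suggested for an Oracle export

name_lengths = cellfun(@numel, candidate_names);   % length of each generated name
keep         = name_lengths <= max_varname_length; % names exceeding the limit are dropped
kept_names   = candidate_names(keep);

disp(kept_names);   % the 34-character name is excluded
```

In a full workflow the same logical mask would also be used to drop the corresponding interaction columns, so names and data stay aligned.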
Cite As

Brian Weidenbaum (2024). Create Unique Interaction Variables (https://www.mathworks.com/matlabcentral/fileexchange/35981-create-unique-interaction-variables), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2011b
Compatible with any release
Platform Compatibility
Windows macOS Linux
Acknowledgements

Inspired by: allcomb(varargin)

Version Published Release Notes
1.2.0.0

Changes between 1.0 and 1.1:
-added 'separator' parameter
-changed 'max_nways' parameter to 'range_nways'
-enforced maximum max_varname_length
-reduced min number of arguments to 1
-minor performance tweaks

1.0.0.0