Explanation of the JapaneseVowelsTest dataset: which features are used to train the network?

I am working on speech recognition using LSTM networks. I am trying to follow the "Classify Sequence Data Using LSTM Networks" example given in MATLAB R2017b. In this example, the sample data set is JapaneseVowelsTrain. The explanation of the data set gives only the following:
"This example uses the Japanese Vowels data set as described in [1] and [2]. This example trains an LSTM network to recognize the speaker given time series data representing two Japanese vowels spoken in succession. The training data contains time series data for nine speakers. Each sequence has 12 features and varies in length. The dataset contains 270 training observations and 370 test observations."
My question is: what are the 12 features given in the X variable? Can I extract the same features from my own dataset?
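
For context, here is a minimal sketch of how the variable in question can be inspected. It assumes the JapaneseVowelsTrain MAT-file from the R2017b example defines X (a cell array of sequences) and Y (the speaker labels), as the example describes; the name seqLengths is only illustrative.

```matlab
% Assumption: JapaneseVowelsTrain.mat ships with the R2017b example
% and defines X (cell array of sequences) and Y (speaker labels).
load JapaneseVowelsTrain

% Number of training observations (the example states 270).
numObservations = numel(X)

% Each cell holds one sequence: rows are features, columns are time steps.
size(X{1})

% Sequence lengths vary between observations.
seqLengths = cellfun(@(s) size(s, 2), X);
[min(seqLengths) max(seqLengths)]
```

Each cell of X should be a 12-by-T matrix, where T is the length of that sequence. The 12 rows are the features I am asking about.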

