

Class: dataset

(Not Recommended) Print summary of dataset array

The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See the MATLAB table documentation for more information.
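If existing code already builds a dataset array, one possible migration path is to convert it with dataset2table and call summary on the resulting table. This sketch assumes a release that provides dataset2table (R2013b or later) and the Statistics and Machine Learning Toolbox. The table summary uses its own output format, which differs from the dataset output described on this page.

```matlab
% Sketch: move from a dataset array to a table before summarizing.
% Assumes dataset2table is available (R2013b or later).
load hospital                  % loads the dataset array "hospital"
T = dataset2table(hospital);   % convert the dataset array to a table
summary(T)                     % table summary; format differs from dataset
```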


Syntax

summary(A)
s = summary(A)


Description

summary(A) prints a summary of a dataset array and the variables that it contains.

s = summary(A) returns a scalar structure s that contains a summary of the dataset A and the variables that A contains. For more information on the fields in s, see Output Arguments.
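A minimal sketch of the two calling forms, using a small numeric variable named x (an illustrative name, not part of the original page). The first call prints the summary; the second stores the same information in a structure whose fields are listed under Output Arguments.

```matlab
% Requires Statistics and Machine Learning Toolbox (dataset arrays).
x = [3; 1; 4; 1; 5; 9; 2; 6];   % illustrative numeric data
A = dataset(x);                  % variable name is taken from the input, "x"

summary(A)                       % form 1: print the summary

s = summary(A);                  % form 2: capture it as a scalar structure
s.Variables(1).Name              % name of the first variable
s.Variables(1).Data.Quantiles    % min, quartiles, and max of x
```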

Summary information depends on the type of each variable in the dataset array:

  • For numeric variables, summary computes a five-number summary of the data: the minimum, the first quartile, the median, the third quartile, and the maximum.

  • For logical variables, summary counts the number of true and false values.

  • For categorical variables, summary counts the number of observations at each level.
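The sketch below builds a dataset array with one variable of each supported type so the three kinds of summary can be compared side by side. The variable names and values are hypothetical.

```matlab
% Hypothetical data: one numeric, one logical, one categorical variable.
Height  = [150; 162; 171; 180; 158];           % numeric -> five-number summary
IsAdult = [false; true; true; true; false];    % logical -> true/false counts
Group   = nominal({'a'; 'b'; 'a'; 'b'; 'a'});  % categorical -> counts per level

A = dataset(Height, IsAdult, Group);
summary(A)          % prints a different kind of summary for each variable

s = summary(A);     % the same information as a structure
```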

Output Arguments

The following list describes the fields in the structure s:

  • Description — A character array containing the dataset description.

  • Variables — A structure array with one element for each dataset variable in A. Each element has the following fields:

    • Name — A character vector containing the name of the variable.

    • Description — A character vector containing the variable's description.

    • Units — A character vector containing the variable's units.

    • Size — A numeric vector containing the size of the variable.

    • Class — A character vector containing the class of the variable.

    • Data — A scalar structure containing the following fields.

      For numeric variables:

      • Probabilities — A numeric vector containing the probabilities [0 0.25 0.5 0.75 1], followed by NaN if the corresponding dataset variable contains any NaN values.

      • Quantiles — A numeric vector containing the quantiles of the corresponding dataset variable at the values in 'Probabilities', followed by the number of NaN values if any are present.

      For logical variables:

      • Values — The logical vector [true false].

      • Counts — A numeric vector of counts for each logical value.

      For categorical variables:

      • Levels — A cell array containing the labels for each level of the corresponding dataset variable.

      • Counts — A numeric vector of counts for each level.

      'Data' is empty if the variable is not numeric, categorical, or logical. If a dataset variable has more than one column, the corresponding 'Quantiles' or 'Counts' field is a matrix or an array.
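The following sketch shows how the fields above can be read programmatically. It uses a hypothetical numeric variable y that contains one NaN, so, per the field descriptions, 'Probabilities' should end with NaN and 'Quantiles' should end with the NaN count.

```matlab
% Hypothetical numeric variable with one missing value.
y = [2; NaN; 4; 6; 8];
A = dataset(y);
s = summary(A);

v = s.Variables(1);        % one structure element per dataset variable
v.Name                     % variable name
v.Size                     % size of the variable
v.Class                    % class of the variable

p = v.Data.Probabilities   % probabilities, plus NaN because y has a NaN
q = v.Data.Quantiles       % min, quartiles, max, plus the NaN count
```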


Examples

Summarize Fisher's iris data:

load fisheriris
species = nominal(species);
data = dataset(species,meas);
summary(data)

species: [150x1 nominal]
  setosa   versicolor   virginica
      50           50          50
meas: [150x4 double]
  min       4.3000         2         1    0.1000 
  1st Q     5.1000    2.8000    1.6000    0.3000 
  median    5.8000         3    4.3500    1.3000 
  3rd Q     6.4000    3.3000    5.1000    1.8000 
  max       7.9000    4.4000    6.9000    2.5000
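A variation on the example above, offered as a sketch: summarizing only the setosa observations. Because species is a nominal variable, the subset keeps all three levels, so the two unused levels should appear with zero counts.

```matlab
load fisheriris
species = nominal(species);
data = dataset(species, meas);

% Select the setosa rows with a logical index on the nominal variable.
setosa = data(data.species == 'setosa', :);
summary(setosa)

s = summary(setosa);
s.Variables(1).Data.Counts     % 50 for setosa, 0 for the other levels
```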

Summarize the data in hospital.mat:

load hospital
summary(hospital)

Dataset array created from the data file hospital.dat.

The first column of the file ("id") is used for observation
names.  Other columns ("sex" and "smoke") have been 
converted from their original coded values into categorical
and logical variables.  Two sets of columns ("sys" and 
"dia", "trial1" through "trial4") have been combined into 
single variables with multivariate observations.  Column 
headers have been replaced with more descriptive variable 
names.  Units have been added where appropriate.

LastName: [100x1 cell array of character vectors]
Sex: [100x1 nominal]
     Female      Male 
         53        47 

Age: [100x1 double, Units = Yrs]
     min      1st Q      median      3rd Q      max
      25         32          39         44       50

Weight: [100x1 double, Units = Lbs]
     min      1st Q         median        3rd Q        max
     111      130.5000      142.5000      180.5000     202

Smoker: [100x1 logical]
     true      false 
       34         66 

BloodPressure: [100x2 double, Units = mm Hg]
     min              109           68 
     1st Q       117.5000      77.5000 
     median           122      81.5000 
     3rd Q       127.5000           89 
     max              138           99 

Trials: [100x1 cell, Units = Counts]
From zero to four measurement trials performed
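The printed output above can also be checked programmatically. This sketch looks up variables by name in the returned structure and relies only on the Name, Units, Size, and Data fields described under Output Arguments.

```matlab
load hospital
s = summary(hospital);
names = {s.Variables.Name};          % cell array of variable names

age = s.Variables(strcmp(names, 'Age'));
age.Units                            % units stored with the variable

bp = s.Variables(strcmp(names, 'BloodPressure'));
bp.Size                              % two columns: systolic and diastolic

smoker = s.Variables(strcmp(names, 'Smoker'));
smoker.Data.Counts                   % counts for true and false

trials = s.Variables(strcmp(names, 'Trials'));
isempty(trials.Data)                 % cell variables get no Data summary
```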
