# Split Data into Groups and Calculate Statistics

This example shows how to split data from the `patients.mat` data file into groups. It then shows how to calculate the mean weight and mean body mass index, and the standard deviation of blood pressure readings, for each group of patients. It also shows how to summarize the results in a table.

Load sample data gathered from 100 patients.

`load patients`

Convert `Gender` and `SelfAssessedHealthStatus` to categorical arrays.

```
Gender = categorical(Gender);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
whos
```
```
  Name                            Size    Bytes  Class          Attributes

  Age                           100x1       800  double
  Diastolic                     100x1       800  double
  Gender                        100x1       330  categorical
  Height                        100x1       800  double
  LastName                      100x1     11616  cell
  Location                      100x1     14208  cell
  SelfAssessedHealthStatus      100x1       560  categorical
  Smoker                        100x1       100  logical
  Systolic                      100x1       800  double
  Weight                        100x1       800  double
```

### Calculate Mean Weights

Split the patients into nonsmokers and smokers using the `Smoker` variable. Calculate the mean weight for each group.

```
[G,smoker] = findgroups(Smoker);
meanWeight = splitapply(@mean,Weight,G)
```
```
meanWeight = 2×1

  149.9091
  161.9412
```

The `findgroups` function returns `G`, a vector of group numbers created from `Smoker`. The `splitapply` function uses `G` to split `Weight` into two groups. `splitapply` applies the `mean` function to each group and concatenates the mean weights into a vector.
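The split-apply-combine step described above can be sketched on a few rows of made-up data. The variables `isSmoker` and `weight` below are hypothetical and are not taken from `patients.mat`; the manual calculation at the end is included only to illustrate what `splitapply` does for each group.

```matlab
% Hypothetical toy data (not from patients.mat)
isSmoker = [false; true; false; true; false];
weight   = [120; 180; 130; 200; 140];

% findgroups assigns group numbers in sorted order: false -> 1, true -> 2
G = findgroups(isSmoker);

% splitapply splits weight by group number, applies mean to each
% group, and concatenates the results into a column vector
meanW = splitapply(@mean, weight, G);

% The same result computed by hand with logical indexing
manual = [mean(weight(G == 1)); mean(weight(G == 2))];
```

Here `meanW` and `manual` hold the same two values, one mean weight per group.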

`findgroups` returns a vector of group identifiers as the second output argument. The group identifiers are logical values because `Smoker` contains logical values. The patients in the first group are nonsmokers, and the patients in the second group are smokers.

`smoker`
```
smoker = 2x1 logical array

   0
   1
```

Split the patient weights by both gender and smoking status, and calculate the mean weight for each group.

```
G = findgroups(Gender,Smoker);
meanWeight = splitapply(@mean,Weight,G)
```
```
meanWeight = 4×1

  130.3250
  130.9231
  180.0385
  181.1429
```

The unique combinations across `Gender` and `Smoker` identify four groups of patients: female nonsmokers, female smokers, male nonsmokers, and male smokers. Summarize the four groups and their mean weights in a table.

```
[G,gender,smoker] = findgroups(Gender,Smoker);
T = table(gender,smoker,meanWeight)
```
```
T=4×3 table
    gender    smoker    meanWeight
    ______    ______    __________

    Female    false       130.32
    Female    true        130.92
    Male      false       180.04
    Male      true        181.14
```

`T.gender` contains categorical values, and `T.smoker` contains logical values. The data types of these table variables match the data types of `Gender` and `Smoker` respectively.
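As a small illustration of grouping by two variables, the following sketch uses hypothetical toy inputs `g` and `s` (not from `patients.mat`). It shows that `findgroups` creates a group only for combinations that actually occur in the data, and that each identifier output keeps the data type of its grouping variable.

```matlab
% Hypothetical toy data (not from patients.mat)
g = categorical({'F'; 'M'; 'F'; 'M'; 'F'});
s = [false; true; true; true; false];

% One group per unique (g, s) combination that occurs in the data.
% Here only three of the four possible combinations are present:
% (F,false), (F,true), and (M,true).
[G, gID, sID] = findgroups(g, s);

numGroups = max(G);          % number of groups found
isCat     = iscategorical(gID);  % identifiers for g stay categorical
isLog     = islogical(sID);      % identifiers for s stay logical
```

Because the combination of `M` and `false` never occurs in this toy data, `gID` and `sID` each have three rows rather than four.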

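Calculate the mean body mass index (BMI) for the four groups of patients. Define a function that takes `Height` and `Weight` as its two input arguments and returns the mean BMI. The factor 703 converts the result to the usual BMI units, assuming that height is in inches and weight is in pounds.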

```
meanBMIfcn = @(h,w)mean((w ./ (h.^2)) * 703);
BMI = splitapply(meanBMIfcn,Height,Weight,G)
```
```
BMI = 4×1

   21.6721
   21.6686
   26.5775
   26.4584
```
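When the function takes several inputs, `splitapply` splits each data variable by the same group numbers and passes the matching pieces to the function together. The sketch below uses hypothetical toy values `h`, `w`, and `G` (assumed to be in inches and pounds) and compares this with an alternative that computes a per-patient BMI first and then averages it by group.

```matlab
% Hypothetical toy data (not from patients.mat)
h = [60; 70; 60; 70];       % heights, assumed to be in inches
w = [120; 180; 140; 160];   % weights, assumed to be in pounds
G = [1; 2; 1; 2];           % group numbers

% Two-input function: splitapply passes h and w for one group at a time
bmiFcn = @(h,w) mean((w ./ (h.^2)) * 703);
bmiA = splitapply(bmiFcn, h, w, G);

% Alternative: compute BMI per patient, then take the mean per group
bmiPerPatient = (w ./ (h.^2)) * 703;
bmiB = splitapply(@mean, bmiPerPatient, G);

% The two approaches agree up to floating-point rounding
maxDiff = max(abs(bmiA - bmiB));
```

The second form is convenient when the per-patient values are also needed elsewhere; the first keeps the whole calculation in a single call.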

### Group Patients Based on Self-Reports

Calculate the fraction of patients who report their health as either `Poor` or `Fair`. First, use `splitapply` to count the number of patients in each group: female nonsmokers, female smokers, male nonsmokers, and male smokers. Then, count only those patients who report their health as either `Poor` or `Fair`, using logical indexing on `S` and `G`. From these two sets of counts, calculate the fraction for each group.

```
[G,gender,smoker] = findgroups(Gender,Smoker);
S = SelfAssessedHealthStatus;
I = ismember(S,{'Poor','Fair'});
numPatients = splitapply(@numel,S,G);
numPF = splitapply(@numel,S(I),G(I));
numPF./numPatients
```
```
ans = 4×1

    0.2500
    0.3846
    0.3077
    0.1429
```
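The counting approach can be checked on a small example. The sketch below uses hypothetical toy values `status` and `G` (not from `patients.mat`) and also shows one possible alternative: because the mean of a logical vector is the fraction of `true` values, applying `mean` to the logical index by group gives the same fractions in a single step.

```matlab
% Hypothetical toy data (not from patients.mat)
status = categorical({'Poor'; 'Good'; 'Fair'; 'Excellent'; 'Good'; 'Fair'});
G = [1; 1; 1; 2; 2; 2];

% Logical index of patients reporting Poor or Fair health
I = ismember(status, {'Poor','Fair'});

% Approach used above: count all patients, then count the indexed subset
n    = splitapply(@numel, status, G);
nPF  = splitapply(@numel, status(I), G(I));
fracCounts = nPF ./ n;

% Alternative: the mean of a logical vector is the fraction of true values
fracMean = splitapply(@mean, I, G);

maxDiff = max(abs(fracCounts - fracMean));
```

One reason to consider the alternative: `splitapply` requires every integer from 1 through `max(G)` to appear in the group vector, so the indexed call with `G(I)` can fail or return fewer rows if some group contains no `Poor` or `Fair` reports. Applying `mean` to `I` with the full `G` avoids that case and returns 0 for such a group.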

Compare the standard deviation in `Diastolic` readings of those patients who report `Poor` or `Fair` health, and those patients who report `Good` or `Excellent` health.

```
stdDiastolicPF = splitapply(@std,Diastolic(I),G(I));
stdDiastolicGE = splitapply(@std,Diastolic(~I),G(~I));
```

Collect the results in a table. For these patients, the female nonsmokers who report `Poor` or `Fair` health show the widest variation in diastolic blood pressure readings, with a standard deviation of about 6.89.

`T = table(gender,smoker,numPatients,numPF,stdDiastolicPF,stdDiastolicGE,BMI)`
```
T=4×7 table
    gender    smoker    numPatients    numPF    stdDiastolicPF    stdDiastolicGE     BMI
    ______    ______    ___________    _____    ______________    ______________    ______

    Female    false         40          10          6.8872            3.9012        21.672
    Female    true          13           5          5.4129            5.0409        21.669
    Male      false         26           8          4.2678            4.8159        26.578
    Male      true          21           3          5.6862             5.258        26.458
```