This example shows how to create a triangular probability distribution object based on sample data, and generate random numbers for use in a simulation.

Input the data vector `time`

, which contains
the observed length of time (in seconds) that 10 different cars stopped
at a highway tollbooth.

time = [6 14 8 7 16 8 23 6 7 15];

The data shows that, while most cars stopped for 6 to 16 seconds, one outlier stopped for 23 seconds.

Estimate the triangular distribution parameters from the sample data.

lower = min(time); peak = median(time); upper = max(time);

A triangular distribution provides a simplistic representation of the probability distribution when sample data is limited. Estimate the lower and upper boundaries of the distribution by finding the minimum and maximum values of the sample data. For the peak parameter, the median might provide a better estimate of the mode than the mean, since the data includes an outlier.

Create a triangular probability distribution object using the estimated parameter values.

pd = makedist('Triangular','a',lower,'b',peak,'c',upper)

pd = TriangularDistribution A = 6, B = 8, C = 23

Compute and plot the pdf of the triangular distribution.

figure; x = 0:.1:230; y = pdf(pd,x); plot(x,y); title('Time Spent at Tollbooth'); xlabel('Time (seconds)');

The plot shows that this triangular distribution is skewed to the right. However, since the estimated peak value is the sample median, the distribution should be symmetrical about the peak. Because of its skew, this model might, for example, generate random numbers that seem unusually high when compared to the initial sample data.

Generate random numbers from this distribution to simulate future traffic flow through the tollbooth.

rng('default'); % For reproducibility r = random(pd,10,1)

r = 16.1265 18.0987 8.0796 18.3001 13.3176 7.8211 9.4360 12.2508 19.7082 20.0078

The returned values in `r`

are the time in
seconds that the next 10 simulated cars spend at the tollbooth. These
values seem high compared to the values in the original data vector `time`

because
the outlier skewed the distribution to the right. Using the second-highest
value as the upper limit parameter might mitigate the effects of the
outlier and generate a set of random numbers more similar to the initial
sample data.

Estimate the upper boundary of the distribution using the second largest value in the sample data.

```
sort_time = sort(time,'descend');
secondLargest = sort_time(2);
```

Create a new triangular probability distribution object using the revised estimated parameters, and plot its pdf.

figure; pd2 = makedist('Triangular','a',lower,'b',peak,'c',secondLargest); y2 = pdf(pd2,x); plot(x,y2,'LineWidth',2); title('Time Spent at Tollbooth'); xlabel('Time (seconds)');

The plot shows that this triangular distribution is still slightly skewed to the right. However, it is much more symmetrical about the peak than the distribution that used the maximum sample data value to estimate the upper limit.

Generate new random numbers from the revised distribution.

rng('default'); % For reproducibility r2 = random(pd2,10,1)

r2 = 12.1501 13.2547 7.5937 13.3675 10.5768 7.3967 8.4026 9.9792 14.1562 14.3240

These new values more closely resemble those in the original
data vector `time`

. They are also closer to the sample
median than the random numbers generated by the distribution that
used the outlier to estimate its upper limit. This example does not
remove the outlier from the sample data when computing the median.
Other options for parameter estimation include removing outliers from
the sample data altogether, or using the mean or mode of the sample
data as the peak value.

Was this topic helpful?