histcounts

Histogram bin counts

Syntax

[N,edges] = histcounts(X)

[N,edges] = histcounts(X,nbins)

[N,edges] = histcounts(X,edges)

[N,edges,bin] = histcounts(___)

N = histcounts(C)

N = histcounts(C,Categories)

[N,Categories] = histcounts(___)

[___] = histcounts(___,Name,Value)

Description

[N,edges] = histcounts(X) partitions the X values into bins and returns the bin counts and the bin edges. The histcounts function uses an automatic binning algorithm that returns uniform bins chosen to cover the range of elements in X and reveal the underlying shape of the distribution.

[N,edges] = histcounts(X,nbins) uses a number of bins specified by the scalar, nbins.

[N,edges] = histcounts(X,edges) sorts X into bins with the bin edges specified by the vector, edges.

[N,edges,bin] = histcounts(___) also returns an index array, bin, using any of the previous syntaxes. bin is an array of the same size as X whose elements are the bin indices for the corresponding elements in X. The number of elements in the kth bin is nnz(bin==k), which is the same as N(k).
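The relationship between bin and N can be checked directly. The following is a minimal sketch using made-up data and edges (both are assumptions chosen for illustration); it rebuilds the counts from the index array and compares them with N.

```matlab
% Illustrative data and edges (not taken from the examples on this page).
X = [1 2 2 3 5 5 5 8];
edges = [0 2 4 6 8];

% Request the third output to get the bin index of every element.
[N,edges,bin] = histcounts(X,edges);

% Rebuild the counts from the indices: one tally per bin number.
% Every element lies inside the edges here, so bin contains no zeros.
Ncheck = accumarray(bin(:),1,[numel(N) 1]).';

disp(N)                    % counts per bin
disp(isequal(N,Ncheck))    % both tallies agree
```

Because each element of bin names the bin that holds the matching element of X, tallying the indices is equivalent to counting with nnz(bin==k) for each k.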

N = histcounts(C), where C is a categorical array, returns a vector, N, that indicates the number of elements in C whose value is equal to each of C’s categories. N has one element for each category in C.

N = histcounts(C,Categories) counts only the elements in C whose value is equal to the subset of categories specified by Categories.

[N,Categories] = histcounts(___) also returns the categories that correspond to each count in N using either of the previous syntaxes for categorical arrays.

[___] = histcounts(___,Name,Value) specifies additional parameters using one or more name-value arguments. For example, you can specify BinWidth as a scalar to adjust the width of the bins for numeric data.

Examples

Bin Counts and Bin Edges

Distribute 100 random values into bins. histcounts automatically chooses an appropriate bin width to reveal the underlying distribution of the data.

X = randn(100,1);
[N,edges] = histcounts(X)

N = 1×7

     2    17    28    32    16     3     2

edges = 1×8

    -3    -2    -1     0     1     2     3     4

Specify Number of Bins

Distribute 10 numbers into 6 equally spaced bins.

X = [2 3 5 7 11 13 17 19 23 29];
[N,edges] = histcounts(X,6)

N = 1×6

     2     2     2     2     1     1

edges = 1×7

         0    4.9000    9.8000   14.7000   19.6000   24.5000   29.4000

Specify Bin Edges

Distribute 1,000 random numbers into bins. Define the bin edges with a vector, where the first element is the left edge of the first bin, and the last element is the right edge of the last bin.

X = randn(1000,1);
edges = [-5 -4 -2 -1 -0.5 0 0.5 1 2 4 5];
N = histcounts(X,edges)

N = 1×10

     0    24   149   142   195   200   154   111    25     0

Normalized Bin Counts

Distribute all of the prime numbers less than 100 into bins. Specify 'Normalization' as 'probability' to normalize the bin counts so that sum(N) is 1. That is, each bin count represents the probability that an observation falls within that bin.

X = primes(100);
[N,edges] = histcounts(X, 'Normalization', 'probability')

N = 1×4

    0.4000    0.2800    0.2800    0.0400

edges = 1×5

     0    30    60    90   120

Determine Bin Placement

Distribute 100 random integers between -5 and 5 into bins, and specify 'BinMethod' as 'integers' to use unit-width bins centered on integers. Specify a third output for histcounts to return a vector representing the bin indices of the data.

X = randi([-5,5],100,1);
[N,edges,bin] = histcounts(X,'BinMethod','integers');

Find the bin count for the third bin by counting the occurrences of the number 3 in the bin index vector, bin. The result is the same as N(3).

count = nnz(bin==3)

count = 8

Categorical Bin Counts

Create a categorical vector that represents votes. The categories in the vector are 'yes', 'no', or 'undecided'.

A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A,[1 0 NaN],{'yes','no','undecided'})

C = 1×27 categorical
     no      no      yes      yes      yes      no      no      no      no      undecided      undecided      yes      no      no      no      yes      no      yes      no      yes      no      no      no      yes      yes      yes      yes

Determine the number of elements that fall into each category.

[N,Categories] = histcounts(C)

N = 1×3

    11    14     2

Categories = 1×3 cell
    {'yes'}    {'no'}    {'undecided'}

Input Arguments

`X` — Data to distribute among bins
vector | matrix | multidimensional array

Data to distribute among bins, specified as a vector, matrix, or multidimensional array. If X is not a vector, then histcounts treats it as a single column vector, X(:).

histcounts ignores all NaN values. Similarly, histcounts ignores Inf and -Inf values unless the bin edges explicitly specify Inf or -Inf as a bin edge.
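A short sketch of this behavior, using an illustrative vector that mixes finite values with NaN, -Inf, and Inf (the data and edge vectors are assumptions made for the example):

```matlab
X = [NaN -Inf 1 2 3 Inf];

% Finite edges: NaN, -Inf, and Inf are all left out of the counts.
Nfinite = histcounts(X,[0 2 4]);

% Edges that include -Inf and Inf: the infinite values are counted,
% while NaN is still ignored.
Ninf = histcounts(X,[-Inf 0 2 4 Inf]);

fprintf('%d of %d values counted with finite edges\n',sum(Nfinite),numel(X));
fprintf('%d of %d values counted with infinite edges\n',sum(Ninf),numel(X));
```

Comparing sum(N) with numel(X) in this way is a quick check for values that were silently dropped.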

`C` — Categorical data
categorical array

Categorical data, specified as a categorical array. histcounts ignores undefined categorical values.

Data Types: categorical

`nbins` — Number of bins
positive integer

Number of bins, specified as a positive integer. If you do not specify nbins, then histcounts automatically calculates how many bins to use based on the values in X.

Example: [N,edges] = histcounts(X,15) uses 15 bins.

`edges` — Bin edges
vector

Bin edges, specified as a vector. edges(1) is the leading edge of the first bin, and edges(end) is the trailing edge of the last bin.

Each bin includes the leading edge, but does not include the trailing edge, except for the last bin which includes both edges.

For datetime and duration data, edges must be a datetime or duration vector in monotonically increasing order.
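To make the edge-inclusion rule concrete, the sketch below places values exactly on the bin edges. The data is made up for illustration.

```matlab
% Values chosen to sit exactly on the edges.
X = [0 1 1 2 2 2 3];
edges = [0 1 2 3];

N = histcounts(X,edges);

% Bin 1 is [0,1): holds the 0
% Bin 2 is [1,2): holds both 1s
% Bin 3 is [2,3]: holds the three 2s and the 3 (last bin includes both edges)
disp(N)
```

A value that equals an interior edge is counted in the bin to its right, and only the final edge is inclusive on the right.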

`Categories` — Categories included in count
all categories (default) | string vector | cell vector of character vectors | `pattern` scalar | categorical vector

Categories included in count, specified as a string vector, cell vector of character vectors, pattern scalar, or categorical vector. By default, histcounts uses a bin for each category in categorical array C. Use Categories to specify a unique subset of the categories instead.

Example: h = histcounts(C,["Large","Small"]) counts only the categorical data in the categories Large and Small.

Example: h = histcounts(C,"Y" + wildcardPattern) counts categorical data in all the categories whose names begin with the letter Y.

Data Types: string | cell | pattern | categorical
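As a hedged illustration of counting a subset of categories, the sketch below builds a small categorical array from made-up size labels and counts only two of its three categories.

```matlab
% Illustrative labels; categorical sorts the category names alphabetically
% when it is built from a string array (Large, Medium, Small).
C = categorical(["Small","Large","Medium","Large","Small","Large"]);

% Count every category.
Nall = histcounts(C);

% Count only the elements that belong to Large or Small.
[Nsub,cats] = histcounts(C,["Large","Small"]);

disp(Nall)
disp(Nsub)
disp(cats)
```

Elements in categories that are not listed (Medium in this sketch) are simply left out of the result.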

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: [N,edges] = histcounts(X,'Normalization','probability') normalizes the bin counts in N, such that sum(N) is 1.

`BinWidth` — Width of bins
positive scalar

Width of bins, specified as a positive scalar. If you specify BinWidth, then histcounts can use a maximum of 65,536 bins (or 2¹⁶). If the specified bin width requires more bins, then histcounts uses a larger bin width corresponding to the maximum number of bins.

For datetime and duration data, BinWidth can be a scalar duration or calendar duration.
If you specify BinWidth with BinMethod, NumBins, or BinEdges, histcounts only honors the last parameter.
This option does not apply to categorical data.

Example: histcounts(X,'BinWidth',5) uses bins with a width of 5.
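A minimal sketch of BinWidth with made-up data follows. It checks only properties that follow from the description above (uniform width and complete coverage) rather than assuming where histcounts places the first edge.

```matlab
X = [1 3 4 7 8 12 14 19];

% Ask for bins that are 5 units wide; histcounts picks the edge positions.
[N,edges] = histcounts(X,'BinWidth',5);

widths = diff(edges);      % width of every bin
disp(edges)
disp(N)
fprintf('All widths equal 5: %d\n',all(abs(widths - 5) < 1e-12));
fprintf('All values counted: %d\n',sum(N) == numel(X));
```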

`BinEdges` — Edges of bins
numeric vector

Edges of bins, specified as a numeric vector. The first element specifies the leading edge of the first bin. The last element specifies the trailing edge of the last bin. The trailing edge is only included for the last bin.

If you do not specify the bin edges, then histcounts automatically determines the bin edges.

If BinCountsMode is "manual", then BinEdges must be a row vector.

If you specify BinEdges with BinMethod, BinWidth, NumBins, or BinLimits, histcounts honors only BinEdges, and BinEdges must be specified last.
This option does not apply to categorical data.

`BinLimits` — Bin limits
two-element vector

Bin limits, specified as a two-element vector, [bmin,bmax]. The first element indicates the first bin edge. The second element indicates the last bin edge.

This option computes using only the data that falls within the bin limits inclusively, X>=bmin & X<=bmax.

This option does not apply to categorical data.

Example: histcounts(X,'BinLimits',[1,10]) bins only the values in X that are between 1 and 10 inclusive.
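The sketch below, using illustrative data, shows that values outside the limits are excluded while values equal to bmin or bmax are kept.

```matlab
X = [0 1 2 5 9 10 11 15];

% Only values with 1 <= X <= 10 take part in the binning.
[N,edges] = histcounts(X,'BinLimits',[1,10]);

% Count the values inside the limits by hand for comparison.
inLimits = nnz(X >= 1 & X <= 10);

disp(edges([1 end]))       % first and last edge match the limits
fprintf('%d values binned, %d values inside the limits\n',sum(N),inLimits);
```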

`BinMethod` — Binning algorithm
`'auto'` (default) | `'scott'` | `'fd'` | `'integers'` | `'sturges'` | `'sqrt'` | ...

Binning algorithm, specified as one of the values in this table.

Value	Description
`'auto'`	The default `'auto'` algorithm chooses a bin width to cover the data range and reveal the shape of the underlying distribution.
`'scott'`	Scott’s rule is optimal if the data is close to being normally distributed. This rule is appropriate for most other distributions, as well. It uses a bin width of `3.5*std(X(:))*numel(X)^(-1/3)`.
`'fd'`	The Freedman-Diaconis rule is less sensitive to outliers in the data, and might be more suitable for data with heavy-tailed distributions. It uses a bin width of `2*iqr(X(:))*numel(X)^(-1/3)`, or when `X` contains extreme outliers, `0.2*(max(X(:))-min(X(:)))*numel(X)^(-1/3)`.
`'integers'`	The integer rule is useful with integer data, as it creates a bin for each integer. It uses a bin width of 1 and places bin edges halfway between integers. To avoid accidentally creating too many bins, this rule is limited to 65,536 bins (2¹⁶). If the data range is greater than 65,536, then the integer rule uses wider bins instead. `'integers'` does not support datetime or duration data.
`'sturges'`	Sturges’ rule is popular due to its simplicity. It chooses the number of bins to be `ceil(1 + log2(numel(X)))` or `1`, whichever is greater.
`'sqrt'`	The Square Root rule is widely used in other software packages. It chooses the number of bins to be `ceil(sqrt(numel(X)))` or `1`, whichever is greater.

histcounts adjusts the number of bins slightly so that the bin edges fall on "nice" numbers, rather than using these exact formulas.
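To see how the raw formulas compare with what histcounts actually returns, the sketch below evaluates the Scott, Sturges, and Square Root rules by hand on illustrative data and prints them next to the results from histcounts. The two can differ, since histcounts moves the edges to "nice" numbers.

```matlab
X = (1:1000).';            % illustrative data; any numeric vector works
n = numel(X);

% Raw values from the formulas in the table.
wScott   = 3.5*std(X(:))*n^(-1/3);        % Scott bin width
nSturges = max(ceil(1 + log2(n)),1);      % Sturges bin count
nSqrt    = max(ceil(sqrt(n)),1);          % Square Root bin count

% What histcounts chooses for the same rules.
[~,eScott]   = histcounts(X,'BinMethod','scott');
[~,eSturges] = histcounts(X,'BinMethod','sturges');
[~,eSqrt]    = histcounts(X,'BinMethod','sqrt');

fprintf('Scott:   raw width %.2f, histcounts width %.2f\n', ...
    wScott,eScott(2) - eScott(1));
fprintf('Sturges: raw bins %d, histcounts bins %d\n', ...
    nSturges,numel(eSturges) - 1);
fprintf('Sqrt:    raw bins %d, histcounts bins %d\n', ...
    nSqrt,numel(eSqrt) - 1);
```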

For datetime or duration data, specify the binning algorithm as one of these units of time.

Value	Description	Data Type
`"second"`	Each bin is 1 second.	`datetime` and `duration`
`"minute"`	Each bin is 1 minute.	`datetime` and `duration`
`"hour"`	Each bin is 1 hour.	`datetime` and `duration`
`"day"`	Each bin is 1 calendar day. This value accounts for daylight saving time shifts.	`datetime` and `duration`
`"week"`	Each bin is 1 calendar week.	`datetime` only
`"month"`	Each bin is 1 calendar month.	`datetime` only
`"quarter"`	Each bin is 1 calendar quarter.	`datetime` only
`"year"`	Each bin is 1 calendar year. This value accounts for leap days.	`datetime` and `duration`
`"decade"`	Each bin is 1 decade (10 calendar years).	`datetime` only
`"century"`	Each bin is 1 century (100 calendar years).	`datetime` only

If you specify BinMethod for datetime or duration data, then histcounts can use a maximum of 65,536 bins (or 2¹⁶). If the specified bin duration requires more bins, then histcounts uses a larger bin width corresponding to the maximum number of bins.
If you specify BinLimits, NumBins, BinEdges, or BinWidth, then BinMethod is set to 'manual'.
If you specify BinMethod with BinWidth, NumBins or BinEdges, histcounts only honors the last parameter.
This option does not apply to categorical data.

Example: histcounts(X,'BinMethod','integers') centers the bins on integers.
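For datetime data, a unit of time can be passed as the bin method. The following is a minimal sketch with made-up timestamps; the start time and offsets are assumptions chosen for illustration.

```matlab
% Six illustrative timestamps spread over a little more than two hours.
t0 = datetime(2024,1,1,0,0,0);
t  = t0 + minutes([5 20 65 70 75 130]);

% Use one-hour bins.
[N,edges] = histcounts(t,'BinMethod','hour');

disp(edges)                % datetime bin edges
disp(N)                    % number of timestamps in each hour
```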

`Normalization` — Type of normalization
`'count'` (default) | `'probability'` | `'percentage'` | `'countdensity'` | `'cumcount'` | `'pdf'` | `'cdf'`

Type of normalization, specified as one of the values in this table. For each bin i:

$v_{i}$ is the bin value.
$c_{i}$ is the number of elements in the bin.
$w_{i}$ is the width of the bin.
$N$ is the number of elements in the input data. This value can be greater than the binned data if the data contains missing values, such as NaN, or if some of the data lies outside the bin limits.

Value	Bin Values	Notes
`'count'` (default)	$v_{i} = c_{i}$	Count or frequency of observations. Sum of bin values is at most `numel(X)`, or `sum(ismember(C(:),Categories))` for categorical data. The sum is less than this only when some of the input data is not included in the bins.
`'probability'`	$v_{i} = \frac{c_{i}}{N}$	Relative probability. The number of elements in each bin relative to the total number of elements in the input data is at most 1.
`'percentage'`	$v_{i} = 100 \cdot \frac{c_{i}}{N}$	Relative percentage. The percentage of elements in each bin is at most 100.
`'countdensity'`	$v_{i} = \frac{c_{i}}{w_{i}}$	Count or frequency scaled by width of bin. For categorical data, this is the same as `'count'`. `'countdensity'` does not support `datetime` or `duration` data. The sum of the bin areas is at most `numel(X)`.
`'cumcount'`	$v_{i} = \sum_{j = 1}^{i} c_{j}$	Cumulative count, or the number of observations in each bin and all previous bins. `N(end)` is at most `numel(X)`, or `sum(ismember(C(:),Categories))` for categorical data.
`'pdf'`	$v_{i} = \frac{c_{i}}{N \cdot w_{i}}$	Probability density function estimate. For categorical data, this is the same as `'probability'`. `'pdf'` does not support `datetime` or `duration` data. The sum of the bin areas is at most `1`.
`'cdf'`	$v_{i} = \sum_{j = 1}^{i} \frac{c_{j}}{N}$	Cumulative distribution function estimate. The count of each bin is equal to the cumulative relative number of observations in the bin and all previous bins. `N(end)` is at most 1.

Example: histcounts(X,'Normalization','pdf') bins the data using an estimate of the probability density function.
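The formulas in the table can be reproduced from the raw counts. The sketch below uses illustrative data with deliberately unequal bin widths, so that the width-scaled normalizations differ from the others; all values lie inside the edges, so N in the formulas equals numel(X).

```matlab
X = [1 2 2 3 3 3 4 4 4 4];     % illustrative data, 10 elements
edges = [0 2 3 5];             % bin widths are 2, 1, and 2

c = histcounts(X,edges);       % raw counts c_i
w = diff(edges);               % bin widths w_i
n = numel(X);                  % N in the formulas

% Values returned by histcounts for each normalization.
vProb = histcounts(X,edges,'Normalization','probability');
vDens = histcounts(X,edges,'Normalization','countdensity');
vPdf  = histcounts(X,edges,'Normalization','pdf');
vCdf  = histcounts(X,edges,'Normalization','cdf');
vCum  = histcounts(X,edges,'Normalization','cumcount');

% The same quantities computed by hand from the table formulas.
tol = 1e-12;
disp(all(abs(vProb - c/n)         < tol))
disp(all(abs(vDens - c./w)        < tol))
disp(all(abs(vPdf  - c./(n*w))    < tol))
disp(all(abs(vCdf  - cumsum(c)/n) < tol))
disp(isequal(vCum,cumsum(c)))
```

With 'pdf', the bin areas sum(vPdf.*w) add up to 1 here because no data falls outside the bins.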

`NumBins` — Number of bins
positive integer

Number of bins, specified as a positive integer. If you do not specify NumBins, then histcounts automatically calculates how many bins to use based on the input data.

If you specify NumBins with BinMethod, BinWidth or BinEdges, histcounts only honors the last parameter.
This option does not apply to categorical data.

Output Arguments

`N` — Bin counts
row vector

Bin counts, returned as a row vector.

`edges` — Bin edges
vector

Bin edges, returned as a vector. The first element is the leading edge of the first bin. The last element is the trailing edge of the last bin.

`bin` — Bin indices
array

Bin indices, returned as an array of the same size as X. Each element in bin describes which numbered bin contains the corresponding element in X.

A value of 0 in bin indicates an element which does not belong to any of the bins (for example, a NaN value).
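A brief sketch of the zero index, using made-up data that contains a NaN and a value beyond the last edge:

```matlab
X = [NaN 0.5 1.5 2.5 7];
edges = [0 1 2 3];

[N,~,bin] = histcounts(X,edges);

% NaN and 7 fall in no bin, so their indices are 0.
disp(bin)
% Elements with a nonzero index are the ones that were counted.
fprintf('%d of %d elements were binned\n',nnz(bin),numel(X));
```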

`Categories` — Categories included in count
cell vector of character vectors

Categories included in count, returned as a cell vector of character vectors. Categories contains the categories in C that correspond to each count in N.

Tips

The behavior of histcounts is similar to that of the discretize function. Use histcounts to find the number of elements in each bin. On the other hand, use discretize to find which bin each element belongs to (without counting).
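The sketch below contrasts the two functions on the same illustrative data and edges. One difference worth noting is how each marks an element that falls outside the edges: the bin output of histcounts uses 0, while discretize returns NaN.

```matlab
X = [0.5 1.5 2.5 7];
edges = [0 1 2 3];

% histcounts: counts per bin, with bin indices as an optional third output.
[N,~,bin] = histcounts(X,edges);

% discretize: bin index for each element, without counting.
Y = discretize(X,edges);

disp(N)      % number of elements in each bin
disp(bin)    % out-of-range element is marked with 0
disp(Y)      % out-of-range element is marked with NaN
```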

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

The histcounts function supports tall arrays with the following usage notes and limitations:

Some input options are not supported. The allowed options are:
- BinWidth
- BinLimits
- Normalization
- BinMethod — The 'auto' and 'scott' bin methods are the same. The 'fd' bin method is not supported.

For more information, see Tall Arrays.

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

Code generation does not support sparse matrix inputs for this function.
If you do not supply bin edges, then code generation might require variable-size arrays and dynamic memory allocation.
The Categories input argument does not support pattern expressions.

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

Usage notes and limitations:

Code generation does not support sparse matrix inputs for this function.
If you do not supply bin edges, then code generation might require variable-size arrays and dynamic memory allocation.
The Categories input argument does not support pattern expressions.

Thread-Based Environment
Run code in the background using MATLAB® `backgroundPool` or accelerate code with Parallel Computing Toolbox™ `ThreadPool`.

This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

The histcounts function supports GPU array input with these usage notes and limitations:

64-bit integers are not supported.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2014b

R2023b: Normalize using percentages

You can normalize histogram values as percentages by specifying the Normalization name-value argument as 'percentage'.

R2023a: Improved performance with small numeric and logical input data

The histcounts function shows improved performance for numeric and logical data due to faster input parsing. The performance improvement is more significant when input parsing is a greater portion of the computation time. This situation occurs when the size of the data to distribute among bins is smaller than 2000 elements.

For example, this code calculates histogram bin counts for a 1000-element vector. The code is about 3x faster than in the previous release.

function timingHistcounts
X = rand(1,1000);
for k = 1:3e3
    histcounts(X,"BinMethod","auto");
end
end

The approximate execution times are:

R2022b: 0.62 s

R2023a: 0.21 s

The code was timed on a Windows® 10, Intel® Xeon® CPU E5-1650 v4 @ 3.60 GHz test system using the timeit function.

timeit(@timingHistcounts)
