NATSORT Examples

The function NATSORT sorts a string array or a cell array of character vectors, taking into account the number values within the strings. This is known as natural-order or alphanumeric sorting. Note that MATLAB's built-in SORT function sorts only by character order, as does SORT in most programming languages.

To sort filenames, foldernames, or filepaths, use NATSORTFILES.

To sort the rows of a string/cell array, use NATSORTROWS.

Basic Usage: Integer Numbers

By default NATSORT interprets consecutive digits as a single integer. Any remaining substrings are treated as text:

A = {'a2', 'a10', 'a1'};
sort(A)
natsort(A)
B = {'ver9.10', 'ver9.5', 'ver9.2', 'ver9.10.20', 'ver9.10.8'};
sort(B)
natsort(B)
ans = 
    'a1'    'a10'    'a2'
ans = 
    'a1'    'a2'    'a10'
ans = 
    'ver9.10'    'ver9.10.20'    'ver9.10.8'    'ver9.2'    'ver9.5'
ans = 
    'ver9.2'    'ver9.5'    'ver9.10'    'ver9.10.8'    'ver9.10.20'
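
To illustrate the idea behind the first example using only built-in functions, here is a minimal sketch. It is not how NATSORT itself is implemented, and it assumes that every element has the same text part and exactly one run of digits, so it is far less general than NATSORT:

```matlab
A = {'a2', 'a10', 'a1'};
digitsFound = regexp(A, '\d+', 'match', 'once'); % first run of digits in each element
values = str2double(digitsFound);                % numeric values [2 10 1]
[~, order] = sort(values);                       % sort by value, not by character code
naturalOrder = A(order)                          % {'a1', 'a2', 'a10'}
```

Sorting on the numeric value rather than on the digit characters is what places 'a10' after 'a2'.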

Input 1: Array to Sort

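The first input is the array to sort. The examples in this document use a cell array of character vectors. A string array or a categorical array can also be sorted.

The sorted array is returned as the first output argument.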

natsort(categorical(A)) % see also REORDERCATS
ans = 
     a1      a2      a10 

Input 2: Regular Expression

The optional second input argument is a regular expression which specifies the number matching (see "Regular Expressions" sections below for more examples of regular expressions for matching common numbers):

C = {'1.3','1.10','1.2'};
natsort(C) % By default match integers only.
natsort(C, '\d+\.?\d*') % Match decimal fractions.
ans = 
    '1.2'    '1.3'    '1.10'
ans = 
    '1.10'    '1.2'    '1.3'
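
The difference between the two results comes from what each regular expression matches. As a small check using the built-in REGEXP only (NATSORT is not called here), the integer pattern \d+ is assumed as a stand-in for the default behavior described above:

```matlab
% Integer pattern: the period splits '1.10' into two separate integers.
intMatches = regexp('1.10', '\d+', 'match')        % {'1', '10'}
% Decimal pattern: the whole substring is matched as one number.
decMatches = regexp('1.10', '\d+\.?\d*', 'match')  % {'1.10'}
decValue = str2double(decMatches)                  % 1.1
```

With the integer pattern the second number is 10, which sorts after 2 and 3. With the decimal pattern the number is 1.1, which sorts before 1.2 and 1.3.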

Input 3+: Case Sensitivity

By default NATSORT provides a case-insensitive sort of the input strings. An optional input argument selects case-sensitive/insensitive sorting:

D = {'a2', 'A20', 'A1', 'a', 'A', 'a10','A2', 'a1'};
natsort(D, [], 'ignorecase') % default
natsort(D, [], 'matchcase')
ans = 
    'a'    'A'    'A1'    'a1'    'a2'    'A2'    'a10'    'A20'
ans = 
    'A'    'A1'    'A2'    'A20'    'a'    'a1'    'a2'    'a10'
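
For text without numbers, the effect of the two options can be approximated with built-in functions. This is only a sketch for comparison, not NATSORT's own code, and the array X is made up for the illustration:

```matlab
X = {'b', 'A', 'C'};              % hypothetical example data, no numbers
caseSensitive = sort(X)           % character-code order: {'A', 'C', 'b'}
[~, order] = sort(lower(X));      % compare lowercase copies only
caseInsensitive = X(order)        % original spelling kept: {'A', 'b', 'C'}
```

In character-code order all uppercase letters precede all lowercase letters, which is why 'C' sorts before 'b' in the case-sensitive result.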

Input 3+: Sort Direction

By default NATSORT provides an ascending sort of the input strings. An optional input argument selects the sort direction (note that characters and numbers are either both ascending or both descending):

E = {'2', 'a', '', '10', 'B', '1'};
natsort(E, [], 'ascend') % default
natsort(E, [], 'descend')
ans = 
    ''    '1'    '2'    '10'    'a'    'B'
ans = 
    'B'    'a'    '10'    '2'    '1'    ''

Input 3+: Char/Number Order

By default NATSORT sorts characters after numbers. An optional input argument selects whether characters are treated as greater than or less than numbers:

natsort(E, [], 'num<char') % default
natsort(E, [], 'char<num')
ans = 
    ''    '1'    '2'    '10'    'a'    'B'
ans = 
    ''    'a'    'B'    '1'    '2'    '10'

Input 3+: NaN/Number Order

By default NATSORT sorts NaN after all other numbers. An optional input argument selects whether NaN is treated as greater than or less than other numbers:

F = {'10', '1', 'NaN', '2'};
natsort(F, '(NaN|\d+)', 'num<NaN') % default
natsort(F, '(NaN|\d+)', 'NaN<num')
ans = 
    '1'    '2'    '10'    'NaN'
ans = 
    'NaN'    '1'    '2'    '10'
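
The built-in SORT makes a similar choice for numeric arrays, which may help when comparing results. The sketch below converts the strings with SSCANF and uses the 'MissingPlacement' option of SORT, which is assumed to be available (it was introduced in R2017a). It does not call NATSORT:

```matlab
F = {'10', '1', 'NaN', '2'};
values = cellfun(@(s) sscanf(s, '%f'), F);            % [10 1 NaN 2]
nanLast = sort(values)                                % [1 2 10 NaN]
nanFirst = sort(values, 'MissingPlacement', 'first')  % [NaN 1 2 10]
```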

Input 3+: SSCANF Format String (Floating Point, Hexadecimal, Octal, Binary, 64-Bit Integer)

The default format string '%f' will correctly parse many common number types: this includes decimal integers, decimal fractions, NaN, Inf, and numbers written in E-notation. For hexadecimal, octal, binary, and 64-bit integers the format string must be specified as an input argument. Supported SSCANF formats are shown in this table:

Format String      Number Types
%e, %f, %g         floating point numbers
%d                 signed decimal
%i                 signed decimal, octal, or hexadecimal
%ld, %li           signed 64-bit decimal, octal, or hexadecimal
%u                 unsigned decimal
%o                 unsigned octal
%x                 unsigned hexadecimal
%lu, %lo, %lx      unsigned 64-bit decimal, octal, or hexadecimal
%b                 unsigned binary integer (custom parsing, not SSCANF)

For example, large integers can be converted to 64-bit numerics, which retains their full precision:

G = {'18446744073709551614', '18446744073709551615', '18446744073709551613'};
natsort(G, [], '%lu')
ans = 
    '18446744073709551613'    '18446744073709551614'    '18446744073709551615'
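
To see why the format string matters for these values, the following sketch compares conversion to double with conversion via SSCANF and '%lu'. It uses built-in functions only and assumes that SSCANF returns class uint64 for this format, as described in the MATLAB documentation:

```matlab
G = {'18446744073709551614', '18446744073709551615', '18446744073709551613'};
asDouble = str2double(G);                        % all three round to 2^64
distinctDoubles = numel(unique(asDouble))        % 1: the values cannot be told apart
asUint64 = cellfun(@(s) sscanf(s, '%lu'), G);    % exact 64-bit integers
distinctUint64 = numel(unique(asUint64))         % 3
[~, order] = sort(asUint64);
sortedG = G(order)                               % ends in ...613, ...614, ...615
```

Doubles near 2^64 are spaced 2048 apart, so differences of 1 are lost unless a 64-bit integer class is used.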

Output 2: Sort Index

The second output argument is a numeric array of the sort indices ndx, such that Y = X(ndx) where Y = natsort(X):

H = {'abc2xyz0', 'abc10xyz', 'abc2xyz', 'abc1xyz'};
[out,ndx] = natsort(H)
out = 
    'abc1xyz'    'abc2xyz'    'abc2xyz0'    'abc10xyz'
ndx =
     4     3     1     2
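
A common use of the index is to apply the same order to a second array. In this sketch the index vector is copied from the output above rather than computed, and the array sizes is hypothetical data invented for the illustration:

```matlab
H = {'abc2xyz0', 'abc10xyz', 'abc2xyz', 'abc1xyz'};
ndx = [4 3 1 2];              % index vector as displayed above
sizes = [20 100 21 10];       % hypothetical companion data, one value per element of H
sortedH = H(ndx)              % {'abc1xyz', 'abc2xyz', 'abc2xyz0', 'abc10xyz'}
sortedSizes = sizes(ndx)      % [10 21 20 100], reordered to match sortedH
```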

Output 3: Debugging Array

The third output is a cell array which contains all matched numbers (after converting to numeric using the specified SSCANF format) and all non-number characters. This cell array is intended for visually confirming that the numbers are being correctly identified by the regular expression. Note that the rows of the debugging cell array follow the linear indexing of the input array: row k corresponds to the k-th element of the input.

[~,~,dbg] = natsort(H)
dbg = 
    'abc'    [ 2]    'xyz'    [0]
    'abc'    [10]    'xyz'     []
    'abc'    [ 2]    'xyz'     []
    'abc'    [ 1]    'xyz'     []
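
The first row of this array can be reproduced approximately with the built-in REGEXP, which may help when checking a regular expression by hand. This is a sketch of the splitting idea only and is not taken from NATSORT's source:

```matlab
% Split one element into number substrings and the text between them.
[numText, txt] = regexp('abc2xyz0', '\d+', 'match', 'split');
numText                          % {'2', '0'}
txt                              % {'abc', 'xyz', ''} (trailing text is empty)
numValues = str2double(numText)  % [2 0]
```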

Regular Expressions: Decimal Fractions, E-notation, +/- Sign

NATSORT relies on REGEXPI to detect numbers in the strings. In order to match the required number format (e.g. decimal fractions, exponents, or a positive/negative sign, etc.) simply provide a suitable regular expression as an optional input argument:

I = {'x+NaN', 'x11.5', 'x-1.4', 'x', 'x-Inf', 'x+0.3'};
sort(I)
natsort(I, '[-+]?(NaN|Inf|\d+\.?\d*)')
J = {'0.56e007', '', '43E-2', '10000', '9.8'};
sort(J)
natsort(J, '\d+\.?\d*([eE][-+]?\d+)?')
ans = 
    'x'    'x+0.3'    'x+NaN'    'x-1.4'    'x-Inf'    'x11.5'
ans = 
    'x'    'x-Inf'    'x-1.4'    'x+0.3'    'x11.5'    'x+NaN'
ans = 
    ''    '0.56e007'    '10000'    '43E-2'    '9.8'
ans = 
    ''    '43E-2'    '9.8'    '10000'    '0.56e007'
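
The E-notation result can be cross-checked with built-in functions by matching each number and converting it with the default '%f' format. This sketch omits the empty string from J, because SSCANF returns an empty array for it, and it does not call NATSORT:

```matlab
J = {'0.56e007', '43E-2', '10000', '9.8'};      % J from above without the empty string
numText = regexp(J, '\d+\.?\d*([eE][-+]?\d+)?', 'match', 'once');
values = cellfun(@(s) sscanf(s, '%f'), numText); % 5600000, 0.43, 10000, 9.8
[~, order] = sort(values);
sortedJ = J(order)                               % {'43E-2', '9.8', '10000', '0.56e007'}
```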

Regular Expressions: Hexadecimal, Octal, Binary Integers

Integers encoded in hexadecimal, octal, or binary may also be parsed and sorted correctly. This requires both an appropriate regular expression to detect the integers and a suitable SSCANF format string to convert each detected number string to a numeric value:

K = {'a0X7C4z', 'a0X5z', 'a0X18z', 'a0XFz'};
sort(K)
natsort(K, '0X[0-9A-F]+', '%x') % hexadecimal
L = {'a11111000100z', 'a101z', 'a000000000011000z', 'a1111z'};
sort(L)
natsort(L, '[01]+', '%b') % binary
ans = 
    'a0X18z'    'a0X5z'    'a0X7C4z'    'a0XFz'
ans = 
    'a0X5z'    'a0XFz'    'a0X18z'    'a0X7C4z'
ans = 
    'a000000000011000z'    'a101z'    'a11111000100z'    'a1111z'
ans = 
    'a101z'    'a1111z'    'a000000000011000z'    'a11111000100z'
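
The hexadecimal order can be verified independently with the built-in HEX2DEC. The sketch below uses a lookbehind to skip the 0X prefix, which is a choice made for this illustration and not necessarily how NATSORT handles the prefix:

```matlab
K = {'a0X7C4z', 'a0X5z', 'a0X18z', 'a0XFz'};
hexDigits = regexp(K, '(?<=0X)[0-9A-F]+', 'match', 'once'); % {'7C4', '5', '18', 'F'}
values = cellfun(@hex2dec, hexDigits);                      % [1988 5 24 15]
[~, order] = sort(values);
sortedK = K(order)                                          % {'a0X5z', 'a0XFz', 'a0X18z', 'a0X7C4z'}
sameValue = bin2dec('11111000100')                          % 1988, the same value as hex 7C4
```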

Bonus: Interactive Regular Expression Tool

Regular expressions are powerful and compact, but getting them right is not always easy. One aid is my interactive tool IREGEXP, available for download, which lets you quickly try different regular expressions and see all of REGEXP's outputs displayed and updated as you type.