splitGraphemes

Split string into graphemes

Description

newStr = splitGraphemes(str) splits the string str into graphemes. A grapheme (also known as a grapheme cluster) is the Unicode term for a human-perceived character.

Examples

Split text into graphemes using the splitGraphemes function.

A grapheme (also known as a grapheme cluster) is the Unicode term for a human-perceived character. Some graphemes contain multiple code units. For example, the "smiling face with sunglasses" emoji (😎, code point U+1F60E) is a single grapheme but comprises two UTF-16 code units, D83D and DE0E.

Split the text "Smile! 😎" into graphemes.

str = "Smile! " + compose("\xD83D\xDE0E")
str = 
"Smile! 😎"
newStr = splitGraphemes(str)
newStr = 8x1 string
    "S"
    "m"
    "i"
    "l"
    "e"
    "!"
    " "
    "😎"

Here, the function keeps the emoji as a single element instead of splitting it into its two UTF-16 code units.
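To see the difference between code units and graphemes, you can compare a code-unit count with a grapheme count for the same emoji. This is a minimal sketch, assuming that strlength counts UTF-16 code units (the units that MATLAB string and char values store); the variable names are illustrative only.

```matlab
% Build the emoji from its two UTF-16 code units, as in the example above.
emoji = compose("\xD83D\xDE0E");

% Assumption: strlength counts UTF-16 code units, so the surrogate pair
% contributes two to the length.
numCodeUnits = strlength(emoji);

% splitGraphemes returns one element per grapheme, so the emoji
% contributes a single element.
numGraphemes = numel(splitGraphemes(emoji));
```

If the assumption holds, numCodeUnits is larger than numGraphemes for any text that contains characters outside the Basic Multilingual Plane.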

Input Arguments

Input text, specified as a string array, character vector, or cell array of character vectors. For string array and cell array input, each element of str must have the same number of graphemes.

If the number of graphemes is not the same for every element of str, then call the function in a for-loop to split the elements of str one at a time.
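The for-loop approach described above could look like the following sketch. The input values and the variable name graphemes are hypothetical; a cell array collects the results because each element can produce a different number of graphemes.

```matlab
% Hypothetical input: the two elements have different numbers of graphemes.
str = ["Smile!"; "Hi"];

% Preallocate a cell array; each cell holds the graphemes of one element.
graphemes = cell(size(str));

% Split the elements one at a time.
for i = 1:numel(str)
    graphemes{i} = splitGraphemes(str(i));
end
```

Each cell of graphemes then contains a numGraphemes-by-1 string array for the corresponding element of str.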

Data Types: string | char | cell

Output Arguments

Split graphemes, returned as a string array or a cell array of character vectors. If str is a string array, then newStr is also a string array. Otherwise, newStr is a cell array of character vectors.

The size of newStr depends on the input:

  • If str is a string scalar or a character vector, then newStr is a numGraphemes-by-1 string array or cell array, where numGraphemes is the number of graphemes.

  • If str is an M-by-1 string array or cell array, then newStr is an M-by-numGraphemes array.

  • If str is a 1-by-N string array or cell array, then newStr is a 1-by-N-by-numGraphemes array.

For a string array or cell array of any size, the function orients the split graphemes along the first trailing dimension with size 1.
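As a quick check of these size rules, the following sketch splits a column and a row string array whose elements each contain three graphemes. The inputs and variable names are hypothetical, and the sizes noted in the comments are the ones stated by the rules above rather than captured output.

```matlab
% Hypothetical inputs: every element has three graphemes.
colStr = ["abc"; "def"];   % 2-by-1 string array
rowStr = ["abc" "def"];    % 1-by-2 string array

% M-by-1 input: graphemes are placed along the second dimension,
% so the rules above give a 2-by-3 result.
colSplit = splitGraphemes(colStr);
size(colSplit)

% 1-by-N input: graphemes are placed along the third dimension,
% so the rules above give a 1-by-2-by-3 result.
rowSplit = splitGraphemes(rowStr);
size(rowSplit)
```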

Version History

Introduced in R2019a