textanalytics.unicode.nfkc

Unicode compatibility composed normalized form (NFKC)

Since R2022b

Syntax

``newStr = textanalytics.unicode.nfkc(str)``

Description

`newStr = textanalytics.unicode.nfkc(str)` normalizes the string `str` to the Unicode compatibility composed normalized form (NFKC).

Examples

Strings that look identical can have different underlying representations. The Unicode compatibility composed normalized form (NFKC) ensures that equivalent strings have a unique binary representation.
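As a small illustration of that point, the sketch below builds two spellings of "é" that render identically but use different code units, and then compares them before and after normalization. The variable names and the choice of "é" are assumptions made for this illustration and are not part of the original example.

```matlab
% Two visually identical spellings of "é" (illustrative values).
precomposed = compose("\xE9");          % single code unit U+00E9
decomposed = "e" + compose("\x301");    % "e" followed by the combining acute accent U+0301

% The raw strings differ in length and do not compare equal.
lengths = [strlength(precomposed) strlength(decomposed)];
rawEqual = precomposed == decomposed;

% After NFKC normalization, both spellings share one representation.
normEqual = textanalytics.unicode.nfkc(precomposed) == textanalytics.unicode.nfkc(decomposed);
```

Here `rawEqual` is false, while `normEqual` should be true because NFKC composes the base letter and the combining accent into the single precomposed character.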

Consider the string `"eﬃcient"`, where the character `"ﬃ"` is represented by the code unit `"\xFB03"`. The string has length 7.

`str = compose("e\xFB03") + "cient"`
```str = "eﬃcient" ```
`strlength(str)`
```ans = 7 ```

Normalize the string using the `textanalytics.unicode.nfkc` function.

`newStr = textanalytics.unicode.nfkc(str)`
```newStr = "efficient" ```

View the length of the normalized string. The normalized representation includes two extra code units. In this case, the function replaces the `"ﬃ"` character with the string `"ffi"`.

`strlength(newStr)`
```ans = 9 ```

Extract the second to fourth code units of the normalized string.

`extractBetween(newStr,2,4)`
```ans = "ffi" ```

Check whether the strings `str` and `newStr` are equal using the `==` operator. The operator returns `0` because the strings have different underlying representations.

`tf = str == newStr`
```tf = logical 0 ```
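Because `==` compares the underlying code units, one way to compare such strings is to normalize both sides first. The following is a minimal sketch under that assumption; inspecting the code units with `double(char(...))` and the variable names `codesBefore`, `codesAfter`, and `tfNormalized` are additions for illustration only.

```matlab
% Rebuild the example string; \xFB03 is the "ﬃ" ligature.
str = compose("e\xFB03") + "cient";
newStr = textanalytics.unicode.nfkc(str);

% Inspect the underlying code units of each string.
codesBefore = double(char(str));      % the second element is 64259 (hex FB03)
codesAfter = double(char(newStr));    % nine code units, all plain ASCII letters

% Normalize both operands before comparing them.
tfNormalized = textanalytics.unicode.nfkc(str) == textanalytics.unicode.nfkc(newStr);
```

With both operands normalized, the comparison should return logical `1`, because applying NFKC to an already normalized string leaves it unchanged.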

Input Arguments

`str`: input text, specified as a string array, character vector, or cell array of character vectors.

Example: ```["An example of a short sentence."; "A second short sentence."]```

Data Types: `string` | `char` | `cell`

Output Arguments

`newStr`: output text, returned as a string array, character vector, or cell array of character vectors. `str` and `newStr` have the same data type.
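The type-preserving behavior can be checked with a short sketch like the one below, which passes the same text as a string array, a character vector, and a cell array of character vectors. The sample words and variable names are hypothetical, and the checks rely only on the statement above that the input and output share a data type.

```matlab
ligature = compose("\xFB03");    % "ﬃ" as a string scalar

% The same kind of text in each supported input type.
strIn = ["e" + ligature + "cient"; "o" + ligature + "ce"];   % 2-by-1 string array
charIn = char(strIn(1));                                     % character vector
cellIn = {charIn};                                           % cell array of character vectors

strOut = textanalytics.unicode.nfkc(strIn);
charOut = textanalytics.unicode.nfkc(charIn);
cellOut = textanalytics.unicode.nfkc(cellIn);

% Each output should have the same data type as its input.
typesMatch = isstring(strOut) && ischar(charOut) && iscellstr(cellOut);
```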

References

[1] Whistler, Ken, ed. "Unicode Standard Annex #15: Unicode Normalization Forms." Unicode Technical Reports, August 27, 2021. https://unicode.org/reports/tr15/.

Version History

Introduced in R2022b