Update Your Code to Accept Strings

In R2016b, MATLAB® introduced string arrays as a data type for text. In a future release, all MathWorks® products will be compatible with string arrays. Compatible means that if you can specify text as a character vector or a cell array of character vectors, then you also can specify it as a string array. Now you can adopt string arrays as a text data type in your own code.

If you write code for other MATLAB users, then it is to your advantage to update your API to accept string arrays, while maintaining backward compatibility with other text data types. String adoption makes your code consistent with MathWorks products.

If your code has few dependencies, or if you are developing new code, then consider using string arrays as your primary text data type for better performance. In that case, best practice is to write or update your API to accept input arguments that are character vectors, cell arrays of character vectors, or string arrays.

For the definitions of string array and other terms, see Terminology for Character and String Arrays.

What Are String Arrays?

In MATLAB, you can store text data in two ways. One way is to use a character array, which is a sequence of characters, just as a numeric array is a sequence of numbers. Or, starting in R2016b, the other way is to store a sequence of characters in a string. You can store multiple strings in a string array. For more information, see Characters and Strings.
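The difference between the two representations can be seen in a short sketch. It assumes R2017a or later, where double quotes create strings; the variable names are illustrative only.

```matlab
% A character vector is a 1-by-n array with one element per character.
c = 'hello';
szC = size(c);          % [1 5]

% A string scalar holds the same text as a single element.
s = "hello";
szS = size(s);          % [1 1]
len = strlength(s);     % 5 characters inside the one element

% A string array stores several pieces of text, one per element.
names = ["Mercury","Venus","Earth"];
szN = size(names);      % [1 3]
```

Because each piece of text is one element, indexing such as names(2) returns a whole string, not a single character.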

Recommended Approaches for String Adoption in Old APIs

When your code has many dependencies, and you must maintain backward compatibility, follow these approaches for updating functions and classes to present a compatible API.

Functions

  • Accept string arrays as input arguments.

    • If an input argument can be either a character vector or a cell array of character vectors, then update your code so that the argument also can be a string array. For example, consider a function that has an input argument you can specify as a character vector (using single quotes). Best practice is to update the function so that the argument can be specified as either a character vector or a string scalar (using double quotes).

  • Accept strings as both names and values in name-value pair arguments.

    • In name-value pair arguments, allow names to be specified as either character vectors or strings—that is, with either single or double quotes around the name. If a value can be a character vector or cell array of character vectors, then update your code so that it also can be a string array.

  • Do not accept cell arrays of string arrays for text input arguments.

    • A cell array of string arrays has a string array in each cell. For example, {"hello","world"} is a cell array of string arrays. While you can create such a cell array, it is not recommended for storing text. The elements of a string array have the same data type and are stored efficiently. If you store strings in a cell array, then you lose the advantages of using a string array.

      However, if your code accepts heterogeneous cell arrays as inputs, then consider accepting cell arrays that contain strings. You can convert any strings in such a cell array to character vectors.

  • In general, do not change the output type.

    • If your function returns a character vector or cell array of character vectors, then do not change the output type, even if the function accepts string arrays as inputs. For example, the fileread function accepts an input file name specified as either a character vector or a string, but the function returns the file contents as a character vector. By keeping the output type the same, you can maintain backward compatibility.

  • Return the same data type when the function modifies input text.

    • If your function modifies input text and returns the modified text as the output argument, then the input and output arguments should have the same data type. For example, the lower function accepts text as the input argument, converts it to all lowercase letters, and returns it. If the input argument is a character vector, then lower returns a character vector. If the input is a string array, then lower returns a string array.

  • Consider adding a 'TextType' argument to import functions.

    • If your function imports data from files, and at least some of that data can be text, then consider adding an input argument that specifies whether to return text as a character array or a string array. For example, the readtable function provides the 'TextType' name-value pair argument. This argument specifies whether readtable returns a table with text in cell arrays of character vectors or string arrays.
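The rule about returning the same data type when a function modifies input text can be sketched as follows. The lower function is built in; addBang is a hypothetical helper invented for this illustration, and it relies on strcat returning a string array when any input is a string array and a character vector when all inputs are character vectors.

```matlab
% Built-in example: lower preserves the text type of its input.
a = lower('ABC');       % character vector in, character vector out
b = lower("ABC");       % string in, string out

% Hypothetical helper that modifies text and keeps the input type.
addBang = @(txt) strcat(upper(txt), '!');
c = addBang('abc');     % character vector
d = addBang("abc");     % string scalar
```

Building the result with functions that already preserve the input type avoids explicit type checks in the helper itself.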

Classes

  • Treat methods as functions.

    • For string adoption, treat methods as though they are functions. Accept string arrays as input arguments, and in general, do not change the data type of the output arguments, as described in the previous section.

  • Do not change the data types of properties.

    • If a property is a character vector or a cell array of character vectors, then do not change its type. When you access such a property, the value that is returned is still a character vector or a cell array of character vectors.

      As an alternative, you can add a new property that is a string, and make it dependent on the old property to maintain compatibility.

  • Set properties using string arrays.

    • If you can set a property using a character vector or cell array of character vectors, then update your class to set that property using a string array too. However, do not change the data type of the property. Instead, convert the input string array to the data type of the property, and then set the property.

  • Add a string method.

    • If your class already has a char and/or a cellstr method, then add a string method. If you can represent an object of your class as a character vector or cell array of character vectors, then represent it as a string array too.
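One way these class recommendations might fit together is sketched below. Sensor, Label, and LabelString are hypothetical names, and the class is assumed to be saved as Sensor.m on the MATLAB path; it is an illustration, not a prescribed design.

```matlab
classdef Sensor
    % Sensor is a hypothetical class used only for illustration.
    properties
        Label = ''      % remains a character vector for compatibility
    end
    properties (Dependent)
        LabelString     % new property: string view of Label
    end
    methods
        function obj = set.Label(obj, value)
            % Accept a string scalar, but store a character vector.
            obj.Label = convertStringsToChars(value);
        end
        function value = get.LabelString(obj)
            % Derive the string from the old property on each access.
            value = string(obj.Label);
        end
        function str = string(obj)
            % Converter method to sit alongside a char or cellstr method.
            str = string(obj.Label);
        end
    end
end
```

With this sketch, s = Sensor; s.Label = "probe A"; stores 'probe A' as a character vector, while s.LabelString and string(s) return the same text as a string.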

How to Adopt String Arrays in Old APIs

You can adopt strings in old APIs by accepting string arrays as input arguments, and then converting them to character vectors or cell arrays of character vectors. If you perform such a conversion at the start of a function, then you do not need to update the rest of it.

The convertStringsToChars function provides a way to process all input arguments, converting only those arguments that are string arrays. To enable your existing code to accept string arrays as inputs, add a call to convertStringsToChars at the beginnings of your functions and methods.

For example, if you have defined a function myFunc that accepts three input arguments, process all three inputs using convertStringsToChars. Leave the rest of your code unaltered.

function y = myFunc(a,b,c)
    [a,b,c] = convertStringsToChars(a,b,c);
    <line 1 of original code>
    <line 2 of original code>
    ...

In this example, the output argument list [a,b,c] overwrites the input arguments in place. If any input argument is not a string array, then it is unaltered.

If myFunc accepts a variable number of input arguments, then process all the arguments specified by varargin.

function y = myFunc(varargin)
    [varargin{:}] = convertStringsToChars(varargin{:});
    ...
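The effect of the conversion on different kinds of inputs can be checked with a short script. This sketch assumes R2017b or later, where convertStringsToChars is available; the input values are made up for illustration.

```matlab
% A string scalar, a string array, and a number, converted in one call.
[name, list, count] = convertStringsToChars("data.csv", ["a","b","c"], 42);
nameClass  = class(name);    % 'char': string scalar became a character vector
listClass  = class(list);    % 'cell': string array became a cell array of character vectors
countClass = class(count);   % 'double': non-text input is returned unaltered

% The same pattern applied to a cell array standing in for varargin.
args = {"x", 1, 'y'};
[args{:}] = convertStringsToChars(args{:});
```

After the second call, args{1} is the character vector 'x', and the other two elements are unchanged.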

Performance Considerations

The convertStringsToChars function is more efficient when converting one input argument. If your function is performance sensitive, then you can convert input arguments one at a time, while still leaving the rest of your code unaltered.

function y = myFunc(a,b,c)
    a = convertStringsToChars(a);
    b = convertStringsToChars(b);
    c = convertStringsToChars(c);
    ...

Recommended Approaches for String Adoption in New Code

When your code has few dependencies, or you are developing entirely new code, consider using string arrays as the primary text data type. String arrays provide good performance and efficient memory usage when working with large amounts of text. Unlike cell arrays of character vectors, string arrays have a homogeneous data type. String arrays make it easier to write maintainable code. To use string arrays while maintaining backward compatibility with other text data types, follow these approaches.

Functions

  • Accept any text data types as input arguments.

    • If an input argument can be a string array, then also allow it to be a character vector or cell array of character vectors.

  • Accept any text data types as both names and values in name-value pair arguments.

    • In name-value pair arguments, allow names to be specified as either character vectors or strings—that is, with either single or double quotes around the name. If a value can be a string array, then also allow it to be a character vector or cell array of character vectors.

  • Do not accept cell arrays of string arrays for text input arguments.

    • A cell array of string arrays has a string array in each cell. While you can create such a cell array, it is not recommended for storing text. If your code uses strings as the primary text data type, store multiple pieces of text in a string array, not a cell array of string arrays.

      However, if your code accepts heterogeneous cell arrays as inputs, then consider accepting cell arrays that contain strings.

  • In general, return strings.

    • If your function returns output arguments that are text, then return them as string arrays.

  • Return the same data type when the function modifies input text.

    • If your function modifies input text and returns the modified text as the output argument, then the input and output arguments should have the same data type.
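A function that generates text, rather than modifying its input, can follow these rules by converting its text input to a string and returning a string array. The sketch below uses a hypothetical helper, makeLabels, written as an anonymous function so that it runs as a script; it assumes R2017b or later and a prefix that is a character vector or string scalar.

```matlab
% Hypothetical helper: build numbered labels from a text prefix.
% Converting first matters: adding numbers to a character vector would
% perform numeric arithmetic on character codes instead of appending text.
makeLabels = @(prefix, n) convertCharsToStrings(prefix) + (1:n);

fromChar   = makeLabels('run', 3);   % character vector input
fromString = makeLabels("run", 3);   % string scalar input
```

Both calls return the same 1-by-3 string array, so callers can pass either text type.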

Classes

  • Treat methods as functions.

    • Accept character vectors and cell arrays of character vectors as input arguments, as described in the previous section. In general, return strings as outputs.

  • Specify properties as string arrays.

    • If a property contains text, then set the property using a string array. When you access the property, return the value as a string array.
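One possible way to store a property as a string while still accepting character vectors is property validation, sketched below. Station and Name are hypothetical names, the class is assumed to be saved as Station.m, and the size and class validation syntax assumes R2017a or later.

```matlab
classdef Station
    % Station is a hypothetical class used only for illustration.
    properties
        % Stored as a string scalar. Values assigned as character
        % vectors are converted to string by the class validation.
        Name (1,1) string = ""
    end
end
```

With this definition, s = Station; s.Name = 'Boston'; leaves s.Name holding the string "Boston".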

How to Maintain Compatibility in New Code

When you write new code, or modify code to use string arrays as the primary text data type, maintain backward compatibility with other text data types. You can accept character vectors or cell arrays of character vectors as input arguments, and then immediately convert them to string arrays. If you perform such a conversion at the start of a function, then the rest of your code can use string arrays only.

The convertCharsToStrings function provides a way to process all input arguments, converting only those arguments that are character vectors or cell arrays of character vectors. To enable your new code to accept these text data types as inputs, add a call to convertCharsToStrings at the beginnings of your functions and methods.

For example, if you have defined a function myFunc that accepts three input arguments, process all three inputs using convertCharsToStrings.

function y = myFunc(a,b,c)
    [a,b,c] = convertCharsToStrings(a,b,c);
    <line 1 of original code>
    <line 2 of original code>
    ...

In this example, the output argument list [a,b,c] overwrites the input arguments in place. If any input argument is not a character vector or cell array of character vectors, then it is unaltered.

If myFunc accepts a variable number of input arguments, then process all the arguments specified by varargin.

function y = myFunc(varargin)
    [varargin{:}] = convertCharsToStrings(varargin{:});
    ...
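The conversion in this direction can be checked the same way. This sketch assumes R2017b or later, where convertCharsToStrings is available; the input values are made up for illustration.

```matlab
% A character vector, a cell array of character vectors, and a number.
[name, list, count] = convertCharsToStrings('data.csv', {'a','b','c'}, 42);
nameIsString = isstring(name);    % true: character vector became a string scalar
listSize     = size(list);        % [1 3]: one string per cell
countClass   = class(count);      % 'double': non-text input is returned unaltered
```

After the call, the rest of the function can rely on string operations for name and list.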

Performance Considerations

The convertCharsToStrings function is more efficient when converting one input argument. If your function is performance sensitive, then you can convert input arguments one at a time, while still leaving the rest of your code unaltered.

function y = myFunc(a,b,c)
    a = convertCharsToStrings(a);
    b = convertCharsToStrings(b);
    c = convertCharsToStrings(c);
    ...

How to Manually Convert Input Arguments

If it is at all possible, avoid manual conversion of input arguments that contain text, and instead use the convertStringsToChars or convertCharsToStrings functions. Checking the data types of input arguments and converting them yourself is a tedious approach, prone to errors.

If you must convert input arguments, then use the functions in this table.

Conversion                                        Function

String scalar to character vector                 char
String array to cell array of character vectors   cellstr
Character vector to string scalar                 string
Cell array of character vectors to string array   string
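Each conversion in the table corresponds to a single function call, as this short sketch shows. The values are arbitrary examples.

```matlab
c  = char("hello");               % string scalar to character vector
cs = cellstr(["red","green"]);    % string array to cell array of character vectors
s  = string('hello');             % character vector to string scalar
sa = string({'red','green'});     % cell array of character vectors to string array
```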

How to Check Argument Data Types

To check the data type of an input argument that could contain text, consider using the patterns shown in this table.

Required Input Argument Type                      Old Check                               New Check

Character vector or string scalar                 ischar(X)                               ischar(X) || isStringScalar(X)
                                                                                          validateattributes(X,{'char','string'},{'scalartext'})

Character vector or string scalar                 validateattributes(X,{'char'},{'row'})  validateattributes(X,{'char','string'},{'scalartext'})

Nonempty character vector or string scalar        ischar(X) && ~isempty(X)                (ischar(X) || isStringScalar(X)) && strlength(X) ~= 0
                                                                                          (ischar(X) || isStringScalar(X)) && X ~= ""

Cell array of character vectors or string array   iscellstr(X)                            iscellstr(X) || isstring(X)

Any text data type                                ischar(X) || iscellstr(X)               ischar(X) || iscellstr(X) || isstring(X)
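The new checks can be wrapped in small helpers and tried on a few inputs. In this sketch, isScalarText and isAnyText are hypothetical names for anonymous functions built from the checks in the table; isStringScalar assumes R2017b or later.

```matlab
% Hypothetical helpers built from the checks in the table.
isScalarText = @(X) ischar(X) || isStringScalar(X);
isAnyText    = @(X) ischar(X) || iscellstr(X) || isstring(X);

t1 = isScalarText('abc');        % true
t2 = isScalarText("abc");        % true
t3 = isScalarText(["a","b"]);    % false: string array, not a scalar
t4 = isAnyText({'a','b'});       % true
t5 = isAnyText(42);              % false

% validateattributes throws an error instead of returning false.
try
    validateattributes(["a","b"], {'char','string'}, {'scalartext'});
    passed = true;
catch
    passed = false;              % reached here: input is not scalar text
end
```

The logical checks suit branching code, while validateattributes suits input validation where an error message is the desired outcome.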

Check for Empty Strings

An empty string is a string with no characters. MATLAB displays an empty string as a pair of double quotes with nothing between them (""). However, an empty string is still a 1-by-1 string array. It is not an empty array.

The recommended way to check whether a string is empty is to use the strlength function.

str = "";
tf = (strlength(str) ~= 0)   % logical 0 (false) because str has no characters

Note

Do not use the isempty function to check for an empty string. An empty string has no characters but is still a 1-by-1 string array.

The strlength function returns the length of each string in a string array. If the string must be a string scalar, and also not empty, then check for both conditions.

tf = (isStringScalar(str) && strlength(str) ~= 0)

If str could be either a character vector or string scalar, then you still can use strlength to determine its length. strlength returns 0 if the input argument is an empty character vector ('').

tf = ((ischar(str) || isStringScalar(str)) && strlength(str) ~= 0)
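The difference between isempty and strlength for an empty string can be confirmed with a few lines. This is a minimal sketch; the variable names are illustrative.

```matlab
str = "";
sz          = size(str);               % [1 1]: still a 1-by-1 string array
viaIsempty  = isempty(str);            % false, so isempty does not detect ""
viaStrlen   = (strlength(str) == 0);   % true
charIsEmpty = (strlength('') == 0);    % true for an empty character vector too
```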

Check for Empty String Arrays

An empty string array is, in fact, an empty array—that is, an array that has at least one dimension whose length is 0.

The recommended way to create an empty string array is to use the strings function, specifying 0 as at least one of the input arguments. The isempty function returns 1 when the input is an empty string array.

str = strings(0);
tf = isempty(str)

The strlength function returns a numeric array that is the same size as the input string array. If the input is an empty string array, then strlength returns an empty array.

str = strings(0);
L = strlength(str)
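An empty string array and an array of empty strings behave differently, which the following sketch contrasts. The variable names are illustrative.

```matlab
% Empty string arrays: at least one dimension has length 0.
e1 = strings(0);        % 0-by-0
e2 = strings(0,3);      % 0-by-3
e2IsEmpty = isempty(e2);            % true
L = strlength(e2);                  % numeric array, also 0-by-3

% Array of empty strings: not an empty array.
emptyStrs = strings(1,2);           % 1-by-2, each element is ""
arrIsEmpty = isempty(emptyStrs);    % false
allBlank = all(strlength(emptyStrs) == 0);   % true
```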

Check for Missing Strings

String arrays also can contain missing strings. The missing string is the string equivalent to NaN for numeric arrays. It indicates where a string array has missing values. The missing string displays as <missing>, with no quotation marks.

You can create missing strings using the missing function. The recommended way to check for missing strings is to use the ismissing function.

str = string(missing);
tf = ismissing(str)

Note

Do not check for missing strings by comparing a string to the missing string.

The missing string is not equal to itself, just as NaN is not equal to itself. As a result, this comparison returns logical 0 (false) even though str is the missing string.

str = string(missing);
f = (str == missing)
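For a string array with several elements, ismissing returns a logical array that can be used as an index. The sketch below, with made-up values, locates a missing string and replaces it with an empty string.

```matlab
str = ["a","b","c"];
str(2) = missing;               % second element becomes <missing>

tf  = ismissing(str);           % [false true false]
cmp = (str == missing);         % all false: comparison cannot find missing values

str(tf) = "";                   % replace missing strings with empty strings
```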

Terminology for Character and String Arrays

MathWorks documentation uses these terms to describe character and string arrays. For consistency, use these terms in your own documentation, error messages, and warnings.

  • Character vector — 1-by-n array of characters, of data type char.

  • Character array — m-by-n array of characters, of data type char.

  • Cell array of character vectors — Cell array in which each cell contains a character vector.

  • String or string scalar — 1-by-1 string array. A string scalar can contain a 1-by-n sequence of characters, but is itself one object. Use the terms "string scalar" and "character vector" alongside each other when you need to be precise about size and data type. Otherwise, you can use the term "string" in descriptions.

  • String vector — 1-by-n or n-by-1 string array. If only one size is possible, then use it in your description. For example, use "1-by-n string array" to describe an array of that size.

  • String array — m-by-n string array.

  • Empty string — String scalar that has no characters.

  • Empty string array — String array with at least one dimension whose size is 0.

  • Missing string — String scalar that is the missing value (displays as <missing>).
