How to identify strings that can be converted to float numbers in a cell array?

I can convert a string to double using str2double, but my goal is to identify such cells. Attached is an example file.
A = readcell('test6.xlsx');
As you can see, some cells on the 2nd column meet this criteria.
Many thanks.

 Accepted Answer

I would suggest using readtable instead of readcell
A = readcell('test6.xlsx')
A = 9x6 cell array
{'Cruise ID'} {'Salinity_flag'} {'Longitude_DEC'} {'Temp' } {'Salinity'} {'EXPO' } {'ID12' } {[ 2]} {[ -25.0100]} {[ 27.7690]} {[ 35.5800]} {'33RC1991'} {'ID12' } {[ 2]} {[ -25.0020]} {[ 25.9660]} {[ 35.3670]} {0x0 char } {'ID12' } {0x0 char } {[ -24.9930]} {[ 25.0250]} {[ 35.8480]} {[ 200105]} {0x0 char } {0x0 char } {[ -25.0010]} {[<missing>]} {0x0 char } {[ 456]} {'ID12' } {'2' } {[ -25.0020]} {[ 25.3260]} {[ 36.0880]} {[ 3689]} {'ID12' } {'2' } {[ -25]} {[ 26.1690]} {[ 35.9120]} {[ 3256778]} {'ID12' } {'2' } {[ -25]} {[<missing>]} {[ 36.2220]} {'33RC1991'} {'ID12' } {[ 2]} {[ -25]} {[<missing>]} {[ 36.3660]} {'33RC1991'}
You can define the variable types so Matlab knows which columns (variables) are supposed to be numeric.
opts = detectImportOptions('test6.xlsx');
opts.VariableTypes = { 'char', 'double', 'double', 'double', 'double', 'double' };
T = readtable('test6.xlsx', opts)
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T = 8x6 table
CruiseID Salinity_flag Longitude_DEC Temp Salinity EXPO __________ _____________ _____________ ______ ________ __________ {'ID12' } 2 -25.01 27.769 35.58 NaN {'ID12' } 2 -25.002 25.966 35.367 NaN {'ID12' } NaN -24.993 25.025 35.848 2.001e+05 {0x0 char} NaN -25.001 NaN NaN 456 {'ID12' } 2 -25.002 25.326 36.088 3689 {'ID12' } 2 -25 26.169 35.912 3.2568e+06 {'ID12' } 2 -25 NaN 36.222 NaN {'ID12' } 2 -25 NaN 36.366 NaN
And, if you really want a cell array instead of a table:
C = table2cell(T)
C = 8x6 cell array
{'ID12' } {[ 2]} {[-25.0100]} {[27.7690]} {[35.5800]} {[ NaN]} {'ID12' } {[ 2]} {[-25.0020]} {[25.9660]} {[35.3670]} {[ NaN]} {'ID12' } {[NaN]} {[-24.9930]} {[25.0250]} {[35.8480]} {[ 200105]} {0x0 char} {[NaN]} {[-25.0010]} {[ NaN]} {[ NaN]} {[ 456]} {'ID12' } {[ 2]} {[-25.0020]} {[25.3260]} {[36.0880]} {[ 3689]} {'ID12' } {[ 2]} {[ -25]} {[26.1690]} {[35.9120]} {[3256778]} {'ID12' } {[ 2]} {[ -25]} {[ NaN]} {[36.2220]} {[ NaN]} {'ID12' } {[ 2]} {[ -25]} {[ NaN]} {[36.3660]} {[ NaN]}

9 Comments

Many thanks for the proposed solution.
Unfortunately, I have hundreds of these files, and each of them is like a box of chocolate. I have no idea which column should be read as char or double. I have to write my own program to decipher that. That's why I used "readcell".
So far I'm able to identify these:
A = readcell('test6.xlsx');
% Determine the cells with doubles:
map1 = cellfun('isclass', A, 'double');
% Determine the cells with empty strings, like ' ':
map2 = cellfun(@(x)ischar(x) && isempty(deblank(x)),A);
% Determine the cells with missing values:
map3 = ismissing(string(A));
I need a map4 to identify strings that can potentially be converted into a double.
It is possible to write a regexp pattern that matches floating point numbers, but it is a nuisance to do so. It goes something like
'$\w*[+-]?(\d+(\.\d*)?|\.\d+)([deDE][+-]?\d*)?\w*$'
The major nuisance here is that floating point numbers might start with a digit (after optional sign) and that might be followed by an optional period, but floating point numbers might also start directly with a period -- and you need to eliminate the possibility that the floating point number has only a period.
Many thanks. How specifically should I write such a regexp script?
Something like the below? Unfortunately, it does not work for me? The results show all of the cells as 0x0 double.
A = readcell('test6.xlsx');
B = regexp(string(A), '$\w*[+-]?(\d+(\.\d*)?|\.\d+)([deDE][+-]?\d*)?\w*$')
A = readcell('test6.xlsx');
mask = cellfun(@(C) ischar(C) && regexp(C, '$\s*[+-]?(\d+(\.\d*)?|\.\d+)([deDE][+-]?\d*)?\s*$', 'once'), A);
This includes correction of the two \w to \s
Note that space is not permitted after the optional leading sign. Strictly speaking, the syntax for literals does not permit a space at that point, and "no space" is what is enforced by str2double(). In practice if you were entering numbers into MATLAB, then if the sign is not confused as indicating an operation between adjacent numbers, then the sign would be treated as a unary-minus or unary-plus operation
123 +456 %treated as being plus, giving a scalar result
[123 +456] %treated as a vector of two values with the second inherently positive
+456 %treated as a single literal value that is positive
+ 456 %treated as unary plus applied to 456
Many thanks. Unfortunately, I got the error as below:
Operands to the logical AND (&&) and OR (||) operators must be convertible to logical scalar values. Use the ANY or ALL
functions to reduce operands to logical scalar values.
Error in readExcel_example>@(C)ischar(C)&&regexp(C,'$\s*[+-]?(\d+(\.\d*)?|\.\d+)([deDE][+-]?\d*)?\s*$','once') (line 10)
mask = cellfun(@(C) ischar(C) && regexp(C, '$\s*[+-]?(\d+(\.\d*)?|\.\d+)([deDE][+-]?\d*)?\s*$', 'once'), A);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in readExcel_example (line 10)
mask = cellfun(@(C) ischar(C) && regexp(C, '$\s*[+-]?(\d+(\.\d*)?|\.\d+)([deDE][+-]?\d*)?\s*$', 'once'), A);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note: correction of leading $ in pattern to be ^
A = readcell('test6.xlsx');
mask = cellfun(@(C) ischar(C) && ~isempty(regexp(C, '^\s*[+-]?(\d+(\.\d*)?|\.\d+)([deDE][+-]?\d*)?\s*$', 'once')), A)
mask = 9x6 logical array
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Note that str2double will give you the same mask. It returns NaN for anything that it can't convert.
A = readcell('test6.xlsx');
mask = ~isnan(str2double(A))
mask = 9x6 logical array
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Yes. You specifically wanted a method that did not use str2double() (for reasons that are not clear to us).

Sign in to comment.

More Answers (1)

Can it be as simple as this?
>> str2double(A)
ans =
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN 2 NaN NaN NaN NaN
NaN 2 NaN NaN NaN NaN
NaN 2 NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
map0 = ~isnan(B)
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 1 0 0 0 0
0 1 0 0 0 0
0 1 0 0 0 0
0 0 0 0 0 0

Products

Release

R2024b

Asked:

on 11 Mar 2025

Commented:

on 11 Mar 2025

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!