For consistency with "nan", wouldn't it be nice to be able to issue "missing(3)"?
<Missing> is the string counterpart to NaN. One can define a (say) 3x3 array of NaN's. Each NaN can be replaced as the data is generated in the analysis process, which is handy if the data isn't generated simultaneously. At no point is there any confusion between NaN versus a valid zero datum.
It would be nice to be able to do this for strings. Of course, one can issue "string(nan(3))" or "repmat(missing,3,3)". But as code gets more intricate, simplicity becomes more valuable.
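As a small sketch of the workflow described in the question, the two workarounds can be put next to the numeric case. The variable names are only illustrative. Note that the two workarounds do not produce the same class.

```matlab
% Numeric placeholder grid: nan(3) is a 3x3 double array of NaN.
x = nan(3);
x(2,2) = pi;                  % fill in one datum once it is available

% Workaround 1: convert a NaN grid to string; every element is <missing>.
s1 = string(nan(3));
s1(2,2) = "dog";              % fill in one string later

% Workaround 2: replicate the scalar missing; the result has class 'missing'.
s2 = repmat(missing, 3, 3);

disp(class(s1))               % string
disp(class(s2))               % missing
disp(nnz(ismissing(s1)))      % number of entries still unfilled
```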
11 Comments
Bruno Luong
on 26 Apr 2022
Edited: Bruno Luong
on 26 Apr 2022
No, please. Object-oriented types such as string and missing are slow and don't add anything useful, IMO, for serious programming. Let's stay with arrays of unambiguous IEEE numbers and NaN. Don't mix both worlds, please.
FM
on 26 Apr 2022
The two worlds don't have to mix. A huge amount of human effort goes into preparing data for computationally intensive analysis. Strings make Matlab more suitable for data wrangling. The necessary conversions can then be done to make the data faster to manipulate for the computationally intensive parts of the analysis.
Rik
on 26 Apr 2022
I'm personally not a fan of strings. Maybe I don't use them enough.
I simply don't see the benefit over a cellstr (a cell vector of char vectors). There is a lot of syntactic sugar, but nothing fundamental. You can't do {'x'}+1 (while you can do "x"+1), but I don't see how that reinforces your point.
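A minimal sketch of the difference mentioned above: plus concatenates for strings, while a cellstr needs an explicit function call. The strcat line is just one possible cellstr equivalent, not the only one.

```matlab
s = "x" + 1;                   % string + number: the number is converted to text

c = {'x'};
plusFailed = false;
try
    r = c + 1;                 % plus is not defined for cell arrays
catch err
    plusFailed = true;
    disp(err.message)
end

% One way to get the same result with a cellstr
c2 = strcat(c, num2str(1));    % {'x1'}
```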
Bruno Luong
on 26 Apr 2022
But when you do "x" + pi, you are not able to control the number of digits, the length, or the format, so who really wants that?
FM
on 26 Apr 2022
Edited: FM
on 26 Apr 2022
At least for me, syntactic sugar has a very real impact when it comes to quickly and consistently modifying code during exploratory analysis. I'm using exactly the same equivalence checking syntax as for numbers. When iterating over a horizontal vector of strings, I don't need to unpack the string from a cell. Within general purpose utility functions, I don't need to test a variable to see if it needs special handling. The last thing I need is for a function written months or years ago to break because the code didn't account for the special syntax of strings. I then have to rip my mind away from what it was doing and descend into the minutiae of that code, hoping that I don't break anything else by modifying it.
Just to bring the discussion back on course, this question isn't about whether strings should be supported. It is about making the behaviour cleaner and more consistent with numbers.
But in response to questions about x+pi, you would probably instead do disp("The matching records in JoinTable = "+sum(BinaryMaskingVector)). It's more readable than the traditional fprintf, though nothing prevents you from resorting to the latter when it is more advantageous.
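For what it's worth, the digit-control concern and the string syntax are not mutually exclusive. A short sketch, assuming you are happy to format the number yourself before concatenating (sprintf and compose are standard functions; the format '%.2f' is just an example):

```matlab
a = "x" + pi;                      % default conversion, digits chosen by MATLAB
b = "x" + sprintf('%.2f', pi);     % format the number first, then concatenate
c = compose("x%.2f", pi);          % same idea, returns a string directly

disp(a)
disp(b)
disp(c)
```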
Rik
on 26 Apr 2022
The better readability is in the eye of the beholder.
I don't really see what you mean with a special syntax. If anything, using strings instead of char (or cellstr) is the more special syntax. In almost every case my functions will convert a string to a cellstr, after which I can use code that works on every release I managed to get to start.
I don't mean you should never use strings. What I do mean is that I view strings as being on the same level as scripts: fine for debugging, but any serious analysis should happen in a function.
FM
on 26 Apr 2022
As I said, exceptional handling means having to remove the cell wrapper, which is different from any numeric data. The same goes for equivalence checking. Consistency also means the code looks the same as with numerics, so that when you're filtering based on different combinations of conditions, you can quickly scan all your logic and see what you need to change. That is especially important in exploratory analysis, since it is constant change.
Jan
on 26 Apr 2022
@Rik: I think, cell strings would have been fine enough, but it was a mess, that their elements could be CHAR matrices. As long as all functions for string handling must catch the exception of the CHAR matrices, the mess was too expensive. I've written some dozens of functions for "string" (in the sense of character vectors) handling as Matlab and C-Mex code and simply ignore 2D-char arrays.
A benefit of strings (as a class) is the improved memory consumption. In a cell array, each element has an overhead of about 100 bytes. For a set of 50 million gene sequences this matters.
It would be nice to have a string class using 8-bit ASCII only.
But coming back to the question of the OP: missing(2,3, 'string') looks fine to create a [2 x 3] string array containing undefined strings.
Paul
on 26 Apr 2022
I did not realize that missing is a class
m = missing;
whos m
Name Size Bytes Class Attributes
m 1x1 0 missing
class(m)
ans = 'missing'
However, it's not listed as one of the Fundamental Classes. Are there other types of built-in classes besides fundamental classes? Also, that doc page states that there are 16 fundamental classes, but the table contains 18.
FM
on 27 Apr 2022
@Jan: In 2019a, missing(2,3) and missing(2,3,'string') are not recognized. I'm still waiting to upgrade. The command "doc missing" yields a very sparse documentation page....
Jan
on 27 Apr 2022
@FM: In R2022a this syntax does not work either - you can try this here in the forum's interpreter:
missing(2,3,'string')
Error using missing
Too many input arguments.
So this is a useful enhancement request. Please use the link on the bottom of this page to contact the MathWorks team and suggest this improvement.
Answers (1)
Bruno Luong
on 27 Apr 2022
Why not define your own function
mymissing
ans = missing
<missing>
mymissing(3)
ans = 3×3 missing array
<missing> <missing> <missing>
<missing> <missing> <missing>
<missing> <missing> <missing>
mymissing(2,3)
ans = 2×3 missing array
<missing> <missing> <missing>
<missing> <missing> <missing>
mymissing(2,'string')
ans = 2×2 string array
<missing> <missing>
<missing> <missing>
mymissing(2,3,'double')
ans = 2×3
NaN NaN NaN
NaN NaN NaN
function x = mymissing(varargin)
% x = mymissing(size)
% x = mymissing(n1, n2, ...)
% x = mymissing(..., class)
x = missing;
if ~isempty(varargin)
    if ischar(varargin{end}) || isstring(varargin{end})
        % Trailing class name: convert the scalar missing, then replicate
        sz = [varargin{1:end-1}];
        if isempty(sz)
            sz = 1;
        end
        cls = varargin{end};
        x = repmat(feval(cls,x), sz);
    else
        % Sizes only: replicate the scalar missing as is
        sz = [varargin{1:end}];
        x = repmat(x, sz);
    end
end
end
12 Comments
FM
on 27 Apr 2022
Edited: FM
on 27 Apr 2022
Thanks, Bruno.
I know what I'm going to say is obvious, but a simple way would be string(nan(3)) or repmat(missing,3,2). The functionality wasn't the point so much as consistency. We have a thing called "missing", and having it behave in the same way as NaN (or zeros or ones) would streamline the language.
It wasn't until Jan posted the "missing" constructor that I realized it has become more complicated after 2019a. The doc page for 2019a says that "missing" is the string counterpart to NaN for doubles, but if missing(2,3,'string') works for later versions of Matlab, then it has become a more general generator of indicators of absent data for more than just strings.
My 2019a doesn't have such constructor behaviour, so I can't play with it and scope out what the behaviour is. I thought the constructor returned a string (in the same way that NaN is considered to be a double), but even on my 2019a, class(missing) returns 'missing'. It is its own special class with its own behaviour.
For example, x=repmat(missing,3,3);x(2,2)=pi causes an error converting pi to "missing", but x=ones(3);x(2,2)=missing is fine.
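The asymmetry described above can be reproduced with a short sketch (the try/catch is only there so that the script keeps running after the error):

```matlab
% An array of class 'missing' does not accept a numeric value.
x = repmat(missing, 3, 3);
assignFailed = false;
try
    x(2,2) = pi;               % pi cannot be converted to class 'missing'
catch err
    assignFailed = true;
    disp(err.message)
end

% A double array accepts missing, which is stored as NaN.
y = ones(3);
y(2,2) = missing;
disp(class(y))                 % double
disp(isnan(y(2,2)))            % 1
```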
Because of its exceptional behaviour, I'm not sure if it makes sense to allow missing(3) as opposed to string(nan(3)).
Paul
on 27 Apr 2022
There is no missing constructor:
missing(2,3,'string')
Error using missing
Too many input arguments.
FM
on 27 Apr 2022
Oh. OK, so my 2019a isn't so obsolete after all. But given that missing is its own class, and x=repmat(missing,3,3) creates an array that you can't update with non-missing values, it no longer seems to make sense to make it behave like NaN. At least for x=nan(3), you fill in the missing data using (say) x(2,2)=pi.
If TMW changes the behaviour of missing to be more consistent with NaN, then it might make sense to introduce constructor behaviour that returns an array (like NaN, zeros, and ones). Currently, it seems to be a meta-type that needs to be augmented with a more concrete type, e.g., string(nan(3)) yields an array of strings that show up as "missing". If strings are the only class for which missing serves to indicate absent data, then there seems to be no reason why it shouldn't be made to behave like NaN.
Paul
on 27 Apr 2022
No disagreement here. WRT to "If TMW changes the behaviour" ... does the behaviour in question mean being able to assign into an array? If so, keep in mind that NaN is not its own class, and that a line like this
x = nan(3,3);
is actually a function call
which nan(3,3)
built-in (/MATLAB/toolbox/matlab/elmat/nan)
So even though the elements of x have nan values, x is still a double.
In addition to the missing constructor
which missing
/MATLAB/toolbox/matlab/datatypes/missing.m % missing constructor
it seems like it may be useful to also have an overloaded built-in that provides the functionality of @Bruno Luong's mymissing(), perhaps even extending it to include a case like
%mymissing(___,var)
that returns the missing that corresponds to the type of var.
Having said that, how should mymissing() work for user-defined classes?
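A rough sketch of the mymissing(___,var) idea, written as an anonymous function so it can be pasted into a script. The name missinglike is hypothetical and not a MATLAB built-in. It relies on the same feval conversion used in mymissing above, so it is only known to work for classes that accept a missing input, such as the double and string cases shown in this thread. It says nothing yet about user-defined classes.

```matlab
% Hypothetical helper: missing value of the same class as var, replicated
% to the requested size. At least one size argument is required.
missinglike = @(var, varargin) repmat(feval(class(var), missing), varargin{:});

a = missinglike(1.5, 2, 3);     % 2x3 double array of NaN
b = missinglike("dog", 2);      % 2x2 string array of <missing>

disp(class(a))
disp(class(b))
```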
Bruno Luong
on 27 Apr 2022
Edited: Bruno Luong
on 27 Apr 2022
@FM here is what the current doc states about missing. It is not particularly associated with string.
It looks like the designer (of missing) had in mind the principal use case of
mydata(rows,cols) = missing;
so to him/her, the possibility of creating an array of missing is not needed. I would also argue that the extension to arrays brings little to our comfort, but I also see no technical reason to oppose extending missing to accept array dimensions.
Maybe the place where an array of missing (not cast to other native types) is needed is within a table or a struct. I believe they can be exported/serialized (writetable, writestruct) and then missing can be understood by other languages (C#, ...).
Bruno Luong
on 27 Apr 2022
Edited: Bruno Luong
on 27 Apr 2022
Frankly, a hypothetical extension allowing the syntax
mymissingarray = repmat(missing,100,100);
mymissingarray(7,2) = 6;
would be coherent with the MATLAB way of assignment, but I have a hard time imagining how it could be useful in real-life programming.
FM
on 27 Apr 2022
In my opinion, before implementing any changes, there needs to be some concept of the intended use of "missing". If it is only a string counterpart to double's NaN, then why can't it behave similarly, e.g., x=missing(3) or x=missing(3,2), where class(x) is 'string'? You should be able to assign a string to (say) x(2,2).
If the grand plan is for "missing" to have a more generalized purpose, then it's up to whoever has this vision to make the case for it. Right now, "doc missing" in 2019a says that "missing" is the string counterpart to NaN, but it certainly doesn't behave in an analogous manner. So my humble opinion is that the purpose needs to be clarified, and then the implementation can follow, be it to streamline it into a NaN counterpart for "string" or something more grand.
Bruno Luong
on 27 Apr 2022
Edited: Bruno Luong
on 27 Apr 2022
"If it is only a string counterpart to double's NaN,"
Not to me. That's why I sent you the link to the current doc: missing is not a counterpart of NaN for string.
The string class has its own missing value (different from the missing class), similar to NaN for double.
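A quick sketch of that distinction (variable names are illustrative only): the scalar missing has its own class, and converting it gives each target class its own missing value.

```matlab
m = missing;            % class 'missing'
s = string(missing);    % the string class's own missing value
d = double(missing);    % NaN

disp(class(m))          % missing
disp(class(s))          % string
disp(class(d))          % double

% ismissing recognizes all three representations
disp([ismissing(m), ismissing(s), ismissing(d)])
```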
"Right now, "doc missing" in 2019a ..."
You can safely remove "Right now" in your sentence, we are in 2022. :)
FM
on 27 Apr 2022
@Bruno Luong:
You're right. Missing can be assigned in a vectorized manner (even though I don't see it in the documentation):
>> x=repmat("dog",3)
x = 3×3 string array
"dog" "dog" "dog"
"dog" "dog" "dog"
"dog" "dog" "dog"
>> x(2:3,2:3)=missing
x = 3×3 string array
"dog" "dog" "dog"
"dog" <missing> <missing>
"dog" <missing> <missing>
As I said, my suggestion to create an array of missing was based on the (mis)impression that it was a string counterpart to NaN, which it clearly isn't. In fact, the documentation clearly shows its use in the context of other types/classes. I was mistaken about the fact that it claims "missing" to be a string counterpart to NaN. The only place where it is described as such is on a string page for handling missing values: https://www.mathworks.com/help/matlab/matlab_prog/test-for-empty-strings-and-missing-values.html#TestForEmptyStringsAndMissingValuesExample-3
Bruno Luong
on 27 Apr 2022
Edited: Bruno Luong
on 27 Apr 2022
"(even though I don't see it in the documentation):"
Well it is always possible to assign a (scalar) to the lhs with indexing, provided the rhs type can be cast in the class of the lhs, this is not specific to "missing" class. It must be written somewhere and not the doc of missing, since it is a generic feature.
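A small sketch of that generic rule, using an int8 array alongside the missing cases already seen in this thread: the class of the left-hand side wins, and the right-hand side is converted on assignment.

```matlab
x = int8([1 2 3]);
x(2) = 3.7;             % double rhs is converted to int8 (rounded to 4)

y = [1 2 3];
y(2) = missing;         % missing is converted to NaN

s = ["a" "b" "c"];
s(2:3) = missing;       % scalar expansion: both elements become <missing>

disp(class(x)); disp(x)
disp(class(y)); disp(y)
disp(class(s)); disp(s)
```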
I guess in 2019 missing was brand new and first implemented for strings, and the doc at that time was somewhat misleading.
FM
on 27 Apr 2022
@Bruno Luong: No, I should have been clearer. The 2019a documentation for "missing" does not describe it as specific to "string". I got that from the page cited in my last reply, which is a very specific page for strings and missing values.