
Changing the type of the built-in function output variable

Is it possible to force Matlab to change the type of an output variable returned from a built-in function (which I can't control)?
For example, if I have x=randi(255,1e7,1,'uint8'); and call [ux,~,id]=unique(x); the variable id is returned as double and occupies 80 MB, eight times more than necessary, until I trim it down with id=uint8(id); (I know that max(id)<256). I would like to force id to be uint8 so that calling Matlab's unique does not cause a spike in memory use, which might exceed the available RAM and stall execution.
Any ideas if this is possible?
  9 Comments
Walter Roberson on 4 Feb 2021
Would there happen to be a prefix size by which you could be sure that you would have encountered all of the values that will be seen in the table? For example, if there were four different values, would all four always be encountered within x(1:1000)? Or is it likely that some of the values will not occur until late in x?
Paul on 5 Feb 2021
After converting to a categorical it looks like the end result does not use doubles for the encoding. Does it use doubles under the hood in an interim step?
>> whos
  Name    Size          Bytes       Class          Attributes
  cx      10000000x1    20029990    categorical
  id1     10000000x1    80000000    double
  id2     10000000x1    20000000    uint16
  pool    1x256         512         uint16
  ux1     256x1         256         uint8
  ux2     256x1         256         uint8
  x       10000000x1    10000000    uint8
The variables id1, id2, pool, ux1 and ux2 come from running the first two parts of Jan's answer.


Accepted Answer

Jan on 4 Feb 2021
Edited: Jan on 4 Feb 2021
If you cannot edit the built-in function, write your own:
function YourTest
x = randi([0, 255], 1e7, 1, 'uint8');
% For test without full data:
% x = randi([10, 253], 1e7, 1, 'uint8');
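% 1) Built-in UNIQUE with 3 outputs:
tic;
[ux1, ~, id1] = unique(x);
toc;
% 2) UNIQUE with 1 output, indices obtained from a lookup table:
tic;
ux2 = unique(x);
pool = zeros(1, 256, 'uint16');
pool(uint16(ux2) + uint16(1)) = uint16(1):uint16(numel(ux2));
id2 = pool(uint16(x) + uint16(1)).';
toc;
% 3) Lookup table only:
tic;
[ux3, id3] = myUniqueUINT8(x);
toc;
% All methods must reply the same values (ISEQUAL ignores the class):
isequal(ux1, ux2, ux3) && isequal(id1, id2, id3)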
end
function [ux, id] = myUniqueUINT8(x)
% A fast lookup table:
x16 = (uint16(x) + uint16(1));   % Shift by 1 to let 0 become a valid index
m = false(1, 256);
m(x16) = true;                   % m(k) is true if the value k-1 occurs in x
if sum(m) == 256  % All elements found:
   ux = (uint8(0):uint8(255)).';
   id = x16;
else              % Some elements are missing:
   p = uint8(0):uint8(255);
   ux = p(m).';
   q = zeros(1, 256, 'uint16');  % Index table, only the entries q(m) are used
   q(m) = uint16(1):uint16(numel(ux));
   id = q(x16).';
end
end
Timings on Matlab R2018b, Core i5 with 2 cores:
  • Elapsed time is 1.191019 seconds. UNIQUE
  • Elapsed time is 0.334867 seconds. UNIQUE with 1 output + get index
  • Elapsed time is 0.131350 seconds. Lookup table (not all elements found)
  • Elapsed time is 0.066704 seconds. Lookup table (all elements found)
This means that for specific input data it can be much more efficient to write an adjusted function than trying to let Matlab convert some output arrays.
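The memory side of the question can be checked in the same way. The following is only a sketch that reuses the lookup table from the second part of the test function on a smaller input (1e5 instead of 1e7 elements, an arbitrary choice to keep it quick). The names s1 and s2 are introduced here for illustration:

```matlab
% Smaller example data than in the answer (1e5 instead of 1e7 elements).
x = randi([0, 255], 1e5, 1, 'uint8');

% Built-in UNIQUE: the index output is a double array (8 bytes per element).
[ux1, ~, id1] = unique(x);

% Lookup table as in the answer: the index stays uint16 (2 bytes per element).
ux2  = unique(x);
pool = zeros(1, 256, 'uint16');
pool(uint16(ux2) + uint16(1)) = uint16(1):uint16(numel(ux2));
id2  = pool(uint16(x) + uint16(1)).';

% Compare the results and the memory footprint of both index vectors.
s1 = whos('id1');
s2 = whos('id2');
fprintf('same result: %d, double: %d bytes, uint16: %d bytes\n', ...
        isequal(id1, id2), s1.bytes, s2.bytes);
```

With 1e5 elements the double index needs 800000 bytes and the uint16 index 200000 bytes. The large double array is never created in the lookup table version, so the spike does not occur.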
  9 Comments
Walter Roberson on 8 Feb 2021
I can easily extend it into all integer types including double/single in case they actually are integer: all(ceil(x)==x).
typecast() the floating point values to uint64 or uint32 and you would then have reduced down to the integer case.
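A minimal sketch of the typecast() idea, assuming the input contains only non-negative doubles without NaN or -0 (for such values the order of the bit patterns matches the numerical order, which is not true in general). The example data and the variable names are made up for illustration:

```matlab
% Example data (hypothetical): non-negative doubles without NaN or -0.
x = [1.5 2.5 1.5 0.25];

% Reinterpret the 8 bytes of each double as one uint64 (no value conversion).
bits = typecast(x, 'uint64');

% unique now works on integers; idx has the same meaning as for x.
[ubits, ~, idx] = unique(bits);

% Reinterpret the unique bit patterns as doubles again.
ux = typecast(ubits, 'double');
```

This only moves the problem to the integer case. The index output of unique is still a double array here, so by itself it does not reduce the memory peak.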
Jan on 8 Feb 2021
@dymitr ruta: You wrote:
You have provided excellent speed-up when the input x is of uint8 type. I can easily extend it into all integer types including double/single in case they actually are integer: all(ceil(x)==x).
Be careful. The shown lookup table works very efficiently for UINT8, because the number of possible elements is tiny (256). For UINT32 the table would need 2^32 entries, which will be extremely slow, and for UINT64 a table with 2^64 entries does not fit into the memory of any computer.
The built-in unique already is the general method which handles different values and classes of inputs. It is easy to improve the speed of such a function if you give up the general applicability. But then this faster method cannot be generalized without considering all special cases. If you do so, you will end up at the original Matlab function.
My function is not exhaustively tested and does not check the inputs. The typical Matlab behavior of respecting the orientation of vectors is missing as well. As soon as this is implemented, the speed gain will be smaller.
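As a rough illustration of what such checks could look like, here is a sketch in script form. It assumes that "considering the orientation" means following unique: a row vector input gives a row vector of unique values, while the index output is always a column. The example input is made up and this is not a tested replacement for the function above:

```matlab
% Example input (hypothetical): a uint8 row vector.
x = uint8([5 3 5 0 3]);

% Input check: this throws an error unless x is a uint8 vector.
validateattributes(x, {'uint8'}, {'vector'});

% Work on a column internally, shift by 1 to let 0 become a valid index.
xx = uint16(x(:)) + uint16(1);
T  = false(256, 1);
T(xx) = true;                        % Mark the values which occur in x
ux = uint8(find(T) - 1);             % Sorted unique values as column

LUT    = zeros(256, 1, 'uint16');
LUT(T) = uint16(1):uint16(nnz(T));   % Position of each value in ux
idx    = LUT(xx);                    % Column vector, as the index output of unique

% Follow unique: a row vector input gives a row vector of unique values.
if isrow(x)
   ux = ux.';
end
```

The extra call to validateattributes and the reshaping cost some time, which is the reduction in speed gain mentioned above.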


More Answers (1)

Jan on 4 Feb 2021
Edited: Jan on 4 Feb 2021
And another version which is 5% faster than my other answer, if not all 256 possible values are found. If all values appear in the input, it can be 10 times faster:
function [ux, idx] = UniqueUINT8(x)
% INPUT:  x:   UINT8 vector
% OUTPUT: ux:  sorted unique values of x
%         idx: indices such that ux(idx)==x
% The values are used as indices in a lookup table. Shift values by 1 to let 0
% become a valid index. Then 255 becomes 256, such that UINT16 is required:
xx = (uint16(x) + uint16(1));
% Create a table as logical vector:
T = false(1, 256);
% Fill the table by values of xx:
% Short version: T(xx) = true;
% It is faster to do this in chunks. This can be stopped if T is filled with
% all possible 256 elements already:
chunk = 1e5;                       % Chunk length
len   = numel(xx);                 % Number of elements
ini   = 1;                         % Initial index of chunk
fin   = min(chunk, len);           % Final index of chunk
while ini <= fin && ~all(T)        % Loop in chunks, including a last chunk of length 1
   T(xx(ini:fin)) = true;          % Fill table
   ini = fin + 1;                  % Advance chunk limits:
   fin = min(ini + chunk - 1, len);
end
ux = (uint8(0):uint8(255)).';
if all(T)                          % Faster version for complete table:
   idx = xx;                       % Shifted values are the indices already
else
   ux = ux(T);                     % UINT8(find(T) - 1) as column, the original unique values
   if nargout > 1
      LUT    = zeros(1, 256, 'uint16');
      LUT(T) = uint16(1):uint16(numel(ux));  % Look up table of indices
      idx    = LUT(xx).';
   end
end
end

Release

R2020b
