Changing the type of the built-in function output variable

Question

0 votes

Is it possible to force Matlab to change the type of output variable returned from the built-in function (which I cant control) ?

For example if I have x=randi(255,1e7,1,'uint8'); and want to do [ux,~,id]=unique(x); the variable id will be of double type and occupy unnecessarily huge space before I trim it down with id=uint8(id); since I know max(id)<256. I would like to force id to uint8 type such that it does not create a spike in memory requirement during calling Matlab's unique, which might go over RAM and stall execution.

Any ideas if this is possible?

9 Comments
Show 7 older comments Hide 7 older comments

Walter Roberson on 4 Feb 2021

@dymitr ruta

Would there happen to be a prefix size by which you could be sure that you would have encountered all of the values that will be seen in the table? For example if there were four different values, would it always be the case that all four would be encounted by x(1:1000) ? Or is it likely that some of the values will not occur until late in x ?

Paul on 5 Feb 2021

Open in MATLAB Online

After converting to a categorical it looks like the end result does not use doubles for the encoding. Does it use doubles under the hood in an interim step?

> whos
  Name             Size                Bytes  Class          Attributes
  cx        10000000x1              20029990  categorical              
  id1       10000000x1              80000000  double                   
  id2       10000000x1              20000000  uint16                   
  pool             1x256                 512  uint16                   
  ux1            256x1                   256  uint8                    
  ux2            256x1                   256  uint8                    
  x         10000000x1              10000000  uint8           

id1, id2, pool, ux1, ux2 from running the first two parts of Jan's answer.

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Jan on 4 Feb 2021

Edited: Jan on 4 Feb 2021

Open in MATLAB Online

3 votes

If you cannot edit the built-in function, write your own one:

function YourTest
x = randi([0, 255], 1e7, 1, 'uint8');
% For test without full data:
%   x = randi([10, 253], 1e7, 1, 'uint8');
tic;
[ux1, ~, id1] = unique(x);
toc;
tic;
ux2  = unique(x);
pool = zeros(1, 256, 'uint16');
pool(uint16(ux2) + uint16(1)) = uint16(1):uint16(numel(ux2));
id2  = pool(uint16(x) + uint16(1)).';
toc;
tic;
[ux3, id3] = myUniqueUINT8(x);
toc;
isequal(ux1, ux2, ux3) && isequal(id1, id2, id3)
end
function [ux, id] = myUniqueUINT8(x)
% A fast lookup table:
x16    = (uint16(x) + uint16(1));
m      = false(1, 256);
m(x16) = true;
if sum(m) == 256  % All elements found:
   ux = (uint8(0):uint8(255)).';
   id = x16;
else              % Some elements are missing:
   p  = uint8(0):uint8(255);
   ux = p(m).';
   
   q      = uint16(0):uint16(256);
   q(m)   = uint16(1):uint16(numel(ux));
   id     = q(x16).';
end
end

Timings on Matlab R2018b, Core i5 with 2 cores:

Elapsed time is 1.191019 seconds. UNIQUE
Elapsed time is 0.334867 seconds. UNIQUE with 1 output + get index
Elapsed time is 0.131350 seconds. Lookup table (not all elements found)
Elapsed time is 0.066704 seconds. Lookup table (all elements found)

This means that for specific input data it can be much more efficient to write an adjusted function than trying to let Matlab convert some output arrays.

9 Comments
Show 7 older comments Hide 7 older comments

dymitr ruta on 8 Feb 2021

Open in MATLAB Online

Gents, apologies for lack of sufficient assumptions, but that is because there are none in practise. Overall I am after generic ordinal encoder that gives me sorted uniques ux and ordinally encoded input (id) of the least memory occupant data type (uintx) for any incoming vector variable x without any prior knowledge about it other than whatever you can inspect yourself: i.e. check its size, type/class, test the range etc. You have provided excellent speed-up when the input x is of uint8 type. I can easily extend it into all integer types including double/single in case they actually are integer: all(ceil(x)==x).

For logical inputs this task is not applicable or trivial if enforced for consistency.

The remaining major types of input one could get are floating point doubles/singles or cell arrays of strings and for them I am using simply:

ux=unique(x); [~,id]=ismember(x,ux);

which although quite fast, produces id as a double, which on its own doubles the memory requirement and could stall large double data processing. So all I was asking is a friendly hint if there are any tricks from your extensive experience that you could recommend to speedup the computation time and/or reduce memory requirements for the remaining types of floating point/cellarrays of strings. I will explore the hashing option - many thanks for the suggestion.

Yes I do have the compiler, and mex is always the final speedup opportunity after all algorithmic options are exhausted.

Walter Roberson on 8 Feb 2021

I can easily extend it into all integer types including double/single in case they actually are integer: all(ceil(x)==x).

typecast() the floating point values to uint64 or uint32 and you would then have reduced down to the integer case.

Jan on 8 Feb 2021

@dymitr ruta: You wrote:

You have provided excellent speed-up when the input x is of uint8 type. I can easily extend it into all integer types including double/single in case they actually are integer: all(ceil(x)==x).

Be careful. The shown lookup table works very efficiently for UINT8, because the number of possible elements is tiny. For UINT32 this will be extremely slow and for UINT64 your computer must crash.

A general method to catch different values and classes of inputs is the built-in unique already. It is easy to improve the speed of such a function, when you omit the general applicability. But then this faster method cannot be generalized without considering all special cases. If you do so, you will end up at the original Matlab function.

My function is not exhaustively tested and does not check the inputs. The Matlab typical behavior of considering the orientation of vectors is missing also. As soon as this is implemented, the speed gain will be smaller.

Sign in to comment.

Answer 2

Jan on 4 Feb 2021

Edited: Jan on 4 Feb 2021

Open in MATLAB Online

0 votes

And another version which is 5% faster than my other answer, if not all 256 possible values are found. If all values appear in the input, it can be 10 times faster:

function [ux, idx] = UniqueUINT8(x)
% INPUT:  x:   UINT8 vector
% OUTPUT: ux:  sorted unique values of x
%         idx: indices such that ux(idx)==x
% The values are used as indices in a lookup table. Shift values by 1 to let 0
% become a valid index. Then 255 becomes 256, such that UINT16 is required:
xx = (uint16(x) + uint16(1));
% Create a table as logical vector:
T = false(1, 256);
% Fill the table by values of xx:
% Short version: T(xx) = true;
% It is faster to do this in chunks. This can be stopped if tab is filled with
% all possible 256 elements already:
chunk = 1e5;                % Chuck length
len   = numel(xx);          % Number of elements
ini   = 1;                  % Initial index of chunk
fin   = min(chunk, len);    % final index of chunk
while ini < fin && ~all(T)  % Loop in chunks
   T(xx(ini:fin)) = true;   % Fill table
   
   ini = fin + 1;           % Advance chunk limits:
   fin = min(ini + chunk, len); 
end
ux = (uint8(0):uint8(255)).';
if all(T)                   % Faster version for complete table:
   idx = xx;                % Shifted values are the indices already
else
   ux = ux(T).';            % UINT8(find(T) - 1), the original unique values
   
   if nargout > 1
      LUT    = zeros(1, 256, 'uint16');
      LUT(T) = uint16(1):uint16(numel(ux));  % Look up table of indices
      idx    = LUT(xx).';
   end
end
end

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Changing the type of the built-in function output variable

9 Comments
Show 7 older comments Hide 7 older comments

Accepted Answer

9 Comments
Show 7 older comments Hide 7 older comments

More Answers (1)

0 Comments
Show -2 older comments Hide -2 older comments

Categories

Products

Release

Tags

Community Treasure Hunt

Changing the type of the built-in function output variable

9 Comments Show 7 older comments Hide 7 older comments

Accepted Answer

9 Comments Show 7 older comments Hide 7 older comments

More Answers (1)

0 Comments Show -2 older comments Hide -2 older comments

Categories

Products

Release

Tags

See Also

Community Treasure Hunt

9 Comments
Show 7 older comments Hide 7 older comments

9 Comments
Show 7 older comments Hide 7 older comments

0 Comments
Show -2 older comments Hide -2 older comments