How to maximize MATLAB's GPU utility?

John Muradeli on 20 Mar 2019
Edited: John Muradeli on 20 Mar 2019
I've benchmarked my GPU against itself and against the CPU for varying matrix sizes, and found the opposite of what most GPU literature suggests: the GPU's computing advantage diminishes with array size. Code, results, & specs shown below. Noteworthy observations:
(1) GPU utilization remains below 10%, according to Task Manager
(2) RAM and CPU usage are roughly 50% and 20%, respectively, for a large (K > 9000) array
(3) A considerable drop in speed ratio is observed for around K > 8000
(4) Splitting the Xga matrix into four for the K > 8000 (= 9000) case increases vectorized speed two-fold
(5) My GPU ranks far higher among GPUs than my CPU does among CPUs (#24 vs. #174); it thus seems an on-par CPU would outperform the GPU for larger arrays
(6) The last pic's GPU vs. CPU benchmark supports (5); the GPU isn't as vastly superior as expected
What's the culprit - is it my code, MATLAB, or the hardware configuration that is under-utilizing the GPU? How can I find out and resolve it? m-files: testrun.zip (testrun compares performance for a single K; testrun0 for multiple)
%% CODE: centroid indexing in K-means algorithm
% size(X) = [16000, 3]
% size(c) = [K, 3]
% Xsg = single(X); csg = single(c);
% Xga = gpuArray(Xsg); cga = gpuArray(csg);
% Speed ratio = t2/t1, if t2 > t1 - else, t1/t2
%% TIMING
% timeit and gputimeit expect function handles, hence the @()
f1 = @() fasterFunction(...); % e.g. @() vectorized(Xga, cga, K, m)
f2 = @() slowerFunction(...); % e.g. @() forVectorized(X, c, m)
t1 = gputimeit(f1) % OR timeit(f1) for non-GPU arrays
t2 = timeit(f2) % OR gputimeit(f2) for GPU arrays
%% FUNCTIONS
function out = vectorized(X, c, K, m)
[~, out] = min(reshape(permute(sum((X-permute(c,[3 2 1])).^2,2), ...
[1 2 3]),m,K),[],2);
end
function out = forVectorized(X, c, m)
out = zeros(m,1);
for j=1:m
[~,out(j)] = min(sum(((X(j,:))'-c').^2));
end
end
function out = forFor(X,c,K,m)
out = zeros(m,1); idxtemp = zeros(K,1);
for i=1:m
for j=1:K
idxtemp(j) = sum((X(i,:)-c(j,:)).^2,2);
end
[~, out(i)] = min(idxtemp);
end
end
%% PLOTS
% GPU vectorized = vectorized(Xga, cga, K, m) for varying K, timed w/ gputimeit
% CPU vectorized = vectorized(Xsg, csg, K, m) for varying K, timed w/ timeit
% for-loop = forFor(Xsg, csg, K, m) for varying K, timed w/ timeit
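One thing worth checking against observation (3) is the size of the temporary array that vectorized() builds: X - permute(c,[3 2 1]) expands to an m-by-3-by-K array before the sum collapses it. The sketch below only does the arithmetic for that array, assuming the sizes in the code comments (m = 16000, 3 columns, single precision); the K values are illustrative picks, and nothing here is measured on the poster's hardware. Whether this is what actually causes the drop is not confirmed by the results above.

```matlab
% Assumed sizes, taken from the comments at the top of the code listing.
m = 16000;              % rows of X
d = 3;                  % columns of X and c
bytesPerSingle = 4;     % a single-precision value occupies 4 bytes
Ks = [1000 4000 8000 9000];   % illustrative K values, not the tested grid

% X - permute(c,[3 2 1]) expands to an m-by-d-by-K array before sum(...,2).
tempBytes = m * d * Ks * bytesPerSingle;
tempGB = tempBytes / 1e9;

for i = 1:numel(Ks)
    fprintf('K = %5d: intermediate array ~ %.3f GB\n', Ks(i), tempGB(i));
end

% On a machine with a supported GPU, the figure can be compared with the
% free device memory (left commented out so the sketch runs without a GPU):
% g = gpuDevice; fprintf('Free GPU memory: %.3f GB\n', g.AvailableMemory/1e9);
```

The squaring step may allocate a further temporary of the same size, so the peak footprint can be higher than the figure printed here.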
  5 Comments
John Muradeli on 20 Mar 2019
@Jan -- Unsure how columns/rows affect CPU computing, but I transposed per your suggestion and interchanged (i,:) with (:,i) (same w/ j). Results: https://puu.sh/D2Lex/ea9c4d6189.png -- not a significant difference for the range of K's tested
John Muradeli on 20 Mar 2019
Edited: John Muradeli on 20 Mar 2019
@Jan: Very well, I'll clarify below; as for the complete code - there's a tradeoff between conciseness and thoroughness - too much of the latter tends to throw off readers the fastest. This said, would an m-file suffice? The code isn't brief.
"Maximize GPU Utility" - see (1), (2); that is, it seems that the majority of GPU resources aren't being utilized, and that there may be a way to utilize them. For example, dividing the workload evenly across the entire GPU, rather than having a few take it all while most lie idle. I tried one method (see (4)); but strangely, for K <= 8000, computing time increases. Hence, I may be doing it wrong.
@"Doesn't the last diagram tell the opposite?": it's not so much GPU vs. CPU as GPU vs. GPU: performance slightly decreases after the peak (circled), but not as much as in the plots above. I couldn't test for 1e9 because of an 'Out of Memory' error.
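For reference, the splitting idea in (4) can be sketched as follows. This is a minimal illustration rather than the code in testrun.zip: it splits X by rows into blocks, applies the same expression as vectorized() to each block, and checks that the result matches the unsplit computation. The block count nChunks, the variable names, and the small synthetic sizes are all assumptions chosen so the example runs quickly on the CPU.

```matlab
% Small synthetic data (hypothetical sizes, not the m = 16000 case above).
rng(0);
m = 200; K = 50;
X = single(rand(m, 3));
c = single(rand(K, 3));

nChunks = 4;                                  % hypothetical block count
edges = round(linspace(0, m, nChunks + 1));   % row boundaries of the blocks
idxChunked = zeros(m, 1);

for b = 1:nChunks
    rows = edges(b)+1 : edges(b+1);           % rows handled in this block
    n = numel(rows);
    % Same expression as vectorized(), limited to n rows: the temporary
    % is n-by-3-by-K instead of m-by-3-by-K.
    D = sum((X(rows,:) - permute(c, [3 2 1])).^2, 2);
    [~, idxChunked(rows)] = min(reshape(D, n, K), [], 2);
end

% Reference: the unsplit expression on the whole matrix.
[~, idxFull] = min(reshape(sum((X - permute(c, [3 2 1])).^2, 2), m, K), [], 2);
assert(isequal(idxChunked, idxFull));
```

Each block needs only about 1/nChunks of the temporary memory, at the cost of more function calls, which might be one reason splitting helps for large K and hurts for small K. For gpuArray inputs the output would likely need to be preallocated on the device, e.g. with zeros(m,1,'gpuArray'); that variant is untested here.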

Answers (0)

Release

R2018b
