I am using the parallel computing toolbox, and it seems like copying data to a gpu using gpuArray() generally takes much longer than copying data back to the host using gather(). For example, if I try:
A = rand(500,500, 50);
Then the gather() takes about 0.055 seconds while the gpuArray() takes only 0.018 seconds. Is this behavior expected? Am I using the wrong method to time this?