The problem with your code seems to be that you’re not setting the frame rate. If you don’t set it, vision.VideoFileWriter uses its default FrameRate of 30 fps. The video then lasts only numFrames/FPS seconds, so when you add audio with the step command, it’s rushed (played too quickly) to finish within that time frame.
Here’s my solution. Hope this helps.
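[data, freq] = audioread('pathtoAudioFile');
img = imread('pathtoImage');
audioLength = floor(size(data, 1)/freq);  % whole seconds of audio
% FrameRate is 1, so each video frame carries one second (freq samples) of audio
writerObj = vision.VideoFileWriter('newvideo.avi', 'AudioInputPort', true, 'FrameRate', 1);
for i = 1:audioLength
    % rows for second i; keep all columns so stereo audio also works
    parsedAudio = data((i-1)*freq + 1:i*freq, :);
    step(writerObj, img, parsedAudio);
end
release(writerObj);  % close the file so the video is finalized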
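If it helps, here is a small sketch of the arithmetic behind the loop above, so you can check the numbers before writing any video. The values (a 44100 Hz sample rate, a 10-second clip, frame k = 3) are made up for illustration and are not from your file; the variable names samplesPerFrame, numFrames and videoDuration are mine as well.

```matlab
% Hypothetical values for illustration only
freq = 44100;             % audio sample rate in Hz
numSamples = 10 * freq;   % a 10-second clip
frameRate = 1;            % video frames per second

samplesPerFrame = freq / frameRate;               % audio samples that go with one frame
numFrames = floor(numSamples / samplesPerFrame);  % whole frames the audio can fill
videoDuration = numFrames / frameRate;            % playback length in seconds

% Row indices of the audio chunk that accompanies frame k
k = 3;
idx = (k-1)*samplesPerFrame + 1 : k*samplesPerFrame;
```

If videoDuration comes out close to the length of your audio, the audio should play at normal speed. With a different frame rate the same formulas should apply, as long as freq/frameRate is a whole number.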