This is exactly the problem I'm working on for my thesis. Unfortunately, the main HMM training algorithm, Baum-Welch (BWA), is recursive: each time step of the forward and backward passes depends on the result of the previous one, so it cannot be parallelized across time (at least not easily). I'm looking into some work that Turin did on parallelizing the BWA here:
Additionally, the segmental K-means algorithm is another way to train an HMM. My understanding is that it is computationally cheaper than BWA but still suffers from the same recursion problem.
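To make the recursion problem concrete, here is a minimal sketch of the forward pass that Baum-Welch is built on, for a discrete HMM in plain Python. The names (`forward`, `pi`, `A`, `B`, `obs`) are my own for illustration and don't come from any particular toolbox, and it is unscaled, so it is only meant for short sequences.

```python
def forward(pi, A, B, obs):
    """Return P(obs | model) for a discrete HMM via the forward algorithm.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k while in state i
    obs     : list of observed symbol indices
    """
    n = len(pi)
    # Initialisation: alpha_0(i) = pi_i * b_i(o_0)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # This loop is the sequential part: alpha at time t needs alpha at t-1.
    for o in obs[1:]:
        # The n entries built here do not depend on each other, so this
        # is the only place where work could be spread over threads.
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)
```

The outer loop over time steps has to run in order, while the work inside one step is independent per state. That is why the number of states, not the sequence length, decides how much a GPU can help.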
Luckily, GPUs will help if you are training many models or a model with many states, since these are trivially parallel problems. From my research, you only really see a benefit for HMMs with thousands of states; below that, the GPU overhead is just too great.
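The "many models" case is easy to picture: no model needs anything from any other model, so each one can go to its own worker. Below is a rough sketch in plain Python, assuming discrete HMMs stored as `(pi, A, B)` tuples. The function names are hypothetical, and scoring a sequence stands in for a full training run, which would be independent per model in the same way. A CPU thread pool is used purely to show the structure; it is not GPU code and won't give a real speedup in CPython.

```python
from concurrent.futures import ThreadPoolExecutor


def forward(model, obs):
    """Unscaled forward algorithm: P(obs | model) for one discrete HMM."""
    pi, A, B = model
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)


def score_all(models, obs, workers=4):
    """Score one observation sequence under every model independently.

    Each call to forward() touches only its own model, so the calls can
    be handed to separate workers with no coordination between them.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda m: forward(m, obs), models))
```

Results come back in the same order as `models`, so `score_all(models, obs)[k]` belongs to `models[k]`.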
If you're interested in using GPUs in MATLAB, check out this free toolbox here:
Shoot me an email or a message if you need additional details; I'm glad to help :)