hmmtrain
Hidden Markov model parameter estimates from emissions
Syntax
[ESTTR,ESTEMIT] = hmmtrain(seq,TRGUESS,EMITGUESS)
hmmtrain(...,'Algorithm',algorithm)
hmmtrain(...,'Symbols',SYMBOLS)
hmmtrain(...,'Tolerance',tol)
hmmtrain(...,'Maxiterations',maxiter)
hmmtrain(...,'Verbose',true)
hmmtrain(...,'Pseudoemissions',PSEUDOE)
hmmtrain(...,'Pseudotransitions',PSEUDOTR)
Description
[ESTTR,ESTEMIT] = hmmtrain(seq,TRGUESS,EMITGUESS) estimates the transition
and emission probabilities for a hidden Markov model using the Baum-Welch
algorithm. seq can be a row vector containing a single sequence, a matrix
with one row per sequence, or a cell array with each cell containing a
sequence. TRGUESS and EMITGUESS are initial estimates of the transition and
emission probability matrices. TRGUESS(i,j) is the estimated probability of
transition from state i to state j. EMITGUESS(i,k) is the estimated
probability that symbol k is emitted from state i.
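The three accepted forms of seq can be sketched as follows. This is a minimal illustration, not a recommended workflow: the model matrices, the initial guesses, the sequence lengths, and the fixed random seed are all assumptions chosen for the example.

```matlab
rng(1);  % fix the seed so the generated sequences are repeatable

% Illustrative "true" model: 2 states, 6 possible emissions.
trans = [0.95 0.05; 0.10 0.90];
emis  = [ones(1,6)/6; 0.1 0.1 0.1 0.1 0.1 0.5];

% Two sequences of equal length, so they can also be stacked in a matrix.
seqA = hmmgenerate(100,trans,emis);
seqB = hmmgenerate(100,trans,emis);

% Initial guesses (assumed values); each row sums to 1.
trGuess   = [0.85 0.15; 0.10 0.90];
emitGuess = [0.17 0.16 0.17 0.16 0.17 0.17; 0.60 0.08 0.08 0.08 0.08 0.08];

[tr1,e1] = hmmtrain(seqA,trGuess,emitGuess);         % single row vector
[tr2,e2] = hmmtrain([seqA; seqB],trGuess,emitGuess); % one row per sequence
[tr3,e3] = hmmtrain({seqA,seqB},trGuess,emitGuess);  % one cell per sequence
```

The cell array form is the one to use when the sequences have different lengths, because a matrix requires every row to have the same length.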
hmmtrain(...,'Algorithm',algorithm) specifies the training algorithm.
algorithm can be either 'BaumWelch' or 'Viterbi'. The default algorithm is
'BaumWelch'.
hmmtrain(...,'Symbols',SYMBOLS) specifies the symbols that are emitted.
SYMBOLS can be a numeric array, a string array, or a cell array of the names
of the symbols. The default symbols are integers 1 through N, where N is the
number of possible emissions.
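A short sketch of the 'Symbols' option with a numeric symbol array. The symbol values 10, 20, and 30, the model matrices, and the seed are assumptions made up for this example; the same SYMBOLS array is passed to hmmgenerate so that the generated sequence uses those labels.

```matlab
rng(2);  % repeatable example data

% Illustrative model: 2 states, 3 possible emissions.
trans = [0.9 0.1; 0.2 0.8];
emis  = [0.7 0.2 0.1; 0.1 0.3 0.6];

% Emissions are labeled 10, 20, 30 instead of the default 1, 2, 3.
symbols = [10 20 30];
seq = hmmgenerate(150,trans,emis,'Symbols',symbols);

% Pass the same labels so hmmtrain can map them back to columns 1..3.
[estTR,estE] = hmmtrain(seq,trans,emis,'Symbols',symbols);
```

The number of entries in SYMBOLS matches the number of columns of the emission matrix, so column k of estE corresponds to symbols(k).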
hmmtrain(...,'Tolerance',tol) specifies the tolerance used for testing
convergence of the iterative estimation process. The default tolerance is
1e-6.
hmmtrain(...,'Maxiterations',maxiter) specifies the maximum number of
iterations for the estimation process. The default maximum is 100.
hmmtrain(...,'Verbose',true) displays the status of the algorithm at each
iteration.
hmmtrain(...,'Pseudoemissions',PSEUDOE) specifies pseudocount emission values
for the Viterbi training algorithm. Use this argument to avoid zero
probability estimates for emissions with very low probability that might not
be represented in the sample sequence. PSEUDOE should be a matrix of size
m-by-n, where m is the number of states in the hidden Markov model and n is
the number of possible emissions. If the i→k emission does not occur in seq,
you can set PSEUDOE(i,k) to be a positive number representing an estimate of
the expected number of such emissions in the sequence seq.
hmmtrain(...,'Pseudotransitions',PSEUDOTR) specifies pseudocount transition
values for the Viterbi training algorithm. Use this argument to avoid zero
probability estimates for transitions with very low probability that might
not be represented in the sample sequence. PSEUDOTR should be a matrix of
size m-by-m, where m is the number of states in the hidden Markov model. If
the i→j transition does not occur in the state sequence estimated for seq,
you can set PSEUDOTR(i,j) to be a positive number representing an estimate of
the expected number of such transitions in that state sequence.
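The two pseudocount options can be combined with Viterbi training as in the sketch below. The pseudocount value of 1 for every entry is an arbitrary assumption for illustration, as are the model matrices and the seed; in practice the values would reflect how often you expect each emission or transition.

```matlab
rng(4);  % repeatable example data

% Illustrative model: m = 2 states, n = 6 possible emissions.
trans = [0.95 0.05; 0.10 0.90];
emis  = [ones(1,6)/6; 0.1 0.1 0.1 0.1 0.1 0.5];
seq   = hmmgenerate(200,trans,emis);

% Pseudocounts (assumed values): one pseudo-observation per entry.
pseudoE  = ones(2,6);  % m-by-n, matches the emission matrix
pseudoTR = ones(2,2);  % m-by-m, matches the transition matrix

[estTR,estE] = hmmtrain(seq,trans,emis, ...
    'Algorithm','Viterbi', ...
    'Pseudoemissions',pseudoE, ...
    'Pseudotransitions',pseudoTR);
```

With strictly positive pseudocounts, every entry of the estimated matrices is expected to be nonzero even when an emission or transition never appears in the training data.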
If you know the states corresponding to the sequences, use hmmestimate to
estimate the model parameters.
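For comparison, a minimal sketch of the known-states case using hmmestimate. The model matrices, sequence length, and seed are assumptions for the example; hmmgenerate is used only because it returns the state path along with the emissions.

```matlab
rng(5);  % repeatable example data

% Illustrative model: 2 states, 6 possible emissions.
trans = [0.95 0.05; 0.10 0.90];
emis  = [ones(1,6)/6; 0.1 0.1 0.1 0.1 0.1 0.5];

% The second output is the state visited at each step.
[seq,states] = hmmgenerate(500,trans,emis);

% With the states known, no initial guesses or iterations are needed.
[estTR,estE] = hmmestimate(seq,states);
```

Because the states are observed here, the estimates come from direct counting rather than from an iterative procedure, so no tolerance or iteration limit applies.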
Tolerance
The input argument 'tolerance' controls how many steps the hmmtrain algorithm
executes before the function returns an answer. The algorithm terminates when
all of the following three quantities are less than the value that you
specify for tolerance:
The change in the log likelihood that the input sequence seq is generated by the currently estimated values of the transition and emission matrices
The change in the norm of the transition matrix, normalized by the size of the matrix
The change in the norm of the emission matrix, normalized by the size of the matrix
The default value of 'tolerance' is 1e-6.
Increasing the tolerance decreases the number of steps the hmmtrain algorithm
executes before it terminates.
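The stopping rule described in this section can be sketched as a three-part test. This is not the function's actual code: the choice of the infinity norm, the division by the number of rows or columns, the relative form of the log-likelihood change, and all numeric values are assumptions made for illustration.

```matlab
% Hypothetical values from two successive iterations.
oldLL = -182.40;
newLL = -182.39995;
oldTR = [0.90 0.10; 0.20 0.80];
newTR = oldTR + [ 2e-5 -2e-5;  1e-5 -1e-5];
oldE  = [0.50 0.50; 0.30 0.70];
newE  = oldE  + [ 1e-5 -1e-5; -2e-5  2e-5];

tol = 1e-4;  % tolerance chosen for this illustration

% Change in log likelihood (relative form is an assumption).
llChange = abs(newLL - oldLL) / (1 + abs(oldLL));

% Change in each matrix norm, scaled by a matrix dimension (an assumption).
trChange = norm(newTR - oldTR,inf) / size(newTR,1);
eChange  = norm(newE  - oldE ,inf) / size(newE,2);

% Training stops only when all three quantities fall below the tolerance.
converged = llChange < tol && trChange < tol && eChange < tol;
```

Because all three conditions must hold at once, a larger tolerance is satisfied earlier, which is why raising it shortens training.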
maxiterations
The maximum number of iterations, 'maxiterations', controls the maximum
number of steps the algorithm executes before it terminates. If the algorithm
executes maxiter iterations before reaching the specified tolerance, the
algorithm terminates and the function issues a warning. If this occurs, you
can increase the value of 'maxiterations' so that the algorithm can reach the
desired tolerance before terminating.
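One way to react to that warning programmatically is to clear and then inspect lastwarn, as sketched below. The iteration limits of 2 and 500, the model matrices, the initial guesses, and the seed are assumptions for the example, and the check treats any warning raised during the call as a sign that the limit was reached.

```matlab
rng(6);  % repeatable example data

% Illustrative model and initial guesses: 2 states, 6 possible emissions.
trans = [0.95 0.05; 0.10 0.90];
emis  = [ones(1,6)/6; 0.1 0.1 0.1 0.1 0.1 0.5];
seq   = hmmgenerate(300,trans,emis);
trGuess   = [0.85 0.15; 0.10 0.90];
emitGuess = [0.17 0.16 0.17 0.16 0.17 0.17; 0.60 0.08 0.08 0.08 0.08 0.08];

lastwarn('');  % clear any earlier warning so the check below is meaningful

% Deliberately small limit so the iteration cap is likely to be hit.
[estTR,estE] = hmmtrain(seq,trGuess,emitGuess,'Maxiterations',2);

msg = lastwarn;  % empty if no warning was issued during the call
if ~isempty(msg)
    % Retry with a larger limit (assumed value) to allow convergence.
    [estTR,estE] = hmmtrain(seq,trGuess,emitGuess,'Maxiterations',500);
end
```

Raising the limit only helps if the estimates are still improving; if the retry also warns, loosening the tolerance or changing the initial guesses are the other levers described on this page.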
Examples
trans = [0.95,0.05;
         0.10,0.90];
emis = [1/6,  1/6,  1/6,  1/6,  1/6,  1/6;
        1/10, 1/10, 1/10, 1/10, 1/10, 1/2];

seq1 = hmmgenerate(100,trans,emis);
seq2 = hmmgenerate(200,trans,emis);
seqs = {seq1,seq2};
[estTR,estE] = hmmtrain(seqs,trans,emis);
References
[1] Durbin, R., S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge, UK: Cambridge University Press, 1998.
Version History
Introduced before R2006a