The Forward-Backward algorithm combines the forward step and the backward step to get the probability of being at each state at a specific time. Doing this for all time steps can therefore give us a sequence of individually most likely states at each time (although not guaranteed to be valid sequence, since it considers individual state at each step, and it can happen that the probability p(q_i -> q_j)=0
in the transition model), in other words:
, where
On the other hand, Viterbi algorithm finds the most likely state sequence given an observation sequence, by maximizing a different optimality criterion:
I suggest you refer to this well-known paper for a detailed explanation (see Problem #2):
LAWRENCE R. RABINER, A Tutorial on
Hidden Markov Models and Selected
Applications in Speech Recognition