Given enough resources, you should probably use the Baum-Welch (forward-backward) algorithm rather than Viterbi training (a.k.a. the segmental k-means algorithm), an alternative parameter estimation procedure that trades some of Baum-Welch's generality for computational efficiency. Baum-Welch generally yields parameters that perform better, although there are exceptions. Here is a nice comparative study.
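To make the Baum-Welch recommendation concrete, here is a minimal sketch of a single re-estimation step for a discrete-observation HMM, written with NumPy. The function name `baum_welch_step` and the argument names (`pi` for initial probabilities, `A` for the transition matrix, `B` for the emission matrix) are my own choices for illustration, and it assumes a single observation sequence of integer symbols. Treat it as a teaching aid, not a production implementation.

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One EM (Baum-Welch) iteration for a discrete HMM.

    obs: sequence of integer symbols, pi: (N,), A: (N, N), B: (N, M).
    Returns updated (pi, A, B) and the log-likelihood under the INPUT parameters.
    """
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    scale = np.zeros(T)

    # Forward pass, rescaled at every step to avoid numerical underflow.
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scale factors.
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

    # E-step: state posteriors (gamma) and transition posteriors (xi).
    gamma = alpha * beta
    emit_beta = B[:, obs[1:]].T * beta[1:]            # shape (T-1, N)
    xi = alpha[:-1, :, None] * A[None] * emit_beta[:, None, :]
    xi /= scale[1:, None, None]

    # M-step: re-estimate parameters from the expected ("soft") counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]

    return new_pi, new_A, new_B, np.log(scale).sum()
```

You would call this in a loop until the log-likelihood stops improving. Viterbi training differs in the E-step: instead of the soft posteriors `gamma` and `xi`, it counts transitions and emissions along the single best state path, which is cheaper but discards the uncertainty over paths.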
Furthermore, note that Baum-Welch is what you use to estimate the parameters of the model. It sets the emission and transition probabilities and is an instance of the EM algorithm. Once you have trained the HMM, you would then use the Viterbi decoding algorithm to compute the most likely sequence of states that could have generated your observations.
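For the decoding step, a minimal sketch of the Viterbi algorithm in log space might look like the following. The function name `viterbi` and the parameter names (`pi`, `A`, `B`) are assumptions for illustration, and the example numbers are the usual textbook-style toy model, not anything from your data.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state path for a discrete HMM.

    obs: sequence of integer symbols, pi: (N,), A: (N, N), B: (N, M).
    """
    T, N = len(obs), len(pi)
    # Work in log space; zero probabilities become -inf, which is fine here.
    with np.errstate(divide="ignore"):
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)

    delta = np.zeros((T, N))            # best log-score of a path ending in each state
    back = np.zeros((T, N), dtype=int)  # best predecessor for each state

    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]

    # Trace the best path backwards from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy model: states 0 = Healthy, 1 = Fever; symbols 0 = normal, 1 = cold, 2 = dizzy.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2], pi, A, B))  # [0, 0, 1]
```

In practice `pi`, `A` and `B` would be the parameters that came out of training, and `obs` would be the new sequence you want to label.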
For references, I would recommend Speech and Language Processing, Artificial Intelligence: A Modern Approach, or this paper.