How does the HMM model in hmmlearn identify the hidden states?

I am new to Hidden Markov Models, and to experiment with them I am studying the scenario of sunny/rainy/foggy weather based on observing whether or not a person carries an umbrella, with the help of the hmmlearn package in Python. The data used in my tests was obtained from this page (the test and output files of "test 1").

I created the simple code presented below to fit an unsupervised HMM from the data, and then compared the predictions to the expected output. The results seem pretty good (7 out of 10 correct predictions).

My question is: how am I supposed to know the mapping of the hidden states handled by the model to the real states in the problem domain? (in other words, how do I relate the responses to the desired states of my problem domain?)

This might be a very naïve question, but if the model were supervised I would understand that the mapping is given by me when providing the y values to the fit method... yet I simply can't figure out how it works in this case.

Code:

import numpy as np
from hmmlearn import hmm

# Load the data from a CSV file
data = np.genfromtxt('training-data.csv', skip_header=1, delimiter=',',
                     dtype=str)

# One-hot encode the 'yes' and 'no' categories of the observation
# (i.e. whether an umbrella is seen or not)
x = np.array([[1, 0] if i == 'yes' else [0, 1] for i in data[:, 1]])

# Fit the HMM from the data expecting 3 hidden states (the weather on the day:
# sunny, rainy or foggy)
model = hmm.GaussianHMM(n_components=3, n_iter=100, verbose=True)
model.fit(x, [len(x)])

# Test the model
test = ['no', 'no', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'yes']
x_test = np.array([[1, 0] if i == 'yes' else [0, 1] for i in test])
y_test = ['foggy', 'foggy', 'foggy', 'rainy', 'sunny', 'foggy', 'rainy', 'rainy', 'foggy', 'rainy']

y_pred = model.predict(x_test)

mp = {0: 'sunny', 1: 'rainy', 2: 'foggy'} # THIS IS MY ASSUMPTION

print('\n\n\n')

print('Expected:')
print(y_test)
print('Predicted:')
print([mp[i] for i in y_pred])

Result:

Expected:
['foggy', 'foggy', 'foggy', 'rainy', 'sunny', 'foggy', 'rainy', 'rainy', 'foggy', 'rainy']
Predicted:
['foggy', 'foggy', 'sunny', 'rainy', 'foggy', 'sunny', 'rainy', 'rainy', 'foggy', 'rainy']
Allergen answered 16/1, 2017 at 2:37. Comments (3):
Same question, focusing on part-of-speech tagging: Interpretation of hidden states in HMM in the part-of-speech tagging task – Taveda
Thank you, @FranckDernoncourt. It already helped a lot. :) – Allergen
I'm surprised this was migrated to SO. The question is clearly not specific to programming. – Ingemar

"My question is: how am I supposed to know the mapping of the hidden states handled by the model to the real states in the problem domain? (in other words, how do I relate the responses to the desired states of my problem domain?)"

Basically, you cannot. The fact that you were able to hand-craft this mapping (or even that it exists in the first place) is just a coincidence arising from the extreme simplicity of the problem.

An HMM (in such a learning scenario) tries to find the most probable sequence of (a predefined number of) hidden states, but like any other unsupervised learning method it has no guarantee of matching the task at hand. It simply models the reality as best it can, given the constraints (the Markov assumption, the number of hidden states, the observations provided). It does not magically detect what question you are actually asking (here, the sequence of weather states); it solves its own internal optimization problem: finding the most probable sequence of arbitrarily defined hidden states such that, under the Markov assumption (independence from older history), the provided observations are very likely to appear.
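
One way to see this arbitrariness concretely (a small sketch I am adding here, not part of the original setup) is to fit the same model with different random initializations and compare the decoded sequences: the dynamics come back roughly the same, but the state indices are typically permuted between runs.

import numpy as np
from hmmlearn import hmm

# Toy observations, one-hot encoded as in the question
obs = ['no', 'no', 'yes', 'yes', 'no', 'yes', 'no', 'no', 'yes', 'yes']
X = np.array([[1, 0] if o == 'yes' else [0, 1] for o in obs])

# Fit the same model twice with different random seeds
for seed in (0, 1):
    m = hmm.GaussianHMM(n_components=3, n_iter=100, random_state=seed)
    m.fit(X)
    print(seed, m.predict(X))

# The two runs will often decode essentially the same sequence, but with
# the indices 0/1/2 swapped around, since the labels carry no fixed meaning.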

In general you will not be able to interpret these states so easily. Here the problem is so simple that, with the assumptions listed above, this (the weather state) is pretty much the most probable thing to be modeled. In other problems, the states can capture anything that makes sense.
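
If you do need a mapping after fitting, a common heuristic (my sketch, not something the answer or hmmlearn prescribes) is to inspect the learned parameters and label the states by hand. With the question's [1, 0] = 'yes' encoding, each row of means_ tells you roughly how likely an umbrella sighting is in that state:

# After model.fit(x, [len(x)]) from the question's code:
print(model.means_)      # shape (3, 2): emission means per hidden state
print(model.transmat_)   # shape (3, 3): state transition probabilities
print(model.startprob_)  # shape (3,):   initial state distribution

# A state whose mean is close to [1, 0] emits 'yes' (umbrella) most of
# the time, so one might label it 'rainy'; a state close to [0, 1]
# might be 'sunny'. This is a manual, post-hoc interpretation, not
# something the model provides.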

As said before, this is not an HMM-specific property but a trait of any unsupervised learning technique: when you cluster data you just find some partitioning of it, which may or may not be related to what you are looking for. Similarly here, the HMM will find some model of the dynamics, but it can be completely different from what you are after. If you know what you are looking for, you are supposed to use supervised learning; this is literally its definition. Unsupervised learning finds some structure (here, dynamics), not a specific one.
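
When labelled sequences are available for evaluation (as with y_test in the question), a standard workaround, sketched below under the assumption that scipy is installed, is to search for the state-to-label assignment that best agrees with the ground truth, e.g. via the Hungarian algorithm:

import numpy as np
from scipy.optimize import linear_sum_assignment

def best_mapping(y_true, y_pred, labels):
    # overlap[i, j] counts how often hidden state i coincides with label j
    n = len(labels)
    overlap = np.zeros((n, n), dtype=int)
    for state, lab in zip(y_pred, y_true):
        overlap[state, labels.index(lab)] += 1
    # linear_sum_assignment minimizes cost, so negate to maximize overlap
    rows, cols = linear_sum_assignment(-overlap)
    return {r: labels[c] for r, c in zip(rows, cols)}

# e.g. best_mapping(y_test, y_pred, ['sunny', 'rainy', 'foggy']) might
# return {0: 'sunny', 1: 'rainy', 2: 'foggy'}, the mapping the question
# guessed by hand. Note this only evaluates a fit after the fact; it
# does not make the learning supervised.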

Catechol answered 20/1, 2017 at 22:12. Comments (4):
Thank you very much for your answer. It was certainly very helpful. In my work I do have the labels, and therefore supervised learning would indeed be the proper way to go. I just posted this question because of two things: 1) hmmlearn is one of the most popular libraries, but it does not have a supervised implementation, and I didn't have enough knowledge of the model to judge whether the state order might somehow be fixed by the library despite being unsupervised. – Allergen
2) The alternative library for supervised learning (seqlearn) only has a multinomial implementation (which I think I cannot use, since my observations are not discrete). Neither library has very detailed documentation for people who do not know the model deeply, so I was feeling insecure about how to proceed. – Allergen
What you are after is "sequence to sequence" prediction; there are many ways of doing that, and many libraries. SO is not the place to request the best ones, though, but hopefully with this keyword, and knowing from the answer that you need a supervised method, the search will not be that hard. Take a look at CRFs (conditional random fields) or RNNs (recurrent neural networks), as both are very good models for seq2seq problems. – Catechol
Thank you again, @lejlot. :) – Allergen
