Intro to Hidden Markov Models

Paul Butler
Published in DataDrivenInvestor
Feb 18, 2021


In the last post, we went over Expectation-Maximization (EM). In the EM example, we had prior distributions for two coins. The coins were placed in a bag and drawn at random. Even though we didn’t know which coin was selected, we still wanted to find the most likely estimate of the probability of heads. But rather than estimating the probability of heads, what if we wanted to find the most likely sequence of coins for some sequence of flips? As you might have guessed from the title, we can use a Hidden Markov Model (HMM) to calculate the most likely sequence.

Markov Chains

An HMM is based on a Markov chain. A Markov chain is a 3-tuple (Q, A, π), where Q is the set of states, A is the set of transition probabilities from state i to state j, and π is the set of start-state probabilities. There are a few requirements for this 3-tuple: the transition probabilities out of any state q must sum to 1, and the initial state probabilities must also sum to 1. Markov chains deal with the probabilities of sequences of random variables. There is one major assumption made in Markov chains: the Markov assumption says that the next state depends only on the current state, not on the earlier history.

Here we outline an example of a Markov chain. The states, Q, are hot and cold, and the transition probabilities are the arrows between the states. While not defined in the graphic, we’ll just define the start-state probabilities to be uniform, so 50% hot and 50% cold. We also see the Markov assumption at work: state 3 depends only on state 2 rather than on states 1 and 2. The Markov assumption again is defined as:
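The hot/cold chain can be sketched in a few lines of Python. Since the diagram isn’t reproduced here, the transition probabilities below are illustrative assumptions, not the post’s actual numbers; only the uniform start probabilities come from the text.

```python
states = ["hot", "cold"]

# A[i][j] = P(next state = j | current state = i); each row must sum to 1.
A = {
    "hot":  {"hot": 0.7, "cold": 0.3},
    "cold": {"hot": 0.4, "cold": 0.6},
}

# Uniform start-state probabilities, as defined in the text.
pi = {"hot": 0.5, "cold": 0.5}

def sequence_probability(seq):
    """P(q1, ..., qT) = pi(q1) * prod of A[q_{t-1}][q_t] (Markov assumption)."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

print(sequence_probability(["hot", "hot", "cold"]))  # 0.5 * 0.7 * 0.3 = 0.105
```

Note that the probability of the whole sequence factors into one term per step, which is exactly what the Markov assumption buys us.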

Markov Assumption

Hidden Markov Models

Now that we know the fundamentals of a Markov Chain, let’s get into defining Hidden Markov Models (HMM). An HMM is a 5-tuple composed of Q, A, π, V, and B. Q, A, and π are the same as in the definition of Markov chains. V is the set of possible observations, and B is a set of observation state probabilities.
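In code, extending the 3-tuple into the 5-tuple just means adding the observation set V and the observation likelihoods B. The probabilities below are illustrative assumptions (the post’s diagram isn’t reproduced here); the structure is the point.

```python
Q = ["hot", "cold"]   # hidden states
V = ["dry", "rain"]   # possible observations

# Transition probabilities, A.
A = {"hot": {"hot": 0.7, "cold": 0.3},
     "cold": {"hot": 0.4, "cold": 0.6}}

# Start-state probabilities, pi.
pi = {"hot": 0.5, "cold": 0.5}

# Observation (emission) probabilities, B: B[q][o] = P(o | q).
B = {"hot": {"dry": 0.8, "rain": 0.2},
     "cold": {"dry": 0.3, "rain": 0.7}}

# Sanity checks: every probability table must sum to 1.
assert all(abs(sum(row.values()) - 1) < 1e-9 for row in A.values())
assert all(abs(sum(row.values()) - 1) < 1e-9 for row in B.values())
assert abs(sum(pi.values()) - 1) < 1e-9
```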

Why is it called a “hidden” Markov model? Because the observations are generated by some “hidden” state, a state that is not directly observed. In the diagram below, Rain and Dry are observations, while Cold and Hot are the hidden states, states we don’t directly observe. Let’s see how this works by extending our Markov chain.

All the text and figures in black remain constant from the definition of the Markov chain. The blue text and figures are the new additions for HMMs. In the diagram, the blue boxes are the observations and the blue arrows are the likelihoods of the observations from the states. We haven’t defined T (the length of the observation sequence O) yet, as we do not have an observation sequence yet.

So, from the Markov assumption we know that P(q5|q1…q4) = P(q5|q4), but what about the observation likelihood? Say we have P(o5|q1…q5, o1…o4). The only link to o5 is q5, so we can reduce P(o5|q1…q5, o1…o4) to P(o5|q5).
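These two reductions give us the full factorization of the joint probability of a state sequence and an observation sequence: a start term, one transition term per step, and one emission term per observation. A minimal sketch, with illustrative probabilities (assumptions, since the original diagram’s numbers aren’t reproduced here):

```python
A = {"hot": {"hot": 0.7, "cold": 0.3},
     "cold": {"hot": 0.4, "cold": 0.6}}
B = {"hot": {"dry": 0.8, "rain": 0.2},
     "cold": {"dry": 0.3, "rain": 0.7}}
pi = {"hot": 0.5, "cold": 0.5}

def joint_probability(hidden, observed):
    """P(q1..qT, o1..oT) = pi(q1) B[q1][o1] * prod A[q_{t-1}][q_t] B[q_t][o_t]."""
    p = pi[hidden[0]] * B[hidden[0]][observed[0]]
    for t in range(1, len(hidden)):
        p *= A[hidden[t - 1]][hidden[t]] * B[hidden[t]][observed[t]]
    return p

# 0.5 * 0.8 * 0.3 * 0.7 = 0.084
print(joint_probability(["hot", "cold"], ["dry", "rain"]))
```

Without the Markov and observation-independence assumptions, each factor would have to condition on the entire history, which is exactly what makes these models tractable.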

How can we use an HMM?

There are 5 main calculations that we can perform with HMMs.

Filtering/State Estimation: Estimates the probability of the current state given a history of observations up to and including the current state’s observation.

Prediction: Estimates the probability of the future states given a history of current observations.

Smoothing: Estimates the probability of a past state given a history of observations up to that state plus some future observations.

Learning: Learning the model’s parameters, such as the transition probabilities, from observation sequences.

Inference: Computing hidden states or hidden state sequences. An example of inference is the Viterbi algorithm. We will be going over this algorithm in the next post.
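As a taste of the first of these, filtering can be computed with the forward recursion: carry forward an unnormalized belief over states, then normalize at the end. The hot/cold model below uses illustrative probabilities (assumptions, not numbers from the original diagrams).

```python
A = {"hot": {"hot": 0.7, "cold": 0.3},
     "cold": {"hot": 0.4, "cold": 0.6}}
B = {"hot": {"dry": 0.8, "rain": 0.2},
     "cold": {"dry": 0.3, "rain": 0.7}}
pi = {"hot": 0.5, "cold": 0.5}

def filter_state(observations):
    """Return P(q_T | o_1..o_T) for each state, via the forward algorithm."""
    # Initialization: alpha_1(q) = pi(q) * B[q][o_1]
    alpha = {q: pi[q] * B[q][observations[0]] for q in pi}
    # Recursion: alpha_t(q) = B[q][o_t] * sum over q' of alpha_{t-1}(q') * A[q'][q]
    for o in observations[1:]:
        alpha = {q: B[q][o] * sum(alpha[p] * A[p][q] for p in alpha)
                 for q in A}
    total = sum(alpha.values())  # P(o_1..o_T): normalizing constant
    return {q: a / total for q, a in alpha.items()}

print(filter_state(["dry", "rain"]))
```

After a single "dry" observation this posterior favors "hot", and each additional observation reuses the previous step’s beliefs, so the cost grows linearly in the sequence length.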

Conclusion

Hopefully, this was a helpful introduction to Hidden Markov Models. The next post will cover the Viterbi algorithm, which we will use to solve the coin problem mentioned at the beginning of this post. I know I did not dive into the math as much in this post; if you would like me to show how we get the formulas for filtering and prediction, drop a comment below, and if need be I can create a new post deriving them. As always, if you liked the post, make sure to smash the clap button, and if you like these styles of posts, make sure to follow me.
