Bayes Network and HMM
Kuo Liu
Local Structures & Independencies
❖ Common parent: A ← B → C
➢ Fixing B decouples A and C
■ “given the level of gene B, the levels of A and C are independent”
❖ Cascade: A → B → C
➢ Knowing B decouples A and C
■ “given the level of gene B, the level of gene A provides no extra prediction value for the level of gene C”
❖ V-structure: A → C ← B
➢ Knowing C couples A and B
■ Because A can “explain away” B w.r.t. C
■ “if A correlates with C, then the chance that B also correlates with C decreases”
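A minimal NumPy sketch of the common-parent case, with made-up probabilities: it samples binary A, B, C from A ← B → C and compares the correlation of A and C with and without fixing B.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Common-parent structure A <- B -> C with binary variables (illustrative numbers).
B = rng.random(n) < 0.5                    # P(B = 1) = 0.5
A = rng.random(n) < np.where(B, 0.9, 0.1)  # A depends only on B
C = rng.random(n) < np.where(B, 0.8, 0.2)  # C depends only on B

print(np.corrcoef(A, C)[0, 1])        # clearly non-zero: A and C are marginally dependent
print(np.corrcoef(A[B], C[B])[0, 1])  # close to 0: fixing B decouples A and C
```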
HMM
❖ The parameters in HMM
➢ N: Number of states
➢ M: Number of observation symbols
➢ A = {a_ij}: State transition matrix (transition probabilities)
■ a_ij = P(q_t = s_j | q_{t-1} = s_i), 1 <= i, j <= N
■ a_ij >= 0
■ ∑_j a_ij = 1
➢ B = {b_j(k)}: Emission matrix (symbol emission probabilities)
■ b_j(k) = P(O_t = v_k | q_t = s_j), 1 <= j <= N; 1 <= k <= M
■ b_j(k) >= 0
■ ∑_k b_j(k) = 1
➢ π = {π_i}: Initial state probability distribution
■ π_i = P(q_1 = s_i), 1 <= i <= N
■ π_i >= 0
■ ∑_i π_i = 1
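To make the notation concrete, here is a minimal NumPy sketch of a toy 2-state, 3-symbol HMM (the numbers are purely illustrative); the later sketches reuse these A, B, pi arrays.

```python
import numpy as np

N, M = 2, 3                       # number of states, number of observation symbols
A = np.array([[0.7, 0.3],         # A[i, j] = P(q_t = s_j | q_{t-1} = s_i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],    # B[j, k] = P(O_t = v_k | q_t = s_j)
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])         # pi[i] = P(q_1 = s_i)

# The constraints from the slide: each row of A and B sums to 1, and pi sums to 1.
assert np.allclose(A.sum(axis=1), 1)
assert np.allclose(B.sum(axis=1), 1)
assert np.isclose(pi.sum(), 1)
```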
HMM
❖ There are three problems in HMM
➢ The Evaluation problem
■ Given the model and a sequence of observations O = o_1, …, o_T, compute the probability of the observation sequence
■ Forward Algorithm
■ Backward Algorithm
➢ The Decoding problem
■ Given the model and a sequence of observations O = o_1, …, o_T, compute the most likely state sequence in the model that produces this observation sequence
■ Viterbi Algorithm
➢ The Learning problem
■ Given the observation sequence O = o_1, …, o_T, estimate the parameters {A, B, π} that maximize the probability of the observation sequence
■ Forward-Backward Algorithm
Evaluation problem
❖ Forward Algorithm
➢ Given the observation sequence O = O_1, …, O_T
➢ Define μ = {A, B, π}
➢ Define α_t(i) = P(O_1 O_2 ... O_t, q_t = s_i | μ)
➢ Procedure
■ Initialization:
● α_1(i) = π_i b_i(O_1), 1 <= i <= N
■ Recursion:
● α_{t+1}(j) = ( ∑_i α_t(i) a_ij ) b_j(O_{t+1}), 1 <= t <= T-1; 1 <= j <= N
■ Termination:
● P(O | μ) = ∑_i α_T(i)
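A minimal NumPy sketch of the forward recursion, using 0-based time indices and reusing the toy A, B, pi arrays above (O is a sequence of symbol indices):

```python
def forward(O, A, B, pi):
    """alpha[t, i] = P(O_0 .. O_t, q_t = s_i | mu), with 0-based time index t."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]  # recursion
    return alpha, alpha[-1].sum()                   # second value is P(O | mu)
```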
Evaluation problem
❖ Backward Algorithm
➢ Given the observation sequence O = O_1, …, O_T
➢ Define μ = {A, B, π}
➢ Define β_t(i) = P(O_{t+1} O_{t+2} ... O_T | q_t = s_i, μ)
➢ Procedure
■ Initialization:
● β_T(i) = 1, 1 <= i <= N
■ Recursion:
● β_t(i) = ∑_j a_ij b_j(O_{t+1}) β_{t+1}(j), t = T-1, T-2, …, 1; 1 <= i <= N
■ Termination:
● P(O | μ) = ∑_i π_i b_i(O_1) β_1(i)
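The matching backward recursion, in the same 0-based conventions as the forward sketch:

```python
def backward(O, A, B, pi):
    """beta[t, i] = P(O_{t+1} .. O_{T-1} | q_t = s_i, mu), with 0-based time index t."""
    T, N = len(O), len(pi)
    beta = np.ones((T, N))                            # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])  # recursion
    return beta, (pi * B[:, O[0]] * beta[0]).sum()    # second value is P(O | mu)
```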
Evaluation Problem
❖ More generally, a combination of the forward algorithm and the backward algorithm
➢ P(O, q_t = s_i | μ) = α_t(i) β_t(i)
➢ P(O | μ) = ∑_i α_t(i) β_t(i), for any 1 <= t <= T
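An illustrative check using the sketches above: for every t, ∑_i α_t(i) β_t(i) gives the same P(O | μ) as either algorithm alone.

```python
O = [0, 2, 1, 0]                   # an illustrative observation sequence (symbol indices)
alpha, p_fwd = forward(O, A, B, pi)
beta, p_bwd = backward(O, A, B, pi)

assert np.isclose(p_fwd, p_bwd)                          # both algorithms agree on P(O | mu)
assert np.allclose((alpha * beta).sum(axis=1), p_fwd)    # and the combination holds at every t
```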
Decoding Problem
❖ Viterbi Algorithm
➢ Define μ = {A, B, π}
➢ Define δ_t(i) = max_{q_1, q_2, …, q_{t-1}} P(q_1, q_2, …, q_t = s_i, O_1 O_2 ... O_t | μ)
➢ Define ψ_t(i) to memorize the state at time t-1 on the best path ending in s_i at time t
➢ Procedure
■ Initialization:
● δ_1(i) = π_i b_i(O_1), 1 <= i <= N
● ψ_1(i) = 0
■ Recursion:
● δ_t(j) = max_{1 <= i <= N} [δ_{t-1}(i) a_ij] b_j(O_t), 2 <= t <= T; 1 <= j <= N
● ψ_t(j) = arg max_{1 <= i <= N} [δ_{t-1}(i) a_ij]
■ Termination:
● q_T* = arg max_{1 <= i <= N} [δ_T(i)]
● P* = max_{1 <= i <= N} [δ_T(i)]
◆ The rest of the path is recovered recursively by backtracking
◆ q_t* = ψ_{t+1}(q_{t+1}*), t = T-1, T-2, …, 1
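A NumPy sketch of the Viterbi recursion plus the backtracking step, in the same 0-based conventions as the forward/backward sketches:

```python
def viterbi(O, A, B, pi):
    """Return the most likely state path and its joint probability with O."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                  # initialization; psi[0] stays 0
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)          # best predecessor of state j at time t
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()               # best final state
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]       # backtrack: q_t = psi_{t+1}(q_{t+1})
    return path, delta[-1].max()
```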
Learning Problem
❖ Sometimes, when you know the hidden state sequence and the training dataset is large enough, you can use MLE to estimate the model μ = {A, B, π}
❖ In practice, however, you usually do not know the corresponding hidden state sequence for a given observation sequence. In this case, we use the EM algorithm to estimate the model μ = {A, B, π}; this instance of EM is called the forward-backward algorithm or the Baum-Welch algorithm.
Learning Problem
❖ Forward-backward Algorithm
❖ E-step
➢ Define ξ_t(i, j) = P(q_t = s_i, q_{t+1} = s_j | O, μ)
➢ Define γ_t(i) = ∑_j ξ_t(i, j), i.e. the probability of being in state s_i at time t
➢ ξ_t(i, j) = P(q_t = s_i, q_{t+1} = s_j, O | μ) / P(O | μ)
  = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O | μ)
  = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / ∑_i ∑_j α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j)
➢ γ_t(i) = ∑_j ξ_t(i, j)
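A sketch of the E-step that reuses the forward and backward functions above (0-based time; xi covers t = 0 … T-2):

```python
def e_step(O, A, B, pi):
    """Compute xi[t, i, j] and gamma[t, i] from the forward/backward variables."""
    alpha, p_obs = forward(O, A, B, pi)
    beta, _ = backward(O, A, B, pi)
    T, N = len(O), len(pi)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, O[t + 1]] * beta[t + 1]
        xi[t] /= p_obs                 # same as dividing by the double sum over i and j
    gamma = alpha * beta / p_obs       # gamma[t, i] = P(q_t = s_i | O, mu); equals xi[t].sum(axis=1) for t < T-1
    return xi, gamma
```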
Learning Problem
❖ Forward-backward Algorithm
❖ M-step
➢ π_i = P(q_1 = s_i | O, μ) = γ_1(i)
➢ a_ij = ∑_t ξ_t(i, j) / ∑_t γ_t(i), summing over t = 1, …, T-1
➢ b_j(k) = ∑_t γ_t(j) δ(O_t, v_k) / ∑_t γ_t(j)
■ δ(x, y) is the Kronecker delta: δ(x, y) = 1 if x = y, else 0
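A matching M-step sketch for a single observation sequence (M is the number of observation symbols):

```python
def m_step(O, xi, gamma, M):
    """Re-estimate pi, A, B from the expected counts xi and gamma."""
    O = np.asarray(O)
    pi_new = gamma[0]                                         # pi_i = gamma_1(i)
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]  # a_ij
    B_new = np.zeros((gamma.shape[1], M))
    for k in range(M):
        B_new[:, k] = gamma[O == k].sum(axis=0)               # expected count of symbol k in each state
    B_new /= gamma.sum(axis=0)[:, None]                       # b_j(k)
    return A_new, B_new, pi_new
```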
Learning Problem
❖ Forward-backward Algorithm
❖ Steps
➢ Initialization:
■ Randomly pick an initial model subject to the following constraints
● ∑_i π_i = 1
● ∑_j a_ij = 1
● ∑_k b_j(k) = 1
➢ Iterate the E-step and M-step until convergence
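Tying the pieces together, a minimal Baum-Welch loop: random initialization satisfying the constraints above, then alternating the E-step and M-step sketches; n_iter is an arbitrary cutoff rather than a real convergence test.

```python
def baum_welch(O, N, M, n_iter=50, seed=0):
    """Estimate mu = {A, B, pi} from a single observation sequence O of symbol indices."""
    rng = np.random.default_rng(seed)
    A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)   # each row sums to 1
    B = rng.random((N, M)); B /= B.sum(axis=1, keepdims=True)   # each row sums to 1
    pi = rng.random(N); pi /= pi.sum()                          # sums to 1
    for _ in range(n_iter):
        xi, gamma = e_step(O, A, B, pi)
        A, B, pi = m_step(O, xi, gamma, M)
    return A, B, pi

# Example usage on a made-up sequence:
A_hat, B_hat, pi_hat = baum_welch([0, 2, 1, 0, 1, 2, 2, 0], N=2, M=3)
```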
Where you can apply HMM
❖ Word segmentation for Asian languages
❖ Named Entity Recognition