E_step

Module Contents

E_step.get_leaf_Normalizing_Factors(leaves_idx: npt.NDArray[np.uintp], MSD: npt.NDArray[np.float64], EL: npt.NDArray[np.float64]) → npt.NDArray[np.float64]

Normalizing factor (NF) matrix and base case at the leaves.

Each element of this N by 1 matrix is the normalizing factor used in the beta calculation for one node. This normalizing factor is the marginal observation distribution for that node.

This function computes the normalizing factors for the upward recursion, but only at the leaves. We first calculate the joint probability from the definition of conditional probability:

\(P(x_n = x \mid z_n = k) \cdot P(z_n = k) = P(x_n = x, z_n = k)\), where n ranges over the leaf nodes.

We can then sum this joint probability over k, the possible states of \(z_n\), and by the law of total probability obtain the marginal observation distribution \(P(x_n = x) = \sum_k P(x_n = x, z_n = k)\).
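A minimal NumPy sketch of this leaf computation follows. It is an illustration under stated assumptions, not the module's implementation: the function name `leaf_normalizing_factors`, the length-N output vector (instead of an explicit N by 1 array), and the toy values are all hypothetical.

```python
import numpy as np


def leaf_normalizing_factors(leaves_idx, MSD, EL):
    """Return NF with NF[n] = sum_k P(x_n = x | z_n = k) * P(z_n = k) at each leaf n.

    Illustrative sketch: non-leaf entries are left at 0 for a later upward pass.
    """
    leaves = np.asarray(leaves_idx, dtype=np.intp)
    NF = np.zeros(MSD.shape[0])
    # Joint P(x_n = x, z_n = k) = EL[n, k] * MSD[n, k]; summing over k marginalizes z_n.
    NF[leaves] = np.sum(EL[leaves] * MSD[leaves], axis=1)
    return NF


# Toy lineage (hypothetical values): cell 0 is the root, cells 1 and 2 are leaves, K = 2.
MSD = np.array([[0.6, 0.4], [0.5, 0.5], [0.5, 0.5]])
EL = np.array([[0.2, 0.9], [0.1, 0.3], [0.8, 0.4]])
NF = leaf_normalizing_factors(np.array([1, 2], dtype=np.uintp), MSD, EL)
# By hand: NF[1] = 0.1*0.5 + 0.3*0.5 = 0.2 and NF[2] = 0.8*0.5 + 0.4*0.5 = 0.6
```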

Parameters
  • leaves_idx – Indices of the leaf cells

  • MSD – The marginal state distribution P(z_n = k)

  • EL – The emission likelihoods P(x_n = x | z_n = k)

Returns

The normalizing factors, i.e., the marginal observation distribution P(x_n = x) at each leaf

E_step.get_MSD(cell_to_parent: np.ndarray, pi: npt.NDArray[np.float64], T: npt.NDArray[np.float64]) → npt.NDArray[np.float64]

Marginal State Distribution (MSD) matrix, computed recursively from the roots down to the leaves (from parent to daughter). Each entry is the probability that the hidden state variable \(z_n\) is in state k, that is,

\(P(z_n = k)\),

for every node n in the hidden state tree and every one of the K discrete states. Each lineage has its own MSD array of size N by K (one row per cell and one column per state).

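Each element of the MSD matrix is a sum over all transitions from any parent state j into state k (from parent to daughter):

\(P(z_u = k) = \sum_j T_{jk} \, P(z_{\mathrm{parent}(u)} = j)\),

where \(T_{jk}\) is the probability of transitioning from state j to state k. At a root cell the recursion starts from the initial probabilities, \(P(z_{\mathrm{root}} = k) = \pi_k\).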
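The recursion can be sketched in NumPy as below. This is a hypothetical illustration rather than the module's code: it assumes `cell_to_parent[u]` holds the parent index of cell u with -1 marking a root, that parents are indexed before their daughters, and that `T[j, k]` is the probability of moving from state j to state k. The name `marginal_state_distribution` is also illustrative.

```python
import numpy as np


def marginal_state_distribution(cell_to_parent, pi, T):
    """Return the N-by-K MSD with MSD[u, k] = P(z_u = k).

    Hypothetical layout: cell_to_parent[u] is u's parent index (-1 for a root),
    parents appear before their daughters, and T[j, k] = P(daughter k | parent j).
    """
    N, K = len(cell_to_parent), len(pi)
    MSD = np.zeros((N, K))
    for u in range(N):
        parent = cell_to_parent[u]
        if parent < 0:
            MSD[u] = pi  # base case: a root cell starts from the initial probabilities
        else:
            # Row vector times matrix: sum_j P(z_parent = j) * T[j, k] for every k
            MSD[u] = MSD[parent] @ T
    return MSD


pi = np.array([0.6, 0.4])
T = np.array([[0.9, 0.1], [0.2, 0.8]])
# Cell 0 is the root, cells 1 and 2 are its daughters, cell 3 is a daughter of cell 1.
MSD = marginal_state_distribution(np.array([-1, 0, 0, 1]), pi, T)
# By hand: MSD[1] = [0.6*0.9 + 0.4*0.2, 0.6*0.1 + 0.4*0.8] = [0.62, 0.38]
```

Each row of the result sums to 1 because `pi` and every row of `T` do.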

Parameters
  • cell_to_parent – The index of each cell's parent

  • pi – Initial state probabilities vector

  • T – State transition matrix

Returns

The marginal state distribution

E_step.np_apply_along_axis(func1d, axis, arr)

E_step.get_beta(leaves_idx: npt.NDArray[np.uintp], cell_to_daughters: npt.NDArray[np.intp], T: npt.NDArray[np.float64], MSD: npt.NDArray[np.float64], EL: npt.NDArray[np.float64], NF: npt.NDArray[np.float64]) → npt.NDArray[np.float64]

Beta matrix and base case at the leaves.

Each element of this N by K matrix is the beta value for one cell and one state. These values are derived from the Marginal State Distributions (MSD), the Emission Likelihoods (EL), and the Normalizing Factors (NF). At a leaf, each beta value is exactly the probability

\(\beta_n(k) = P(z_n = k \mid x_n = x)\).

Using Bayes' theorem, this equals

\(\beta_n(k) = \frac{P(x_n = x \mid z_n = k) \, P(z_n = k)}{P(x_n = x)}\).

For leaf cells, the first factor in the numerator is the Emission Likelihood, the second factor is the Marginal State Distribution, and the denominator is the Normalizing Factor.
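At the leaves this base case is a row-wise Bayes update. The sketch below assumes `NF` is a length-N vector whose leaf entries already hold P(x_n = x); the function name `leaf_beta` and the toy numbers are hypothetical.

```python
import numpy as np


def leaf_beta(leaves_idx, MSD, EL, NF):
    """Return an N-by-K beta array whose leaf rows are EL * MSD / NF (Bayes' theorem)."""
    leaves = np.asarray(leaves_idx, dtype=np.intp)
    beta = np.zeros_like(MSD)
    # Numerator: P(x_n = x | z_n = k) * P(z_n = k); denominator: P(x_n = x), broadcast over k.
    beta[leaves] = EL[leaves] * MSD[leaves] / NF[leaves][:, np.newaxis]
    return beta


MSD = np.array([[0.6, 0.4], [0.5, 0.5], [0.5, 0.5]])
EL = np.array([[0.2, 0.9], [0.1, 0.3], [0.8, 0.4]])
NF = np.sum(EL * MSD, axis=1)  # only the leaf entries are used below
beta = leaf_beta(np.array([1, 2], dtype=np.uintp), MSD, EL, NF)
# By hand: beta[1] = [0.05, 0.15] / 0.2 = [0.25, 0.75]; each leaf row sums to 1
```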

The function then traverses each tree upward, from the leaves to the root, and calculates the beta value for every non-leaf cell. The normalizing factors of the non-leaf cells are computed along the way as intermediates for each beta term. Helper functions compute one of the terms in the NF equation, and the same term is reused in the calculation of the betas.
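The upward pass can be sketched with the standard upward recursion for hidden Markov trees. This is an assumption about the technique, not a transcription of the module: here the daughter-to-parent term plays the role of the NF-equation term described in the paragraph above. The sketch also assumes `cell_to_daughters` is N by 2 with -1 for a missing daughter, that daughters have larger indices than their parents, and that `T[k, j]` is the probability of moving from parent state k to daughter state j; `upward_beta` is a hypothetical name.

```python
import numpy as np


def upward_beta(leaves_idx, cell_to_daughters, T, MSD, EL):
    """Return (beta, NF) from a leaves-to-root pass over one lineage.

    Hypothetical layout: cell_to_daughters is N-by-2 with -1 for "no daughter",
    daughters have larger indices than their parent, and T[k, j] = P(daughter j | parent k).
    """
    N, K = MSD.shape
    beta = np.zeros((N, K))
    NF = np.zeros(N)
    is_leaf = np.zeros(N, dtype=bool)
    is_leaf[np.asarray(leaves_idx, dtype=np.intp)] = True
    for u in range(N - 1, -1, -1):  # reverse order visits daughters before parents
        numer = EL[u] * MSD[u]  # P(x_u = x | z_u = k) * P(z_u = k)
        if not is_leaf[u]:
            for v in cell_to_daughters[u]:
                if v < 0:
                    continue
                # Daughter-to-parent term: sum_j T[k, j] * beta[v, j] / MSD[v, j]
                numer = numer * (T @ (beta[v] / MSD[v]))
        NF[u] = numer.sum()  # normalizing factor for cell u
        beta[u] = numer / NF[u]
    return beta, NF


pi = np.array([0.6, 0.4])
T = np.array([[0.9, 0.1], [0.2, 0.8]])
MSD = np.vstack([pi, pi @ T, pi @ T])  # root 0 with leaf daughters 1 and 2
EL = np.array([[0.2, 0.9], [0.1, 0.3], [0.8, 0.4]])
beta, NF = upward_beta(np.array([1, 2]), np.array([[1, 2], [-1, -1], [-1, -1]]), T, MSD, EL)
```

In this formulation the product of all normalizing factors equals the joint probability of every observation in the tree, so summing their logarithms gives a log-likelihood.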

Parameters
  • leaves_idx – Indices of the leaf cells

  • cell_to_daughters – The indices of each cell's daughter cells

  • T – State transition matrix

  • MSD – The marginal state distribution P(z_n = k)

  • EL – The emission likelihoods P(x_n = x | z_n = k)

  • NF – The normalizing factors, i.e., the marginal observation distribution P(x_n = x)

Returns

The beta values: the conditional probability of each state given the observations of the subtree rooted at cell n

E_step.get_gamma(cell_to_daughters: npt.NDArray[np.uintp], T: npt.NDArray[np.float64], MSD: npt.NDArray[np.float64], beta: npt.NDArray[np.float64]) → npt.NDArray[np.float64]

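Get the gammas by downward recursion from the root nodes. Each gamma value is the conditional probability of a state given the observations of the whole tree,

\(\gamma_n(k) = P(z_n = k \mid \bar{X} = \bar{x})\),

where \(\bar{x}\) denotes the observations for the whole tree. For a root cell (n = 1), the subtree rooted at that cell is the whole tree, so the recursion starts from \(\gamma_1(k) = P(z_1 = k \mid \bar{X} = \bar{x}) = \beta_1(k)\).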
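A matching downward pass can be sketched as follows, assuming the standard downward recursion for hidden Markov trees rather than quoting the module. The name `downward_gamma` is hypothetical, and so is the layout: `cell_to_daughters` is taken to be N by 2 with -1 for "no daughter" (a sentinel a `uintp` array could not actually hold, so the real encoding likely differs), parents are indexed before daughters, and `T[k, j]` is the probability of moving from parent state k to daughter state j. The `beta` input is assumed to come from an upward pass as described for get_beta.

```python
import numpy as np


def downward_gamma(cell_to_daughters, T, MSD, beta):
    """Return the N-by-K gamma array, gamma[n, k] = P(z_n = k | all observations).

    Hypothetical layout: cell_to_daughters is N-by-2 with -1 for "no daughter",
    parents have smaller indices than their daughters, and T[k, j] = P(daughter j | parent k).
    """
    N, K = beta.shape
    gamma = np.zeros((N, K))
    is_root = np.ones(N, dtype=bool)
    for daughters in cell_to_daughters:
        for v in daughters:
            if v >= 0:
                is_root[v] = False
    gamma[is_root] = beta[is_root]  # base case: a root's subtree is the whole tree
    for u in range(N):  # forward order visits parents before daughters
        for v in cell_to_daughters[u]:
            if v < 0:
                continue
            ratio = beta[v] / MSD[v]
            m = T @ ratio  # m[k] = sum_j T[k, j] * beta[v, j] / MSD[v, j]
            # gamma[v, j] = ratio[j] * sum_k T[k, j] * gamma[u, k] / m[k]
            gamma[v] = ratio * (T.T @ (gamma[u] / m))
    return gamma
```

Dividing by `MSD` and by `m` assumes every marginal state probability is strictly positive.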

Parameters
  • cell_to_daughters – The indices of each cell's daughter cells

  • T – State transition matrix

  • MSD – The marginal state distribution P(z_n = k)

  • beta – The beta values: the conditional probability of states, given observations of the subtree rooted at cell n