
Notes on the Calculation Direction of Expected Value DP


Overview

In explanations of Expected Value DP, I often see calculations being performed starting from the final state of state transitions, which seems to be in the reverse direction of the transitions. I have also heard that "while calculation from the initial state is sometimes possible, the implementation becomes complex."

Originally, I felt there was a slight gap between the update formula for Expected Value DP and the simple definition of expected value, and I wondered, "I'm formulating this based on intuition, but does it really hold true?"

Therefore, I will study the derivation of the update formula for Expected Value DP and the differences depending on the direction of calculation.

Conclusion

  • I was able to transform the simple definition of expected value into the update formula for Expected Value DP.
  • It seems more logical to calculate Expected Value DP from the final state, moving in the opposite direction of state transitions.
  • The reason for this is the difference in the complexity of probability calculation between the transition destination and the transition source.
    • The set of states a given state transitions to:
      • The sum of the transition probabilities is 1.
    • The set of states that transition into a given state:
      • The transition probabilities themselves generally have to be computed with a separate Probability DP.
  • Since a probabilistic state transition is "moving from one starting state to one of several possible states," probability calculations tend to be simpler if we fix the source state and group the destination states for each source.

Definitions

First, I will define notation for probability and state transitions based on graphs.

Basics

  • V: Set of states
  • E \subseteq V \times V: Set of directed edges. (u, v) \in E represents the existence of a transition from state u to v.
  • w(u, v) \in \R: Cost incurred when transitioning from u to v. Assume (u, v) \in E.
  • P(u, v) \in [0, 1]: Probability of transitioning from u to v. Assume (u, v) \in E.
  • s \in V: Initial state
  • t \in V: Final state (for simplicity, assumed to be a single state)

Extension to Paths

Extend w and P based on the concept of paths.

  • \rho = v_0 v_1 \ldots v_k: A path. Assume (v_i, v_{i + 1}) \in E holds for all 0 \le i < k.
    • Let \rho_i := v_i.
    • Let |\rho| := k.
  • w(\rho) \in \R: Cost incurred when transitioning along path \rho from state \rho_0.
    • w(\rho) = \sum_{i = 0}^{|\rho| - 1} w(\rho_i, \rho_{i + 1})
  • P(\rho) \in [0, 1]: Probability of transitioning along path \rho from state \rho_0.
    • P(\rho) = \prod_{i = 0}^{|\rho| - 1} P(\rho_i, \rho_{i + 1})
    • Is it necessary to be a Markov chain?
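As a concrete check of these path definitions, here is a minimal Python sketch. The edge dicts `w` and `P` and the helper names `path_cost` and `path_prob` are illustrative, not part of the article's setting:

```python
from math import prod

w = {(0, 1): 2.0, (1, 2): 3.0}   # w(u, v): cost of edge (u, v)
P = {(0, 1): 0.5, (1, 2): 0.5}   # P(u, v): transition probability of edge (u, v)

def path_cost(rho):
    """w(rho): sum of edge costs along the path."""
    return sum(w[(rho[i], rho[i + 1])] for i in range(len(rho) - 1))

def path_prob(rho):
    """P(rho): product of edge probabilities along the path."""
    return prod(P[(rho[i], rho[i + 1])] for i in range(len(rho) - 1))

rho = [0, 1, 2]
print(path_cost(rho))  # -> 5.0
print(path_prob(rho))  # -> 0.25
```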

Others

To handle sets of vertices before and after transitions, define the following:

  • \delta^+(v) = \left \{ u \in V \mid (v, u) \in E \right \}: Set of states u that can be reached from v in one step.
  • \delta^-(v) = \left \{ u \in V \mid (u, v) \in E \right \}: Set of states u that can reach v in one step.

Note

In this article, I am considering relatively simple problem settings for Expected Value DP.
The following are points to note:

This covers cases where expected values for source/destination states can be obtained in topological sort order.
Cases with complex loops are outside the scope.

I might add information about this in the future.
(Example: https://atcoder.jp/contests/abc350/tasks/abc350_e)

DP Calculation from the Final State

We consider an approach where the DP calculation is performed from the final state, which is the reverse direction of the state transitions.
(Example: dp[i]: The expected number of times a die is rolled from square i to reach square N. dp[N] = 0)

Definitions

  • \mathcal P(v) \subseteq V^*: The set of all paths from v \in V to the final state t.
    • \mathcal P(v) = \left \{ \rho \in V^* \mid \rho_0 = v \land \rho_{|\rho|} = t \right \}
  • E[v] \in \R: The expected cost to reach the final state t from v \in V.
    • E[v] = \sum_{\rho \in \mathcal P(v) } P(\rho) \cdot w(\rho)

Expected Value Calculation

The expected cost of transitioning from the initial state s to the final state t can be expressed as E[s].
Below, we start from the simple definition of expected value and transform it into the familiar form used in Expected Value DP.

If s = t, then E[s] = E[t] = 0.
If s \ne t, we decompose the path from s to t by focusing on the state u reached in one step from s: \rho = s \rho', \rho'_0 = u.

Now, let's perform the formula transformation.

\begin{aligned} E[s] &= \sum_{\rho \in \mathcal P(s) } P(\rho) \cdot w(\rho) \\ &= \sum_{s \rho' \in \mathcal P(s) } P(s \rho') \cdot w(s \rho') \quad (\because \rho = s \rho') \\ &= \sum_{s \rho' \in \mathcal P(s) } (P(s, \rho'_0) \cdot P(\rho')) \cdot (w(s, \rho'_0) + w(\rho')) \\ &= \sum_{u \in \delta^+(s)} \sum_{\rho' \in \mathcal P(u) } (P(s, u) \cdot P(\rho')) \cdot (w(s, u) + w(\rho')) \quad (u := \rho'_0)\\ &= \sum_{u \in \delta^+(s)} P(s, u) \sum_{\rho' \in \mathcal P(u) } P(\rho') \cdot (w(s, u) + w(\rho')) \\ &= \sum_{u \in \delta^+(s)} P(s, u) \left ( w(s, u) \sum_{\rho' \in \mathcal P(u) } P(\rho') + \sum_{\rho' \in \mathcal P(u) } P(\rho') w(\rho') \right ) \\ &= \sum_{u \in \delta^+(s)} P(s, u) \left ( w(s, u) \sum_{\rho' \in \mathcal P(u) } P(\rho') + E[u] \right ) \quad (\because E[u] = \sum_{\rho' \in \mathcal P(u) } P(\rho') w(\rho'))\\ \end{aligned}

Considering that E[u] is calculated recursively, the expected value for any state u can be expressed as follows.

E[u] = \sum_{v \in \delta^+(u)} P(u, v) \left ( w(u, v) \sum_{\rho' \in \mathcal P(v) } P(\rho') + E[v] \right )

What Always Holds True

\sum_{v \in \delta^+(u)} P(u, v) = 1

The sum of transition probabilities to all states v that can be reached from a fixed state u in one step is 1.


In the following, we will further transform the formula for E[u] by adding conditions to the problem setting.

1. When the final state is always reachable from any state

Cases where a "state from which the final state cannot be reached" is never encountered during transitions.

  • (Is the expected value even defined if this doesn't hold?)

In this case, the sum of probabilities for all paths from v to t is 1:

\sum_{\rho' \in \mathcal P(v) } P(\rho') = 1

Therefore,

E[u] = \sum_{v \in \delta^+(u)} P(u, v) \left ( w(u, v) + E[v] \right )
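Starting from E[t] = 0 and processing states in reverse topological order, this update is enough on its own. A minimal Python sketch on a small made-up DAG (the graph, probabilities, and costs are illustrative):

```python
# Backward expected-value DP: E[u] = sum over v of P(u, v) * (w(u, v) + E[v]).
# States are processed in reverse topological order, starting from t.
edges = {          # u -> list of (v, P(u, v), w(u, v))
    0: [(1, 0.5, 1.0), (2, 0.5, 2.0)],
    1: [(2, 1.0, 1.0)],
    2: [],         # t = 2: the final state has no outgoing edges
}
order = [2, 1, 0]  # reverse topological order (t first)

E = {2: 0.0}       # E[t] = 0
for u in order[1:]:
    E[u] = sum(p * (c + E[v]) for v, p, c in edges[u])
print(E[0])        # -> 2.0
```

Because t is reachable from every state here, the sums \sum_{\rho'} P(\rho') are all 1 and the simplified update applies directly.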

2. When the cost of any transition is 1

Cases where the cost for a single transition is always 1, such as the "number of times" a die is rolled.
w(u, v) = 1 holds.

Therefore,

\begin{aligned} E[u] &= \sum_{v \in \delta^+(u)} P(u, v) \left ( \sum_{\rho' \in \mathcal P(v) } P(\rho') + E[v] \right ) \end{aligned}

When both 1 and 2 hold

\begin{aligned} E[u] &= \sum_{v \in \delta^+(u)} P(u, v) \left ( 1 + E[v] \right ) \\ &= \sum_{v \in \delta^+(u)} P(u, v) + \sum_{v \in \delta^+(u)} P(u, v) E[v] \\ &= 1 + \sum_{v \in \delta^+(u)} P(u, v) E[v] \quad (\because \sum_{v \in \delta^+(u)} P(u, v) = 1) \end{aligned}

This has become the form commonly seen in typical problems.
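As a concrete instance of this form, consider the classic problem "expected number of die rolls to go from square 0 to square N," where a roll that would overshoot N is clamped to N. This setup is illustrative, not from the article:

```python
# dp[i]: expected number of rolls to reach square N from square i; dp[N] = 0.
# dp[i] = 1 + (1/6) * sum of dp over the six reachable squares,
# computed backward from the final state, i.e. against the transitions.
N = 10
dp = [0.0] * (N + 1)
for i in range(N - 1, -1, -1):
    dp[i] = 1.0 + sum(dp[min(i + d, N)] for d in range(1, 7)) / 6.0
print(dp[0])  # expected number of rolls from the start
```

Note that from square N - 1 every face reaches N, so dp[N - 1] = 1, matching the "1 + expected remainder" form above.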

DP Calculation from the Initial State

We consider an approach where the DP calculation is performed from the initial state, following the same direction as the state transitions.
(Example: dp[i]: The expected number of times a die is rolled to reach square i from square 0. dp[0] = 0)

Definitions

  • \mathcal P(v) \subseteq V^*: The set of all paths from the initial state s to v \in V.
    • \mathcal P(v) = \left \{ \rho \in V^* \mid \rho_0 = s \land \rho_{|\rho|} = v \right \}
  • E[v] \in \R: The expected cost to reach v \in V from the initial state s.
    • E[v] = \sum_{\rho \in \mathcal P(v) } P(\rho) \cdot w(\rho)

Expected Value Calculation

The expected cost of transitioning from the initial state s to the final state t can be expressed as E[t]. Here too, we transform the formula starting from the simple definition of expected value.

If s = t, then E[s] = E[t] = 0.
If s \ne t, we decompose the path from s to t by focusing on the state u one step before t: \rho = \rho' t, where \rho'_{|\rho'|} = u.

Now, let's perform the formula transformation.

\begin{aligned} E[t] &= \sum_{\rho \in \mathcal P(t) } P(\rho) \cdot w(\rho) \\ &= \sum_{\rho' t \in \mathcal P(t) } P(\rho' t) \cdot w(\rho' t) \quad (\because \rho = \rho' t) \\ &= \sum_{\rho' t \in \mathcal P(t) } (P(\rho') \cdot P(\rho'_{|\rho'|}, t)) \cdot (w(\rho') + w(\rho'_{|\rho'|}, t)) \\ &= \sum_{u \in \delta^-(t)} \sum_{\rho' \in \mathcal P(u) } (P(\rho') \cdot P(u, t)) \cdot (w(\rho') + w(u, t)) \quad (u := \rho'_{|\rho'|})\\ &= \sum_{u \in \delta^-(t)} P(u, t) \sum_{\rho' \in \mathcal P(u) } P(\rho') \cdot (w(\rho') + w(u, t)) \\ &= \sum_{u \in \delta^-(t)} P(u, t) \left ( \sum_{\rho' \in \mathcal P(u) } P(\rho') w(\rho') + w(u, t) \sum_{\rho' \in \mathcal P(u) } P(\rho') \right ) \\ &= \sum_{u \in \delta^-(t)} P(u, t) \left ( E[u] + w(u, t) \sum_{\rho' \in \mathcal P(u) } P(\rho') \right ) \quad (\because E[u] = \sum_{\rho' \in \mathcal P(u) } P(\rho') w(\rho'))\\ \end{aligned}

Since E[u] can be calculated recursively in the same way, the expected value for any state u can be expressed as follows.

E[u] = \sum_{v \in \delta^-(u)} P(v, u) \left ( E[v] + w(v, u) \sum_{\rho' \in \mathcal P(v) } P(\rho') \right )

Differences from "DP Calculation from the Final State"

1. \sum_{v \in \delta^-(u)} P(v, u) is not necessarily 1.

In the case of "DP Calculation from the Final State," a similar term \sum_{v \in \delta^+(u)} P(u, v) appeared. This was the sum of transition probabilities from a fixed u to each possible state it transitions to, which was 1.

On the other hand, \sum_{v \in \delta^-(u)} P(v, u) is the sum of transition probabilities from states that can transition into a fixed u, and it is not necessarily 1 (it could even exceed 1).

2. \sum_{\rho' \in \mathcal P(v) } P(\rho') is not necessarily 1.

This is the sum of probabilities of all paths leading from the initial state s to v. Since it is possible to reach the final state without passing through v, this value can be less than 1.

Reasons Why Implementation Becomes Complex

Unlike the reverse-direction calculation from the final state, some terms do not simplify away, so they must be computed alongside the expected values.

E[u] = \sum_{v \in \delta^-(u)} P(v, u) \left ( E[v] + w(v, u) \sum_{\rho' \in \mathcal P(v) } P(\rho') \right )

Specifically, the term \sum_{\rho' \in \mathcal P(v) } P(\rho') in the above equation is the part that requires additional calculation and management. It is the sum of the probabilities of the paths from the initial state to v, and it will likely need to be computed sequentially from the initial state as a Probability DP.
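This extra term can be maintained as its own DP array alongside the expected values. A minimal Python sketch on a small illustrative DAG (the graph, probabilities, and costs are made up; `Q[v]` stands for \sum_{\rho' \in \mathcal P(v)} P(\rho')):

```python
# Forward expected-value DP: alongside E[v] we also maintain
# Q[v] = sum of P(rho') over all paths s -> v (the "Probability DP").
in_edges = {       # v -> list of (u, P(u, v), w(u, v))
    1: [(0, 0.5, 1.0)],
    2: [(0, 0.5, 2.0), (1, 1.0, 1.0)],
}
order = [0, 1, 2]  # topological order (s = 0 first, t = 2 last)

Q = {0: 1.0}       # Q[s] = 1: the empty path from s has probability 1
E = {0: 0.0}       # E[s] = 0
for v in order[1:]:
    Q[v] = sum(p * Q[u] for u, p, c in in_edges[v])
    E[v] = sum(p * (E[u] + c * Q[u]) for u, p, c in in_edges[v])
print(E[2])        # -> 2.0
```

The need for the second array Q is exactly the implementation overhead the article attributes to the forward direction.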
