From ac540f1294276a941e6abbcef30847d0ec9d012d Mon Sep 17 00:00:00 2001
From: jalaneunos
Date: Mon, 5 Jan 2026 17:47:40 +0800
Subject: [PATCH] fix: correct terminal state in 19.4, fix typo

---
 Notebooks/Chap19/19_4_Temporal_Difference_Methods.ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Notebooks/Chap19/19_4_Temporal_Difference_Methods.ipynb b/Notebooks/Chap19/19_4_Temporal_Difference_Methods.ipynb
index 8a98c79..e8f5e0d 100644
--- a/Notebooks/Chap19/19_4_Temporal_Difference_Methods.ipynb
+++ b/Notebooks/Chap19/19_4_Temporal_Difference_Methods.ipynb
@@ -265,7 +265,7 @@
 "\n",
 "In this icy environment the penguin is at one of the discrete cells in the gridworld. The agent starts each episode on a randomly chosen cell. The environment state dynamics are captured by the transition probabilities $Pr(s_{t+1} |s_t, a_t)$ where $s_t$ is the current state, $a_t$ is the action chosen, and $s_{t+1}$ is the next state at decision stage t. At each decision stage, the penguin can move in one of four directions: $a=0$ means try to go upward, $a=1$, right, $a=2$ down and $a=3$ left.\n",
 "\n",
-"However, the ice is slippery, so we don't always go the direction we want to: every time the agent chooses an action, with 0.25 probability, the environment changes the action taken to a differenct action, which is uniformly sampled from the other available actions.\n",
+"However, the ice is slippery, so we don't always go the direction we want to: every time the agent chooses an action, with 0.25 probability, the environment changes the action taken to a different action, which is uniformly sampled from the other available actions.\n",
 "\n",
 "The rewards are deterministic; the penguin will receive a reward of +3 if it reaches the fish, -2 if it slips into a hole and 0 otherwise.\n",
 "\n",
@@ -470,7 +470,7 @@
 "\n",
 "    # Return the reward -- here the reward is for arriving at the state\n",
 "    reward = reward_structure[new_state]\n",
-"    is_terminal = new_state in [terminal_states]\n",
+"    is_terminal = new_state in terminal_states\n",
 "\n",
 "    return new_state, reward, action, is_terminal"
 ]
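
For context, a minimal, self-contained sketch of the step logic this patch touches. The names `reward_structure` and `terminal_states` mirror the notebook's; the 4x4 layout, the hole and fish positions, and the clip-at-walls behavior are illustrative assumptions here, not the notebook's actual code.

```python
import numpy as np

# Illustrative 4x4 gridworld, row-major state indexing (assumption).
n_states = 16
reward_structure = np.zeros(n_states)
reward_structure[15] = 3            # fish: +3
reward_structure[[5, 7, 12]] = -2   # holes: -2
terminal_states = [5, 7, 12, 15]    # holes and fish end the episode

def slippery_action(action, rng):
    # With probability 0.25 the ice redirects the agent: the chosen
    # action is replaced by one sampled uniformly from the other three.
    if rng.random() < 0.25:
        others = [a for a in range(4) if a != action]
        action = others[rng.integers(3)]
    return action

def step(state, action, rng):
    action = slippery_action(action, rng)
    row, col = divmod(state, 4)
    # a=0 up, a=1 right, a=2 down, a=3 left; moves into a wall clip in place.
    drow, dcol = [(-1, 0), (0, 1), (1, 0), (0, -1)][action]
    row = min(max(row + drow, 0), 3)
    col = min(max(col + dcol, 0), 3)
    new_state = row * 4 + col

    # The reward is for arriving at the new state.
    reward = reward_structure[new_state]
    # The patched line: membership test against the list itself.
    is_terminal = new_state in terminal_states
    return new_state, reward, action, is_terminal
```

The removed line tested `new_state in [terminal_states]`, i.e. membership in a one-element list whose sole element is the whole `terminal_states` list. That compares `new_state` against the list itself, so it is always false and no state ever registers as terminal, which is why the fix matters beyond style.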