From ac540f1294276a941e6abbcef30847d0ec9d012d Mon Sep 17 00:00:00 2001
From: jalaneunos
Date: Mon, 5 Jan 2026 17:47:40 +0800
Subject: [PATCH] fix: correct terminal state in 19.4, fix typo

---
 Notebooks/Chap19/19_4_Temporal_Difference_Methods.ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Notebooks/Chap19/19_4_Temporal_Difference_Methods.ipynb b/Notebooks/Chap19/19_4_Temporal_Difference_Methods.ipynb
index 8a98c79..e8f5e0d 100644
--- a/Notebooks/Chap19/19_4_Temporal_Difference_Methods.ipynb
+++ b/Notebooks/Chap19/19_4_Temporal_Difference_Methods.ipynb
@@ -265,7 +265,7 @@
 "\n",
 "In this icy environment the penguin is at one of the discrete cells in the gridworld. The agent starts each episode on a randomly chosen cell. The environment state dynamics are captured by the transition probabilities $Pr(s_{t+1} |s_t, a_t)$ where $s_t$ is the current state, $a_t$ is the action chosen, and $s_{t+1}$ is the next state at decision stage t. At each decision stage, the penguin can move in one of four directions: $a=0$ means try to go upward, $a=1$, right, $a=2$ down and $a=3$ left.\n",
 "\n",
-"However, the ice is slippery, so we don't always go the direction we want to: every time the agent chooses an action, with 0.25 probability, the environment changes the action taken to a differenct action, which is uniformly sampled from the other available actions.\n",
+"However, the ice is slippery, so we don't always go the direction we want to: every time the agent chooses an action, with 0.25 probability, the environment changes the action taken to a different action, which is uniformly sampled from the other available actions.\n",
 "\n",
 "The rewards are deterministic; the penguin will receive a reward of +3 if it reaches the fish, -2 if it slips into a hole and 0 otherwise.\n",
 "\n",
@@ -470,7 +470,7 @@
 "\n",
 "    # Return the reward -- here the reward is for arriving at the state\n",
 "    reward = reward_structure[new_state]\n",
-"    is_terminal = new_state in [terminal_states]\n",
+"    is_terminal = new_state in terminal_states\n",
 "\n",
 "    return new_state, reward, action, is_terminal"
 ]
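
For context, a minimal, self-contained sketch of the step logic this patch touches. The names `reward_structure` and `terminal_states` mirror the notebook's; the 4x4 layout, the hole and fish positions, and the clip-at-walls behavior are illustrative assumptions here, not the notebook's actual code.

```python
import numpy as np

# Illustrative 4x4 gridworld, row-major state indexing (assumption).
n_states = 16
reward_structure = np.zeros(n_states)
reward_structure[15] = 3            # fish: +3
reward_structure[[5, 7, 12]] = -2   # holes: -2
terminal_states = [5, 7, 12, 15]    # holes and fish end the episode

def slippery_action(action, rng):
    # With probability 0.25 the ice redirects the agent: the chosen
    # action is replaced by one sampled uniformly from the other three.
    if rng.random() < 0.25:
        others = [a for a in range(4) if a != action]
        action = others[rng.integers(3)]
    return action

def step(state, action, rng):
    action = slippery_action(action, rng)
    row, col = divmod(state, 4)
    # a=0 up, a=1 right, a=2 down, a=3 left; moves into a wall clip in place.
    drow, dcol = [(-1, 0), (0, 1), (1, 0), (0, -1)][action]
    row = min(max(row + drow, 0), 3)
    col = min(max(col + dcol, 0), 3)
    new_state = row * 4 + col

    # The reward is for arriving at the new state.
    reward = reward_structure[new_state]
    # The patched line: membership test against the list itself.
    is_terminal = new_state in terminal_states
    return new_state, reward, action, is_terminal
```

The removed line tested `new_state in [terminal_states]`, i.e. membership in a one-element list whose sole element is the whole `terminal_states` list. That compares `new_state` against the list itself, so it is always false and no state ever registers as terminal, which is why the fix matters beyond style.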