Merge pull request #311 from jalaneunos/main
Fix terminal state check in 19.3 and 19.4, fix typo in 19.4
@@ -437,7 +437,7 @@
     " new_state = np.random.choice(a=np.arange(0,transition_probabilities_given_action.shape[0]),p = transition_probabilities_given_action[:,state,action])\n",
     " # Return the reward\n",
     " reward = reward_structure[new_state]\n",
-    " is_terminal = new_state in [terminal_states]\n",
+    " is_terminal = new_state in terminal_states\n",
     "\n",
     " return new_state, reward, action, is_terminal"
    ]
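A minimal sketch of why this one-character fix matters (the cell values below are hypothetical, not from the notebook): wrapping `terminal_states` in brackets builds a one-element list whose single element is the whole list, so an integer state can never be a member and the episode would never be flagged as terminal.

```python
# Hypothetical list of terminal cells for illustration.
terminal_states = [5, 7, 11, 12, 15]
new_state = 5

# Buggy check: [terminal_states] == [[5, 7, 11, 12, 15]], so membership
# compares new_state against the entire list object and is always False.
buggy = new_state in [terminal_states]

# Fixed check: test membership in the list of terminal cells directly.
fixed = new_state in terminal_states

print(buggy, fixed)  # False True
```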
@@ -265,7 +265,7 @@
 "\n",
 "In this icy environment the penguin is at one of the discrete cells in the gridworld. The agent starts each episode on a randomly chosen cell. The environment state dynamics are captured by the transition probabilities $Pr(s_{t+1} |s_t, a_t)$ where $s_t$ is the current state, $a_t$ is the action chosen, and $s_{t+1}$ is the next state at decision stage t. At each decision stage, the penguin can move in one of four directions: $a=0$ means try to go upward, $a=1$, right, $a=2$ down and $a=3$ left.\n",
 "\n",
-"However, the ice is slippery, so we don't always go the direction we want to: every time the agent chooses an action, with 0.25 probability, the environment changes the action taken to a differenct action, which is uniformly sampled from the other available actions.\n",
+"However, the ice is slippery, so we don't always go the direction we want to: every time the agent chooses an action, with 0.25 probability, the environment changes the action taken to a different action, which is uniformly sampled from the other available actions.\n",
 "\n",
 "The rewards are deterministic; the penguin will receive a reward of +3 if it reaches the fish, -2 if it slips into a hole and 0 otherwise.\n",
 "\n",
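The slippery-ice dynamics described in this cell can be sketched as a small helper (the function name and signature are illustrative, not from the notebook): with probability 0.25 the chosen action is replaced by one drawn uniformly from the other three.

```python
import numpy as np

def slip_action(chosen_action, n_actions=4, slip_prob=0.25, rng=None):
    # Hypothetical helper: with probability slip_prob, replace the chosen
    # action by one sampled uniformly from the remaining actions.
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < slip_prob:
        others = [a for a in range(n_actions) if a != chosen_action]
        return int(rng.choice(others))
    return chosen_action
```

Over many steps, the intended action is executed about 75% of the time and each of the other three actions about 8.3% of the time.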
@@ -470,7 +470,7 @@
 "\n",
 " # Return the reward -- here the reward is for arriving at the state\n",
 " reward = reward_structure[new_state]\n",
-" is_terminal = new_state in [terminal_states]\n",
+" is_terminal = new_state in terminal_states\n",
 "\n",
 " return new_state, reward, action, is_terminal"
 ]
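The context lines in these hunks sample the next state with `np.random.choice` over a column of the transition tensor. A self-contained sketch of that sampling step, with assumed shapes (`T[s_next, s, a]` giving $Pr(s_{t+1}|s_t, a_t)$; the sizes and values here are made up for illustration):

```python
import numpy as np

# Toy transition tensor: T[s_next, s, a] = Pr(s_next | s, a).
n_states, n_actions = 4, 2
rng = np.random.default_rng(0)
T = rng.random((n_states, n_states, n_actions))
T /= T.sum(axis=0, keepdims=True)   # normalize each column into a distribution

state, action = 1, 0
# Same pattern as the notebook: draw the next state from the column
# of transition probabilities for the current (state, action) pair.
new_state = np.random.choice(np.arange(n_states), p=T[:, state, action])
```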