diff --git a/Notebooks/Chap05/5_1_Least_Squares_Loss.ipynb b/Notebooks/Chap05/5_1_Least_Squares_Loss.ipynb
index a96c3ae..d1e8f0c 100644
--- a/Notebooks/Chap05/5_1_Least_Squares_Loss.ipynb
+++ b/Notebooks/Chap05/5_1_Least_Squares_Loss.ipynb
@@ -185,7 +185,7 @@
 {
 "cell_type": "code",
 "source": [
-"# Return probability under normal distribution for input x\n",
+"# Return probability under normal distribution for input y\n",
 "def normal_distribution(y, mu, sigma):\n",
 "  # TODO-- write in the equation for the normal distribution\n",
 "  # Equation 5.7 from the notes (you will need np.sqrt() and np.exp(), and math.pi)\n",
@@ -329,7 +329,7 @@
 "mu_pred = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
 "# Set the standard deviation to something reasonable\n",
 "sigma = 0.2\n",
-"# Compute the log likelihood\n",
+"# Compute the negative log likelihood\n",
 "nll = compute_negative_log_likelihood(y_train, mu_pred, sigma)\n",
 "# Let's double check we get the right answer before proceeding\n",
 "print(\"Correct answer = %9.9f, Your answer = %9.9f\"%(11.452419564,nll))"
@@ -388,7 +388,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"Now let's investigate finding the maximum likelihood / minimum log likelihood / least squares solution. For simplicity, we'll assume that all the parameters are correct except one and look at how the likelihood, log likelihood, and sum of squares change as we manipulate the last parameter. We'll start with overall y offset, beta_1 (formerly phi_0)"
+"Now let's investigate finding the maximum likelihood / minimum negative log likelihood / least squares solution. For simplicity, we'll assume that all the parameters are correct except one and look at how the likelihood, negative log likelihood, and sum of squares change as we manipulate the last parameter. We'll start with the overall y offset, beta_1 (formerly phi_0)"
 ],
 "metadata": {
 "id": "OgcRojvPWh4V"
 },
@@ -530,7 +530,7 @@
 {
 "cell_type": "code",
 "source": [
-"# Now let's plot the likelihood, negative log likelihood, and least squares as a function the value of the standard divation sigma\n",
+"# Now let's plot the likelihood, negative log likelihood, and least squares as a function of the value of the standard deviation sigma\n",
 "fig, ax = plt.subplots(1,2)\n",
 "fig.set_size_inches(10.5, 5.5)\n",
 "fig.tight_layout(pad=10.0)\n",
@@ -581,7 +581,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"Obviously, to fit the full neural model we would vary all of the 10 parameters of the network in $\boldsymbol\beta_{0},\boldsymbol\omega_{0},\boldsymbol\beta_{1},\boldsymbol\omega_{1}$ (and maybe $\sigma$) until we find the combination that have the maximum likelihood / minimum negative log likelihood / least squares.\n",
+"Obviously, to fit the full neural model we would vary all of the 10 parameters of the network in $\boldsymbol\beta_{0},\boldsymbol\Omega_{0},\boldsymbol\beta_{1},\boldsymbol\Omega_{1}$ (and maybe $\sigma$) until we find the combination that has the maximum likelihood / minimum negative log likelihood / least squares.\n",
 "\n",
 "Here we just varied one at a time as it is easier to see what is going on. This is known as **coordinate descent**.\n"
 ],