From 07bcc98a85736834ac1b2c18377a4bf5d719eb42 Mon Sep 17 00:00:00 2001 From: udlbook <110402648+udlbook@users.noreply.github.com> Date: Thu, 1 Feb 2024 20:19:34 +0000 Subject: [PATCH] Created using Colaboratory --- .../5_3_Multiclass_Cross_entropy_Loss.ipynb | 39 +++++++------------ 1 file changed, 15 insertions(+), 24 deletions(-) diff --git a/Notebooks/Chap05/5_3_Multiclass_Cross_entropy_Loss.ipynb b/Notebooks/Chap05/5_3_Multiclass_Cross_entropy_Loss.ipynb index 9d550a0..dbab0a1 100644 --- a/Notebooks/Chap05/5_3_Multiclass_Cross_entropy_Loss.ipynb +++ b/Notebooks/Chap05/5_3_Multiclass_Cross_entropy_Loss.ipynb @@ -1,18 +1,16 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": { - "colab_type": "text", - "id": "view-in-github" + "id": "view-in-github", + "colab_type": "text" }, "source": [ "\"Open" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "id": "jSlFkICHwHQF" }, @@ -142,7 +140,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "id": "PsgLZwsPxauP" }, @@ -209,13 +206,12 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "id": "MvVX6tl9AEXF" }, "source": [ - "The left is model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probability, that y=0 (red), 1 (green) and 2 (blue) The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dots, and the blue curve to be high where there are blue dots We'll compute the the likelihood and the negative log likelihood." + "The left is the model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probability that y=0 (red), 1 (green) and 2 (blue). The dots at the bottom show the training data with the same color scheme. 
So we want the red curve to be high where there are red dots, the green curve to be high where there are green dots, and the blue curve to be high where there are blue dots. We'll compute the likelihood and the negative log likelihood." ] }, { @@ -226,7 +222,7 @@ }, "outputs": [], "source": [ - "# Return probability under Categorical distribution for input x\n", + "# Return probability under categorical distribution for observed class y\n", "# Just take value from row k of lambda param where y =k,\n", "def categorical_distribution(y, lambda_param):\n", " return np.array([lambda_param[row, i] for i, row in enumerate (y)])" ] }, @@ -248,7 +244,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "id": "R5z_0dzQMF35" }, @@ -286,7 +281,7 @@ "source": [ "# Let's test this\n", "beta_0, omega_0, beta_1, omega_1 = get_parameters()\n", - "# Use our neural network to predict the mean of the Gaussian\n", + "# Use our neural network to predict the parameters of the categorical distribution\n", "model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n", "lambda_train = softmax(model_out)\n", "# Compute the likelihood\n", @@ -296,7 +291,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "id": "HzphKgPfOvlk" }, @@ -318,7 +312,7 @@ "source": [ "# Return the negative log likelihood of the data under the model\n", "def compute_negative_log_likelihood(y_train, lambda_param):\n", - " # TODO -- compute the likelihood of the data -- don't use the likelihood function above -- compute the negative sum of the log probabilities\n", + " # TODO -- compute the negative log likelihood of the data -- don't use the likelihood function above -- compute the negative sum of the log probabilities\n", " # You will need np.sum(), np.log()\n", " # Replace the line below\n", " nll = 0\n", @@ -336,24 +330,23 @@ "source": [ "# Let's test this\n", "beta_0, omega_0, beta_1, omega_1 = get_parameters()\n", - "# Use our neural network to predict the mean of the 
Gaussian\n", + "# Use our neural network to predict the parameters of the categorical distribution\n", "model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n", "# Pass the outputs through the softmax function\n", "lambda_train = softmax(model_out)\n", - "# Compute the log likelihood\n", + "# Compute the negative log likelihood\n", "nll = compute_negative_log_likelihood(y_train, lambda_train)\n", "# Let's double check we get the right answer before proceeding\n", "print(\"Correct answer = %9.9f, Your answer = %9.9f\"%(17.015457867,nll))" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "id": "OgcRojvPWh4V" }, "source": [ - "Now let's investigate finding the maximum likelihood / minimum log likelihood solution. For simplicity, we'll assume that all the parameters are fixed except one and look at how the likelihood and log likelihood change as we manipulate the last parameter. We'll start with overall y_offset, $\\beta_1$ (formerly $\\phi_0$)" + "Now let's investigate finding the maximum likelihood / minimum negative log likelihood solution. For simplicity, we'll assume that all the parameters are fixed except one and look at how the likelihood and negative log likelihood change as we manipulate the last parameter. 
We'll start with the overall y_offset, $\beta_1$ (formerly $\phi_0$)." ] }, { @@ -378,7 +371,7 @@ " # Run the network with new parameters\n", " model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n", " lambda_train = softmax(model_out)\n", - " # Compute and store the three values\n", + " # Compute and store the two values\n", " likelihoods[count] = compute_likelihood(y_train,lambda_train)\n", " nlls[count] = compute_negative_log_likelihood(y_train, lambda_train)\n", " # Draw the model for every 20th parameter setting\n", @@ -397,7 +390,7 @@ }, "outputs": [], "source": [ - "# Now let's plot the likelihood, negative log likelihood, and least squares as a function the value of the offset beta1\n", + "# Now let's plot the likelihood and negative log likelihood as a function of the value of the offset beta1\n", "fig, ax = plt.subplots()\n", "fig.tight_layout(pad=5.0)\n", "likelihood_color = 'tab:red'\n", @@ -440,7 +433,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "id": "771G8N1Vk5A2" }, "source": [ "They both give the same answer. But you can see from the likelihood above that the likelihood is very small unless the parameters are almost correct. So in practice, we would work with the negative log likelihood.

\n", "\n", - "Again, to fit the full neural model we would vary all of the 16 parameters of the network in the $\\boldsymbol\\beta_{0},\\boldsymbol\\omega_{0},\\boldsymbol\\beta_{1},\\boldsymbol\\omega_{1}$ until we find the combination that have the maximum likelihood / minimum negative log likelihood.

\n", + "Again, to fit the full neural model we would vary all 16 parameters of the network in $\boldsymbol\beta_{0},\boldsymbol\Omega_{0},\boldsymbol\beta_{1},\boldsymbol\Omega_{1}$ until we find the combination that has the maximum likelihood / minimum negative log likelihood.

\n", "\n" ] } ], "metadata": { "colab": { - "authorship_tag": "ABX9TyOPv/l+ToaApJV7Nz+8AtpV", - "include_colab_link": true, - "provenance": [] + "provenance": [], + "include_colab_link": true }, "kernelspec": { "display_name": "Python 3", @@ -469,4 +460,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} +} \ No newline at end of file
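The markdown cells changed above promise that we'll "compute the likelihood and the negative log likelihood". As a quick sanity check of the two functions the patched cells describe, here is a minimal self-contained sketch; the matrix `lam` and labels `y` are invented toy values (the notebook's real `lambda_train` comes from `shallow_nn` followed by `softmax`), and the NLL body shown is just one possible answer to the cell's TODO, which the patch deliberately leaves as `nll = 0` for the reader.

```python
import numpy as np

# The indexing trick used by the notebook's categorical_distribution:
# for example i with label y[i], pick out lambda_param[y[i], i],
# the predicted probability of the true class.
def categorical_distribution(y, lambda_param):
    return np.array([lambda_param[row, i] for i, row in enumerate(y)])

# One possible solution to the notebook's TODO: the negative log likelihood
# is the negative sum of the log probabilities of the observed classes.
def compute_negative_log_likelihood(y, lambda_param):
    return -np.sum(np.log(categorical_distribution(y, lambda_param)))

# Toy values (made up for illustration): 3 classes (rows) x 4 examples
# (columns); each column sums to one, as it would after the softmax.
lam = np.array([[0.7, 0.2, 0.1, 0.6],
                [0.2, 0.5, 0.3, 0.3],
                [0.1, 0.3, 0.6, 0.1]])
y = np.array([0, 1, 2, 0])

probs = categorical_distribution(y, lam)       # [0.7, 0.5, 0.6, 0.6]
nll = compute_negative_log_likelihood(y, lam)  # about 2.0715
print(probs, nll)
```

Note that the likelihood itself (the product of these per-example probabilities) shrinks toward zero as the dataset grows, which is exactly why the cells above switch to the negative log likelihood.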