Merge branch 'main' into main

This commit is contained in:
udlbook
2023-12-10 17:11:38 +00:00
committed by GitHub
26 changed files with 87 additions and 65 deletions

View File

@@ -105,7 +105,7 @@
"cell_type": "code", "cell_type": "code",
"source": [ "source": [
"\n", "\n",
"# TODO Create a model with the folowing layers\n", "# TODO Create a model with the following layers\n",
"# 1. Convolutional layer, (input=length 40 and 1 channel, kernel size 3x3, stride 2, padding=\"valid\", 15 output channels ) \n", "# 1. Convolutional layer, (input=length 40 and 1 channel, kernel size 3x3, stride 2, padding=\"valid\", 15 output channels ) \n",
"# 2. ReLU\n", "# 2. ReLU\n",
"# 3. Convolutional layer, (input=length 19 and 15 channels, kernel size 3x3, stride 2, padding=\"valid\", 15 output channels )\n", "# 3. Convolutional layer, (input=length 19 and 15 channels, kernel size 3x3, stride 2, padding=\"valid\", 15 output channels )\n",
@@ -120,7 +120,7 @@
"# https://pytorch.org/docs/1.13/generated/torch.nn.Linear.html?highlight=linear#torch.nn.Linear\n", "# https://pytorch.org/docs/1.13/generated/torch.nn.Linear.html?highlight=linear#torch.nn.Linear\n",
"\n", "\n",
"# Replace the following function which just runs a standard fully connected network\n", "# Replace the following function which just runs a standard fully connected network\n",
"# The flatten at the beginning is becuase we are passing in the data in a slightly different format.\n", "# The flatten at the beginning is because we are passing in the data in a slightly different format.\n",
"model = nn.Sequential(\n", "model = nn.Sequential(\n",
"nn.Flatten(),\n", "nn.Flatten(),\n",
"nn.Linear(40, 100),\n", "nn.Linear(40, 100),\n",

View File

@@ -148,7 +148,7 @@
"# 8. A flattening operation\n", "# 8. A flattening operation\n",
"# 9. A fully connected layer mapping from (whatever dimensions we are at-- find out using .shape) to 50 \n", "# 9. A fully connected layer mapping from (whatever dimensions we are at-- find out using .shape) to 50 \n",
"# 10. A ReLU\n", "# 10. A ReLU\n",
"# 11. A fully connected layer mappiing from 50 to 10 dimensions\n", "# 11. A fully connected layer mapping from 50 to 10 dimensions\n",
"# 12. A softmax function.\n", "# 12. A softmax function.\n",
"\n", "\n",
"# Replace this class which implements a minimal network (which still does okay)\n", "# Replace this class which implements a minimal network (which still does okay)\n",

View File

@@ -32,7 +32,7 @@
"source": [ "source": [
"# Gradients II: Backpropagation algorithm\n", "# Gradients II: Backpropagation algorithm\n",
"\n", "\n",
"In this practical, we'll investigate the backpropagation algoritithm. This computes the gradients of the loss with respect to all of the parameters (weights and biases) in the network. We'll use these gradients when we run stochastic gradient descent." "In this practical, we'll investigate the backpropagation algorithm. This computes the gradients of the loss with respect to all of the parameters (weights and biases) in the network. We'll use these gradients when we run stochastic gradient descent."
], ],
"metadata": { "metadata": {
"id": "L6chybAVFJW2" "id": "L6chybAVFJW2"
@@ -53,7 +53,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"First let's define a neural network. We'll just choose the weights and biaes randomly for now"
+"First let's define a neural network. We'll just choose the weights and biases randomly for now"
 ],
 "metadata": {
 "id": "nnUoI0m6GyjC"
@@ -178,7 +178,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"Now let's define a loss function. We'll just use the least squaures loss function. We'll also write a function to compute dloss_doutpu"
+"Now let's define a loss function. We'll just use the least squares loss function. We'll also write a function to compute dloss_doutput"
 ],
 "metadata": {
 "id": "SxVTKp3IcoBF"

View File

@@ -53,7 +53,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"First let's define a neural network. We'll just choose the weights and biaes randomly for now"
+"First let's define a neural network. We'll just choose the weights and biases randomly for now"
 ],
 "metadata": {
 "id": "nnUoI0m6GyjC"
@@ -204,7 +204,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"Now let's define a loss function. We'll just use the least squaures loss function. We'll also write a function to compute dloss_doutput\n"
+"Now let's define a loss function. We'll just use the least squares loss function. We'll also write a function to compute dloss_doutput\n"
 ],
 "metadata": {
 "id": "SxVTKp3IcoBF"

View File

@@ -176,7 +176,7 @@
"# Color represents y value (brighter = higher value)\n", "# Color represents y value (brighter = higher value)\n",
"# Black = -10 or less, White = +10 or more\n", "# Black = -10 or less, White = +10 or more\n",
"# 0 = mid orange\n", "# 0 = mid orange\n",
"# Lines are conoturs where value is equal\n", "# Lines are contours where value is equal\n",
"draw_2D_function(x1,x2,y)\n", "draw_2D_function(x1,x2,y)\n",
"\n", "\n",
"# TODO\n", "# TODO\n",

View File

@@ -215,7 +215,7 @@
"# Color represents y value (brighter = higher value)\n", "# Color represents y value (brighter = higher value)\n",
"# Black = -10 or less, White = +10 or more\n", "# Black = -10 or less, White = +10 or more\n",
"# 0 = mid orange\n", "# 0 = mid orange\n",
"# Lines are conoturs where value is equal\n", "# Lines are contours where value is equal\n",
"draw_2D_function(x1,x2,y)\n", "draw_2D_function(x1,x2,y)\n",
"\n", "\n",
"# TODO\n", "# TODO\n",

View File

@@ -36,7 +36,7 @@
"\n", "\n",
"We'll compute loss functions for maximum likelihood, minimum negative log likelihood, and least squares and show that they all imply that we should use the same parameter values\n", "We'll compute loss functions for maximum likelihood, minimum negative log likelihood, and least squares and show that they all imply that we should use the same parameter values\n",
"\n", "\n",
"In part II, we'll investigate binary classification (where the output data is 0 or 1). This will be based on the Bernouilli distribution\n", "In part II, we'll investigate binary classification (where the output data is 0 or 1). This will be based on the Bernoulli distribution\n",
"\n", "\n",
"In part III we'll investigate multiclass classification (where the output data is 0,1, or, 2). This will be based on the categorical distribution." "In part III we'll investigate multiclass classification (where the output data is 0,1, or, 2). This will be based on the categorical distribution."
], ],
@@ -178,7 +178,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"The blue line i sthe mean prediction of the model and the gray area represents plus/minus two standardard deviations. This model fits okay, but could be improved. Let's compute the loss. We'll compute the the least squares error, the likelihood, the negative log likelihood."
+"The blue line is the mean prediction of the model and the gray area represents plus/minus two standard deviations. This model fits okay, but could be improved. Let's compute the loss. We'll compute the least squares error, the likelihood, and the negative log likelihood."
 ],
 "metadata": {
 "id": "MvVX6tl9AEXF"
@@ -276,7 +276,7 @@
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n", "beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
"# Use our neural network to predict the mean of the Gaussian\n", "# Use our neural network to predict the mean of the Gaussian\n",
"mu_pred = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n", "mu_pred = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
"# Set the standard devation to something reasonable\n", "# Set the standard deviation to something reasonable\n",
"sigma = 0.2\n", "sigma = 0.2\n",
"# Compute the likelihood\n", "# Compute the likelihood\n",
"likelihood = compute_likelihood(y_train, mu_pred, sigma)\n", "likelihood = compute_likelihood(y_train, mu_pred, sigma)\n",
@@ -292,7 +292,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of sveral probabilities, which are all quite small themselves.\n",
+"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of several probabilities, which are all quite small themselves.\n",
 "This will get out of hand pretty quickly with real datasets -- the likelihood will get so small that we can't represent it with normal finite-precision math\n",
 "\n",
 "This is why we use negative log likelihood"
@@ -326,7 +326,7 @@
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n", "beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
"# Use our neural network to predict the mean of the Gaussian\n", "# Use our neural network to predict the mean of the Gaussian\n",
"mu_pred = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n", "mu_pred = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
"# Set the standard devation to something reasonable\n", "# Set the standard deviation to something reasonable\n",
"sigma = 0.2\n", "sigma = 0.2\n",
"# Compute the log likelihood\n", "# Compute the log likelihood\n",
"nll = compute_negative_log_likelihood(y_train, mu_pred, sigma)\n", "nll = compute_negative_log_likelihood(y_train, mu_pred, sigma)\n",
@@ -397,7 +397,7 @@
"source": [ "source": [
"# Define a range of values for the parameter\n", "# Define a range of values for the parameter\n",
"beta_1_vals = np.arange(0,1.0,0.01)\n", "beta_1_vals = np.arange(0,1.0,0.01)\n",
"# Create some arrays to store the likelihoods, negative log likehoos and sum of squares\n", "# Create some arrays to store the likelihoods, negative log likelihoods and sum of squares\n",
"likelihoods = np.zeros_like(beta_1_vals)\n", "likelihoods = np.zeros_like(beta_1_vals)\n",
"nlls = np.zeros_like(beta_1_vals)\n", "nlls = np.zeros_like(beta_1_vals)\n",
"sum_squares = np.zeros_like(beta_1_vals)\n", "sum_squares = np.zeros_like(beta_1_vals)\n",
@@ -482,7 +482,7 @@
"source": [ "source": [
"# Define a range of values for the parameter\n", "# Define a range of values for the parameter\n",
"sigma_vals = np.arange(0.1,0.5,0.005)\n", "sigma_vals = np.arange(0.1,0.5,0.005)\n",
"# Create some arrays to store the likelihoods, negative log likehoos and sum of squares\n", "# Create some arrays to store the likelihoods, negative log likelihoods and sum of squares\n",
"likelihoods = np.zeros_like(sigma_vals)\n", "likelihoods = np.zeros_like(sigma_vals)\n",
"nlls = np.zeros_like(sigma_vals)\n", "nlls = np.zeros_like(sigma_vals)\n",
"sum_squares = np.zeros_like(sigma_vals)\n", "sum_squares = np.zeros_like(sigma_vals)\n",

View File

@@ -34,7 +34,7 @@
"\n", "\n",
"This practical investigates loss functions. In part I we investigated univariate regression (where the output data $y$ is continuous. Our formulation was based on the normal/Gaussian distribution.\n", "This practical investigates loss functions. In part I we investigated univariate regression (where the output data $y$ is continuous. Our formulation was based on the normal/Gaussian distribution.\n",
"\n", "\n",
"In this notebook, we investigate binary classification (where the output data is 0 or 1). This will be based on the Bernouilli distribution\n", "In this notebook, we investigate binary classification (where the output data is 0 or 1). This will be based on the Bernoulli distribution\n",
"\n", "\n",
"In part III we'll investigate multiclass classification (where the outputs data can take multiple values 1,... K.\n", "In part III we'll investigate multiclass classification (where the outputs data can take multiple values 1,... K.\n",
"\n", "\n",
@@ -199,7 +199,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"The left is model output and the right is the model output after the sigmoid has been applied, so it now lies in the range [0,1] and represents the probabiilty, that y=1. The black dots show the training data. We'll compute the the likelihood and the negative log likelihood."
+"The left is the model output and the right is the model output after the sigmoid has been applied, so it now lies in the range [0,1] and represents the probability that y=1. The black dots show the training data. We'll compute the likelihood and the negative log likelihood."
 ],
 "metadata": {
 "id": "MvVX6tl9AEXF"
@@ -210,7 +210,7 @@
"source": [ "source": [
"# Return probability under Bernoulli distribution for input x\n", "# Return probability under Bernoulli distribution for input x\n",
"def bernoulli_distribution(y, lambda_param):\n", "def bernoulli_distribution(y, lambda_param):\n",
" # TODO-- write in the equation for the Bernoullid distribution \n", " # TODO-- write in the equation for the Bernoulli distribution \n",
" # Equation 5.17 from the notes (you will need np.power)\n", " # Equation 5.17 from the notes (you will need np.power)\n",
" # Replace the line below\n", " # Replace the line below\n",
" prob = np.zeros_like(y)\n", " prob = np.zeros_like(y)\n",
@@ -249,7 +249,7 @@
"source": [ "source": [
"# Return the likelihood of all of the data under the model\n", "# Return the likelihood of all of the data under the model\n",
"def compute_likelihood(y_train, lambda_param):\n", "def compute_likelihood(y_train, lambda_param):\n",
" # TODO -- compute the likelihood of the data -- the product of the Bernoullis probabilities for each data point\n", " # TODO -- compute the likelihood of the data -- the product of the Bernoulli's probabilities for each data point\n",
" # Top line of equation 5.3 in the notes\n", " # Top line of equation 5.3 in the notes\n",
" # You will need np.prod() and the bernoulli_distribution function you used above\n", " # You will need np.prod() and the bernoulli_distribution function you used above\n",
" # Replace the line below\n", " # Replace the line below\n",
@@ -284,7 +284,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of sveral probabilities, which are all quite small themselves.\n",
+"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of several probabilities, which are all quite small themselves.\n",
 "This will get out of hand pretty quickly with real datasets -- the likelihood will get so small that we can't represent it with normal finite-precision math\n",
 "\n",
 "This is why we use negative log likelihood"
@@ -317,7 +317,7 @@
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n", "beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
"# Use our neural network to predict the mean of the Gaussian\n", "# Use our neural network to predict the mean of the Gaussian\n",
"model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n", "model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
"# Set the standard devation to something reasonable\n", "# Set the standard deviation to something reasonable\n",
"lambda_train = sigmoid(model_out)\n", "lambda_train = sigmoid(model_out)\n",
"# Compute the log likelihood\n", "# Compute the log likelihood\n",
"nll = compute_negative_log_likelihood(y_train, lambda_train)\n", "nll = compute_negative_log_likelihood(y_train, lambda_train)\n",
@@ -362,7 +362,7 @@
"source": [ "source": [
"# Define a range of values for the parameter\n", "# Define a range of values for the parameter\n",
"beta_1_vals = np.arange(-2,6.0,0.1)\n", "beta_1_vals = np.arange(-2,6.0,0.1)\n",
"# Create some arrays to store the likelihoods, negative log likehoods\n", "# Create some arrays to store the likelihoods, negative log likelihoods\n",
"likelihoods = np.zeros_like(beta_1_vals)\n", "likelihoods = np.zeros_like(beta_1_vals)\n",
"nlls = np.zeros_like(beta_1_vals)\n", "nlls = np.zeros_like(beta_1_vals)\n",
"\n", "\n",

View File

@@ -33,7 +33,7 @@
"# Loss functions part III\n", "# Loss functions part III\n",
"\n", "\n",
"This practical investigates loss functions. In part I we investigated univariate regression (where the output data $y$ is continuous. Our formulation was based on the normal/Gaussian distribution.\n", "This practical investigates loss functions. In part I we investigated univariate regression (where the output data $y$ is continuous. Our formulation was based on the normal/Gaussian distribution.\n",
"In part II we investigated binary classification (where the output data is 0 or 1). This will be based on the Bernouilli distribution.<br><br>\n", "In part II we investigated binary classification (where the output data is 0 or 1). This will be based on the Bernoulli distribution.<br><br>\n",
"\n", "\n",
"Now we'll investigate multiclass classification (where the outputs data can take multiple values 1,... K, which is based on the categorical distribution\n", "Now we'll investigate multiclass classification (where the outputs data can take multiple values 1,... K, which is based on the categorical distribution\n",
"\n", "\n",
@@ -218,7 +218,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"The left is model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probabiilty, that y=0 (red), 1 (green) and 2 (blue) The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dotsmand the blue curve to be high where there are blue dots We'll compute the the likelihood and the negative log likelihood."
+"The left is the model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probability that y=0 (red), 1 (green), or 2 (blue). The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dots, and the blue curve to be high where there are blue dots. We'll compute the likelihood and the negative log likelihood."
 ],
 "metadata": {
 "id": "MvVX6tl9AEXF"
@@ -228,7 +228,7 @@
"cell_type": "code", "cell_type": "code",
"source": [ "source": [
"# Return probability under Bernoulli distribution for input x\n", "# Return probability under Bernoulli distribution for input x\n",
"# Complicated code to commpute it but just take value from row k of lambda param where y =k, \n", "# Complicated code to compute it but just take value from row k of lambda param where y =k, \n",
"def categorical_distribution(y, lambda_param):\n", "def categorical_distribution(y, lambda_param):\n",
" prob = np.zeros_like(y)\n", " prob = np.zeros_like(y)\n",
" for row_index in range(lambda_param.shape[0]):\n", " for row_index in range(lambda_param.shape[0]):\n",
@@ -305,7 +305,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of sveral probabilities, which are all quite small themselves.\n",
+"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of several probabilities, which are all quite small themselves.\n",
 "This will get out of hand pretty quickly with real datasets -- the likelihood will get so small that we can't represent it with normal finite-precision math\n",
 "\n",
 "This is why we use negative log likelihood"
@@ -338,7 +338,7 @@
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n", "beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
"# Use our neural network to predict the mean of the Gaussian\n", "# Use our neural network to predict the mean of the Gaussian\n",
"model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n", "model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
"# Set the standard devation to something reasonable\n", "# Set the standard deviation to something reasonable\n",
"lambda_train = softmax(model_out)\n", "lambda_train = softmax(model_out)\n",
"# Compute the log likelihood\n", "# Compute the log likelihood\n",
"nll = compute_negative_log_likelihood(y_train, lambda_train)\n", "nll = compute_negative_log_likelihood(y_train, lambda_train)\n",
@@ -365,7 +365,7 @@
"source": [ "source": [
"# Define a range of values for the parameter\n", "# Define a range of values for the parameter\n",
"beta_1_vals = np.arange(-2,6.0,0.1)\n", "beta_1_vals = np.arange(-2,6.0,0.1)\n",
"# Create some arrays to store the likelihoods, negative log likehoods\n", "# Create some arrays to store the likelihoods, negative log likelihoods\n",
"likelihoods = np.zeros_like(beta_1_vals)\n", "likelihoods = np.zeros_like(beta_1_vals)\n",
"nlls = np.zeros_like(beta_1_vals)\n", "nlls = np.zeros_like(beta_1_vals)\n",
"\n", "\n",

View File

@@ -233,7 +233,7 @@
"# TODO\n", "# TODO\n",
"# 1. Predict what effect changing phi_0 will have on the network. \n", "# 1. Predict what effect changing phi_0 will have on the network. \n",
"# Answer:\n", "# Answer:\n",
"# 2. Predict what effect multplying phi_1, phi_2, phi_3 by 0.5 would have. Check if you are correct\n", "# 2. Predict what effect multiplying phi_1, phi_2, phi_3 by 0.5 would have. Check if you are correct\n",
"# Answer:\n", "# Answer:\n",
"# 3. Predict what effect multiplying phi_1 by -1 will have. Check if you are correct.\n", "# 3. Predict what effect multiplying phi_1 by -1 will have. Check if you are correct.\n",
"# Answer:\n", "# Answer:\n",
@@ -500,7 +500,7 @@
"print(\"Loss = %3.3f\"%(loss))\n", "print(\"Loss = %3.3f\"%(loss))\n",
"\n", "\n",
"# TODO. Manipulate the parameters (by hand!) to make the function \n", "# TODO. Manipulate the parameters (by hand!) to make the function \n",
"# fit the data better and try to reduct the loss to as small a number \n", "# fit the data better and try to reduce the loss to as small a number \n",
"# as possible. The best that I could do was 0.181\n", "# as possible. The best that I could do was 0.181\n",
"# Tip... start by manipulating phi_0.\n", "# Tip... start by manipulating phi_0.\n",
"# It's not that easy, so don't spend too much time on this!" "# It's not that easy, so don't spend too much time on this!"

View File

@@ -108,7 +108,7 @@
"source": [ "source": [
"def line_search(loss_function, thresh=.0001, max_iter = 10, draw_flag = False):\n", "def line_search(loss_function, thresh=.0001, max_iter = 10, draw_flag = False):\n",
"\n", "\n",
" # Initialize four points along the rnage we are going to search\n", " # Initialize four points along the range we are going to search\n",
" a = 0\n", " a = 0\n",
" b = 0.33\n", " b = 0.33\n",
" c = 0.66\n", " c = 0.66\n",
@@ -139,7 +139,7 @@
" # Rule #2 If point b is less than point c then\n", " # Rule #2 If point b is less than point c then\n",
" # then point d becomes point c, and\n", " # then point d becomes point c, and\n",
" # point b becomes 1/3 between a and new d\n", " # point b becomes 1/3 between a and new d\n",
" # point c beocome 2/3 between a and new d \n", " # point c becomes 2/3 between a and new d \n",
" # TODO REPLACE THE BLOCK OF CODE BELOW WITH THIS RULE\n", " # TODO REPLACE THE BLOCK OF CODE BELOW WITH THIS RULE\n",
" if (0):\n", " if (0):\n",
" continue;\n", " continue;\n",
@@ -147,7 +147,7 @@
" # Rule #3 If point c is less than point b then\n", " # Rule #3 If point c is less than point b then\n",
" # then point a becomes point b, and\n", " # then point a becomes point b, and\n",
" # point b becomes 1/3 between new a and d\n", " # point b becomes 1/3 between new a and d\n",
" # point c beocome 2/3 between new a and d \n", " # point c becomes 2/3 between new a and d \n",
" # TODO REPLACE THE BLOCK OF CODE BELOW WITH THIS RULE\n", " # TODO REPLACE THE BLOCK OF CODE BELOW WITH THIS RULE\n",
" if(0):\n", " if(0):\n",
" continue\n", " continue\n",

View File

@@ -114,7 +114,7 @@
 {
 "cell_type": "code",
 "source": [
-"# Initialize the parmaeters and draw the model\n",
+"# Initialize the parameters and draw the model\n",
 "phi = np.zeros((2,1))\n",
 "phi[0] = 0.6 # Intercept\n",
 "phi[1] = -0.2 # Slope\n",
@@ -314,7 +314,7 @@
" return compute_loss(data[0,:], data[1,:], model, phi_start+ gradient * dist_prop)\n", " return compute_loss(data[0,:], data[1,:], model, phi_start+ gradient * dist_prop)\n",
"\n", "\n",
"def line_search(data, model, phi, gradient, thresh=.00001, max_dist = 0.1, max_iter = 15, verbose=False):\n", "def line_search(data, model, phi, gradient, thresh=.00001, max_dist = 0.1, max_iter = 15, verbose=False):\n",
" # Initialize four points along the rnage we are going to search\n", " # Initialize four points along the range we are going to search\n",
" a = 0\n", " a = 0\n",
" b = 0.33 * max_dist\n", " b = 0.33 * max_dist\n",
" c = 0.66 * max_dist\n", " c = 0.66 * max_dist\n",
@@ -345,7 +345,7 @@
" # Rule #2 If point b is less than point c then\n", " # Rule #2 If point b is less than point c then\n",
" # then point d becomes point c, and\n", " # then point d becomes point c, and\n",
" # point b becomes 1/3 between a and new d\n", " # point b becomes 1/3 between a and new d\n",
" # point c beocome 2/3 between a and new d \n", " # point c becomes 2/3 between a and new d \n",
" if lossb < lossc:\n", " if lossb < lossc:\n",
" d = c\n", " d = c\n",
" b = a+ (d-a)/3\n", " b = a+ (d-a)/3\n",
@@ -355,7 +355,7 @@
" # Rule #2 If point c is less than point b then\n", " # Rule #2 If point c is less than point b then\n",
" # then point a becomes point b, and\n", " # then point a becomes point b, and\n",
" # point b becomes 1/3 between new a and d\n", " # point b becomes 1/3 between new a and d\n",
" # point c beocome 2/3 between new a and d \n", " # point c becomes 2/3 between new a and d \n",
" a = b\n", " a = b\n",
" b = a+ (d-a)/3\n", " b = a+ (d-a)/3\n",
" c = a+ 2*(d-a)/3\n", " c = a+ 2*(d-a)/3\n",

View File

@@ -340,7 +340,7 @@
" return compute_loss(data[0,:], data[1,:], model, phi_start+ gradient * dist_prop)\n", " return compute_loss(data[0,:], data[1,:], model, phi_start+ gradient * dist_prop)\n",
"\n", "\n",
"def line_search(data, model, phi, gradient, thresh=.00001, max_dist = 0.1, max_iter = 15, verbose=False):\n", "def line_search(data, model, phi, gradient, thresh=.00001, max_dist = 0.1, max_iter = 15, verbose=False):\n",
" # Initialize four points along the rnage we are going to search\n", " # Initialize four points along the range we are going to search\n",
" a = 0\n", " a = 0\n",
" b = 0.33 * max_dist\n", " b = 0.33 * max_dist\n",
" c = 0.66 * max_dist\n", " c = 0.66 * max_dist\n",
@@ -371,7 +371,7 @@
" # Rule #2 If point b is less than point c then\n", " # Rule #2 If point b is less than point c then\n",
" # then point d becomes point c, and\n", " # then point d becomes point c, and\n",
" # point b becomes 1/3 between a and new d\n", " # point b becomes 1/3 between a and new d\n",
" # point c beocome 2/3 between a and new d \n", " # point c becomes 2/3 between a and new d \n",
" if lossb < lossc:\n", " if lossb < lossc:\n",
" d = c\n", " d = c\n",
" b = a+ (d-a)/3\n", " b = a+ (d-a)/3\n",
@@ -381,7 +381,7 @@
" # Rule #2 If point c is less than point b then\n", " # Rule #2 If point c is less than point b then\n",
" # then point a becomes point b, and\n", " # then point a becomes point b, and\n",
" # point b becomes 1/3 between new a and d\n", " # point b becomes 1/3 between new a and d\n",
" # point c beocome 2/3 between new a and d \n", " # point c becomes 2/3 between new a and d \n",
" a = b\n", " a = b\n",
" b = a+ (d-a)/3\n", " b = a+ (d-a)/3\n",
" c = a+ 2*(d-a)/3\n", " c = a+ 2*(d-a)/3\n",

View File

@@ -175,7 +175,7 @@
 {
 "cell_type": "code",
 "source": [
-"# TODO Modify the code below by changeing the number of tokens generated and the initial sentence\n",
+"# TODO Modify the code below by changing the number of tokens generated and the initial sentence\n",
 "# to get a feel for how well this works. Since I didn't reset the seed, it will give a different\n",
 "# answer every time that you run it.\n",
 "\n",
@@ -253,7 +253,7 @@
 {
 "cell_type": "code",
 "source": [
-"# TODO Modify the code below by changeing the number of tokens generated and the initial sentence\n",
+"# TODO Modify the code below by changing the number of tokens generated and the initial sentence\n",
 "# to get a feel for how well this works. \n",
 "\n",
 "# TODO Experiment with changing this line:\n",
@@ -471,7 +471,7 @@
 {
 "cell_type": "code",
 "source": [
-"# This routine reutnrs the k'th most likely next token.\n",
+"# This routine returns the k'th most likely next token.\n",
 "# If k =0 then it returns the most likely token, if k=1 it returns the next most likely and so on\n",
 "# We will need this for beam search\n",
 "def get_kth_most_likely_token(input_tokens, model, tokenizer, k):\n",

View File

@@ -83,7 +83,7 @@
"source": [ "source": [
"# Plot the 1D linear function\n", "# Plot the 1D linear function\n",
"\n", "\n",
"# Define an array of x values from 0 to 10 with increments of 0.1\n", "# Define an array of x values from 0 to 10 with increments of 0.01\n",
"# https://numpy.org/doc/stable/reference/generated/numpy.arange.html\n", "# https://numpy.org/doc/stable/reference/generated/numpy.arange.html\n",
"x = np.arange(0.0,10.0, 0.01)\n", "x = np.arange(0.0,10.0, 0.01)\n",
"# Compute y using the function you filled in above\n", "# Compute y using the function you filled in above\n",
@@ -171,7 +171,7 @@
"# Color represents y value (brighter = higher value)\n", "# Color represents y value (brighter = higher value)\n",
"# Black = -10 or less, White = +10 or more\n", "# Black = -10 or less, White = +10 or more\n",
"# 0 = mid orange\n", "# 0 = mid orange\n",
"# Lines are conoturs where value is equal\n", "# Lines are contours where value is equal\n",
"draw_2D_function(x1,x2,y)\n", "draw_2D_function(x1,x2,y)\n",
"\n", "\n",
"# TODO\n", "# TODO\n",
@@ -308,7 +308,7 @@
"source": [ "source": [
"# Draw the exponential function\n", "# Draw the exponential function\n",
"\n", "\n",
"# Define an array of x values from -5 to 5 with increments of 0.1\n", "# Define an array of x values from -5 to 5 with increments of 0.01\n",
"x = np.arange(-5.0,5.0, 0.01)\n", "x = np.arange(-5.0,5.0, 0.01)\n",
"y = np.exp(x) ;\n", "y = np.exp(x) ;\n",
"\n", "\n",
@@ -354,7 +354,7 @@
"source": [ "source": [
"# Draw the logarithm function\n", "# Draw the logarithm function\n",
"\n", "\n",
"# Define an array of x values from -5 to 5 with increments of 0.1\n", "# Define an array of x values from -5 to 5 with increments of 0.01\n",
"x = np.arange(0.01,5.0, 0.01)\n", "x = np.arange(0.01,5.0, 0.01)\n",
"y = np.log(x) ;\n", "y = np.log(x) ;\n",
"\n", "\n",

View File

@@ -182,7 +182,7 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"Now we'll extend this model to have two outputs $y_1$ and $y_2$, each of which can be visualized with a separate heatmap. You will now have sets of parameters $\\phi_{10}, \\phi_{11},\\phi_{12}$ and $\\phi_{20}, \\phi_{21},\\phi_{22}$ that correspond to each of these outputs." "Now we'll extend this model to have two outputs $y_1$ and $y_2$, each of which can be visualized with a separate heatmap. You will now have sets of parameters $\\phi_{10}, \\phi_{11}, \\phi_{12}, \\phi_{13}$ and $\\phi_{20}, \\phi_{21}, \\phi_{22}, \\phi_{23}$ that correspond to each of these outputs."
], ],
"metadata": { "metadata": {
"id": "Xl6LcrUyM7Lh" "id": "Xl6LcrUyM7Lh"

View File

@@ -79,7 +79,7 @@
"source": [ "source": [
"def number_regions(Di, D):\n", "def number_regions(Di, D):\n",
" # TODO -- implement Zaslavsky's formula\n", " # TODO -- implement Zaslavsky's formula\n",
" # You can use math.com() https://www.w3schools.com/python/ref_math_comb.asp\n", " # You can use math.comb() https://www.w3schools.com/python/ref_math_comb.asp\n",
" # Replace this code\n", " # Replace this code\n",
" N = 1;\n", " N = 1;\n",
"\n", "\n",

View File

@@ -4,7 +4,7 @@
"metadata": { "metadata": {
"colab": { "colab": {
"provenance": [], "provenance": [],
"authorship_tag": "ABX9TyPTidpnPhn4O5QF011gt0cz", "authorship_tag": "ABX9TyML7rfAGE4gvmNUEiK5x3PS",
"include_colab_link": true "include_colab_link": true
}, },
"kernelspec": { "kernelspec": {
@@ -41,6 +41,17 @@
"id": "el8l05WQEO46" "id": "el8l05WQEO46"
} }
}, },
{
"cell_type": "markdown",
"source": [
"NOTE!!\n",
"\n",
"If you have the first edition of the printed book, it mistakenly refers to a convolutional filter with no spaces between the elements (i.e. a normal filter without dilation) as having dilation zero. Actually, the convention is (weirdly) that this has dilation one. And when there is one space between the elements, this is dilation two. This notebook reflects the correct convention and so will be out of sync with the printed book. If this is confusing, check the [errata](https://github.com/udlbook/udlbook/blob/main/UDL_Errata.pdf) document."
],
"metadata": {
"id": "ggQrHkFZcUiV"
}
},
{ {
"cell_type": "code", "cell_type": "code",
"source": [ "source": [
@@ -50,7 +61,7 @@
"metadata": { "metadata": {
"id": "nw7k5yCtOzoK" "id": "nw7k5yCtOzoK"
}, },
"execution_count": null, "execution_count": 1,
"outputs": [] "outputs": []
}, },
{ {

View File

@@ -4,7 +4,7 @@
"metadata": { "metadata": {
"colab": { "colab": {
"provenance": [], "provenance": [],
"authorship_tag": "ABX9TyN1v/yg9PtdSVOWlYJ7bgkz", "authorship_tag": "ABX9TyN1qtywBuyezaVMnc9MI7x2",
"include_colab_link": true "include_colab_link": true
}, },
"kernelspec": { "kernelspec": {
@@ -141,6 +141,9 @@
"# https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html\n", "# https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html\n",
"# https://pytorch.org/docs/1.13/generated/torch.nn.Linear.html?highlight=linear#torch.nn.Linear\n", "# https://pytorch.org/docs/1.13/generated/torch.nn.Linear.html?highlight=linear#torch.nn.Linear\n",
"\n", "\n",
"# NOTE THAT THE CONVOLUTIONAL LAYERS NEED TO TAKE THE NUMBER OF INPUT CHANNELS AS A PARAMETER\n",
"# AND NOT THE INPUT SIZE.\n",
"\n",
"# Replace the following function:\n", "# Replace the following function:\n",
"model = nn.Sequential(\n", "model = nn.Sequential(\n",
"nn.Flatten(),\n", "nn.Flatten(),\n",

View File

@@ -4,7 +4,7 @@
"metadata": { "metadata": {
"colab": { "colab": {
"provenance": [], "provenance": [],
"authorship_tag": "ABX9TyMmbD0cKYvIHXbKX4AupA1x", "authorship_tag": "ABX9TyNDaU2KKZDyY9Ea7vm/fNxo",
"include_colab_link": true "include_colab_link": true
}, },
"kernelspec": { "kernelspec": {
@@ -114,6 +114,11 @@
" # Create output\n", " # Create output\n",
" out = np.zeros((batchSize, channelsOut, imageHeightOut, imageWidthOut), dtype=np.float32)\n", " out = np.zeros((batchSize, channelsOut, imageHeightOut, imageWidthOut), dtype=np.float32)\n",
"\n", "\n",
" # !!!!!! NOTE THERE IS A SUBTLETY HERE !!!!!!!!\n",
" # I have padded the image with zeros above, so it is surrouned by a \"ring\" of zeros\n",
" # That means that the image indexes are all off by one\n",
" # This actually makes your code simpler\n",
"\n",
" for c_y in range(imageHeightOut):\n", " for c_y in range(imageHeightOut):\n",
" for c_x in range(imageWidthOut):\n", " for c_x in range(imageWidthOut):\n",
" for c_kernel_y in range(kernelHeight):\n", " for c_kernel_y in range(kernelHeight):\n",

View File

@@ -31,7 +31,7 @@
"source": [ "source": [
"# **Notebook 12.1: Self Attention**\n", "# **Notebook 12.1: Self Attention**\n",
"\n", "\n",
"This notebook builds a self-attnetion mechanism from scratch, as discussed in section 12.2 of the book.\n", "This notebook builds a self-attention mechanism from scratch, as discussed in section 12.2 of the book.\n",
"\n", "\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n", "Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n", "\n",

View File

@@ -407,7 +407,7 @@
" # 1. For each x (value in x_plot_vals):\n", " # 1. For each x (value in x_plot_vals):\n",
" # 2. Compute the mean and variance of the diffusion kernel at time t\n", " # 2. Compute the mean and variance of the diffusion kernel at time t\n",
" # 3. Compute pdf of this Gaussian at every x_plot_val\n", " # 3. Compute pdf of this Gaussian at every x_plot_val\n",
" # 4. Weight Gaussian by probability at position x and by 0.01 to compensate for bin size\n", " # 4. Weight Gaussian by probability at position x and by 0.01 to componensate for bin size\n",
" # 5. Accumulate weighted Gaussian in marginal at time t.\n", " # 5. Accumulate weighted Gaussian in marginal at time t.\n",
" # 6. Multiply result by 0.01 to compensate for bin size\n", " # 6. Multiply result by 0.01 to compensate for bin size\n",
" # Replace this line:\n", " # Replace this line:\n",

View File

@@ -31,7 +31,7 @@
"source": [ "source": [
"# **Notebook 21.1: Bias mitigation**\n", "# **Notebook 21.1: Bias mitigation**\n",
"\n", "\n",
"This notebook investigates a post-processing method for bias mitigation (see figure 21.2 in the book). It based on this [blog](https://www.borealisai.com/research-blogs/tutorial1-bias-and-fairness-ai/) that I wrote for Borealis AI in 2019, which itself was derirved from [this blog](https://research.google.com/bigpicture/attacking-discrimination-in-ml/) by Wattenberg, Viégas, and Hardt.\n", "This notebook investigates a post-processing method for bias mitigation (see figure 21.2 in the book). It based on this [blog](https://www.borealisai.com/research-blogs/tutorial1-bias-and-fairness-ai/) that I wrote for Borealis AI in 2019, which itself was derived from [this blog](https://research.google.com/bigpicture/attacking-discrimination-in-ml/) by Wattenberg, Viégas, and Hardt.\n",
"\n", "\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n", "Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n", "\n",
@@ -172,7 +172,7 @@
"source": [ "source": [
"# Blindness to protected attribute\n", "# Blindness to protected attribute\n",
"\n", "\n",
"We'll first do the simplest possible thing. We'll choose the same threshold for both blue and yellow populations so that $\\tau_0$ = $\\tau_1$. Basically, we'll ingore what we know about the group membership. Let's see what the ramifications of that." "We'll first do the simplest possible thing. We'll choose the same threshold for both blue and yellow populations so that $\\tau_0$ = $\\tau_1$. Basically, we'll ignore what we know about the group membership. Let's see what the ramifications of that."
], ],
"metadata": { "metadata": {
"id": "bE7yPyuWoSUy" "id": "bE7yPyuWoSUy"
@@ -195,7 +195,7 @@
"source": [ "source": [
"def compute_probability_get_loan(credit_scores, frequencies, threshold):\n", "def compute_probability_get_loan(credit_scores, frequencies, threshold):\n",
" # TODO - Write this function\n", " # TODO - Write this function\n",
" # Return the probability that somemone from this group loan based on the frequencies of each\n", " # Return the probability that someone from this group loan based on the frequencies of each\n",
" # credit score for this group\n", " # credit score for this group\n",
" # Replace this line:\n", " # Replace this line:\n",
" prob = 0.5\n", " prob = 0.5\n",
@@ -297,7 +297,7 @@
"\n", "\n",
"This criterion is clearly not great. The blue and yellow groups get given loans at different rates overall, and (for this threshold), the false alarms and true positives are also different, so it's not even fair when we consider whether the loans really were paid back. \n", "This criterion is clearly not great. The blue and yellow groups get given loans at different rates overall, and (for this threshold), the false alarms and true positives are also different, so it's not even fair when we consider whether the loans really were paid back. \n",
"\n", "\n",
"TODO -- investigate setting a different threshols $\\tau_{0}=\\tau_{1}$. Is it possible to make the overall rates that loans are given the same? Is it possible to make the false alarm rates the same? Is it possible to make the true positive rates the same?" "TODO -- investigate setting a different threshold $\\tau_{0}=\\tau_{1}$. Is it possible to make the overall rates that loans are given the same? Is it possible to make the false alarm rates the same? Is it possible to make the true positive rates the same?"
], ],
"metadata": { "metadata": {
"id": "UCObTsa57uuC" "id": "UCObTsa57uuC"

View File

@@ -400,7 +400,7 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"This model is easilly intepretable. The k'th coeffeicient tells us the how much (and in which direction) changing the value of the k'th input will change the output. This is only valid in the vicinity of the input $x$.\n", "This model is easily interpretable. The k'th coefficient tells us the how much (and in which direction) changing the value of the k'th input will change the output. This is only valid in the vicinity of the input $x$.\n",
"\n", "\n",
"Note that a more sophisticated version of LIME would weight the training points according to how close they are to the original data point of interest." "Note that a more sophisticated version of LIME would weight the training points according to how close they are to the original data point of interest."
], ],

BIN
UDL_Errata.pdf Normal file

Binary file not shown.

View File

@@ -11,7 +11,7 @@
<div> <div>
<h1 style="margin: 0; font-size: 36px">Understanding Deep Learning</h1> <h1 style="margin: 0; font-size: 36px">Understanding Deep Learning</h1>
by Simon J.D. Prince by Simon J.D. Prince
<br>To be published by MIT Press Dec 5th 2023.<br> <br>Published by MIT Press Dec 5th 2023.<br>
<ul> <ul>
<li> <li>
<p style="font-size: larger; margin-bottom: 0">Download draft PDF Chapters 1-21 <a <p style="font-size: larger; margin-bottom: 0">Download draft PDF Chapters 1-21 <a
@@ -19,7 +19,10 @@
</p>2023-11-24. CC-BY-NC-ND license<br> </p>2023-11-24. CC-BY-NC-ND license<br>
<img src="https://img.shields.io/github/downloads/udlbook/udlbook/total" alt="download stats shield"> <img src="https://img.shields.io/github/downloads/udlbook/udlbook/total" alt="download stats shield">
</li> </li>
<li> Report errata via <a href="https://github.com/udlbook/udlbook/issues">github</a> <li> Order your copy from <a href="https://mitpress.mit.edu/9780262048644/understanding-deep-learning/">here </a></li>
<li> Known errata can be found here: <a
href="https://github.com/udlbook/udlbook/raw/main/UDL_Errata.pdf">PDF</a></li>
<li> Report new errata via <a href="https://github.com/udlbook/udlbook/issues">github</a>
or contact me directly at udlbookmail@gmail.com or contact me directly at udlbookmail@gmail.com
<li> Follow me on <a href="https://twitter.com/SimonPrinceAI">Twitter</a> or <a <li> Follow me on <a href="https://twitter.com/SimonPrinceAI">Twitter</a> or <a
href="https://www.linkedin.com/in/simon-prince-615bb9165/">LinkedIn</a> for updates. href="https://www.linkedin.com/in/simon-prince-615bb9165/">LinkedIn</a> for updates.