Compare commits

..

46 Commits

Author   SHA1        Message                                                    Date
udlbook  50d1c5e255  Created using Colaboratory                                 2023-10-12 18:42:59 +01:00
udlbook  9a3517629a  Created using Colaboratory                                 2023-10-12 18:37:05 +01:00
udlbook  af77c76435  Created using Colaboratory                                 2023-10-12 18:07:26 +01:00
udlbook  0515940ace  Created using Colaboratory                                 2023-10-12 17:39:48 +01:00
udlbook  5d578df07b  Created using Colaboratory                                 2023-10-12 17:25:40 +01:00
udlbook  d174c9f34c  Created using Colaboratory                                 2023-10-12 15:15:13 +01:00
udlbook  dbb3d4b666  Created using Colaboratory                                 2023-10-11 18:22:52 +01:00
udlbook  39ad6413ce  Created using Colaboratory                                 2023-10-11 18:21:04 +01:00
udlbook  3e51d89714  Created using Colaboratory                                 2023-10-10 12:01:50 +01:00
udlbook  5680e5d7f7  Created using Colaboratory                                 2023-10-10 11:52:48 +01:00
udlbook  0a5a97f55d  Created using Colaboratory                                 2023-10-10 11:51:31 +01:00
udlbook  0dc94ead03  Created using Colaboratory                                 2023-10-09 16:55:44 +01:00
udlbook  c7e7e731b3  Created using Colaboratory                                 2023-10-06 16:57:48 +01:00
udlbook  06f197c787  Created using Colaboratory                                 2023-10-06 15:31:42 +01:00
udlbook  ff29fc34e8  Created using Colaboratory                                 2023-10-05 09:19:41 +01:00
udlbook  2653116c47  Created using Colaboratory                                 2023-10-04 18:57:52 +01:00
udlbook  f50de74496  Created using Colaboratory                                 2023-10-04 17:03:39 +01:00
udlbook  aa5d89adf3  Created using Colaboratory                                 2023-10-04 12:46:08 +01:00
udlbook  18e827842c  Created using Colaboratory                                 2023-10-04 08:32:27 +01:00
udlbook  22b6b18660  Created using Colaboratory                                 2023-10-03 18:52:37 +01:00
udlbook  ed060b6b08  Created using Colaboratory                                 2023-10-03 17:22:46 +01:00
udlbook  df0132505b  Created using Colaboratory                                 2023-10-03 08:58:32 +01:00
udlbook  67fb0f5990  Update index.html                                          2023-10-01 18:22:32 +01:00
udlbook  ecd01f2992  Delete Notesbooks/Chap11 directory                         2023-10-01 18:19:38 +01:00
udlbook  70878c94be  Created using Colaboratory                                 2023-10-01 16:35:03 +01:00
udlbook  832a3c9104  Created using Colaboratory                                 2023-10-01 16:19:13 +01:00
udlbook  68cd38bf2f  Created using Colaboratory                                 2023-10-01 14:19:25 +01:00
udlbook  964d26c684  Delete Notebooks/Chap13/13_4_Graph_Representation.ipynb    2023-10-01 14:19:01 +01:00
udlbook  c1d56880a6  Created using Colaboratory                                 2023-10-01 14:18:25 +01:00
udlbook  0cd321fb96  Created using Colaboratory                                 2023-10-01 10:01:31 +01:00
udlbook  043969fb79  Created using Colaboratory                                 2023-09-30 12:43:59 +01:00
udlbook  4052d29b0a  Created using Colaboratory                                 2023-09-30 11:53:00 +01:00
udlbook  19ee3afd90  Created using Colaboratory                                 2023-09-29 20:21:45 +01:00
udlbook  91f99c7398  Created using Colaboratory                                 2023-09-29 19:12:35 +01:00
udlbook  96fe95cea1  Created using Colaboratory                                 2023-09-29 19:11:28 +01:00
udlbook  fc35dc4168  Created using Colaboratory                                 2023-09-29 18:40:22 +01:00
udlbook  80497e298d  Created using Colaboratory                                 2023-09-29 12:51:12 +01:00
udlbook  8c658ac321  Created using Colaboratory                                 2023-09-29 12:50:35 +01:00
udlbook  4a08fa44bf  Delete Notebooks/Chap_11/11_1_Shattered_Gradients.ipynb    2023-09-29 12:45:08 +01:00
udlbook  c5755efe68  Delete Notebooks/Chap_07/7_1_Backpropagation_in_Toy_Model.ipynb  2023-09-29 12:44:51 +01:00
udlbook  55b425b41b  Delete Notebooks/Chap_01/1_1_BackgroundMathematics.ipynb   2023-09-29 12:44:27 +01:00
udlbook  e000397470  Created using Colaboratory                                 2023-09-29 12:42:13 +01:00
udlbook  b4d0b49776  Created using Colaboratory                                 2023-09-26 05:13:40 -05:00
udlbook  f5c1b2af2e  Created using Colaboratory                                 2023-09-26 04:34:17 -05:00
udlbook  1cf990bfe7  Update index.html                                          2023-08-07 16:15:45 -05:00
udlbook  ae967f8e7e  Update index.html                                          2023-08-06 17:51:08 -05:00
22 changed files with 4400 additions and 54 deletions

View File

@@ -67,7 +67,7 @@
"source": [
"# Define a linear function with just one input, x\n",
"def linear_function_1D(x,beta,omega):\n",
-" # TODO -- replace the code lin below with formula for 1D linear equation\n",
+" # TODO -- replace the code line below with formula for 1D linear equation\n",
" y = x\n",
"\n",
" return y"
@@ -332,9 +332,7 @@
"2. What is $\\mbox{exp}[1]$?\n",
"3. What is $\\mbox{exp}[-\\infty]$?\n",
"4. What is $\\mbox{exp}[+\\infty]$?\n",
-"5. A function is convex if we can draw a straight line between any two points on the\n",
-"function, and this line always lies above the function. Similarly, a function is concave\n",
-"if a straight line between any two points always lies below the function. Is the exponential function convex or concave or neither?\n"
+"5. A function is convex if we can draw a straight line between any two points on the function, and this line always lies above the function. Similarly, a function is concave if a straight line between any two points always lies below the function. Is the exponential function convex or concave or neither?\n"
]
},
{
@@ -343,7 +341,7 @@
"id": "R6A4e5IxIWCu"
},
"source": [
-"Now let's consider the logarithm function $y=\log[x]$. Throughout the book we always use natural (base $e$) logarithms. The log funcction maps non-negative numbers $[0,\infty]$ to real numbers $[-\infty,\infty]$. It is the inverse of the exponential function. So when we compute $\log[x]$ we are really asking \"What is the number $y$ so that $e^y=x$?\""
+"Now let's consider the logarithm function $y=\log[x]$. Throughout the book we always use natural (base $e$) logarithms. The log function maps non-negative numbers $[0,\infty]$ to real numbers $[-\infty,\infty]$. It is the inverse of the exponential function. So when we compute $\log[x]$ we are really asking \"What is the number $y$ so that $e^y=x$?\""
]
},
{
@@ -384,15 +382,6 @@
"6. What is $\\mbox{log}[-1]$?\n",
"7. Is the logarithm function concave or convex?\n"
]
-},
-{
-"cell_type": "code",
-"source": [],
-"metadata": {
-"id": "XG0CKLiPJI7I"
-},
-"execution_count": null,
-"outputs": []
}
],
"metadata": {
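As a quick aside (my own check, not part of the notebook or this diff), the exercise questions above about the limiting values of exp and log, and the convexity of exp, can be confirmed numerically:

```python
import numpy as np

# exp maps [-inf, inf] to [0, inf]
print(np.exp(0.0))       # 1.0
print(np.exp(-np.inf))   # 0.0
print(np.exp(np.inf))    # inf

# log (natural base) is the inverse of exp, mapping [0, inf] to [-inf, inf]
with np.errstate(divide='ignore'):
    print(np.log(0.0))   # -inf
print(np.log(np.exp(3.7)))  # recovers 3.7, since log inverts exp

# Midpoint test at one pair of points: the chord lies above exp, so exp is convex
a, b = -1.0, 2.0
print(np.exp((a + b) / 2) <= (np.exp(a) + np.exp(b)) / 2)  # True
```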

File diff suppressed because one or more lines are too long

View File

@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
-"authorship_tag": "ABX9TyPFqKOqd6BjlymOawCRkmfn",
+"authorship_tag": "ABX9TyPD+qTkgmZCe+VessXM/kIU",
"include_colab_link": true
},
"kernelspec": {
@@ -238,7 +238,7 @@
"def shallow_2_2_3(x1,x2, activation_fn, phi_10,phi_11,phi_12,phi_13, phi_20,phi_21,phi_22,phi_23, theta_10, theta_11,\\\n",
" theta_12, theta_20, theta_21, theta_22, theta_30, theta_31, theta_32):\n",
"\n",
-" # TODO -- write this function -- replace the dummy code blow\n",
+" # TODO -- write this function -- replace the dummy code below\n",
" pre_1 = np.zeros_like(x1)\n",
" pre_2 = np.zeros_like(x1)\n",
" pre_3 = np.zeros_like(x1)\n",

View File

@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
-"authorship_tag": "ABX9TyMhLSGU8+odPS/CoW5PwKna",
+"authorship_tag": "ABX9TyMdflMfWi9hu9ZEg/80HCd8",
"include_colab_link": true
},
"kernelspec": {
@@ -62,7 +62,7 @@
"source": [
"The number of regions $N$ created by a shallow neural network with $D_i$ inputs and $D$ hidden units is given by Zaslavsky's formula:\n",
"\n",
"\\begin{equation}N = \\sum_{j=1}^{D_{i}}\\binom{D}{j}=\\sum_{j=1}^{D_{i}} \\frac{D!}{(D-j)!j!} \\end{equation} <br>\n",
"\\begin{equation}N = \\sum_{j=0}^{D_{i}}\\binom{D}{j}=\\sum_{j=0}^{D_{i}} \\frac{D!}{(D-j)!j!} \\end{equation} <br>\n",
"\n"
],
"metadata": {
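The corrected formula above (with the sum starting at $j=0$) is easy to check as a standalone Python helper; `number_regions` below is my own illustrative reimplementation, following the notebook's argument order, not the notebook's actual code:

```python
from math import comb

def number_regions(D_i, D):
    # Zaslavsky's formula: N = sum_{j=0}^{D_i} C(D, j)
    # (comb(D, j) is zero for j > D, so when D <= D_i the sum is 2^D)
    return sum(comb(D, j) for j in range(D_i + 1))

print(number_regions(2, 3))   # 1 + 3 + 3 = 7 regions for 2 inputs, 3 hidden units
print(number_regions(10, 8))  # all C(8, j) terms sum to 2^8 = 256 when D < D_i
```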
@@ -115,7 +115,7 @@
{
"cell_type": "markdown",
"source": [
-"This works but there is a complication. If the number of hidden units $D$ is fewer than the number of hidden dimensions $D_i$ , the formula will fail. When this is the case, there are just $2^D$ regions (see figure 3.10 to understand why).\n",
+"This works but there is a complication. If the number of hidden units $D$ is fewer than the number of input dimensions $D_i$ , the formula will fail. When this is the case, there are just $2^D$ regions (see figure 3.10 to understand why).\n",
"\n",
"Let's demonstrate this:"
],
@@ -142,7 +142,7 @@
{
"cell_type": "code",
"source": [
-"# Let's do the calculation properly when D<Di\n",
+"# Let's do the calculation properly when D<Di (see figure 3.10 from the book)\n",
"D = 8; Di = 10\n",
"N = np.power(2,D)\n",
"# We can equivalently do this by calling number_regions with the D twice\n",
@@ -210,7 +210,7 @@
"source": [
"# Now let's test the code\n",
"N = number_parameters(10, 8)\n",
-"print(f\"Di=10, D=8, Number of parameters = {int(N)}, True value = 90\")"
+"print(f\"Di=10, D=8, Number of parameters = {int(N)}, True value = 97\")"
],
"metadata": {
"id": "VbhDmZ1gwkQj"
@@ -233,7 +233,7 @@
" for c_hidden in range(1, 200):\n",
" # Iterate over different numbers of hidden variables for different input sizes\n",
" D = int(c_hidden * 500 / D_i)\n",
-" params[c_dim, c_hidden] = D_i * D +1 + D +1\n",
+" params[c_dim, c_hidden] = D_i * D +D + D +1\n",
" regions[c_dim, c_hidden] = number_regions(np.min([D_i,D]), D)\n",
"\n",
"fig, ax = plt.subplots()\n",
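The corrected count $D_i D + D + D + 1$ (input-to-hidden weights, hidden biases, hidden-to-output weights, output bias) matches the "True value = 97" fixed above. A hypothetical re-derivation, not the notebook's own code:

```python
# Parameter count for a shallow network with D_i inputs, D hidden units, D_o outputs
def number_parameters(D_i, D, D_o=1):
    # input-to-hidden weights + hidden biases + hidden-to-output weights + output biases
    return D_i * D + D + D_o * D + D_o

print(number_parameters(10, 8))  # 80 + 8 + 8 + 1 = 97
```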

View File

@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
-"authorship_tag": "ABX9TyOu5BvK3aFb7ZEQKG5vfOZ1",
+"authorship_tag": "ABX9TyPmra+JD+dm2M3gCqx3bMak",
"include_colab_link": true
},
"kernelspec": {
@@ -185,7 +185,7 @@
"The ReLU isn't the only kind of activation function. For a long time, people used sigmoid functions. A logistic sigmoid function is defined by the equation\n",
"\n",
"\\begin{equation}\n",
"f[h] = \\frac{1}{1+\\exp{[-10 z ]}}\n",
"f[z] = \\frac{1}{1+\\exp{[-10 z ]}}\n",
"\\end{equation}\n",
"\n",
"(Note that the factor of 10 is not standard -- but it allows us to plot on the same axes as the ReLU examples)"
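A minimal numerical sketch of this scaled logistic sigmoid (the factor of 10 being the notebook's plotting convenience, as noted above); this helper is my own illustration:

```python
import numpy as np

def scaled_sigmoid(z):
    # f[z] = 1 / (1 + exp(-10 z)) -- steepened so it plots on ReLU-scale axes
    return 1.0 / (1.0 + np.exp(-10.0 * z))

print(scaled_sigmoid(0.0))   # 0.5 at the origin
print(scaled_sigmoid(1.0))   # saturates near 1 for positive inputs
print(scaled_sigmoid(-1.0))  # saturates near 0 for negative inputs
```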

View File

@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
-"authorship_tag": "ABX9TyOwOpROPBel8eYGzp5DGRkt",
+"authorship_tag": "ABX9TyP5wHK5E7/el+vxU947K3q8",
"include_colab_link": true
},
"kernelspec": {
@@ -55,7 +55,7 @@
"This is a composition of the functions $\\cos[\\bullet],\\exp[\\bullet],\\sin[\\bullet]$. I chose these just because you probably already know the derivatives of these functions:\n",
"\n",
"\\begin{eqnarray*}\n",
" \\frac{\\partial \\cos[z]}{\\partial z} = -\\sin[z] \\quad\\quad \\frac{\\partial \\exp[z]}{\\partial z} = \\exp[z] \\quad\\quad \\frac{\\partial \\sin[z]}{\\partial z} = -\\cos[z].\n",
" \\frac{\\partial \\cos[z]}{\\partial z} = -\\sin[z] \\quad\\quad \\frac{\\partial \\exp[z]}{\\partial z} = \\exp[z] \\quad\\quad \\frac{\\partial \\sin[z]}{\\partial z} = \\cos[z].\n",
"\\end{eqnarray*}\n",
"\n",
"Suppose that we have a least squares loss function:\n",
@@ -107,8 +107,8 @@
" return beta3+omega3 * np.cos(beta2 + omega2 * np.exp(beta1 + omega1 * np.sin(beta0 + omega0 * x)))\n",
"\n",
"def likelihood(x, y, beta0, beta1, beta2, beta3, omega0, omega1, omega2, omega3):\n",
-" diff = fn(x, beta0, beta1, beta2, beta3, omega0, omega1, omega2, omega3) - y ;\n",
-" return diff * diff ;"
+" diff = fn(x, beta0, beta1, beta2, beta3, omega0, omega1, omega2, omega3) - y\n",
+" return diff * diff"
]
},
{
@@ -123,8 +123,8 @@
{
"cell_type": "code",
"source": [
-"beta0 = 1.0; beta1 = 2.0; beta2 = -3.0; beta3 = 0.4;\n",
-"omega0 = 0.1; omega1 = -0.4; omega2 = 2.0; omega3 = 3.0;\n",
+"beta0 = 1.0; beta1 = 2.0; beta2 = -3.0; beta3 = 0.4\n",
+"omega0 = 0.1; omega1 = -0.4; omega2 = 2.0; omega3 = 3.0\n",
"x = 2.3; y =2.0\n",
"l_i_func = likelihood(x,y,beta0,beta1,beta2,beta3,omega0,omega1,omega2,omega3)\n",
"print('l_i=%3.3f'%l_i_func)"
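The corrected sign of the sine derivative in the diff above (it is $+\cos[z]$, not $-\cos[z]$) can be verified by finite differences; this check is my own aside, not part of the notebook:

```python
import numpy as np

z = 0.7
delta = 1e-8  # small offset for finite differences

# Finite-difference approximations to the three derivatives listed above
d_cos = (np.cos(z + delta) - np.cos(z)) / delta   # should match -sin(z)
d_exp = (np.exp(z + delta) - np.exp(z)) / delta   # should match exp(z)
d_sin = (np.sin(z + delta) - np.sin(z)) / delta   # should match +cos(z)

print(d_cos, -np.sin(z))
print(d_exp, np.exp(z))
print(d_sin, np.cos(z))
```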

View File

@@ -0,0 +1,392 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMrF4rB2hTKq7XzLuYsURdL",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap11/11_1_Shattered_Gradients.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 11.1: Shattered gradients**\n",
"\n",
"This notebook investigates the phenomenon of shattered gradients as discussed in section 11.1.1. It replicates some of the experiments in [Balduzzi et al. (2017)](https://arxiv.org/abs/1702.08591).\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "pOZ6Djz0dhoy"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "iaFyNGhU21VJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"First let's define a neural network. We'll initialize both the weights and biases randomly with Glorot initialization (He initialization without the factor of two)"
],
"metadata": {
"id": "YcNlAxnE3XXn"
}
},
{
"cell_type": "code",
"source": [
"# K is width, D is number of hidden units in each layer\n",
"def init_params(K, D):\n",
" # Set seed so we always get the same random numbers\n",
" np.random.seed(1)\n",
"\n",
" # Input layer\n",
" D_i = 1\n",
" # Output layer\n",
" D_o = 1\n",
"\n",
" # Glorot initialization\n",
" sigma_sq_omega = 1.0/D\n",
"\n",
" # Make empty lists\n",
" all_weights = [None] * (K+1)\n",
" all_biases = [None] * (K+1)\n",
"\n",
" # Create parameters for input and output layers\n",
" all_weights[0] = np.random.normal(size=(D, D_i))*np.sqrt(sigma_sq_omega)\n",
" all_weights[-1] = np.random.normal(size=(D_o, D)) * np.sqrt(sigma_sq_omega)\n",
" all_biases[0] = np.random.normal(size=(D,1))* np.sqrt(sigma_sq_omega)\n",
" all_biases[-1]= np.random.normal(size=(D_o,1))* np.sqrt(sigma_sq_omega)\n",
"\n",
" # Create intermediate layers\n",
" for layer in range(1,K):\n",
" all_weights[layer] = np.random.normal(size=(D,D))*np.sqrt(sigma_sq_omega)\n",
" all_biases[layer] = np.random.normal(size=(D,1))* np.sqrt(sigma_sq_omega)\n",
"\n",
" return all_weights, all_biases"
],
"metadata": {
"id": "kr-q7hc23Bn9"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The next two functions define the forward pass of the algorithm"
],
"metadata": {
"id": "kwcn5z7-dq_1"
}
},
{
"cell_type": "code",
"source": [
"# Define the Rectified Linear Unit (ReLU) function\n",
"def ReLU(preactivation):\n",
" activation = preactivation.clip(0.0)\n",
" return activation\n",
"\n",
"def forward_pass(net_input, all_weights, all_biases):\n",
"\n",
" # Retrieve number of layers\n",
" K = len(all_weights) -1\n",
"\n",
" # We'll store the pre-activations at each layer in a list \"all_f\"\n",
" # and the activations in a second list \"all_h\".\n",
" all_f = [None] * (K+1)\n",
" all_h = [None] * (K+1)\n",
"\n",
"# For convenience, we'll set\n",
" # all_h[0] to be the input, and all_f[K] will be the output\n",
" all_h[0] = net_input\n",
"\n",
" # Run through the layers, calculating all_f[0...K-1] and all_h[1...K]\n",
" for layer in range(K):\n",
" # Update preactivations and activations at this layer according to eqn 7.5\n",
" all_f[layer] = all_biases[layer] + np.matmul(all_weights[layer], all_h[layer])\n",
" all_h[layer+1] = ReLU(all_f[layer])\n",
"\n",
" # Compute the output from the last hidden layer\n",
" all_f[K] = all_biases[K] + np.matmul(all_weights[K], all_h[K])\n",
"\n",
" # Retrieve the output\n",
" net_output = all_f[K]\n",
"\n",
" return net_output, all_f, all_h"
],
"metadata": {
"id": "_2w-Tr7G3sYq"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The next two functions compute the gradient of the output with respect to the input using the back propagation algorithm."
],
"metadata": {
"id": "aM2l7QafeC8T"
}
},
{
"cell_type": "code",
"source": [
"# We'll need the indicator function\n",
"def indicator_function(x):\n",
" x_in = np.array(x)\n",
" x_in[x_in>=0] = 1\n",
" x_in[x_in<0] = 0\n",
" return x_in\n",
"\n",
"# Main backward pass routine\n",
"def calc_input_output_gradient(x_in, all_weights, all_biases):\n",
"\n",
" # Run the forward pass\n",
" y, all_f, all_h = forward_pass(x_in, all_weights, all_biases)\n",
"\n",
" # We'll store the derivatives dl_dweights and dl_dbiases in lists as well\n",
" all_dl_dweights = [None] * (K+1)\n",
" all_dl_dbiases = [None] * (K+1)\n",
" # And we'll store the derivatives of the loss with respect to the activation and preactivations in lists\n",
" all_dl_df = [None] * (K+1)\n",
" all_dl_dh = [None] * (K+1)\n",
" # Again for convenience we'll stick with the convention that all_h[0] is the net input and all_f[K] is the net output\n",
"\n",
" # The derivative of the net output with respect to itself (all_f[K]) is just one\n",
" all_dl_df[K] = np.ones_like(all_f[K])\n",
"\n",
" # Now work backwards through the network\n",
" for layer in range(K,-1,-1):\n",
" all_dl_dbiases[layer] = np.array(all_dl_df[layer])\n",
" all_dl_dweights[layer] = np.matmul(all_dl_df[layer], all_h[layer].transpose())\n",
"\n",
" all_dl_dh[layer] = np.matmul(all_weights[layer].transpose(), all_dl_df[layer])\n",
"\n",
" if layer > 0:\n",
" all_dl_df[layer-1] = indicator_function(all_f[layer-1]) * all_dl_dh[layer]\n",
"\n",
"\n",
" return all_dl_dh[0],y"
],
"metadata": {
"id": "DwR3eGMgV8bl"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Double check we have the gradient correct using finite differences"
],
"metadata": {
"id": "Ar_VmraReSWe"
}
},
{
"cell_type": "code",
"source": [
"D = 200; K = 3\n",
"# Initialize parameters\n",
"all_weights, all_biases = init_params(K,D)\n",
"\n",
"x = np.ones((1,1))\n",
"dydx,y = calc_input_output_gradient(x, all_weights, all_biases)\n",
"\n",
"# Offset for finite gradients\n",
"delta = 0.00000001\n",
"x1 = x\n",
"y1,*_ = forward_pass(x1, all_weights, all_biases)\n",
"x2 = x+delta\n",
"y2,*_ = forward_pass(x2, all_weights, all_biases)\n",
"# Finite difference calculation\n",
"dydx_fd = (y2-y1)/delta\n",
"\n",
"print(\"Gradient calculation=%f, Finite difference gradient=%f\"%(dydx,dydx_fd))\n"
],
"metadata": {
"id": "KJpQPVd36Haq"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Helper function that computes the derivatives for a 1D array of input values and plots them."
],
"metadata": {
"id": "YC-LAYRKtbxp"
}
},
{
"cell_type": "code",
"source": [
"def plot_derivatives(K, D):\n",
"\n",
" # Initialize parameters\n",
" all_weights, all_biases = init_params(K,D)\n",
"\n",
" x_in = np.arange(-2,2, 4.0/256.0)\n",
" x_in = np.resize(x_in, (1,len(x_in)))\n",
" dydx,y = calc_input_output_gradient(x_in, all_weights, all_biases)\n",
"\n",
" fig,ax = plt.subplots()\n",
" ax.plot(np.squeeze(x_in), np.squeeze(dydx), 'b-')\n",
" ax.set_xlim(-2,2)\n",
" ax.set_xlabel('Input, $x$')\n",
" ax.set_ylabel('Gradient, $dy/dx$')\n",
" ax.set_title('No layers = %d'%(K))\n",
" plt.show()"
],
"metadata": {
"id": "uJr5eDe648jF"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Build a model with one hidden layer and 200 neurons and plot derivatives\n",
"D = 200; K = 1\n",
"plot_derivatives(K,D)\n",
"\n",
"# TODO -- Interpret this result\n",
"# Why does the plot have some flat regions?\n",
"\n",
"# TODO -- Add code to plot the derivatives for models with 24 and 50 hidden layers\n",
"# with 200 neurons per layer\n",
"\n",
"# TODO -- Why does this graph not have visible flat regions?\n",
"\n",
"# TODO -- Why does the magnitude of the gradients decrease as we increase the number\n",
"# of hidden layers?\n",
"\n",
"# TODO -- Do you find this a convincing replication of the experiment in the original paper? (I don't)\n",
"# Can you help me find why I have failed to replicate this result? udlbookmail@gmail.com"
],
"metadata": {
"id": "56gTMTCb49KO"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's look at the autocorrelation function now"
],
"metadata": {
"id": "f_0zjQbxuROQ"
}
},
{
"cell_type": "code",
"source": [
"def autocorr(dydx):\n",
" # TODO -- compute the autocorrelation function\n",
" # Use the numpy function \"correlate\" with the mode set to \"same\"\n",
" # Replace this line:\n",
" ac = np.ones((256,1))\n",
"\n",
" return ac"
],
"metadata": {
"id": "ggnO8hfoRN1e"
},
"execution_count": null,
"outputs": []
},
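One plausible completion of the `autocorr` TODO above, using `np.correlate` with `mode='same'` as the comment suggests. This is my own sketch (including the choice to mean-centre the signal first), not the author's solution:

```python
import numpy as np

def autocorr(dydx):
    # Autocorrelation of the gradient signal: correlate the (mean-centred)
    # signal with itself; mode='same' keeps the output length equal to the input
    d = dydx - np.mean(dydx)
    ac = np.correlate(d, d, mode='same')
    return ac

# For a length-256 input, the zero-offset (peak) lands at index 128,
# which is why the notebook later normalizes by ac[128]
signal = np.sin(np.arange(256) * 0.1)
ac = autocorr(signal)
print(ac.shape)       # (256,)
print(np.argmax(ac))  # 128 -- the central, zero-offset position
```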
{
"cell_type": "markdown",
"source": [
"Helper function to plot the autocorrelation function and normalize so correlation is one with offset of zero"
],
"metadata": {
"id": "EctWSV1RuddK"
}
},
{
"cell_type": "code",
"source": [
"def plot_autocorr(K, D):\n",
"\n",
" # Initialize parameters\n",
" all_weights, all_biases = init_params(K,D)\n",
"\n",
" x_in = np.arange(-2.0,2.0, 4.0/256)\n",
" x_in = np.resize(x_in, (1,len(x_in)))\n",
" dydx,y = calc_input_output_gradient(x_in, all_weights, all_biases)\n",
" ac = autocorr(np.squeeze(dydx))\n",
" ac = ac / ac[128]\n",
"\n",
" y = ac[128:]\n",
" x = np.squeeze(x_in)[128:]\n",
" fig,ax = plt.subplots()\n",
" ax.plot(x,y, 'b-')\n",
" ax.set_xlim([0,2])\n",
" ax.set_xlabel('Distance')\n",
" ax.set_ylabel('Autocorrelation')\n",
" ax.set_title('No layers = %d'%(K))\n",
" plt.show()\n"
],
"metadata": {
"id": "2LKlZ9u_WQXN"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Plot the autocorrelation functions\n",
"D = 200; K =1\n",
"plot_autocorr(K,D)\n",
"D = 200; K =50\n",
"plot_autocorr(K,D)\n",
"\n",
"# TODO -- Do you find this a convincing replication of the experiment in the original paper? (I don't)\n",
"# Can you help me find why I have failed to replicate this result?"
],
"metadata": {
"id": "RD9JTdjNWw6p"
},
"execution_count": null,
"outputs": []
}
]
}

View File

@@ -0,0 +1,277 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyObut1y9atNUuowPT6dMY+I",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap11/11_2_Residual_Networks.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 11.2: Residual Networks**\n",
"\n",
"This notebook adapts the networks for MNIST1D to use residual connections.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "D5yLObtZCi9J"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import os\n",
"import torch, torch.nn as nn\n",
"from torch.utils.data import TensorDataset, DataLoader\n",
"from torch.optim.lr_scheduler import StepLR\n",
"import matplotlib.pyplot as plt\n",
"import mnist1d\n",
"import random"
],
"metadata": {
"id": "YrXWAH7sUWvU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"args = mnist1d.data.get_dataset_args()\n",
"data = mnist1d.data.get_dataset(args, path='./mnist1d_data.pkl', download=False, regenerate=False)\n",
"\n",
"# The training and test input and outputs are in\n",
"# data['x'], data['y'], data['x_test'], and data['y_test']\n",
"print(\"Examples in training set: {}\".format(len(data['y'])))\n",
"print(\"Examples in test set: {}\".format(len(data['y_test'])))\n",
"print(\"Length of each example: {}\".format(data['x'].shape[-1]))"
],
"metadata": {
"id": "twI72ZCrCt5z"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Load in the data\n",
"train_data_x = data['x'].transpose()\n",
"train_data_y = data['y']\n",
"val_data_x = data['x_test'].transpose()\n",
"val_data_y = data['y_test']\n",
"# Print out sizes\n",
"print(\"Train data: %d examples (columns), each of which has %d dimensions (rows)\"%((train_data_x.shape[1],train_data_x.shape[0])))\n",
"print(\"Validation data: %d examples (columns), each of which has %d dimensions (rows)\"%((val_data_x.shape[1],val_data_x.shape[0])))"
],
"metadata": {
"id": "8bKADvLHbiV5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Define the network"
],
"metadata": {
"id": "_sFvRDGrl4qe"
}
},
{
"cell_type": "code",
"source": [
"# There are 40 input dimensions and 10 output dimensions for this data\n",
"# The inputs correspond to the 40 offsets in the MNIST1D template.\n",
"D_i = 40\n",
"# The outputs correspond to the 10 digits\n",
"D_o = 10\n",
"\n",
"\n",
"# We will adapt this model to have residual connections around the linear layers\n",
"# This is the same model we used in practical 8.1, but we can't use the sequential\n",
"# class for residual networks (which aren't strictly sequential). Hence, I've rewritten\n",
"# it as a model that inherits from a base class\n",
"\n",
"class ResidualNetwork(torch.nn.Module):\n",
" def __init__(self, input_size, output_size, hidden_size=100):\n",
" super(ResidualNetwork, self).__init__()\n",
" self.linear1 = nn.Linear(input_size, hidden_size)\n",
" self.linear2 = nn.Linear(hidden_size, hidden_size)\n",
" self.linear3 = nn.Linear(hidden_size, hidden_size)\n",
" self.linear4 = nn.Linear(hidden_size, output_size)\n",
" print(\"Initialized MLPBase model with {} parameters\".format(self.count_params()))\n",
"\n",
" def count_params(self):\n",
" return sum([p.view(-1).shape[0] for p in self.parameters()])\n",
"\n",
"# # TODO -- Add residual connections to this model\n",
"# # The order of operations should be similar to figure 11.5b\n",
"# # linear1 first, ReLU+linear2 in first residual block, ReLU+linear3 in second residual block), linear4 at end\n",
"# # Replace this function\n",
" def forward(self, x):\n",
" h1 = self.linear1(x).relu()\n",
" h2 = self.linear2(h1).relu()\n",
" h3 = self.linear3(h2).relu()\n",
" return self.linear4(h3)\n"
],
"metadata": {
"id": "FslroPJJffrh"
},
"execution_count": null,
"outputs": []
},
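One way to fill in the forward-pass TODO above, sketched with NumPy stand-ins so the residual pattern of figure 11.5b is explicit. The weights `W1..W4` and biases here are hypothetical random matrices standing in for the PyTorch linear layers, not the notebook's model:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_forward(x, W1, b1, W2, b2, W3, b3, W4, b4):
    # linear1 first, then two residual blocks (ReLU then linear, added back
    # onto the running representation), then linear4 at the end
    h = W1 @ x + b1
    h = h + (W2 @ relu(h) + b2)   # first residual block: ReLU + linear2
    h = h + (W3 @ relu(h) + b3)   # second residual block: ReLU + linear3
    return W4 @ h + b4

rng = np.random.default_rng(0)
D_i, D_h, D_o = 40, 100, 10
x = rng.standard_normal((D_i, 1))
W1, b1 = rng.standard_normal((D_h, D_i)), rng.standard_normal((D_h, 1))
W2, b2 = rng.standard_normal((D_h, D_h)), rng.standard_normal((D_h, 1))
W3, b3 = rng.standard_normal((D_h, D_h)), rng.standard_normal((D_h, 1))
W4, b4 = rng.standard_normal((D_o, D_h)), rng.standard_normal((D_o, 1))
print(residual_forward(x, W1, b1, W2, b2, W3, b3, W4, b4).shape)  # (10, 1)
```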
{
"cell_type": "code",
"source": [
"# He initialization of weights\n",
"def weights_init(layer_in):\n",
" if isinstance(layer_in, nn.Linear):\n",
" nn.init.kaiming_uniform_(layer_in.weight)\n",
" layer_in.bias.data.fill_(0.0)"
],
"metadata": {
"id": "YgLaex1pfhqz"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#Define the model\n",
"model = ResidualNetwork(40, 10)\n",
"\n",
"# choose cross entropy loss function (equation 5.24 in the loss notes)\n",
"loss_function = nn.CrossEntropyLoss()\n",
"# construct SGD optimizer and initialize learning rate and momentum\n",
"optimizer = torch.optim.SGD(model.parameters(), lr = 0.05, momentum=0.9)\n",
"# object that decreases learning rate by half every 20 epochs\n",
"scheduler = StepLR(optimizer, step_size=20, gamma=0.5)\n",
"# convert data to torch tensors\n",
"x_train = torch.tensor(train_data_x.transpose().astype('float32'))\n",
"y_train = torch.tensor(train_data_y.astype('long'))\n",
"x_val= torch.tensor(val_data_x.transpose().astype('float32'))\n",
"y_val = torch.tensor(val_data_y.astype('long'))\n",
"\n",
"# load the data into a class that creates the batches\n",
"data_loader = DataLoader(TensorDataset(x_train,y_train), batch_size=100, shuffle=True, worker_init_fn=np.random.seed(1))\n",
"\n",
"# Initialize model weights\n",
"model.apply(weights_init)\n",
"\n",
"# loop over the dataset n_epoch times\n",
"n_epoch = 100\n",
"# store the loss and the % correct at each epoch\n",
"losses_train = np.zeros((n_epoch))\n",
"errors_train = np.zeros((n_epoch))\n",
"losses_val = np.zeros((n_epoch))\n",
"errors_val = np.zeros((n_epoch))\n",
"\n",
"for epoch in range(n_epoch):\n",
" # loop over batches\n",
" for i, data in enumerate(data_loader):\n",
" # retrieve inputs and labels for this batch\n",
" x_batch, y_batch = data\n",
" # zero the parameter gradients\n",
" optimizer.zero_grad()\n",
" # forward pass -- calculate model output\n",
" pred = model(x_batch)\n",
" # compute the loss\n",
" loss = loss_function(pred, y_batch)\n",
" # backward pass\n",
" loss.backward()\n",
" # SGD update\n",
" optimizer.step()\n",
"\n",
" # Run whole dataset to get statistics -- normally wouldn't do this\n",
" pred_train = model(x_train)\n",
" pred_val = model(x_val)\n",
" _, predicted_train_class = torch.max(pred_train.data, 1)\n",
" _, predicted_val_class = torch.max(pred_val.data, 1)\n",
" errors_train[epoch] = 100 - 100 * (predicted_train_class == y_train).float().sum() / len(y_train)\n",
" errors_val[epoch]= 100 - 100 * (predicted_val_class == y_val).float().sum() / len(y_val)\n",
" losses_train[epoch] = loss_function(pred_train, y_train).item()\n",
" losses_val[epoch]= loss_function(pred_val, y_val).item()\n",
" print(f'Epoch {epoch:5d}, train loss {losses_train[epoch]:.6f}, train error {errors_train[epoch]:3.2f}, val loss {losses_val[epoch]:.6f}, percent error {errors_val[epoch]:3.2f}')\n",
"\n",
" # tell scheduler to consider updating learning rate\n",
" scheduler.step()"
],
"metadata": {
"id": "NYw8I_3mmX5c"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Plot the results\n",
"fig, ax = plt.subplots()\n",
"ax.plot(errors_train,'r-',label='train')\n",
"ax.plot(errors_val,'b-',label='test')\n",
"ax.set_ylim(0,100); ax.set_xlim(0,n_epoch)\n",
"ax.set_xlabel('Epoch'); ax.set_ylabel('Error')\n",
"ax.set_title('Train Error %3.2f, Val Error %3.2f'%(errors_train[-1],errors_val[-1]))\n",
"ax.legend()\n",
"plt.show()"
],
"metadata": {
"id": "CcP_VyEmE2sv"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The primary motivation of residual networks is to allow training of much deeper networks. \n",
"\n",
"TODO: Try running this network with and without the residual connections. Does adding the residual connections change the performance?"
],
"metadata": {
"id": "wMmqhmxuAx0M"
}
}
]
}

View File

@@ -0,0 +1,328 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOoGS+lY+EhGthebSO4smpj",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap11/11_3_Batch_Normalization.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 11.3: Batch normalization**\n",
"\n",
"This notebook investigates the use of batch normalization in residual networks.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "D5yLObtZCi9J"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import os\n",
"import torch, torch.nn as nn\n",
"from torch.utils.data import TensorDataset, DataLoader\n",
"from torch.optim.lr_scheduler import StepLR\n",
"import matplotlib.pyplot as plt\n",
"import mnist1d\n",
"import random"
],
"metadata": {
"id": "YrXWAH7sUWvU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"args = mnist1d.data.get_dataset_args()\n",
"data = mnist1d.data.get_dataset(args, path='./mnist1d_data.pkl', download=False, regenerate=False)\n",
"\n",
"# The training and test input and outputs are in\n",
"# data['x'], data['y'], data['x_test'], and data['y_test']\n",
"print(\"Examples in training set: {}\".format(len(data['y'])))\n",
"print(\"Examples in test set: {}\".format(len(data['y_test'])))\n",
"print(\"Length of each example: {}\".format(data['x'].shape[-1]))"
],
"metadata": {
"id": "twI72ZCrCt5z"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Load in the data\n",
"train_data_x = data['x'].transpose()\n",
"train_data_y = data['y']\n",
"val_data_x = data['x_test'].transpose()\n",
"val_data_y = data['y_test']\n",
"# Print out sizes\n",
"print(\"Train data: %d examples (columns), each of which has %d dimensions (rows)\"%((train_data_x.shape[1],train_data_x.shape[0])))\n",
"print(\"Validation data: %d examples (columns), each of which has %d dimensions (rows)\"%((val_data_x.shape[1],val_data_x.shape[0])))"
],
"metadata": {
"id": "8bKADvLHbiV5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def print_variance(name, data):\n",
" # First dimension(rows) is batch elements\n",
" # Second dimension(columns) is neurons.\n",
" np_data = data.detach().numpy()\n",
" # Compute variance across neurons and average these variances over members of the batch\n",
" neuron_variance = np.mean(np.var(np_data, axis=0))\n",
" # Print out the name and the variance\n",
" print(\"%s variance=%f\"%(name,neuron_variance))"
],
"metadata": {
"id": "3bBpJIV-N-lt"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# He initialization of weights\n",
"def weights_init(layer_in):\n",
" if isinstance(layer_in, nn.Linear):\n",
" nn.init.kaiming_uniform_(layer_in.weight)\n",
" layer_in.bias.data.fill_(0.0)"
],
"metadata": {
"id": "YgLaex1pfhqz"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def run_one_step_of_model(model, x_train, y_train):\n",
" # choose cross entropy loss function (equation 5.24 in the loss notes)\n",
" loss_function = nn.CrossEntropyLoss()\n",
" # construct SGD optimizer and initialize learning rate and momentum\n",
" optimizer = torch.optim.SGD(model.parameters(), lr = 0.05, momentum=0.9)\n",
"\n",
" # load the data into a class that creates the batches\n",
" data_loader = DataLoader(TensorDataset(x_train,y_train), batch_size=200, shuffle=True, worker_init_fn=np.random.seed(1))\n",
"\n",
" # Initialize model weights\n",
" model.apply(weights_init)\n",
"\n",
" # Get a batch\n",
" for i, data in enumerate(data_loader):\n",
" # retrieve inputs and labels for this batch\n",
" x_batch, y_batch = data\n",
" # zero the parameter gradients\n",
" optimizer.zero_grad()\n",
" # forward pass -- calculate model output\n",
" pred = model(x_batch)\n",
" # compute the loss\n",
" loss = loss_function(pred, y_batch)\n",
" # backward pass\n",
" loss.backward()\n",
" # SGD update\n",
" optimizer.step()\n",
" # Break out of this loop -- we just want to see the first\n",
" # iteration, but usually we would continue\n",
" break"
],
"metadata": {
"id": "DFlu45pORQEz"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# convert training data to torch tensors\n",
"x_train = torch.tensor(train_data_x.transpose().astype('float32'))\n",
"y_train = torch.tensor(train_data_y.astype('int64'))"
],
"metadata": {
"id": "i7Q0ScWgRe4G"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# This is a simple residual model with 5 residual branches in a row\n",
"class ResidualNetwork(torch.nn.Module):\n",
" def __init__(self, input_size, output_size, hidden_size=100):\n",
" super(ResidualNetwork, self).__init__()\n",
" self.linear1 = nn.Linear(input_size, hidden_size)\n",
" self.linear2 = nn.Linear(hidden_size, hidden_size)\n",
" self.linear3 = nn.Linear(hidden_size, hidden_size)\n",
" self.linear4 = nn.Linear(hidden_size, hidden_size)\n",
" self.linear5 = nn.Linear(hidden_size, hidden_size)\n",
" self.linear6 = nn.Linear(hidden_size, hidden_size)\n",
" # Output layer (one linear layer per residual branch, plus this one)\n",
" self.linear7 = nn.Linear(hidden_size, output_size)\n",
"\n",
" def count_params(self):\n",
" return sum([p.view(-1).shape[0] for p in self.parameters()])\n",
"\n",
" def forward(self, x):\n",
" print_variance(\"Input\",x)\n",
" f = self.linear1(x)\n",
" print_variance(\"First preactivation\",f)\n",
" res1 = f + self.linear2(f.relu())\n",
" print_variance(\"After first residual connection\",res1)\n",
" res2 = res1 + self.linear3(res1.relu())\n",
" print_variance(\"After second residual connection\",res2)\n",
" res3 = res2 + self.linear4(res2.relu())\n",
" print_variance(\"After third residual connection\",res3)\n",
" res4 = res3 + self.linear5(res3.relu())\n",
" print_variance(\"After fourth residual connection\",res4)\n",
" res5 = res4 + self.linear6(res4.relu())\n",
" print_variance(\"After fifth residual connection\",res5)\n",
" return self.linear7(res5)"
],
"metadata": {
"id": "FslroPJJffrh"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define the model and run for one step\n",
"# Monitoring the variance at each point in the network\n",
"n_hidden = 100\n",
"n_input = 40\n",
"n_output = 10\n",
"model = ResidualNetwork(n_input, n_output, n_hidden)\n",
"run_one_step_of_model(model, x_train, y_train)"
],
"metadata": {
"id": "NYw8I_3mmX5c"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Notice that the variance roughly doubles at each step, so it increases exponentially, as in figure 11.6b in the book."
],
"metadata": {
"id": "0kZUlWkkW8jE"
}
},
{
"cell_type": "code",
"source": [
"# TODO Adapt the residual network below to add a batch norm operation\n",
"# before the contents of each residual link as in figure 11.6c in the book\n",
"# Use the torch function nn.BatchNorm1d\n",
"class ResidualNetworkWithBatchNorm(torch.nn.Module):\n",
" def __init__(self, input_size, output_size, hidden_size=100):\n",
" super(ResidualNetworkWithBatchNorm, self).__init__()\n",
" self.linear1 = nn.Linear(input_size, hidden_size)\n",
" self.linear2 = nn.Linear(hidden_size, hidden_size)\n",
" self.linear3 = nn.Linear(hidden_size, hidden_size)\n",
" self.linear4 = nn.Linear(hidden_size, hidden_size)\n",
" self.linear5 = nn.Linear(hidden_size, hidden_size)\n",
" self.linear6 = nn.Linear(hidden_size, hidden_size)\n",
" # Output layer (one linear layer per residual branch, plus this one)\n",
" self.linear7 = nn.Linear(hidden_size, output_size)\n",
"\n",
" def count_params(self):\n",
" return sum([p.view(-1).shape[0] for p in self.parameters()])\n",
"\n",
" def forward(self, x):\n",
" print_variance(\"Input\",x)\n",
" f = self.linear1(x)\n",
" print_variance(\"First preactivation\",f)\n",
" res1 = f + self.linear2(f.relu())\n",
" print_variance(\"After first residual connection\",res1)\n",
" res2 = res1 + self.linear3(res1.relu())\n",
" print_variance(\"After second residual connection\",res2)\n",
" res3 = res2 + self.linear4(res2.relu())\n",
" print_variance(\"After third residual connection\",res3)\n",
" res4 = res3 + self.linear5(res3.relu())\n",
" print_variance(\"After fourth residual connection\",res4)\n",
" res5 = res4 + self.linear6(res4.relu())\n",
" print_variance(\"After fifth residual connection\",res5)\n",
" return self.linear7(res5)"
],
"metadata": {
"id": "5JvMmaRITKGd"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define the model\n",
"n_hidden = 100\n",
"n_input = 40\n",
"n_output = 10\n",
"model = ResidualNetworkWithBatchNorm(n_input, n_output, n_hidden)\n",
"run_one_step_of_model(model, x_train, y_train)"
],
"metadata": {
"id": "2U3DnlH9Uw6c"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Note that the variance now increases linearly as in figure 11.6c."
],
"metadata": {
"id": "R_ucFq9CXq8D"
}
}
]
}
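For reference, this is roughly what `nn.BatchNorm1d` does at training time, which is why placing it before each residual branch stops the variance from compounding: each neuron (column) is standardized using the statistics of the current batch. A minimal standalone sketch with illustrative numbers, not taken from the notebook:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(100)               # one mean/variance pair per neuron
h = 5.0 + 3.0 * torch.randn(200, 100)  # batch of 200, far from standardized
out = bn(h)                            # training mode: uses batch statistics
print(out.mean().item(), out.var().item())  # close to 0 and 1
```

At test time the layer instead uses running averages of the training statistics, so single examples can still be processed.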


@@ -0,0 +1,375 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOKrX9gmuhl9+KwscpZKr3u",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap12/12_1_Self_Attention.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 12.1: Self Attention**\n",
"\n",
"This notebook builds a self-attention mechanism from scratch, as discussed in section 12.2 of the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The self-attention mechanism maps $N$ inputs $\\mathbf{x}_{n}\\in\\mathbb{R}^{D}$ to $N$ outputs $\\mathbf{x}'_{n}\\in \\mathbb{R}^{D}$. \n",
"\n"
],
"metadata": {
"id": "9OJkkoNqCVK2"
}
},
{
"cell_type": "code",
"source": [
"# Set seed so we get the same random numbers\n",
"np.random.seed(3)\n",
"# Number of inputs\n",
"N = 3\n",
"# Number of dimensions of each input\n",
"D = 4\n",
"# Create an empty list\n",
"all_x = []\n",
"# Create elements x_n and append to list\n",
"for n in range(N):\n",
" all_x.append(np.random.normal(size=(D,1)))\n",
"# Print out the list\n",
"print(all_x)\n"
],
"metadata": {
"id": "oAygJwLiCSri"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We'll also need the weights and biases for the keys, queries, and values (equations 12.2 and 12.4)"
],
"metadata": {
"id": "W2iHFbtKMaDp"
}
},
{
"cell_type": "code",
"source": [
"# Set seed so we get the same random numbers\n",
"np.random.seed(0)\n",
"\n",
"# Choose random values for the parameters\n",
"omega_q = np.random.normal(size=(D,D))\n",
"omega_k = np.random.normal(size=(D,D))\n",
"omega_v = np.random.normal(size=(D,D))\n",
"beta_q = np.random.normal(size=(D,1))\n",
"beta_k = np.random.normal(size=(D,1))\n",
"beta_v = np.random.normal(size=(D,1))"
],
"metadata": {
"id": "79TSK7oLMobe"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's compute the queries, keys, and values for each input"
],
"metadata": {
"id": "VxaKQtP3Ng6R"
}
},
{
"cell_type": "code",
"source": [
"# Make three lists to store queries, keys, and values\n",
"all_queries = []\n",
"all_keys = []\n",
"all_values = []\n",
"# For every input\n",
"for x in all_x:\n",
" # TODO -- compute the keys, queries and values.\n",
" # Replace these three lines\n",
" query = np.ones_like(x)\n",
" key = np.ones_like(x)\n",
" value = np.ones_like(x)\n",
"\n",
" all_queries.append(query)\n",
" all_keys.append(key)\n",
" all_values.append(value)"
],
"metadata": {
"id": "TwDK2tfdNmw9"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We'll need a softmax function (equation 12.5) -- here, it will take a list of arbitrary numbers and return a list where the elements are non-negative and sum to one\n"
],
"metadata": {
"id": "Se7DK6PGPSUk"
}
},
{
"cell_type": "code",
"source": [
"def softmax(items_in):\n",
"\n",
" # TODO Compute the elements of items_out\n",
" # Replace this line\n",
" items_out = items_in.copy()\n",
"\n",
" return items_out ;"
],
"metadata": {
"id": "u93LIcE5PoiM"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now compute the self attention values:"
],
"metadata": {
"id": "8aJVhbKDW7lm"
}
},
{
"cell_type": "code",
"source": [
"# Create empty list for output\n",
"all_x_prime = []\n",
"\n",
"# For each output\n",
"for n in range(N):\n",
" # Create list for dot products of query n with all keys\n",
" all_km_qn = []\n",
" # Compute the dot products\n",
" for key in all_keys:\n",
" # TODO -- compute the appropriate dot product\n",
" # Replace this line\n",
" dot_product = 1\n",
"\n",
" # Store dot product\n",
" all_km_qn.append(dot_product)\n",
"\n",
" # Apply softmax to the dot products to compute the attention weights\n",
" attention = softmax(all_km_qn)\n",
" # Print result (should be non-negative and sum to one)\n",
" print(\"Attentions for output \", n)\n",
" print(attention)\n",
"\n",
" # TODO: Compute a weighted sum of all of the values according to the attention\n",
" # (equation 12.3)\n",
" # Replace this line\n",
" x_prime = np.zeros((D,1))\n",
"\n",
" all_x_prime.append(x_prime)\n",
"\n",
"\n",
"# Print out true values to check you have it correct\n",
"print(\"x_prime_0_calculated:\", all_x_prime[0].transpose())\n",
"print(\"x_prime_0_true: [[ 0.94744244 -0.24348429 -0.91310441 -0.44522983]]\")\n",
"print(\"x_prime_1_calculated:\", all_x_prime[1].transpose())\n",
"print(\"x_prime_1_true: [[ 1.64201168 -0.08470004 4.02764044 2.18690791]]\")\n",
"print(\"x_prime_2_calculated:\", all_x_prime[2].transpose())\n",
"print(\"x_prime_2_true: [[ 1.61949281 -0.06641533 3.96863308 2.15858316]]\")\n"
],
"metadata": {
"id": "yimz-5nCW6vQ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's compute the same thing, but using matrix calculations. We'll store the $N$ inputs $\\mathbf{x}_{n}\\in\\mathbb{R}^{D}$ in the columns of a $D\\times N$ matrix, using equations 12.6 and 12.7/8.\n",
"\n",
"Note: The book uses column vectors (for compatibility with the rest of the text), but in the wider literature it is more common to store the inputs in the rows of a matrix; in this case, the computation is the same, but all the matrices are transposed and the operations proceed in the reverse order."
],
"metadata": {
"id": "PJ2vCQ_7C38K"
}
},
{
"cell_type": "code",
"source": [
"# Define softmax operation that works independently on each column\n",
"def softmax_cols(data_in):\n",
" # Exponentiate all of the values\n",
" exp_values = np.exp(data_in) ;\n",
" # Sum over columns\n",
" denom = np.sum(exp_values, axis = 0);\n",
" # Replicate denominator to N rows\n",
" denom = np.matmul(np.ones((data_in.shape[0],1)), denom[np.newaxis,:])\n",
" # Compute softmax\n",
" softmax = exp_values / denom\n",
" # return the answer\n",
" return softmax"
],
"metadata": {
"id": "obaQBdUAMXXv"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
" # Now let's compute self attention in matrix form\n",
"def self_attention(X,omega_v, omega_q, omega_k, beta_v, beta_q, beta_k):\n",
"\n",
" # TODO -- Write this function\n",
" # 1. Compute queries, keys, and values\n",
" # 2. Compute dot products\n",
" # 3. Apply softmax to calculate attentions\n",
" # 4. Weight values by attentions\n",
" # Replace this line\n",
" X_prime = np.zeros_like(X);\n",
"\n",
"\n",
" return X_prime"
],
"metadata": {
"id": "gb2WvQ3SiH8r"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Copy data into matrix\n",
"X = np.zeros((D, N))\n",
"X[:,0] = np.squeeze(all_x[0])\n",
"X[:,1] = np.squeeze(all_x[1])\n",
"X[:,2] = np.squeeze(all_x[2])\n",
"\n",
"# Run the self attention mechanism\n",
"X_prime = self_attention(X,omega_v, omega_q, omega_k, beta_v, beta_q, beta_k)\n",
"\n",
"# Print out the results\n",
"print(X_prime)"
],
"metadata": {
"id": "MUOJbgJskUpl"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"If you did this correctly, the values should be the same as above.\n",
"\n",
"TODO:\n",
"\n",
"Print out the attention matrix.\n",
"You will see that the values are quite extreme (one is very close to one and the others are very close to zero). Now we'll fix this problem by using scaled dot-product attention."
],
"metadata": {
"id": "as_lRKQFpvz0"
}
},
{
"cell_type": "code",
"source": [
"# Now let's compute self attention in matrix form\n",
"def scaled_dot_product_self_attention(X,omega_v, omega_q, omega_k, beta_v, beta_q, beta_k):\n",
"\n",
" # TODO -- Write this function\n",
" # 1. Compute queries, keys, and values\n",
" # 2. Compute dot products\n",
" # 3. Scale the dot products as in equation 12.9\n",
" # 4. Apply softmax to calculate attentions\n",
" # 5. Weight values by attentions\n",
" # Replace this line\n",
" X_prime = np.zeros_like(X);\n",
"\n",
" return X_prime"
],
"metadata": {
"id": "kLU7PUnnqvIh"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Run the self attention mechanism\n",
"X_prime = scaled_dot_product_self_attention(X,omega_v, omega_q, omega_k, beta_v, beta_q, beta_k)\n",
"\n",
"# Print out the results\n",
"print(X_prime)"
],
"metadata": {
"id": "n18e3XNzmVgL"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO -- Investigate whether the self-attention mechanism is covariant with respect to permutation.\n",
"If it is, when we permute the columns of the input matrix $\\mathbf{X}$, the columns of the output matrix $\\mathbf{X}'$ will also be permuted.\n"
],
"metadata": {
"id": "QDEkIrcgrql-"
}
}
]
}
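One way the matrix-form computation can be written out -- a hedged sketch under the book's column-vector convention, with illustrative function and variable names rather than the notebook's intended solution -- together with a check of the permutation property from the final TODO:

```python
import numpy as np

def softmax_cols(a):
    # Column-wise softmax, stabilized by subtracting each column's max
    e = np.exp(a - a.max(axis=0))
    return e / e.sum(axis=0)

def self_attention_sketch(X, omega_v, omega_q, omega_k, beta_v, beta_q, beta_k):
    # Queries, keys, and values, one per column of X (equations 12.2/12.4)
    Q = omega_q @ X + beta_q
    K = omega_k @ X + beta_k
    V = omega_v @ X + beta_v
    A = softmax_cols(K.T @ Q)   # entry (m, n) weights value m for output n
    return V @ A

np.random.seed(0)
D, N = 4, 3
X = np.random.normal(size=(D, N))
omegas = [np.random.normal(size=(D, D)) for _ in range(3)]
betas = [np.random.normal(size=(D, 1)) for _ in range(3)]
Xp = self_attention_sketch(X, *omegas, *betas)

# Permuting the input columns permutes the output columns the same way
perm = [2, 0, 1]
Xp_perm = self_attention_sketch(X[:, perm], *omegas, *betas)
print(np.allclose(Xp[:, perm], Xp_perm))  # True
```

The permutation check works because the biases broadcast identically to every column, so reordering the columns of $\mathbf{X}$ just reorders the columns of the queries, keys, values, and output.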


@@ -0,0 +1,212 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMSk8qTqDYqFnRJVZKlsue0",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap12/12_2_Multihead_Self_Attention.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 12.2: Multihead Self-Attention**\n",
"\n",
"This notebook builds a multihead self-attention mechanism as in figure 12.6.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The multihead self-attention mechanism maps $N$ inputs $\\mathbf{x}_{n}\\in\\mathbb{R}^{D}$ to $N$ outputs $\\mathbf{x}'_{n}\\in \\mathbb{R}^{D}$. \n",
"\n"
],
"metadata": {
"id": "9OJkkoNqCVK2"
}
},
{
"cell_type": "code",
"source": [
"# Set seed so we get the same random numbers\n",
"np.random.seed(3)\n",
"# Number of inputs\n",
"N = 6\n",
"# Number of dimensions of each input\n",
"D = 8\n",
"# Create a D x N matrix with one input per column\n",
"X = np.random.normal(size=(D,N))\n",
"# Print X\n",
"print(X)"
],
"metadata": {
"id": "oAygJwLiCSri"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We'll use $H=2$ heads. We'll need the weights and biases for the keys, queries, and values for each head (equations 12.2 and 12.4), and (as in the figure) we'll make the queries, keys, and values of size $D/H$."
],
"metadata": {
"id": "W2iHFbtKMaDp"
}
},
{
"cell_type": "code",
"source": [
"# Number of heads\n",
"H = 2\n",
"# Dimension of the queries, keys, and values for each head\n",
"H_D = int(D/H)\n",
"\n",
"# Set seed so we get the same random numbers\n",
"np.random.seed(0)\n",
"\n",
"# Choose random values for the parameters for the first head\n",
"omega_q1 = np.random.normal(size=(H_D,D))\n",
"omega_k1 = np.random.normal(size=(H_D,D))\n",
"omega_v1 = np.random.normal(size=(H_D,D))\n",
"beta_q1 = np.random.normal(size=(H_D,1))\n",
"beta_k1 = np.random.normal(size=(H_D,1))\n",
"beta_v1 = np.random.normal(size=(H_D,1))\n",
"\n",
"# Choose random values for the parameters for the second head\n",
"omega_q2 = np.random.normal(size=(H_D,D))\n",
"omega_k2 = np.random.normal(size=(H_D,D))\n",
"omega_v2 = np.random.normal(size=(H_D,D))\n",
"beta_q2 = np.random.normal(size=(H_D,1))\n",
"beta_k2 = np.random.normal(size=(H_D,1))\n",
"beta_v2 = np.random.normal(size=(H_D,1))\n",
"\n",
"# Choose random values for the parameters\n",
"omega_c = np.random.normal(size=(D,D))"
],
"metadata": {
"id": "79TSK7oLMobe"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's compute the multihead self-attention"
],
"metadata": {
"id": "VxaKQtP3Ng6R"
}
},
{
"cell_type": "code",
"source": [
"# Define softmax operation that works independently on each column\n",
"def softmax_cols(data_in):\n",
" # Exponentiate all of the values\n",
" exp_values = np.exp(data_in) ;\n",
" # Sum over columns\n",
" denom = np.sum(exp_values, axis = 0);\n",
" # Replicate denominator to N rows\n",
" denom = np.matmul(np.ones((data_in.shape[0],1)), denom[np.newaxis,:])\n",
" # Compute softmax\n",
" softmax = exp_values / denom\n",
" # return the answer\n",
" return softmax"
],
"metadata": {
"id": "obaQBdUAMXXv"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
" # Now let's compute self attention in matrix form\n",
"def multihead_scaled_self_attention(X,omega_v1, omega_q1, omega_k1, beta_v1, beta_q1, beta_k1, omega_v2, omega_q2, omega_k2, beta_v2, beta_q2, beta_k2, omega_c):\n",
"\n",
" # TODO Write the multihead scaled self-attention mechanism.\n",
" # Replace this line\n",
" X_prime = np.zeros_like(X) ;\n",
"\n",
"\n",
" return X_prime"
],
"metadata": {
"id": "gb2WvQ3SiH8r"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Run the self attention mechanism\n",
"X_prime = multihead_scaled_self_attention(X,omega_v1, omega_q1, omega_k1, beta_v1, beta_q1, beta_k1, omega_v2, omega_q2, omega_k2, beta_v2, beta_q2, beta_k2, omega_c)\n",
"\n",
"# Print out the results\n",
"np.set_printoptions(precision=3)\n",
"print(\"Your answer:\")\n",
"print(X_prime)\n",
"\n",
"print(\"True values:\")\n",
"print(\"[[-21.207 -5.373 -20.933 -9.179 -11.319 -17.812]\")\n",
"print(\" [ -1.995 7.906 -10.516 3.452 9.863 -7.24 ]\")\n",
"print(\" [ 5.479 1.115 9.244 0.453 5.656 7.089]\")\n",
"print(\" [ -7.413 -7.416 0.363 -5.573 -6.736 -0.848]\")\n",
"print(\" [-11.261 -9.937 -4.848 -8.915 -13.378 -5.761]\")\n",
"print(\" [ 3.548 10.036 -2.244 1.604 12.113 -2.557]\")\n",
"print(\" [ 4.888 -5.814 2.407 3.228 -4.232 3.71 ]\")\n",
"print(\" [ 1.248 18.894 -6.409 3.224 19.717 -5.629]]\")\n",
"\n",
"# If your answers don't match, then make sure that you are doing the scaling, and make sure the scaling value is correct"
],
"metadata": {
"id": "MUOJbgJskUpl"
},
"execution_count": null,
"outputs": []
}
]
}
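The head-splitting pattern the TODO asks for can be sketched as follows -- a standalone illustration, not the notebook's solution, with hypothetical helper names; each head runs scaled dot-product attention on its own $D/H$-dimensional queries, keys, and values, and the $H$ outputs are stacked and recombined with $\boldsymbol\Omega_{c}$:

```python
import numpy as np

def softmax_cols(a):
    # Column-wise softmax, stabilized by subtracting each column's max
    e = np.exp(a - a.max(axis=0))
    return e / e.sum(axis=0)

def one_head(X, omega_v, omega_q, omega_k, beta_v, beta_q, beta_k):
    # Scaled dot-product attention for a single head (equation 12.9)
    Q = omega_q @ X + beta_q
    K = omega_k @ X + beta_k
    V = omega_v @ X + beta_v
    return V @ softmax_cols(K.T @ Q / np.sqrt(Q.shape[0]))

np.random.seed(0)
D, N, H = 8, 6, 2
H_D = D // H
X = np.random.normal(size=(D, N))

def head_params():
    # One set of D/H x D weights and D/H x 1 biases per head
    omegas = [np.random.normal(size=(H_D, D)) for _ in range(3)]
    betas = [np.random.normal(size=(H_D, 1)) for _ in range(3)]
    return omegas + betas

params1, params2 = head_params(), head_params()
omega_c = np.random.normal(size=(D, D))

# Stack the two D/H-dimensional head outputs and recombine with omega_c
X_prime = omega_c @ np.vstack([one_head(X, *params1), one_head(X, *params2)])
print(X_prime.shape)  # (8, 6)
```

Note that the scale inside each head is $\sqrt{D/H}$ (the query dimension), not $\sqrt{D}$ -- a common source of mismatched answers.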


@@ -0,0 +1,341 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyP0/KodWM9Dtr2x+8MdXXH1",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap12/12_3_Tokenization.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 12.3: Tokenization**\n",
"\n",
"This notebook builds a set of tokens from a text string as in figure 12.8 of the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"I adapted this code from *SOMEWHERE*. If anyone recognizes it, can you let me know and I will give the proper attribution or rewrite if the license is not permissive.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import re, collections"
],
"metadata": {
"id": "3_WkaFO3OfLi"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"text = \"a sailor went to sea sea sea \"+\\\n",
" \"to see what he could see see see \"+\\\n",
" \"but all that he could see see see \"+\\\n",
" \"was the bottom of the deep blue sea sea sea\""
],
"metadata": {
"id": "tVZVuauIXmJk"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Tokenize the input sentence. To begin with, the tokens are the individual letters and the </w> whitespace token, so we represent each word in terms of these tokens, with spaces between the tokens to delineate them.\n",
"\n",
"The tokenized text is stored in a structure that represents each word as tokens together with the count of how often that word occurs. We'll call this the *vocabulary*."
],
"metadata": {
"id": "fF2RBrouWV5w"
}
},
{
"cell_type": "code",
"source": [
"def initialize_vocabulary(text):\n",
" vocab = collections.defaultdict(int)\n",
" words = text.strip().split()\n",
" for word in words:\n",
" vocab[' '.join(list(word)) + ' </w>'] += 1\n",
" return vocab"
],
"metadata": {
"id": "OfvXkLSARk4_"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"vocab = initialize_vocabulary(text)\n",
"print('Vocabulary: {}'.format(vocab))\n",
"print('Size of vocabulary: {}'.format(len(vocab)))"
],
"metadata": {
"id": "aydmNqaoOpSm"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Find all the tokens in the current vocabulary and their frequencies"
],
"metadata": {
"id": "fJAiCjphWsI9"
}
},
{
"cell_type": "code",
"source": [
"def get_tokens_and_frequencies(vocab):\n",
" tokens = collections.defaultdict(int)\n",
" for word, freq in vocab.items():\n",
" word_tokens = word.split()\n",
" for token in word_tokens:\n",
" tokens[token] += freq\n",
" return tokens"
],
"metadata": {
"id": "qYi6F_K3RYsW"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"tokens = get_tokens_and_frequencies(vocab)\n",
"print('Tokens: {}'.format(tokens))\n",
"print('Number of tokens: {}'.format(len(tokens)))"
],
"metadata": {
"id": "Y4LCVGnvXIwp"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Find each pair of adjacent tokens in the vocabulary\n",
"and count them. We will subsequently merge the most frequently occurring pair."
],
"metadata": {
"id": "_-Rh1mD_Ww3b"
}
},
{
"cell_type": "code",
"source": [
"def get_pairs_and_counts(vocab):\n",
" pairs = collections.defaultdict(int)\n",
" for word, freq in vocab.items():\n",
" symbols = word.split()\n",
" for i in range(len(symbols)-1):\n",
" pairs[symbols[i],symbols[i+1]] += freq\n",
" return pairs"
],
"metadata": {
"id": "OqJTB3UFYubH"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"pairs = get_pairs_and_counts(vocab)\n",
"print('Pairs: {}'.format(pairs))\n",
"print('Number of distinct pairs: {}'.format(len(pairs)))\n",
"\n",
"most_frequent_pair = max(pairs, key=pairs.get)\n",
"print('Most frequent pair: {}'.format(most_frequent_pair))"
],
"metadata": {
"id": "d-zm0JBcZSjS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Merge the instances of the best pair in the vocabulary"
],
"metadata": {
"id": "pcborzqIXQFS"
}
},
{
"cell_type": "code",
"source": [
"def merge_pair_in_vocabulary(pair, vocab_in):\n",
" vocab_out = {}\n",
" bigram = re.escape(' '.join(pair))\n",
" p = re.compile(r'(?<!\\S)' + bigram + r'(?!\\S)')\n",
" for word in vocab_in:\n",
" word_out = p.sub(''.join(pair), word)\n",
" vocab_out[word_out] = vocab_in[word]\n",
" return vocab_out"
],
"metadata": {
"id": "xQI6NALdWQZX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"vocab = merge_pair_in_vocabulary(most_frequent_pair, vocab)\n",
"print('Vocabulary: {}'.format(vocab))\n",
"print('Size of vocabulary: {}'.format(len(vocab)))"
],
"metadata": {
"id": "TRYeBZI3ZULu"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Update the tokens, which now include the best token 'se'"
],
"metadata": {
"id": "bkhUx3GeXwba"
}
},
{
"cell_type": "code",
"source": [
"tokens = get_tokens_and_frequencies(vocab)\n",
"print('Tokens: {}'.format(tokens))\n",
"print('Number of tokens: {}'.format(len(tokens)))"
],
"metadata": {
"id": "Fqj-vQWeXxQi"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's write the full tokenization routine"
],
"metadata": {
"id": "K_hKp2kSXXS1"
}
},
{
"cell_type": "code",
"source": [
"# TODO -- write this routine by filling in this missing parts,\n",
"# calling the above routines\n",
"def tokenize(text, num_merges):\n",
" # Initialize the vocabulary from the input text\n",
" # vocab = (your code here)\n",
"\n",
" for i in range(num_merges):\n",
" # Find the tokens and how often they occur in the vocabulary\n",
" # tokens = (your code here)\n",
"\n",
" # Find the pairs of adjacent tokens and their counts\n",
" # pairs = (your code here)\n",
"\n",
" # Find the most frequent pair\n",
" # most_frequent_pair = (your code here)\n",
" print('Most frequent pair: {}'.format(most_frequent_pair))\n",
"\n",
" # Merge the most frequent pair in the vocabulary\n",
" # vocab = (your code here)\n",
"\n",
" # Find the tokens and how often they occur in the vocabulary one last time\n",
" # tokens = (your code here)\n",
"\n",
" return tokens, vocab"
],
"metadata": {
"id": "U_1SkQRGQ8f3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"tokens, vocab = tokenize(text, num_merges=22)"
],
"metadata": {
"id": "w0EkHTrER_-I"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print('Tokens: {}'.format(tokens))\n",
"print('Number of tokens: {}'.format(len(tokens)))\n",
"print('Vocabulary: {}'.format(vocab))\n",
"print('Size of vocabulary: {}'.format(len(vocab)))"
],
"metadata": {
"id": "moqDtTzIb-NG"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO - Consider the input text:\n",
"\n",
"\"How much wood could a woodchuck chuck if a woodchuck could chuck wood\"\n",
"\n",
"How many tokens will there be initially and what will they be?\n",
"How many tokens will there be if we run the tokenization routine for the maximum number of iterations (merges)?\n",
"\n",
"When you've made your predictions, run the code and see if you are correct."
],
"metadata": {
"id": "jOW_HJtMdAxd"
}
}
]
}
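Before making predictions for the final TODO, it may help to see the whole merge loop on a shorter version of the woodchuck sentence. This is a self-contained sketch that re-implements the same byte-pair-style steps as the routines above (the text and number of merges are illustrative):

```python
import collections
import re

text = "how much wood could a woodchuck chuck"

# Start with one entry per word, spelled out as letter tokens plus </w>
vocab = collections.defaultdict(int)
for word in text.strip().split():
    vocab[' '.join(word) + ' </w>'] += 1

for _ in range(3):  # three merges, for illustration
    # Count adjacent token pairs, weighted by word frequency
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    best = max(pairs, key=pairs.get)
    print('Most frequent pair:', best)
    # Merge that pair wherever it occurs as two whole tokens
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(best)) + r'(?!\S)')
    vocab = {pattern.sub(''.join(best), word): freq
             for word, freq in vocab.items()}

print(vocab)
```

Each merge shrinks some words' token sequences by one symbol; run long enough, every word collapses to a single token, which bounds the maximum useful number of merges.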


@@ -0,0 +1,159 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMuzP1/oqTRTw4Xs/R4J/M3",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap13/13_1_Graph_Representation.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 13.1: Graph representation**\n",
"\n",
"This notebook investigates representing graphs with matrices as illustrated in figure 13.4 from the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import networkx as nx"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Routine to draw graph structure\n",
"def draw_graph_structure(adjacency_matrix):\n",
"\n",
" G = nx.Graph()\n",
" n_node = adjacency_matrix.shape[0]\n",
" for i in range(n_node):\n",
" for j in range(i):\n",
" if adjacency_matrix[i,j]:\n",
" G.add_edge(i,j)\n",
"\n",
" nx.draw(G, nx.spring_layout(G, seed = 0), with_labels=True)\n",
" plt.show()"
],
"metadata": {
"id": "O1QMxC7X4vh9"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define a graph\n",
"# Note that the nodes are labelled from 0 rather than 1 as in the book\n",
"A = np.array([[0,1,0,1,0,0,0,0],\n",
" [1,0,1,1,1,0,0,0],\n",
" [0,1,0,0,1,0,0,0],\n",
" [1,1,0,0,1,0,0,0],\n",
" [0,1,1,1,0,1,0,1],\n",
" [0,0,0,0,1,0,1,1],\n",
" [0,0,0,0,0,1,0,0],\n",
" [0,0,0,0,1,1,0,0]]);\n",
"print(A)\n",
"draw_graph_structure(A)"
],
"metadata": {
"id": "TIrihEw-7DRV"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO -- find algorithmically how many walks of length three there are between nodes 3 and 7\n",
"# Replace this line\n",
"print(\"Number of walks between nodes three and seven = ???\")"
],
"metadata": {
"id": "PzvfUpkV4zCj"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO -- find algorithmically what the minimum path distance between nodes 0 and 6 is\n",
"# (i.e. what is the first walk length with non-zero count between 0 and 6)\n",
"# Replace this line\n",
"print(\"Minimum distance = ???\")\n",
"\n",
"\n",
"# What is the worst case complexity of your method?"
],
"metadata": {
"id": "MhhJr6CgCRb5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Now let's represent node 0 as a vector\n",
"x = np.array([[1],[0],[0],[0],[0],[0],[0],[0]]);\n",
"print(x)"
],
"metadata": {
"id": "lCQjXlatABGZ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO: Find algorithmically how many walks of length 3 there are between node 0 and every other node\n",
"# Replace this line\n",
"print(np.zeros_like(x))"
],
"metadata": {
"id": "nizLdZgLDzL4"
},
"execution_count": null,
"outputs": []
}
]
}
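The TODOs in this notebook both follow from one fact: entry (i, j) of A^k counts the walks of length k between nodes i and j. A sketch, reusing the adjacency matrix defined above:

```python
import numpy as np

A = np.array([[0,1,0,1,0,0,0,0],
              [1,0,1,1,1,0,0,0],
              [0,1,0,0,1,0,0,0],
              [1,1,0,0,1,0,0,0],
              [0,1,1,1,0,1,0,1],
              [0,0,0,0,1,0,1,1],
              [0,0,0,0,0,1,0,0],
              [0,0,0,0,1,1,0,0]])

# Entry (i,j) of A^k counts walks of length k between nodes i and j
A3 = np.linalg.matrix_power(A, 3)
print("Number of walks between nodes three and seven =", A3[3, 7])

# Minimum path distance = smallest k for which the (0,6) entry of A^k is non-zero
k = 1
while np.linalg.matrix_power(A, k)[0, 6] == 0:
    k += 1
print("Minimum distance =", k)
```

Recomputing A^k from scratch each iteration is wasteful; keeping a running product brings the worst case down to O(n · n^3) for a graph with n nodes.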


@@ -0,0 +1,244 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOMSGUFWT+YN0fwYHpMmHJM",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap13/13_2_Graph_Classification.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 13.2: Graph classification**\n",
"\n",
"This notebook investigates representing graphs with matrices as illustrated in figure 13.4 from the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import networkx as nx"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's build a model that maps a chemical structure to a binary decision. This model might be used to predict whether a chemical is liquid at room temperature or not. We'll start by drawing the chemical structure."
],
"metadata": {
"id": "UNleESc7k5uB"
}
},
{
"cell_type": "code",
"source": [
"# Define a graph that represents the chemical structure of ethanol and draw it\n",
"# Each node is labelled with the node number and the element (carbon, hydrogen, oxygen)\n",
"G = nx.Graph()\n",
"G.add_edge('0:H','2:C')\n",
"G.add_edge('1:H','2:C')\n",
"G.add_edge('3:H','2:C')\n",
"G.add_edge('2:C','5:C')\n",
"G.add_edge('4:H','5:C')\n",
"G.add_edge('6:H','5:C')\n",
"G.add_edge('7:O','5:C')\n",
"G.add_edge('8:H','7:O')\n",
"nx.draw(G, nx.spring_layout(G, seed = 0), with_labels=True, node_size=600)\n",
"plt.show()"
],
"metadata": {
"id": "TIrihEw-7DRV"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define adjacency matrix\n",
"# TODO -- Define the adjacency matrix for this chemical\n",
"# Replace this line\n",
"A = np.zeros((9,9)) ;\n",
"\n",
"\n",
"print(A)\n",
"\n",
"# TODO -- Define node matrix\n",
"# There will be 9 nodes and 118 possible chemical elements\n",
"# so we'll define a 118x9 matrix. Each column represents one\n",
"# node and is a one-hot vector (i.e. all zeros, except a single one at the\n",
"# chemical number of the element).\n",
"# Chemical numbers: Hydrogen-->1, Carbon-->6, Oxygen-->8\n",
"# Since the indices start at 0, we'll set element 0 to 1 for hydrogen, element 5\n",
"# to one for carbon, and element 7 to one for oxygen\n",
"# Replace this line:\n",
"X = np.zeros((118,9))\n",
"\n",
"\n",
"# Print the top 15 rows of the data matrix\n",
"print(X[0:15,:])"
],
"metadata": {
"id": "gKBD5JsPfrkA"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's define a network with four layers that maps this graph to a binary value, using the formulation in equation 13.11."
],
"metadata": {
"id": "40FLjNIcpHa9"
}
},
{
"cell_type": "code",
"source": [
"# We'll need these helper functions\n",
"\n",
"# Define the Rectified Linear Unit (ReLU) function\n",
"def ReLU(preactivation):\n",
" activation = preactivation.clip(0.0)\n",
" return activation\n",
"\n",
"# Define the logistic sigmoid function\n",
"def sigmoid(x):\n",
" return 1.0/(1.0+np.exp(-x))"
],
"metadata": {
"id": "52IFREpepHE4"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Our network will have K=3 hidden layers, and will use a dimension of D=200.\n",
"K = 3; D = 200\n",
"# Set seed so we always get the same random numbers\n",
"np.random.seed(1)\n",
"# Let's initialize the parameter matrices randomly with He initialization\n",
"Omega0 = np.random.normal(size=(D, 118)) * 2.0 / D\n",
"beta0 = np.random.normal(size=(D,1)) * 2.0 / D\n",
"Omega1 = np.random.normal(size=(D, D)) * 2.0 / D\n",
"beta1 = np.random.normal(size=(D,1)) * 2.0 / D\n",
"Omega2 = np.random.normal(size=(D, D)) * 2.0 / D\n",
"beta2 = np.random.normal(size=(D,1)) * 2.0 / D\n",
"omega3 = np.random.normal(size=(1, D))\n",
"beta3 = np.random.normal(size=(1,1))"
],
"metadata": {
"id": "ag0YdEgnpApK"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def graph_neural_network(A,X, Omega0, beta0, Omega1, beta1, Omega2, beta2, omega3, beta3):\n",
" # Define this network according to equation 13.11 from the book\n",
" # Replace this line\n",
" f = np.ones((1,1))\n",
"\n",
" return f;"
],
"metadata": {
"id": "RQuTMc2WrsU3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's test this network\n",
"f = graph_neural_network(A,X, Omega0, beta0, Omega1, beta1, Omega2, beta2, omega3, beta3)\n",
"print(\"Your value is %3f: \"%(f[0,0]), \"True value of f: 0.498010\")"
],
"metadata": {
"id": "X7gYgOu6uIAt"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's check that permuting the indices of the graph doesn't change\n",
"# the output of the network\n",
"# Define a permutation matrix\n",
"P = np.array([[0,1,0,0,0,0,0,0,0],\n",
" [0,0,0,0,1,0,0,0,0],\n",
" [0,0,0,0,0,1,0,0,0],\n",
" [0,0,0,0,0,0,0,0,1],\n",
" [1,0,0,0,0,0,0,0,0],\n",
" [0,0,1,0,0,0,0,0,0],\n",
" [0,0,0,1,0,0,0,0,0],\n",
" [0,0,0,0,0,0,0,1,0],\n",
" [0,0,0,0,0,0,1,0,0]]);\n",
"\n",
"# TODO -- Use this matrix to permute the adjacency matrix A and node matrix X\n",
"# Replace these lines\n",
"A_permuted = np.copy(A)\n",
"X_permuted = np.copy(X)\n",
"\n",
"f = graph_neural_network(A_permuted,X_permuted, Omega0, beta0, Omega1, beta1, Omega2, beta2, omega3, beta3)\n",
"print(\"Your value is %3f: \"%(f[0,0]), \"True value of f: 0.498010\")"
],
"metadata": {
"id": "F0zc3U_UuR5K"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO -- encode the adjacency matrix and node matrix for propanol and run the network again. Show that the network still runs even though the size of the input graph is different.\n",
"\n",
"Propanol structure can be found [here](https://upload.wikimedia.org/wikipedia/commons/b/b8/Propanol_flat_structure.png)."
],
"metadata": {
"id": "l44vHi50zGqY"
}
}
]
}


@@ -0,0 +1,314 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyNXqwmC4yEc1mGv9/74b0jY",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap13/13_3_Neighborhood_Sampling.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 13.3: Neighborhood sampling**\n",
"\n",
"This notebook investigates neighborhood sampling of graphs as in figure 13.10 from the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import networkx as nx"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's construct the graph from figure 13.10, which has 23 nodes."
],
"metadata": {
"id": "UNleESc7k5uB"
}
},
{
"cell_type": "code",
"source": [
"# Define adjacency matrix\n",
"A = np.array([[0,1,1,1,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [1,0,1,0,0, 0,0,0,1,1, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [1,1,0,1,0, 0,0,0,0,1, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [1,0,1,0,1, 0,1,1,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [0,0,0,1,0, 1,0,1,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [0,0,0,0,1, 0,0,1,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [0,0,0,1,0, 0,0,1,0,1, 1,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [0,0,0,1,1, 1,1,0,0,0, 1,0,0,1,0, 0,0,0,0,0, 0,0,0],\n",
" [0,1,0,0,0, 0,0,0,0,1, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [0,1,1,0,0, 0,1,0,1,0, 0,1,1,0,0, 0,1,0,0,0, 0,0,0],\n",
" [0,0,0,0,0, 0,1,1,0,0, 0,0,1,0,0, 0,0,0,0,0, 0,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,1, 0,0,0,0,1, 1,1,0,0,0, 0,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,1, 1,0,0,1,0, 0,1,1,0,0, 0,0,0],\n",
" [0,0,0,0,0, 0,0,1,0,0, 0,0,1,0,0, 0,0,1,1,0, 0,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,1,0,0,0, 1,0,0,0,1, 0,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,1,0,0,1, 0,1,0,0,1, 0,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,1, 0,1,1,0,0, 1,0,1,0,1, 0,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,0,1,1,0, 0,1,0,1,0, 1,1,1],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,0,0,1,0, 0,0,1,0,0, 0,0,1],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,1, 1,1,0,0,0, 1,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,1,0,1, 0,1,0],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,1,0,0, 1,0,1],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,1,1,0, 0,1,0]]);\n",
"print(A)"
],
"metadata": {
"id": "fHgH5hdG_W1h"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Routine to draw graph structure, highlighting original node (brown in fig 13.10)\n",
"# and neighborhood nodes (orange in figure 13.10)\n",
"def draw_graph_structure(adjacency_matrix, original_node, neighborhood_nodes=None):\n",
"\n",
" G = nx.Graph()\n",
" n_node = adjacency_matrix.shape[0]\n",
" for i in range(n_node):\n",
" for j in range(i):\n",
" if adjacency_matrix[i,j]:\n",
" G.add_edge(i,j)\n",
"\n",
" color_map = []\n",
"\n",
" for node in G:\n",
" if original_node[node]:\n",
" color_map.append('brown')\n",
" else:\n",
" if neighborhood_nodes[node]:\n",
" color_map.append('orange')\n",
" else:\n",
" color_map.append('white')\n",
"\n",
" nx.draw(G, nx.spring_layout(G, seed = 7), with_labels=True,node_color=color_map)\n",
" plt.show()"
],
"metadata": {
"id": "TIrihEw-7DRV"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"n_nodes = A.shape[0]\n",
"\n",
"# Define a single output layer node\n",
"output_layer_nodes=np.zeros((n_nodes,1)); output_layer_nodes[16]=1\n",
"# Define the neighboring nodes to draw (none)\n",
"neighbor_nodes = np.zeros((n_nodes,1))\n",
"print(\"Output layer:\")\n",
"draw_graph_structure(A, output_layer_nodes, neighbor_nodes)"
],
"metadata": {
"id": "gKBD5JsPfrkA"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's imagine that we want to form a batch for a node labelling task that consists of just node 16 in the output layer (highlighted). The network consists of the input, hidden layer 1, hidden layer 2, and the output layer."
],
"metadata": {
"id": "JaH3g_-O-0no"
}
},
{
"cell_type": "code",
"source": [
"# TODO Find the nodes in hidden layer 2 that connect to node 16 in the output layer\n",
"# using the adjacency matrix\n",
"# Replace this line:\n",
"hidden_layer2_nodes = np.zeros((n_nodes,1));\n",
"\n",
"print(\"Hidden layer 2:\")\n",
"draw_graph_structure(A, output_layer_nodes, hidden_layer2_nodes)"
],
"metadata": {
"id": "9oSiuP3B3HNS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO - Find the nodes in hidden layer 1 that connect to node 16 in the output layer\n",
"# using the adjacency matrix\n",
"# Replace this line:\n",
"hidden_layer1_nodes = np.zeros((n_nodes,1));\n",
"\n",
"print(\"Hidden layer 1:\")\n",
"draw_graph_structure(A, output_layer_nodes, hidden_layer1_nodes)"
],
"metadata": {
"id": "zZFxw3m1_wWr"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO Find the nodes in the input layer that connect to node 16 in the output layer\n",
"# using the adjacency matrix\n",
"# Replace this line:\n",
"input_layer_nodes = np.zeros((n_nodes,1));\n",
"\n",
"print(\"Input layer:\")\n",
"draw_graph_structure(A, output_layer_nodes, input_layer_nodes)"
],
"metadata": {
"id": "EL3N8BXyCu0F"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"This is bad news. This is a fairly sparsely connected graph (i.e. adjacency matrix is mostly zeros) and there are only two hidden layers. Nonetheless, we have to involve almost all the nodes in the graph to compute the loss at this output.\n",
"\n",
"To resolve this problem, we'll use neighborhood sampling. We'll start again with a single node in the output layer."
],
"metadata": {
"id": "CE0WqytvC7zr"
}
},
{
"cell_type": "code",
"source": [
"print(\"Output layer:\")\n",
"draw_graph_structure(A, output_layer_nodes, neighbor_nodes)"
],
"metadata": {
"id": "59WNys3KC5y6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define number of neighbors to sample\n",
"n_sample = 3"
],
"metadata": {
"id": "uCoJwpcTNFdI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO Find the nodes in hidden layer 2 that connect to node 16 in the output layer\n",
"# using the adjacency matrix. Then sample n_sample of these nodes randomly without\n",
"# replacement.\n",
"\n",
"# Replace this line:\n",
"hidden_layer2_nodes = np.zeros((n_nodes,1));\n",
"\n",
"draw_graph_structure(A, output_layer_nodes, hidden_layer2_nodes)"
],
"metadata": {
"id": "_WEop6lYGNhJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO Find the nodes in hidden layer 1 that connect to the nodes in hidden layer 2\n",
"# using the adjacency matrix. Then sample n_sample of these nodes randomly without\n",
"# replacement. Make sure not to sample nodes that were already included in hidden layer 2 or the output layer.\n",
"# The nodes at hidden layer 1 are the union of these nodes and the nodes in hidden layer 2\n",
"\n",
"# Replace this line:\n",
"hidden_layer1_nodes = np.zeros((n_nodes,1));\n",
"\n",
"draw_graph_structure(A, output_layer_nodes, hidden_layer1_nodes)\n"
],
"metadata": {
"id": "k90qW_LDLpNk"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO Find the nodes in the input layer that connect to the nodes in hidden layer 1\n",
"# using the adjacency matrix. Then sample n_sample of these nodes randomly without\n",
"# replacement. Make sure not to sample nodes that were already included in hidden layer 1,2, or the output layer.\n",
"# The nodes at the input layer are the union of these nodes and the nodes in hidden layers 1 and 2\n",
"\n",
"# Replace this line:\n",
"input_layer_nodes = np.zeros((n_nodes,1));\n",
"\n",
"draw_graph_structure(A, output_layer_nodes, input_layer_nodes)"
],
"metadata": {
"id": "NDEYUty_O3Zr"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"If you did this correctly, there should be 9 orange nodes in the figure. The \"receptive field\" of node 16 in the output layer increases much more slowly as we move back through the layers of the network."
],
"metadata": {
"id": "vu4eJURmVkc5"
}
}
]
}
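Each sampling TODO in this notebook repeats the same pattern: find the nodes adjacent to the currently included set, drop the ones already included, and sample `n_sample` of them without replacement. A sketch of a reusable helper (the name `sample_new_neighbors` and the column-vector convention for node indicators are assumptions, not the notebook's API):

```python
import numpy as np

def sample_new_neighbors(A, active, n_sample, rng):
    # active is an (n,1) 0/1 indicator of nodes already in the batch.
    # A node is a candidate if it is adjacent to some active node
    # but is not itself already active.
    is_active = active.ravel().astype(bool)
    candidates = np.where((A @ active).ravel() > 0)[0]
    candidates = candidates[~is_active[candidates]]
    # Sample without replacement (fewer if not enough candidates exist)
    chosen = rng.choice(candidates, size=min(n_sample, len(candidates)),
                        replace=False)
    expanded = active.copy()
    expanded[chosen] = 1
    return expanded
```

Calling this repeatedly, output layer to hidden layer 2 to hidden layer 1 to the input layer, grows the receptive field by at most `n_sample` nodes per layer instead of by the full neighborhood.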


@@ -0,0 +1,213 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOdSkjfQnSZXnffGsZVM7r5",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap13/13_4_Graph_Attention_Networks.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 13.4: Graph attention networks**\n",
"\n",
"This notebook builds a graph attention mechanism from scratch, as discussed in section 13.8.6 of the book and illustrated in figure 13.12c\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The self-attention mechanism maps $N$ inputs $\\mathbf{x}_{n}\\in\\mathbb{R}^{D}$ and returns $N$ outputs $\\mathbf{x}'_{n}\\in \\mathbb{R}^{D}$. \n",
"\n"
],
"metadata": {
"id": "9OJkkoNqCVK2"
}
},
{
"cell_type": "code",
"source": [
"# Set seed so we get the same random numbers\n",
"np.random.seed(1)\n",
"# Number of nodes in the graph\n",
"N = 8\n",
"# Number of dimensions of each input\n",
"D = 4\n",
"\n",
"# Define a graph\n",
"A = np.array([[0,1,0,1,0,0,0,0],\n",
" [1,0,1,1,1,0,0,0],\n",
" [0,1,0,0,1,0,0,0],\n",
" [1,1,0,0,1,0,0,0],\n",
" [0,1,1,1,0,1,0,1],\n",
" [0,0,0,0,1,0,1,1],\n",
" [0,0,0,0,0,1,0,0],\n",
" [0,0,0,0,1,1,0,0]]);\n",
"print(A)\n",
"\n",
"# Let's also define some random data\n",
"X = np.random.normal(size=(D,N))"
],
"metadata": {
"id": "oAygJwLiCSri"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We'll also need the weights and biases for the keys, queries, and values (equations 12.2 and 12.4)"
],
"metadata": {
"id": "W2iHFbtKMaDp"
}
},
{
"cell_type": "code",
"source": [
"# Choose random values for the parameters\n",
"omega = np.random.normal(size=(D,D))\n",
"beta = np.random.normal(size=(D,1))\n",
"phi = np.random.normal(size=(1,2*D))"
],
"metadata": {
"id": "79TSK7oLMobe"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We'll need a softmax operation that operates on the columns of the matrix and a ReLU function as well"
],
"metadata": {
"id": "iYPf6c4MhCgq"
}
},
{
"cell_type": "code",
"source": [
"# Define softmax operation that works independently on each column\n",
"def softmax_cols(data_in):\n",
" # Exponentiate all of the values\n",
" exp_values = np.exp(data_in) ;\n",
" # Sum over columns\n",
" denom = np.sum(exp_values, axis = 0);\n",
" # Replicate denominator to N rows\n",
" denom = np.matmul(np.ones((data_in.shape[0],1)), denom[np.newaxis,:])\n",
" # Compute softmax\n",
" softmax = exp_values / denom\n",
" # return the answer\n",
" return softmax\n",
"\n",
"\n",
"# Define the Rectified Linear Unit (ReLU) function\n",
"def ReLU(preactivation):\n",
" activation = preactivation.clip(0.0)\n",
" return activation\n"
],
"metadata": {
"id": "obaQBdUAMXXv"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Now let's compute graph attention in matrix form\n",
"def graph_attention(X,omega, beta, phi, A):\n",
"\n",
" # TODO -- Write this function (see figure 13.12c)\n",
" # 1. Compute X_prime\n",
" # 2. Compute S\n",
" # 3. To apply the mask, set S to a very large negative number (e.g. -1e20) everywhere where A+I is zero\n",
" # 4. Run the softmax function to compute the attention values\n",
" # 5. Postmultiply X' by the attention values\n",
" # 6. Apply the ReLU function\n",
" # Replace this line:\n",
" output = np.ones_like(X) ;\n",
"\n",
" return output;"
],
"metadata": {
"id": "gb2WvQ3SiH8r"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Test out the graph attention mechanism\n",
"np.set_printoptions(precision=3)\n",
"output = graph_attention(X, omega, beta, phi, A);\n",
"print(\"Correct answer is:\")\n",
"print(\"[[1.796 1.346 0.569 1.703 1.298 1.224 1.24 1.234]\")\n",
"print(\" [0.768 0.672 0. 0.529 3.841 4.749 5.376 4.761]\")\n",
"print(\" [0.305 0.129 0. 0.341 0.785 1.014 1.113 1.024]\")\n",
"print(\" [0. 0. 0. 0. 0.35 0.864 1.098 0.871]]\")\n",
"\n",
"\n",
"print(\"Your answer is:\")\n",
"print(output)"
],
"metadata": {
"id": "d4p6HyHXmDh5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO -- Try to construct a dot-product self-attention mechanism as in practical 12.1 that respects the geometry of the graph and has zero attention between non-neighboring nodes by combining figures 13.12a and 13.12b.\n"
],
"metadata": {
"id": "QDEkIrcgrql-"
}
}
]
}
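The six numbered steps in the comment above can be sketched as follows. The exact pairwise score is an assumption here, a plain linear function `phi` applied to the concatenated transformed embeddings with no extra nonlinearity, so this sketch may not reproduce the notebook's expected numbers exactly:

```python
import numpy as np

def softmax_cols(S):
    # Numerically stable column-wise softmax
    e = np.exp(S - S.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def graph_attention(X, omega, beta, phi, A):
    D, N = X.shape
    # 1. Transform every node embedding
    X_prime = beta @ np.ones((1, N)) + omega @ X
    # 2. Pairwise scores from concatenated pairs (assumed linear form)
    S = np.zeros((N, N))
    for m in range(N):
        for n in range(N):
            S[m, n] = (phi @ np.concatenate([X_prime[:, m],
                                             X_prime[:, n]])).item()
    # 3. Mask out non-neighbors; A + I keeps self-attention
    S[(A + np.eye(N)) == 0] = -1e20
    # 4. Normalize to attention weights
    attention = softmax_cols(S)
    # 5. Aggregate transformed embeddings, 6. apply ReLU
    return (X_prime @ attention).clip(0.0)
```

Because the masked scores underflow to zero after the softmax, each node's output depends only on itself and its neighbors in A.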


@@ -0,0 +1,419 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyM0StKV3FIZ3MZqfflqC0Rv",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap15/15_1_GAN_Toy_Example.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 15.1: GAN Toy example**\n",
"\n",
"This notebook investigates the GAN toy example as illustrated in figure 15.1 in the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Get a batch of real data. Our goal is to make data that looks like this.\n",
"def get_real_data_batch(n_sample):\n",
" np.random.seed(0)\n",
" x_true = np.random.normal(size=(1,n_sample)) + 7.5\n",
" return x_true"
],
"metadata": {
"id": "y_OkVWmam4Qx"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Define our generator. This takes a standard normally-distributed latent variable $z$ and adds a scalar $\\theta$ to this, where $\\theta$ is the single parameter of this generative model according to:\n",
"\n",
"\\begin{equation}\n",
"x_i = z_i + \\theta.\n",
"\\end{equation}\n",
"\n",
"Obviously this model can generate the family of Gaussian distributions with unit variance, but different means."
],
"metadata": {
"id": "RFpL0uCXoTpV"
}
},
{
"cell_type": "code",
"source": [
"# This is our generator -- takes the single parameter theta\n",
"# of the generative model and generates n samples\n",
"def generator(z, theta):\n",
" x_gen = z + theta\n",
" return x_gen"
],
"metadata": {
"id": "OtLQvf3Enfyw"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now, we define our discriminator. This is a simple logistic regression model (a 1D linear model passed through a sigmoid) that returns the probability that the data is real."
],
"metadata": {
"id": "Xrzd8aehYAYR"
}
},
{
"cell_type": "code",
"source": [
"# Define our discriminative model\n",
"\n",
"# Logistic sigmoid, maps from [-infty,infty] to [0,1]\n",
"def sig(data_in):\n",
" return 1.0 / (1.0+np.exp(-data_in))\n",
"\n",
"# Discriminator computes y\n",
"def discriminator(x, phi0, phi1):\n",
" return sig(phi0 + phi1 * x)"
],
"metadata": {
"id": "vHBgAFZMsnaC"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Draws a figure like Figure 15.1a\n",
"def draw_data_model(x_real, x_syn, phi0=None, phi1=None):\n",
" fig, ax = plt.subplots();\n",
"\n",
" for x in x_syn:\n",
" ax.plot([x,x],[0,0.33],color='#f47a60')\n",
" for x in x_real:\n",
" ax.plot([x,x],[0,0.33],color='#7fe7dc')\n",
"\n",
" if phi0 is not None:\n",
" x_model = np.arange(0,10,0.01)\n",
" y_model = discriminator(x_model, phi0, phi1)\n",
" ax.plot(x_model, y_model,color='#dddddd')\n",
" ax.set_xlim([0,10])\n",
" ax.set_ylim([0,1])\n",
"\n",
"\n",
" plt.show()"
],
"metadata": {
"id": "V1FiDBhepcQJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Get data batch\n",
"x_real = get_real_data_batch(10)\n",
"\n",
"# Initialize generator and synthesize a batch of examples\n",
"theta = 3.0\n",
"np.random.seed(1)\n",
"z = np.random.normal(size=(1,10))\n",
"x_syn = generator(z, theta)\n",
"\n",
"# Initialize discriminator model\n",
"phi0 = -2\n",
"phi1 = 1\n",
"\n",
"draw_data_model(x_real, x_syn, phi0, phi1)"
],
"metadata": {
"id": "U8pFb497x36n"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"You can see that the synthesized (orange) samples don't look much like the real (cyan) ones, and the initial model to discriminate them (gray line represents probability of being real) is pretty bad as well.\n",
"\n",
"Let's deal with the discriminator first. Let's define the loss"
],
"metadata": {
"id": "SNDV1G5PYhcQ"
}
},
{
"cell_type": "code",
"source": [
"# Discriminator loss\n",
"def compute_discriminator_loss(x_real, x_syn, phi0, phi1):\n",
"\n",
" # TODO -- compute the loss for the discriminator\n",
" # Run the real data and the synthetic data through the discriminator\n",
" # Then use the standard binary cross entropy loss with the y=1 for the real samples\n",
" # and y=0 for the synthesized ones.\n",
" # Replace this line\n",
" loss = 0.0\n",
"\n",
"\n",
" return loss"
],
"metadata": {
"id": "Bc3VwCabYcfg"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Test the loss\n",
"loss = compute_discriminator_loss(x_real, x_syn, phi0, phi1)\n",
"print(\"True Loss = 13.814757170851447, Your loss=\", loss )"
],
"metadata": {
"id": "MiqM3GXSbn0z"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Gradient of loss (cheating, using finite differences)\n",
"def compute_discriminator_gradient(x_real, x_syn, phi0, phi1):\n",
" delta = 0.0001;\n",
" loss1 = compute_discriminator_loss(x_real, x_syn, phi0, phi1)\n",
" loss2 = compute_discriminator_loss(x_real, x_syn, phi0+delta, phi1)\n",
" loss3 = compute_discriminator_loss(x_real, x_syn, phi0, phi1+delta)\n",
" dl_dphi0 = (loss2-loss1) / delta\n",
" dl_dphi1 = (loss3-loss1) / delta\n",
"\n",
" return dl_dphi0, dl_dphi1\n",
"\n",
"# This routine performs gradient descent with the discriminator\n",
"def update_discriminator(x_real, x_syn, n_iter, phi0, phi1):\n",
"\n",
" # Define learning rate\n",
" alpha = 0.01\n",
"\n",
" # Get derivatives\n",
" print(\"Initial discriminator loss = \", compute_discriminator_loss(x_real, x_syn, phi0, phi1))\n",
" for iter in range(n_iter):\n",
" # Get gradient\n",
" dl_dphi0, dl_dphi1 = compute_discriminator_gradient(x_real, x_syn, phi0, phi1)\n",
" # Take a gradient step downhill\n",
" phi0 = phi0 - alpha * dl_dphi0 ;\n",
" phi1 = phi1 - alpha * dl_dphi1 ;\n",
"\n",
" print(\"Final Discriminator Loss= \", compute_discriminator_loss(x_real, x_syn, phi0, phi1))\n",
"\n",
" return phi0, phi1"
],
"metadata": {
"id": "zAxUPo3p0CIW"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's update the discriminator (sigmoid curve)\n",
"n_iter = 100\n",
"print(\"Initial parameters (phi0,phi1)\", phi0, phi1)\n",
"phi0, phi1 = update_discriminator(x_real, x_syn, n_iter, phi0, phi1)\n",
"print(\"Final parameters (phi0,phi1)\", phi0, phi1)\n",
"draw_data_model(x_real, x_syn, phi0, phi1)"
],
"metadata": {
"id": "FE_DeweeAbMc"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's update the generator"
],
"metadata": {
"id": "pRv9myh0d3Xm"
}
},
{
"cell_type": "code",
"source": [
"def compute_generator_loss(z, theta, phi0, phi1):\n",
" # TODO -- Run the generator on the latent variables z with the parameters theta\n",
" # to generate new data x_syn\n",
" # Then run the discriminator on the new data to get the probability of being real\n",
" # The loss is the total negative log probability of being synthesized (i.e. of not being real)\n",
" # Replace this code\n",
" loss = 1\n",
"\n",
"\n",
" return loss"
],
"metadata": {
"id": "5uiLrFBvJFAr"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Test generator loss to check you have it correct\n",
"loss = compute_generator_loss(z, theta, -2, 1)\n",
"print(\"True Loss = 13.78437035945412, Your loss=\", loss )"
],
"metadata": {
"id": "cqnU3dGPd6NK"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def compute_generator_gradient(z, theta, phi0, phi1):\n",
" delta = 0.0001\n",
" loss1 = compute_generator_loss(z,theta, phi0, phi1) ;\n",
" loss2 = compute_generator_loss(z,theta+delta, phi0, phi1) ;\n",
" dl_dtheta = (loss2-loss1)/ delta\n",
" return dl_dtheta\n",
"\n",
"def update_generator(z, theta, n_iter, phi0, phi1):\n",
" # Define learning rate\n",
" alpha = 0.02\n",
"\n",
" # Get derivatives\n",
" print(\"Initial generator loss = \", compute_generator_loss(z, theta, phi0, phi1))\n",
" for iter in range(n_iter):\n",
" # Get gradient\n",
"    dl_dtheta = compute_generator_gradient(z, theta, phi0, phi1)\n",
" # Take a gradient step (uphill, since we are trying to make synthesized data less well classified by discriminator)\n",
" theta = theta + alpha * dl_dtheta ;\n",
"\n",
" print(\"Final generator loss = \", compute_generator_loss(z, theta, phi0, phi1))\n",
" return theta\n"
],
"metadata": {
"id": "P1Lqy922dqal"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"n_iter = 10\n",
"theta = 3.0\n",
"print(\"Theta before\", theta)\n",
"theta = update_generator(z, theta, n_iter, phi0, phi1)\n",
"print(\"Theta after\", theta)\n",
"\n",
"x_syn = generator(z,theta)\n",
"draw_data_model(x_real, x_syn, phi0, phi1)"
],
"metadata": {
"id": "Q6kUkMO1P8V0"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Now let's define a full GAN loop\n",
"\n",
"# Initialize the parameters\n",
"theta = 3\n",
"phi0 = -2\n",
"phi1 = 1\n",
"\n",
"# Number of iterations for updating generator and discriminator\n",
"n_iter_discrim = 300\n",
"n_iter_gen = 3\n",
"\n",
"print(\"Initial parameters (phi0,phi1)\", phi0, phi1)\n",
"for c_gan_iter in range(5):\n",
"\n",
"  # Run generator to produce synthesized data\n",
" x_syn = generator(z, theta)\n",
" draw_data_model(x_real, x_syn, phi0, phi1)\n",
"\n",
" # Update the discriminator\n",
" print(\"Updating discriminator\")\n",
" phi0, phi1 = update_discriminator(x_real, x_syn, n_iter_discrim, phi0, phi1)\n",
" draw_data_model(x_real, x_syn, phi0, phi1)\n",
"\n",
" # Update the generator\n",
" print(\"Updating generator\")\n",
" theta = update_generator(z, theta, n_iter_gen, phi0, phi1)\n"
],
"metadata": {
"id": "pcbdK2agTO-y"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"You can see that the synthesized data (orange) is becoming closer to the true data (cyan). However, this is extremely unstable -- as you will find if you mess around with the number of iterations of each optimization and the total iterations overall."
],
"metadata": {
"id": "loMx0TQUgBs7"
}
}
]
}
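The alternating updates in the notebook above can be condensed into a self-contained sketch. Assumptions (these pieces are defined earlier in the notebook and may differ there): the generator is a toy map x = z + theta, and the discriminator is the logistic curve sig[phi0 + phi1 * x]; the finite-difference gradients and learning rate mirror the cells shown.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def generator(z, theta):
    # Hypothetical toy generator (assumption -- the notebook defines its own)
    return z + theta

def compute_discriminator_loss(x_real, x_syn, phi0, phi1):
    # Binary cross-entropy: real examples labeled 1, synthesized labeled 0
    pr_real = sigmoid(phi0 + phi1 * x_real)
    pr_syn = sigmoid(phi0 + phi1 * x_syn)
    return -np.sum(np.log(pr_real)) - np.sum(np.log(1.0 - pr_syn))

np.random.seed(0)
z = np.random.normal(size=20)
x_real = np.random.normal(loc=1.0, scale=1.0, size=20)
phi0, phi1, theta = -2.0, 1.0, 3.0
alpha, delta = 0.001, 1e-4

x_syn = generator(z, theta)
loss_start = compute_discriminator_loss(x_real, x_syn, phi0, phi1)
for _ in range(100):
    # Finite-difference gradients, as in the notebook
    base = compute_discriminator_loss(x_real, x_syn, phi0, phi1)
    dl_dphi0 = (compute_discriminator_loss(x_real, x_syn, phi0 + delta, phi1) - base) / delta
    dl_dphi1 = (compute_discriminator_loss(x_real, x_syn, phi0, phi1 + delta) - base) / delta
    # Gradient descent step (downhill -- the discriminator minimizes its loss)
    phi0 = phi0 - alpha * dl_dphi0
    phi1 = phi1 - alpha * dl_dphi1
loss_end = compute_discriminator_loss(x_real, x_syn, phi0, phi1)
print(loss_start, loss_end)
```

With a small learning rate the discriminator loss falls steadily; the generator step in the notebook then moves theta in the opposite sense, which is why the overall loop is so sensitive to the iteration counts.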


@@ -0,0 +1,246 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyNyLnpoXgKN+RGCuTUszCAZ",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap15/15_2_Wasserstein_Distance.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 15.2: Wasserstein Distance**\n",
"\n",
"This notebook investigates computing the Wasserstein distance between two discrete distributions, as illustrated in figure 15.8 in the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib import cm\n",
"from matplotlib.colors import ListedColormap\n",
"from scipy.optimize import linprog"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define two probability distributions\n",
"p = np.array([5, 3, 2, 1, 8, 7, 5, 9, 2, 1])\n",
"q = np.array([4, 10, 1, 1, 4, 6, 3, 2, 0, 1])\n",
"p = p/np.sum(p)\n",
"q = q/np.sum(q)\n",
"\n",
"# Draw those distributions\n",
"fig, ax =plt.subplots(2,1);\n",
"x = np.arange(0,p.size,1)\n",
"ax[0].bar(x,p, color=\"#cccccc\")\n",
"ax[0].set_ylim([0,0.35])\n",
"ax[0].set_ylabel(\"p(x=i)\")\n",
"\n",
"ax[1].bar(x,q,color=\"#f47a60\")\n",
"ax[1].set_ylim([0,0.35])\n",
"ax[1].set_ylabel(\"q(x=j)\")\n",
"plt.show()"
],
"metadata": {
"id": "ZIfQwhd-AV6L"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO Define the distance matrix from figure 15.8d\n",
"# Replace this line\n",
"dist_mat = np.zeros((10,10))\n",
"\n",
"# vectorize the distance matrix\n",
"c = dist_mat.flatten()"
],
"metadata": {
"id": "EZSlZQzWBKTm"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define pretty colormap\n",
"my_colormap_vals_hex =('2a0902', '2b0a03', '2c0b04', '2d0c05', '2e0c06', '2f0d07', '300d08', '310e09', '320f0a', '330f0b', '34100b', '35110c', '36110d', '37120e', '38120f', '39130f', '3a1410', '3b1411', '3c1511', '3d1612', '3e1613', '3f1713', '401714', '411814', '421915', '431915', '451a16', '461b16', '471b17', '481c17', '491d18', '4a1d18', '4b1e19', '4c1f19', '4d1f1a', '4e201b', '50211b', '51211c', '52221c', '53231d', '54231d', '55241e', '56251e', '57261f', '58261f', '592720', '5b2821', '5c2821', '5d2922', '5e2a22', '5f2b23', '602b23', '612c24', '622d25', '632e25', '652e26', '662f26', '673027', '683027', '693128', '6a3229', '6b3329', '6c342a', '6d342a', '6f352b', '70362c', '71372c', '72372d', '73382e', '74392e', '753a2f', '763a2f', '773b30', '783c31', '7a3d31', '7b3e32', '7c3e33', '7d3f33', '7e4034', '7f4134', '804235', '814236', '824336', '834437', '854538', '864638', '874739', '88473a', '89483a', '8a493b', '8b4a3c', '8c4b3c', '8d4c3d', '8e4c3e', '8f4d3f', '904e3f', '924f40', '935041', '945141', '955242', '965343', '975343', '985444', '995545', '9a5646', '9b5746', '9c5847', '9d5948', '9e5a49', '9f5a49', 'a05b4a', 'a15c4b', 'a35d4b', 'a45e4c', 'a55f4d', 'a6604e', 'a7614e', 'a8624f', 'a96350', 'aa6451', 'ab6552', 'ac6552', 'ad6653', 'ae6754', 'af6855', 'b06955', 'b16a56', 'b26b57', 'b36c58', 'b46d59', 'b56e59', 'b66f5a', 'b7705b', 'b8715c', 'b9725d', 'ba735d', 'bb745e', 'bc755f', 'bd7660', 'be7761', 'bf7862', 'c07962', 'c17a63', 'c27b64', 'c27c65', 'c37d66', 'c47e67', 'c57f68', 'c68068', 'c78169', 'c8826a', 'c9836b', 'ca846c', 'cb856d', 'cc866e', 'cd876f', 'ce886f', 'ce8970', 'cf8a71', 'd08b72', 'd18c73', 'd28d74', 'd38e75', 'd48f76', 'd59077', 'd59178', 'd69279', 'd7937a', 'd8957b', 'd9967b', 'da977c', 'da987d', 'db997e', 'dc9a7f', 'dd9b80', 'de9c81', 'de9d82', 'df9e83', 'e09f84', 'e1a185', 'e2a286', 'e2a387', 'e3a488', 'e4a589', 'e5a68a', 'e5a78b', 'e6a88c', 'e7aa8d', 'e7ab8e', 'e8ac8f', 'e9ad90', 'eaae91', 'eaaf92', 'ebb093', 'ecb295', 'ecb396', 'edb497', 
'eeb598', 'eeb699', 'efb79a', 'efb99b', 'f0ba9c', 'f1bb9d', 'f1bc9e', 'f2bd9f', 'f2bfa1', 'f3c0a2', 'f3c1a3', 'f4c2a4', 'f5c3a5', 'f5c5a6', 'f6c6a7', 'f6c7a8', 'f7c8aa', 'f7c9ab', 'f8cbac', 'f8ccad', 'f8cdae', 'f9ceb0', 'f9d0b1', 'fad1b2', 'fad2b3', 'fbd3b4', 'fbd5b6', 'fbd6b7', 'fcd7b8', 'fcd8b9', 'fcdaba', 'fddbbc', 'fddcbd', 'fddebe', 'fddfbf', 'fee0c1', 'fee1c2', 'fee3c3', 'fee4c5', 'ffe5c6', 'ffe7c7', 'ffe8c9', 'ffe9ca', 'ffebcb', 'ffeccd', 'ffedce', 'ffefcf', 'fff0d1', 'fff2d2', 'fff3d3', 'fff4d5', 'fff6d6', 'fff7d8', 'fff8d9', 'fffada', 'fffbdc', 'fffcdd', 'fffedf', 'ffffe0')\n",
"my_colormap_vals_dec = np.array([int(element,base=16) for element in my_colormap_vals_hex])\n",
"r = np.floor(my_colormap_vals_dec/(256*256))\n",
"g = np.floor((my_colormap_vals_dec - r *256 *256)/256)\n",
"b = np.floor(my_colormap_vals_dec - r * 256 *256 - g * 256)\n",
"my_colormap = ListedColormap(np.vstack((r,g,b)).transpose()/255.0)\n",
"\n",
"def draw_2D_heatmap(data, title, my_colormap):\n",
" fig,ax = plt.subplots()\n",
" fig.set_size_inches(4,4)\n",
" plt.imshow(data, cmap=my_colormap)\n",
" ax.set_title(title)\n",
" ax.set_xlabel('$q$'); ax.set_ylabel('$p$')\n",
" plt.show()"
],
"metadata": {
"id": "ABRANmp6F8iQ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"draw_2D_heatmap(dist_mat,'Distance $|i-j|$', my_colormap)"
],
"metadata": {
"id": "G0HFPBXyHT6V"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define b to be the vertical concatenation of p and q\n",
"b = np.hstack((p,q))[np.newaxis].transpose()"
],
"metadata": {
"id": "SfqeT3KlHWrt"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO: Now construct the matrix A that has the initial distribution constraints\n",
"# so that Ap=b, where p is the transport plan P vectorized rows-first, i.e. p = P.flatten()\n",
"# Replace this line:\n",
"A = np.zeros((20,100))\n"
],
"metadata": {
"id": "7KrybL96IuNW"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now we have everything we need: the vectorized distance matrix $\\mathbf{c}$, the constraint matrix $\\mathbf{A}$, and the concatenated original distributions $\\mathbf{b}$. We can run the linear programming optimization."
],
"metadata": {
"id": "zEuEtU33S8Ly"
}
},
{
"cell_type": "code",
"source": [
"# We don't need the constraint that p>0 as this is the default\n",
"opt = linprog(c, A_eq=A, b_eq=b)\n",
"print(opt)"
],
"metadata": {
"id": "wCfsOVbeSmF5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Extract the answer and display"
],
"metadata": {
"id": "vpkkOOI2agyl"
}
},
{
"cell_type": "code",
"source": [
"P = np.array(opt.x).reshape(10,10)\n",
"draw_2D_heatmap(P,'Transport plan $\\mathbf{P}$', my_colormap)"
],
"metadata": {
"id": "nZGfkrbRV_D0"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Compute the Wasserstein distance\n"
],
"metadata": {
"id": "ZEiRYRVgalsJ"
}
},
{
"cell_type": "code",
"source": [
"was = np.sum(P * dist_mat)\n",
"print(\"Wasserstein distance = \", was)"
],
"metadata": {
"id": "yiQ_8j-Raq3c"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO -- Compute the\n",
"\n",
"* Forward KL divergence $D_{KL}[p,q]$ between these distributions\n",
"* Reverse KL divergence $D_{KL}[q,p]$ between these distributions\n",
"* Jensen-Shannon divergence $D_{JS}[p,q]$ between these distributions\n",
"\n",
"What do you conclude?"
],
"metadata": {
"id": "zf8yTusua71s"
}
}
]
}
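The linear program set up in the notebook above can be sanity-checked on a tiny 3-bin example, where the answer is easy to verify by hand. The transport plan P is vectorized rows-first, the cost is the distance |i-j|, and the equality constraints make the rows of P sum to p and the columns to q (the 3-bin distributions here are illustrative, not the notebook's 10-bin ones).

```python
import numpy as np
from scipy.optimize import linprog

n = 3
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

# Distance matrix |i-j|, vectorized rows-first
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
c = np.abs(i - j).flatten()

# Marginal constraints A @ vec(P) = [p; q]
A = np.zeros((2 * n, n * n))
for k in range(n):
    A[k, k * n:(k + 1) * n] = 1.0      # row k of P sums to p[k]
    A[n + k, k::n] = 1.0               # column k of P sums to q[k]
b = np.hstack((p, q))

opt = linprog(c, A_eq=A, b_eq=b)       # P >= 0 is linprog's default bound
wasserstein = np.sum(opt.x * c)
print(wasserstein)
```

For these marginals the optimal cost equals the L1 distance between the two cumulative distributions, 0.3 + 0.3 + 0.0 = 0.6, which is a useful check on the constraint matrix.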


@@ -0,0 +1,235 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMJLViYIpiivB2A7YIuZmzU",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap16/16_1_1D_Normalizing_Flows.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 16.1: 1D normalizing flows**\n",
"\n",
"This notebook investigates a 1D normalizing flows example similar to that illustrated in figures 16.1 to 16.3 in the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"First we start with a base probability density function"
],
"metadata": {
"id": "IyVn-Gi-p7wf"
}
},
{
"cell_type": "code",
"source": [
"# Define the base pdf\n",
"def gauss_pdf(z, mu, sigma):\n",
"  pr_z = np.exp(-0.5 * (z-mu) * (z-mu) / (sigma * sigma))/(np.sqrt(2*np.pi) * sigma)\n",
" return pr_z"
],
"metadata": {
"id": "ZIfQwhd-AV6L"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"z = np.arange(-3,3,0.01)\n",
"pr_z = gauss_pdf(z, 0, 1)\n",
"\n",
"fig,ax = plt.subplots()\n",
"ax.plot(z, pr_z)\n",
"ax.set_xlim([-3,3])\n",
"ax.set_xlabel('$z$')\n",
"ax.set_ylabel('$Pr(z)$')\n",
"plt.show();"
],
"metadata": {
"id": "gGh8RHmFp_Ls"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's define a nonlinear function that maps from the latent space $z$ to the observed data $x$."
],
"metadata": {
"id": "wVXi5qIfrL9T"
}
},
{
"cell_type": "code",
"source": [
"# Define a function that maps from the base pdf over z to the observed space x\n",
"def f(z):\n",
" x1 = 6/(1+np.exp(-(z-0.25)*1.5))-3\n",
" x2 = z\n",
" p = z * z/9\n",
" x = (1-p) * x1 + p * x2\n",
" return x\n",
"\n",
"# Compute gradient of that function using finite differences\n",
"def df_dz(z):\n",
" return (f(z+0.0001)-f(z-0.0001))/0.0002"
],
"metadata": {
"id": "shHdgZHjp52w"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"x = f(z)\n",
"fig, ax = plt.subplots()\n",
"ax.plot(z,x)\n",
"ax.set_xlim(-3,3)\n",
"ax.set_ylim(-3,3)\n",
"ax.set_xlabel('Latent variable, $z$')\n",
"ax.set_ylabel('Observed variable, $x$')\n",
"plt.show()"
],
"metadata": {
"id": "sz7bnCLUq3Qs"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's evaluate the density in the observed space using equation 16.1"
],
"metadata": {
"id": "rmI0BbuQyXoc"
}
},
{
"cell_type": "code",
"source": [
"# TODO -- plot the density in the observed space\n",
"# Replace these lines\n",
"x = np.ones_like(z)\n",
"pr_x = np.ones_like(pr_z)\n"
],
"metadata": {
"id": "iPdiT_5gyNOD"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Plot the density in the observed space\n",
"fig,ax = plt.subplots()\n",
"ax.plot(x, pr_x)\n",
"ax.set_xlim([-3,3])\n",
"ax.set_ylim([0, 0.5])\n",
"ax.set_xlabel('$x$')\n",
"ax.set_ylabel('$Pr(x)$')\n",
"plt.show();"
],
"metadata": {
"id": "Jlks8MW3zulA"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's draw some samples from the new distribution (see section 16.1)"
],
"metadata": {
"id": "1c5rO0HHz-FV"
}
},
{
"cell_type": "code",
"source": [
"np.random.seed(1)\n",
"n_sample = 20\n",
"\n",
"# TODO -- Draw samples from the modeled density\n",
"# Replace this line\n",
"x_samples = np.ones((n_sample, 1))\n",
"\n"
],
"metadata": {
"id": "LIlTRfpZz2k_"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Draw the samples\n",
"fig,ax = plt.subplots()\n",
"ax.plot(x, pr_x)\n",
"for x_sample in x_samples:\n",
" ax.plot([x_sample, x_sample], [0,0.1], 'r-')\n",
"\n",
"ax.set_xlim([-3,3])\n",
"ax.set_ylim([0, 0.5])\n",
"ax.set_xlabel('$x$')\n",
"ax.set_ylabel('$Pr(x)$')\n",
"plt.show();"
],
"metadata": {
"id": "JS__QPNv0vUA"
},
"execution_count": null,
"outputs": []
}
]
}
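The change-of-variables rule used in the notebook above (equation 16.1 in the book) can be sketched end-to-end with a simple linear map, where the transformed density is known in closed form. The map f here is an assumption for illustration; the notebook's f[z] is nonlinear.

```python
import numpy as np

def gauss_pdf(z, mu, sigma):
    return np.exp(-0.5 * (z - mu) ** 2 / (sigma * sigma)) / (np.sqrt(2 * np.pi) * sigma)

def f(z):
    # Simple invertible example map (assumption): x = 2z + 1
    return 2.0 * z + 1.0

def df_dz(z):
    # Finite-difference derivative, as in the notebook
    return (f(z + 0.0001) - f(z - 0.0001)) / 0.0002

z = np.arange(-3, 3, 0.01)
pr_z = gauss_pdf(z, 0, 1)
x = f(z)
# Equation 16.1: Pr(x) = Pr(z) / |df/dz|, evaluated along z
pr_x = pr_z / np.abs(df_dz(z))

# Sanity check: the transformed density should integrate to roughly one
integral = np.sum(0.5 * (pr_x[1:] + pr_x[:-1]) * np.diff(x))
print(integral)
```

For this linear map the result is just a Gaussian with mean 1 and standard deviation 2, so both the shape and the normalization can be verified directly.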


@@ -0,0 +1,307 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMe8jb5kLJqkNSE/AwExTpa",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap16/16_2_Autoregressive_Flows.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 16.2: 1D autoregressive flows**\n",
"\n",
"This notebook investigates a 1D normalizing flows example similar to that illustrated in figure 16.7 in the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"First we'll define an invertible one-dimensional function as in figure 16.5"
],
"metadata": {
"id": "jTK456TUd2FV"
}
},
{
"cell_type": "code",
"source": [
"# First let's make the 1D piecewise linear mapping as illustrated in figure 16.5\n",
"def g(h, phi):\n",
" # TODO -- write this function (equation 16.12)\n",
" # Note: If you have the first printing of the book, there is a mistake in equation 16.12\n",
" # Check the errata for the correct equation (or figure it out yourself!)\n",
" # Replace this line:\n",
" h_prime = 1\n",
"\n",
"\n",
" return h_prime"
],
"metadata": {
"id": "zceww_9qFi00"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's test this out. If you managed to vectorize the routine above, then good for you\n",
"# but I'll assume you didn't and so we'll use a loop\n",
"\n",
"# Define the parameters\n",
"phi = np.array([0.2, 0.1, 0.4, 0.05, 0.25])\n",
"\n",
"# Run the function on an array\n",
"h = np.arange(0,1,0.01)\n",
"h_prime = np.zeros_like(h)\n",
"for i in range(len(h)):\n",
" h_prime[i] = g(h[i], phi)\n",
"\n",
"# Draw the function\n",
"fig, ax = plt.subplots()\n",
"ax.plot(h,h_prime, 'b-')\n",
"ax.set_xlim([0,1])\n",
"ax.set_ylim([0,1])\n",
"ax.set_xlabel('Input, $h$')\n",
"ax.set_ylabel('Output, $h^\\prime$')\n",
"plt.show()\n"
],
"metadata": {
"id": "CLXhYl9ZIuRN"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We will also need the inverse of this function"
],
"metadata": {
"id": "zOCMYC0leOyZ"
}
},
{
"cell_type": "code",
"source": [
"# Define the inverse function\n",
"def g_inverse(h_prime, phi):\n",
"  # Lots of ways to do this, but we'll just do it by bracketing\n",
" h_low = 0\n",
" h_mid = 0.5\n",
" h_high = 0.999\n",
"\n",
" thresh = 0.0001\n",
" c_iter = 0\n",
" while(c_iter < 20 and h_high - h_low > thresh):\n",
" h_prime_low = g(h_low, phi)\n",
" h_prime_mid = g(h_mid, phi)\n",
" h_prime_high = g(h_high, phi)\n",
" if h_prime_mid < h_prime:\n",
" h_low = h_mid\n",
" else:\n",
" h_high = h_mid\n",
"\n",
" h_mid = h_low+(h_high-h_low)/2\n",
" c_iter+=1\n",
"\n",
" return h_mid"
],
"metadata": {
"id": "OIqFAgobeSM8"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's define an autoregressive flow. Let's switch to looking at figure 16.7. We'll assume that our piecewise linear function uses the five parameters phi1, phi2, phi3, phi4, phi5"
],
"metadata": {
"id": "t8XPxipfd7hz"
}
},
{
"cell_type": "code",
"source": [
"\n",
"def ReLU(preactivation):\n",
" activation = preactivation.clip(0.0)\n",
" return activation\n",
"\n",
"def softmax(x):\n",
" x = np.exp(x) ;\n",
" x = x/ np.sum(x) ;\n",
" return x\n",
"\n",
"# Return value of phi that doesn't depend on any of the inputs\n",
"def get_phi():\n",
" return np.array([0.2, 0.1, 0.4, 0.05, 0.25])\n",
"\n",
"# Compute values of phi that depend on h1\n",
"def shallow_network_phi_h1(h1, n_hidden=10):\n",
" # For neatness of code, we'll just define the parameters in the network definition itself\n",
" n_input = 1\n",
" np.random.seed(n_input)\n",
" beta0 = np.random.normal(size=(n_hidden,1))\n",
" Omega0 = np.random.normal(size=(n_hidden, n_input))\n",
" beta1 = np.random.normal(size=(5,1))\n",
" Omega1 = np.random.normal(size=(5, n_hidden))\n",
" return softmax(beta1 + Omega1 @ ReLU(beta0 + Omega0 @ np.array([[h1]])))\n",
"\n",
"# Compute values of phi that depend on h1 and h2\n",
"def shallow_network_phi_h1h2(h1,h2,n_hidden=10):\n",
" # For neatness of code, we'll just define the parameters in the network definition itself\n",
" n_input = 2\n",
" np.random.seed(n_input)\n",
" beta0 = np.random.normal(size=(n_hidden,1))\n",
" Omega0 = np.random.normal(size=(n_hidden, n_input))\n",
" beta1 = np.random.normal(size=(5,1))\n",
" Omega1 = np.random.normal(size=(5, n_hidden))\n",
" return softmax(beta1 + Omega1 @ ReLU(beta0 + Omega0 @ np.array([[h1],[h2]])))\n",
"\n",
"# Compute values of phi that depend on h1, h2, and h3\n",
"def shallow_network_phi_h1h2h3(h1,h2,h3, n_hidden=10):\n",
" # For neatness of code, we'll just define the parameters in the network definition itself\n",
" n_input = 3\n",
" np.random.seed(n_input)\n",
" beta0 = np.random.normal(size=(n_hidden,1))\n",
" Omega0 = np.random.normal(size=(n_hidden, n_input))\n",
" beta1 = np.random.normal(size=(5,1))\n",
" Omega1 = np.random.normal(size=(5, n_hidden))\n",
" return softmax(beta1 + Omega1 @ ReLU(beta0 + Omega0 @ np.array([[h1],[h2],[h3]])))"
],
"metadata": {
"id": "PnHGlZtcNEAI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The forward mapping as shown in figure 16.7a"
],
"metadata": {
"id": "8fXeG4V44GVH"
}
},
{
"cell_type": "code",
"source": [
"def forward_mapping(h1,h2,h3,h4):\n",
" #TODO implement the forward mapping\n",
" #Replace this line:\n",
" h_prime1 = 0 ; h_prime2=0; h_prime3=0; h_prime4 = 0\n",
"\n",
" return h_prime1, h_prime2, h_prime3, h_prime4"
],
"metadata": {
"id": "N1zjnIoX0TRP"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The backward mapping as shown in figure 16.7b"
],
"metadata": {
"id": "H8vQfFwI4L7r"
}
},
{
"cell_type": "code",
"source": [
"def backward_mapping(h1_prime,h2_prime,h3_prime,h4_prime):\n",
" #TODO implement the backward mapping\n",
" #Replace this line:\n",
" h1=0; h2=0; h3=0; h4 = 0\n",
"\n",
" return h1,h2,h3,h4"
],
"metadata": {
"id": "HNcQTiVE4DMJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Finally, let's make sure that the network really can be inverted"
],
"metadata": {
"id": "W2IxFkuyZJyn"
}
},
{
"cell_type": "code",
"source": [
"# Test the network to see if it does invert correctly\n",
"h1 = 0.22; h2 = 0.41; h3 = 0.83; h4 = 0.53\n",
"print(\"Original h values %3.3f,%3.3f,%3.3f,%3.3f\"%(h1,h2,h3,h4))\n",
"h1_prime, h2_prime, h3_prime, h4_prime = forward_mapping(h1,h2,h3,h4)\n",
"print(\"h_prime values %3.3f,%3.3f,%3.3f,%3.3f\"%(h1_prime,h2_prime,h3_prime,h4_prime))\n",
"h1,h2,h3,h4 = backward_mapping(h1_prime,h2_prime,h3_prime,h4_prime)\n",
"print(\"Reconstructed h values %3.3f,%3.3f,%3.3f,%3.3f\"%(h1,h2,h3,h4))"
],
"metadata": {
"id": "RT7qvEFp700I"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "sDknSPMLZmzh"
},
"execution_count": null,
"outputs": []
}
]
}
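The bracketing idea behind g_inverse in the notebook above generalizes to any increasing 1D function: repeatedly halve an interval that is known to contain the solution. The cubic used here is an assumption for illustration, not the notebook's g.

```python
def invert_by_bisection(g, y, lo=0.0, hi=1.0, n_iter=40):
    # Assumes g is increasing on [lo, hi] with g(lo) <= y <= g(hi)
    for _ in range(n_iter):
        mid = lo + (hi - lo) / 2
        if g(mid) < y:
            lo = mid        # solution lies in the upper half
        else:
            hi = mid        # solution lies in the lower half
    return lo + (hi - lo) / 2

g = lambda h: h ** 3        # monotonic example function
h = invert_by_bisection(g, 0.125)
print(h)
```

Each iteration halves the bracket, so 40 iterations pin the answer down to about 1e-12; the notebook's version adds an early-exit threshold instead of a fixed iteration count.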


@@ -0,0 +1,294 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyNeCWINUqqUGKMcxsqPFTAh",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap16/16_3_Contraction_Mappings.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 16.3: Contraction mappings**\n",
"\n",
"This notebook investigates a 1D normalizing flows example similar to that illustrated in figure 16.9 in the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define a function that is a contraction mapping\n",
"def f(z):\n",
" return 0.3 + 0.5 *z + 0.02 * np.sin(z*15)"
],
"metadata": {
"id": "4Pfz2KSghdVI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def draw_function(f, fixed_point=None):\n",
" z = np.arange(0,1,0.01)\n",
" z_prime = f(z)\n",
"\n",
" # Draw this function\n",
" fig, ax = plt.subplots()\n",
" ax.plot(z, z_prime,'c-')\n",
" ax.plot([0,1],[0,1],'k--')\n",
"  if fixed_point is not None:\n",
" ax.plot(fixed_point, fixed_point, 'ro')\n",
" ax.set_xlim(0,1)\n",
" ax.set_ylim(0,1)\n",
" ax.set_xlabel('Input, $z$')\n",
" ax.set_ylabel('Output, f$[z]$')\n",
" plt.show()"
],
"metadata": {
"id": "zEwCbIx0hpAI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"draw_function(f)"
],
"metadata": {
"id": "k4e5Yu0fl8bz"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's find where $\\mbox{f}[z]=z$ using fixed point iteration"
],
"metadata": {
"id": "DfgKrpCAjnol"
}
},
{
"cell_type": "code",
"source": [
"# Takes a function f and a starting point z\n",
"def fixed_point_iteration(f, z0):\n",
" # TODO -- write this function\n",
" # Print out the iterations as you go, so you can see the progress\n",
" # Set the maximum number of iterations to 20\n",
" # Replace this line\n",
" z_out = 0.5;\n",
"\n",
"\n",
"\n",
" return z_out"
],
"metadata": {
"id": "bAOBvZT-j3lv"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's test that and plot the solution"
],
"metadata": {
"id": "CAS0lgIomAa0"
}
},
{
"cell_type": "code",
"source": [
"# Now let's test that\n",
"z = fixed_point_iteration(f, 0.2)\n",
"draw_function(f, z)"
],
"metadata": {
"id": "EYQZJdNPk8Lg"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's define another function\n",
"def f2(z):\n",
"  return 0.7 - 0.6 * z + 0.03 * np.sin(z*15)\n",
"draw_function(f2)"
],
"metadata": {
"id": "4DipPiqVlnwJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Now let's test that\n",
"# TODO Before running this code, predict what you think will happen\n",
"z = fixed_point_iteration(f2, 0.9)\n",
"draw_function(f2, z)"
],
"metadata": {
"id": "tYOdbWcomdEE"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's define another function\n",
"def f3(z):\n",
" return -0.2 + 1.5 *z + 0.1 * np.sin(z*15)\n",
"draw_function(f3)"
],
"metadata": {
"id": "Mni37RUpmrIu"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Now let's test that\n",
"# TODO Before running this code, predict what you think will happen\n",
"z = fixed_point_iteration(f3, 0.7)\n",
"draw_function(f3, z)"
],
"metadata": {
"id": "agt5mfJrnM1O"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Finally, let's invert a problem of the form $y = z + f[z]$ for a given value of $y$. What is the $z$ that maps to it?"
],
"metadata": {
"id": "n6GI46-ZoQz6"
}
},
{
"cell_type": "code",
"source": [
"def f4(z):\n",
" return -0.3 + 0.5 *z + 0.02 * np.sin(z*15)"
],
"metadata": {
"id": "dy6r3jr9rjPf"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def fixed_point_iteration_z_plus_f(f, y, z0):\n",
" # TODO -- write this function\n",
" # Replace this line\n",
" z_out = 1\n",
"\n",
" return z_out"
],
"metadata": {
"id": "GMX64Iz0nl-B"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def draw_function2(f, y, fixed_point=None):\n",
" z = np.arange(0,1,0.01)\n",
" z_prime = z+f(z)\n",
"\n",
" # Draw this function\n",
" fig, ax = plt.subplots()\n",
" ax.plot(z, z_prime,'c-')\n",
" ax.plot(z, y-f(z),'r-')\n",
" ax.plot([0,1],[0,1],'k--')\n",
"  if fixed_point is not None:\n",
" ax.plot(fixed_point, y, 'ro')\n",
" ax.set_xlim(0,1)\n",
" ax.set_ylim(0,1)\n",
" ax.set_xlabel('Input, $z$')\n",
" ax.set_ylabel('Output, z+f$[z]$')\n",
" plt.show()"
],
"metadata": {
"id": "uXxKHad5qT8Y"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Test this out and draw\n",
"y = 0.8\n",
"z = fixed_point_iteration_z_plus_f(f4,y,0.2)\n",
"draw_function2(f4,y,z)\n",
"# If you have done this correctly, the red dot should be\n",
"# where the cyan curve has a y value of 0.8"
],
"metadata": {
"id": "mNEBXC3Aqd_1"
},
"execution_count": null,
"outputs": []
}
]
}
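The fixed-point iteration the notebook above asks for can be sketched directly on its first example function: because |df/dz| = |0.5 + 0.3 cos(15z)| <= 0.8 < 1, each application of f shrinks the distance to the fixed point, so simply iterating z <- f[z] converges.

```python
import numpy as np

def f(z):
    # The notebook's first contraction mapping
    return 0.3 + 0.5 * z + 0.02 * np.sin(z * 15)

z = 0.2
for _ in range(100):
    z = f(z)        # each step contracts toward the fixed point
print(z, f(z))      # z should now satisfy f(z) ~= z
```

The same loop diverges or oscillates for f3 in the notebook, whose slope exceeds one in places, which is exactly the prediction the TODO cells invite you to make.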


@@ -7,7 +7,7 @@ To be published by MIT Press Dec 5th 2023.<br>
 <h2> Download draft PDF </h2>
-<a href="https://github.com/udlbook/udlbook/releases/download/v1.1.1/UnderstandingDeepLearning_26_07_23_C.pdf">Draft PDF Chapters 1-21</a><br> 2023-07-26. CC-BY-NC-ND license
+<a href="https://github.com/udlbook/udlbook/releases/download/v1.1.3/UnderstandingDeepLearning_01_10_23_C.pdf">Draft PDF Chapters 1-21</a><br> 2023-10-01. CC-BY-NC-ND license
 <br>
 <img src="https://img.shields.io/github/downloads/udlbook/udlbook/total" alt="download stats shield">
 <br>
@@ -74,7 +74,7 @@ To be published by MIT Press Dec 5th 2023.<br>
 <li> Appendices - <a href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLAppendixPDF.zip">PDF Figures</a> / <a href="https://drive.google.com/uc?export=download&id=1k2j7hMN40ISPSg9skFYWFL3oZT7r8v-l"> SVG Figures</a> / <a href="https://docs.google.com/presentation/d/1_2cJHRnsoQQHst0rwZssv-XH4o5SEHks/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">Powerpoint Figures</a>
 </ul>
-Instructions for editing figures / equations can be found <a href="https://drive.google.com/uc?export=download&id=1T_MXXVR4AfyMnlEFI-UVDh--FXI5deAp/">here</a>.</p>
+Instructions for editing figures / equations can be found <a href="https://drive.google.com/file/d/1T_MXXVR4AfyMnlEFI-UVDh--FXI5deAp/view?usp=sharing">here</a>.</p>
 <h2>Resources for students</h2>
@@ -116,15 +116,15 @@ Instructions for editing figures / equations can be found <a href="https://drive
 <li> Notebook 10.3 - 2D convolution: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap10/10_3_2D_Convolution.ipynb">ipynb/colab </a>
 <li> Notebook 10.4 - Downsampling & upsampling: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap10/10_4_Downsampling_and_Upsampling.ipynb">ipynb/colab </a>
 <li> Notebook 10.5 - Convolution for MNIST: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap10/10_5_Convolution_For_MNIST.ipynb">ipynb/colab </a>
-<li> Notebook 11.1 - Shattered gradients: (coming soon)
-<li> Notebook 11.2 - Residual networks: (coming soon)
-<li> Notebook 11.3 - Batch normalization: (coming soon)
-<li> Notebook 12.1 - Self-attention: (coming soon)
-<li> Notebook 12.2 - Multi-head self-attention: (coming soon)
-<li> Notebook 12.3 - Tokenization: (coming soon)
+<li> Notebook 11.1 - Shattered gradients: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap11/11_1_Shattered_Gradients.ipynb">ipynb/colab </a>
+<li> Notebook 11.2 - Residual networks: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap11/11_2_Residual_Networks.ipynb">ipynb/colab </a>
+<li> Notebook 11.3 - Batch normalization: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap11/11_3_Batch_Normalization.ipynb">ipynb/colab </a>
+<li> Notebook 12.1 - Self-attention: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap12/12_1_Self_Attention.ipynb">ipynb/colab </a>
+<li> Notebook 12.2 - Multi-head self-attention: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap12/12_2_Multihead_Self_Attention.ipynb">ipynb/colab </a>
+<li> Notebook 12.3 - Tokenization: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap12/12_3_Tokenization.ipynb">ipynb/colab </a>
 <li> Notebook 12.4 - Decoding strategies: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap12/12_4_Decoding_Strategies.ipynb">ipynb/colab </a>
-<li> Notebook 13.1 - Encoding graphs: (coming soon)
-<li> Notebook 13.2 - Graph classification : (coming soon)
+<li> Notebook 13.1 - Encoding graphs: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap13/13_1_Graph_Representation.ipynb">ipynb/colab </a>
+<li> Notebook 13.2 - Graph classification : <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap13/13_2_Graph_Classification.ipynb">ipynb/colab </a>
 <li> Notebook 13.3 - Neighborhood sampling: (coming soon)
 <li> Notebook 13.4 - Graph attention: (coming soon)
 <li> Notebook 15.1 - GAN toy example: (coming soon)