Compare commits

263 Commits

| Author | SHA1 | Date |
|---|---|---|
| | 5302b32929 | |
| | d5586e57fc | |
| | d0acc42d81 | |
| | f3188ac35a | |
| | ad1b6a558b | |
| | 7eadd56eaa | |
| | 53c1357df7 | |
| | 8d862ede26 | |
| | 44bbfbed91 | |
| | f65f0b1ddf | |
| | 1d6d6b6fbe | |
| | 62779ec260 | |
| | be3edb60f9 | |
| | b9403e091b | |
| | 2c916d9a87 | |
| | 310b71e203 | |
| | fcb1333aed | |
| | c39267b3b4 | |
| | 4291ed453c | |
| | ab2ff3177a | |
| | c2a4d40da3 | |
| | aa75d3ad73 | |
| | 1f0c224a7d | |
| | eb29a28284 | |
| | 7648203767 | |
| | 64e1d82d04 | |
| | f7450d1875 | |
| | 884a7e358b | |
| | 2016977f30 | |
| | f88127c0d2 | |
| | a637eec888 | |
| | ddd6bf9149 | |
| | 0b41646bf3 | |
| | 16afbcdf83 | |
| | b0add1f8e2 | |
| | 03ebe5a039 | |
| | 41e8262f20 | |
| | 2c6e1cb9f8 | |
| | 6c99c6b7eb | |
| | 0988ae8bd0 | |
| | 2cca6dec75 | |
| | 49d74b66a9 | |
| | 13c0ad30fe | |
| | 95549683c4 | |
| | 9649ce382b | |
| | 666cbb02d5 | |
| | f0337130cb | |
| | 472571aef0 | |
| | 13b39c2f72 | |
| | 84a11d68ed | |
| | 653d2f7b84 | |
| | a7ed3e2c34 | |
| | 40a2c3ca8b | |
| | fb66cd682d | |
| | 88e8526fa7 | |
| | 667346fbdd | |
| | 4e564088a1 | |
| | f1c07f53bf | |
| | 623b9782e7 | |
| | 60c5a48477 | |
| | b4688bda68 | |
| | faf34e0887 | |
| | 8f2ef53eab | |
| | 2f0339341c | |
| | f8acbaab82 | |
| | 2aaaef0838 | |
| | 9a2039d392 | |
| | 6d76e47849 | |
| | b5c65665b6 | |
| | dd9a56d96b | |
| | 9b71ac0487 | |
| | eaff933ff7 | |
| | c3dfe95700 | |
| | 7082ae8620 | |
| | 6072ad4450 | |
| | 33197fde36 | |
| | 6d425c04d4 | |
| | 57c95132d3 | |
| | 2b0ac95740 | |
| | d5f198f2d8 | |
| | 4edd8c923d | |
| | 1adb96e006 | |
| | 3801b8d52d | |
| | dc6b346bda | |
| | 5eb264540d | |
| | 7ba844f2b5 | |
| | be86733a93 | |
| | d101aa428b | |
| | 8c6e40daee | |
| | efafb942eb | |
| | b10a2b6940 | |
| | ede7247a0c | |
| | c3b97af456 | |
| | e1df2156a3 | |
| | f887835646 | |
| | e9c8d846f2 | |
| | b7869e8b41 | |
| | 747ec9efe1 | |
| | 58dfb0390c | |
| | 3aeb8db4cd | |
| | 305a055079 | |
| | 87cf590af9 | |
| | ccedbb72e7 | |
| | b423a67855 | |
| | 3c8dab14e6 | |
| | ab73ae785b | |
| | df86bbba04 | |
| | a9868e6da8 | |
| | fed3962bce | |
| | c5fafbca97 | |
| | 5f16e0f9bc | |
| | 121c81a04e | |
| | e968741846 | |
| | 37011065d7 | |
| | afd20d0364 | |
| | 0d135f1ee7 | |
| | 54a020304e | |
| | ccbbc4126e | |
| | d3273c99e2 | |
| | f9e45c976c | |
| | b005cec9c1 | |
| | b8a91ad34d | |
| | a2a86c27bc | |
| | d80d04c2d4 | |
| | c1f0181653 | |
| | 6e18234d24 | |
| | 5730c05547 | |
| | ccb80c16b8 | |
| | 87387b2b4c | |
| | 06eaec9749 | |
| | 9aeda14efa | |
| | d1df6426b2 | |
| | 43b8fa3685 | |
| | ca6e4b29ac | |
| | 267d6ccb7f | |
| | 735947b728 | |
| | 251aef1876 | |
| | 07ff6c06b1 | |
| | 29e4cec04e | |
| | c3ce38410c | |
| | 646e60ed95 | |
| | 5e61bcf694 | |
| | 54399a3c68 | |
| | 3926ff41ea | |
| | 9c34bfed02 | |
| | 9176623331 | |
| | 5534df187e | |
| | 9b58b2862f | |
| | 2070ac4400 | |
| | 393e4907dc | |
| | e850676722 | |
| | 796f17ed90 | |
| | dc0301a86e | |
| | 813f628e4e | |
| | 3ae7d68f6e | |
| | a96a14999f | |
| | f91e878eef | |
| | 9b89499b75 | |
| | 7d6ac5e34f | |
| | 55dbe7e0c4 | |
| | 1cf21ea61a | |
| | e4191beb79 | |
| | 10b9dea9a4 | |
| | 414eeb3557 | |
| | f126809572 | |
| | 2a30c49d22 | |
| | bb32fe0cdf | |
| | 1ee756cf9a | |
| | 742d922ce7 | |
| | c02eea499c | |
| | cb94b61abd | |
| | 447bb82e2f | |
| | 77da5694bb | |
| | 96c7e41c9d | |
| | 625d1e29bb | |
| | 3cf0c4c418 | |
| | 03c92541ad | |
| | def3e5234b | |
| | 815adb9b21 | |
| | 5ba28e5b56 | |
| | 8566a7322f | |
| | c867e67e8c | |
| | cba27b3da4 | |
| | 1c706bd058 | |
| | 72514994bf | |
| | 872926c17e | |
| | 0dfeb169be | |
| | 89a0532283 | |
| | af5a719496 | |
| | 56c31efc90 | |
| | 06fc37c243 | |
| | 45793f02f8 | |
| | 7c4cc1ddb4 | |
| | 35b6f67bbf | |
| | 194baf622a | |
| | a547fee3f4 | |
| | ea4858e78e | |
| | 444b06d5c2 | |
| | 98bce9edb5 | |
| | 37e9ae2311 | |
| | ea1b6ad998 | |
| | d17a5a3872 | |
| | 3e7e059bff | |
| | 445ad11c46 | |
| | 6928b50966 | |
| | e1d34ed561 | |
| | f3528f758b | |
| | 5c7a03172a | |
| | 0233131b07 | |
| | 8200299e64 | |
| | 2ac42e70d3 | |
| | dd0eaeb781 | |
| | 2cdff544f3 | |
| | 384e122c5f | |
| | 1343b68c60 | |
| | 30420a2f92 | |
| | 89e8ebcbc5 | |
| | 14b751ff47 | |
| | 80e99ef2da | |
| | 46214f64bc | |
| | c875fb0361 | |
| | 451ccc0832 | |
| | 4b939b7426 | |
| | 2d300a16a1 | |
| | d057548be9 | |
| | 75976a32d0 | |
| | 48b204df2c | |
| | 9b68e6a8e6 | |
| | 862ac6e4d3 | |
| | 8fe07cf0fb | |
| | c9679dee90 | |
| | 90d879494f | |
| | 19bdc23674 | |
| | d7f9929a3c | |
| | a7ac089fc0 | |
| | 8fd753d191 | |
| | 51424b57bd | |
| | 80732b29bc | |
| | 36e3a53764 | |
| | 569749963b | |
| | d17e47421b | |
| | e8fca0cb0a | |
| | 19c0c7ab3e | |
| | 418ea93e83 | |
| | ea248af22f | |
| | 5492ed0ee5 | |
| | d9138d6177 | |
| | a5413d6a15 | |
| | faf53a49a0 | |
| | 7e41097381 | |
| | 72b2d79ec7 | |
| | d81bef8a6e | |
| | 911da8ca58 | |
| | 031401a3dd | |
| | 4652f90f09 | |
| | 5f524edd3b | |
| | 7a423507f5 | |
| | 4a5bd9c4d5 | |
| | c0cd9c2aea | |
| | 924b6e220d | |
| | b535a13d57 | |
| | d0d413b9f6 | |
| | 1b53be1e08 | |
.editorconfig (new file, 10 lines added)
@@ -0,0 +1,10 @@
root = true

[*.{js,jsx,ts,tsx,md,mdx,json,cjs,mjs,css}]
indent_style = space
indent_size = 4
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
max_line_length = 100
.eslintrc.cjs (new file, 18 lines added)
@@ -0,0 +1,18 @@
module.exports = {
    root: true,
    env: { browser: true, es2020: true, node: true },
    extends: [
        "eslint:recommended",
        "plugin:react/recommended",
        "plugin:react/jsx-runtime",
        "plugin:react-hooks/recommended",
    ],
    ignorePatterns: ["build", ".eslintrc.cjs"],
    parserOptions: { ecmaVersion: "latest", sourceType: "module" },
    settings: { react: { version: "18.2" } },
    plugins: ["react-refresh"],
    rules: {
        "react/jsx-no-target-blank": "off",
        "react-refresh/only-export-components": ["warn", { allowConstantExport: true }],
    },
};
.gitignore (vendored, executable; new file, 30 lines added)
@@ -0,0 +1,30 @@
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.

# dependencies
/node_modules
/.pnp
.pnp.js

# testing
/coverage

# production
/dist

# ENV
.env.local
.env.development.local
.env.test.local
.env.production.local

# debug
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# IDE
.idea
.vscode

# macOS
.DS_Store
.prettierignore (new file, 7 lines added)
@@ -0,0 +1,7 @@
# ignore these directories when formatting the repo
/Blogs
/CM20315
/CM20315_2023
/Notebooks
/PDFFigures
/Slides
.prettierrc.cjs (new file, 14 lines added)
@@ -0,0 +1,14 @@
/** @type {import("prettier").Config} */
const prettierConfig = {
    trailingComma: "all",
    tabWidth: 4,
    useTabs: false,
    semi: true,
    singleQuote: false,
    bracketSpacing: true,
    printWidth: 100,
    endOfLine: "lf",
    plugins: [require.resolve("prettier-plugin-organize-imports")],
};

module.exports = prettierConfig;
Blogs/BorealisBayesianFunction.ipynb (new file, 1097 lines added)
File diff suppressed because one or more lines are too long

Blogs/BorealisBayesianParameter.ipynb (new file, 519 lines added)
File diff suppressed because one or more lines are too long
@@ -31,7 +31,7 @@
 "source": [
 "# Gradient flow\n",
 "\n",
-"This notebook replicates some of the results in the the Borealis AI [blog](https://www.borealisai.com/research-blogs/gradient-flow/) on gradient flow. \n"
+"This notebook replicates some of the results in the Borealis AI [blog](https://www.borealisai.com/research-blogs/gradient-flow/) on gradient flow. \n"
 ],
 "metadata": {
 "id": "ucrRRJ4dq8_d"
@@ -166,7 +166,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"Routines to calculate the empirical and analytical NTK (i.e. the NTK with infinite hidden units) for the the shallow network"
+"Routines to calculate the empirical and analytical NTK (i.e. the NTK with infinite hidden units) for the shallow network"
 ],
 "metadata": {
 "id": "mxW8E5kYIzlj"
Blogs/BorealisODENumerical.ipynb (new file, 432 lines added)
@@ -0,0 +1,432 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Blogs/BorealisODENumerical.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JXsO7ce7oqeq"
},
"source": [
"# Numerical methods for ODEs\n",
"\n",
"This blog contains code that accompanies the RBC Borealis blog on numerical methods for ODEs. Contact udlbookmail@gmail.com if you find any problems."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AnvAKtP_oqes"
},
"source": [
"Import relevant libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UF-gJyZggyrl"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "szWLVrSSoqet"
},
"source": [
"Define the ODE that we will be experimenting with."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NkrGZLL6iM3P"
},
"outputs": [],
"source": [
"# The ODE that we will experiment with\n",
"def ode_lin_homog(t,x):\n",
"  return 0.5 * x ;\n",
"\n",
"# The derivative of the ODE function with respect to x (needed for Taylor's method)\n",
"def ode_lin_homog_deriv_x(t,x):\n",
"  return 0.5 ;\n",
"\n",
"# The derivative of the ODE function with respect to t (needed for Taylor's method)\n",
"def ode_lin_homog_deriv_t(t,x):\n",
"  return 0.0 ;\n",
"\n",
"# The closed form solution (so we can measure the error)\n",
"def ode_lin_homog_soln(t,C=0.5):\n",
"  return C * np.exp(0.5 * t) ;"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "In1C9wZkoqet"
},
"source": [
"This is a generic method that runs the numerical methods. It takes the initial conditions ($t_0$, $x_0$), the final time $t_1$ and the step size $h$. It also takes the ODE function itself and its derivatives (only used for Taylor's method). Finally, the parameter \"step_function\" is the method used to update (e.g., Euler's method, Runge-Kutta 4-step)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "VZfZDJAfmyrf"
},
"outputs": [],
"source": [
"def run_numerical(x_0, t_0, t_1, h, ode_func, ode_func_deriv_x, ode_func_deriv_t, ode_soln, step_function):\n",
"  x = [x_0]\n",
"  t = [t_0]\n",
"  while (t[-1] <= t_1):\n",
"    x = x+[step_function(x[-1],t[-1],h, ode_func, ode_func_deriv_x, ode_func_deriv_t)]\n",
"    t = t + [t[-1]+h]\n",
"\n",
"  # Returns x,y plot plus total numerical error at last point.\n",
"  return t, x, np.abs(ode_soln(t[-1])-x[-1])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Vfkc3-_7oqet"
},
"source": [
"Run the numerical method with step sizes of 2.0, 1.0, 0.5, 0.25, 0.125, 0.0675 and plot the results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1tyGbMZhoqeu"
},
"outputs": [],
"source": [
"def run_and_plot(ode, ode_deriv_x, ode_deriv_t, ode_solution, step_function):\n",
"  # Specify the grid of points to draw the ODE\n",
"  t = np.arange(0.04, 4.0, 0.2)\n",
"  x = np.arange(0.04, 4.0, 0.2)\n",
"  T, X = np.meshgrid(t,x)\n",
"\n",
"  # ODE equation at these grid points (used to draw quiver-plot)\n",
"  dx = ode(T,X)\n",
"  dt = np.ones(dx.shape)\n",
"\n",
"  # The ground truth solution\n",
"  t2= np.arange(0,10,0.1)\n",
"  x2 = ode_solution(t2)\n",
"\n",
"  #####################################x_0, t_0, t_1, h #################################################\n",
"  t_sim1,x_sim1,error1 = run_numerical(0.5, 0.0, 4.0, 2.0000, ode, ode_deriv_x, ode_deriv_t, ode_solution, step_function)\n",
"  t_sim2,x_sim2,error2 = run_numerical(0.5, 0.0, 4.0, 1.0000, ode, ode_deriv_x, ode_deriv_t, ode_solution, step_function)\n",
"  t_sim3,x_sim3,error3 = run_numerical(0.5, 0.0, 4.0, 0.5000, ode, ode_deriv_x, ode_deriv_t, ode_solution, step_function)\n",
"  t_sim4,x_sim4,error4 = run_numerical(0.5, 0.0, 4.0, 0.2500, ode, ode_deriv_x, ode_deriv_t, ode_solution, step_function)\n",
"  t_sim5,x_sim5,error5 = run_numerical(0.5, 0.0, 4.0, 0.1250, ode, ode_deriv_x, ode_deriv_t, ode_solution, step_function)\n",
"  t_sim6,x_sim6,error6 = run_numerical(0.5, 0.0, 4.0, 0.0675, ode, ode_deriv_x, ode_deriv_t, ode_solution, step_function)\n",
"\n",
"  # Plot the ODE and ground truth solution\n",
"  fig,ax = plt.subplots()\n",
"  ax.quiver(T,X,dt,dx, scale=35.0)\n",
"  ax.plot(t2,x2,'r-')\n",
"\n",
"  # Plot the numerical approximations\n",
"  ax.plot(t_sim1,x_sim1,'.-',markeredgecolor='#773c23ff',markerfacecolor='#d18362', color='#d18362', markersize=10)\n",
"  ax.plot(t_sim2,x_sim2,'.-',markeredgecolor='#773c23ff',markerfacecolor='#d18362', color='#d18362', markersize=10)\n",
"  ax.plot(t_sim3,x_sim3,'.-',markeredgecolor='#773c23ff',markerfacecolor='#d18362', color='#d18362', markersize=10)\n",
"  ax.plot(t_sim4,x_sim4,'.-',markeredgecolor='#773c23ff',markerfacecolor='#d18362', color='#d18362', markersize=10)\n",
"  ax.plot(t_sim5,x_sim5,'.-',markeredgecolor='#773c23ff',markerfacecolor='#d18362', color='#d18362', markersize=10)\n",
"  ax.plot(t_sim6,x_sim6,'.-',markeredgecolor='#773c23ff',markerfacecolor='#d18362', color='#d18362', markersize=10)\n",
"\n",
"  ax.set_aspect('equal')\n",
"  ax.set_xlim(0,4)\n",
"  ax.set_ylim(0,4)\n",
"\n",
"  plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JYrq8QIwvOIy"
},
"source": [
"# Euler Method\n",
"\n",
"Define the Euler method and set up functions for plotting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "N73xMnCukVVX"
},
"outputs": [],
"source": [
"def euler_step(x_0, t_0, h, ode_func, ode_func_deriv_x=None, ode_func_deriv_t=None):\n",
"  return x_0 + h * ode_func(t_0, x_0) ;"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4B1_PGEcsZ9H"
},
"outputs": [],
"source": [
"run_and_plot(ode_lin_homog, None, None, ode_lin_homog_soln, euler_step)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FfwNihtkvJeX"
},
"source": [
"# Heun's Method"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "srHfNDcDxI1o"
},
"outputs": [],
"source": [
"def heun_step(x_0, t_0, h, ode_func, ode_func_deriv_x=None, ode_func_deriv_t=None):\n",
"  f_x0_t0 = ode_func(t_0, x_0)\n",
"  return x_0 + h/2 * ( f_x0_t0 + ode_func(t_0+h, x_0+h*f_x0_t0)) ;"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "WOApHz9xoqev"
},
"outputs": [],
"source": [
"run_and_plot(ode_lin_homog, None, None, ode_lin_homog_soln, heun_step)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0XSzzFDIvRhm"
},
"source": [
"# Modified Euler method"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fSXprgVJ5Yep"
},
"outputs": [],
"source": [
"def modified_euler_step(x_0, t_0, h, ode_func, ode_func_deriv_x=None, ode_func_deriv_t=None):\n",
"  f_x0_t0 = ode_func(t_0, x_0)\n",
"  return x_0 + h * ode_func(t_0+h/2, x_0+ h * f_x0_t0/2) ;"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8LKSrCD2oqev"
},
"outputs": [],
"source": [
"run_and_plot(ode_lin_homog, None, None, ode_lin_homog_soln, modified_euler_step)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yp8ZBpwooqev"
},
"source": [
"# Second order Taylor's method"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NtBBgzWLoqev"
},
"outputs": [],
"source": [
"def taylor_2nd_order(x_0, t_0, h, ode_func, ode_func_deriv_x, ode_func_deriv_t):\n",
"  f1 = ode_func(t_0, x_0)\n",
"  return x_0 + h * f1 + (h*h/2) * (ode_func_deriv_x(t_0,x_0) * f1 + ode_func_deriv_t(t_0, x_0))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ioeeIohUoqev"
},
"outputs": [],
"source": [
"run_and_plot(ode_lin_homog, ode_lin_homog_deriv_x, ode_lin_homog_deriv_t, ode_lin_homog_soln, taylor_2nd_order)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WcuhV5lL1zAJ"
},
"source": [
"# Fourth Order Runge Kutta"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0NZN81Bpwu56"
},
"outputs": [],
"source": [
"def runge_kutta_4_step(x_0, t_0, h, ode_func, ode_func_deriv_x=None, ode_func_deriv_t=None):\n",
"  f1 = ode_func(t_0, x_0)\n",
"  f2 = ode_func(t_0+h/2,x_0+f1 * h/2)\n",
"  f3 = ode_func(t_0+h/2,x_0+f2 * h/2)\n",
"  f4 = ode_func(t_0+h, x_0+ f3*h)\n",
"  return x_0 + (h/6) * (f1 + 2*f2 + 2*f3+f4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "K-OxE9E6oqew"
},
"outputs": [],
"source": [
"run_and_plot(ode_lin_homog, None, None, ode_lin_homog_soln, runge_kutta_4_step)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7JifxBhhoqew"
},
"source": [
"# Plot the error as a function of step size"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZoEpmlCfsi9P"
},
"outputs": [],
"source": [
"# Run systematically with a number of different step sizes and store errors for each\n",
"def get_errors(ode, ode_deriv_x, ode_deriv_t, ode_solution, step_function):\n",
"  # Choose the step size h to divide the plotting interval into 1,2,4,8... segments.\n",
"  # The plots in the article add a few more smaller step sizes, but this takes a while to compute.\n",
"  # Add them back in if you want the full plot.\n",
"  all_h = (1./np.array([1,2,4,8,16,32,64,128,256,512,1024,2048,4096])).tolist()\n",
"  all_err = []\n",
"\n",
"  for i in range(len(all_h)):\n",
"    t_sim,x_sim,err = run_numerical(0.5, 0.0, 4.0, all_h[i], ode, ode_deriv_x, ode_deriv_t, ode_solution, step_function)\n",
"    all_err = all_err + [err]\n",
"\n",
"  return all_h, all_err"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "X0O0KK47xF28"
},
"outputs": [],
"source": [
"# Plot the errors\n",
"all_h, all_err_euler = get_errors(ode_lin_homog, ode_lin_homog_deriv_x, ode_lin_homog_deriv_t, ode_lin_homog_soln, euler_step)\n",
"all_h, all_err_heun = get_errors(ode_lin_homog, ode_lin_homog_deriv_x, ode_lin_homog_deriv_t, ode_lin_homog_soln, heun_step)\n",
"all_h, all_err_mod_euler = get_errors(ode_lin_homog, ode_lin_homog_deriv_x, ode_lin_homog_deriv_t, ode_lin_homog_soln, modified_euler_step)\n",
"all_h, all_err_taylor = get_errors(ode_lin_homog, ode_lin_homog_deriv_x, ode_lin_homog_deriv_t, ode_lin_homog_soln, taylor_2nd_order)\n",
"all_h, all_err_rk = get_errors(ode_lin_homog, ode_lin_homog_deriv_x, ode_lin_homog_deriv_t, ode_lin_homog_soln, runge_kutta_4_step)\n",
"\n",
"\n",
"fig, ax = plt.subplots()\n",
"ax.loglog(all_h, all_err_euler,'ro-')\n",
"ax.loglog(all_h, all_err_heun,'bo-')\n",
"ax.loglog(all_h, all_err_mod_euler,'go-')\n",
"ax.loglog(all_h, all_err_taylor,'co-')\n",
"ax.loglog(all_h, all_err_rk,'mo-')\n",
"ax.set_ylim(1e-13,1e1)\n",
"ax.set_xlim(1e-6,1e1)\n",
"ax.set_aspect(0.5)\n",
"ax.set_xlabel('Step size, $h$')\n",
"ax.set_ylabel('Error')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BttOqpeo9MsJ"
},
"source": [
"Note that for this ODE, the Heun, Modified Euler and Taylor methods provide EXACTLY the same updates, and so the error curves for all three are identical (subject to differences in numerical rounding errors). This is not in general the case, although the general trend would be the same for each."
]
}
],
"metadata": {
"colab": {
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
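Editorial aside, not part of the diff: the notebook above embeds each update rule inside JSON strings, so as a quick reference here is a minimal standalone sketch of three of the same updates applied to dx/dt = 0.5x. The function names and the error check are illustrative only; the notebook's own definitions are authoritative.

```python
# Minimal sketch (editorial) of Euler, Heun, and RK4 steps on dx/dt = 0.5*x,
# whose exact solution is x(t) = 0.5 * exp(0.5 * t). The global error at
# t = 4 should shrink roughly as h^1, h^2, and h^4 respectively.
import numpy as np

f = lambda t, x: 0.5 * x

def euler(x, t, h):                  # first-order method
    return x + h * f(t, x)

def heun(x, t, h):                   # second-order: average slope at both ends
    k = f(t, x)
    return x + h / 2 * (k + f(t + h, x + h * k))

def rk4(x, t, h):                    # classic fourth-order Runge-Kutta
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h / 2 * k1)
    k3 = f(t + h / 2, x + h / 2 * k2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

for step in (euler, heun, rk4):
    x, t, h = 0.5, 0.0, 0.125        # integrate from t=0 to t=4
    while t < 4.0:
        x, t = step(x, t, h), t + h
    print(step.__name__, abs(0.5 * np.exp(0.5 * t) - x))
```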
Blogs/Borealis_NNGP.ipynb (new file, 1127 lines added)
File diff suppressed because one or more lines are too long
@@ -128,7 +128,7 @@
 "\n",
 "In part (b) of the practical we calculate the volume of a hypersphere of radius 0.5 (i.e., of diameter 1) as a function of the radius. You will find that the volume decreases to almost nothing in high dimensions. All of the volume is in the corners of the unit hypercube (which always has volume 1). Double weird.\n",
 "\n",
-"Note that you you can check your answer by doing the calculation for 2D using the standard formula for the area of a circle and making sure it matches."
+"Note that you can check your answer by doing the calculation for 2D using the standard formula for the area of a circle and making sure it matches."
 ],
 "metadata": {
 "id": "b2FYKV1SL4Z7"
@@ -199,7 +199,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"The left is model output and the right is the model output after the sigmoid has been applied, so it now lies in the range [0,1] and represents the probability, that y=1. The black dots show the training data. We'll compute the the likelihood and the negative log likelihood."
+"The left is model output and the right is the model output after the sigmoid has been applied, so it now lies in the range [0,1] and represents the probability, that y=1. The black dots show the training data. We'll compute the likelihood and the negative log likelihood."
 ],
 "metadata": {
 "id": "MvVX6tl9AEXF"
@@ -218,7 +218,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"The left is model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probability, that y=0 (red), 1 (green) and 2 (blue) The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dotsmand the blue curve to be high where there are blue dots We'll compute the the likelihood and the negative log likelihood."
+"The left is model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probability, that y=0 (red), 1 (green) and 2 (blue) The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dotsmand the blue curve to be high where there are blue dots We'll compute the likelihood and the negative log likelihood."
 ],
 "metadata": {
 "id": "MvVX6tl9AEXF"
@@ -128,7 +128,7 @@
 "\n",
 "In part (b) of the practical we calculate the volume of a hypersphere of radius 0.5 (i.e., of diameter 1) as a function of the radius. You will find that the volume decreases to almost nothing in high dimensions. All of the volume is in the corners of the unit hypercube (which always has volume 1). Double weird.\n",
 "\n",
-"Note that you you can check your answer by doing the calculation for 2D using the standard formula for the area of a circle and making sure it matches."
+"Note that you can check your answer by doing the calculation for 2D using the standard formula for the area of a circle and making sure it matches."
 ],
 "metadata": {
 "id": "b2FYKV1SL4Z7"
@@ -214,7 +214,7 @@
 {
 "cell_type": "code",
 "source": [
-"# Compute the derivative of the the loss with respect to the function output f_val\n",
+"# Compute the derivative of the loss with respect to the function output f_val\n",
 "def dl_df(f_val,y):\n",
 "  # Compute sigmoid of network output\n",
 "  sig_f_val = sig(f_val)\n",
@@ -1,18 +1,16 @@
 {
 "cells": [
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
-"colab_type": "text",
-"id": "view-in-github"
+"id": "view-in-github",
+"colab_type": "text"
 },
 "source": [
 "<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap01/1_1_BackgroundMathematics.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "s5zzKSOusPOB"
@@ -41,7 +39,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "WV2Dl6owme2d"
@@ -99,7 +96,7 @@
 "ax.plot(x,y,'r-')\n",
 "ax.set_ylim([0,10]);ax.set_xlim([0,10])\n",
 "ax.set_xlabel('x'); ax.set_ylabel('y')\n",
-"plt.show\n",
+"plt.show()\n",
 "\n",
 "# TODO -- experiment with changing the values of beta and omega\n",
 "# to understand what they do. Try to make a line\n",
@@ -107,7 +104,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "AedfvD9dxShZ"
@@ -192,7 +188,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "i8tLwpls476R"
@@ -236,7 +231,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "fGzVJQ6N-mHJ"
@@ -275,11 +269,10 @@
 "# Compute with vector/matrix form\n",
 "y_vec = beta_vec+np.matmul(omega_mat, x_vec)\n",
 "print(\"Matrix/vector form\")\n",
-"print('y1= %3.3f\\ny2 = %3.3f'%((y_vec[0],y_vec[1])))\n"
+"print('y1= %3.3f\\ny2 = %3.3f'%((y_vec[0][0],y_vec[1][0])))\n"
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "3LGRoTMLU8ZU"
@@ -293,7 +286,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "7Y5zdKtKZAB2"
@@ -325,11 +317,10 @@
 "ax.plot(x,y,'r-')\n",
 "ax.set_ylim([0,100]);ax.set_xlim([-5,5])\n",
 "ax.set_xlabel('x'); ax.set_ylabel('exp[x]')\n",
-"plt.show"
+"plt.show()"
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "XyrT8257IWCu"
@@ -341,11 +332,10 @@
 "2. What is $\\exp[1]$?\n",
 "3. What is $\\exp[-\\infty]$?\n",
 "4. What is $\\exp[+\\infty]$?\n",
-"5. A function is convex if we can draw a straight line between any two points on the function, and this line always lies above the function. Similarly, a function is concave if a straight line between any two points always lies below the function. Is the exponential function convex or concave or neither?\n"
+"5. A function is convex if we can draw a straight line between any two points on the function, and the line lies above the function everywhere between these two points. Similarly, a function is concave if a straight line between any two points lies below the function everywhere between these two points. Is the exponential function convex or concave or neither?\n"
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "R6A4e5IxIWCu"
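Editorial aside on the convexity question in the hunk above: for a twice-differentiable function, one standard check (equivalent to the chord definition) is the sign of the second derivative, and here

$$\frac{d^2}{dx^2}\exp[x] = \exp[x] > 0 \quad \text{for all } x,$$

so the exponential function is convex everywhere.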
@@ -373,11 +363,10 @@
 "ax.plot(x,y,'r-')\n",
 "ax.set_ylim([-5,5]);ax.set_xlim([0,5])\n",
 "ax.set_xlabel('x'); ax.set_ylabel('$\\log[x]$')\n",
-"plt.show"
+"plt.show()"
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "yYWrL5AXIWCv"
@@ -397,8 +386,8 @@
 ],
 "metadata": {
 "colab": {
-"include_colab_link": true,
-"provenance": []
+"provenance": [],
+"include_colab_link": true
 },
 "kernelspec": {
 "display_name": "Python 3 (ipykernel)",
@@ -4,7 +4,6 @@
 "metadata": {
 "colab": {
 "provenance": [],
-"authorship_tag": "ABX9TyOmndC0N7dFV7W3Mh5ljOLl",
 "include_colab_link": true
 },
 "kernelspec": {
@@ -197,7 +196,7 @@
 "source": [
 "# Visualizing the loss function\n",
 "\n",
-"The above process is equivalent to to descending coordinate wise on the loss function<br>\n",
+"The above process is equivalent to descending coordinate wise on the loss function<br>\n",
 "\n",
 "Now let's plot that function"
 ],
@@ -235,8 +234,8 @@
 "levels = 40\n",
 "ax.contour(phi0_mesh, phi1_mesh, all_losses ,levels, colors=['#80808080'])\n",
 "ax.set_ylim([1,-1])\n",
-"ax.set_xlabel('Intercept, $\\phi_0$')\n",
-"ax.set_ylabel('Slope, $\\phi_1$')\n",
+"ax.set_xlabel(r'Intercept, $\\phi_0$')\n",
+"ax.set_ylabel(r'Slope, $\\phi_1$')\n",
 "\n",
 "# Plot the position of your best fitting line on the loss function\n",
 "# It should be close to the minimum\n",
File diff suppressed because one or more lines are too long
@@ -4,7 +4,6 @@
 "metadata": {
 "colab": {
 "provenance": [],
-"authorship_tag": "ABX9TyNioITtfAcfxEfM3UOfQyb9",
 "include_colab_link": true
 },
 "kernelspec": {
@@ -62,7 +61,7 @@
 "source": [
 "The number of regions $N$ created by a shallow neural network with $D_i$ inputs and $D$ hidden units is given by Zaslavsky's formula:\n",
 "\n",
-"\\begin{equation}N = \\sum_{j=0}^{D_{i}}\\binom{D}{j}=\\sum_{j=0}^{D_{i}} \\frac{D!}{(D-j)!j!} \\end{equation} <br>\n",
+"\\begin{equation}N = \\sum_{j=0}^{D_{i}}\\binom{D}{j}=\\sum_{j=0}^{D_{i}} \\frac{D!}{(D-j)!j!} \\end{equation} \n",
 "\n"
 ],
 "metadata": {
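As a quick sanity check of the formula in the hunk above, here is a short editorial sketch (not part of the diff) that evaluates Zaslavsky's count directly:

```python
# Editorial sketch: N = sum_{j=0}^{D_i} C(D, j), the number of linear regions
# made by D hidden units (hyperplanes) acting on a D_i-dimensional input.
from math import comb

def n_regions(D_i, D):
    return sum(comb(D, j) for j in range(D_i + 1))

print(n_regions(2, 3))  # 7: three lines in general position cut the plane into 7 regions
print(n_regions(1, 5))  # 6: five joints split a 1D input into 6 pieces
```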
@@ -221,7 +220,7 @@
 {
 "cell_type": "code",
 "source": [
-"# Now let's plot the graph from figure 3.9a (takes ~1min)\n",
+"# Now let's plot the graph from figure 3.9b (takes ~1min)\n",
 "dims = np.array([1,5,10,50,100])\n",
 "regions = np.zeros((dims.shape[0], 200))\n",
 "params = np.zeros((dims.shape[0], 200))\n",
@@ -134,7 +134,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"Let's define two networks. We'll put the prefixes n1_ and n2_ before all the variables to make it clear which network is which. We'll just consider the inputs and outputs over the range [-1,1]. If you set the \"plot_all\" flat to True, you can see the details of how they were created."
+"Let's define two networks. We'll put the prefixes n1_ and n2_ before all the variables to make it clear which network is which. We'll just consider the inputs and outputs over the range [-1,1]."
 ],
 "metadata": {
 "id": "LxBJCObC-NTY"
@@ -4,7 +4,7 @@
 "metadata": {
 "colab": {
 "provenance": [],
-"authorship_tag": "ABX9TyPkFrjmRAUf0fxN07RC4xMI",
+"authorship_tag": "ABX9TyPZzptvvf7OPZai8erQ/0xT",
 "include_colab_link": true
 },
 "kernelspec": {
@@ -127,26 +127,26 @@
 " fig, ax = plt.subplots(3,3)\n",
 " fig.set_size_inches(8.5, 8.5)\n",
 " fig.tight_layout(pad=3.0)\n",
-" ax[0,0].plot(x,layer2_pre_1,'r-'); ax[0,0].set_ylabel('$\\psi_{10}+\\psi_{11}h_{1}+\\psi_{12}h_{2}+\\psi_{13}h_3$')\n",
-" ax[0,1].plot(x,layer2_pre_2,'b-'); ax[0,1].set_ylabel('$\\psi_{20}+\\psi_{21}h_{1}+\\psi_{22}h_{2}+\\psi_{23}h_3$')\n",
-" ax[0,2].plot(x,layer2_pre_3,'g-'); ax[0,2].set_ylabel('$\\psi_{30}+\\psi_{31}h_{1}+\\psi_{32}h_{2}+\\psi_{33}h_3$')\n",
-" ax[1,0].plot(x,h1_prime,'r-'); ax[1,0].set_ylabel(\"$h_{1}^{'}$\")\n",
-" ax[1,1].plot(x,h2_prime,'b-'); ax[1,1].set_ylabel(\"$h_{2}^{'}$\")\n",
-" ax[1,2].plot(x,h3_prime,'g-'); ax[1,2].set_ylabel(\"$h_{3}^{'}$\")\n",
-" ax[2,0].plot(x,phi1_h1_prime,'r-'); ax[2,0].set_ylabel(\"$\\phi_1 h_{1}^{'}$\")\n",
-" ax[2,1].plot(x,phi2_h2_prime,'b-'); ax[2,1].set_ylabel(\"$\\phi_2 h_{2}^{'}$\")\n",
-" ax[2,2].plot(x,phi3_h3_prime,'g-'); ax[2,2].set_ylabel(\"$\\phi_3 h_{3}^{'}$\")\n",
+" ax[0,0].plot(x,layer2_pre_1,'r-'); ax[0,0].set_ylabel(r'$\\psi_{10}+\\psi_{11}h_{1}+\\psi_{12}h_{2}+\\psi_{13}h_3$')\n",
+" ax[0,1].plot(x,layer2_pre_2,'b-'); ax[0,1].set_ylabel(r'$\\psi_{20}+\\psi_{21}h_{1}+\\psi_{22}h_{2}+\\psi_{23}h_3$')\n",
+" ax[0,2].plot(x,layer2_pre_3,'g-'); ax[0,2].set_ylabel(r'$\\psi_{30}+\\psi_{31}h_{1}+\\psi_{32}h_{2}+\\psi_{33}h_3$')\n",
+" ax[1,0].plot(x,h1_prime,'r-'); ax[1,0].set_ylabel(r\"$h_{1}^{'}$\")\n",
+" ax[1,1].plot(x,h2_prime,'b-'); ax[1,1].set_ylabel(r\"$h_{2}^{'}$\")\n",
+" ax[1,2].plot(x,h3_prime,'g-'); ax[1,2].set_ylabel(r\"$h_{3}^{'}$\")\n",
+" ax[2,0].plot(x,phi1_h1_prime,'r-'); ax[2,0].set_ylabel(r\"$\\phi_1 h_{1}^{'}$\")\n",
+" ax[2,1].plot(x,phi2_h2_prime,'b-'); ax[2,1].set_ylabel(r\"$\\phi_2 h_{2}^{'}$\")\n",
+" ax[2,2].plot(x,phi3_h3_prime,'g-'); ax[2,2].set_ylabel(r\"$\\phi_3 h_{3}^{'}$\")\n",
 "\n",
 " for plot_y in range(3):\n",
 "  for plot_x in range(3):\n",
 "   ax[plot_y,plot_x].set_xlim([0,1]);ax[plot_x,plot_y].set_ylim([-1,1])\n",
 "   ax[plot_y,plot_x].set_aspect(0.5)\n",
-"  ax[2,plot_y].set_xlabel('Input, $x$');\n",
+"  ax[2,plot_y].set_xlabel(r'Input, $x$');\n",
 " plt.show()\n",
 "\n",
 " fig, ax = plt.subplots()\n",
 " ax.plot(x,y)\n",
-" ax.set_xlabel('Input, $x$'); ax.set_ylabel('Output, $y$')\n",
+" ax.set_xlabel(r'Input, $x$'); ax.set_ylabel(r'Output, $y$')\n",
 " ax.set_xlim([0,1]);ax.set_ylim([-1,1])\n",
 " ax.set_aspect(0.5)\n",
 " plt.show()"
@@ -169,7 +169,7 @@
 {
 "cell_type": "code",
 "source": [
-"# Define parameters (note first dimension of theta and phi is padded to make indices match\n",
+"# Define parameters (note first dimension of theta and psi is padded to make indices match\n",
 "# notation in book)\n",
 "theta = np.zeros([4,2])\n",
 "psi = np.zeros([4,4])\n",
@@ -4,7 +4,6 @@
 "metadata": {
 "colab": {
 "provenance": [],
-"authorship_tag": "ABX9TyO2DaD75p+LGi7WgvTzjrk1",
 "include_colab_link": true
 },
 "kernelspec": {
@@ -31,7 +30,7 @@
 "source": [
 "# **Notebook 4.3 Deep neural networks**\n",
 "\n",
-"This network investigates converting neural networks to matrix form.\n",
+"This notebook investigates converting neural networks to matrix form.\n",
 "\n",
 "Work through the cells below, running each cell in turn. In various places you will see the words \"TODO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
 "\n",
@@ -118,7 +117,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"Let's define a network. We'll just consider the inputs and outputs over the range [-1,1]. If you set the \"plot_all\" flat to True, you can see the details of how it was created."
+"Let's define a network. We'll just consider the inputs and outputs over the range [-1,1]."
 ],
 "metadata": {
 "id": "LxBJCObC-NTY"
@@ -150,7 +149,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"Now we'll define the same neural network, but this time, we will use matrix form. When you get this right, it will draw the same plot as above."
+"Now we'll define the same neural network, but this time, we will use matrix form as in equation 4.15. When you get this right, it will draw the same plot as above."
 ],
 "metadata": {
 "id": "XCJqo_AjfAra"
@@ -176,8 +175,8 @@
 "n1_in_mat = np.reshape(n1_in,(n_dim_in,n_data))\n",
 "\n",
 "# This runs the network for ALL of the inputs, x at once so we can draw graph\n",
-"h1 = ReLU(np.matmul(beta_0,np.ones((1,n_data))) + np.matmul(Omega_0,n1_in_mat))\n",
-"n1_out = np.matmul(beta_1,np.ones((1,n_data))) + np.matmul(Omega_1,h1)\n",
+"h1 = ReLU(beta_0 + np.matmul(Omega_0,n1_in_mat))\n",
+"n1_out = beta_1 + np.matmul(Omega_1,h1)\n",
 "\n",
 "# Draw the network and check that it looks the same as the non-matrix case\n",
 "plot_neural(n1_in, n1_out)"
@@ -247,9 +246,9 @@
 "n1_in_mat = np.reshape(n1_in,(n_dim_in,n_data))\n",
 "\n",
 "# This runs the network for ALL of the inputs, x at once so we can draw graph (hence extra np.ones term)\n",
-"h1 = ReLU(np.matmul(beta_0,np.ones((1,n_data))) + np.matmul(Omega_0,n1_in_mat))\n",
-"h2 = ReLU(np.matmul(beta_1,np.ones((1,n_data))) + np.matmul(Omega_1,h1))\n",
-"n1_out = np.matmul(beta_2,np.ones((1,n_data))) + np.matmul(Omega_2,h2)\n",
+"h1 = ReLU(beta_0 + np.matmul(Omega_0,n1_in_mat))\n",
+"h2 = ReLU(beta_1 + np.matmul(Omega_1,h1))\n",
+"n1_out = beta_2 + np.matmul(Omega_2,h2)\n",
 "\n",
 "# Draw the network and check that it looks the same as the non-matrix version\n",
 "plot_neural(n1_in, n1_out)"
@@ -291,10 +290,10 @@
 "\n",
 "\n",
 "# If you set the parameters to the correct sizes, the following code will run\n",
-"h1 = ReLU(np.matmul(beta_0,np.ones((1,n_data))) + np.matmul(Omega_0,x));\n",
-"h2 = ReLU(np.matmul(beta_1,np.ones((1,n_data))) + np.matmul(Omega_1,h1));\n",
-"h3 = ReLU(np.matmul(beta_2,np.ones((1,n_data))) + np.matmul(Omega_2,h2));\n",
-"y = np.matmul(beta_3,np.ones((1,n_data))) + np.matmul(Omega_3,h3)\n",
+"h1 = ReLU(beta_0 + np.matmul(Omega_0,x));\n",
+"h2 = ReLU(beta_1 + np.matmul(Omega_1,h1));\n",
+"h3 = ReLU(beta_2 + np.matmul(Omega_2,h2));\n",
+"y = beta_3 + np.matmul(Omega_3,h3)\n",
 "\n",
 "if h1.shape[0] is not D_1 or h1.shape[1] is not n_data:\n",
 "  print(\"h1 is wrong shape\")\n",
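The simplification in the three hunks above relies on NumPy broadcasting: a (D,1) bias column added to a (D,n_data) matrix is replicated across the columns automatically. A quick editorial check (variable sizes chosen arbitrarily) that the two forms agree:

```python
# Editorial sketch: explicit bias replication via np.ones versus NumPy
# broadcasting. Both produce the same (D, n_data) result, which is why the
# diff can drop the np.matmul(beta, np.ones((1, n_data))) term.
import numpy as np

n_data, D = 5, 3
beta = np.random.randn(D, 1)
Omega = np.random.randn(D, D)
X = np.random.randn(D, n_data)

explicit = np.matmul(beta, np.ones((1, n_data))) + np.matmul(Omega, X)
broadcast = beta + np.matmul(Omega, X)   # (D,1) broadcasts across the columns
print(np.allclose(explicit, broadcast))  # True
```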
@@ -118,7 +118,7 @@
 " ax.plot(x_model,y_model)\n",
 " if sigma_model is not None:\n",
 "  ax.fill_between(x_model, y_model-2*sigma_model, y_model+2*sigma_model, color='lightgray')\n",
-" ax.set_xlabel('Input, $x$'); ax.set_ylabel('Output, $y$')\n",
+" ax.set_xlabel(r'Input, $x$'); ax.set_ylabel(r'Output, $y$')\n",
 " ax.set_xlim([0,1]);ax.set_ylim([-1,1])\n",
 " ax.set_aspect(0.5)\n",
 " if title is not None:\n",
@@ -222,7 +222,7 @@
 "gauss_prob = normal_distribution(y_gauss, mu, sigma)\n",
 "fig, ax = plt.subplots()\n",
 "ax.plot(y_gauss, gauss_prob)\n",
-"ax.set_xlabel('Input, $y$'); ax.set_ylabel('Probability $Pr(y)$')\n",
+"ax.set_xlabel(r'Input, $y$'); ax.set_ylabel(r'Probability $Pr(y)$')\n",
 "ax.set_xlim([-5,5]);ax.set_ylim([0,1.0])\n",
 "plt.show()\n",
 "\n",
@@ -119,12 +119,12 @@
 " fig.set_size_inches(7.0, 3.5)\n",
 " fig.tight_layout(pad=3.0)\n",
 " ax[0].plot(x_model,out_model)\n",
-" ax[0].set_xlabel('Input, $x$'); ax[0].set_ylabel('Model output')\n",
+" ax[0].set_xlabel(r'Input, $x$'); ax[0].set_ylabel(r'Model output')\n",
 " ax[0].set_xlim([0,1]);ax[0].set_ylim([-4,4])\n",
 " if title is not None:\n",
 "  ax[0].set_title(title)\n",
 " ax[1].plot(x_model,lambda_model)\n",
-" ax[1].set_xlabel('Input, $x$'); ax[1].set_ylabel('$\\lambda$ or Pr(y=1|x)')\n",
+" ax[1].set_xlabel(r'Input, $x$'); ax[1].set_ylabel(r'$\\lambda$ or Pr(y=1|x)')\n",
 " ax[1].set_xlim([0,1]);ax[1].set_ylim([-0.05,1.05])\n",
 " if title is not None:\n",
 "  ax[1].set_title(title)\n",
@@ -211,7 +211,7 @@
 "id": "MvVX6tl9AEXF"
 },
 "source": [
-"The left is model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probability, that y=0 (red), 1 (green) and 2 (blue). The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dots, and the blue curve to be high where there are blue dots We'll compute the the likelihood and the negative log likelihood."
+"The left is model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probability, that y=0 (red), 1 (green) and 2 (blue). The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dots, and the blue curve to be high where there are blue dots We'll compute the likelihood and the negative log likelihood."
 ]
 },
 {
@@ -236,11 +236,10 @@
 },
 "outputs": [],
 "source": [
-"# Let's double check we get the right answer before proceeding\n",
-"print(\"Correct answer = %3.3f, Your answer = %3.3f\"%(0.2,categorical_distribution(np.array([[0]]),np.array([[0.2],[0.5],[0.3]]))))\n",
-"print(\"Correct answer = %3.3f, Your answer = %3.3f\"%(0.5,categorical_distribution(np.array([[1]]),np.array([[0.2],[0.5],[0.3]]))))\n",
-"print(\"Correct answer = %3.3f, Your answer = %3.3f\"%(0.3,categorical_distribution(np.array([[2]]),np.array([[0.2],[0.5],[0.3]]))))\n",
-"\n"
+"# Here are three examples\n",
+"print(categorical_distribution(np.array([[0]]),np.array([[0.2],[0.5],[0.3]])))\n",
+"print(categorical_distribution(np.array([[1]]),np.array([[0.2],[0.5],[0.3]])))\n",
+"print(categorical_distribution(np.array([[2]]),np.array([[0.2],[0.5],[0.3]])))"
 ]
 },
 {
@@ -4,7 +4,6 @@
 "metadata": {
 "colab": {
 "provenance": [],
-"authorship_tag": "ABX9TyN4E9Vtuk6t2BhZ0Ajv5SW3",
 "include_colab_link": true
 },
 "kernelspec": {
@@ -67,7 +66,7 @@
 " fig,ax = plt.subplots()\n",
 " ax.plot(phi_plot,loss_function(phi_plot),'r-')\n",
 " ax.set_xlim(0,1); ax.set_ylim(0,1)\n",
-" ax.set_xlabel('$\\phi$'); ax.set_ylabel('$L[\\phi]$')\n",
+" ax.set_xlabel(r'$\\phi$'); ax.set_ylabel(r'$L[\\phi]$')\n",
 " if a is not None and b is not None and c is not None and d is not None:\n",
 "  plt.axvspan(a, d, facecolor='k', alpha=0.2)\n",
 "  ax.plot([a,a],[0,1],'b-')\n",
@@ -131,7 +130,8 @@
 "\n",
 " print('Iter %d, a=%3.3f, b=%3.3f, c=%3.3f, d=%3.3f'%(n_iter, a,b,c,d))\n",
 "\n",
-" # Rule #1 If the HEIGHT at point A is less than the HEIGHT at points B, C, and D then halve values of B, C, and D\n",
+" # Rule #1 If the HEIGHT at point A is less than the HEIGHT at points B, C, and D then move them so they are half\n",
+" # as far from A as they start\n",
 " # i.e. bring them closer to the original point\n",
 " # TODO REPLACE THE BLOCK OF CODE BELOW WITH THIS RULE\n",
 " if (0):\n",
@@ -1,18 +1,16 @@
 {
 "cells": [
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
-"colab_type": "text",
-"id": "view-in-github"
+"id": "view-in-github",
+"colab_type": "text"
 },
 "source": [
 "<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap06/6_2_Gradient_Descent.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "el8l05WQEO46"
@@ -111,7 +109,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "QU5mdGvpTtEG"
@@ -140,7 +137,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "eB5DQvU5hYNx"
@@ -162,7 +158,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "F3trnavPiHpH"
@@ -218,7 +213,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "s9Duf05WqqSC"
@@ -252,7 +246,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "RS1nEcYVuEAM"
@@ -265,7 +258,7 @@
 "\\frac{\\partial L}{\\partial \\phi_{1}}&\\approx & \\frac{L[\\phi_0, \\phi_1+\\delta]-L[\\phi_0, \\phi_1]}{\\delta}\n",
 "\\end{align}\n",
 "\n",
-"We can't do this when there are many parameters; for a million parameters, we would have to evaluate the loss function two million times, and usually computing the gradients directly is much more efficient."
+"We can't do this when there are many parameters; for a million parameters, we would have to evaluate the loss function one million plus one times, and usually computing the gradients directly is much more efficient."
 ]
 },
 {
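The finite-difference approximation in the hunk above is easy to verify on a toy loss; a brief editorial sketch follows, with the quadratic loss chosen arbitrarily for illustration (note the single shared evaluation at the base point, which is why a million parameters need one million plus one evaluations):

```python
# Editorial sketch: forward-difference estimate of dL/dphi_0, as in the
# markdown cell above. The loss function here is just an example.
def L(phi0, phi1):
    return (phi0 - 1.0) ** 2 + 2.0 * (phi1 + 0.5) ** 2

delta = 1e-5
phi0, phi1 = 0.3, 0.2
base = L(phi0, phi1)                       # one evaluation shared by all partials
dL_dphi0 = (L(phi0 + delta, phi1) - base) / delta
print(dL_dphi0, 2 * (phi0 - 1.0))          # approx -1.4 vs exact -1.4
```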
@@ -290,7 +283,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "5EIjMM9Fw2eT"
@@ -333,11 +325,11 @@
 " print('Iter %d, a=%3.3f, b=%3.3f, c=%3.3f, d=%3.3f'%(n_iter, a,b,c,d))\n",
 " print('a %f, b%f, c%f, d%f'%(lossa,lossb,lossc,lossd))\n",
 "\n",
-" # Rule #1 If point A is less than points B, C, and D then halve points B,C, and D\n",
+" # Rule #1 If point A is less than points B, C, and D then halve distance from A to points B,C, and D\n",
 " if np.argmin((lossa,lossb,lossc,lossd))==0:\n",
-"   b = b/2\n",
-"   c = c/2\n",
-"   d = d/2\n",
+"   b = a+ (b-a)/2\n",
+"   c = a+ (c-a)/2\n",
+"   d = a+ (d-a)/2\n",
 "   continue;\n",
 "\n",
 " # Rule #2 If point b is less than point c then\n",
@@ -412,8 +404,8 @@
 ],
 "metadata": {
 "colab": {
-"include_colab_link": true,
-"provenance": []
+"provenance": [],
+"include_colab_link": true
 },
 "kernelspec": {
 "display_name": "Python 3",
@@ -1,18 +1,16 @@
 {
 "cells": [
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
-"colab_type": "text",
-"id": "view-in-github"
+"id": "view-in-github",
+"colab_type": "text"
 },
 "source": [
 "<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap06/6_3_Stochastic_Gradient_Descent.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "el8l05WQEO46"
@@ -122,7 +120,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "QU5mdGvpTtEG"
@@ -150,7 +147,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "eB5DQvU5hYNx"
@@ -172,7 +168,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "F3trnavPiHpH"
@@ -228,7 +223,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "s9Duf05WqqSC"
@@ -279,7 +273,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "RS1nEcYVuEAM"
@@ -316,7 +309,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "id": "5EIjMM9Fw2eT"
@@ -359,11 +351,11 @@
 " print('Iter %d, a=%3.3f, b=%3.3f, c=%3.3f, d=%3.3f'%(n_iter, a,b,c,d))\n",
 " print('a %f, b%f, c%f, d%f'%(lossa,lossb,lossc,lossd))\n",
 "\n",
-" # Rule #1 If point A is less than points B, C, and D then halve points B,C, and D\n",
+" # Rule #1 If point A is less than points B, C, and D then change B,C,D so they are half their current distance from A\n",
 " if np.argmin((lossa,lossb,lossc,lossd))==0:\n",
-"   b = b/2\n",
-"   c = c/2\n",
-"   d = d/2\n",
+"   b = a+ (b-a)/2\n",
+"   c = a+ (c-a)/2\n",
+"   d = a+ (d-a)/2\n",
 "   continue;\n",
 "\n",
 " # Rule #2 If point b is less than point c then\n",
@@ -577,9 +569,8 @@
 ],
 "metadata": {
 "colab": {
-"authorship_tag": "ABX9TyNk5FN4qlw3pk8BwDVWw1jN",
-"include_colab_link": true,
-"provenance": []
+"provenance": [],
+"include_colab_link": true
 },
 "kernelspec": {
 "display_name": "Python 3",
@@ -108,8 +108,8 @@
 " ax.contour(phi0mesh, phi1mesh, loss_function, 20, colors=['#80808080'])\n",
 " ax.plot(opt_path[0,:], opt_path[1,:],'-', color='#a0d9d3ff')\n",
 " ax.plot(opt_path[0,:], opt_path[1,:],'.', color='#a0d9d3ff',markersize=10)\n",
-" ax.set_xlabel(\"$\\phi_{0}$\")\n",
-" ax.set_ylabel(\"$\\phi_{1}$\")\n",
+" ax.set_xlabel(r\"$\\phi_{0}$\")\n",
+" ax.set_ylabel(r\"$\\phi_{1}$\")\n",
 " plt.show()"
 ],
 "metadata": {
@@ -221,7 +221,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"This moves towards the minimum at a sensible speed, but we never actually converge -- the solution just bounces back and forth between the last two points. To make it converge, we add momentum to both the estimates of the gradient and the pointwise squared gradient. We also modify the statistics by a factor that depends on the time to make sure the progress is now slow to start with."
+"This moves towards the minimum at a sensible speed, but we never actually converge -- the solution just bounces back and forth between the last two points. To make it converge, we add momentum to both the estimates of the gradient and the pointwise squared gradient. We also modify the statistics by a factor that depends on the time to make sure the progress is not slow to start with."
 ],
 "metadata": {
 "id": "_6KoKBJdGGI4"
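For concreteness, here is an editorial sketch of the scheme the cell above describes: momentum on the gradient and on the pointwise squared gradient, plus time-dependent correction factors so the normalized statistics are not artificially small at the start. The toy loss and the hyperparameter values are illustrative, not from the notebook.

```python
# Editorial sketch: moment estimates with 1 - beta**t bias correction,
# applied to the toy loss L(phi) = phi**2 (gradient 2*phi).
import math

beta, gamma, alpha, eps = 0.9, 0.99, 0.05, 1e-8
m = v = 0.0
phi = 2.0

for t in range(1, 51):
    g = 2 * phi                             # gradient of the toy loss
    m = beta * m + (1 - beta) * g           # momentum on gradient
    v = gamma * v + (1 - gamma) * g ** 2    # momentum on squared gradient
    m_hat = m / (1 - beta ** t)             # correction: early steps not too slow
    v_hat = v / (1 - gamma ** t)
    phi -= alpha * m_hat / (math.sqrt(v_hat) + eps)

print(phi)  # settles near the minimum at 0
```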
@@ -279,7 +279,7 @@
 "f2: true value = 7.137, your value = 0.000\n",
 "h3: true value = 0.657, your value = 0.000\n",
 "f3: true value = 2.372, your value = 0.000\n",
-"like original = 0.139, like from forward pass = 0.000\n"
+"l_i original = 0.139, l_i from forward pass = 0.000\n"
 ]
 }
 ],
@@ -292,7 +292,7 @@
 "print(\"f2: true value = %3.3f, your value = %3.3f\"%(7.137, f2))\n",
 "print(\"h3: true value = %3.3f, your value = %3.3f\"%(0.657, h3))\n",
 "print(\"f3: true value = %3.3f, your value = %3.3f\"%(2.372, f3))\n",
-"print(\"like original = %3.3f, like from forward pass = %3.3f\"%(l_i_func, l_i))\n"
+"print(\"l_i original = %3.3f, l_i from forward pass = %3.3f\"%(l_i_func, l_i))\n"
 ]
 },
 {
@@ -4,7 +4,6 @@
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"authorship_tag": "ABX9TyM2kkHLr00J4Jeypw41sTkQ",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
@@ -68,7 +67,7 @@
|
||||
"# Set seed so we always get the same random numbers\n",
|
||||
"np.random.seed(0)\n",
|
||||
"\n",
|
||||
"# Number of layers\n",
|
||||
"# Number of hidden layers\n",
|
||||
"K = 5\n",
|
||||
"# Number of neurons per layer\n",
|
||||
"D = 6\n",
|
||||
@@ -115,9 +114,9 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Now let's run our random network. The weight matrices $\\boldsymbol\\Omega_{1\\ldots K}$ are the entries of the list \"all_weights\" and the biases $\\boldsymbol\\beta_{1\\ldots k}$ are the entries of the list \"all_biases\"\n",
|
||||
"Now let's run our random network. The weight matrices $\\boldsymbol\\Omega_{0\\ldots K}$ are the entries of the list \"all_weights\" and the biases $\\boldsymbol\\beta_{0\\ldots K}$ are the entries of the list \"all_biases\"\n",
|
||||
"\n",
|
||||
"We know that we will need the activations $\\mathbf{f}_{0\\ldots K}$ and the activations $\\mathbf{h}_{1\\ldots K}$ for the forward pass of backpropagation, so we'll store and return these as well.\n"
|
||||
"We know that we will need the preactivations $\\mathbf{f}_{0\\ldots K}$ and the activations $\\mathbf{h}_{1\\ldots K}$ for the forward pass of backpropagation, so we'll store and return these as well.\n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "5irtyxnLJSGX"
|
||||
@@ -132,7 +131,7 @@
|
||||
" K = len(all_weights) -1\n",
|
||||
"\n",
|
||||
" # We'll store the pre-activations at each layer in a list \"all_f\"\n",
|
||||
" # and the activations in a second list[all_h].\n",
|
||||
" # and the activations in a second list \"all_h\".\n",
|
||||
" all_f = [None] * (K+1)\n",
|
||||
" all_h = [None] * (K+1)\n",
|
||||
"\n",
|
||||
@@ -142,8 +141,8 @@
|
||||
"\n",
|
||||
" # Run through the layers, calculating all_f[0...K-1] and all_h[1...K]\n",
|
||||
" for layer in range(K):\n",
|
||||
" # Update preactivations and activations at this layer according to eqn 7.16\n",
|
||||
" # Remmember to use np.matmul for matrrix multiplications\n",
|
||||
" # Update preactivations and activations at this layer according to eqn 7.17\n",
|
||||
" # Remember to use np.matmul for matrix multiplications\n",
|
||||
" # TODO -- Replace the lines below\n",
|
||||
" all_f[layer] = all_h[layer]\n",
|
||||
" all_h[layer+1] = all_f[layer]\n",
|
||||
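For readers checking their work against this hunk, the recursion the renumbered equation 7.17 describes has the following shape (an editor's hedged sketch assuming ReLU activations; not the notebook's solution cell):

```python
# Forward pass for one layer (names follow the notebook's lists)
all_f[layer] = all_biases[layer] + np.matmul(all_weights[layer], all_h[layer])
all_h[layer + 1] = np.maximum(all_f[layer], 0)   # ReLU activation
```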
@@ -166,7 +165,7 @@
{
"cell_type": "code",
"source": [
"# Define in input\n",
"# Define input\n",
"net_input = np.ones((D_i,1)) * 1.2\n",
"# Compute network output\n",
"net_output, all_f, all_h = compute_network_output(net_input,all_weights, all_biases)\n",
@@ -230,8 +229,8 @@
"# We'll need the indicator function\n",
"def indicator_function(x):\n",
" x_in = np.array(x)\n",
" x_in[x_in>=0] = 1\n",
" x_in[x_in<0] = 0\n",
" x_in[x_in>0] = 1\n",
" x_in[x_in<=0] = 0\n",
" return x_in\n",
"\n",
"# Main backward pass routine\n",
@@ -249,23 +248,23 @@
"\n",
" # Now work backwards through the network\n",
" for layer in range(K,-1,-1):\n",
" # TODO Calculate the derivatives of the loss with respect to the biases at layer this from all_dl_df[layer]. (eq 7.21)\n",
" # TODO Calculate the derivatives of the loss with respect to the biases at layer from all_dl_df[layer]. (eq 7.22)\n",
" # NOTE! To take a copy of matrix X, use Z=np.array(X)\n",
" # REPLACE THIS LINE\n",
" all_dl_dbiases[layer] = np.zeros_like(all_biases[layer])\n",
"\n",
" # TODO Calculate the derivatives of the loss with respect to the weights at layer from all_dl_df[layer] and all_h[layer] (eq 7.22)\n",
" # TODO Calculate the derivatives of the loss with respect to the weights at layer from all_dl_df[layer] and all_h[layer] (eq 7.23)\n",
" # Don't forget to use np.matmul\n",
" # REPLACE THIS LINE\n",
" all_dl_dweights[layer] = np.zeros_like(all_weights[layer])\n",
"\n",
" # TODO: calculate the derivatives of the loss with respect to the activations from weight and derivatives of next preactivations (second part of last line of eq 7.24)\n",
" # TODO: calculate the derivatives of the loss with respect to the activations from weight and derivatives of next preactivations (second part of last line of eq 7.25)\n",
" # REPLACE THIS LINE\n",
" all_dl_dh[layer] = np.zeros_like(all_h[layer])\n",
"\n",
"\n",
" if layer > 0:\n",
" # TODO Calculate the derivatives of the loss with respect to the pre-activation f (use deriv of ReLu function, first part of last line of eq. 7.24)\n",
" # TODO Calculate the derivatives of the loss with respect to the pre-activation f (use derivative of ReLu function, first part of last line of eq. 7.25)\n",
" # REPLACE THIS LINE\n",
" all_dl_df[layer-1] = np.zeros_like(all_f[layer-1])\n",
"\n",
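As a reading aid for the four TODOs above, the renumbered equations say, in vector form (an editor's hedged paraphrase, not the notebook's solution):

```python
# With dl_df the derivative of the loss w.r.t. the preactivations at a layer:
#   eq 7.22:  dl/dbeta_k  = dl/df_k
#   eq 7.23:  dl/dOmega_k = np.matmul(dl/df_k, h_k.T)
#   eq 7.25:  dl/dh_k     = np.matmul(Omega_k.T, dl/df_k)          (second part)
#             dl/df_{k-1} = indicator_function(f_{k-1}) * dl/dh_k  (first part)
```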
@@ -300,7 +299,7 @@
"delta_fd = 0.000001\n",
"\n",
"# Test the dervatives of the bias vectors\n",
"for layer in range(K):\n",
"for layer in range(K+1):\n",
" dl_dbias = np.zeros_like(all_dl_dbiases[layer])\n",
" # For every element in the bias\n",
" for row in range(all_biases[layer].shape[0]):\n",
@@ -324,7 +323,7 @@
"\n",
"\n",
"# Test the derivatives of the weights matrices\n",
"for layer in range(K):\n",
"for layer in range(K+1):\n",
" dl_dweight = np.zeros_like(all_dl_dweights[layer])\n",
" # For every element in the bias\n",
" for row in range(all_weights[layer].shape[0]):\n",
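The two corrected loops above run the standard finite-difference check; in outline (an editor's sketch, with a hypothetical compute_loss closure over the network):

```python
# Perturb one parameter, re-run the network, and compare slopes:
#   dl/db is approximately (loss(b + delta_fd) - loss(b)) / delta_fd
all_biases[layer][row] += delta_fd
loss_plus = compute_loss()           # hypothetical closure re-running the network
all_biases[layer][row] -= delta_fd   # restore the parameter
dl_dbias[row] = (loss_plus - compute_loss()) / delta_fd
```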
@@ -4,7 +4,6 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyNHLXFpiSnUzAbzhtOk+bxu",
"include_colab_link": true
},
"kernelspec": {
@@ -120,7 +119,7 @@
" K = len(all_weights)-1\n",
"\n",
" # We'll store the pre-activations at each layer in a list \"all_f\"\n",
" # and the activations in a second list[all_h].\n",
" # and the activations in a second list \"all_h\".\n",
" all_f = [None] * (K+1)\n",
" all_h = [None] * (K+1)\n",
"\n",
@@ -151,7 +150,7 @@
{
"cell_type": "markdown",
"source": [
"Now let's investigate how this the size of the outputs vary as we change the initialization variance:\n"
"Now let's investigate how the size of the outputs vary as we change the initialization variance:\n"
],
"metadata": {
"id": "bIUrcXnOqChl"
@@ -177,7 +176,7 @@
"data_in = np.random.normal(size=(1,n_data))\n",
"net_output, all_f, all_h = compute_network_output(data_in, all_weights, all_biases)\n",
"\n",
"for layer in range(K):\n",
"for layer in range(1,K+1):\n",
" print(\"Layer %d, std of hidden units = %3.3f\"%(layer, np.std(all_h[layer])))"
],
"metadata": {
@@ -196,7 +195,7 @@
"# Change this to 50 layers with 80 hidden units per layer\n",
"\n",
"# TODO\n",
"# Now experiment with sigma_sq_omega to try to stop the variance of the forward computation explode"
"# Now experiment with sigma_sq_omega to try to stop the variance of the forward computation exploding"
],
"metadata": {
"id": "VL_SO4tar3DC"
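One hedged answer to the sigma_sq_omega experiment the TODO describes: with ReLU units, scaling the weight variance inversely with the layer width keeps the hidden-unit statistics stable. The 2/D choice below (He initialization) is the editor's suggestion, not text from the notebook:

```python
D = 80                         # hidden units per layer
sigma_sq_omega = 2.0 / D       # He initialization for ReLU networks
Omega = np.random.normal(size=(D, D)) * np.sqrt(sigma_sq_omega)
# With this choice np.std(all_h[layer]) stays roughly constant with depth
# instead of exploding or shrinking towards zero.
```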
@@ -249,6 +248,9 @@
"\n",
"# Main backward pass routine\n",
"def backward_pass(all_weights, all_biases, all_f, all_h, y):\n",
" # Retrieve number of layers\n",
" K = len(all_weights) - 1\n",
"\n",
" # We'll store the derivatives dl_dweights and dl_dbiases in lists as well\n",
" all_dl_dweights = [None] * (K+1)\n",
" all_dl_dbiases = [None] * (K+1)\n",
@@ -335,8 +337,8 @@
{
"cell_type": "code",
"source": [
"# You can see that the values of the hidden units are increasing on average (the variance is across all hidden units at the layer\n",
"# and the 1000 training examples\n",
"# You can see that the gradients of the hidden units are increasing on average (the standard deviation is across all hidden units at the layer\n",
"# and the 100 training examples\n",
"\n",
"# TODO\n",
"# Change this to 50 layers with 80 hidden units per layer\n",
@@ -1,28 +1,10 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4",
"authorship_tag": "ABX9TyOuKMUcKfOIhIL2qTX9jJCy",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
"colab_type": "text",
"id": "view-in-github"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap08/8_1_MNIST_1D_Performance.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
@@ -30,6 +12,9 @@
},
{
"cell_type": "markdown",
"metadata": {
"id": "L6chybAVFJW2"
},
"source": [
"# **Notebook 8.1: MNIST_1D_Performance**\n",
"\n",
@@ -38,25 +23,27 @@
"Work through the cells below, running each cell in turn. In various places you will see the words \"TODO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "L6chybAVFJW2"
}
]
},
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
],
"execution_count": null,
"metadata": {
"id": "ifVjS4cTOqKz"
},
"execution_count": null,
"outputs": []
"outputs": [],
"source": [
"# Run this if you're in a Colab to install MNIST 1D repository\n",
"%pip install git+https://github.com/greydanus/mnist1d"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qyE7G1StPIqO"
},
"outputs": [],
"source": [
"import torch, torch.nn as nn\n",
"from torch.utils.data import TensorDataset, DataLoader\n",
@@ -64,42 +51,42 @@
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import mnist1d"
],
"metadata": {
"id": "qyE7G1StPIqO"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "markdown",
"source": [
"Let's generate a training and test dataset using the MNIST1D code. The dataset gets saved as a .pkl file so it doesn't have to be regenerated each time."
],
"metadata": {
"id": "F7LNq72SP6jO"
}
},
"source": [
"Let's generate a training and test dataset using the MNIST1D code. The dataset gets saved as a .pkl file so it doesn't have to be regenerated each time."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YLxf7dJfPaqw"
},
"outputs": [],
"source": [
"args = mnist1d.data.get_dataset_args()\n",
"data = mnist1d.data.get_dataset(args, path='./sample_data/mnist1d_data.pkl', download=False, regenerate=False)\n",
"data = mnist1d.data.get_dataset(args, path='./mnist1d_data.pkl', download=False, regenerate=False)\n",
"\n",
"# The training and test input and outputs are in\n",
"# data['x'], data['y'], data['x_test'], and data['y_test']\n",
"print(\"Examples in training set: {}\".format(len(data['y'])))\n",
"print(\"Examples in test set: {}\".format(len(data['y_test'])))\n",
"print(\"Length of each example: {}\".format(data['x'].shape[-1]))"
],
"metadata": {
"id": "YLxf7dJfPaqw"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FxaB5vc0uevl"
},
"outputs": [],
"source": [
"D_i = 40 # Input dimensions\n",
"D_k = 100 # Hidden dimensions\n",
@@ -120,15 +107,15 @@
"\n",
"# Call the function you just defined\n",
"model.apply(weights_init)\n"
],
"metadata": {
"id": "FxaB5vc0uevl"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_rX6N3VyyQTY"
},
"outputs": [],
"source": [
"# choose cross entropy loss function (equation 5.24)\n",
"loss_function = torch.nn.CrossEntropyLoss()\n",
@@ -136,11 +123,10 @@
"optimizer = torch.optim.SGD(model.parameters(), lr = 0.05, momentum=0.9)\n",
"# object that decreases learning rate by half every 10 epochs\n",
"scheduler = StepLR(optimizer, step_size=10, gamma=0.5)\n",
"# create 100 dummy data points and store in data loader class\n",
"x_train = torch.tensor(data['x'].astype('float32'))\n",
"y_train = torch.tensor(data['y'].transpose().astype('long'))\n",
"y_train = torch.tensor(data['y'].transpose().astype('int64'))\n",
"x_test= torch.tensor(data['x_test'].astype('float32'))\n",
"y_test = torch.tensor(data['y_test'].astype('long'))\n",
"y_test = torch.tensor(data['y_test'].astype('int64'))\n",
"\n",
"# load the data into a class that creates the batches\n",
"data_loader = DataLoader(TensorDataset(x_train,y_train), batch_size=100, shuffle=True, worker_init_fn=np.random.seed(1))\n",
@@ -185,15 +171,15 @@
"\n",
" # tell scheduler to consider updating learning rate\n",
" scheduler.step()"
],
"metadata": {
"id": "_rX6N3VyyQTY"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yI-l6kA_EH9G"
},
"outputs": [],
"source": [
"# Plot the results\n",
"fig, ax = plt.subplots()\n",
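Taken together, the pieces corrected in this cell form the usual PyTorch loop; a condensed sketch reusing the cell's own names (model, data_loader, loss_function, optimizer, scheduler, n_epoch):

```python
for epoch in range(n_epoch):
    for x_batch, y_batch in data_loader:
        optimizer.zero_grad()                     # reset accumulated gradients
        loss = loss_function(model(x_batch), y_batch)
        loss.backward()                           # backpropagate
        optimizer.step()                          # SGD-with-momentum update
    scheduler.step()                              # halves the lr every 10 epochs
```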
@@ -214,25 +200,38 @@
"ax.set_title('Train loss %3.2f, Test loss %3.2f'%(losses_train[-1],losses_test[-1]))\n",
"ax.legend()\n",
"plt.show()"
],
"metadata": {
"id": "yI-l6kA_EH9G"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "q-yT6re6GZS4"
},
"source": [
"**TODO**\n",
"\n",
"Play with the model -- try changing the number of layers, hidden units, learning rate, batch size, momentum or anything else you like. See if you can improve the test results.\n",
"\n",
"Is it a good idea to optimize the hyperparameters in this way? Will the final result be a good estimate of the true test performance?"
],
"metadata": {
"id": "q-yT6re6GZS4"
}
}
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"authorship_tag": "ABX9TyOuKMUcKfOIhIL2qTX9jJCy",
"gpuType": "T4",
"include_colab_link": true,
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
@@ -92,7 +92,7 @@
{
"cell_type": "code",
"source": [
"# Draw the fitted function, together win uncertainty used to generate points\n",
"# Draw the fitted function, together with uncertainty used to generate points\n",
"def plot_function(x_func, y_func, x_data=None,y_data=None, x_model = None, y_model =None, sigma_func = None, sigma_model=None):\n",
"\n",
" fig,ax = plt.subplots()\n",
@@ -203,7 +203,7 @@
"# Closed form solution\n",
"beta, omega = fit_model_closed_form(x_data,y_data,n_hidden=3)\n",
"\n",
"# Get prediction for model across graph grange\n",
"# Get prediction for model across graph range\n",
"x_model = np.linspace(0,1,100);\n",
"y_model = network(x_model, beta, omega)\n",
"\n",
@@ -302,7 +302,7 @@
"sigma_func = 0.3\n",
"n_hidden = 5\n",
"\n",
"# Set random seed so that get same result every time\n",
"# Set random seed so that we get the same result every time\n",
"np.random.seed(1)\n",
"\n",
"for c_hidden in range(len(hidden_variables)):\n",
@@ -5,7 +5,6 @@
"colab": {
"provenance": [],
"gpuType": "T4",
"authorship_tag": "ABX9TyN/KUpEObCKnHZ/4Onp5sHG",
"include_colab_link": true
},
"kernelspec": {
@@ -48,8 +47,8 @@
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
"# Run this if you're in a Colab to install MNIST 1D repository\n",
"!pip install git+https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "fn9BP5N5TguP"
@@ -100,7 +99,7 @@
"# data['x'], data['y'], data['x_test'], and data['y_test']\n",
"print(\"Examples in training set: {}\".format(len(data['y'])))\n",
"print(\"Examples in test set: {}\".format(len(data['y_test'])))\n",
"print(\"Length of each example: {}\".format(data['x'].shape[-1]))"
"print(\"Dimensionality of each example: {}\".format(data['x'].shape[-1]))"
],
"metadata": {
"id": "PW2gyXL5UkLU"
@@ -124,7 +123,7 @@
" D_k = n_hidden # Hidden dimensions\n",
" D_o = 10 # Output dimensions\n",
"\n",
" # Define a model with two hidden layers of size 100\n",
" # Define a model with two hidden layers\n",
" # And ReLU activations between them\n",
" model = nn.Sequential(\n",
" nn.Linear(D_i, D_k),\n",
@@ -148,7 +147,7 @@
{
"cell_type": "code",
"source": [
"def fit_model(model, data):\n",
"def fit_model(model, data, n_epoch):\n",
"\n",
" # choose cross entropy loss function (equation 5.24)\n",
" loss_function = torch.nn.CrossEntropyLoss()\n",
@@ -157,7 +156,6 @@
" optimizer = torch.optim.SGD(model.parameters(), lr = 0.01, momentum=0.9)\n",
"\n",
"\n",
" # create 100 dummy data points and store in data loader class\n",
" x_train = torch.tensor(data['x'].astype('float32'))\n",
" y_train = torch.tensor(data['y'].transpose().astype('long'))\n",
" x_test= torch.tensor(data['x_test'].astype('float32'))\n",
@@ -166,9 +164,6 @@
" # load the data into a class that creates the batches\n",
" data_loader = DataLoader(TensorDataset(x_train,y_train), batch_size=100, shuffle=True, worker_init_fn=np.random.seed(1))\n",
"\n",
" # loop over the dataset n_epoch times\n",
" n_epoch = 1000\n",
"\n",
" for epoch in range(n_epoch):\n",
" # loop over batches\n",
" for i, batch in enumerate(data_loader):\n",
@@ -205,6 +200,18 @@
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def count_parameters(model):\n",
" return sum(p.numel() for p in model.parameters() if p.requires_grad)"
],
"metadata": {
"id": "AQNCmFNV6JpV"
},
"execution_count": null,
"outputs": []
},
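A usage note on the new cell (an editor's example): count_parameters sums every trainable tensor, so for the sweep below it reports the weights plus biases of each Linear layer.

```python
model = get_model(100)            # 100 hidden units, as defined above
print(count_parameters(model))    # total trainable weights and biases
```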
{
"cell_type": "markdown",
"source": [
@@ -228,19 +235,27 @@
"# This code will take a while (~30 mins on GPU) to run! Go and make a cup of coffee!\n",
"\n",
"hidden_variables = np.array([2,4,6,8,10,14,18,22,26,30,35,40,45,50,55,60,70,80,90,100,120,140,160,180,200,250,300,400]) ;\n",
"\n",
"errors_train_all = np.zeros_like(hidden_variables)\n",
"errors_test_all = np.zeros_like(hidden_variables)\n",
"total_weights_all = np.zeros_like(hidden_variables)\n",
"\n",
"# loop over the dataset n_epoch times\n",
"n_epoch = 1000\n",
"\n",
"# For each hidden variable size\n",
"for c_hidden in range(len(hidden_variables)):\n",
" print(f'Training model with {hidden_variables[c_hidden]:3d} hidden variables')\n",
" # Get a model\n",
" model = get_model(hidden_variables[c_hidden]) ;\n",
" # Count and store number of weights\n",
" total_weights_all[c_hidden] = count_parameters(model)\n",
" # Train the model\n",
" errors_train, errors_test = fit_model(model, data)\n",
" errors_train, errors_test = fit_model(model, data, n_epoch)\n",
" # Store the results\n",
" errors_train_all[c_hidden] = errors_train\n",
" errors_test_all[c_hidden]= errors_test"
" errors_test_all[c_hidden]= errors_test\n",
"\n"
],
"metadata": {
"id": "K4OmBZGHWXpk"
@@ -251,12 +266,29 @@
{
"cell_type": "code",
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"# Assuming data['y'] is available and contains the training examples\n",
"num_training_examples = len(data['y'])\n",
"\n",
"# Find the index where total_weights_all is closest to num_training_examples\n",
"closest_index = np.argmin(np.abs(np.array(total_weights_all) - num_training_examples))\n",
"\n",
"# Get the corresponding value of hidden variables\n",
"hidden_variable_at_num_training_examples = hidden_variables[closest_index]\n",
"\n",
"# Plot the results\n",
"fig, ax = plt.subplots()\n",
"ax.plot(hidden_variables, errors_train_all, 'r-', label='train')\n",
"ax.plot(hidden_variables, errors_test_all, 'b-', label='test')\n",
"ax.set_ylim(0,100);\n",
"ax.set_xlabel('No hidden variables'); ax.set_ylabel('Error')\n",
"\n",
"# Add a vertical line at the point where total weights equal the number of training examples\n",
"ax.axvline(x=hidden_variable_at_num_training_examples, color='g', linestyle='--', label='N(weights) = N(train)')\n",
"\n",
"ax.set_ylim(0, 100)\n",
"ax.set_xlabel('No. hidden variables')\n",
"ax.set_ylabel('Error')\n",
"ax.legend()\n",
"plt.show()\n"
],
@@ -265,6 +297,24 @@
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "KT4X8_hE5NFb"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "iGKZSfVF2r4z"
},
"execution_count": null,
"outputs": []
}
]
}

@@ -134,7 +134,7 @@
"source": [
"# Volume of a hypersphere\n",
"\n",
"In the second part of this notebook we calculate the volume of a hypersphere of radius 0.5 (i.e., of diameter 1) as a function of the radius. Note that you you can check your answer by doing the calculation for 2D using the standard formula for the area of a circle and making sure it matches."
"In the second part of this notebook we calculate the volume of a hypersphere of radius 0.5 (i.e., of diameter 1) as a function of the radius. Note that you can check your answer by doing the calculation for 2D using the standard formula for the area of a circle and making sure it matches."
],
"metadata": {
"id": "b2FYKV1SL4Z7"
@@ -224,7 +224,7 @@
{
"cell_type": "markdown",
"source": [
"You should see see that by the time we get to 300 dimensions most of the volume is in the outer 1 percent. <br><br>\n",
"You should see that by the time we get to 300 dimensions most of the volume is in the outer 1 percent. <br><br>\n",
"\n",
"The conclusion of all of this is that in high dimensions you should be sceptical of your intuitions about how things work. I have tried to visualize many things in one or two dimensions in the book, but you should also be sceptical about these visualizations!"
],
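A quick check consistent with the corrected text (an editor's sketch): the volume of a D-dimensional sphere scales as the radius to the power D, so the fraction of volume in the outer 1 percent of the radius is 1 - 0.99**D.

```python
import numpy as np

for D in [2, 10, 100, 300]:
    # -> 0.02, 0.10, 0.63, 0.95: by D=300 nearly all volume is in the outer shell
    print(D, 1 - 0.99 ** D)
```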
@@ -178,7 +178,7 @@
"\n",
"def draw_loss_function(compute_loss, data, model, my_colormap, phi_iters = None):\n",
"\n",
" # Make grid of intercept/slope values to plot\n",
" # Make grid of offset/frequency values to plot\n",
" offsets_mesh, freqs_mesh = np.meshgrid(np.arange(-10,10.0,0.1), np.arange(2.5,22.5,0.1))\n",
" loss_mesh = np.zeros_like(freqs_mesh)\n",
" # Compute loss for every set of parameters\n",
@@ -304,7 +304,7 @@
"for c_step in range (n_steps):\n",
" # Do gradient descent step\n",
" phi_all[:,c_step+1:c_step+2] = gradient_descent_step(phi_all[:,c_step:c_step+1],data, model)\n",
" # Measure loss and draw model every 4th step\n",
" # Measure loss and draw model every 8th step\n",
" if c_step % 8 == 0:\n",
" loss = compute_loss(data[0,:], data[1,:], model, phi_all[:,c_step+1:c_step+2])\n",
" draw_model(data,model,phi_all[:,c_step+1], \"Iteration %d, loss = %f\"%(c_step+1,loss))\n",
@@ -369,7 +369,7 @@
"# Code to draw the regularization function\n",
"def draw_reg_function():\n",
"\n",
" # Make grid of intercept/slope values to plot\n",
" # Make grid of offset/frequency values to plot\n",
" offsets_mesh, freqs_mesh = np.meshgrid(np.arange(-10,10.0,0.1), np.arange(2.5,22.5,0.1))\n",
" loss_mesh = np.zeros_like(freqs_mesh)\n",
" # Compute loss for every set of parameters\n",
@@ -399,7 +399,7 @@
"# Code to draw loss function with regularization\n",
"def draw_loss_function_reg(data, model, lambda_, my_colormap, phi_iters = None):\n",
"\n",
" # Make grid of intercept/slope values to plot\n",
" # Make grid of offset/frequency values to plot\n",
" offsets_mesh, freqs_mesh = np.meshgrid(np.arange(-10,10.0,0.1), np.arange(2.5,22.5,0.1))\n",
" loss_mesh = np.zeros_like(freqs_mesh)\n",
" # Compute loss for every set of parameters\n",
@@ -512,7 +512,7 @@
"for c_step in range (n_steps):\n",
" # Do gradient descent step\n",
" phi_all[:,c_step+1:c_step+2] = gradient_descent_step2(phi_all[:,c_step:c_step+1],lambda_, data, model)\n",
" # Measure loss and draw model every 4th step\n",
" # Measure loss and draw model every 8th step\n",
" if c_step % 8 == 0:\n",
" loss = compute_loss2(data[0,:], data[1,:], model, phi_all[:,c_step+1:c_step+2], lambda_)\n",
" draw_model(data,model,phi_all[:,c_step+1], \"Iteration %d, loss = %f\"%(c_step+1,loss))\n",
@@ -528,7 +528,7 @@
{
"cell_type": "markdown",
"source": [
"You should see that the gradient descent algorithm now finds the correct minimum. By applying a tiny bit of domain knowledge (the parameter phi0 tends to be near zero and the parameters phi1 tends to be near 12.5), we get a better solution. However, the cost is that this solution is slightly biased towards this prior knowledge."
"You should see that the gradient descent algorithm now finds the correct minimum. By applying a tiny bit of domain knowledge (the parameter phi0 tends to be near zero and the parameter phi1 tends to be near 12.5), we get a better solution. However, the cost is that this solution is slightly biased towards this prior knowledge."
],
"metadata": {
"id": "wrszSLrqZG4k"
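A hedged sketch of how that domain knowledge could enter the loss (the editor's reconstruction from the surrounding names; the exact form lives in the notebook's compute_loss2): penalize squared distance from the prior guesses, weighted by lambda_.

```python
def loss_with_prior(x, y, model, phi, lambda_):
    prior = np.array([[0.0], [12.5]])   # assumed prior means for phi0, phi1
    return compute_loss(x, y, model, phi) + lambda_ * np.sum((phi - prior) ** 2)
```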
@@ -4,7 +4,6 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOR3WOJwfTlMD8eOLsPfPrz",
"include_colab_link": true
},
"kernelspec": {
@@ -140,7 +139,7 @@
" fig.set_size_inches(7,7)\n",
" ax.contourf(phi0mesh, phi1mesh, loss_function, 256, cmap=my_colormap);\n",
" ax.contour(phi0mesh, phi1mesh, loss_function, 20, colors=['#80808080'])\n",
" ax.set_xlabel('$\\phi_{0}$'); ax.set_ylabel('$\\phi_{1}$')\n",
" ax.set_xlabel(r'$\\phi_{0}$'); ax.set_ylabel(r'$\\phi_{1}$')\n",
"\n",
" if grad_path_typical_lr is not None:\n",
" ax.plot(grad_path_typical_lr[0,:], grad_path_typical_lr[1,:],'ro-')\n",
@@ -52,7 +52,7 @@
"# import libraries\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"# Define seed so get same results each time\n",
"# Define seed to get same results each time\n",
"np.random.seed(1)"
]
},
@@ -80,7 +80,7 @@
" for i in range(n_data):\n",
" x[i] = np.random.uniform(i/n_data, (i+1)/n_data, 1)\n",
"\n",
" # y value from running through functoin and adding noise\n",
" # y value from running through function and adding noise\n",
" y = np.ones(n_data)\n",
" for i in range(n_data):\n",
" y[i] = true_function(x[i])\n",
@@ -96,7 +96,7 @@
{
"cell_type": "code",
"source": [
"# Draw the fitted function, together win uncertainty used to generate points\n",
"# Draw the fitted function, together with uncertainty used to generate points\n",
"def plot_function(x_func, y_func, x_data=None,y_data=None, x_model = None, y_model =None, sigma_func = None, sigma_model=None):\n",
"\n",
" fig,ax = plt.subplots()\n",
@@ -137,7 +137,7 @@
"n_data = 15\n",
"x_data,y_data = generate_data(n_data, sigma_func)\n",
"\n",
"# Plot the functinon, data and uncertainty\n",
"# Plot the function, data and uncertainty\n",
"plot_function(x_func, y_func, x_data, y_data, sigma_func=sigma_func)"
],
"metadata": {
@@ -216,7 +216,7 @@
"# Closed form solution\n",
"beta, omega = fit_model_closed_form(x_data,y_data,n_hidden=14)\n",
"\n",
"# Get prediction for model across graph grange\n",
"# Get prediction for model across graph range\n",
"x_model = np.linspace(0,1,100);\n",
"y_model = network(x_model, beta, omega)\n",
"\n",
@@ -297,7 +297,7 @@
{
"cell_type": "code",
"source": [
"# Plot the median of the results\n",
"# Plot the mean of the results\n",
"# TODO -- find the mean prediction\n",
"# Replace this line\n",
"y_model_mean = all_y_model[0,:]\n",
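One plausible completion of this TODO (hedged; the notebook leaves it to the reader, and the orientation of all_y_model is inferred from the placeholder line, which takes row 0):

```python
y_model_mean = np.mean(all_y_model, axis=0)   # average predictions across models
```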
@@ -1,18 +1,16 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap09/9_4_Bayesian_Approach.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "el8l05WQEO46"
@@ -38,7 +36,7 @@
"# import libraries\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"# Define seed so get same results each time\n",
"# Define seed to get same results each time\n",
"np.random.seed(1)"
]
},
@@ -87,7 +85,7 @@
},
"outputs": [],
"source": [
"# Draw the fitted function, together win uncertainty used to generate points\n",
"# Draw the fitted function, together with uncertainty used to generate points\n",
"def plot_function(x_func, y_func, x_data=None,y_data=None, x_model = None, y_model =None, sigma_func = None, sigma_model=None):\n",
"\n",
" fig,ax = plt.subplots()\n",
@@ -159,7 +157,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "i8T_QduzeBmM"
@@ -195,7 +192,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "JojV6ueRk49G"
@@ -211,7 +207,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "YX0O_Ciwp4W1"
@@ -225,7 +220,7 @@
" &\\propto&\\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr].\n",
"\\end{align}\n",
"\n",
"In fact, since this already a normal distribution, the constant of proportionality must be one and we can write\n",
"In fact, since this is already a normal distribution, the constant of proportionality must be one and we can write\n",
"\n",
"\\begin{align}\n",
" Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) &=& \\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr].\n",
@@ -277,7 +272,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "GjPnlG4q0UFK"
@@ -334,7 +328,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "GiNg5EroUiUb"
@@ -343,17 +336,16 @@
"Now we need to perform inference for a new data points $\\mathbf{x}^*$ with corresponding hidden values $\\mathbf{h}^*$. Instead of having a single estimate of the parameters, we have a distribution over the possible parameters. So we marginalize (integrate) over this distribution to account for all possible values:\n",
"\n",
"\\begin{align}\n",
"Pr(y^*|\\mathbf{x}^*) &=& \\int Pr(y^{*}|\\mathbf{x}^*,\\boldsymbol\\phi)Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) d\\boldsymbol\\phi\\\\\n",
"&=& \\int \\text{Norm}_{y^*}\\bigl[[\\mathbf{h}^{*T},1]\\boldsymbol\\phi,\\sigma^2\\bigr]\\cdot\\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr]d\\boldsymbol\\phi\\\\\n",
"&=& \\text{Norm}_{y^*}\\biggl[\\frac{1}{\\sigma^2} [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y}, [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\n",
"[\\mathbf{h}^*;1]\\biggr]\n",
"Pr(y^*|\\mathbf{x}^*) &= \\int Pr(y^{*}|\\mathbf{x}^*,\\boldsymbol\\phi)Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) d\\boldsymbol\\phi\\\\\n",
"&= \\int \\text{Norm}_{y^*}\\bigl[[\\mathbf{h}^{*T},1]\\boldsymbol\\phi,\\sigma^2\\bigr]\\cdot\\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr]d\\boldsymbol\\phi\\\\\n",
"&= \\text{Norm}_{y^*}\\biggl[\\frac{1}{\\sigma^2} [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y}, [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\n",
"[\\mathbf{h}^*;1]\\biggr],\n",
"\\end{align}\n",
"\n",
"where the notation $[\\mathbf{h}^{*T},1]$ is a row vector containing $\\mathbf{h}^{T}$ with a one appended to the end and $[\\mathbf{h};1 ]$ is a column vector containing $\\mathbf{h}$ with a one appended to the end.\n",
"\n",
"\n",
"\n",
"To compute this, we reformulated the integrand using the relations from appendices\n",
"C.3.3 and C.3.4 as the product of a normal distribution in $\\boldsymbol\\phi$ and a constant with respect\n",
"To compute this, we reformulated the integrand using the relations from appendices C.3.3 and C.3.4 as the product of a normal distribution in $\\boldsymbol\\phi$ and a constant with respect\n",
"to $\\boldsymbol\\phi$. The integral of the normal distribution must be one, and so the final result is just the constant. This constant is itself a normal distribution in $y^*$. <br>\n",
"\n",
"If you feel so inclined you can work through the math of this yourself.\n",
@@ -404,7 +396,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "8Hcbe_16sK0F"
@@ -419,9 +410,8 @@
],
"metadata": {
"colab": {
"authorship_tag": "ABX9TyMB8B4269DVmrcLoCWrhzKF",
"include_colab_link": true,
"provenance": []
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
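For concreteness, a numpy sketch of the two results above (an editor's illustration with assumed shapes: H is (D+1, I) with a row of ones appended, y is (I, 1), h_star is (D, 1), and sigma_sq, sigma_p_sq are the two variances):

```python
A_inv = np.linalg.inv(H @ H.T / sigma_sq + np.eye(H.shape[0]) / sigma_p_sq)
phi_mean = A_inv @ H @ y / sigma_sq                  # posterior mean over phi
h_star_1 = np.vstack([h_star, np.ones((1, 1))])      # column vector [h*; 1]
y_star_mean = (h_star_1.T @ phi_mean).item()         # predictive mean
y_star_var = (h_star_1.T @ A_inv @ h_star_1).item()  # variance term in the last line
```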
@@ -4,7 +4,6 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyM38ZVBK4/xaHk5Ys5lF6dN",
"include_colab_link": true
},
"kernelspec": {
@@ -44,8 +43,8 @@
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
"# Run this if you're in a Colab to install MNIST 1D repository\n",
"!pip install git+https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "syvgxgRr3myY"
@@ -95,7 +94,7 @@
"D_k = 200 # Hidden dimensions\n",
"D_o = 10 # Output dimensions\n",
"\n",
"# Define a model with two hidden layers of size 100\n",
"# Define a model with two hidden layers of size 200\n",
"# And ReLU activations between them\n",
"model = nn.Sequential(\n",
"nn.Linear(D_i, D_k),\n",
@@ -108,10 +107,7 @@
" # Initialize the parameters with He initialization\n",
" if isinstance(layer_in, nn.Linear):\n",
" nn.init.kaiming_uniform_(layer_in.weight)\n",
" layer_in.bias.data.fill_(0.0)\n",
"\n",
"# Call the function you just defined\n",
"model.apply(weights_init)"
" layer_in.bias.data.fill_(0.0)\n"
],
"metadata": {
"id": "JfIFWFIL33eF"
@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyNJodaaCLMRWL9vTl8B/iLI",
"authorship_tag": "ABX9TyNb46PJB/CC1pcHGfjpUUZg",
"include_colab_link": true
},
"kernelspec": {
@@ -45,8 +45,8 @@
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
"# Run this if you're in a Colab to install MNIST 1D repository\n",
"!pip install git+https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "D5yLObtZCi9J"

@@ -31,7 +31,7 @@
"source": [
"# **Notebook 10.4: Downsampling and Upsampling**\n",
"\n",
"This notebook investigates the down sampling and downsampling methods discussed in section 10.4 of the book.\n",
"This notebook investigates the upsampling and downsampling methods discussed in section 10.4 of the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TODO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
@@ -301,7 +301,7 @@
"cell_type": "code",
"source": [
"# Define 2 by 2 original patch\n",
"orig_2_2 = np.array([[2, 4], [4,8]])\n",
"orig_2_2 = np.array([[6, 8], [8,4]])\n",
"print(orig_2_2)"
],
"metadata": {
@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyNAcc98STMeyQgh9SbVHWG+",
"authorship_tag": "ABX9TyORZF8xy4X1yf4oRhRq8Rtm",
"include_colab_link": true
},
"kernelspec": {
@@ -65,10 +65,19 @@
"source": [
"# Run this once to load the train and test data straight into a dataloader class\n",
"# that will provide the batches\n",
"\n",
"# (It may complain that some files are missing because the files seem to have been\n",
"# reorganized on the underlying website, but it still seems to work). If everything is working\n",
"# properly, then the whole notebook should run to the end without further problems\n",
"# even before you make changes.\n",
"batch_size_train = 64\n",
"batch_size_test = 1000\n",
"\n",
"# TODO Change this directory to point towards an existing directory\n",
"myDir = '/files/'\n",
"\n",
"train_loader = torch.utils.data.DataLoader(\n",
" torchvision.datasets.MNIST('/files/', train=True, download=True,\n",
" torchvision.datasets.MNIST(myDir, train=True, download=True,\n",
" transform=torchvision.transforms.Compose([\n",
" torchvision.transforms.ToTensor(),\n",
" torchvision.transforms.Normalize(\n",
@@ -77,7 +86,7 @@
" batch_size=batch_size_train, shuffle=True)\n",
"\n",
"test_loader = torch.utils.data.DataLoader(\n",
" torchvision.datasets.MNIST('/files/', train=False, download=True,\n",
" torchvision.datasets.MNIST(myDir, train=False, download=True,\n",
" transform=torchvision.transforms.Compose([\n",
" torchvision.transforms.ToTensor(),\n",
" torchvision.transforms.Normalize(\n",
@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMLKg5ZmXqojcVrZD5BGm9g",
"authorship_tag": "ABX9TyP3VmRg51U+7NCfSYjRRrgv",
"include_colab_link": true
},
"kernelspec": {
@@ -267,8 +267,8 @@
" fig,ax = plt.subplots()\n",
" ax.plot(np.squeeze(x_in), np.squeeze(dydx), 'b-')\n",
" ax.set_xlim(-2,2)\n",
" ax.set_xlabel('Input, $x$')\n",
" ax.set_ylabel('Gradient, $dy/dx$')\n",
" ax.set_xlabel(r'Input, $x$')\n",
" ax.set_ylabel(r'Gradient, $dy/dx$')\n",
" ax.set_title('No layers = %d'%(K))\n",
" plt.show()"
],

@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMXS3SPB4cS/4qxix0lH/Hq",
"authorship_tag": "ABX9TyNIY8tswL9e48d5D53aSmHO",
"include_colab_link": true
},
"kernelspec": {
@@ -45,8 +45,8 @@
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
"# Run this if you're in a Colab to install MNIST 1D repository\n",
"!pip install git+https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "D5yLObtZCi9J"
@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyPVeAd3eDpEOCFh8CVyr1zz",
"authorship_tag": "ABX9TyPx2mM2zTHmDJeKeiE1RymT",
"include_colab_link": true
},
"kernelspec": {
@@ -45,8 +45,8 @@
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
"# Run this if you're in a Colab to install MNIST 1D repository\n",
"!pip install git+https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "D5yLObtZCi9J"

@@ -4,7 +4,6 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMSk8qTqDYqFnRJVZKlsue0",
"include_colab_link": true
},
"kernelspec": {
@@ -29,7 +28,7 @@
{
"cell_type": "markdown",
"source": [
"# **Notebook 12.1: Multhead Self-Attention**\n",
"# **Notebook 12.2: Multihead Self-Attention**\n",
"\n",
"This notebook builds a multihead self-attention mechanism as in figure 12.6\n",
"\n",
@@ -147,9 +146,7 @@
" exp_values = np.exp(data_in) ;\n",
" # Sum over columns\n",
" denom = np.sum(exp_values, axis = 0);\n",
" # Replicate denominator to N rows\n",
" denom = np.matmul(np.ones((data_in.shape[0],1)), denom[np.newaxis,:])\n",
" # Compute softmax\n",
" # Compute softmax (numpy broadcasts denominator to all rows automatically)\n",
" softmax = exp_values / denom\n",
" # return the answer\n",
" return softmax"
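The broadcasting the new comment relies on, in miniature: dividing an (N, M) array by a length-M vector divides each column elementwise, so the explicit replication with np.matmul is unnecessary.

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
print(x / x.sum(axis=0))   # each column now sums to 1, as the softmax requires
```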
@@ -4,7 +4,6 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOMSGUFWT+YN0fwYHpMmHJM",
"include_colab_link": true
},
"kernelspec": {
@@ -99,7 +98,7 @@
"\n",
"# TODO -- Define node matrix\n",
"# There will be 9 nodes and 118 possible chemical elements\n",
"# so we'll define a 9x118 matrix. Each column represents one\n",
"# so we'll define a 118x9 matrix. Each column represents one\n",
"# node and is a one-hot vector (i.e. all zeros, except a single one at the\n",
"# chemical number of the element).\n",
"# Chemical numbers: Hydrogen-->1, Carbon-->6, Oxygen-->8\n",
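A hedged sketch of the matrix this corrected comment describes, with a hypothetical atom list (whether the one sits at row z or z-1 is a zero- versus one-indexing choice the notebook fixes elsewhere):

```python
atomic_numbers = [6, 6, 6, 8, 1, 1, 1, 1, 1]   # hypothetical molecule: 3 C, 1 O, 5 H
X = np.zeros((118, 9))                          # 118 elements x 9 nodes
for node, z in enumerate(atomic_numbers):
    X[z - 1, node] = 1                          # one-hot at the chemical number
```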
@@ -109,7 +109,7 @@
"# Choose random values for the parameters\n",
"omega = np.random.normal(size=(D,D))\n",
"beta = np.random.normal(size=(D,1))\n",
"phi = np.random.normal(size=(1,2*D))"
"phi = np.random.normal(size=(2*D,1))"
],
"metadata": {
"id": "79TSK7oLMobe"

@@ -4,7 +4,6 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyM0StKV3FIZ3MZqfflqC0Rv",
"include_colab_link": true
},
"kernelspec": {
@@ -339,7 +338,7 @@
" print(\"Initial generator loss = \", compute_generator_loss(z, theta, phi0, phi1))\n",
" for iter in range(n_iter):\n",
" # Get gradient\n",
" dl_dtheta = compute_generator_gradient(x_real, x_syn, phi0, phi1)\n",
" dl_dtheta = compute_generator_gradient(z, theta, phi0, phi1)\n",
" # Take a gradient step (uphill, since we are trying to make synthesized data less well classified by discriminator)\n",
" theta = theta + alpha * dl_dtheta ;\n",
"\n",
@@ -86,6 +86,7 @@
"cell_type": "code",
"source": [
"# TODO Define the distance matrix from figure 15.8d\n",
"# The index should be normalized before being used in the distance calculation.\n",
"# Replace this line\n",
"dist_mat = np.zeros((10,10))\n",
"\n",
@@ -128,7 +129,7 @@
{
"cell_type": "code",
"source": [
"draw_2D_heatmap(dist_mat,'Distance $|i-j|$', my_colormap)"
"draw_2D_heatmap(dist_mat,r'Distance $|i-j|$', my_colormap)"
],
"metadata": {
"id": "G0HFPBXyHT6V"
@@ -197,7 +198,7 @@
"cell_type": "code",
"source": [
"TP = np.array(opt.x).reshape(10,10)\n",
"draw_2D_heatmap(TP,'Transport plan $\\mathbf{P}$', my_colormap)"
"draw_2D_heatmap(TP,r'Transport plan $\\mathbf{P}$', my_colormap)"
],
"metadata": {
"id": "nZGfkrbRV_D0"
@@ -218,7 +219,8 @@
"cell_type": "code",
"source": [
"was = np.sum(TP * dist_mat)\n",
"print(\"Wasserstein distance = \", was)"
"print(\"Your Wasserstein distance = \", was)\n",
"print(\"Correct answer = 0.15148578811369506\")"
],
"metadata": {
"id": "yiQ_8j-Raq3c"
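One reading of the normalization note added to that TODO (an editor's guess, not the notebook's answer key): divide the indices by the number of bins before taking absolute differences.

```python
i, j = np.meshgrid(np.arange(10), np.arange(10), indexing='ij')
dist_mat = np.abs(i - j) / 10.0   # distance between normalized indices i/10, j/10
```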
@@ -55,7 +55,7 @@
"Pr(z) = \\text{Norm}_{z}[0,1]\n",
"\\end{equation}\n",
"\n",
"As in figure 17.2, we'll assume that the output is two dimensional, we we need to define a function that maps from the 1D latent variable to two dimensions. Usually, we would use a neural network, but in this case, we'll just define an arbitrary relationship.\n",
"As in figure 17.2, we'll assume that the output is two dimensional, we need to define a function that maps from the 1D latent variable to two dimensions. Usually, we would use a neural network, but in this case, we'll just define an arbitrary relationship.\n",
"\n",
"\\begin{align}\n",
"x_{1} &=& 0.5\\cdot\\exp\\Bigl[\\sin\\bigl[2+ 3.675 z \\bigr]\\Bigr]\\\\\n",
@@ -1,18 +1,16 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap17/17_2_Reparameterization_Trick.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "t9vk9Elugvmi"
@@ -40,7 +38,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "paLz5RukZP1J"
@@ -114,7 +111,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "r5Hl2QkimWx9"
@@ -139,13 +135,12 @@
"\n",
"fig,ax = plt.subplots()\n",
"ax.plot(phi_vals, expected_vals,'r-')\n",
"ax.set_xlabel('Parameter $\\phi$')\n",
"ax.set_ylabel('$\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
"ax.set_xlabel(r'Parameter $\\phi$')\n",
"ax.set_ylabel(r'$\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "zTCykVeWqj_O"
@@ -253,13 +248,12 @@
"\n",
"fig,ax = plt.subplots()\n",
"ax.plot(phi_vals, deriv_vals,'r-')\n",
"ax.set_xlabel('Parameter $\\phi$')\n",
"ax.set_ylabel('$\\partial/\\partial\\phi\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
"ax.set_xlabel(r'Parameter $\\phi$')\n",
"ax.set_ylabel(r'$\\partial/\\partial\\phi\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "ASu4yKSwAEYI"
@@ -269,7 +263,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "xoFR1wifc8-b"
@@ -366,13 +359,12 @@
"\n",
"fig,ax = plt.subplots()\n",
"ax.plot(phi_vals, deriv_vals,'r-')\n",
"ax.set_xlabel('Parameter $\\phi$')\n",
"ax.set_ylabel('$\\partial/\\partial\\phi\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
"ax.set_xlabel(r'Parameter $\\phi$')\n",
"ax.set_ylabel(r'$\\partial/\\partial\\phi\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "1TWBiUC7bQSw"
@@ -403,7 +395,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "d-0tntSYdKPR"
@@ -415,9 +406,8 @@
],
"metadata": {
"colab": {
"authorship_tag": "ABX9TyOxO2/0DTH4n4zhC97qbagY",
"include_colab_link": true,
"provenance": []
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
@@ -1,18 +1,16 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap17/17_3_Importance_Sampling.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "t9vk9Elugvmi"
@@ -40,7 +38,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "f7a6xqKjkmvT"
@@ -61,7 +58,7 @@
"by drawing $I$ samples $y_i$ and using the formula:\n",
"\n",
"\\begin{equation}\n",
"\\mathbb{E}_{y}\\Bigl[\\exp\\bigl[- (y-1)^4\\bigr]\\Bigr] \\approx \\frac{1}{I} \\sum_{i=1}^I \\exp\\bigl[-(y-1)^4 \\bigr]\n",
"\\mathbb{E}_{y}\\Bigl[\\exp\\bigl[- (y-1)^4\\bigr]\\Bigr] \\approx \\frac{1}{I} \\sum_{i=1}^I \\exp\\bigl[-(y_i-1)^4 \\bigr]\n",
"\\end{equation}"
]
},
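The corrected estimator as a one-liner (an editor's sketch; the sampling distribution for y is whatever Pr(y) the notebook defined earlier, stubbed here as a standard normal):

```python
I = 100000
y = np.random.normal(size=I)            # stand-in for samples from Pr(y)
print(np.mean(np.exp(-(y - 1) ** 4)))   # Monte Carlo estimate of the expectation
```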
@@ -126,7 +123,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "Jr4UPcqmnXCS"
@@ -166,8 +162,8 @@
"mean_all = np.zeros_like(n_sample_all)\n",
"variance_all = np.zeros_like(n_sample_all)\n",
"for i in range(len(n_sample_all)):\n",
" print(\"Computing mean and variance for expectation with %d samples\"%(n_sample_all[i]))\n",
" mean_all[i],variance_all[i] = compute_mean_variance(n_sample_all[i])"
" mean_all[i],variance_all[i] = compute_mean_variance(n_sample_all[i])\n",
" print(\"No samples: \", n_sample_all[i], \", Mean: \", mean_all[i], \", Variance: \", variance_all[i])"
]
},
{
@@ -189,7 +185,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "XTUpxFlSuOl7"
@@ -199,7 +194,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "6hxsl3Pxo1TT"
@@ -234,7 +228,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "G9Xxo0OJsIqD"
@@ -283,7 +276,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "2sVDqP0BvxqM"
@@ -313,8 +305,8 @@
"mean_all2 = np.zeros_like(n_sample_all)\n",
"variance_all2 = np.zeros_like(n_sample_all)\n",
"for i in range(len(n_sample_all)):\n",
" print(\"Computing variance for expectation with %d samples\"%(n_sample_all[i]))\n",
" mean_all2[i], variance_all2[i] = compute_mean_variance2(n_sample_all[i])"
" mean_all2[i], variance_all2[i] = compute_mean_variance2(n_sample_all[i])\n",
" print(\"No samples: \", n_sample_all[i], \", Mean: \", mean_all2[i], \", Variance: \", variance_all2[i])"
]
},
{
@@ -348,7 +340,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "EtBP6NeLwZqz"
@@ -360,7 +351,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "_wuF-NoQu1--"
@@ -432,8 +422,8 @@
"mean_all2b = np.zeros_like(n_sample_all)\n",
"variance_all2b = np.zeros_like(n_sample_all)\n",
"for i in range(len(n_sample_all)):\n",
" print(\"Computing variance for expectation with %d samples\"%(n_sample_all[i]))\n",
" mean_all2b[i], variance_all2b[i] = compute_mean_variance2b(n_sample_all[i])"
" mean_all2b[i], variance_all2b[i] = compute_mean_variance2b(n_sample_all[i])\n",
" print(\"No samples: \", n_sample_all[i], \", Mean: \", mean_all2b[i], \", Variance: \", variance_all2b[i])"
]
},
{
@@ -478,7 +468,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "y8rgge9MNiOc"
@@ -490,9 +479,8 @@
],
"metadata": {
"colab": {
"authorship_tag": "ABX9TyNecz9/CDOggPSmy1LjT/Dv",
"include_colab_link": true,
"provenance": []
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
@@ -4,7 +4,6 @@
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"authorship_tag": "ABX9TyOlD6kmCxX3SKKuh3oJikKA",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
@@ -393,7 +392,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Update the state values for the current policy, by making the values at at adjacent\n",
|
||||
"# Update the state values for the current policy, by making the values at adjacent\n",
|
||||
"# states compatible with the Bellman equation (equation 19.11)\n",
|
||||
"def policy_evaluation(policy, state_values, rewards, transition_probabilities_given_action, gamma):\n",
|
||||
"\n",
|
||||
@@ -406,6 +405,10 @@
|
||||
" state_values_new[state] = 3.0\n",
|
||||
" break\n",
|
||||
"\n",
|
||||
" # TODO -- Write this function (from equation 19.11, but bear in mind policy is deterministic here)\n",
|
||||
" # Replace this line\n",
|
||||
" state_values_new[state] = 0\n",
|
||||
"\n",
|
||||
" return state_values_new\n",
|
||||
"\n",
|
||||
"# Greedily choose the action that maximizes the value for each state.\n",
@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyPkSYbEjOcEmLt8tU6HxNuR",
"authorship_tag": "ABX9TyNgBRvfIlngVobKuLE6leM+",
"include_colab_link": true
},
"kernelspec": {
@@ -45,8 +45,8 @@
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
"# Run this if you're in a Colab to install MNIST 1D repository\n",
"!pip install git+https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "D5yLObtZCi9J"

@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOo4vm4MXcIvAzVlMCaLikH",
"authorship_tag": "ABX9TyO6xuszaG4nNAcWy/3juLkn",
"include_colab_link": true
},
"kernelspec": {
@@ -44,8 +44,8 @@
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
"# Run this if you're in a Colab to install MNIST 1D repository\n",
"!pip install git+https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "D5yLObtZCi9J"

@@ -5,7 +5,7 @@
"colab": {
"provenance": [],
"gpuType": "T4",
"authorship_tag": "ABX9TyMjPBfDONmjqTSyEQDP2gjY",
"authorship_tag": "ABX9TyOG/5A+P053/x1IfFg52z4V",
"include_colab_link": true
},
"kernelspec": {
@@ -47,8 +47,8 @@
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
"# Run this if you're in a Colab to install MNIST 1D repository\n",
"!pip install git+https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "D5yLObtZCi9J"

@@ -43,7 +43,8 @@
"id": "Sg2i1QmhKW5d"
},
"source": [
"# Run this if you're in a Colab\n",
"# Run this if you're in a Colab to install MNIST 1D repository\n",
"!pip install git+https://github.com/greydanus/mnist1d\n",
"!git clone https://github.com/greydanus/mnist1d"
],
"execution_count": null,
@@ -95,6 +96,12 @@
"id": "I-vm_gh5xTJs"
},
"source": [
"from mnist1d.data import get_dataset, get_dataset_args\n",
"from mnist1d.utils import set_seed, to_pickle, from_pickle\n",
"\n",
"import sys ; sys.path.append('./mnist1d/notebooks')\n",
"from train import get_model_args, train_model\n",
"\n",
"args = mnist1d.get_dataset_args()\n",
"data = mnist1d.get_dataset(args=args) # by default, this will download a pre-made dataset from the GitHub repo\n",
"\n",
@@ -210,7 +217,7 @@
" # we would return [1,1,0,0,1]\n",
" # Remember that these are torch tensors and not numpy arrays\n",
" # Replace this function:\n",
" mask = torch.ones_like(scores)\n",
" mask = torch.ones_like(absolute_weights)\n",
"\n",
"\n",
" return mask"
@@ -237,7 +244,6 @@
"def find_lottery_ticket(model, dataset, args, sparsity_schedule, criteria_fn=None, **kwargs):\n",
"\n",
" criteria_fn = lambda init_params, final_params: final_params.abs()\n",
"\n",
" init_params = model.get_layer_vecs()\n",
" stats = {'train_losses':[], 'test_losses':[], 'train_accs':[], 'test_accs':[]}\n",
" models = []\n",
@@ -253,7 +259,7 @@
" model.set_layer_masks(masks)\n",
"\n",
" # training process\n",
" results = mnist1d.train_model(dataset, model, args)\n",
" results = train_model(dataset, model, args)\n",
" model = results['checkpoints'][-1]\n",
"\n",
" # store stats\n",
@@ -291,7 +297,8 @@
},
"source": [
"# train settings\n",
"model_args = mnist1d.get_model_args()\n",
"from train import get_model_args, train_model\n",
"model_args = get_model_args()\n",
"model_args.total_steps = 1501\n",
"model_args.hidden_size = 500\n",
"model_args.print_every = 5000 # print never\n",

@@ -137,7 +137,7 @@
"id": "CfZ-srQtmff2"
},
"source": [
"Why might the distributions for blue and yellow populations be different? It could be that the behaviour of the populations is identical, but the credit rating algorithm is biased; it may favor one population over another or simply be more noisy for one group. Alternatively, it could be that that the populations genuinely behave differently. In practice, the differences in blue and yellow distributions are probably attributable to a combination of these factors.\n",
"Why might the distributions for blue and yellow populations be different? It could be that the behaviour of the populations is identical, but the credit rating algorithm is biased; it may favor one population over another or simply be more noisy for one group. Alternatively, it could be that the populations genuinely behave differently. In practice, the differences in blue and yellow distributions are probably attributable to a combination of these factors.\n",
"\n",
"Let’s assume that we can’t retrain the credit score prediction algorithm; our job is to adjudicate whether each individual is refused the loan ($\\hat{y}=0$)\n",
" or granted it ($\\hat{y}=1$). Since we only have the credit score\n",
@@ -382,7 +382,7 @@
"source": [
"# Equal opportunity:\n",
"\n",
"The thresholds are chosen so that so that the true positive rate is is the same for both population. Of the people who pay back the loan, the same proportion are offered credit in each group. In terms of the two ROC curves, it means choosing thresholds so that the vertical position on each curve is the same without regard for the horizontal position."
"The thresholds are chosen so that the true positive rate is the same for both populations. Of the people who pay back the loan, the same proportion are offered credit in each group. In terms of the two ROC curves, it means choosing thresholds so that the vertical position on each curve is the same without regard for the horizontal position."
]
},
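The equal-opportunity criterion described above can be sketched in code: pick one threshold per group so that the true positive rate (the fraction of repayers granted credit) matches a common target. The variable names (scores, repaid, groups) are illustrative assumptions, not the notebook's:

```python
# Choose per-group score thresholds that equalize the true positive rate.
import numpy as np

def equal_opportunity_thresholds(scores, repaid, groups, target_tpr=0.8):
    thresholds = {}
    for g in np.unique(groups):
        repayer_scores = scores[(groups == g) & (repaid == 1)]  # scores of people who repay
        # grant credit to the top target_tpr fraction of this group's repayers
        thresholds[g] = np.quantile(repayer_scores, 1.0 - target_tpr)
    return thresholds
```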
{

7  Notebooks/LICENSE (MIT)  Normal file
@@ -0,0 +1,7 @@
Copyright 2023 Simon Prince

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
326  Trees/LinearRegression_FitModel.ipynb  Normal file
File diff suppressed because one or more lines are too long
357  Trees/LinearRegression_FitModel_Answers.ipynb  Normal file
File diff suppressed because one or more lines are too long
343  Trees/LinearRegression_FitModel_Quadratic.ipynb  Normal file
File diff suppressed because one or more lines are too long
277  Trees/LinearRegression_LossFunction.ipynb  Normal file
File diff suppressed because one or more lines are too long
325  Trees/LinearRegression_LossFunction_Answers.ipynb  Normal file
File diff suppressed because one or more lines are too long
489  Trees/SAT_Construction.ipynb  Normal file
File diff suppressed because one or more lines are too long
271  Trees/SAT_Construction2.ipynb  Normal file
File diff suppressed because one or more lines are too long
261  Trees/SAT_Construction2_Answers.ipynb  Normal file
File diff suppressed because one or more lines are too long
586  Trees/SAT_Construction_Answers.ipynb  Normal file
File diff suppressed because one or more lines are too long
1061  Trees/SAT_Crossword.ipynb  Normal file
File diff suppressed because one or more lines are too long
911  Trees/SAT_Crossword_Answers.ipynb  Normal file
File diff suppressed because one or more lines are too long
248  Trees/SAT_Exhaustive.ipynb  Normal file
File diff suppressed because one or more lines are too long
250  Trees/SAT_Exhaustive_Answers.ipynb  Normal file
File diff suppressed because one or more lines are too long
275  Trees/SAT_Graph_Coloring.ipynb  Normal file
File diff suppressed because one or more lines are too long
279  Trees/SAT_Graph_Coloring_Answers.ipynb  Normal file
File diff suppressed because one or more lines are too long
270  Trees/SAT_Sudoku.ipynb  Normal file
File diff suppressed because one or more lines are too long
433  Trees/SAT_Sudoku_Answers.ipynb  Normal file
File diff suppressed because one or more lines are too long
251  Trees/SAT_Tseitin.ipynb  Normal file
File diff suppressed because one or more lines are too long
310  Trees/SAT_Tseitin_Answers.ipynb  Normal file
File diff suppressed because one or more lines are too long
264  Trees/SAT_Z3.ipynb  Normal file
File diff suppressed because one or more lines are too long
335  Trees/SAT_Z3_Answers.ipynb  Normal file
File diff suppressed because one or more lines are too long
BIN  Trees/cb_2018_us_state_500k.zip  Normal file
Binary file not shown.
2229  UDL_Equations.tex  Normal file
File diff suppressed because it is too large
BIN  UDL_Errata.pdf
Binary file not shown.
416  index.html
@@ -1,406 +1,20 @@
<!DOCTYPE html>
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>udlbook</title>
<link rel="stylesheet" href="style.css">
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="icon" type="image/x-icon" href="/favicon.ico" />
<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
<link
href="https://fonts.googleapis.com/css2?family=Encode+Sans+Expanded:wght@400;700&display=swap"
rel="stylesheet"
/>

<title>Understanding Deep Learning</title>
</head>

<body>
<div id="head">
<div>
<h1 style="margin: 0; font-size: 36px">Understanding Deep Learning</h1>
by Simon J.D. Prince
<br>Published by MIT Press Dec 5th 2023.<br>
<ul>
<li>
<p style="font-size: larger; margin-bottom: 0">Download full PDF <a
href="https://github.com/udlbook/udlbook/releases/download/v2.0.1/UnderstandingDeepLearning_02_15_24_C.pdf">here</a>
</p>2024-02-15. CC-BY-NC-ND license<br>
<img src="https://img.shields.io/github/downloads/udlbook/udlbook/total" alt="download stats shield">
</li>
<li> Order your copy from <a href="https://mitpress.mit.edu/9780262048644/understanding-deep-learning/">here </a></li>
<li> Known errata can be found here: <a
href="https://github.com/udlbook/udlbook/raw/main/UDL_Errata.pdf">PDF</a></li>
<li> Report new errata via <a href="https://github.com/udlbook/udlbook/issues">github</a>
or contact me directly at udlbookmail@gmail.com
<li> Follow me on <a href="https://twitter.com/SimonPrinceAI">Twitter</a> or <a
href="https://www.linkedin.com/in/simon-prince-615bb9165/">LinkedIn</a> for updates.
</ul>
<h2>Table of contents</h2>
<ul>
<li> Chapter 1 - Introduction
<li> Chapter 2 - Supervised learning
<li> Chapter 3 - Shallow neural networks
<li> Chapter 4 - Deep neural networks
<li> Chapter 5 - Loss functions
<li> Chapter 6 - Training models
<li> Chapter 7 - Gradients and initialization
<li> Chapter 8 - Measuring performance
<li> Chapter 9 - Regularization
<li> Chapter 10 - Convolutional networks
<li> Chapter 11 - Residual networks
<li> Chapter 12 - Transformers
<li> Chapter 13 - Graph neural networks
<li> Chapter 14 - Unsupervised learning
<li> Chapter 15 - Generative adversarial networks
<li> Chapter 16 - Normalizing flows
<li> Chapter 17 - Variational autoencoders
<li> Chapter 18 - Diffusion models
<li> Chapter 19 - Deep reinforcement learning
<li> Chapter 20 - Why does deep learning work?
<li> Chapter 21 - Deep learning and ethics
</ul>
</div>
<div id="cover">
<img src="https://raw.githubusercontent.com/udlbook/udlbook/main/UDLCoverSmall.jpg"
alt="front cover">
</div>
</div>
<div id="body">
<h2>Resources for instructors </h2>
<p>Instructor answer booklet available with proof of credentials via <a
href="https://mitpress.mit.edu/9780262048644/understanding-deep-learning"> MIT Press</a>.</p>
<p>Request an exam/desk copy via <a href="https://mitpress.ublish.com/request?cri=15055">MIT Press</a>.</p>
<p>Figures in PDF (vector) / SVG (vector) / Powerpoint (images):
<ul>
<li> Chapter 1 - Introduction: <a href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap1PDF.zip">PDF
Figures</a> / <a href="https://drive.google.com/uc?export=download&id=1udnl5pUOAc8DcAQ7HQwyzP9pwL95ynnv">
SVG
Figures</a> / <a
href="https://docs.google.com/presentation/d/1IjTqIUvWCJc71b5vEJYte-Dwujcp7rvG/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 2 - Supervised learning: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap2PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1VSxcU5y1qNFlmd3Lb3uOWyzILuOj1Dla"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1Br7R01ROtRWPlNhC_KOommeHAWMBpWtz/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 3 - Shallow neural networks: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap3PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=19kZFWlXhzN82Zx02ByMmSZOO4T41fmqI"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1e9M3jB5I9qZ4dCBY90Q3Hwft_i068QVQ/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 4 - Deep neural networks: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap4PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1ojr0ebsOhzvS04ItAflX2cVmYqHQHZUa"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1LTSsmY4mMrJbqXVvoTOCkQwHrRKoYnJj/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 5 - Loss functions: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap5PDF.zip">PDF
Figures</a> / <a href="https://drive.google.com/uc?export=download&id=17MJO7fiMpFZVqKeqXTbQ36AMpmR4GizZ">
SVG
Figures</a> / <a
href="https://docs.google.com/presentation/d/1gcpC_3z9oRp87eMkoco-kdLD-MM54Puk/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 6 - Training models: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap6PDF.zip">PDF
Figures</a> / <a href="https://drive.google.com/uc?export=download&id=1VPdhFRnCr9_idTrX0UdHKGAw2shUuwhK">
SVG
Figures</a> / <a
href="https://docs.google.com/presentation/d/1AKoeggAFBl9yLC7X5tushAGzCCxmB7EY/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 7 - Gradients and initialization: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap7PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1TTl4gvrTvNbegnml4CoGoKOOd6O8-PGs"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/11zhB6PI-Dp6Ogmr4IcI6fbvbqNqLyYcz/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 8 - Measuring performance: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap8PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=19eQOnygd_l0DzgtJxXuYnWa4z7QKJrJx"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1SHRmJscDLUuQrG7tmysnScb3ZUAqVMZo/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 9 - Regularization: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap9PDF.zip">PDF
Figures</a> / <a href="https://drive.google.com/uc?export=download&id=1LprgnUGL7xAM9-jlGZC9LhMPeefjY0r0">
SVG
Figures</a> / <a
href="https://docs.google.com/presentation/d/1VwIfvjpdfTny6sEfu4ZETwCnw6m8Eg-5/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 10 - Convolutional networks: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap10PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1-Wb3VzaSvVeRzoUzJbI2JjZE0uwqupM9"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1MtfKBC4Y9hWwGqeP6DVwUNbi1j5ncQCg/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 11 - Residual networks: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap11PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1Mr58jzEVseUAfNYbGWCQyDtEDwvfHRi1"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1saY8Faz0KTKAAifUrbkQdLA2qkyEjOPI/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 12 - Transformers: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap12PDF.zip">PDF
Figures</a> / <a href="https://drive.google.com/uc?export=download&id=1txzOVNf8-jH4UfJ6SLnrtOfPd1Q3ebzd">
SVG
Figures</a> / <a
href="https://docs.google.com/presentation/d/1GVNvYWa0WJA6oKg89qZre-UZEhABfm0l/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 13 - Graph neural networks: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap13PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1lQIV6nRp6LVfaMgpGFhuwEXG-lTEaAwe"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1YwF3U82c1mQ74c1WqHVTzLZ0j7GgKaWP/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 14 - Unsupervised learning: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap14PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1aMbI6iCuUvOywqk5pBOmppJu1L1anqsM"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1A-lBGv3NHl4L32NvfFgy1EKeSwY-0UeB/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">
PowerPoint Figures</a>
<li> Chapter 15 - Generative adversarial networks: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap15PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1EErnlZCOlXc3HK7m83T2Jh_0NzIUHvtL"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/10Ernk41ShOTf4IYkMD-l4dJfKATkXH4w/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 16 - Normalizing flows: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap16PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1B9bxtmdugwtg-b7Y4AdQKAIEVWxjx8l3"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1nLLzqb9pdfF_h6i1HUDSyp7kSMIkSUUA/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 17 - Variational autoencoders: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap17PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1SNtNIY7khlHQYMtaOH-FosSH3kWwL4b7"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1lQE4Bu7-LgvV2VlJOt_4dQT-kusYl7Vo/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Chapter 18 - Diffusion models: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap18PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1A-pIGl4PxjVMYOKAUG3aT4a8wD3G-q_r"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1x_ufIBtVPzWUvRieKMkpw5SdRjXWwdfR/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">
PowerPoint Figures</a>
<li> Chapter 19 - Deep reinforcement learning: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap19PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1a5WUoF7jeSgwC_PVdckJi1Gny46fCqh0"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1TnYmVbFNhmMFetbjyfXGmkxp1EHauMqr/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">
PowerPoint Figures </a>
<li> Chapter 20 - Why does deep learning work?: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap20PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1M2d0DHEgddAQoIedKSDTTt7m1ZdmBLQ3"> SVG Figures</a>
/
<a href="https://docs.google.com/presentation/d/1coxF4IsrCzDTLrNjRagHvqB_FBy10miA/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">
PowerPoint Figures</a>
<li> Chapter 21 - Deep learning and ethics: <a
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap21PDF.zip">PDF Figures</a> / <a
href="https://drive.google.com/uc?export=download&id=1jixmFfwmZkW_UVYzcxmDcMsdFFtnZ0bU"> SVG Figures</a>/
<a
href="https://docs.google.com/presentation/d/1EtfzanZYILvi9_-Idm28zD94I_6OrN9R/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
Figures</a>
<li> Appendices - <a href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLAppendixPDF.zip">PDF
Figures</a> / <a href="https://drive.google.com/uc?export=download&id=1k2j7hMN40ISPSg9skFYWFL3oZT7r8v-l">
SVG
Figures</a> / <a
href="https://docs.google.com/presentation/d/1_2cJHRnsoQQHst0rwZssv-XH4o5SEHks/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">Powerpoint
Figures</a>
</ul>

Instructions for editing figures / equations can be found <a
href="https://drive.google.com/file/d/1T_MXXVR4AfyMnlEFI-UVDh--FXI5deAp/view?usp=sharing">here</a>.

<p> My slides for 20 lecture undergraduate deep learning course:</p>
<ul>
<li><a href="https://drive.google.com/uc?export=download&id=17RHb11BrydOvxSFNbRIomE1QKLVI087m">1. Introduction</a></li>
<li><a href="https://drive.google.com/uc?export=download&id=1491zkHULC7gDfqlV6cqUxyVYXZ-de-Ub">2. Supervised Learning</a></li>
<li><a href="https://drive.google.com/uc?export=download&id=1XkP1c9EhOBowla1rT1nnsDGMf2rZvrt7">3. Shallow Neural Networks</a></li>
<li><a href="https://drive.google.com/uc?export=download&id=1e2ejfZbbfMKLBv0v-tvBWBdI8gO3SSS1">4. Deep Neural Networks</a></li>
<li><a href="https://drive.google.com/uc?export=download&id=1fxQ_a1Q3eFPZ4kPqKbak6_emJK-JfnRH">5. Loss Functions</a></li>
<li><a href="https://drive.google.com/uc?export=download&id=17QQ5ZzXBtR_uCNCUU1gPRWWRUeZN9exW">6. Fitting Models</a></li>
<li><a href="https://drive.google.com/uc?export=download&id=1hC8JUCOaFWiw3KGn0rm7nW6mEq242QDK">7. Computing Gradients</a></li>
<li><a href="https://drive.google.com/uc?export=download&id=1tSjCeAVg0JCeBcPgDJDbi7Gg43Qkh9_d">7b. Initialization</a></li>
<li><a href="https://drive.google.com/uc?export=download&id=1RVZW3KjEs0vNSGx3B2fdizddlr6I0wLl">8. Performance</a></li>
<li><a href="https://drive.google.com/uc?export=download&id=1LTicIKPRPbZRkkg6qOr1DSuOB72axood">9. Regularization</a></li>
<li><a href="https://drive.google.com/uc?export=download&id=1bGVuwAwrofzZdfvj267elIzkYMIvYFj0">10. Convolutional Networks</a></li>
<li><a href="https://drive.google.com/uc?export=download&id=14w31QqWRDix1GdUE-na0_E0kGKBhtKzs">11. Image Generation</a></li>
<li><a href="https://drive.google.com/uc?export=download&id=1af6bTTjAbhDYfrDhboW7Fuv52Gk9ygKr">12. Transformers and LLMs</a></li>
</ul>

<h2>Resources for students</h2>

<p>Answers to selected questions: <a
href="https://github.com/udlbook/udlbook/raw/main/UDL_Answer_Booklet_Students.pdf">PDF</a>
</p>
<p>Python notebooks: (Early ones more thoroughly tested than later ones!)</p>

<ul>
<li> Notebook 1.1 - Background mathematics: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap01/1_1_BackgroundMathematics.ipynb">ipynb/colab</a>
</li>
<li> Notebook 2.1 - Supervised learning: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap02/2_1_Supervised_Learning.ipynb">ipynb/colab</a>
</li>
<li> Notebook 3.1 - Shallow networks I: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap03/3_1_Shallow_Networks_I.ipynb">ipynb/colab </a>
</li>
<li> Notebook 3.2 - Shallow networks II: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap03/3_2_Shallow_Networks_II.ipynb">ipynb/colab </a>
</li>
<li> Notebook 3.3 - Shallow network regions: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap03/3_3_Shallow_Network_Regions.ipynb">ipynb/colab </a>
</li>
<li> Notebook 3.4 - Activation functions: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap03/3_4_Activation_Functions.ipynb">ipynb/colab </a>
</li>
<li> Notebook 4.1 - Composing networks: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap04/4_1_Composing_Networks.ipynb">ipynb/colab </a>
</li>
<li> Notebook 4.2 - Clipping functions: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap04/4_2_Clipping_functions.ipynb">ipynb/colab </a>
</li>
<li> Notebook 4.3 - Deep networks: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap04/4_3_Deep_Networks.ipynb">ipynb/colab </a>
</li>
<li> Notebook 5.1 - Least squares loss: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap05/5_1_Least_Squares_Loss.ipynb">ipynb/colab </a>
</li>
<li> Notebook 5.2 - Binary cross-entropy loss: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap05/5_2_Binary_Cross_Entropy_Loss.ipynb">ipynb/colab </a>
</li>
<li> Notebook 5.3 - Multiclass cross-entropy loss: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap05/5_3_Multiclass_Cross_entropy_Loss.ipynb">ipynb/colab </a>
</li>
<li> Notebook 6.1 - Line search: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap06/6_1_Line_Search.ipynb">ipynb/colab </a>
</li>
<li> Notebook 6.2 - Gradient descent: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap06/6_2_Gradient_Descent.ipynb">ipynb/colab </a>
</li>
<li> Notebook 6.3 - Stochastic gradient descent: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap06/6_3_Stochastic_Gradient_Descent.ipynb">ipynb/colab </a>
</li>
<li> Notebook 6.4 - Momentum: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap06/6_4_Momentum.ipynb">ipynb/colab </a>
</li>
<li> Notebook 6.5 - Adam: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap06/6_5_Adam.ipynb">ipynb/colab </a>
</li>
<li> Notebook 7.1 - Backpropagation in toy model: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap07/7_1_Backpropagation_in_Toy_Model.ipynb">ipynb/colab </a>
</li>
<li> Notebook 7.2 - Backpropagation: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap07/7_2_Backpropagation.ipynb">ipynb/colab </a>
</li>
<li> Notebook 7.3 - Initialization: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap07/7_3_Initialization.ipynb">ipynb/colab </a>
</li>
<li> Notebook 8.1 - MNIST-1D performance: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap08/8_1_MNIST_1D_Performance.ipynb">ipynb/colab </a>
</li>
<li> Notebook 8.2 - Bias-variance trade-off: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap08/8_2_Bias_Variance_Trade_Off.ipynb">ipynb/colab </a>
</li>
<li> Notebook 8.3 - Double descent: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap08/8_3_Double_Descent.ipynb">ipynb/colab </a>
</li>
<li> Notebook 8.4 - High-dimensional spaces: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap08/8_4_High_Dimensional_Spaces.ipynb">ipynb/colab </a>
</li>
<li> Notebook 9.1 - L2 regularization: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap09/9_1_L2_Regularization.ipynb">ipynb/colab </a>
</li>
<li> Notebook 9.2 - Implicit regularization: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap09/9_2_Implicit_Regularization.ipynb">ipynb/colab </a>
</li>
<li> Notebook 9.3 - Ensembling: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap09/9_3_Ensembling.ipynb">ipynb/colab </a>
</li>
<li> Notebook 9.4 - Bayesian approach: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap09/9_4_Bayesian_Approach.ipynb">ipynb/colab </a>
</li>
<li> Notebook 9.5 - Augmentation <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap09/9_5_Augmentation.ipynb">ipynb/colab </a>
</li>
<li> Notebook 10.1 - 1D convolution: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap10/10_1_1D_Convolution.ipynb">ipynb/colab </a>
</li>
<li> Notebook 10.2 - Convolution for MNIST-1D: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap10/10_2_Convolution_for_MNIST_1D.ipynb">ipynb/colab </a>
</li>
<li> Notebook 10.3 - 2D convolution: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap10/10_3_2D_Convolution.ipynb">ipynb/colab </a>
</li>
<li> Notebook 10.4 - Downsampling & upsampling: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap10/10_4_Downsampling_and_Upsampling.ipynb">ipynb/colab </a>
</li>
<li> Notebook 10.5 - Convolution for MNIST: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap10/10_5_Convolution_For_MNIST.ipynb">ipynb/colab </a>
</li>
<li> Notebook 11.1 - Shattered gradients: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap11/11_1_Shattered_Gradients.ipynb">ipynb/colab </a>
</li>
<li> Notebook 11.2 - Residual networks: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap11/11_2_Residual_Networks.ipynb">ipynb/colab </a>
</li>
<li> Notebook 11.3 - Batch normalization: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap11/11_3_Batch_Normalization.ipynb">ipynb/colab </a>
</li>
<li> Notebook 12.1 - Self-attention: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap12/12_1_Self_Attention.ipynb">ipynb/colab </a>
</li>
<li> Notebook 12.2 - Multi-head self-attention: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap12/12_2_Multihead_Self_Attention.ipynb">ipynb/colab </a>
</li>
<li> Notebook 12.3 - Tokenization: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap12/12_3_Tokenization.ipynb">ipynb/colab </a>
</li>
<li> Notebook 12.4 - Decoding strategies: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap12/12_4_Decoding_Strategies.ipynb">ipynb/colab </a>
</li>
<li> Notebook 13.1 - Encoding graphs: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap13/13_1_Graph_Representation.ipynb">ipynb/colab </a>
</li>
<li> Notebook 13.2 - Graph classification : <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap13/13_2_Graph_Classification.ipynb">ipynb/colab </a>
</li>
<li> Notebook 13.3 - Neighborhood sampling: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap13/13_3_Neighborhood_Sampling.ipynb">ipynb/colab </a>
</li>
<li> Notebook 13.4 - Graph attention: <a
href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap13/13_4_Graph_Attention_Networks.ipynb">ipynb/colab </a>
</li>
<li> Notebook 15.1 - GAN toy example: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap15/15_1_GAN_Toy_Example.ipynb">ipynb/colab </a></li>
<li> Notebook 15.2 - Wasserstein distance: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap15/15_2_Wasserstein_Distance.ipynb">ipynb/colab </a></li>
<li> Notebook 16.1 - 1D normalizing flows: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap16/16_1_1D_Normalizing_Flows.ipynb">ipynb/colab </a></li>
<li> Notebook 16.2 - Autoregressive flows: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap16/16_2_Autoregressive_Flows.ipynb">ipynb/colab </a></li>
<li> Notebook 16.3 - Contraction mappings: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap16/16_3_Contraction_Mappings.ipynb">ipynb/colab </a></li>
<li> Notebook 17.1 - Latent variable models: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap17/17_1_Latent_Variable_Models.ipynb">ipynb/colab </a></li>
<li> Notebook 17.2 - Reparameterization trick: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap17/17_2_Reparameterization_Trick.ipynb">ipynb/colab </a></li>
<li> Notebook 17.3 - Importance sampling: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap17/17_3_Importance_Sampling.ipynb">ipynb/colab </a></li>
<li> Notebook 18.1 - Diffusion encoder: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap18/18_1_Diffusion_Encoder.ipynb">ipynb/colab </a></li>
<li> Notebook 18.2 - 1D diffusion model: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap18/18_2_1D_Diffusion_Model.ipynb">ipynb/colab </a></li>
<li> Notebook 18.3 - Reparameterized model: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap18/18_3_Reparameterized_Model.ipynb">ipynb/colab </a></li>
<li> Notebook 18.4 - Families of diffusion models: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap18/18_4_Families_of_Diffusion_Models.ipynb">ipynb/colab </a></li>
<li> Notebook 19.1 - Markov decision processes: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap19/19_1_Markov_Decision_Processes.ipynb">ipynb/colab </a></li>
<li> Notebook 19.2 - Dynamic programming: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap19/19_2_Dynamic_Programming.ipynb">ipynb/colab </a></li>
<li> Notebook 19.3 - Monte-Carlo methods: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap19/19_3_Monte_Carlo_Methods.ipynb">ipynb/colab </a></li>
<li> Notebook 19.4 - Temporal difference methods: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap19/19_4_Temporal_Difference_Methods.ipynb">ipynb/colab </a></li>
<li> Notebook 19.5 - Control variates: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap19/19_5_Control_Variates.ipynb">ipynb/colab </a></li>
<li> Notebook 20.1 - Random data: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap20/20_1_Random_Data.ipynb">ipynb/colab </a></li>
<li> Notebook 20.2 - Full-batch gradient descent: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap20/20_2_Full_Batch_Gradient_Descent.ipynb">ipynb/colab </a></li>
<li> Notebook 20.3 - Lottery tickets: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap20/20_3_Lottery_Tickets.ipynb">ipynb/colab </a></li>
<li> Notebook 20.4 - Adversarial attacks: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap20/20_4_Adversarial_Attacks.ipynb">ipynb/colab </a></li>
<li> Notebook 21.1 - Bias mitigation: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap21/21_1_Bias_Mitigation.ipynb">ipynb/colab </a></li>
<li> Notebook 21.2 - Explainability: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap21/21_2_Explainability.ipynb">ipynb/colab </a></li>
</ul>

<br>
<h2>Citation</h2>
<pre><code>
@book{prince2023understanding,
author = "Simon J.D. Prince",
title = "Understanding Deep Learning",
publisher = "MIT Press",
year = 2023,
url = "http://udlbook.com"
}
</code></pre>
</div>
<div id="root"></div>
<script type="module" src="/src/index.jsx"></script>
</body>
</html>
8  jsconfig.json  Normal file
@@ -0,0 +1,8 @@
{
  "compilerOptions": {
    "baseUrl": "./",
    "paths": {
      "@/*": ["src/*"]
    }
  }
}
4457  package-lock.json  generated  Normal file
File diff suppressed because it is too large
36  package.json  Executable file
@@ -0,0 +1,36 @@
{
  "name": "udlbook-website",
  "version": "0.1.0",
  "private": true,
  "homepage": "https://udlbook.github.io/udlbook",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "vite build",
    "preview": "vite preview",
    "lint": "eslint . --ext js,jsx --report-unused-disable-directives --max-warnings 0",
    "predeploy": "npm run build",
    "deploy": "gh-pages -d dist",
    "clean": "rm -rf node_modules dist",
    "format": "prettier --write ."
  },
  "dependencies": {
    "react": "^18.3.1",
    "react-dom": "^18.3.1",
    "react-icons": "^5.2.1",
    "react-router-dom": "^6.23.1",
    "react-scroll": "^1.8.4",
    "styled-components": "^6.1.11"
  },
  "devDependencies": {
    "@vitejs/plugin-react-swc": "^3.5.0",
    "eslint": "^8.57.0",
    "eslint-plugin-react": "^7.34.2",
    "eslint-plugin-react-hooks": "^4.6.2",
    "eslint-plugin-react-refresh": "^0.4.7",
    "gh-pages": "^6.1.1",
    "prettier": "^3.3.1",
    "prettier-plugin-organize-imports": "^3.2.4",
    "vite": "^5.2.12"
  }
}
BIN  public/NMI_Review.pdf  Normal file
Binary file not shown.
BIN  public/favicon.ico  Normal file
Binary file not shown.
(image diff: added file, 15 KiB; preview and filename not shown)
12  src/App.jsx  Executable file
@@ -0,0 +1,12 @@
import Index from "@/pages";
import { BrowserRouter as Router, Route, Routes } from "react-router-dom";

export default function App() {
  return (
    <Router>
      <Routes>
        <Route exact path="/udlbook" element={<Index />} />
      </Routes>
    </Router>
  );
}
34  src/README.md  Normal file
@@ -0,0 +1,34 @@
# Understanding Deep Learning

Understanding Deep Learning - Simon J.D. Prince

## Website

```shell
# Install dependencies
npm install

# Run the website in development mode
npm run dev

# Build the website
npm run build

# Preview the built website
npm run preview

# Format the code
npm run format

# Lint the code
npm run lint

# Clean the repository
npm run clean

# Prepare to deploy the website
npm run predeploy

# Deploy the website
npm run deploy
```
145  src/components/Footer/FooterElements.jsx  Executable file
@@ -0,0 +1,145 @@
import { Link } from "react-router-dom";
import styled from "styled-components";

export const FooterContainer = styled.footer`
  background-color: #101522;
`;

export const FooterWrap = styled.div`
  padding: 48px 24px;
  display: flex;
  flex-direction: column;
  justify-content: center;
  align-items: center;
  max-width: 1100px;
  margin: 0 auto;
`;

export const FooterLinksContainer = styled.div`
  display: flex;
  justify-content: center;

  @media screen and (max-width: 820px) {
    padding-top: 32px;
  }
`;

export const FooterLinksWrapper = styled.div`
  display: flex;

  @media screen and (max-width: 820px) {
    flex-direction: column;
  }
`;

export const FooterLinkItems = styled.div`
  display: flex;
  flex-direction: column;
  align-items: flex-start;
  margin: 16px;
  text-align: left;
  width: 160px;
  box-sizing: border-box;
  color: #fff;

  @media screen and (max-width: 420px) {
    margin: 0;
    padding: 10px;
    width: 100%;
  }
`;

export const FooterLinkTitle = styled.h1`
  font-size: 14px;
  margin-bottom: 16px;
`;

export const FooterLink = styled(Link)`
  color: #ffffff;
  text-decoration: none;
  margin-bottom: 0.5rem;
  font-size: 14px;

  &:hover {
    color: #01bf71;
    transition: 0.3s ease-in-out;
  }
`;

export const SocialMedia = styled.section`
  max-width: 1000px;
  width: 100%;
`;

export const SocialMediaWrap = styled.div`
  display: flex;
  justify-content: space-between;
  align-items: center;
  max-width: 1100px;
  margin: 20px auto 0 auto;

  @media screen and (max-width: 820px) {
    flex-direction: column;
  }
`;

export const SocialAttrWrap = styled.div`
  color: #fff;
  display: flex;
  justify-content: center;
  align-items: center;
  max-width: 1100px;
  margin: 10px auto 0 auto;

  @media screen and (max-width: 820px) {
    flex-direction: column;
  }
`;

export const SocialLogo = styled(Link)`
  color: #fff;
  justify-self: start;
  cursor: pointer;
  text-decoration: none;
  font-size: 1.5rem;
  display: flex;
  align-items: center;
  margin-bottom: 16px;
  font-weight: bold;

  @media screen and (max-width: 768px) {
    font-size: 20px;
  }
`;

export const WebsiteRights = styled.small`
  color: #fff;
  margin-bottom: 8px;
`;

export const SocialIcons = styled.div`
  display: flex;
  justify-content: space-between;
  align-items: center;
  width: 60px;
  margin-bottom: 8px;
`;

export const SocialIconLink = styled.a`
  color: #fff;
  font-size: 24px;
  margin-right: 8px;
`;

export const FooterImgWrap = styled.div`
  max-width: 555px;
  height: 100%;
`;

export const FooterImg = styled.img`
  width: 100%;
  margin-top: 0;
  margin-right: 0;
  margin-left: 10px;
  padding-right: 0;
`;
84  src/components/Footer/index.jsx  Executable file
@@ -0,0 +1,84 @@
import {
  FooterContainer,
  FooterWrap,
  SocialIconLink,
  SocialIcons,
  SocialLogo,
  SocialMedia,
  SocialMediaWrap,
  WebsiteRights,
} from "@/components/Footer/FooterElements";
import { FaGithub, FaLinkedin } from "react-icons/fa";
import { FaSquareXTwitter } from "react-icons/fa6";
import { animateScroll as scroll } from "react-scroll";

const images = [
  "https://freepik.com/free-vector/hand-coding-concept-illustration_21864184.htm#query=coding&position=17&from_view=search&track=sph&uuid=5896d847-38e4-4cb9-8fe1-103041c7c933",
  "https://freepik.com/free-vector/mathematics-concept-illustration_10733824.htm#query=professor&position=13&from_view=search&track=sph&uuid=5b1a188a-64c5-45af-aae2-8573bc1bed3c",
  "https://freepik.com/free-vector/content-concept-illustration_7171429.htm#query=media&position=3&from_view=search&track=sph&uuid=c7e35cf2-d85d-4bba-91a6-1cd883dcf153",
  "https://freepik.com/free-vector/library-concept-illustration_9148008.htm#query=library&position=40&from_view=search&track=sph&uuid=abecc792-b6b2-4ec0-b318-5e6cc73ba649",
];

const socials = [
  {
    href: "https://twitter.com/SimonPrinceAI",
    icon: FaSquareXTwitter,
    alt: "Twitter",
  },
  {
    href: "https://linkedin.com/in/simon-prince-615bb9165/",
    icon: FaLinkedin,
    alt: "LinkedIn",
  },
  {
    href: "https://github.com/udlbook/udlbook",
    icon: FaGithub,
    alt: "GitHub",
  },
];

export default function Footer() {
  const scrollToHome = () => {
    scroll.scrollToTop();
  };

  return (
    <>
      <FooterContainer>
        <FooterWrap>
          <SocialMedia>
            <SocialMediaWrap>
              <SocialLogo to="/udlbook" onClick={scrollToHome}>
                Understanding Deep Learning
              </SocialLogo>
              <WebsiteRights>
                © {new Date().getFullYear()} Simon J.D. Prince
              </WebsiteRights>
              <WebsiteRights>
                Images by StorySet on FreePik:{" "}
                {images.map((image, index) => (
                  <a key={index} href={image}>
                    [{index + 1}]
                  </a>
                ))}
              </WebsiteRights>
              <SocialIcons>
                {socials.map((social, index) => (
                  <SocialIconLink
                    key={index}
                    href={social.href}
                    target="_blank"
                    aria-label={social.alt}
                    alt={social.alt}
                  >
                    <social.icon />
                  </SocialIconLink>
                ))}
              </SocialIcons>
            </SocialMediaWrap>
          </SocialMedia>
        </FooterWrap>
      </FooterContainer>
    </>
  );
}
294  src/components/HeroSection/HeroElements.jsx  Executable file
@@ -0,0 +1,294 @@
import styled from "styled-components";

export const HeroContainer = styled.div`
  background: #57c6d1;
  display: flex;
  justify-content: center;
  align-items: center;
  padding: 0 0px;
  position: static;
  z-index: 1;
`;

export const HeroContent = styled.div`
  z-index: 3;
  width: 100%;
  max-width: 1100px;
  position: static;
  padding: 8px 24px;
  margin: 80px 0px;
  display: flex;
  flex-direction: column;
  align-items: center;
`;

export const HeroH1 = styled.h1`
  color: #fff;
  font-size: 48px;
  text-align: center;

  @media screen and (max-width: 768px) {
    font-size: 40px;
  }

  @media screen and (max-width: 480px) {
    font-size: 32px;
  }
`;

export const HeroP = styled.p`
  margin-top: 24px;
  color: #fff;
  font-size: 24px;
  text-align: center;
  max-width: 600px;

  @media screen and (max-width: 768px) {
    font-size: 24px;
  }

  @media screen and (max-width: 480px) {
    font-size: 18px;
  }
`;

export const HeroBtnWrapper = styled.div`
  margin-top: 32px;
  display: flex;
  flex-direction: column;
  align-items: center;
`;

export const HeroRow = styled.div`
  display: grid;
  grid-template-columns: 1fr 1fr;
  gap: 20px;
  align-items: start;
  grid-template-areas: "col1 col2";

  @media screen and (max-width: 768px) {
    grid-template-columns: 1fr;
    grid-template-areas:
      "col2"
      "col1";
  }
`;

export const HeroNewsItem = styled.div`
  margin-left: 4px;
  color: #000000;
  font-size: 16px;
  margin-bottom: 16px;
  display: flex;
  justify-content: start;
`;

export const HeroNewsItemDate = styled.div`
  width: 20%;
  margin-right: 20px;

  @media screen and (max-width: 768px) {
    font-size: 12px;
  }

  @media screen and (max-width: 480px) {
    font-size: 12px;
  }
`;

export const HeroNewsItemContent = styled.div`
  width: 80%;
  color: #000000;

  @media screen and (max-width: 768px) {
    font-size: 12px;
  }

  @media screen and (max-width: 480px) {
    font-size: 12px;
  }
`;

export const HeroColumn1 = styled.div`
  margin-bottom: 15px;
  margin-left: 12px;
  margin-top: 60px;
  padding: 10px 15px;
  grid-area: col1;
  display: flex;
  flex-direction: column;
  justify-content: space-between;

  @media screen and (max-width: 768px) {
    margin-left: 0;
    margin-top: 20px;
    padding: 0;
  }
`;

export const HeroColumn2 = styled.div`
  margin-bottom: 15px;
  padding: 0 15px;
  grid-area: col2;
  display: flex;
  align-items: center;
  flex-direction: column;

  @media screen and (max-width: 768px) {
    padding: 0;
  }
`;

export const TextWrapper = styled.div`
  max-width: 540px;
  padding-top: 0;
  padding-bottom: 0;
`;

export const HeroImgWrap = styled.div`
  max-width: 555px;
  height: 100%;
`;

export const Img = styled.img`
  width: 100%;
  margin-top: 0;
  margin-right: 0;
  margin-left: 10px;
  padding-right: 0;
`;

export const HeroDownloadsImg = styled.img`
  margin-top: 5px;
  margin-right: 0;
  margin-left: 0;
  padding-right: 0;
  margin-bottom: 10px;
`;

export const HeroLink = styled.a`
  color: #fff;
  text-decoration: none;
  padding: 0.6rem 0rem 0rem 0rem;
  cursor: pointer;
  position: relative;

  &:before {
    position: absolute;
    margin: 0 auto;
    top: 100%;
    left: 0;
    width: 100%;
    height: 2px;
    background-color: #fff;
    content: "";
    opacity: 0.3;
    -webkit-transform: scaleX(1);
    transition-property:
      opacity,
      -webkit-transform;
    transition-duration: 0.3s;
  }

  &:hover:before {
    opacity: 1;
    -webkit-transform: scaleX(1.05);
  }
`;

export const UDLLink = styled.a`
  text-decoration: none;
  color: #000;
  font-weight: 300;
  margin: 0 2px;
  position: relative;

  &:before {
    position: absolute;
    margin: 0 auto;
    top: 100%;
    left: 0;
    width: 100%;
    height: 2px;
    background-color: #000;
    content: "";
    opacity: 0.3;
    -webkit-transform: scaleX(1);
    transition-property:
      opacity,
      -webkit-transform;
    transition-duration: 0.3s;
  }

  &:hover:before {
    opacity: 1;
    -webkit-transform: scaleX(1.05);
  }
`;

export const HeroNewsTitle = styled.div`
  margin-left: 0px;
  color: #000000;
  font-size: 16px;
  font-weight: bold;
  line-height: 16px;
  margin-bottom: 36px;

  @media screen and (max-width: 768px) {
    font-size: 24px;
  }

  @media screen and (max-width: 480px) {
    font-size: 18px;
  }
`;

export const HeroCitationTitle = styled.div`
  margin-left: 0px;
  color: #000000;
  font-size: 16px;
  font-weight: bold;
  line-height: 16px;
  margin-bottom: 10px;
  margin-top: 36px;

  @media screen and (max-width: 768px) {
    font-size: 24px;
  }

  @media screen and (max-width: 480px) {
    font-size: 18px;
  }
`;

export const HeroNewsBlock = styled.div``;

export const HeroCitationBlock = styled.div`
  font-size: 14px;
  margin-bottom: 0px;
  margin-top: 0px;
`;

export const HeroFollowBlock = styled.div`
  @media screen and (max-width: 768px) {
    font-size: 14px;
  }
`;

export const HeroNewsMoreButton = styled.button`
  background: #fff;
  color: #000;
  font-size: 16px;
  padding: 10px 24px;
  border: none;
  border-radius: 4px;
  cursor: pointer;
  margin-top: 20px;
  margin-bottom: 20px;
  align-self: center;

  &:hover {
    background: #000;
    color: #fff;
  }
`;
Some files were not shown because too many files have changed in this diff.