Merge branch 'main' of https://github.com/udlbook/udlbook

2024-04-18 17:41:24 -04:00
parent 2d300a16a1 d057548be9
commit 4b939b7426
22 changed files with 1699 additions and 127 deletions
@@ -1,18 +1,16 @@
 {
  "cells": [
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
-        "colab_type": "text",
-        "id": "view-in-github"
+        "id": "view-in-github",
+        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap01/1_1_BackgroundMathematics.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "s5zzKSOusPOB"
@@ -41,7 +39,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "WV2Dl6owme2d"
@@ -49,11 +46,11 @@
      "source": [
        "**Linear functions**<br> We will be using the term *linear equation* to mean a weighted sum of inputs plus an offset. If there is just one input $x$, then this is a straight line:\n",
        "\n",
-        "\\begin{equation}y=\\beta+\\omega x,\\end{equation} \n",
+        "\\begin{equation}y=\\beta+\\omega x,\\end{equation}\n",
        "\n",
        "where $\\beta$ is the y-intercept of the line and $\\omega$ is the slope of the line. When there are two inputs $x_{1}$ and $x_{2}$, then this becomes:\n",
        "\n",
-        "\\begin{equation}y=\\beta+\\omega_1 x_1 + \\omega_2 x_2.\\end{equation} \n",
+        "\\begin{equation}y=\\beta+\\omega_1 x_1 + \\omega_2 x_2.\\end{equation}\n",
        "\n",
        "Any other functions are by definition **non-linear**.\n",
        "\n",
@@ -99,7 +96,7 @@
        "ax.plot(x,y,'r-')\n",
        "ax.set_ylim([0,10]);ax.set_xlim([0,10])\n",
        "ax.set_xlabel('x'); ax.set_ylabel('y')\n",
-        "plt.show\n",
+        "plt.show()\n",
        "\n",
        "# TODO -- experiment with changing the values of beta and omega\n",
        "# to understand what they do.  Try to make a line\n",
@@ -107,7 +104,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "AedfvD9dxShZ"
@@ -192,7 +188,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "i8tLwpls476R"
@@ -236,7 +231,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "fGzVJQ6N-mHJ"
@@ -275,11 +269,10 @@
        "# Compute with vector/matrix form\n",
        "y_vec = beta_vec+np.matmul(omega_mat, x_vec)\n",
        "print(\"Matrix/vector form\")\n",
-        "print('y1= %3.3f\\ny2 = %3.3f'%((y_vec[0],y_vec[1])))\n"
+        "print('y1= %3.3f\\ny2 = %3.3f'%((y_vec[0][0],y_vec[1][0])))\n"
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "3LGRoTMLU8ZU"
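Review note on the `y_vec[0][0]` change in the hunk above: a minimal sketch of why the extra index matters, assuming `beta_vec` and `x_vec` are 2x1 column vectors and `omega_mat` is 2x2. The numeric values here are made up for illustration and are not taken from the notebook. Indexing a column vector once yields a length-1 array, not a scalar, so the second index is what hands a plain number to the `%3.3f` format.

```python
import numpy as np

# Illustrative values only (assumption: column-vector shapes as in the notebook)
beta_vec = np.array([[0.5], [0.2]])
omega_mat = np.array([[-1.0, 0.4], [0.1, 0.1]])
x_vec = np.array([[4.0], [-1.0]])

# Same expression as in the hunk above
y_vec = beta_vec + np.matmul(omega_mat, x_vec)

row = y_vec[0]        # still an array, shape (1,)
scalar = y_vec[0][0]  # a scalar, safe to pass to a %f format

print(y_vec.shape, row.shape)
print('y1= %3.3f\ny2 = %3.3f' % (y_vec[0][0], y_vec[1][0]))
```

`y_vec[0, 0]` selects the same element with a single indexing operation.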
@@ -293,7 +286,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "7Y5zdKtKZAB2"
@@ -325,11 +317,10 @@
        "ax.plot(x,y,'r-')\n",
        "ax.set_ylim([0,100]);ax.set_xlim([-5,5])\n",
        "ax.set_xlabel('x'); ax.set_ylabel('exp[x]')\n",
-        "plt.show"
+        "plt.show()"
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "XyrT8257IWCu"
@@ -345,7 +336,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "R6A4e5IxIWCu"
@@ -373,11 +363,10 @@
        "ax.plot(x,y,'r-')\n",
        "ax.set_ylim([-5,5]);ax.set_xlim([0,5])\n",
        "ax.set_xlabel('x'); ax.set_ylabel('$\\log[x]$')\n",
-        "plt.show"
+        "plt.show()"
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "yYWrL5AXIWCv"
@@ -397,8 +386,8 @@
  ],
  "metadata": {
    "colab": {
-      "include_colab_link": true,
-      "provenance": []
+      "provenance": [],
+      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
@@ -420,4 +409,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}
@@ -4,7 +4,6 @@
  "metadata": {
    "colab": {
      "provenance": [],
-      "authorship_tag": "ABX9TyOmndC0N7dFV7W3Mh5ljOLl",
      "include_colab_link": true
    },
    "kernelspec": {
@@ -235,8 +234,8 @@
        "levels = 40\n",
        "ax.contour(phi0_mesh, phi1_mesh, all_losses ,levels, colors=['#80808080'])\n",
        "ax.set_ylim([1,-1])\n",
-        "ax.set_xlabel('Intercept, $\\phi_0$')\n",
-        "ax.set_ylabel('Slope, $\\phi_1$')\n",
+        "ax.set_xlabel(r'Intercept, $\\phi_0$')\n",
+        "ax.set_ylabel(r'Slope, $\\phi_1$')\n",
        "\n",
        "# Plot the position of your best fitting line on the loss function\n",
        "# It should be close to the minimum\n",
@@ -250,4 +249,4 @@
      "outputs": []
    }
  ]
-}
+}
@@ -1,18 +1,16 @@
 {
  "cells": [
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
-        "colab_type": "text",
-        "id": "view-in-github"
+        "id": "view-in-github",
+        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap03/3_1_Shallow_Networks_I.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "1Z6LB4Ybn1oN"
@@ -42,7 +40,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "wQDy9UzXpnf5"
@@ -102,8 +99,8 @@
      "source": [
        "# Define a shallow neural network with one input, one output, and three hidden units\n",
        "def shallow_1_1_3(x, activation_fn, phi_0,phi_1,phi_2,phi_3, theta_10, theta_11, theta_20, theta_21, theta_30, theta_31):\n",
-        "  # TODO Replace the lines below to compute the three initial lines\n",
-        "  # (figure 3.3a-c) from the theta parameters.  These are the preactivations\n",
+        "  # TODO Replace the code below to compute the three initial lines\n",
+        "  # from the theta parameters (i.e. implement the equations at the bottom of figure 3.3a-c).  These are the preactivations\n",
        "  pre_1 = np.zeros_like(x)\n",
        "  pre_2 = np.zeros_like(x)\n",
        "  pre_3 = np.zeros_like(x)\n",
@@ -199,7 +196,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "T34bszToImKQ"
@@ -210,7 +206,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "jhaBSS8oIWSX"
@@ -269,7 +264,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "osonHsEqVp2I"
@@ -354,9 +348,8 @@
  ],
  "metadata": {
    "colab": {
-      "authorship_tag": "ABX9TyPBNztJrxnUt1ELWfm1Awa3",
-      "include_colab_link": true,
-      "provenance": []
+      "provenance": [],
+      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": "Python 3",
@@ -368,4 +361,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}
@@ -4,7 +4,7 @@
  "metadata": {
    "colab": {
      "provenance": [],
-      "authorship_tag": "ABX9TyPkFrjmRAUf0fxN07RC4xMI",
+      "authorship_tag": "ABX9TyPZzptvvf7OPZai8erQ/0xT",
      "include_colab_link": true
    },
    "kernelspec": {
@@ -127,26 +127,26 @@
        "    fig, ax = plt.subplots(3,3)\n",
        "    fig.set_size_inches(8.5, 8.5)\n",
        "    fig.tight_layout(pad=3.0)\n",
-        "    ax[0,0].plot(x,layer2_pre_1,'r-'); ax[0,0].set_ylabel('$\\psi_{10}+\\psi_{11}h_{1}+\\psi_{12}h_{2}+\\psi_{13}h_3$')\n",
-        "    ax[0,1].plot(x,layer2_pre_2,'b-'); ax[0,1].set_ylabel('$\\psi_{20}+\\psi_{21}h_{1}+\\psi_{22}h_{2}+\\psi_{23}h_3$')\n",
-        "    ax[0,2].plot(x,layer2_pre_3,'g-'); ax[0,2].set_ylabel('$\\psi_{30}+\\psi_{31}h_{1}+\\psi_{32}h_{2}+\\psi_{33}h_3$')\n",
-        "    ax[1,0].plot(x,h1_prime,'r-'); ax[1,0].set_ylabel(\"$h_{1}^{'}$\")\n",
-        "    ax[1,1].plot(x,h2_prime,'b-'); ax[1,1].set_ylabel(\"$h_{2}^{'}$\")\n",
-        "    ax[1,2].plot(x,h3_prime,'g-'); ax[1,2].set_ylabel(\"$h_{3}^{'}$\")\n",
-        "    ax[2,0].plot(x,phi1_h1_prime,'r-'); ax[2,0].set_ylabel(\"$\\phi_1 h_{1}^{'}$\")\n",
-        "    ax[2,1].plot(x,phi2_h2_prime,'b-'); ax[2,1].set_ylabel(\"$\\phi_2 h_{2}^{'}$\")\n",
-        "    ax[2,2].plot(x,phi3_h3_prime,'g-'); ax[2,2].set_ylabel(\"$\\phi_3 h_{3}^{'}$\")\n",
+        "    ax[0,0].plot(x,layer2_pre_1,'r-'); ax[0,0].set_ylabel(r'$\\psi_{10}+\\psi_{11}h_{1}+\\psi_{12}h_{2}+\\psi_{13}h_3$')\n",
+        "    ax[0,1].plot(x,layer2_pre_2,'b-'); ax[0,1].set_ylabel(r'$\\psi_{20}+\\psi_{21}h_{1}+\\psi_{22}h_{2}+\\psi_{23}h_3$')\n",
+        "    ax[0,2].plot(x,layer2_pre_3,'g-'); ax[0,2].set_ylabel(r'$\\psi_{30}+\\psi_{31}h_{1}+\\psi_{32}h_{2}+\\psi_{33}h_3$')\n",
+        "    ax[1,0].plot(x,h1_prime,'r-'); ax[1,0].set_ylabel(r\"$h_{1}^{'}$\")\n",
+        "    ax[1,1].plot(x,h2_prime,'b-'); ax[1,1].set_ylabel(r\"$h_{2}^{'}$\")\n",
+        "    ax[1,2].plot(x,h3_prime,'g-'); ax[1,2].set_ylabel(r\"$h_{3}^{'}$\")\n",
+        "    ax[2,0].plot(x,phi1_h1_prime,'r-'); ax[2,0].set_ylabel(r\"$\\phi_1 h_{1}^{'}$\")\n",
+        "    ax[2,1].plot(x,phi2_h2_prime,'b-'); ax[2,1].set_ylabel(r\"$\\phi_2 h_{2}^{'}$\")\n",
+        "    ax[2,2].plot(x,phi3_h3_prime,'g-'); ax[2,2].set_ylabel(r\"$\\phi_3 h_{3}^{'}$\")\n",
        "\n",
        "    for plot_y in range(3):\n",
        "      for plot_x in range(3):\n",
        "        ax[plot_y,plot_x].set_xlim([0,1]);ax[plot_y,plot_x].set_ylim([-1,1])\n",
        "        ax[plot_y,plot_x].set_aspect(0.5)\n",
-        "      ax[2,plot_y].set_xlabel('Input, $x$');\n",
+        "      ax[2,plot_y].set_xlabel(r'Input, $x$');\n",
        "    plt.show()\n",
        "\n",
        "    fig, ax = plt.subplots()\n",
        "    ax.plot(x,y)\n",
-        "    ax.set_xlabel('Input, $x$'); ax.set_ylabel('Output, $y$')\n",
+        "    ax.set_xlabel(r'Input, $x$'); ax.set_ylabel(r'Output, $y$')\n",
        "    ax.set_xlim([0,1]);ax.set_ylim([-1,1])\n",
        "    ax.set_aspect(0.5)\n",
        "    plt.show()"
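Review note on the `r'...'` prefixes added to the axis labels in this commit: a small sketch of the difference between plain and raw string literals when a label contains LaTeX. The labels below are illustrative. In a plain literal, a backslash followed by a letter that forms a valid escape (such as `\t`) is silently replaced, and one that does not (such as `\p`) triggers an invalid-escape warning on recent Python versions. A raw literal keeps every backslash as typed.

```python
# A raw literal and an explicitly escaped plain literal are the same string
raw_label = r'Intercept, $\phi_0$'
escaped_label = 'Intercept, $\\phi_0$'
print(raw_label == escaped_label)

# Hazard with plain literals: \t is a valid escape, so this label
# contains a tab character followed by "heta" instead of "\theta"
plain_theta = '$\theta_{10}$'
raw_theta = r'$\theta_{10}$'
print('\t' in plain_theta)
print('\t' in raw_theta)
```

Using raw strings for every label that contains math avoids having to remember which letters form escapes.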
@@ -118,7 +118,7 @@
        "  ax.plot(x_model,y_model)\n",
        "  if sigma_model is not None:\n",
        "    ax.fill_between(x_model, y_model-2*sigma_model, y_model+2*sigma_model, color='lightgray')\n",
-        "  ax.set_xlabel('Input, $x$'); ax.set_ylabel('Output, $y$')\n",
+        "  ax.set_xlabel(r'Input, $x$'); ax.set_ylabel(r'Output, $y$')\n",
        "  ax.set_xlim([0,1]);ax.set_ylim([-1,1])\n",
        "  ax.set_aspect(0.5)\n",
        "  if title is not None:\n",
@@ -222,7 +222,7 @@
        "gauss_prob = normal_distribution(y_gauss, mu, sigma)\n",
        "fig, ax = plt.subplots()\n",
        "ax.plot(y_gauss, gauss_prob)\n",
-        "ax.set_xlabel('Input, $y$'); ax.set_ylabel('Probability $Pr(y)$')\n",
+        "ax.set_xlabel(r'Input, $y$'); ax.set_ylabel(r'Probability $Pr(y)$')\n",
        "ax.set_xlim([-5,5]);ax.set_ylim([0,1.0])\n",
        "plt.show()\n",
        "\n",
@@ -590,4 +590,4 @@
      }
    }
  ]
-}
+}
@@ -119,12 +119,12 @@
        "  fig.set_size_inches(7.0, 3.5)\n",
        "  fig.tight_layout(pad=3.0)\n",
        "  ax[0].plot(x_model,out_model)\n",
-        "  ax[0].set_xlabel('Input, $x$'); ax[0].set_ylabel('Model output')\n",
+        "  ax[0].set_xlabel(r'Input, $x$'); ax[0].set_ylabel(r'Model output')\n",
        "  ax[0].set_xlim([0,1]);ax[0].set_ylim([-4,4])\n",
        "  if title is not None:\n",
        "    ax[0].set_title(title)\n",
        "  ax[1].plot(x_model,lambda_model)\n",
-        "  ax[1].set_xlabel('Input, $x$'); ax[1].set_ylabel('$\\lambda$ or Pr(y=1|x)')\n",
+        "  ax[1].set_xlabel(r'Input, $x$'); ax[1].set_ylabel(r'$\\lambda$ or Pr(y=1|x)')\n",
        "  ax[1].set_xlim([0,1]);ax[1].set_ylim([-0.05,1.05])\n",
        "  if title is not None:\n",
        "    ax[1].set_title(title)\n",
@@ -4,7 +4,6 @@
  "metadata": {
    "colab": {
      "provenance": [],
-      "authorship_tag": "ABX9TyN4E9Vtuk6t2BhZ0Ajv5SW3",
      "include_colab_link": true
    },
    "kernelspec": {
@@ -67,7 +66,7 @@
        "  fig,ax = plt.subplots()\n",
        "  ax.plot(phi_plot,loss_function(phi_plot),'r-')\n",
        "  ax.set_xlim(0,1); ax.set_ylim(0,1)\n",
-        "  ax.set_xlabel('$\\phi$'); ax.set_ylabel('$L[\\phi]$')\n",
+        "  ax.set_xlabel(r'$\\phi$'); ax.set_ylabel(r'$L[\\phi]$')\n",
        "  if a is not None and b is not None and c is not None and d is not None:\n",
        "      plt.axvspan(a, d, facecolor='k', alpha=0.2)\n",
        "      ax.plot([a,a],[0,1],'b-')\n",
@@ -189,4 +188,4 @@
      "outputs": []
    }
  ]
-}
+}
@@ -108,8 +108,8 @@
        "    ax.contour(phi0mesh, phi1mesh, loss_function, 20, colors=['#80808080'])\n",
        "    ax.plot(opt_path[0,:], opt_path[1,:],'-', color='#a0d9d3ff')\n",
        "    ax.plot(opt_path[0,:], opt_path[1,:],'.', color='#a0d9d3ff',markersize=10)\n",
-        "    ax.set_xlabel(\"$\\phi_{0}$\")\n",
-        "    ax.set_ylabel(\"$\\phi_{1}$\")\n",
+        "    ax.set_xlabel(r\"$\\phi_{0}$\")\n",
+        "    ax.set_ylabel(r\"$\\phi_{1}$\")\n",
        "    plt.show()"
      ],
      "metadata": {
@@ -83,6 +83,8 @@
    {
      "cell_type": "code",
      "source": [
+        "!mkdir ./sample_data\n",
+        "\n",
        "args = mnist1d.data.get_dataset_args()\n",
        "data = mnist1d.data.get_dataset(args, path='./sample_data/mnist1d_data.pkl', download=False, regenerate=False)\n",
        "\n",
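Review note on the added `!mkdir ./sample_data` line: the shell command reports an error if the directory already exists, for example when the cell is run a second time. A possible alternative, sketched here as a suggestion and not as what the notebook does, is `os.makedirs` with `exist_ok=True`. The sketch uses a temporary base directory so that it does not touch the working directory; in the notebook the path would be `./sample_data`.

```python
import os
import tempfile

# Assumption for illustration: a throwaway base directory
base_dir = tempfile.mkdtemp()
data_dir = os.path.join(base_dir, 'sample_data')

os.makedirs(data_dir, exist_ok=True)
# Calling it again is harmless because exist_ok=True suppresses the error
os.makedirs(data_dir, exist_ok=True)

print(os.path.isdir(data_dir))
```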
@@ -136,7 +138,6 @@
        "optimizer = torch.optim.SGD(model.parameters(), lr = 0.05, momentum=0.9)\n",
        "# object that decreases learning rate by half every 10 epochs\n",
        "scheduler = StepLR(optimizer, step_size=10, gamma=0.5)\n",
-        "# create 100 dummy data points and store in data loader class\n",
        "x_train = torch.tensor(data['x'].astype('float32'))\n",
        "y_train = torch.tensor(data['y'].transpose().astype('long'))\n",
        "x_test= torch.tensor(data['x_test'].astype('float32'))\n",
@@ -235,4 +236,4 @@
      }
    }
  ]
-}
+}
@@ -92,7 +92,7 @@
    {
      "cell_type": "code",
      "source": [
-        "# Draw the fitted function, together win uncertainty used to generate points\n",
+        "# Draw the fitted function, together with uncertainty used to generate points\n",
        "def plot_function(x_func, y_func, x_data=None,y_data=None, x_model = None, y_model =None, sigma_func = None, sigma_model=None):\n",
        "\n",
        "    fig,ax = plt.subplots()\n",
@@ -203,7 +203,7 @@
        "# Closed form solution\n",
        "beta, omega = fit_model_closed_form(x_data,y_data,n_hidden=3)\n",
        "\n",
-        "# Get prediction for model across graph grange\n",
+        "# Get prediction for model across graph range\n",
        "x_model = np.linspace(0,1,100);\n",
        "y_model = network(x_model, beta, omega)\n",
        "\n",
@@ -302,7 +302,7 @@
        "sigma_func = 0.3\n",
        "n_hidden = 5\n",
        "\n",
-        "# Set random seed so that get same result every time\n",
+        "# Set random seed so that we get the same result every time\n",
        "np.random.seed(1)\n",
        "\n",
        "for c_hidden in range(len(hidden_variables)):\n",
@@ -344,4 +344,4 @@
      "outputs": []
    }
  ]
-}
+}
@@ -124,7 +124,7 @@
        "  D_k = n_hidden   # Hidden dimensions\n",
        "  D_o = 10    # Output dimensions\n",
        "\n",
-        "  # Define a model with two hidden layers of size 100\n",
+        "  # Define a model with two hidden layers\n",
        "  # And ReLU activations between them\n",
        "  model = nn.Sequential(\n",
        "  nn.Linear(D_i, D_k),\n",
@@ -157,7 +157,6 @@
        "  optimizer = torch.optim.SGD(model.parameters(), lr = 0.01, momentum=0.9)\n",
        "\n",
        "\n",
-        "  # create 100 dummy data points and store in data loader class\n",
        "  x_train = torch.tensor(data['x'].astype('float32'))\n",
        "  y_train = torch.tensor(data['y'].transpose().astype('long'))\n",
        "  x_test= torch.tensor(data['x_test'].astype('float32'))\n",
@@ -267,4 +266,4 @@
      "outputs": []
    }
  ]
-}
+}
@@ -224,7 +224,7 @@
    {
      "cell_type": "markdown",
      "source": [
-        "You should see see that by the time we get to 300 dimensions most of the volume is in the outer 1 percent. <br><br>\n",
+        "You should see that by the time we get to 300 dimensions most of the volume is in the outer 1 percent. <br><br>\n",
        "\n",
        "The conclusion of all of this is that in high dimensions you should be sceptical of your intuitions about how things work.  I have tried to visualize many things in one or two dimensions in the book, but you should also be sceptical about these visualizations!"
      ],
@@ -233,4 +233,4 @@
      }
    }
  ]
-}
+}
@@ -4,7 +4,6 @@
  "metadata": {
    "colab": {
      "provenance": [],
-      "authorship_tag": "ABX9TyOR3WOJwfTlMD8eOLsPfPrz",
      "include_colab_link": true
    },
    "kernelspec": {
@@ -140,7 +139,7 @@
        "    fig.set_size_inches(7,7)\n",
        "    ax.contourf(phi0mesh, phi1mesh, loss_function, 256, cmap=my_colormap);\n",
        "    ax.contour(phi0mesh, phi1mesh, loss_function, 20, colors=['#80808080'])\n",
-        "    ax.set_xlabel('$\\phi_{0}$'); ax.set_ylabel('$\\phi_{1}$')\n",
+        "    ax.set_xlabel(r'$\\phi_{0}$'); ax.set_ylabel(r'$\\phi_{1}$')\n",
        "\n",
        "    if grad_path_typical_lr is not None:\n",
        "        ax.plot(grad_path_typical_lr[0,:], grad_path_typical_lr[1,:],'ro-')\n",
@@ -335,4 +334,4 @@
      }
    }
  ]
-}
+}
@@ -1,18 +1,16 @@
 {
  "cells": [
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
-        "colab_type": "text",
-        "id": "view-in-github"
+        "id": "view-in-github",
+        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap09/9_4_Bayesian_Approach.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "el8l05WQEO46"
@@ -159,7 +157,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "i8T_QduzeBmM"
@@ -195,7 +192,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "JojV6ueRk49G"
@@ -211,7 +207,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "YX0O_Ciwp4W1"
@@ -277,7 +272,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "GjPnlG4q0UFK"
@@ -334,7 +328,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "GiNg5EroUiUb"
@@ -343,17 +336,16 @@
        "Now we need to perform inference for a new data point $\\mathbf{x}^*$ with corresponding hidden values $\\mathbf{h}^*$.  Instead of having a single estimate of the parameters, we have a distribution over the possible parameters.  So we marginalize (integrate) over this distribution to account for all possible values:\n",
        "\n",
        "\\begin{align}\n",
-        "Pr(y^*|\\mathbf{x}^*)  &=& \\int Pr(y^{*}|\\mathbf{x}^*,\\boldsymbol\\phi)Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) d\\boldsymbol\\phi\\\\\n",
-        "&=& \\int \\text{Norm}_{y^*}\\bigl[[\\mathbf{h}^{*T},1]\\boldsymbol\\phi,\\sigma^2\\bigr]\\cdot\\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr]d\\boldsymbol\\phi\\\\\n",
-        "&=& \\text{Norm}_{y^*}\\biggl[\\frac{1}{\\sigma^2} [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},  [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\n",
-        "[\\mathbf{h}^*;1]\\biggr]\n",
+        "Pr(y^*|\\mathbf{x}^*)  &= \\int Pr(y^{*}|\\mathbf{x}^*,\\boldsymbol\\phi)Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) d\\boldsymbol\\phi\\\\\n",
+        "&= \\int \\text{Norm}_{y^*}\\bigl[[\\mathbf{h}^{*T},1]\\boldsymbol\\phi,\\sigma^2\\bigr]\\cdot\\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr]d\\boldsymbol\\phi\\\\\n",
+        "&= \\text{Norm}_{y^*}\\biggl[\\frac{1}{\\sigma^2} [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},  [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\n",
+        "[\\mathbf{h}^*;1]\\biggr],\n",
        "\\end{align}\n",
        "\n",
+        "where the notation $[\\mathbf{h}^{*T},1]$ denotes a row vector containing $\\mathbf{h}^{*T}$ with a one appended to the end and $[\\mathbf{h}^*;1]$ denotes a column vector containing $\\mathbf{h}^*$ with a one appended to the end.\n",
        "\n",
        "\n",
-        "\n",
-        "To compute this, we reformulated the integrand using the relations from appendices\n",
-        "C.3.3 and C.3.4 as the product of a normal distribution in $\\boldsymbol\\phi$ and a constant with respect\n",
+        "To compute this, we reformulated the integrand using the relations from appendices C.3.3 and C.3.4 as the product of a normal distribution in $\\boldsymbol\\phi$ and a constant with respect\n",
        "to $\\boldsymbol\\phi$. The integral of the normal distribution must be one, and so the final result is just the constant. This constant is itself a normal distribution in $y^*$. <br>\n",
        "\n",
        "If you feel so inclined you can work through the math of this yourself.\n",
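Review note on the predictive distribution in the markdown cell above: a minimal sketch of how the final line of the equation could be evaluated with NumPy. The function name, argument names and array shapes are assumptions made for this note (the notebook's own function is not shown in this diff). Here `H` is taken to be a (D+1) x I matrix of hidden activations with a row of ones appended, `y` an I x 1 column of targets, and `h_star` a D x 1 column for the new point.

```python
import numpy as np

def predictive_distribution(H, y, h_star, sigma_sq, sigma_p_sq):
    """Mean and variance of Pr(y*|x*) as written in the final line above.

    Hypothetical helper; shapes are assumptions:
    H (D+1, I), y (I, 1), h_star (D, 1).
    """
    # A = (1/sigma^2) H H^T + (1/sigma_p^2) I
    A = H @ H.T / sigma_sq + np.eye(H.shape[0]) / sigma_p_sq
    # [h*; 1]: column vector with a one appended
    h_aug = np.vstack([h_star, np.ones((1, 1))])
    # Solve linear systems instead of forming the inverse explicitly
    mean = (h_aug.T @ np.linalg.solve(A, H @ y)) / sigma_sq
    var = h_aug.T @ np.linalg.solve(A, h_aug)
    return mean.item(), var.item()

# Synthetic data purely for illustration
rng = np.random.default_rng(0)
H = np.vstack([rng.normal(size=(3, 20)), np.ones((1, 20))])
y = rng.normal(size=(20, 1))
h_star = rng.normal(size=(3, 1))
pred_mean, pred_var = predictive_distribution(H, y, h_star, 0.04, 1.0)
print(pred_mean, pred_var)
```

Using `np.linalg.solve` is a numerical choice made for this sketch; it computes the same quantities as multiplying by the explicit inverse in the formula.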
@@ -404,7 +396,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "8Hcbe_16sK0F"
@@ -419,9 +410,8 @@
  ],
  "metadata": {
    "colab": {
-      "authorship_tag": "ABX9TyMB8B4269DVmrcLoCWrhzKF",
-      "include_colab_link": true,
-      "provenance": []
+      "provenance": [],
+      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": "Python 3",
@@ -433,4 +423,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}
@@ -4,7 +4,7 @@
  "metadata": {
    "colab": {
      "provenance": [],
-      "authorship_tag": "ABX9TyMLKg5ZmXqojcVrZD5BGm9g",
+      "authorship_tag": "ABX9TyP3VmRg51U+7NCfSYjRRrgv",
      "include_colab_link": true
    },
    "kernelspec": {
@@ -267,8 +267,8 @@
        "  fig,ax = plt.subplots()\n",
        "  ax.plot(np.squeeze(x_in), np.squeeze(dydx), 'b-')\n",
        "  ax.set_xlim(-2,2)\n",
-        "  ax.set_xlabel('Input, $x$')\n",
-        "  ax.set_ylabel('Gradient, $dy/dx$')\n",
+        "  ax.set_xlabel(r'Input, $x$')\n",
+        "  ax.set_ylabel(r'Gradient, $dy/dx$')\n",
        "  ax.set_title('No layers = %d'%(K))\n",
        "  plt.show()"
      ],
@@ -4,7 +4,6 @@
  "metadata": {
    "colab": {
      "provenance": [],
-      "authorship_tag": "ABX9TyMSk8qTqDYqFnRJVZKlsue0",
      "include_colab_link": true
    },
    "kernelspec": {
@@ -147,9 +146,7 @@
        "  exp_values = np.exp(data_in) ;\n",
        "  # Sum over columns\n",
        "  denom = np.sum(exp_values, axis = 0);\n",
-        "  # Replicate denominator to N rows\n",
-        "  denom = np.matmul(np.ones((data_in.shape[0],1)), denom[np.newaxis,:])\n",
-        "  # Compute softmax\n",
+        "  # Compute softmax (numpy broadcasts denominator to all rows automatically)\n",
        "  softmax = exp_values / denom\n",
        "  # return the answer\n",
        "  return softmax"
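Review note on the simplified softmax above: a short check that the removed replication step and NumPy broadcasting give the same result. The function names and the test array are illustrative; the bodies follow the old and new versions of the hunk. With `data_in` of shape (N, M), the sum down each column has shape (M,), and dividing an (N, M) array by an (M,) array broadcasts the denominator across all N rows.

```python
import numpy as np

def softmax_replicated(data_in):
    # Old version: explicitly replicate the denominator to N rows
    exp_values = np.exp(data_in)
    denom = np.sum(exp_values, axis=0)
    denom = np.matmul(np.ones((data_in.shape[0], 1)), denom[np.newaxis, :])
    return exp_values / denom

def softmax_broadcast(data_in):
    # New version: let NumPy broadcast the (M,) denominator over the rows
    exp_values = np.exp(data_in)
    denom = np.sum(exp_values, axis=0)
    return exp_values / denom

# Illustrative input: 2 rows, 3 columns
data = np.array([[1.0, 2.0, 3.0],
                 [0.0, -1.0, 0.5]])
out_old = softmax_replicated(data)
out_new = softmax_broadcast(data)
print(np.allclose(out_old, out_new))
print(out_new.sum(axis=0))
```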
@@ -128,7 +128,7 @@
    {
      "cell_type": "code",
      "source": [
-        "draw_2D_heatmap(dist_mat,'Distance $|i-j|$', my_colormap)"
+        "draw_2D_heatmap(dist_mat,r'Distance $|i-j|$', my_colormap)"
      ],
      "metadata": {
        "id": "G0HFPBXyHT6V"
@@ -197,7 +197,7 @@
      "cell_type": "code",
      "source": [
        "TP = np.array(opt.x).reshape(10,10)\n",
-        "draw_2D_heatmap(TP,'Transport plan $\\mathbf{P}$', my_colormap)"
+        "draw_2D_heatmap(TP,r'Transport plan $\\mathbf{P}$', my_colormap)"
      ],
      "metadata": {
        "id": "nZGfkrbRV_D0"
@@ -1,18 +1,16 @@
 {
  "cells": [
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
-        "colab_type": "text",
-        "id": "view-in-github"
+        "id": "view-in-github",
+        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap17/17_2_Reparameterization_Trick.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "t9vk9Elugvmi"
@@ -40,7 +38,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "paLz5RukZP1J"
@@ -114,7 +111,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "r5Hl2QkimWx9"
@@ -139,13 +135,12 @@
        "\n",
        "fig,ax = plt.subplots()\n",
        "ax.plot(phi_vals, expected_vals,'r-')\n",
-        "ax.set_xlabel('Parameter $\\phi$')\n",
-        "ax.set_ylabel('$\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
+        "ax.set_xlabel(r'Parameter $\\phi$')\n",
+        "ax.set_ylabel(r'$\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
        "plt.show()"
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "zTCykVeWqj_O"
@@ -253,13 +248,12 @@
        "\n",
        "fig,ax = plt.subplots()\n",
        "ax.plot(phi_vals, deriv_vals,'r-')\n",
-        "ax.set_xlabel('Parameter $\\phi$')\n",
-        "ax.set_ylabel('$\\partial/\\partial\\phi\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
+        "ax.set_xlabel(r'Parameter $\\phi$')\n",
+        "ax.set_ylabel(r'$\\partial/\\partial\\phi\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
        "plt.show()"
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "ASu4yKSwAEYI"
@@ -269,7 +263,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "xoFR1wifc8-b"
@@ -366,13 +359,12 @@
        "\n",
        "fig,ax = plt.subplots()\n",
        "ax.plot(phi_vals, deriv_vals,'r-')\n",
-        "ax.set_xlabel('Parameter $\\phi$')\n",
-        "ax.set_ylabel('$\\partial/\\partial\\phi\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
+        "ax.set_xlabel(r'Parameter $\\phi$')\n",
+        "ax.set_ylabel(r'$\\partial/\\partial\\phi\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
        "plt.show()"
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "1TWBiUC7bQSw"
@@ -403,7 +395,6 @@
      ]
    },
    {
-      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "d-0tntSYdKPR"
@@ -415,9 +406,8 @@
  ],
  "metadata": {
    "colab": {
-      "authorship_tag": "ABX9TyOxO2/0DTH4n4zhC97qbagY",
-      "include_colab_link": true,
-      "provenance": []
+      "provenance": [],
+      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": "Python 3",
@@ -429,4 +419,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}