diff --git a/Notebooks/Chap09/9_4_Bayesian_Approach.ipynb b/Notebooks/Chap09/9_4_Bayesian_Approach.ipynb index c0a4117..25eac22 100644 --- a/Notebooks/Chap09/9_4_Bayesian_Approach.ipynb +++ b/Notebooks/Chap09/9_4_Bayesian_Approach.ipynb @@ -1,33 +1,22 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [], - "authorship_tag": "ABX9TyMB8B4269DVmrcLoCWrhzKF", - "include_colab_link": true - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } - }, "cells": [ { + "attachments": {}, "cell_type": "markdown", "metadata": { - "id": "view-in-github", - "colab_type": "text" + "colab_type": "text", + "id": "view-in-github" }, "source": [ "\"Open" ] }, { + "attachments": {}, "cell_type": "markdown", + "metadata": { + "id": "el8l05WQEO46" + }, "source": [ "# **Notebook 9.4: Bayesian approach**\n", "\n", @@ -36,10 +25,7 @@ "Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n", "\n", "Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n" - ], - "metadata": { - "id": "el8l05WQEO46" - } + ] }, { "cell_type": "code", @@ -58,20 +44,25 @@ }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3hpqmFyQNrbt" + }, + "outputs": [], "source": [ "# The true function that we are trying to estimate, defined on [0,1]\n", "def true_function(x):\n", " y = np.exp(np.sin(x*(2*3.1413)))\n", " return y" - ], - "metadata": { - "id": "3hpqmFyQNrbt" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "skZMM5TbNwq4" + }, + "outputs": [], "source": [ "# Generate some data points with or without noise\n", "def generate_data(n_data, sigma_y=0.3):\n", @@ -86,15 +77,15 @@ " y[i] = true_function(x[i])\n", " y[i] += np.random.normal(0, sigma_y, 1)\n", " return x,y" - ], - "metadata": { - "id": "skZMM5TbNwq4" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ziwD_R7lN0DY" + }, + "outputs": [], "source": [ "# Draw the fitted function, together win uncertainty used to generate points\n", "def plot_function(x_func, y_func, x_data=None,y_data=None, x_model = None, y_model =None, sigma_func = None, sigma_model=None):\n", @@ -117,15 +108,15 @@ " ax.set_xlabel('Input, $x$')\n", " ax.set_ylabel('Output, $y$')\n", " plt.show()" - ], - "metadata": { - "id": "ziwD_R7lN0DY" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2CgKanwaN3NM" + }, + "outputs": [], "source": [ "# Generate true function\n", "x_func = np.linspace(0, 1.0, 100)\n", @@ -139,15 +130,15 @@ "\n", "# Plot the function, data and uncertainty\n", "plot_function(x_func, y_func, x_data, y_data, sigma_func=sigma_func)" - ], - "metadata": { - "id": "2CgKanwaN3NM" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gorZ6i97N7AR" + }, + "outputs": [], "source": [ "# Define model -- beta is a scalar and omega has size n_hidden,1\n", "def network(x, beta, omega):\n", @@ -165,15 +156,14 @@ " y = y + beta\n", "\n", " return y" - ], - "metadata": { - "id": "gorZ6i97N7AR" - }, - "execution_count": null, - "outputs": [] + ] }, { + "attachments": {}, "cell_type": "markdown", + "metadata": { + "id": "i8T_QduzeBmM" + }, "source": [ "Now let's compute a probability distribution over the model parameters using Bayes's rule:\n", "\n", @@ -184,69 +174,73 @@ "We'll define the prior $Pr(\\boldsymbol\\phi)$ as:\n", "\n", "\\begin{equation}\n", - "Pr(\\boldsymbol\\phi) = \\mbox{Norm}_{\\boldsymbol\\phi}\\bigl[\\mathbf{0},\\sigma^2_p\\mathbf{I}\\bigr]\n", + "Pr(\\boldsymbol\\phi) = \\text{Norm}_{\\boldsymbol\\phi}\\bigl[\\mathbf{0},\\sigma^2_p\\mathbf{I}\\bigr]\n", "\\end{equation}\n", "\n", "where $\\phi=[\\omega_1,\\omega_2\\ldots \\omega_n, \\beta]^T$ and $\\sigma^2_{p}$ is the prior variance.\n", "\n", "The likelihood term $\\prod_{i=1}^{I} Pr(\\mathbf{y}_{i}|\\mathbf{x}_{i},\\boldsymbol\\phi)$ is given by:\n", "\n", - "\\begin{eqnarray}\n", - "\\prod_{i=1}^{I} Pr(\\mathbf{y}_{i}|\\mathbf{x}_{i},\\boldsymbol\\phi) &=& \\prod_{i=1}^{I} \\mbox{Norm}_{y_i}\\bigl[\\mbox{f}[\\mathbf{x}_{i},\\boldsymbol\\phi],\\sigma_d^2\\bigr]\\\\\n", - "&=& \\prod_{i=1}^{I} \\mbox{Norm}_{y_i}\\bigl[\\boldsymbol\\omega\\mathbf{h}_i+\\beta,\\sigma_d^2\\bigr]\\\\\n", - "&=& \\mbox{Norm}_{\\mathbf{y}}\\bigl[\\mathbf{H}^T\\boldsymbol\\phi,\\sigma^2\\mathbf{I}\\bigr].\n", - "\\end{eqnarray}\n", + "\\begin{align}\n", + "\\prod_{i=1}^{I} Pr(\\mathbf{y}_{i}|\\mathbf{x}_{i},\\boldsymbol\\phi) &=& \\prod_{i=1}^{I} \\text{Norm}_{y_i}\\bigl[\\text{f}[\\mathbf{x}_{i},\\boldsymbol\\phi],\\sigma_d^2\\bigr]\\\\\n", + "&=& \\prod_{i=1}^{I} \\text{Norm}_{y_i}\\bigl[\\boldsymbol\\omega\\mathbf{h}_i+\\beta,\\sigma_d^2\\bigr]\\\\\n", + "&=& \\text{Norm}_{\\mathbf{y}}\\bigl[\\mathbf{H}^T\\boldsymbol\\phi,\\sigma^2\\mathbf{I}\\bigr].\n", + "\\end{align}\n", "\n", "where $\\sigma^2$ is the measurement noise and $\\mathbf{h}_{i}$ is the column vector of hidden variables for the $i^{th}$ input. Here the vector $\\mathbf{y}$ and matrix $\\mathbf{H}$ are defined as:\n", "\n", "\\begin{equation}\n", - "\\mathbf{y} = \\begin{bmatrix}y_1\\\\y_2\\\\\\vdots\\\\y_{I}\\end{bmatrix}\\quad\\mbox{and}\\quad \\mathbf{H} = \\begin{bmatrix}\\mathbf{h}_{1}&\\mathbf{h}_{2}&\\cdots&\\mathbf{h}_{I}\\\\1&1&\\cdots &1\\end{bmatrix}.\n", + "\\mathbf{y} = \\begin{bmatrix}y_1\\\\y_2\\\\\\vdots\\\\y_{I}\\end{bmatrix}\\quad\\text{and}\\quad \\mathbf{H} = \\begin{bmatrix}\\mathbf{h}_{1}&\\mathbf{h}_{2}&\\cdots&\\mathbf{h}_{I}\\\\1&1&\\cdots &1\\end{bmatrix}.\n", "\\end{equation}\n" - ], - "metadata": { - "id": "i8T_QduzeBmM" - } + ] }, { + "attachments": {}, "cell_type": "markdown", + "metadata": { + "id": "JojV6ueRk49G" + }, "source": [ "To make progress we use the change of variable relation (Appendix C.3.4 of the book) to rewrite the likelihood term as a normal distribution in the parameters $\\boldsymbol\\phi$:\n", "\n", - "\\begin{eqnarray}\n", + "\\begin{align}\n", "\\prod_{i=1}^{I} Pr(\\mathbf{y}_{i}|\\mathbf{x}_{i},\\boldsymbol\\phi+\\beta)\n", - "&=& \\mbox{Norm}_{\\mathbf{y}}\\bigl[\\mathbf{H}^T\\boldsymbol\\phi,\\sigma^2\\bigr]\\\\\n", - "&\\propto& \\mbox{Norm}_{\\boldsymbol\\phi}\\bigl[(\\mathbf{H}\\mathbf{H}^T)^{-1}\\mathbf{H}\\mathbf{y},\\sigma^2(\\mathbf{H}\\mathbf{H}^t)^{-1}\\bigr]\n", - "\\end{eqnarray}\n" - ], - "metadata": { - "id": "JojV6ueRk49G" - } + "&=& \\text{Norm}_{\\mathbf{y}}\\bigl[\\mathbf{H}^T\\boldsymbol\\phi,\\sigma^2\\bigr]\\\\\n", + "&\\propto& \\text{Norm}_{\\boldsymbol\\phi}\\bigl[(\\mathbf{H}\\mathbf{H}^T)^{-1}\\mathbf{H}\\mathbf{y},\\sigma^2(\\mathbf{H}\\mathbf{H}^t)^{-1}\\bigr]\n", + "\\end{align}\n" + ] }, { + "attachments": {}, "cell_type": "markdown", + "metadata": { + "id": "YX0O_Ciwp4W1" + }, "source": [ "Finally, we can combine the likelihood and prior terms using the product of two normal distributions relation (Appendix C.3.3).\n", "\n", - "\\begin{eqnarray}\n", + "\\begin{align}\n", " Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) &\\propto& \\prod_{i=1}^{I} Pr(\\mathbf{y}_{i}|\\mathbf{x}_{i},\\boldsymbol\\phi) Pr(\\boldsymbol\\phi)\\\\\n", - " &\\propto&\\mbox{Norm}_{\\boldsymbol\\phi}\\bigl[(\\mathbf{H}\\mathbf{H}^T)^{-1}\\mathbf{H}\\mathbf{y},\\sigma^2(\\mathbf{H}\\mathbf{H}^T)^{-1}\\bigr] \\mbox{Norm}_{\\boldsymbol\\phi}\\bigl[\\mathbf{0},\\sigma^2_p\\mathbf{I}\\bigr]\\\\\n", - " &\\propto&\\mbox{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr].\n", - "\\end{eqnarray}\n", + " &\\propto&\\text{Norm}_{\\boldsymbol\\phi}\\bigl[(\\mathbf{H}\\mathbf{H}^T)^{-1}\\mathbf{H}\\mathbf{y},\\sigma^2(\\mathbf{H}\\mathbf{H}^T)^{-1}\\bigr] \\text{Norm}_{\\boldsymbol\\phi}\\bigl[\\mathbf{0},\\sigma^2_p\\mathbf{I}\\bigr]\\\\\n", + " &\\propto&\\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr].\n", + "\\end{align}\n", "\n", "In fact, since this already a normal distribution, the constant of proportionality must be one and we can write\n", "\n", - "\\begin{eqnarray}\n", - " Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) &=& \\mbox{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr].\n", - "\\end{eqnarray}\n", + "\\begin{align}\n", + " Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) &=& \\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr].\n", + "\\end{align}\n", "\n", "TODO -- On a piece of paper, use the relations in Appendix C.3.3 and C.3.4 to fill in the missing steps and establish that this is the correct formula for the posterior." - ], - "metadata": { - "id": "YX0O_Ciwp4W1" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "nF1AcgNDwm4t" + }, + "outputs": [], "source": [ "def compute_H(x_data, n_hidden):\n", " psi1 = np.ones((n_hidden+1,1));\n", @@ -280,24 +274,25 @@ "\n", "\n", " return phi_mean, phi_covar" - ], - "metadata": { - "id": "nF1AcgNDwm4t" - }, - "execution_count": null, - "outputs": [] + ] }, { + "attachments": {}, "cell_type": "markdown", - "source": [ - "Now we can draw samples from this distribution" - ], "metadata": { "id": "GjPnlG4q0UFK" - } + }, + "source": [ + "Now we can draw samples from this distribution" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "K4vYc82D0BMq" + }, + "outputs": [], "source": [ "# Define parameters\n", "n_hidden = 5\n", @@ -313,15 +308,15 @@ "x_model = x_func\n", "y_model_mean = network(x_model, phi_mean[-1], phi_mean[0:n_hidden])\n", "plot_function(x_func, y_func, x_data, y_data, x_model, y_model_mean)" - ], - "metadata": { - "id": "K4vYc82D0BMq" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "TVIjhubkSw-R" + }, + "outputs": [], "source": [ "# TODO Draw two samples from the normal distribution over the parameters\n", "# Replace these lines\n", @@ -336,37 +331,42 @@ "# Draw the two models\n", "plot_function(x_func, y_func, x_data, y_data, x_model, y_model_sample1)\n", "plot_function(x_func, y_func, x_data, y_data, x_model, y_model_sample2)" - ], - "metadata": { - "id": "TVIjhubkSw-R" - }, - "execution_count": null, - "outputs": [] + ] }, { + "attachments": {}, "cell_type": "markdown", + "metadata": { + "id": "GiNg5EroUiUb" + }, "source": [ "Now we need to perform inference for a new data points $\\mathbf{x}^*$ with corresponding hidden values $\\mathbf{h}^*$. Instead of having a single estimate of the parameters, we have a distribution over the possible parameters. So we marginalize (integrate) over this distribution to account for all possible values:\n", "\n", - "\\begin{eqnarray}\n", + "\\begin{align}\n", "Pr(y^*|\\mathbf{x}^*) &=& \\int Pr(y^{*}|\\mathbf{x}^*,\\boldsymbol\\phi)Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) d\\boldsymbol\\phi\\\\\n", - "&=& \\int \\mbox{Norm}_{y^*}\\bigl[\\begin{bmatrix}\\mathbf{h}^{*T}&1\\end{bmatrix}\\boldsymbol\\phi,\\sigma^2]\\cdot\\mbox{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr]d\\boldsymbol\\phi\\\\\n", - "&=& \\mbox{Norm}_{y^*}\\biggl[\\frac{1}{\\sigma^2} \\begin{bmatrix}\\mathbf{h}^{*T}&1\\end{bmatrix}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y}, \\begin{bmatrix}\\mathbf{h}^{*T}&1\\end{bmatrix}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\n", - "\\begin{bmatrix}\\mathbf{h}^*\\\\1\\end{bmatrix}\\biggr]\n", - "\\end{eqnarray}\n", + "&=& \\int \\text{Norm}_{y^*}\\bigl[[\\mathbf{h}^{*T},1]\\boldsymbol\\phi,\\sigma^2\\bigr]\\cdot\\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr]d\\boldsymbol\\phi\\\\\n", + "&=& \\text{Norm}_{y^*}\\biggl[\\frac{1}{\\sigma^2} [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y}, [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\n", + "[\\mathbf{h}^*;1]\\biggr]\n", + "\\end{align}\n", + "\n", + "\n", + "\n", "\n", "To compute this, we reformulated the integrand using the relations from appendices\n", "C.3.3 and C.3.4 as the product of a normal distribution in $\\boldsymbol\\phi$ and a constant with respect\n", "to $\\boldsymbol\\phi$. The integral of the normal distribution must be one, and so the final result is just the constant. This constant is itself a normal distribution in $y^*$.
\n", "\n", - "If you feel so inclined you can work through the math of this yourself." - ], - "metadata": { - "id": "GiNg5EroUiUb" - } + "If you feel so inclined you can work through the math of this yourself.\n", + "\n" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ILxT4EfW2lUm" + }, + "outputs": [], "source": [ "# Predict mean and variance of y_star from x_star\n", "def inference(x_star, x_data, y_data, sigma_sq, sigma_p_sq, n_hidden):\n", @@ -381,15 +381,15 @@ " y_star_var = 1\n", "\n", " return y_star_mean, y_star_var" - ], - "metadata": { - "id": "ILxT4EfW2lUm" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "87cjUjMaixHZ" + }, + "outputs": [], "source": [ "x_model = x_func\n", "y_model = np.zeros_like(x_model)\n", @@ -401,24 +401,36 @@ "\n", "# Draw the model\n", "plot_function(x_func, y_func, x_data, y_data, x_model, y_model, sigma_model=y_model_std)\n" - ], - "metadata": { - "id": "87cjUjMaixHZ" - }, - "execution_count": null, - "outputs": [] + ] }, { + "attachments": {}, "cell_type": "markdown", + "metadata": { + "id": "8Hcbe_16sK0F" + }, "source": [ "TODO:\n", "\n", "1. Experiment running this again with different numbers of hidden units. Make a prediction for what will happen when you increase / decrease them.\n", "2. Experiment with what happens if you make the prior variance $\\sigma^2_p$ to a smaller value like 1. How do you explain the results?" - ], - "metadata": { - "id": "8Hcbe_16sK0F" - } + ] } - ] -} \ No newline at end of file + ], + "metadata": { + "colab": { + "authorship_tag": "ABX9TyMB8B4269DVmrcLoCWrhzKF", + "include_colab_link": true, + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}