udlbook/Notebooks/Chap09/9_4_Bayesian_Approach.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap09/9_4_Bayesian_Approach.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "el8l05WQEO46"
      },
      "source": [
        "# **Notebook 9.4: Bayesian approach**\n",
        "\n",
        "This notebook investigates the Bayesian approach to model fitting and reproduces figure 9.11 from the book.\n",
        "\n",
        "Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
        "\n",
        "Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xhmIOLiZELV_"
      },
      "outputs": [],
      "source": [
        "# import libraries\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "# Define seed to get same results each time\n",
        "np.random.seed(1)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3hpqmFyQNrbt"
      },
      "outputs": [],
      "source": [
        "# The true function that we are trying to estimate, defined on [0,1]\n",
        "def true_function(x):\n",
        "    y = np.exp(np.sin(x*(2*3.1413)))\n",
        "    return y"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "skZMM5TbNwq4"
      },
      "outputs": [],
      "source": [
        "# Generate some data points with or without noise\n",
        "def generate_data(n_data, sigma_y=0.3):\n",
        "    # Generate x values quasi uniformly\n",
        "    x = np.ones(n_data)\n",
        "    for i in range(n_data):\n",
        "        x[i] = np.random.uniform(i/n_data, (i+1)/n_data, 1)\n",
        "\n",
        "    # y value from running through function and adding noise\n",
        "    y = np.ones(n_data)\n",
        "    for i in range(n_data):\n",
        "        y[i] = true_function(x[i])\n",
        "        y[i] += np.random.normal(0, sigma_y, 1)\n",
        "    return x,y"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ziwD_R7lN0DY"
      },
      "outputs": [],
      "source": [
        "# Draw the fitted function, together with uncertainty used to generate points\n",
        "def plot_function(x_func, y_func, x_data=None,y_data=None, x_model = None, y_model =None, sigma_func = None, sigma_model=None):\n",
        "\n",
        "    fig,ax = plt.subplots()\n",
        "    ax.plot(x_func, y_func, 'k-')\n",
        "    if sigma_func is not None:\n",
        "      ax.fill_between(x_func, y_func-2*sigma_func, y_func+2*sigma_func, color='lightgray')\n",
        "\n",
        "    if x_data is not None:\n",
        "        ax.plot(x_data, y_data, 'o', color='#d18362')\n",
        "\n",
        "    if x_model is not None:\n",
        "        ax.plot(x_model, y_model, '-', color='#7fe7de')\n",
        "\n",
        "    if sigma_model is not None:\n",
        "      ax.fill_between(x_model, y_model-2*sigma_model, y_model+2*sigma_model, color='lightgray')\n",
        "\n",
        "    ax.set_xlim(0,1)\n",
        "    ax.set_xlabel('Input, $x$')\n",
        "    ax.set_ylabel('Output, $y$')\n",
        "    plt.show()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2CgKanwaN3NM"
      },
      "outputs": [],
      "source": [
        "# Generate true function\n",
        "x_func = np.linspace(0, 1.0, 100)\n",
        "y_func = true_function(x_func);\n",
        "\n",
        "# Generate some data points\n",
        "np.random.seed(1)\n",
        "sigma_func = 0.3\n",
        "n_data = 15\n",
        "x_data,y_data = generate_data(n_data, sigma_func)\n",
        "\n",
        "# Plot the function, data and uncertainty\n",
        "plot_function(x_func, y_func, x_data, y_data, sigma_func=sigma_func)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "gorZ6i97N7AR"
      },
      "outputs": [],
      "source": [
        "# Define model -- beta is a scalar and omega has size n_hidden,1\n",
        "def network(x, beta, omega):\n",
        "    # Retrieve number of hidden units\n",
        "    n_hidden = omega.shape[0]\n",
        "\n",
        "    y = np.zeros_like(x)\n",
        "    for c_hidden in range(n_hidden):\n",
        "        # Evaluate activations based on shifted lines (figure 8.4b-d)\n",
        "        line_vals =  x  - c_hidden/n_hidden\n",
        "        h =  line_vals * (line_vals > 0)\n",
        "        # Weight activations by omega parameters and sum\n",
        "        y = y + omega[c_hidden] * h\n",
        "    # Add bias, beta\n",
        "    y = y + beta\n",
        "\n",
        "    return y"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "i8T_QduzeBmM"
      },
      "source": [
        "Now let's compute a probability distribution over the model parameters using Bayes's rule:\n",
        "\n",
        "\\begin{equation}\n",
        " Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) = \\frac{\\prod_{i=1}^{I} Pr(\\mathbf{y}_{i}|\\mathbf{x}_{i},\\boldsymbol\\phi) Pr(\\boldsymbol\\phi)}{\\int \\prod_{i=1}^{I} Pr(\\mathbf{y}_{i}|\\mathbf{x}_{i},\\boldsymbol\\phi) Pr(\\boldsymbol\\phi)d\\boldsymbol\\phi } ,\n",
        "\\end{equation}\n",
        "\n",
        "We'll define the prior $Pr(\\boldsymbol\\phi)$ as:\n",
        "\n",
        "\\begin{equation}\n",
        "Pr(\\boldsymbol\\phi) = \\text{Norm}_{\\boldsymbol\\phi}\\bigl[\\mathbf{0},\\sigma^2_p\\mathbf{I}\\bigr]\n",
        "\\end{equation}\n",
        "\n",
        "where $\\phi=[\\omega_1,\\omega_2\\ldots \\omega_n, \\beta]^T$ and $\\sigma^2_{p}$  is the prior variance.\n",
        "\n",
        "The likelihood term $\\prod_{i=1}^{I} Pr(\\mathbf{y}_{i}|\\mathbf{x}_{i},\\boldsymbol\\phi)$ is given by:\n",
        "\n",
        "\\begin{align}\n",
        "\\prod_{i=1}^{I} Pr(\\mathbf{y}_{i}|\\mathbf{x}_{i},\\boldsymbol\\phi) &=& \\prod_{i=1}^{I} \\text{Norm}_{y_i}\\bigl[\\text{f}[\\mathbf{x}_{i},\\boldsymbol\\phi],\\sigma_d^2\\bigr]\\\\\n",
        "&=& \\prod_{i=1}^{I} \\text{Norm}_{y_i}\\bigl[\\boldsymbol\\omega\\mathbf{h}_i+\\beta,\\sigma_d^2\\bigr]\\\\\n",
        "&=& \\text{Norm}_{\\mathbf{y}}\\bigl[\\mathbf{H}^T\\boldsymbol\\phi,\\sigma^2\\mathbf{I}\\bigr].\n",
        "\\end{align}\n",
        "\n",
        "where $\\sigma^2$ is the measurement noise and $\\mathbf{h}_{i}$ is the column vector of hidden variables for the $i^{th}$ input.  Here the vector $\\mathbf{y}$ and matrix $\\mathbf{H}$ are defined as:\n",
        "\n",
        "\\begin{equation}\n",
        "\\mathbf{y} = \\begin{bmatrix}y_1\\\\y_2\\\\\\vdots\\\\y_{I}\\end{bmatrix}\\quad\\text{and}\\quad \\mathbf{H} = \\begin{bmatrix}\\mathbf{h}_{1}&\\mathbf{h}_{2}&\\cdots&\\mathbf{h}_{I}\\\\1&1&\\cdots &1\\end{bmatrix}.\n",
        "\\end{equation}\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JojV6ueRk49G"
      },
      "source": [
        "To make progress we use the change of variable relation (Appendix C.3.4 of the book) to rewrite the likelihood term as a normal distribution in the parameters $\\boldsymbol\\phi$:\n",
        "\n",
        "\\begin{align}\n",
        "\\prod_{i=1}^{I} Pr(\\mathbf{y}_{i}|\\mathbf{x}_{i},\\boldsymbol\\phi+\\beta)\n",
        "&=& \\text{Norm}_{\\mathbf{y}}\\bigl[\\mathbf{H}^T\\boldsymbol\\phi,\\sigma^2\\bigr]\\\\\n",
        "&\\propto& \\text{Norm}_{\\boldsymbol\\phi}\\bigl[(\\mathbf{H}\\mathbf{H}^T)^{-1}\\mathbf{H}\\mathbf{y},\\sigma^2(\\mathbf{H}\\mathbf{H}^t)^{-1}\\bigr]\n",
        "\\end{align}\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YX0O_Ciwp4W1"
      },
      "source": [
        "Finally, we can combine the likelihood and prior terms using the product of two normal distributions relation (Appendix C.3.3).\n",
        "\n",
        "\\begin{align}\n",
        " Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) &\\propto& \\prod_{i=1}^{I} Pr(\\mathbf{y}_{i}|\\mathbf{x}_{i},\\boldsymbol\\phi) Pr(\\boldsymbol\\phi)\\\\\n",
        " &\\propto&\\text{Norm}_{\\boldsymbol\\phi}\\bigl[(\\mathbf{H}\\mathbf{H}^T)^{-1}\\mathbf{H}\\mathbf{y},\\sigma^2(\\mathbf{H}\\mathbf{H}^T)^{-1}\\bigr] \\text{Norm}_{\\boldsymbol\\phi}\\bigl[\\mathbf{0},\\sigma^2_p\\mathbf{I}\\bigr]\\\\\n",
        " &\\propto&\\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr].\n",
        "\\end{align}\n",
        "\n",
        "In fact, since this is already a normal distribution, the constant of proportionality must be one and we can write\n",
        "\n",
        "\\begin{align}\n",
        " Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) &=& \\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr].\n",
        "\\end{align}\n",
        "\n",
        "TODO -- On a piece of paper, use the relations in Appendix C.3.3 and C.3.4 to fill in the missing steps and establish that this is the correct formula for the posterior."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "nF1AcgNDwm4t"
      },
      "outputs": [],
      "source": [
        "def compute_H(x_data, n_hidden):\n",
        "  psi1 = np.ones((n_hidden+1,1));\n",
        "  psi0 = np.linspace(0.0, 1.0, num=n_hidden, endpoint=False) * -1\n",
        "\n",
        "  n_data = x_data.size\n",
        "  # First compute the hidden variables\n",
        "  H = np.ones((n_hidden+1, n_data))\n",
        "  for i in range(n_hidden):\n",
        "    for j in range(n_data):\n",
        "      # Compute preactivation\n",
        "      H[i,j] = psi1[i] * x_data[j]+psi0[i]\n",
        "      # Apply ReLU to get activation\n",
        "      if H[i,j] < 0:\n",
        "        H[i,j] = 0;\n",
        "\n",
        "  return H\n",
        "\n",
        "def compute_param_mean_covar(x_data, y_data, n_hidden, sigma_sq, sigma_p_sq):\n",
        "  # Retrieve the matrix containing the hidden variables\n",
        "  H = compute_H(x_data, n_hidden) ;\n",
        "\n",
        "  # TODO -- Compute the covariance matrix (you will need np.transpose(), np.matmul(), np.linalg.inv())\n",
        "  # Replace this line\n",
        "  phi_covar = np.identity(n_hidden+1)\n",
        "\n",
        "\n",
        "  # TODO -- Compute the mean matrix\n",
        "  # Replace this line\n",
        "  phi_mean = np.zeros((n_hidden+1,1))\n",
        "\n",
        "\n",
        "  return phi_mean, phi_covar"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GjPnlG4q0UFK"
      },
      "source": [
        "Now we can draw samples from this distribution"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "K4vYc82D0BMq"
      },
      "outputs": [],
      "source": [
        "# Define parameters\n",
        "n_hidden = 5\n",
        "sigma_sq = sigma_func * sigma_func\n",
        "# Arbitrary large value reflecting the fact we are uncertain about the\n",
        "# parameters before we see any data\n",
        "sigma_p_sq = 1000\n",
        "\n",
        "# Compute the mean and covariance matrix\n",
        "phi_mean, phi_covar = compute_param_mean_covar(x_data, y_data, n_hidden, sigma_sq, sigma_p_sq)\n",
        "\n",
        "# Let's draw the mean model\n",
        "x_model = x_func\n",
        "y_model_mean = network(x_model, phi_mean[-1], phi_mean[0:n_hidden])\n",
        "plot_function(x_func, y_func, x_data, y_data, x_model, y_model_mean)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "TVIjhubkSw-R"
      },
      "outputs": [],
      "source": [
        "# TODO Draw two samples from the normal distribution over the parameters\n",
        "# Replace these lines\n",
        "phi_sample1 = np.zeros((n_hidden+1,1))\n",
        "phi_sample2 = np.zeros((n_hidden+1,1))\n",
        "\n",
        "\n",
        "# Run the network for these two sample sets of parameters\n",
        "y_model_sample1 = network(x_model, phi_sample1[-1], phi_sample1[0:n_hidden])\n",
        "y_model_sample2 = network(x_model, phi_sample2[-1], phi_sample2[0:n_hidden])\n",
        "\n",
        "# Draw the two models\n",
        "plot_function(x_func, y_func, x_data, y_data, x_model, y_model_sample1)\n",
        "plot_function(x_func, y_func, x_data, y_data, x_model, y_model_sample2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GiNg5EroUiUb"
      },
      "source": [
        "Now we need to perform inference for a new data points $\\mathbf{x}^*$ with corresponding hidden values $\\mathbf{h}^*$.  Instead of having a single estimate of the parameters, we have a distribution over the possible parameters.  So we marginalize (integrate) over this distribution to account for all possible values:\n",
        "\n",
        "\\begin{align}\n",
        "Pr(y^*|\\mathbf{x}^*)  &= \\int Pr(y^{*}|\\mathbf{x}^*,\\boldsymbol\\phi)Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) d\\boldsymbol\\phi\\\\\n",
        "&= \\int \\text{Norm}_{y^*}\\bigl[[\\mathbf{h}^{*T},1]\\boldsymbol\\phi,\\sigma^2\\bigr]\\cdot\\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr]d\\boldsymbol\\phi\\\\\n",
        "&= \\text{Norm}_{y^*}\\biggl[\\frac{1}{\\sigma^2} [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},  [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\n",
        "[\\mathbf{h}^*;1]\\biggr],\n",
        "\\end{align}\n",
        "\n",
        "where the notation $[\\mathbf{h}^{*T},1]$ is a row vector containing $\\mathbf{h}^{T}$ with a one appended to the end and $[\\mathbf{h};1 ]$ is a column vector containing $\\mathbf{h}$ with a one appended to the end.\n",
        "\n",
        "\n",
        "To compute this, we reformulated the integrand using the relations from appendices C.3.3 and C.3.4 as the product of a normal distribution in $\\boldsymbol\\phi$ and a constant with respect\n",
        "to $\\boldsymbol\\phi$. The integral of the normal distribution must be one, and so the final result is just the constant. This constant is itself a normal distribution in $y^*$. <br>\n",
        "\n",
        "If you feel so inclined you can work through the math of this yourself.\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ILxT4EfW2lUm"
      },
      "outputs": [],
      "source": [
        "# Predict mean and variance of y_star from x_star\n",
        "def inference(x_star, x_data, y_data, sigma_sq, sigma_p_sq, n_hidden):\n",
        "\n",
        "  # Compute hidden variables\n",
        "  h_star = compute_H(x_star, n_hidden);\n",
        "  H = compute_H(x_data, n_hidden);\n",
        "\n",
        "  # TODO: Compute mean and variance of y*\n",
        "  # Replace these lines:\n",
        "  y_star_mean = 0\n",
        "  y_star_var =  1\n",
        "\n",
        "  return y_star_mean, y_star_var"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "87cjUjMaixHZ"
      },
      "outputs": [],
      "source": [
        "x_model = x_func\n",
        "y_model = np.zeros_like(x_model)\n",
        "y_model_std = np.zeros_like(x_model)\n",
        "for c_model in range(len(x_model)):\n",
        "  y_star_mean, y_star_var = inference(x_model[c_model]*np.ones((1,1)), x_data, y_data, sigma_sq, sigma_p_sq, n_hidden)\n",
        "  y_model[c_model] = y_star_mean\n",
        "  y_model_std[c_model] = np.sqrt(y_star_var)\n",
        "\n",
        "# Draw the model\n",
        "plot_function(x_func, y_func, x_data, y_data, x_model, y_model, sigma_model=y_model_std)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8Hcbe_16sK0F"
      },
      "source": [
        "TODO:\n",
        "\n",
        "1.  Experiment running this again with different numbers of hidden units.  Make a prediction for what will happen when you increase / decrease them.\n",
        "2.  Experiment with what happens if you make the prior variance $\\sigma^2_p$ to a smaller value like 1.  How do you explain the results?"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}