udlbook/Notebooks/Chap09/9_3_Ensembling.ipynb

{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "authorship_tag": "ABX9TyOAC7YLEqN5qZhJXqRj+aHB",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap09/9_3_Ensembling.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# **Notebook 9.3: Ensembling**\n",
        "\n",
        "This notebook investigates how ensembling can improve the performance of models. We'll work with the simplified neural network model (figure 8.4 of book) which we can fit in closed form, and so we can eliminate any errors due to not finding the global maximum.\n",
        "\n",
        "Work through the cells below, running each cell in turn. In various places you will see the words \"TODO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
        "\n",
        "Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n"
      ],
      "metadata": {
        "id": "el8l05WQEO46"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xhmIOLiZELV_"
      },
      "outputs": [],
      "source": [
        "# import libraries\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "# Define seed to get same results each time\n",
        "np.random.seed(1)"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# The true function that we are trying to estimate, defined on [0,1]\n",
        "def true_function(x):\n",
        "    y = np.exp(np.sin(x*(2*3.1413)))\n",
        "    return y"
      ],
      "metadata": {
        "id": "3hpqmFyQNrbt"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Generate some data points with or without noise\n",
        "def generate_data(n_data, sigma_y=0.3):\n",
        "    # Generate x values quasi uniformly\n",
        "    x = np.ones(n_data)\n",
        "    for i in range(n_data):\n",
        "        x[i] = np.random.uniform(i/n_data, (i+1)/n_data, 1)\n",
        "\n",
        "    # y value from running through function and adding noise\n",
        "    y = np.ones(n_data)\n",
        "    for i in range(n_data):\n",
        "        y[i] = true_function(x[i])\n",
        "        y[i] += np.random.normal(0, sigma_y, 1)\n",
        "    return x,y"
      ],
      "metadata": {
        "id": "skZMM5TbNwq4"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Draw the fitted function, together with uncertainty used to generate points\n",
        "def plot_function(x_func, y_func, x_data=None,y_data=None, x_model = None, y_model =None, sigma_func = None, sigma_model=None):\n",
        "\n",
        "    fig,ax = plt.subplots()\n",
        "    ax.plot(x_func, y_func, 'k-')\n",
        "    if sigma_func is not None:\n",
        "      ax.fill_between(x_func, y_func-2*sigma_func, y_func+2*sigma_func, color='lightgray')\n",
        "\n",
        "    if x_data is not None:\n",
        "        ax.plot(x_data, y_data, 'o', color='#d18362')\n",
        "\n",
        "    if x_model is not None:\n",
        "        ax.plot(x_model, y_model, '-', color='#7fe7de')\n",
        "\n",
        "    if sigma_model is not None:\n",
        "      ax.fill_between(x_model, y_model-2*sigma_model, y_model+2*sigma_model, color='lightgray')\n",
        "\n",
        "    ax.set_xlim(0,1)\n",
        "    ax.set_xlabel('Input, $x$')\n",
        "    ax.set_ylabel('Output, $y$')\n",
        "    plt.show()"
      ],
      "metadata": {
        "id": "ziwD_R7lN0DY"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Generate true function\n",
        "x_func = np.linspace(0, 1.0, 100)\n",
        "y_func = true_function(x_func);\n",
        "\n",
        "# Generate some data points\n",
        "np.random.seed(1)\n",
        "sigma_func = 0.3\n",
        "n_data = 15\n",
        "x_data,y_data = generate_data(n_data, sigma_func)\n",
        "\n",
        "# Plot the function, data and uncertainty\n",
        "plot_function(x_func, y_func, x_data, y_data, sigma_func=sigma_func)"
      ],
      "metadata": {
        "id": "2CgKanwaN3NM"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Define model -- beta is a scalar and omega has size n_hidden,1\n",
        "def network(x, beta, omega):\n",
        "    # Retrieve number of hidden units\n",
        "    n_hidden = omega.shape[0]\n",
        "\n",
        "    y = np.zeros_like(x)\n",
        "    for c_hidden in range(n_hidden):\n",
        "        # Evaluate activations based on shifted lines (figure 8.4b-d)\n",
        "        line_vals =  x  - c_hidden/n_hidden\n",
        "        h =  line_vals * (line_vals > 0)\n",
        "        # Weight activations by omega parameters and sum\n",
        "        y = y + omega[c_hidden] * h\n",
        "    # Add bias, beta\n",
        "    y = y + beta\n",
        "\n",
        "    return y"
      ],
      "metadata": {
        "id": "gorZ6i97N7AR"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# This fits the n_hidden+1 parameters (see fig 8.4a) in closed form.\n",
        "# If you have studied linear algebra, then you will know it is a least\n",
        "# squares solution of the form (A^TA)^-1A^Tb.  If you don't recognize that,\n",
        "# then just take it on trust that this gives you the best possible solution.\n",
        "def fit_model_closed_form(x,y,n_hidden):\n",
        "  n_data = len(x)\n",
        "  A = np.ones((n_data, n_hidden+1))\n",
        "  for i in range(n_data):\n",
        "      for j in range(1,n_hidden+1):\n",
        "          # Compute preactivation\n",
        "          A[i,j] = x[i]-(j-1)/n_hidden\n",
        "          # Apply the ReLU function\n",
        "          if A[i,j] < 0:\n",
        "              A[i,j] = 0;\n",
        "\n",
        "  # Add a tiny bit of regularization\n",
        "  reg_value = 0.00001\n",
        "  regMat = reg_value * np.identity(n_hidden+1)\n",
        "  regMat[0,0] = 0\n",
        "\n",
        "  ATA = np.matmul(np.transpose(A), A) +regMat\n",
        "  ATAInv = np.linalg.inv(ATA)\n",
        "  ATAInvAT = np.matmul(ATAInv, np.transpose(A))\n",
        "  beta_omega = np.matmul(ATAInvAT,y)\n",
        "  beta = beta_omega[0]\n",
        "  omega = beta_omega[1:]\n",
        "\n",
        "  return beta, omega"
      ],
      "metadata": {
        "id": "bMrLZUIqOwiM"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Closed form solution\n",
        "beta, omega = fit_model_closed_form(x_data,y_data,n_hidden=14)\n",
        "\n",
        "# Get prediction for model across graph range\n",
        "x_model = np.linspace(0,1,100);\n",
        "y_model = network(x_model, beta, omega)\n",
        "\n",
        "# Draw the function and the model\n",
        "plot_function(x_func, y_func, x_data,y_data, x_model, y_model)\n",
        "\n",
        "# Compute the mean squared error between the fitted model (cyan) and the true curve (black)\n",
        "mean_sq_error = np.mean((y_model-y_func) * (y_model-y_func))\n",
        "print(f\"Mean square error = {mean_sq_error:3.3f}\")"
      ],
      "metadata": {
        "id": "mzmtdY8DOz16"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Now let's resample the data with replacement four times.\n",
        "n_model = 4\n",
        "# Array to store the prediction from all of our models\n",
        "all_y_model = np.zeros((n_model, len(y_model)))\n",
        "\n",
        "# For each model\n",
        "for c_model in range(n_model):\n",
        "    # TODO Sample data indices with replacement (use np.random.choice)\n",
        "    # Replace this line\n",
        "    resampled_indices = np.arange(0,n_data,1);\n",
        "\n",
        "    # Extract the resampled x and y data\n",
        "    x_data_resampled = x_data[resampled_indices]\n",
        "    y_data_resampled = y_data[resampled_indices]\n",
        "\n",
        "    # Fit the model\n",
        "    beta, omega = fit_model_closed_form(x_data_resampled,y_data_resampled,n_hidden=14)\n",
        "\n",
        "    # Run the model\n",
        "    y_model_resampled = network(x_model, beta, omega)\n",
        "\n",
        "    # Store the results\n",
        "    all_y_model[c_model,:] = y_model_resampled\n",
        "\n",
        "    # Draw the function and the model\n",
        "    plot_function(x_func, y_func, x_data,y_data, x_model, y_model_resampled)\n",
        "\n",
        "    # Compute the mean squared error between the fitted model (cyan) and the true curve (black)\n",
        "    mean_sq_error = np.mean((y_model_resampled-y_func) * (y_model_resampled-y_func))\n",
        "    print(f\"Mean square error = {mean_sq_error:3.3f}\")"
      ],
      "metadata": {
        "id": "UKrAOEiKO8Go"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Plot the median of the results\n",
        "# TODO -- find the median prediction\n",
        "# Replace this line\n",
        "y_model_median = all_y_model[0,:]\n",
        "\n",
        "# Draw the function and the model\n",
        "plot_function(x_func, y_func, x_data,y_data, x_model, y_model_median)\n",
        "\n",
        "# Compute the mean squared error between the fitted model (cyan) and the true curve (black)\n",
        "mean_sq_error = np.mean((y_model_median-y_func) * (y_model_median-y_func))\n",
        "print(f\"Mean square error = {mean_sq_error:3.3f}\")"
      ],
      "metadata": {
        "id": "cUTaW_GMS6nb"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Plot the mean of the results\n",
        "# TODO -- find the mean prediction\n",
        "# Replace this line\n",
        "y_model_mean = all_y_model[0,:]\n",
        "\n",
        "# Draw the function and the model\n",
        "plot_function(x_func, y_func, x_data,y_data, x_model, y_model_mean)\n",
        "\n",
        "# Compute the mean squared error between the fitted model (cyan) and the true curve (black)\n",
        "mean_sq_error = np.mean((y_model_mean-y_func) * (y_model_mean-y_func))\n",
        "print(f\"Mean square error = {mean_sq_error:3.3f}\")"
      ],
      "metadata": {
        "id": "2MKxwVxuRvCx"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "You should see that both the median and mean models are better than any of the individual models. We have improved our performance at the cost of four times as much training time, storage, and inference time."
      ],
      "metadata": {
        "id": "K-jDZrfjWwBa"
      }
    }
  ]
}