Compare commits
22 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 70878c94be | |
| | 832a3c9104 | |
| | 68cd38bf2f | |
| | 964d26c684 | |
| | c1d56880a6 | |
| | 0cd321fb96 | |
| | 043969fb79 | |
| | 4052d29b0a | |
| | 19ee3afd90 | |
| | 91f99c7398 | |
| | 96fe95cea1 | |
| | fc35dc4168 | |
| | 80497e298d | |
| | 8c658ac321 | |
| | 4a08fa44bf | |
| | c5755efe68 | |
| | 55b425b41b | |
| | e000397470 | |
| | b4d0b49776 | |
| | f5c1b2af2e | |
| | 1cf990bfe7 | |
| | ae967f8e7e | |
@@ -67,7 +67,7 @@
 "source": [
 "# Define a linear function with just one input, x\n",
 "def linear_function_1D(x,beta,omega):\n",
-" # TODO -- replace the code lin below with formula for 1D linear equation\n",
+" # TODO -- replace the code line below with formula for 1D linear equation\n",
 " y = x\n",
 "\n",
 " return y"
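The hunk above only fixes the comment typo ("lin" → "line"); the exercise itself is left for the reader. As a sketch of one possible completion (this is the exercise answer, not part of the diff), the 1D linear equation y = β + ωx would be:

```python
import numpy as np

# One possible completion of the notebook's TO DO (not part of the commit):
# the 1D linear equation y = beta + omega * x replaces the placeholder y = x.
def linear_function_1D(x, beta, omega):
    y = beta + omega * x
    return y

print(linear_function_1D(np.array([0.0, 1.0, 2.0]), beta=1.0, omega=2.0))
```
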
@@ -4,7 +4,7 @@
 "metadata": {
 "colab": {
 "provenance": [],
-"authorship_tag": "ABX9TyOwOpROPBel8eYGzp5DGRkt",
+"authorship_tag": "ABX9TyP5wHK5E7/el+vxU947K3q8",
 "include_colab_link": true
 },
 "kernelspec": {
@@ -55,7 +55,7 @@
 "This is a composition of the functions $\\cos[\\bullet],\\exp[\\bullet],\\sin[\\bullet]$. I chose these just because you probably already know the derivatives of these functions:\n",
 "\n",
 "\\begin{eqnarray*}\n",
-" \\frac{\\partial \\cos[z]}{\\partial z} = -\\sin[z] \\quad\\quad \\frac{\\partial \\exp[z]}{\\partial z} = \\exp[z] \\quad\\quad \\frac{\\partial \\sin[z]}{\\partial z} = -\\cos[z].\n",
+" \\frac{\\partial \\cos[z]}{\\partial z} = -\\sin[z] \\quad\\quad \\frac{\\partial \\exp[z]}{\\partial z} = \\exp[z] \\quad\\quad \\frac{\\partial \\sin[z]}{\\partial z} = \\cos[z].\n",
 "\\end{eqnarray*}\n",
 "\n",
 "Suppose that we have a least squares loss function:\n",
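The hunk above corrects a sign error: the derivative of sin[z] is cos[z], not -cos[z]. The corrected identity is easy to confirm numerically with a central finite difference (a quick check, not part of the commit):

```python
import numpy as np

# Numerically verify the corrected identity d(sin z)/dz = cos z
# (the old text wrongly had -cos z) via central differences.
z = np.linspace(-2.0, 2.0, 9)
eps = 1e-6
fd = (np.sin(z + eps) - np.sin(z - eps)) / (2 * eps)
print(np.max(np.abs(fd - np.cos(z))))  # close to zero
```
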
@@ -107,8 +107,8 @@
 " return beta3+omega3 * np.cos(beta2 + omega2 * np.exp(beta1 + omega1 * np.sin(beta0 + omega0 * x)))\n",
 "\n",
 "def likelihood(x, y, beta0, beta1, beta2, beta3, omega0, omega1, omega2, omega3):\n",
-" diff = fn(x, beta0, beta1, beta2, beta3, omega0, omega1, omega2, omega3) - y ;\n",
-" return diff * diff ;"
+" diff = fn(x, beta0, beta1, beta2, beta3, omega0, omega1, omega2, omega3) - y\n",
+" return diff * diff"
 ]
 },
 {
@@ -123,8 +123,8 @@
 {
 "cell_type": "code",
 "source": [
-"beta0 = 1.0; beta1 = 2.0; beta2 = -3.0; beta3 = 0.4;\n",
-"omega0 = 0.1; omega1 = -0.4; omega2 = 2.0; omega3 = 3.0;\n",
+"beta0 = 1.0; beta1 = 2.0; beta2 = -3.0; beta3 = 0.4\n",
+"omega0 = 0.1; omega1 = -0.4; omega2 = 2.0; omega3 = 3.0\n",
 "x = 2.3; y =2.0\n",
 "l_i_func = likelihood(x,y,beta0,beta1,beta2,beta3,omega0,omega1,omega2,omega3)\n",
 "print('l_i=%3.3f'%l_i_func)"
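The two hunks above just strip stray trailing semicolons; the code's behavior is unchanged. Assembled from the cells shown in the hunks, the new version runs standalone like this (the composed function and its least-squares loss for a single example):

```python
import numpy as np

# The composed function and individual least-squares loss from the diff,
# assembled so the cells shown in the hunks can be run standalone.
def fn(x, beta0, beta1, beta2, beta3, omega0, omega1, omega2, omega3):
    return beta3 + omega3 * np.cos(beta2 + omega2 * np.exp(beta1 + omega1 * np.sin(beta0 + omega0 * x)))

def likelihood(x, y, beta0, beta1, beta2, beta3, omega0, omega1, omega2, omega3):
    diff = fn(x, beta0, beta1, beta2, beta3, omega0, omega1, omega2, omega3) - y
    return diff * diff

beta0 = 1.0; beta1 = 2.0; beta2 = -3.0; beta3 = 0.4
omega0 = 0.1; omega1 = -0.4; omega2 = 2.0; omega3 = 3.0
x = 2.3; y = 2.0
l_i_func = likelihood(x, y, beta0, beta1, beta2, beta3, omega0, omega1, omega2, omega3)
print('l_i=%3.3f' % l_i_func)
```
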
392  Notebooks/Chap11/11_1_Shattered_Gradients.ipynb  Normal file
@@ -0,0 +1,392 @@
+{
+"nbformat": 4,
+"nbformat_minor": 0,
+"metadata": {
+"colab": {
+"provenance": [],
+"authorship_tag": "ABX9TyMrF4rB2hTKq7XzLuYsURdL",
+"include_colab_link": true
+},
+"kernelspec": {
+"name": "python3",
+"display_name": "Python 3"
+},
+"language_info": {
+"name": "python"
+}
+},
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "view-in-github",
+"colab_type": "text"
+},
+"source": [
+"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap11/11_1_Shattered_Gradients.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+]
+},
+{
+"cell_type": "markdown",
+"source": [
+"# **Notebook 11.1: Shattered gradients**\n",
+"\n",
+"This notebook investigates the phenomenon of shattered gradients as discussed in section 11.1.1. It replicates some of the experiments in [Balduzzi et al. (2017)](https://arxiv.org/abs/1702.08591).\n",
+"\n",
+"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
+"\n",
+"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
+],
+"metadata": {
+"id": "pOZ6Djz0dhoy"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"import numpy as np\n",
+"import matplotlib.pyplot as plt"
+],
+"metadata": {
+"id": "iaFyNGhU21VJ"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"source": [
+"First let's define a neural network. We'll initialize both the weights and biases randomly with Glorot initialization (He initialization without the factor of two)"
+],
+"metadata": {
+"id": "YcNlAxnE3XXn"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"# K is width, D is number of hidden units in each layer\n",
+"def init_params(K, D):\n",
+" # Set seed so we always get the same random numbers\n",
+" np.random.seed(1)\n",
+"\n",
+" # Input layer\n",
+" D_i = 1\n",
+" # Output layer\n",
+" D_o = 1\n",
+"\n",
+" # Glorot initialization\n",
+" sigma_sq_omega = 1.0/D\n",
+"\n",
+" # Make empty lists\n",
+" all_weights = [None] * (K+1)\n",
+" all_biases = [None] * (K+1)\n",
+"\n",
+" # Create parameters for input and output layers\n",
+" all_weights[0] = np.random.normal(size=(D, D_i))*np.sqrt(sigma_sq_omega)\n",
+" all_weights[-1] = np.random.normal(size=(D_o, D)) * np.sqrt(sigma_sq_omega)\n",
+" all_biases[0] = np.random.normal(size=(D,1))* np.sqrt(sigma_sq_omega)\n",
+" all_biases[-1]= np.random.normal(size=(D_o,1))* np.sqrt(sigma_sq_omega)\n",
+"\n",
+" # Create intermediate layers\n",
+" for layer in range(1,K):\n",
+" all_weights[layer] = np.random.normal(size=(D,D))*np.sqrt(sigma_sq_omega)\n",
+" all_biases[layer] = np.random.normal(size=(D,1))* np.sqrt(sigma_sq_omega)\n",
+"\n",
+" return all_weights, all_biases"
+],
+"metadata": {
+"id": "kr-q7hc23Bn9"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"source": [
+"The next two functions define the forward pass of the algorithm"
+],
+"metadata": {
+"id": "kwcn5z7-dq_1"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"# Define the Rectified Linear Unit (ReLU) function\n",
+"def ReLU(preactivation):\n",
+" activation = preactivation.clip(0.0)\n",
+" return activation\n",
+"\n",
+"def forward_pass(net_input, all_weights, all_biases):\n",
+"\n",
+" # Retrieve number of layers\n",
+" K = len(all_weights) -1\n",
+"\n",
+" # We'll store the pre-activations at each layer in a list \"all_f\"\n",
+" # and the activations in a second list[all_h].\n",
+" all_f = [None] * (K+1)\n",
+" all_h = [None] * (K+1)\n",
+"\n",
+" #For convenience, we'll set\n",
+" # all_h[0] to be the input, and all_f[K] will be the output\n",
+" all_h[0] = net_input\n",
+"\n",
+" # Run through the layers, calculating all_f[0...K-1] and all_h[1...K]\n",
+" for layer in range(K):\n",
+" # Update preactivations and activations at this layer according to eqn 7.5\n",
+" all_f[layer] = all_biases[layer] + np.matmul(all_weights[layer], all_h[layer])\n",
+" all_h[layer+1] = ReLU(all_f[layer])\n",
+"\n",
+" # Compute the output from the last hidden layer\n",
+" all_f[K] = all_biases[K] + np.matmul(all_weights[K], all_h[K])\n",
+"\n",
+" # Retrieve the output\n",
+" net_output = all_f[K]\n",
+"\n",
+" return net_output, all_f, all_h"
+],
+"metadata": {
+"id": "_2w-Tr7G3sYq"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"source": [
+"The next two functions compute the gradient of the output with respect to the input using the back propagation algorithm."
+],
+"metadata": {
+"id": "aM2l7QafeC8T"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"# We'll need the indicator function\n",
+"def indicator_function(x):\n",
+" x_in = np.array(x)\n",
+" x_in[x_in>=0] = 1\n",
+" x_in[x_in<0] = 0\n",
+" return x_in\n",
+"\n",
+"# Main backward pass routine\n",
+"def calc_input_output_gradient(x_in, all_weights, all_biases):\n",
+"\n",
+" # Run the forward pass\n",
+" y, all_f, all_h = forward_pass(x_in, all_weights, all_biases)\n",
+"\n",
+" # We'll store the derivatives dl_dweights and dl_dbiases in lists as well\n",
+" all_dl_dweights = [None] * (K+1)\n",
+" all_dl_dbiases = [None] * (K+1)\n",
+" # And we'll store the derivatives of the loss with respect to the activation and preactivations in lists\n",
+" all_dl_df = [None] * (K+1)\n",
+" all_dl_dh = [None] * (K+1)\n",
+" # Again for convenience we'll stick with the convention that all_h[0] is the net input and all_f[k] in the net output\n",
+"\n",
+" # Compute derivatives of net output with respect to loss\n",
+" all_dl_df[K] = np.ones_like(all_f[K])\n",
+"\n",
+" # Now work backwards through the network\n",
+" for layer in range(K,-1,-1):\n",
+" all_dl_dbiases[layer] = np.array(all_dl_df[layer])\n",
+" all_dl_dweights[layer] = np.matmul(all_dl_df[layer], all_h[layer].transpose())\n",
+"\n",
+" all_dl_dh[layer] = np.matmul(all_weights[layer].transpose(), all_dl_df[layer])\n",
+"\n",
+" if layer > 0:\n",
+" all_dl_df[layer-1] = indicator_function(all_f[layer-1]) * all_dl_dh[layer]\n",
+"\n",
+"\n",
+" return all_dl_dh[0],y"
+],
+"metadata": {
+"id": "DwR3eGMgV8bl"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"source": [
+"Double check we have the gradient correct using finite differences"
+],
+"metadata": {
+"id": "Ar_VmraReSWe"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"D = 200; K = 3\n",
+"# Initialize parameters\n",
+"all_weights, all_biases = init_params(K,D)\n",
+"\n",
+"x = np.ones((1,1))\n",
+"dydx,y = calc_input_output_gradient(x, all_weights, all_biases)\n",
+"\n",
+"# Offset for finite gradients\n",
+"delta = 0.00000001\n",
+"x1 = x\n",
+"y1,*_ = forward_pass(x1, all_weights, all_biases)\n",
+"x2 = x+delta\n",
+"y2,*_ = forward_pass(x2, all_weights, all_biases)\n",
+"# Finite difference calculation\n",
+"dydx_fd = (y2-y1)/delta\n",
+"\n",
+"print(\"Gradient calculation=%f, Finite difference gradient=%f\"%(dydx,dydx_fd))\n"
+],
+"metadata": {
+"id": "KJpQPVd36Haq"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"source": [
+"Helper function that computes the derivatives for a 1D array of input values and plots them."
+],
+"metadata": {
+"id": "YC-LAYRKtbxp"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"def plot_derivatives(K, D):\n",
+"\n",
+" # Initialize parameters\n",
+" all_weights, all_biases = init_params(K,D)\n",
+"\n",
+" x_in = np.arange(-2,2, 4.0/256.0)\n",
+" x_in = np.resize(x_in, (1,len(x_in)))\n",
+" dydx,y = calc_input_output_gradient(x_in, all_weights, all_biases)\n",
+"\n",
+" fig,ax = plt.subplots()\n",
+" ax.plot(np.squeeze(x_in), np.squeeze(dydx), 'b-')\n",
+" ax.set_xlim(-2,2)\n",
+" ax.set_xlabel('Input, $x$')\n",
+" ax.set_ylabel('Gradient, $dy/dx$')\n",
+" ax.set_title('No layers = %d'%(K))\n",
+" plt.show()"
+],
+"metadata": {
+"id": "uJr5eDe648jF"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "code",
+"source": [
+"# Build a model with one hidden layer and 200 neurons and plot derivatives\n",
+"D = 200; K = 1\n",
+"plot_derivatives(K,D)\n",
+"\n",
+"# TODO -- Interpret this result\n",
+"# Why does the plot have some flat regions?\n",
+"\n",
+"# TODO -- Add code to plot the derivatives for models with 24 and 50 hidden layers\n",
+"# with 200 neurons per layer\n",
+"\n",
+"# TODO -- Why does this graph not have visible flat regions?\n",
+"\n",
+"# TODO -- Why does the magnitude of the gradients decrease as we increase the number\n",
+"# of hidden layers\n",
+"\n",
+"# TODO -- Do you find this a convincing replication of the experiment in the original paper? (I don't)\n",
+"# Can you help me find why I have failed to replicate this result? udlbookmail@gmail.com"
+],
+"metadata": {
+"id": "56gTMTCb49KO"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"source": [
+"Let's look at the autocorrelation function now"
+],
+"metadata": {
+"id": "f_0zjQbxuROQ"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"def autocorr(dydx):\n",
+" # TODO -- compute the autocorrelation function\n",
+" # Use the numpy function \"correlate\" with the mode set to \"same\"\n",
+" # Replace this line:\n",
+" ac = np.ones((256,1))\n",
+"\n",
+" return ac"
+],
+"metadata": {
+"id": "ggnO8hfoRN1e"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"source": [
+"Helper function to plot the autocorrelation function and normalize so correlation is one with offset of zero"
+],
+"metadata": {
+"id": "EctWSV1RuddK"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"def plot_autocorr(K, D):\n",
+"\n",
+" # Initialize parameters\n",
+" all_weights, all_biases = init_params(K,D)\n",
+"\n",
+" x_in = np.arange(-2.0,2.0, 4.0/256)\n",
+" x_in = np.resize(x_in, (1,len(x_in)))\n",
+" dydx,y = calc_input_output_gradient(x_in, all_weights, all_biases)\n",
+" ac = autocorr(np.squeeze(dydx))\n",
+" ac = ac / ac[128]\n",
+"\n",
+" y = ac[128:]\n",
+" x = np.squeeze(x_in)[128:]\n",
+" fig,ax = plt.subplots()\n",
+" ax.plot(x,y, 'b-')\n",
+" ax.set_xlim([0,2])\n",
+" ax.set_xlabel('Distance')\n",
+" ax.set_ylabel('Autocorrelation')\n",
+" ax.set_title('No layers = %d'%(K))\n",
+" plt.show()\n"
+],
+"metadata": {
+"id": "2LKlZ9u_WQXN"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "code",
+"source": [
+"# Plot the autocorrelation functions\n",
+"D = 200; K =1\n",
+"plot_autocorr(K,D)\n",
+"D = 200; K =50\n",
+"plot_autocorr(K,D)\n",
+"\n",
+"# TODO -- Do you find this a convincing replication of the experiment in the original paper? (I don't)\n",
+"# Can you help me find why I have failed to replicate this result?"
+],
+"metadata": {
+"id": "RD9JTdjNWw6p"
+},
+"execution_count": null,
+"outputs": []
+}
+]
+}
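The notebook above leaves `autocorr` as a TO DO, with the hint to use numpy's `correlate` with mode set to `"same"`. One possible completion (a sketch, not the author's solution) that works with the notebook's 256-sample gradient signal and its `ac / ac[128]` normalization:

```python
import numpy as np

# One possible completion of the autocorr TO DO (not the author's solution):
# autocorrelation of the gradient signal via np.correlate, mode "same".
def autocorr(dydx):
    ac = np.correlate(dydx, dydx, mode='same')
    return ac

# Demonstrate on a random 256-sample signal, as plot_autocorr would use it
rng = np.random.default_rng(0)
signal = rng.standard_normal(256)
ac = autocorr(signal)
ac = ac / ac[128]  # index 128 is the zero-offset lag for a length-256 input
print(ac.shape, ac[128])
```

With mode `"same"` and a length-256 input, the output also has length 256 and the zero-lag term lands at index 128, which is why the notebook's normalization divides by `ac[128]`.
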
277  Notebooks/Chap11/11_2_Residual_Networks.ipynb  Normal file
@@ -0,0 +1,277 @@
+{
+"nbformat": 4,
+"nbformat_minor": 0,
+"metadata": {
+"colab": {
+"provenance": [],
+"authorship_tag": "ABX9TyObut1y9atNUuowPT6dMY+I",
+"include_colab_link": true
+},
+"kernelspec": {
+"name": "python3",
+"display_name": "Python 3"
+},
+"language_info": {
+"name": "python"
+}
+},
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "view-in-github",
+"colab_type": "text"
+},
+"source": [
+"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap11/11_2_Residual_Networks.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+]
+},
+{
+"cell_type": "markdown",
+"source": [
+"# **Notebook 11.2: Residual Networks**\n",
+"\n",
+"This notebook adapts the networks for MNIST1D to use residual connections.\n",
+"\n",
+"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
+"\n",
+"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
+"\n"
+],
+"metadata": {
+"id": "t9vk9Elugvmi"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
+"!git clone https://github.com/greydanus/mnist1d"
+],
+"metadata": {
+"id": "D5yLObtZCi9J"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "code",
+"source": [
+"import numpy as np\n",
+"import os\n",
+"import torch, torch.nn as nn\n",
+"from torch.utils.data import TensorDataset, DataLoader\n",
+"from torch.optim.lr_scheduler import StepLR\n",
+"import matplotlib.pyplot as plt\n",
+"import mnist1d\n",
+"import random"
+],
+"metadata": {
+"id": "YrXWAH7sUWvU"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "code",
+"source": [
+"args = mnist1d.data.get_dataset_args()\n",
+"data = mnist1d.data.get_dataset(args, path='./mnist1d_data.pkl', download=False, regenerate=False)\n",
+"\n",
+"# The training and test input and outputs are in\n",
+"# data['x'], data['y'], data['x_test'], and data['y_test']\n",
+"print(\"Examples in training set: {}\".format(len(data['y'])))\n",
+"print(\"Examples in test set: {}\".format(len(data['y_test'])))\n",
+"print(\"Length of each example: {}\".format(data['x'].shape[-1]))"
+],
+"metadata": {
+"id": "twI72ZCrCt5z"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "code",
+"source": [
+"# Load in the data\n",
+"train_data_x = data['x'].transpose()\n",
+"train_data_y = data['y']\n",
+"val_data_x = data['x_test'].transpose()\n",
+"val_data_y = data['y_test']\n",
+"# Print out sizes\n",
+"print(\"Train data: %d examples (columns), each of which has %d dimensions (rows)\"%((train_data_x.shape[1],train_data_x.shape[0])))\n",
+"print(\"Validation data: %d examples (columns), each of which has %d dimensions (rows)\"%((val_data_x.shape[1],val_data_x.shape[0])))"
+],
+"metadata": {
+"id": "8bKADvLHbiV5"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"source": [
+"Define the network"
+],
+"metadata": {
+"id": "_sFvRDGrl4qe"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"# There are 40 input dimensions and 10 output dimensions for this data\n",
+"# The inputs correspond to the 40 offsets in the MNIST1D template.\n",
+"D_i = 40\n",
+"# The outputs correspond to the 10 digits\n",
+"D_o = 10\n",
+"\n",
+"\n",
+"# We will adapt this model to have residual connections around the linear layers\n",
+"# This is the same model we used in practical 8.1, but we can't use the sequential\n",
+"# class for residual networks (which aren't strictly sequential). Hence, I've rewritten\n",
+"# it as a model that inherits from a base class\n",
+"\n",
+"class ResidualNetwork(torch.nn.Module):\n",
+" def __init__(self, input_size, output_size, hidden_size=100):\n",
+" super(ResidualNetwork, self).__init__()\n",
+" self.linear1 = nn.Linear(input_size, hidden_size)\n",
+" self.linear2 = nn.Linear(hidden_size, hidden_size)\n",
+" self.linear3 = nn.Linear(hidden_size, hidden_size)\n",
+" self.linear4 = nn.Linear(hidden_size, output_size)\n",
+" print(\"Initialized MLPBase model with {} parameters\".format(self.count_params()))\n",
+"\n",
+" def count_params(self):\n",
+" return sum([p.view(-1).shape[0] for p in self.parameters()])\n",
+"\n",
+"# # TODO -- Add residual connections to this model\n",
+"# # The order of operations should similar to figure 11.5b\n",
+"# # linear1 first, ReLU+linear2 in first residual block, ReLU+linear3 in second residual block), linear4 at end\n",
+"# # Replace this function\n",
+" def forward(self, x):\n",
+" h1 = self.linear1(x).relu()\n",
+" h2 = self.linear2(h1).relu()\n",
+" h3 = self.linear3(h2).relu()\n",
+" return self.linear4(h3)\n"
+],
+"metadata": {
+"id": "FslroPJJffrh"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "code",
+"source": [
+"# He initialization of weights\n",
+"def weights_init(layer_in):\n",
+" if isinstance(layer_in, nn.Linear):\n",
+" nn.init.kaiming_uniform_(layer_in.weight)\n",
+" layer_in.bias.data.fill_(0.0)"
+],
+"metadata": {
+"id": "YgLaex1pfhqz"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "code",
+"source": [
+"#Define the model\n",
+"model = ResidualNetwork(40, 10)\n",
+"\n",
+"# choose cross entropy loss function (equation 5.24 in the loss notes)\n",
+"loss_function = nn.CrossEntropyLoss()\n",
+"# construct SGD optimizer and initialize learning rate and momentum\n",
+"optimizer = torch.optim.SGD(model.parameters(), lr = 0.05, momentum=0.9)\n",
+"# object that decreases learning rate by half every 20 epochs\n",
+"scheduler = StepLR(optimizer, step_size=20, gamma=0.5)\n",
+"# convert data to torch tensors\n",
+"x_train = torch.tensor(train_data_x.transpose().astype('float32'))\n",
+"y_train = torch.tensor(train_data_y.astype('long'))\n",
+"x_val= torch.tensor(val_data_x.transpose().astype('float32'))\n",
+"y_val = torch.tensor(val_data_y.astype('long'))\n",
+"\n",
+"# load the data into a class that creates the batches\n",
+"data_loader = DataLoader(TensorDataset(x_train,y_train), batch_size=100, shuffle=True, worker_init_fn=np.random.seed(1))\n",
+"\n",
+"# Initialize model weights\n",
+"model.apply(weights_init)\n",
+"\n",
+"# loop over the dataset n_epoch times\n",
+"n_epoch = 100\n",
+"# store the loss and the % correct at each epoch\n",
+"losses_train = np.zeros((n_epoch))\n",
+"errors_train = np.zeros((n_epoch))\n",
+"losses_val = np.zeros((n_epoch))\n",
+"errors_val = np.zeros((n_epoch))\n",
+"\n",
+"for epoch in range(n_epoch):\n",
+" # loop over batches\n",
+" for i, data in enumerate(data_loader):\n",
+" # retrieve inputs and labels for this batch\n",
+" x_batch, y_batch = data\n",
+" # zero the parameter gradients\n",
+" optimizer.zero_grad()\n",
+" # forward pass -- calculate model output\n",
+" pred = model(x_batch)\n",
+" # compute the loss\n",
+" loss = loss_function(pred, y_batch)\n",
+" # backward pass\n",
+" loss.backward()\n",
+" # SGD update\n",
+" optimizer.step()\n",
+"\n",
+" # Run whole dataset to get statistics -- normally wouldn't do this\n",
+" pred_train = model(x_train)\n",
+" pred_val = model(x_val)\n",
+" _, predicted_train_class = torch.max(pred_train.data, 1)\n",
+" _, predicted_val_class = torch.max(pred_val.data, 1)\n",
+" errors_train[epoch] = 100 - 100 * (predicted_train_class == y_train).float().sum() / len(y_train)\n",
+" errors_val[epoch]= 100 - 100 * (predicted_val_class == y_val).float().sum() / len(y_val)\n",
+" losses_train[epoch] = loss_function(pred_train, y_train).item()\n",
+" losses_val[epoch]= loss_function(pred_val, y_val).item()\n",
+" print(f'Epoch {epoch:5d}, train loss {losses_train[epoch]:.6f}, train error {errors_train[epoch]:3.2f}, val loss {losses_val[epoch]:.6f}, percent error {errors_val[epoch]:3.2f}')\n",
+"\n",
+" # tell scheduler to consider updating learning rate\n",
+" scheduler.step()"
+],
+"metadata": {
+"id": "NYw8I_3mmX5c"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "code",
+"source": [
+"# Plot the results\n",
+"fig, ax = plt.subplots()\n",
+"ax.plot(errors_train,'r-',label='train')\n",
+"ax.plot(errors_val,'b-',label='test')\n",
+"ax.set_ylim(0,100); ax.set_xlim(0,n_epoch)\n",
+"ax.set_xlabel('Epoch'); ax.set_ylabel('Error')\n",
+"ax.set_title('TrainError %3.2f, Val Error %3.2f'%(errors_train[-1],errors_val[-1]))\n",
+"ax.legend()\n",
+"plt.show()"
+],
+"metadata": {
+"id": "CcP_VyEmE2sv"
+},
+"execution_count": null,
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"source": [
+"The primary motivation of residual networks is to allow training of much deeper networks. \n",
+"\n",
+"TODO: Try running this network with and without the residual connections. Does adding the residual connections change the performance?"
+],
+"metadata": {
+"id": "wMmqhmxuAx0M"
+}
+}
+]
+}
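The notebook above leaves the residual forward pass as a TO DO ("linear1 first, ReLU+linear2 in first residual block, ReLU+linear3 in second residual block, linear4 at end"). A sketch of one possible completion (not the author's solution), following that stated order of operations:

```python
import torch
import torch.nn as nn

# A sketch of the residual forward pass the TO DO asks for (not the
# author's solution): linear1, then two residual blocks that each add
# a ReLU + linear branch back onto the skip connection, then linear4.
class ResidualNetwork(nn.Module):
    def __init__(self, input_size, output_size, hidden_size=100):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, hidden_size)
        self.linear3 = nn.Linear(hidden_size, hidden_size)
        self.linear4 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h = self.linear1(x)
        h = h + self.linear2(h.relu())   # first residual block
        h = h + self.linear3(h.relu())   # second residual block
        return self.linear4(h)

model = ResidualNetwork(40, 10)
out = model(torch.randn(8, 40))
print(out.shape)
```

Because each block computes `h + branch(h)`, removing the two `h +` terms recovers the purely sequential baseline, which is exactly the comparison the final TODO in the notebook asks for.
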
328  Notebooks/Chap11/11_3_Batch_Normalization.ipynb  Normal file
@@ -0,0 +1,328 @@
|
|||||||
|
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOoGS+lY+EhGthebSO4smpj",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap11/11_3_Batch_Normalization.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 11.3: Batch normalization**\n",
"\n",
"This notebook investigates the use of batch normalization in residual networks.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "D5yLObtZCi9J"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import os\n",
"import torch, torch.nn as nn\n",
"from torch.utils.data import TensorDataset, DataLoader\n",
"from torch.optim.lr_scheduler import StepLR\n",
"import matplotlib.pyplot as plt\n",
"import mnist1d\n",
"import random"
],
"metadata": {
"id": "YrXWAH7sUWvU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"args = mnist1d.data.get_dataset_args()\n",
"data = mnist1d.data.get_dataset(args, path='./mnist1d_data.pkl', download=False, regenerate=False)\n",
"\n",
"# The training and test inputs and outputs are in\n",
"# data['x'], data['y'], data['x_test'], and data['y_test']\n",
"print(\"Examples in training set: {}\".format(len(data['y'])))\n",
"print(\"Examples in test set: {}\".format(len(data['y_test'])))\n",
"print(\"Length of each example: {}\".format(data['x'].shape[-1]))"
],
"metadata": {
"id": "twI72ZCrCt5z"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Load in the data\n",
"train_data_x = data['x'].transpose()\n",
"train_data_y = data['y']\n",
"val_data_x = data['x_test'].transpose()\n",
"val_data_y = data['y_test']\n",
"# Print out sizes\n",
"print(\"Train data: %d examples (columns), each of which has %d dimensions (rows)\"%((train_data_x.shape[1],train_data_x.shape[0])))\n",
"print(\"Validation data: %d examples (columns), each of which has %d dimensions (rows)\"%((val_data_x.shape[1],val_data_x.shape[0])))"
],
"metadata": {
"id": "8bKADvLHbiV5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def print_variance(name, data):\n",
"  # First dimension (rows) is batch elements\n",
"  # Second dimension (columns) is neurons.\n",
"  np_data = data.detach().numpy()\n",
"  # Compute the variance of each neuron across the batch, then average over neurons\n",
"  neuron_variance = np.mean(np.var(np_data, axis=0))\n",
"  # Print out the name and the variance\n",
"  print(\"%s variance=%f\"%(name,neuron_variance))"
],
"metadata": {
"id": "3bBpJIV-N-lt"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# He initialization of weights\n",
"def weights_init(layer_in):\n",
"  if isinstance(layer_in, nn.Linear):\n",
"    nn.init.kaiming_uniform_(layer_in.weight)\n",
"    layer_in.bias.data.fill_(0.0)"
],
"metadata": {
"id": "YgLaex1pfhqz"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def run_one_step_of_model(model, x_train, y_train):\n",
"  # choose cross entropy loss function (equation 5.24 in the loss notes)\n",
"  loss_function = nn.CrossEntropyLoss()\n",
"  # construct SGD optimizer and initialize learning rate and momentum\n",
"  optimizer = torch.optim.SGD(model.parameters(), lr = 0.05, momentum=0.9)\n",
"\n",
"  # load the data into a class that creates the batches\n",
"  data_loader = DataLoader(TensorDataset(x_train,y_train), batch_size=200, shuffle=True, worker_init_fn=np.random.seed(1))\n",
"\n",
"  # Initialize model weights\n",
"  model.apply(weights_init)\n",
"\n",
"  # Get a batch\n",
"  for i, data in enumerate(data_loader):\n",
"    # retrieve inputs and labels for this batch\n",
"    x_batch, y_batch = data\n",
"    # zero the parameter gradients\n",
"    optimizer.zero_grad()\n",
"    # forward pass -- calculate model output\n",
"    pred = model(x_batch)\n",
"    # compute the loss\n",
"    loss = loss_function(pred, y_batch)\n",
"    # backward pass\n",
"    loss.backward()\n",
"    # SGD update\n",
"    optimizer.step()\n",
"    # Break out of this loop -- we just want to see the first\n",
"    # iteration, but usually we would continue\n",
"    break"
],
"metadata": {
"id": "DFlu45pORQEz"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# convert training data to torch tensors\n",
"x_train = torch.tensor(train_data_x.transpose().astype('float32'))\n",
"y_train = torch.tensor(train_data_y.astype('long'))"
],
"metadata": {
"id": "i7Q0ScWgRe4G"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# This is a simple residual model with 5 residual branches in a row\n",
"# (each branch has its own linear layer, linear2-linear6; linear7 maps to the output)\n",
"class ResidualNetwork(torch.nn.Module):\n",
"  def __init__(self, input_size, output_size, hidden_size=100):\n",
"    super(ResidualNetwork, self).__init__()\n",
"    self.linear1 = nn.Linear(input_size, hidden_size)\n",
"    self.linear2 = nn.Linear(hidden_size, hidden_size)\n",
"    self.linear3 = nn.Linear(hidden_size, hidden_size)\n",
"    self.linear4 = nn.Linear(hidden_size, hidden_size)\n",
"    self.linear5 = nn.Linear(hidden_size, hidden_size)\n",
"    self.linear6 = nn.Linear(hidden_size, hidden_size)\n",
"    self.linear7 = nn.Linear(hidden_size, output_size)\n",
"\n",
"  def count_params(self):\n",
"    return sum([p.view(-1).shape[0] for p in self.parameters()])\n",
"\n",
"  def forward(self, x):\n",
"    print_variance(\"Input\",x)\n",
"    f = self.linear1(x)\n",
"    print_variance(\"First preactivation\",f)\n",
"    res1 = f + self.linear2(f.relu())\n",
"    print_variance(\"After first residual connection\",res1)\n",
"    res2 = res1 + self.linear3(res1.relu())\n",
"    print_variance(\"After second residual connection\",res2)\n",
"    res3 = res2 + self.linear4(res2.relu())\n",
"    print_variance(\"After third residual connection\",res3)\n",
"    res4 = res3 + self.linear5(res3.relu())\n",
"    print_variance(\"After fourth residual connection\",res4)\n",
"    res5 = res4 + self.linear6(res4.relu())\n",
"    print_variance(\"After fifth residual connection\",res5)\n",
"    return self.linear7(res5)"
],
"metadata": {
"id": "FslroPJJffrh"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define the model and run for one step\n",
"# Monitoring the variance at each point in the network\n",
"n_hidden = 100\n",
"n_input = 40\n",
"n_output = 10\n",
"model = ResidualNetwork(n_input, n_output, n_hidden)\n",
"run_one_step_of_model(model, x_train, y_train)"
],
"metadata": {
"id": "NYw8I_3mmX5c"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Notice that the variance roughly doubles at each step, so it increases exponentially, as in figure 11.6b in the book."
],
"metadata": {
"id": "0kZUlWkkW8jE"
}
},
{
"cell_type": "code",
"source": [
"# TODO Adapt the residual network below to add a batch norm operation\n",
"# before the contents of each residual link as in figure 11.6c in the book\n",
"# Use the torch function nn.BatchNorm1d\n",
"class ResidualNetworkWithBatchNorm(torch.nn.Module):\n",
"  def __init__(self, input_size, output_size, hidden_size=100):\n",
"    super(ResidualNetworkWithBatchNorm, self).__init__()\n",
"    self.linear1 = nn.Linear(input_size, hidden_size)\n",
"    self.linear2 = nn.Linear(hidden_size, hidden_size)\n",
"    self.linear3 = nn.Linear(hidden_size, hidden_size)\n",
"    self.linear4 = nn.Linear(hidden_size, hidden_size)\n",
"    self.linear5 = nn.Linear(hidden_size, hidden_size)\n",
"    self.linear6 = nn.Linear(hidden_size, hidden_size)\n",
"    self.linear7 = nn.Linear(hidden_size, output_size)\n",
"\n",
"  def count_params(self):\n",
"    return sum([p.view(-1).shape[0] for p in self.parameters()])\n",
"\n",
"  def forward(self, x):\n",
"    print_variance(\"Input\",x)\n",
"    f = self.linear1(x)\n",
"    print_variance(\"First preactivation\",f)\n",
"    res1 = f + self.linear2(f.relu())\n",
"    print_variance(\"After first residual connection\",res1)\n",
"    res2 = res1 + self.linear3(res1.relu())\n",
"    print_variance(\"After second residual connection\",res2)\n",
"    res3 = res2 + self.linear4(res2.relu())\n",
"    print_variance(\"After third residual connection\",res3)\n",
"    res4 = res3 + self.linear5(res3.relu())\n",
"    print_variance(\"After fourth residual connection\",res4)\n",
"    res5 = res4 + self.linear6(res4.relu())\n",
"    print_variance(\"After fifth residual connection\",res5)\n",
"    return self.linear7(res5)"
],
"metadata": {
"id": "5JvMmaRITKGd"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define the model\n",
"n_hidden = 100\n",
"n_input = 40\n",
"n_output = 10\n",
"model = ResidualNetworkWithBatchNorm(n_input, n_output, n_hidden)\n",
"run_one_step_of_model(model, x_train, y_train)"
],
"metadata": {
"id": "2U3DnlH9Uw6c"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Note that the variance now increases linearly as in figure 11.6c."
],
"metadata": {
"id": "R_ucFq9CXq8D"
}
}
]
}
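The TODO above asks for a batch norm operation before the contents of each residual branch. A minimal sketch of a single such branch (one possible arrangement using `nn.BatchNorm1d` and the hidden size from the notebook, not the full solution):

```python
import torch, torch.nn as nn

hidden = 100
linear = nn.Linear(hidden, hidden)
bn = nn.BatchNorm1d(hidden)   # normalizes each of the 100 neurons across the batch

x = torch.randn(200, hidden) * 5.0   # batch of 200 with deliberately large variance
y = bn(x)                            # in training mode, uses batch statistics
res = x + linear(y.relu())           # batch norm applied before the branch contents

# Each neuron of bn(x) now has roughly zero mean and unit variance,
# so each residual branch adds a bounded amount of variance
print(y.mean().item(), y.var().item())
```

Because every branch input is renormalized, the variance of the activations grows by a roughly constant amount per residual connection instead of doubling, which is the linear growth noted after the exercise.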
375
Notebooks/Chap12/12_1_Self_Attention.ipynb
Normal file
@@ -0,0 +1,375 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOKrX9gmuhl9+KwscpZKr3u",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap12/12_1_Self_Attention.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 12.1: Self Attention**\n",
"\n",
"This notebook builds a self-attention mechanism from scratch, as discussed in section 12.2 of the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The self-attention mechanism maps $N$ inputs $\\mathbf{x}_{n}\\in\\mathbb{R}^{D}$ to $N$ outputs $\\mathbf{x}'_{n}\\in \\mathbb{R}^{D}$.\n",
"\n"
],
"metadata": {
"id": "9OJkkoNqCVK2"
}
},
{
"cell_type": "code",
"source": [
"# Set seed so we get the same random numbers\n",
"np.random.seed(3)\n",
"# Number of inputs\n",
"N = 3\n",
"# Number of dimensions of each input\n",
"D = 4\n",
"# Create an empty list\n",
"all_x = []\n",
"# Create elements x_n and append to list\n",
"for n in range(N):\n",
"  all_x.append(np.random.normal(size=(D,1)))\n",
"# Print out the list\n",
"print(all_x)\n"
],
"metadata": {
"id": "oAygJwLiCSri"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We'll also need the weights and biases for the keys, queries, and values (equations 12.2 and 12.4)."
],
"metadata": {
"id": "W2iHFbtKMaDp"
}
},
{
"cell_type": "code",
"source": [
"# Set seed so we get the same random numbers\n",
"np.random.seed(0)\n",
"\n",
"# Choose random values for the parameters\n",
"omega_q = np.random.normal(size=(D,D))\n",
"omega_k = np.random.normal(size=(D,D))\n",
"omega_v = np.random.normal(size=(D,D))\n",
"beta_q = np.random.normal(size=(D,1))\n",
"beta_k = np.random.normal(size=(D,1))\n",
"beta_v = np.random.normal(size=(D,1))"
],
"metadata": {
"id": "79TSK7oLMobe"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's compute the queries, keys, and values for each input"
],
"metadata": {
"id": "VxaKQtP3Ng6R"
}
},
{
"cell_type": "code",
"source": [
"# Make three lists to store queries, keys, and values\n",
"all_queries = []\n",
"all_keys = []\n",
"all_values = []\n",
"# For every input\n",
"for x in all_x:\n",
"  # TODO -- compute the keys, queries and values.\n",
"  # Replace these three lines\n",
"  query = np.ones_like(x)\n",
"  key = np.ones_like(x)\n",
"  value = np.ones_like(x)\n",
"\n",
"  all_queries.append(query)\n",
"  all_keys.append(key)\n",
"  all_values.append(value)"
],
"metadata": {
"id": "TwDK2tfdNmw9"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We'll need a softmax function (equation 12.5) -- here, it will take a list of arbitrary numbers and return a list where the elements are non-negative and sum to one\n"
],
"metadata": {
"id": "Se7DK6PGPSUk"
}
},
{
"cell_type": "code",
"source": [
"def softmax(items_in):\n",
"\n",
"  # TODO Compute the elements of items_out\n",
"  # Replace this line\n",
"  items_out = items_in.copy()\n",
"\n",
"  return items_out"
],
"metadata": {
"id": "u93LIcE5PoiM"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now compute the self attention values:"
],
"metadata": {
"id": "8aJVhbKDW7lm"
}
},
{
"cell_type": "code",
"source": [
"# Create empty list for output\n",
"all_x_prime = []\n",
"\n",
"# For each output\n",
"for n in range(N):\n",
"  # Create list for dot products of query n with all keys\n",
"  all_km_qn = []\n",
"  # Compute the dot products\n",
"  for key in all_keys:\n",
"    # TODO -- compute the appropriate dot product\n",
"    # Replace this line\n",
"    dot_product = 1\n",
"\n",
"    # Store dot product\n",
"    all_km_qn.append(dot_product)\n",
"\n",
"  # Pass the dot products through the softmax to compute the attentions\n",
"  attention = softmax(all_km_qn)\n",
"  # Print result (should be positive and sum to one)\n",
"  print(\"Attentions for output \", n)\n",
"  print(attention)\n",
"\n",
"  # TODO: Compute a weighted sum of all of the values according to the attention\n",
"  # (equation 12.3)\n",
"  # Replace this line\n",
"  x_prime = np.zeros((D,1))\n",
"\n",
"  all_x_prime.append(x_prime)\n",
"\n",
"\n",
"# Print out true values to check you have it correct\n",
"print(\"x_prime_0_calculated:\", all_x_prime[0].transpose())\n",
"print(\"x_prime_0_true: [[ 0.94744244 -0.24348429 -0.91310441 -0.44522983]]\")\n",
"print(\"x_prime_1_calculated:\", all_x_prime[1].transpose())\n",
"print(\"x_prime_1_true: [[ 1.64201168 -0.08470004 4.02764044 2.18690791]]\")\n",
"print(\"x_prime_2_calculated:\", all_x_prime[2].transpose())\n",
"print(\"x_prime_2_true: [[ 1.61949281 -0.06641533 3.96863308 2.15858316]]\")\n"
],
"metadata": {
"id": "yimz-5nCW6vQ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's compute the same thing, but using matrix calculations. We'll store the $N$ inputs $\\mathbf{x}_{n}\\in\\mathbb{R}^{D}$ in the columns of a $D\\times N$ matrix, using equations 12.6 and 12.7/8.\n",
"\n",
"Note: The book uses column vectors (for compatibility with the rest of the text), but in the wider literature it is more common to store the inputs in the rows of a matrix; in this case, the computation is the same, but all the matrices are transposed and the operations proceed in the reverse order."
],
"metadata": {
"id": "PJ2vCQ_7C38K"
}
},
{
"cell_type": "code",
"source": [
"# Define softmax operation that works independently on each column\n",
"def softmax_cols(data_in):\n",
"  # Exponentiate all of the values\n",
"  exp_values = np.exp(data_in)\n",
"  # Sum over columns\n",
"  denom = np.sum(exp_values, axis = 0)\n",
"  # Replicate denominator to N rows\n",
"  denom = np.matmul(np.ones((data_in.shape[0],1)), denom[np.newaxis,:])\n",
"  # Compute softmax\n",
"  softmax = exp_values / denom\n",
"  # return the answer\n",
"  return softmax"
],
"metadata": {
"id": "obaQBdUAMXXv"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Now let's compute self attention in matrix form\n",
"def self_attention(X,omega_v, omega_q, omega_k, beta_v, beta_q, beta_k):\n",
"\n",
"  # TODO -- Write this function\n",
"  # 1. Compute queries, keys, and values\n",
"  # 2. Compute dot products\n",
"  # 3. Apply softmax to calculate attentions\n",
"  # 4. Weight values by attentions\n",
"  # Replace this line\n",
"  X_prime = np.zeros_like(X)\n",
"\n",
"\n",
"  return X_prime"
],
"metadata": {
"id": "gb2WvQ3SiH8r"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Copy data into matrix\n",
"X = np.zeros((D, N))\n",
"X[:,0] = np.squeeze(all_x[0])\n",
"X[:,1] = np.squeeze(all_x[1])\n",
"X[:,2] = np.squeeze(all_x[2])\n",
"\n",
"# Run the self attention mechanism\n",
"X_prime = self_attention(X,omega_v, omega_q, omega_k, beta_v, beta_q, beta_k)\n",
"\n",
"# Print out the results\n",
"print(X_prime)"
],
"metadata": {
"id": "MUOJbgJskUpl"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"If you did this correctly, the values should be the same as above.\n",
"\n",
"TODO:\n",
"\n",
"Print out the attention matrix.\n",
"You will see that the values are quite extreme (one is very close to one and the others are very close to zero). Now we'll fix this problem by using scaled dot-product attention."
],
"metadata": {
"id": "as_lRKQFpvz0"
}
},
{
"cell_type": "code",
"source": [
"# Now let's compute self attention in matrix form\n",
"def scaled_dot_product_self_attention(X,omega_v, omega_q, omega_k, beta_v, beta_q, beta_k):\n",
"\n",
"  # TODO -- Write this function\n",
"  # 1. Compute queries, keys, and values\n",
"  # 2. Compute dot products\n",
"  # 3. Scale the dot products as in equation 12.9\n",
"  # 4. Apply softmax to calculate attentions\n",
"  # 5. Weight values by attentions\n",
"  # Replace this line\n",
"  X_prime = np.zeros_like(X)\n",
"\n",
"  return X_prime"
],
"metadata": {
"id": "kLU7PUnnqvIh"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Run the self attention mechanism\n",
"X_prime = scaled_dot_product_self_attention(X,omega_v, omega_q, omega_k, beta_v, beta_q, beta_k)\n",
"\n",
"# Print out the results\n",
"print(X_prime)"
],
"metadata": {
"id": "n18e3XNzmVgL"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO -- Investigate whether the self-attention mechanism is covariant with respect to permutation.\n",
"If it is, when we permute the columns of the input matrix $\\mathbf{X}$, the columns of the output matrix $\\mathbf{X}'$ will also be permuted.\n"
],
"metadata": {
"id": "QDEkIrcgrql-"
}
}
]
}
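The matrix computation the notebook asks for can be sketched as follows, using the book's column-vector convention (equations 12.6-12.8). This is one possible solution under the notebook's parameter naming, not its reference code, and it adds the standard max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax_cols(a):
    # Softmax independently over each column; subtracting the column max
    # avoids overflow without changing the result
    e = np.exp(a - a.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def self_attention(X, omega_v, omega_q, omega_k, beta_v, beta_q, beta_k):
    # Queries, keys, and values, one column per input (equations 12.2/12.4 in matrix form)
    Q = beta_q + omega_q @ X
    K = beta_k + omega_k @ X
    V = beta_v + omega_v @ X
    # A[m, n] = attention that output n pays to value m; columns sum to one
    A = softmax_cols(K.T @ Q)
    return V @ A

D, N = 4, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(D, N))
params = [rng.normal(size=s) for s in [(D, D)]*3 + [(D, 1)]*3]
X_prime = self_attention(X, *params)
print(X_prime.shape)  # (4, 3)
```

For scaled dot-product attention (equation 12.9), the only change is dividing the dot products by the square root of the query/key dimension before the softmax: `softmax_cols(K.T @ Q / np.sqrt(K.shape[0]))`.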
212
Notebooks/Chap12/12_2_Multihead_Self_Attention.ipynb
Normal file
@@ -0,0 +1,212 @@
{
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 0,
|
||||||
|
"metadata": {
|
||||||
|
"colab": {
|
||||||
|
"provenance": [],
|
||||||
|
"authorship_tag": "ABX9TyMSk8qTqDYqFnRJVZKlsue0",
|
||||||
|
"include_colab_link": true
|
||||||
|
},
|
||||||
|
"kernelspec": {
|
||||||
|
"name": "python3",
|
||||||
|
"display_name": "Python 3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"name": "python"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"id": "view-in-github",
|
||||||
|
"colab_type": "text"
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap12/12_2_Multihead_Self_Attention.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"# **Notebook 12.1: Multhead Self-Attention**\n",
|
||||||
|
"\n",
|
||||||
|
"This notebook builds a multihead self-attentionm mechanism as in figure 12.6\n",
|
||||||
|
"\n",
|
||||||
|
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
|
||||||
|
"\n",
|
||||||
|
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
|
||||||
|
"\n"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"id": "t9vk9Elugvmi"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"import numpy as np\n",
|
||||||
|
"import matplotlib.pyplot as plt"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"id": "OLComQyvCIJ7"
|
||||||
|
},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": []
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"The multihead self-attention mechanism maps $N$ inputs $\\mathbf{x}_{n}\\in\\mathbb{R}^{D}$ and returns $N$ outputs $\\mathbf{x}'_{n}\\in \\mathbb{R}^{D}$. \n",
|
||||||
|
"\n"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"id": "9OJkkoNqCVK2"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"# Set seed so we get the same random numbers\n",
|
||||||
|
"np.random.seed(3)\n",
|
||||||
|
"# Number of inputs\n",
|
||||||
|
"N = 6\n",
|
||||||
|
"# Number of dimensions of each input\n",
|
||||||
|
"D = 8\n",
|
||||||
|
"# Create an empty list\n",
|
||||||
|
"X = np.random.normal(size=(D,N))\n",
|
||||||
|
"# Print X\n",
|
||||||
|
"print(X)"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"id": "oAygJwLiCSri"
|
||||||
|
},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": []
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"We'll use two heads. We'll need the weights and biases for the keys, queries, and values (equations 12.2 and 12.4). We'll use two heads, and (as in the figure), we'll make the queries keys and values of size D/H"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"id": "W2iHFbtKMaDp"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"# Number of heads\n",
|
||||||
|
"H = 2\n",
|
||||||
|
"# QDV dimension\n",
|
||||||
|
"H_D = int(D/H)\n",
|
||||||
|
"\n",
|
||||||
|
"# Set seed so we get the same random numbers\n",
|
||||||
|
"np.random.seed(0)\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose random values for the parameters for the first head\n",
|
||||||
|
"omega_q1 = np.random.normal(size=(H_D,D))\n",
|
||||||
|
"omega_k1 = np.random.normal(size=(H_D,D))\n",
|
||||||
|
"omega_v1 = np.random.normal(size=(H_D,D))\n",
|
||||||
|
"beta_q1 = np.random.normal(size=(H_D,1))\n",
|
||||||
|
"beta_k1 = np.random.normal(size=(H_D,1))\n",
|
||||||
|
"beta_v1 = np.random.normal(size=(H_D,1))\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose random values for the parameters for the second head\n",
|
||||||
|
"omega_q2 = np.random.normal(size=(H_D,D))\n",
|
||||||
|
"omega_k2 = np.random.normal(size=(H_D,D))\n",
|
||||||
|
"omega_v2 = np.random.normal(size=(H_D,D))\n",
|
||||||
|
"beta_q2 = np.random.normal(size=(H_D,1))\n",
|
||||||
|
"beta_k2 = np.random.normal(size=(H_D,1))\n",
|
||||||
|
"beta_v2 = np.random.normal(size=(H_D,1))\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose random values for the parameters\n",
|
||||||
|
"omega_c = np.random.normal(size=(D,D))"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"id": "79TSK7oLMobe"
|
||||||
|
},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": []
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"Now let's compute the multiscale self-attention"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"id": "VxaKQtP3Ng6R"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"# Define softmax operation that works independently on each column\n",
|
||||||
|
"def softmax_cols(data_in):\n",
|
||||||
|
" # Exponentiate all of the values\n",
|
||||||
|
" exp_values = np.exp(data_in) ;\n",
|
||||||
|
" # Sum over columns\n",
|
||||||
|
" denom = np.sum(exp_values, axis = 0);\n",
|
||||||
|
" # Replicate denominator to N rows\n",
|
||||||
|
" denom = np.matmul(np.ones((data_in.shape[0],1)), denom[np.newaxis,:])\n",
|
||||||
|
" # Compute softmax\n",
|
||||||
|
" softmax = exp_values / denom\n",
|
||||||
|
" # return the answer\n",
|
||||||
|
" return softmax"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"id": "obaQBdUAMXXv"
|
||||||
|
},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": []
|
||||||
|
},
{
"cell_type": "code",
"source": [
" # Now let's compute self attention in matrix form\n",
"def multihead_scaled_self_attention(X,omega_v1, omega_q1, omega_k1, beta_v1, beta_q1, beta_k1, omega_v2, omega_q2, omega_k2, beta_v2, beta_q2, beta_k2, omega_c):\n",
"\n",
"  # TODO Write the multihead scaled self-attention mechanism.\n",
"  # Replace this line\n",
"  X_prime = np.zeros_like(X) ;\n",
"\n",
"\n",
"  return X_prime"
],
"metadata": {
"id": "gb2WvQ3SiH8r"
},
"execution_count": null,
"outputs": []
},
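For orientation, here is a minimal sketch of the computation the TODO above asks for (assuming each head computes scaled dot-product self-attention with scaling 1/sqrt(H_D), and the vertically concatenated head outputs are recombined with omega_c; this is one plausible reading, and it may differ in detail from the intended solution):

```python
import numpy as np

def softmax_cols(data_in):
    # Column-wise softmax (stable variant)
    exp_values = np.exp(data_in - np.max(data_in, axis=0, keepdims=True))
    return exp_values / np.sum(exp_values, axis=0, keepdims=True)

def scaled_head(X, omega_v, omega_q, omega_k, beta_v, beta_q, beta_k):
    # Queries, keys, and values are affine transforms of the columns of X
    Q = beta_q + np.matmul(omega_q, X)
    K = beta_k + np.matmul(omega_k, X)
    V = beta_v + np.matmul(omega_v, X)
    # Scaled dot-product attention: divide by sqrt of the head dimension H_D
    attention = softmax_cols(np.matmul(K.T, Q) / np.sqrt(Q.shape[0]))
    return np.matmul(V, attention)

def multihead_scaled_self_attention_sketch(X,
        omega_v1, omega_q1, omega_k1, beta_v1, beta_q1, beta_k1,
        omega_v2, omega_q2, omega_k2, beta_v2, beta_q2, beta_k2, omega_c):
    head1 = scaled_head(X, omega_v1, omega_q1, omega_k1, beta_v1, beta_q1, beta_k1)
    head2 = scaled_head(X, omega_v2, omega_q2, omega_k2, beta_v2, beta_q2, beta_k2)
    # Stack the two head outputs vertically and recombine with omega_c
    return np.matmul(omega_c, np.concatenate([head1, head2], axis=0))
```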
{
"cell_type": "code",
"source": [
"# Run the self attention mechanism\n",
"X_prime = multihead_scaled_self_attention(X,omega_v1, omega_q1, omega_k1, beta_v1, beta_q1, beta_k1, omega_v2, omega_q2, omega_k2, beta_v2, beta_q2, beta_k2, omega_c)\n",
"\n",
"# Print out the results\n",
"np.set_printoptions(precision=3)\n",
"print(\"Your answer:\")\n",
"print(X_prime)\n",
"\n",
"print(\"True values:\")\n",
"print(\"[[-21.207 -5.373 -20.933 -9.179 -11.319 -17.812]\")\n",
"print(\" [ -1.995 7.906 -10.516 3.452 9.863 -7.24 ]\")\n",
"print(\" [ 5.479 1.115 9.244 0.453 5.656 7.089]\")\n",
"print(\" [ -7.413 -7.416 0.363 -5.573 -6.736 -0.848]\")\n",
"print(\" [-11.261 -9.937 -4.848 -8.915 -13.378 -5.761]\")\n",
"print(\" [ 3.548 10.036 -2.244 1.604 12.113 -2.557]\")\n",
"print(\" [ 4.888 -5.814 2.407 3.228 -4.232 3.71 ]\")\n",
"print(\" [ 1.248 18.894 -6.409 3.224 19.717 -5.629]]\")\n",
"\n",
"# If your answers don't match, then make sure that you are doing the scaling, and make sure the scaling value is correct"
],
"metadata": {
"id": "MUOJbgJskUpl"
},
"execution_count": null,
"outputs": []
}
]
}
341
Notebooks/Chap12/12_3_Tokenization.ipynb
Normal file
@@ -0,0 +1,341 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyP0/KodWM9Dtr2x+8MdXXH1",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap12/12_3_Tokenization.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 12.3: Tokenization**\n",
"\n",
"This notebook builds a set of tokens from a text string as in figure 12.8 of the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"I adapted this code from *SOMEWHERE*. If anyone recognizes it, can you let me know and I will give the proper attribution or rewrite if the license is not permissive.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import re, collections"
],
"metadata": {
"id": "3_WkaFO3OfLi"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"text = \"a sailor went to sea sea sea \"+\\\n",
"       \"to see what he could see see see \"+\\\n",
"       \"but all that he could see see see \"+\\\n",
"       \"was the bottom of the deep blue sea sea sea\""
],
"metadata": {
"id": "tVZVuauIXmJk"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Tokenize the input sentence. To begin with, the tokens are the individual letters and the </w> whitespace token. So, we represent each word in terms of these tokens with spaces between the tokens to delineate them.\n",
"\n",
"The tokenized text is stored in a structure that represents each word as tokens together with the count of how often that word occurs. We'll call this the *vocabulary*."
],
"metadata": {
"id": "fF2RBrouWV5w"
}
},
{
"cell_type": "code",
"source": [
"def initialize_vocabulary(text):\n",
"  vocab = collections.defaultdict(int)\n",
"  words = text.strip().split()\n",
"  for word in words:\n",
"    vocab[' '.join(list(word)) + ' </w>'] += 1\n",
"  return vocab"
],
"metadata": {
"id": "OfvXkLSARk4_"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"vocab = initialize_vocabulary(text)\n",
"print('Vocabulary: {}'.format(vocab))\n",
"print('Size of vocabulary: {}'.format(len(vocab)))"
],
"metadata": {
"id": "aydmNqaoOpSm"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Find all the tokens in the current vocabulary and their frequencies"
],
"metadata": {
"id": "fJAiCjphWsI9"
}
},
{
"cell_type": "code",
"source": [
"def get_tokens_and_frequencies(vocab):\n",
"  tokens = collections.defaultdict(int)\n",
"  for word, freq in vocab.items():\n",
"    word_tokens = word.split()\n",
"    for token in word_tokens:\n",
"      tokens[token] += freq\n",
"  return tokens"
],
"metadata": {
"id": "qYi6F_K3RYsW"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"tokens = get_tokens_and_frequencies(vocab)\n",
"print('Tokens: {}'.format(tokens))\n",
"print('Number of tokens: {}'.format(len(tokens)))"
],
"metadata": {
"id": "Y4LCVGnvXIwp"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Find each pair of adjacent tokens in the vocabulary\n",
"and count them. We will subsequently merge the most frequently occurring pair."
],
"metadata": {
"id": "_-Rh1mD_Ww3b"
}
},
{
"cell_type": "code",
"source": [
"def get_pairs_and_counts(vocab):\n",
"    pairs = collections.defaultdict(int)\n",
"    for word, freq in vocab.items():\n",
"        symbols = word.split()\n",
"        for i in range(len(symbols)-1):\n",
"            pairs[symbols[i],symbols[i+1]] += freq\n",
"    return pairs"
],
"metadata": {
"id": "OqJTB3UFYubH"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"pairs = get_pairs_and_counts(vocab)\n",
"print('Pairs: {}'.format(pairs))\n",
"print('Number of distinct pairs: {}'.format(len(pairs)))\n",
"\n",
"most_frequent_pair = max(pairs, key=pairs.get)\n",
"print('Most frequent pair: {}'.format(most_frequent_pair))"
],
"metadata": {
"id": "d-zm0JBcZSjS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Merge the instances of the best pair in the vocabulary"
],
"metadata": {
"id": "pcborzqIXQFS"
}
},
{
"cell_type": "code",
"source": [
"def merge_pair_in_vocabulary(pair, vocab_in):\n",
"    vocab_out = {}\n",
"    bigram = re.escape(' '.join(pair))\n",
"    p = re.compile(r'(?<!\\S)' + bigram + r'(?!\\S)')\n",
"    for word in vocab_in:\n",
"        word_out = p.sub(''.join(pair), word)\n",
"        vocab_out[word_out] = vocab_in[word]\n",
"    return vocab_out"
],
"metadata": {
"id": "xQI6NALdWQZX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"vocab = merge_pair_in_vocabulary(most_frequent_pair, vocab)\n",
"print('Vocabulary: {}'.format(vocab))\n",
"print('Size of vocabulary: {}'.format(len(vocab)))"
],
"metadata": {
"id": "TRYeBZI3ZULu"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Update the tokens, which now include the best token 'se'"
],
"metadata": {
"id": "bkhUx3GeXwba"
}
},
{
"cell_type": "code",
"source": [
"tokens = get_tokens_and_frequencies(vocab)\n",
"print('Tokens: {}'.format(tokens))\n",
"print('Number of tokens: {}'.format(len(tokens)))"
],
"metadata": {
"id": "Fqj-vQWeXxQi"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's write the full tokenization routine"
],
"metadata": {
"id": "K_hKp2kSXXS1"
}
},
{
"cell_type": "code",
"source": [
"# TODO -- write this routine by filling in the missing parts,\n",
"# calling the above routines\n",
"def tokenize(text, num_merges):\n",
"  # Initialize the vocabulary from the input text\n",
"  # vocab = (your code here)\n",
"\n",
"  for i in range(num_merges):\n",
"    # Find the tokens and how often they occur in the vocabulary\n",
"    # tokens = (your code here)\n",
"\n",
"    # Find the pairs of adjacent tokens and their counts\n",
"    # pairs = (your code here)\n",
"\n",
"    # Find the most frequent pair\n",
"    # most_frequent_pair = (your code here)\n",
"    print('Most frequent pair: {}'.format(most_frequent_pair))\n",
"\n",
"    # Merge this pair in the vocabulary\n",
"    # vocab = (your code here)\n",
"\n",
"  # Find the tokens and how often they occur in the vocabulary one last time\n",
"  # tokens = (your code here)\n",
"\n",
"  return tokens, vocab"
],
"metadata": {
"id": "U_1SkQRGQ8f3"
},
"execution_count": null,
"outputs": []
},
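For reference, a completed version of the routine above might look like the sketch below. It inlines the four helper functions defined earlier in this notebook so that it is self-contained; treat it as one possible solution rather than the official one:

```python
import re, collections

def initialize_vocabulary(text):
    # Each word becomes space-separated letters plus a trailing </w> token
    vocab = collections.defaultdict(int)
    for word in text.strip().split():
        vocab[' '.join(list(word)) + ' </w>'] += 1
    return vocab

def get_tokens_and_frequencies(vocab):
    tokens = collections.defaultdict(int)
    for word, freq in vocab.items():
        for token in word.split():
            tokens[token] += freq
    return tokens

def get_pairs_and_counts(vocab):
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_pair_in_vocabulary(pair, vocab_in):
    vocab_out = {}
    bigram = re.escape(' '.join(pair))
    p = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    for word in vocab_in:
        vocab_out[p.sub(''.join(pair), word)] = vocab_in[word]
    return vocab_out

def tokenize(text, num_merges):
    # Initialize the vocabulary from the input text
    vocab = initialize_vocabulary(text)
    for i in range(num_merges):
        # Count pairs of adjacent tokens and merge the most frequent one
        pairs = get_pairs_and_counts(vocab)
        if not pairs:
            break
        most_frequent_pair = max(pairs, key=pairs.get)
        vocab = merge_pair_in_vocabulary(most_frequent_pair, vocab)
    # Recompute the tokens and their frequencies one last time
    tokens = get_tokens_and_frequencies(vocab)
    return tokens, vocab
```

After two merges on "low lower lowest", for example, the token "low" appears in the token set.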
{
"cell_type": "code",
"source": [
"tokens, vocab = tokenize(text, num_merges=22)"
],
"metadata": {
"id": "w0EkHTrER_-I"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print('Tokens: {}'.format(tokens))\n",
"print('Number of tokens: {}'.format(len(tokens)))\n",
"print('Vocabulary: {}'.format(vocab))\n",
"print('Size of vocabulary: {}'.format(len(vocab)))"
],
"metadata": {
"id": "moqDtTzIb-NG"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO - Consider the input text:\n",
"\n",
"\"How much wood could a woodchuck chuck if a woodchuck could chuck wood\"\n",
"\n",
"How many tokens will there be initially and what will they be?\n",
"How many tokens will there be if we run the tokenization routine for the maximum number of iterations (merges)?\n",
"\n",
"When you've made your predictions, run the code and see if you are correct."
],
"metadata": {
"id": "jOW_HJtMdAxd"
}
}
]
}
159
Notebooks/Chap13/13_1_Graph_Representation.ipynb
Normal file
@@ -0,0 +1,159 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMuzP1/oqTRTw4Xs/R4J/M3",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap13/13_1_Graph_Representation.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 13.1: Graph representation**\n",
"\n",
"This notebook investigates representing graphs with matrices as illustrated in figure 13.4 from the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import networkx as nx"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Routine to draw graph structure\n",
"def draw_graph_structure(adjacency_matrix):\n",
"\n",
"  G = nx.Graph()\n",
"  n_node = adjacency_matrix.shape[0]\n",
"  for i in range(n_node):\n",
"    for j in range(i):\n",
"      if adjacency_matrix[i,j]:\n",
"        G.add_edge(i,j)\n",
"\n",
"  nx.draw(G, nx.spring_layout(G, seed = 0), with_labels=True)\n",
"  plt.show()"
],
"metadata": {
"id": "O1QMxC7X4vh9"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define a graph\n",
"# Note that the nodes are labelled from 0 rather than 1 as in the book\n",
"A = np.array([[0,1,0,1,0,0,0,0],\n",
"              [1,0,1,1,1,0,0,0],\n",
"              [0,1,0,0,1,0,0,0],\n",
"              [1,1,0,0,1,0,0,0],\n",
"              [0,1,1,1,0,1,0,1],\n",
"              [0,0,0,0,1,0,1,1],\n",
"              [0,0,0,0,0,1,0,0],\n",
"              [0,0,0,0,1,1,0,0]]);\n",
"print(A)\n",
"draw_graph_structure(A)"
],
"metadata": {
"id": "TIrihEw-7DRV"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO -- find algorithmically how many walks of length three are between nodes 3 and 7\n",
"# Replace this line\n",
"print(\"Number of walks between nodes three and seven = ???\")"
],
"metadata": {
"id": "PzvfUpkV4zCj"
},
"execution_count": null,
"outputs": []
},
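One way to approach the walk-counting exercise above (a sketch, using the fact that entry (i,j) of the matrix power A^k counts the walks of length k between nodes i and j; A is the adjacency matrix defined earlier in this notebook):

```python
import numpy as np

A = np.array([[0,1,0,1,0,0,0,0],
              [1,0,1,1,1,0,0,0],
              [0,1,0,0,1,0,0,0],
              [1,1,0,0,1,0,0,0],
              [0,1,1,1,0,1,0,1],
              [0,0,0,0,1,0,1,1],
              [0,0,0,0,0,1,0,0],
              [0,0,0,0,1,1,0,0]])

# Entry (i,j) of A^k counts the walks of length k from node i to node j
A3 = np.linalg.matrix_power(A, 3)
print("Number of walks of length three between nodes 3 and 7 =", A3[3, 7])
```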
{
"cell_type": "code",
"source": [
"# TODO -- find algorithmically what the minimum path distance between nodes 0 and 6 is\n",
"# (i.e. what is the first walk length with non-zero count between 0 and 6)\n",
"# Replace this line\n",
"print(\"Minimum distance = ???\")\n",
"\n",
"\n",
"# What is the worst case complexity of your method?"
],
"metadata": {
"id": "MhhJr6CgCRb5"
},
"execution_count": null,
"outputs": []
},
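For the minimum-distance question, one sketch: keep multiplying by A and report the first power at which the (0, 6) entry becomes non-zero. With n nodes this needs at most n-1 matrix multiplications, so the worst-case complexity with naive multiplication is O(n^4):

```python
import numpy as np

A = np.array([[0,1,0,1,0,0,0,0],
              [1,0,1,1,1,0,0,0],
              [0,1,0,0,1,0,0,0],
              [1,1,0,0,1,0,0,0],
              [0,1,1,1,0,1,0,1],
              [0,0,0,0,1,0,1,1],
              [0,0,0,0,0,1,0,0],
              [0,0,0,0,1,1,0,0]])

def min_path_distance(A, i, j):
    # The first k for which (A^k)[i, j] > 0 is the shortest path length
    Ak = np.copy(A)
    for length in range(1, A.shape[0]):
        if Ak[i, j] > 0:
            return length
        Ak = np.matmul(Ak, A)
    return None  # no path exists within n-1 steps

print("Minimum distance between nodes 0 and 6 =", min_path_distance(A, 0, 6))
```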
{
"cell_type": "code",
"source": [
"# Now let's represent node 0 as a vector\n",
"x = np.array([[1],[0],[0],[0],[0],[0],[0],[0]]);\n",
"print(x)"
],
"metadata": {
"id": "lCQjXlatABGZ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO: Find algorithmically how many paths of length 3 there are between node 0 and every other node\n",
"# Replace this line\n",
"print(np.zeros_like(x))"
],
"metadata": {
"id": "nizLdZgLDzL4"
},
"execution_count": null,
"outputs": []
}
]
}
244
Notebooks/Chap13/13_2_Graph_Classification.ipynb
Normal file
@@ -0,0 +1,244 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOMSGUFWT+YN0fwYHpMmHJM",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap13/13_2_Graph_Classification.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 13.2: Graph classification**\n",
"\n",
"This notebook investigates representing graphs with matrices as illustrated in figure 13.4 from the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import networkx as nx"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's build a model that maps a chemical structure to a binary decision. This model might be used to predict whether a chemical is liquid at room temperature or not. We'll start by drawing the chemical structure."
],
"metadata": {
"id": "UNleESc7k5uB"
}
},
{
"cell_type": "code",
"source": [
"# Define a graph that represents the chemical structure of ethanol and draw it\n",
"# Each node is labelled with the node number and the element (carbon, hydrogen, oxygen)\n",
"G = nx.Graph()\n",
"G.add_edge('0:H','2:C')\n",
"G.add_edge('1:H','2:C')\n",
"G.add_edge('3:H','2:C')\n",
"G.add_edge('2:C','5:C')\n",
"G.add_edge('4:H','5:C')\n",
"G.add_edge('6:H','5:C')\n",
"G.add_edge('7:O','5:C')\n",
"G.add_edge('8:H','7:O')\n",
"nx.draw(G, nx.spring_layout(G, seed = 0), with_labels=True, node_size=600)\n",
"plt.show()"
],
"metadata": {
"id": "TIrihEw-7DRV"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define adjacency matrix\n",
"# TODO -- Define the adjacency matrix for this chemical\n",
"# Replace this line\n",
"A = np.zeros((9,9)) ;\n",
"\n",
"\n",
"print(A)\n",
"\n",
"# TODO -- Define node matrix\n",
"# There will be 9 nodes and 118 possible chemical elements\n",
"# so we'll define a 118x9 matrix. Each column represents one\n",
"# node and is a one-hot vector (i.e. all zeros, except a single one at the\n",
"# chemical number of the element).\n",
"# Chemical numbers: Hydrogen-->1, Carbon-->6, Oxygen-->8\n",
"# Since the indices start at 0, we'll set element 0 to 1 for hydrogen, element 5\n",
"# to one for carbon, and element 7 to one for oxygen\n",
"# Replace this line:\n",
"X = np.zeros((118,9))\n",
"\n",
"\n",
"# Print the top 15 rows of the data matrix\n",
"print(X[0:15,:])"
],
"metadata": {
"id": "gKBD5JsPfrkA"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's define a network with four layers that maps this graph to a binary value, using the formulation in equation 13.11."
],
"metadata": {
"id": "40FLjNIcpHa9"
}
},
{
"cell_type": "code",
"source": [
"# We'll need these helper functions\n",
"\n",
"# Define the Rectified Linear Unit (ReLU) function\n",
"def ReLU(preactivation):\n",
"  activation = preactivation.clip(0.0)\n",
"  return activation\n",
"\n",
"# Define the logistic sigmoid function\n",
"def sigmoid(x):\n",
"  return 1.0/(1.0+np.exp(-x))"
],
"metadata": {
"id": "52IFREpepHE4"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Our network will have K=3 hidden layers, and will use a dimension of D=200.\n",
"K = 3; D = 200\n",
"# Set seed so we always get the same random numbers\n",
"np.random.seed(1)\n",
"# Let's initialize the parameter matrices randomly with He initialization\n",
"Omega0 = np.random.normal(size=(D, 118)) * 2.0 / D\n",
"beta0 = np.random.normal(size=(D,1)) * 2.0 / D\n",
"Omega1 = np.random.normal(size=(D, D)) * 2.0 / D\n",
"beta1 = np.random.normal(size=(D,1)) * 2.0 / D\n",
"Omega2 = np.random.normal(size=(D, D)) * 2.0 / D\n",
"beta2 = np.random.normal(size=(D,1)) * 2.0 / D\n",
"omega3 = np.random.normal(size=(1, D))\n",
"beta3 = np.random.normal(size=(1,1))"
],
"metadata": {
"id": "ag0YdEgnpApK"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def graph_neural_network(A,X, Omega0, beta0, Omega1, beta1, Omega2, beta2, omega3, beta3):\n",
"  # Define this network according to equation 13.11 from the book\n",
"  # Replace this line\n",
"  f = np.ones((1,1))\n",
"\n",
"  return f;"
],
"metadata": {
"id": "RQuTMc2WrsU3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's test this network\n",
"f = graph_neural_network(A,X, Omega0, beta0, Omega1, beta1, Omega2, beta2, omega3, beta3)\n",
"print(\"Your value is %3f: \"%(f[0,0]), \"True value of f: 0.498010\")"
],
"metadata": {
"id": "X7gYgOu6uIAt"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's check that permuting the indices of the graph doesn't change\n",
"# the output of the network\n",
"# Define a permutation matrix\n",
"P = np.array([[0,1,0,0,0,0,0,0,0],\n",
"              [0,0,0,0,1,0,0,0,0],\n",
"              [0,0,0,0,0,1,0,0,0],\n",
"              [0,0,0,0,0,0,0,0,1],\n",
"              [1,0,0,0,0,0,0,0,0],\n",
"              [0,0,1,0,0,0,0,0,0],\n",
"              [0,0,0,1,0,0,0,0,0],\n",
"              [0,0,0,0,0,0,0,1,0],\n",
"              [0,0,0,0,0,0,1,0,0]]);\n",
"\n",
"# TODO -- Use this matrix to permute the adjacency matrix A and node matrix X\n",
"# Replace these lines\n",
"A_permuted = np.copy(A)\n",
"X_permuted = np.copy(X)\n",
"\n",
"f = graph_neural_network(A_permuted,X_permuted, Omega0, beta0, Omega1, beta1, Omega2, beta2, omega3, beta3)\n",
"print(\"Your value is %3f: \"%(f[0,0]), \"True value of f: 0.498010\")"
],
"metadata": {
"id": "F0zc3U_UuR5K"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO -- encode the adjacency matrix and node matrix for propanol and run the network again. Show that the network still runs even though the size of the input graph is different.\n",
"\n",
"Propanol structure can be found [here](https://upload.wikimedia.org/wikipedia/commons/b/b8/Propanol_flat_structure.png)."
],
"metadata": {
"id": "l44vHi50zGqY"
}
}
]
}
277
Notesbooks/Chap11/11_2_Residual_Networks.ipynb
Normal file
@@ -0,0 +1,277 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMJvfoCDFcSK7Z0/HkcGunb",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notesbooks/Chap11/11_2_Residual_Networks.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 11.2: Residual Networks**\n",
"\n",
"This notebook adapts the networks for MNIST1D to use residual connections.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "D5yLObtZCi9J"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import os\n",
"import torch, torch.nn as nn\n",
"from torch.utils.data import TensorDataset, DataLoader\n",
"from torch.optim.lr_scheduler import StepLR\n",
"import matplotlib.pyplot as plt\n",
"import mnist1d\n",
"import random"
],
"metadata": {
"id": "YrXWAH7sUWvU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"args = mnist1d.data.get_dataset_args()\n",
"data = mnist1d.data.get_dataset(args, path='./mnist1d_data.pkl', download=False, regenerate=False)\n",
"\n",
"# The training and test input and outputs are in\n",
"# data['x'], data['y'], data['x_test'], and data['y_test']\n",
"print(\"Examples in training set: {}\".format(len(data['y'])))\n",
"print(\"Examples in test set: {}\".format(len(data['y_test'])))\n",
"print(\"Length of each example: {}\".format(data['x'].shape[-1]))"
],
"metadata": {
"id": "twI72ZCrCt5z"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Load in the data\n",
"train_data_x = data['x'].transpose()\n",
"train_data_y = data['y']\n",
"val_data_x = data['x_test'].transpose()\n",
"val_data_y = data['y_test']\n",
"# Print out sizes\n",
"print(\"Train data: %d examples (columns), each of which has %d dimensions (rows)\"%((train_data_x.shape[1],train_data_x.shape[0])))\n",
"print(\"Validation data: %d examples (columns), each of which has %d dimensions (rows)\"%((val_data_x.shape[1],val_data_x.shape[0])))"
],
"metadata": {
"id": "8bKADvLHbiV5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Define the network"
],
"metadata": {
"id": "_sFvRDGrl4qe"
}
},
{
"cell_type": "code",
"source": [
"# There are 40 input dimensions and 10 output dimensions for this data\n",
"# The inputs correspond to the 40 offsets in the MNIST1D template.\n",
"D_i = 40\n",
"# The outputs correspond to the 10 digits\n",
"D_o = 10\n",
"\n",
"\n",
"# We will adapt this model to have residual connections around the linear layers\n",
"# This is the same model we used in practical 8.1, but we can't use the sequential\n",
"# class for residual networks (which aren't strictly sequential). Hence, I've rewritten\n",
|
||||||
|
"# it as a model that inherits from a base class\n",
|
||||||
|
"\n",
|
||||||
|
"class ResidualNetwork(torch.nn.Module):\n",
|
||||||
|
" def __init__(self, input_size, output_size, hidden_size=100):\n",
|
||||||
|
" super(ResidualNetwork, self).__init__()\n",
|
||||||
|
" self.linear1 = nn.Linear(input_size, hidden_size)\n",
|
||||||
|
" self.linear2 = nn.Linear(hidden_size, hidden_size)\n",
|
||||||
|
" self.linear3 = nn.Linear(hidden_size, hidden_size)\n",
|
||||||
|
" self.linear4 = nn.Linear(hidden_size, output_size)\n",
|
||||||
|
" print(\"Initialized MLPBase model with {} parameters\".format(self.count_params()))\n",
|
||||||
|
"\n",
|
||||||
|
" def count_params(self):\n",
|
||||||
|
" return sum([p.view(-1).shape[0] for p in self.parameters()])\n",
|
||||||
|
"\n",
|
||||||
|
"# # TODO -- Add residual connections to this model\n",
|
||||||
|
"# # The order of operations should similar to figure 11.5b\n",
|
||||||
|
"# # linear1 first, ReLU+linear2 in first residual block, ReLU+linear3 in second residual block), linear4 at end\n",
|
||||||
|
"# # Replace this function\n",
|
||||||
|
" def forward(self, x):\n",
|
||||||
|
" h1 = self.linear1(x).relu()\n",
|
||||||
|
" h2 = self.linear2(h1).relu()\n",
|
||||||
|
" h3 = self.linear3(h2).relu()\n",
|
||||||
|
" return self.linear4(h3)\n"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"id": "FslroPJJffrh"
|
||||||
|
},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": []
|
||||||
|
},
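For reference, the ordering the TODO asks for (figure 11.5b: linear1 first, each residual block adding linear(ReLU(·)) back onto its input, linear4 at the end) can be sketched in plain NumPy. The weight shapes and values below are made up for illustration, and this is one possible completion rather than the notebook's official solution:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def linear(z, W, b):
    return z @ W.T + b

def residual_forward(x, p):
    # linear1 first, then two residual blocks of the form f + linear(relu(f)),
    # then linear4 at the end -- the ordering of figure 11.5b
    f = linear(x, *p["linear1"])
    f = f + linear(relu(f), *p["linear2"])   # first residual block
    f = f + linear(relu(f), *p["linear3"])   # second residual block
    return linear(f, *p["linear4"])

# Tiny made-up shapes: 3 inputs, hidden width 4, 2 outputs
rng = np.random.default_rng(0)
p = {"linear1": (rng.standard_normal((4, 3)), np.zeros(4)),
     "linear2": (rng.standard_normal((4, 4)), np.zeros(4)),
     "linear3": (rng.standard_normal((4, 4)), np.zeros(4)),
     "linear4": (rng.standard_normal((2, 4)), np.zeros(2))}
x = rng.standard_normal((5, 3))
print(residual_forward(x, p).shape)   # (5, 2)
```

Note that if a residual branch's weights and biases were all zero, its block would pass the input through unchanged, which is the property that keeps deep stacks of such blocks trainable.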
{
"cell_type": "code",
"source": [
"# He initialization of weights\n",
"def weights_init(layer_in):\n",
"    if isinstance(layer_in, nn.Linear):\n",
"        nn.init.kaiming_uniform_(layer_in.weight)\n",
"        layer_in.bias.data.fill_(0.0)"
],
"metadata": {
"id": "YgLaex1pfhqz"
},
"execution_count": null,
"outputs": []
},
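As a sanity check on what `kaiming_uniform_` does with its default arguments, He-style uniform initialization can be reproduced in NumPy: weights drawn from U(−b, b) with b = √(6 / fan_in), which gives them standard deviation √(2 / fan_in). The fan-in of 100 below matches the hidden width but is otherwise an arbitrary choice for illustration:

```python
import numpy as np

def he_uniform(shape, rng):
    # He/Kaiming uniform init: U(-b, b) with b = sqrt(6 / fan_in),
    # so the weights have standard deviation sqrt(2 / fan_in)
    fan_in = shape[1]
    bound = np.sqrt(6.0 / fan_in)
    return rng.uniform(-bound, bound, size=shape)

rng = np.random.default_rng(1)
W = he_uniform((200, 100), rng)          # 200 units, fan_in = 100
print(W.std(), np.sqrt(2.0 / 100))       # empirical std vs. target ~0.1414
```

The √(2 / fan_in) scale is chosen so that, with ReLU activations, the variance of the pre-activations stays roughly constant from layer to layer.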
{
"cell_type": "code",
"source": [
"# Define the model\n",
"model = ResidualNetwork(40, 10)\n",
"\n",
"# choose cross entropy loss function (equation 5.24 in the loss notes)\n",
"loss_function = nn.CrossEntropyLoss()\n",
"# construct SGD optimizer and initialize learning rate and momentum\n",
"optimizer = torch.optim.SGD(model.parameters(), lr = 0.05, momentum=0.9)\n",
"# object that decreases learning rate by half every 20 epochs\n",
"scheduler = StepLR(optimizer, step_size=20, gamma=0.5)\n",
"# convert the training and validation data to tensors\n",
"x_train = torch.tensor(train_data_x.transpose().astype('float32'))\n",
"y_train = torch.tensor(train_data_y.astype('int64'))\n",
"x_val = torch.tensor(val_data_x.transpose().astype('float32'))\n",
"y_val = torch.tensor(val_data_y.astype('int64'))\n",
"\n",
"# load the data into a class that creates the batches\n",
"data_loader = DataLoader(TensorDataset(x_train,y_train), batch_size=100, shuffle=True, worker_init_fn=np.random.seed(1))\n",
"\n",
"# Initialize model weights\n",
"model.apply(weights_init)\n",
"\n",
"# loop over the dataset n_epoch times\n",
"n_epoch = 100\n",
"# store the loss and the % error at each epoch\n",
"losses_train = np.zeros((n_epoch))\n",
"errors_train = np.zeros((n_epoch))\n",
"losses_val = np.zeros((n_epoch))\n",
"errors_val = np.zeros((n_epoch))\n",
"\n",
"for epoch in range(n_epoch):\n",
"    # loop over batches\n",
"    for i, data in enumerate(data_loader):\n",
"        # retrieve inputs and labels for this batch\n",
"        x_batch, y_batch = data\n",
"        # zero the parameter gradients\n",
"        optimizer.zero_grad()\n",
"        # forward pass -- calculate model output\n",
"        pred = model(x_batch)\n",
"        # compute the loss\n",
"        loss = loss_function(pred, y_batch)\n",
"        # backward pass\n",
"        loss.backward()\n",
"        # SGD update\n",
"        optimizer.step()\n",
"\n",
"    # Run whole dataset to get statistics -- normally wouldn't do this\n",
"    pred_train = model(x_train)\n",
"    pred_val = model(x_val)\n",
"    _, predicted_train_class = torch.max(pred_train.data, 1)\n",
"    _, predicted_val_class = torch.max(pred_val.data, 1)\n",
"    errors_train[epoch] = 100 - 100 * (predicted_train_class == y_train).float().sum() / len(y_train)\n",
"    errors_val[epoch] = 100 - 100 * (predicted_val_class == y_val).float().sum() / len(y_val)\n",
"    losses_train[epoch] = loss_function(pred_train, y_train).item()\n",
"    losses_val[epoch] = loss_function(pred_val, y_val).item()\n",
"    print(f'Epoch {epoch:5d}, train loss {losses_train[epoch]:.6f}, train error {errors_train[epoch]:3.2f}, val loss {losses_val[epoch]:.6f}, val error {errors_val[epoch]:3.2f}')\n",
"\n",
"    # tell scheduler to consider updating learning rate\n",
"    scheduler.step()"
],
"metadata": {
"id": "NYw8I_3mmX5c"
},
"execution_count": null,
"outputs": []
},
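The StepLR schedule above halves the learning rate every 20 epochs, so the rate in force at a given epoch is 0.05 · 0.5^⌊epoch/20⌋. A minimal sketch of that piecewise-constant schedule in pure Python (no optimizer needed):

```python
def step_lr(epoch, base_lr=0.05, step_size=20, gamma=0.5):
    # Learning rate in force at a given epoch under StepLR,
    # assuming scheduler.step() is called once per completed epoch
    return base_lr * gamma ** (epoch // step_size)

for epoch in [0, 19, 20, 40, 99]:
    print(epoch, step_lr(epoch))
```

Over the 100 epochs of training, the rate therefore falls from 0.05 down to 0.05 · 0.5⁴ ≈ 0.003, which is why the error curves typically flatten in discrete stages.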
{
"cell_type": "code",
"source": [
"# Plot the results\n",
"fig, ax = plt.subplots()\n",
"ax.plot(errors_train,'r-',label='train')\n",
"ax.plot(errors_val,'b-',label='val')\n",
"ax.set_ylim(0,100); ax.set_xlim(0,n_epoch)\n",
"ax.set_xlabel('Epoch'); ax.set_ylabel('Error')\n",
"ax.set_title('Train Error %3.2f, Val Error %3.2f'%(errors_train[-1],errors_val[-1]))\n",
"ax.legend()\n",
"plt.show()"
],
"metadata": {
"id": "CcP_VyEmE2sv"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The primary motivation of residual networks is to allow training of much deeper networks.\n",
"\n",
"TODO: Try running this network with and without the residual connections. Does adding the residual connections change the performance?"
],
"metadata": {
"id": "wMmqhmxuAx0M"
}
}
]
}
@@ -7,7 +7,7 @@ To be published by MIT Press Dec 5th 2023.<br>
 <h2> Download draft PDF </h2>
-<a href="https://github.com/udlbook/udlbook/releases/download/v1.1.1/UnderstandingDeepLearning_26_07_23_C.pdf">Draft PDF Chapters 1-21</a><br> 2023-07-26. CC-BY-NC-ND license
+<a href="https://github.com/udlbook/udlbook/releases/download/v1.1.2/UnderstandingDeepLearning_06_08_23_C.pdf">Draft PDF Chapters 1-21</a><br> 2023-08-06. CC-BY-NC-ND license
 <br>
 <img src="https://img.shields.io/github/downloads/udlbook/udlbook/total" alt="download stats shield">
 <br>
@@ -74,7 +74,7 @@ To be published by MIT Press Dec 5th 2023.<br>
 <li> Appendices - <a href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLAppendixPDF.zip">PDF Figures</a> / <a href="https://drive.google.com/uc?export=download&id=1k2j7hMN40ISPSg9skFYWFL3oZT7r8v-l"> SVG Figures</a> / <a href="https://docs.google.com/presentation/d/1_2cJHRnsoQQHst0rwZssv-XH4o5SEHks/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">Powerpoint Figures</a>
 </ul>
-Instructions for editing figures / equations can be found <a href="https://drive.google.com/uc?export=download&id=1T_MXXVR4AfyMnlEFI-UVDh--FXI5deAp/">here</a>.</p>
+Instructions for editing figures / equations can be found <a href="https://drive.google.com/file/d/1T_MXXVR4AfyMnlEFI-UVDh--FXI5deAp/view?usp=sharing">here</a>.</p>
 <h2>Resources for students</h2>