Compare commits


24 Commits

Author SHA1 Message Date
udlbook 50d1c5e255 Created using Colaboratory 2023-10-12 18:42:59 +01:00
udlbook 9a3517629a Created using Colaboratory 2023-10-12 18:37:05 +01:00
udlbook af77c76435 Created using Colaboratory 2023-10-12 18:07:26 +01:00
udlbook 0515940ace Created using Colaboratory 2023-10-12 17:39:48 +01:00
udlbook 5d578df07b Created using Colaboratory 2023-10-12 17:25:40 +01:00
udlbook d174c9f34c Created using Colaboratory 2023-10-12 15:15:13 +01:00
udlbook dbb3d4b666 Created using Colaboratory 2023-10-11 18:22:52 +01:00
udlbook 39ad6413ce Created using Colaboratory 2023-10-11 18:21:04 +01:00
udlbook 3e51d89714 Created using Colaboratory 2023-10-10 12:01:50 +01:00
udlbook 5680e5d7f7 Created using Colaboratory 2023-10-10 11:52:48 +01:00
udlbook 0a5a97f55d Created using Colaboratory 2023-10-10 11:51:31 +01:00
udlbook 0dc94ead03 Created using Colaboratory 2023-10-09 16:55:44 +01:00
udlbook c7e7e731b3 Created using Colaboratory 2023-10-06 16:57:48 +01:00
udlbook 06f197c787 Created using Colaboratory 2023-10-06 15:31:42 +01:00
udlbook ff29fc34e8 Created using Colaboratory 2023-10-05 09:19:41 +01:00
udlbook 2653116c47 Created using Colaboratory 2023-10-04 18:57:52 +01:00
udlbook f50de74496 Created using Colaboratory 2023-10-04 17:03:39 +01:00
udlbook aa5d89adf3 Created using Colaboratory 2023-10-04 12:46:08 +01:00
udlbook 18e827842c Created using Colaboratory 2023-10-04 08:32:27 +01:00
udlbook 22b6b18660 Created using Colaboratory 2023-10-03 18:52:37 +01:00
udlbook ed060b6b08 Created using Colaboratory 2023-10-03 17:22:46 +01:00
udlbook df0132505b Created using Colaboratory 2023-10-03 08:58:32 +01:00
udlbook 67fb0f5990 Update index.html 2023-10-01 18:22:32 +01:00
udlbook ecd01f2992 Delete Notesbooks/Chap11 directory 2023-10-01 18:19:38 +01:00
14 changed files with 2064 additions and 323 deletions

View File

@@ -332,9 +332,7 @@
"2. What is $\\mbox{exp}[1]$?\n",
"3. What is $\\mbox{exp}[-\\infty]$?\n",
"4. What is $\\mbox{exp}[+\\infty]$?\n",
"5. A function is convex if we can draw a straight line between any two points on the\n",
"function, and this line always lies above the function. Similarly, a function is concave\n",
"if a straight line between any two points always lies below the function. Is the exponential function convex or concave or neither?\n"
"5. A function is convex if we can draw a straight line between any two points on the function, and this line always lies above the function. Similarly, a function is concave if a straight line between any two points always lies below the function. Is the exponential function convex or concave or neither?\n"
]
},
{
@@ -343,7 +341,7 @@
"id": "R6A4e5IxIWCu"
},
"source": [
"Now let's consider the logarithm function $y=\\log[x]$. Throughout the book we always use natural (base $e$) logarithms. The log funcction maps non-negative numbers $[0,\\infty]$ to real numbers $[-\\infty,\\infty]$. It is the inverse of the exponential function. So when we compute $\\log[x]$ we are really asking \"What is the number $y$ so that $e^y=x$?\""
"Now let's consider the logarithm function $y=\\log[x]$. Throughout the book we always use natural (base $e$) logarithms. The log function maps non-negative numbers $[0,\\infty]$ to real numbers $[-\\infty,\\infty]$. It is the inverse of the exponential function. So when we compute $\\log[x]$ we are really asking \"What is the number $y$ so that $e^y=x$?\""
]
},
{
@@ -384,15 +382,6 @@
"6. What is $\\mbox{log}[-1]$?\n",
"7. Is the logarithm function concave or convex?\n"
]
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "XG0CKLiPJI7I"
},
"execution_count": null,
"outputs": []
}
],
"metadata": {

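The convexity/concavity questions in the exercise above can be checked numerically. A minimal sketch using the midpoint test (my own illustration, not part of the notebook):

```python
import numpy as np

# Midpoint test: f is convex on sampled points if the value at the midpoint
# lies below the chord midpoint, i.e. f((a+b)/2) <= (f(a)+f(b))/2
def midpoint_test(f, a, b):
    return f(0.5 * (a + b)) <= 0.5 * (f(a) + f(b))

pts = np.linspace(-5, 5, 21)
print(all(midpoint_test(np.exp, a, b) for a in pts for b in pts))  # True: exp is convex

pts = np.linspace(0.1, 5, 21)
# For a concave function the inequality reverses
print(all(np.log(0.5 * (a + b)) >= 0.5 * (np.log(a) + np.log(b))
          for a in pts for b in pts))  # True: log is concave
```

This sampling argument is only evidence, not a proof; the exercise asks you to reason about it directly.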
File diff suppressed because one or more lines are too long

View File

@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyPFqKOqd6BjlymOawCRkmfn",
"authorship_tag": "ABX9TyPD+qTkgmZCe+VessXM/kIU",
"include_colab_link": true
},
"kernelspec": {
@@ -238,7 +238,7 @@
"def shallow_2_2_3(x1,x2, activation_fn, phi_10,phi_11,phi_12,phi_13, phi_20,phi_21,phi_22,phi_23, theta_10, theta_11,\\\n",
" theta_12, theta_20, theta_21, theta_22, theta_30, theta_31, theta_32):\n",
"\n",
" # TODO -- write this function -- replace the dummy code blow\n",
" # TODO -- write this function -- replace the dummy code below\n",
" pre_1 = np.zeros_like(x1)\n",
" pre_2 = np.zeros_like(x1)\n",
" pre_3 = np.zeros_like(x1)\n",

View File

@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMhLSGU8+odPS/CoW5PwKna",
"authorship_tag": "ABX9TyMdflMfWi9hu9ZEg/80HCd8",
"include_colab_link": true
},
"kernelspec": {
@@ -62,7 +62,7 @@
"source": [
"The number of regions $N$ created by a shallow neural network with $D_i$ inputs and $D$ hidden units is given by Zaslavsky's formula:\n",
"\n",
"\\begin{equation}N = \\sum_{j=1}^{D_{i}}\\binom{D}{j}=\\sum_{j=1}^{D_{i}} \\frac{D!}{(D-j)!j!} \\end{equation} <br>\n",
"\\begin{equation}N = \\sum_{j=0}^{D_{i}}\\binom{D}{j}=\\sum_{j=0}^{D_{i}} \\frac{D!}{(D-j)!j!} \\end{equation} <br>\n",
"\n"
],
"metadata": {
@@ -115,7 +115,7 @@
{
"cell_type": "markdown",
"source": [
"This works but there is a complication. If the number of hidden units $D$ is fewer than the number of hidden dimensions $D_i$ , the formula will fail. When this is the case, there are just $2^D$ regions (see figure 3.10 to understand why).\n",
"This works but there is a complication. If the number of hidden units $D$ is fewer than the number of input dimensions $D_i$ , the formula will fail. When this is the case, there are just $2^D$ regions (see figure 3.10 to understand why).\n",
"\n",
"Let's demonstrate this:"
],
@@ -142,7 +142,7 @@
{
"cell_type": "code",
"source": [
"# Let's do the calculation properly when D<Di\n",
"# Let's do the calculation properly when D<Di (see figure 3.10 from the book)\n",
"D = 8; Di = 10\n",
"N = np.power(2,D)\n",
"# We can equivalently do this by calling number_regions with the D twice\n",
@@ -210,7 +210,7 @@
"source": [
"# Now let's test the code\n",
"N = number_parameters(10, 8)\n",
"print(f\"Di=10, D=8, Number of parameters = {int(N)}, True value = 90\")"
"print(f\"Di=10, D=8, Number of parameters = {int(N)}, True value = 97\")"
],
"metadata": {
"id": "VbhDmZ1gwkQj"
@@ -233,7 +233,7 @@
" for c_hidden in range(1, 200):\n",
" # Iterate over different ranges of number hidden variables for different input sizes\n",
" D = int(c_hidden * 500 / D_i)\n",
" params[c_dim, c_hidden] = D_i * D +1 + D +1\n",
" params[c_dim, c_hidden] = D_i * D +D + D +1\n",
" regions[c_dim, c_hidden] = number_regions(np.min([D_i,D]), D)\n",
"\n",
"fig, ax = plt.subplots()\n",

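The corrected Zaslavsky formula in the hunk above (sum starting at j=0) is easy to check in code; a minimal sketch:

```python
from math import comb

# Zaslavsky's formula: maximum number of linear regions made by a shallow
# network with Di inputs and D hidden units (sum runs from j=0 to Di)
def number_regions(Di, D):
    return sum(comb(D, j) for j in range(Di + 1))

print(number_regions(2, 3))  # 1 + 3 + 3 = 7 regions
# With Di = D the sum covers all binomial coefficients and gives 2**D,
# matching the special case the notebook describes for D < Di
print(number_regions(8, 8) == 2**8)  # True
```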
View File

@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOu5BvK3aFb7ZEQKG5vfOZ1",
"authorship_tag": "ABX9TyPmra+JD+dm2M3gCqx3bMak",
"include_colab_link": true
},
"kernelspec": {
@@ -185,7 +185,7 @@
"The ReLU isn't the only kind of activation function. For a long time, people used sigmoid functions. A logistic sigmoid function is defined by the equation\n",
"\n",
"\\begin{equation}\n",
"f[h] = \\frac{1}{1+\\exp{[-10 z ]}}\n",
"f[z] = \\frac{1}{1+\\exp{[-10 z ]}}\n",
"\\end{equation}\n",
"\n",
"(Note that the factor of 10 is not standard -- but it allows us to plot on the same axes as the ReLU examples)"

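The scaled logistic sigmoid in the hunk above can be sketched directly (the factor of 10 is the notebook's own non-standard scaling):

```python
import numpy as np

# f[z] = 1 / (1 + exp(-10 z)) -- the notebook's scaled logistic sigmoid
def scaled_sigmoid(z):
    return 1.0 / (1.0 + np.exp(-10.0 * z))

print(scaled_sigmoid(0.0))  # 0.5 at z = 0
print(scaled_sigmoid(1.0))  # very close to 1: the factor of 10 makes it saturate quickly
```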
View File

@@ -0,0 +1,314 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyNXqwmC4yEc1mGv9/74b0jY",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap13/13_3_Neighborhood_Sampling.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 13.3: Neighborhood sampling**\n",
"\n",
"This notebook investigates neighborhood sampling of graphs as in figure 13.10 from the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import networkx as nx"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's construct the graph from figure 13.10, which has 23 nodes."
],
"metadata": {
"id": "UNleESc7k5uB"
}
},
{
"cell_type": "code",
"source": [
"# Define adjacency matrix\n",
"A = np.array([[0,1,1,1,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [1,0,1,0,0, 0,0,0,1,1, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [1,1,0,1,0, 0,0,0,0,1, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [1,0,1,0,1, 0,1,1,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [0,0,0,1,0, 1,0,1,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [0,0,0,0,1, 0,0,1,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [0,0,0,1,0, 0,0,1,0,1, 1,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [0,0,0,1,1, 1,1,0,0,0, 1,0,0,1,0, 0,0,0,0,0, 0,0,0],\n",
" [0,1,0,0,0, 0,0,0,0,1, 0,0,0,0,0, 0,0,0,0,0, 0,0,0],\n",
" [0,1,1,0,0, 0,1,0,1,0, 0,1,1,0,0, 0,1,0,0,0, 0,0,0],\n",
" [0,0,0,0,0, 0,1,1,0,0, 0,0,1,0,0, 0,0,0,0,0, 0,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,1, 0,0,0,0,1, 1,1,0,0,0, 0,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,1, 1,0,0,1,0, 0,1,1,0,0, 0,0,0],\n",
" [0,0,0,0,0, 0,0,1,0,0, 0,0,1,0,0, 0,0,1,1,0, 0,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,1,0,0,0, 1,0,0,0,1, 0,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,1,0,0,1, 0,1,0,0,1, 0,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,1, 0,1,1,0,0, 1,0,1,0,1, 0,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,0,1,1,0, 0,1,0,1,0, 1,1,1],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,0,0,1,0, 0,0,1,0,0, 0,0,1],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,1, 1,1,0,0,0, 1,0,0],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,1,0,1, 0,1,0],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,1,0,0, 1,0,1],\n",
" [0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,1,1,0, 0,1,0]]);\n",
"print(A)"
],
"metadata": {
"id": "fHgH5hdG_W1h"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Routine to draw graph structure, highlighting original node (brown in fig 13.10)\n",
"# and neighborhood nodes (orange in figure 13.10)\n",
"def draw_graph_structure(adjacency_matrix, original_node, neighborhood_nodes=None):\n",
"\n",
" G = nx.Graph()\n",
" n_node = adjacency_matrix.shape[0]\n",
" for i in range(n_node):\n",
" for j in range(i):\n",
" if adjacency_matrix[i,j]:\n",
" G.add_edge(i,j)\n",
"\n",
" color_map = []\n",
"\n",
" for node in G:\n",
" if original_node[node]:\n",
" color_map.append('brown')\n",
" else:\n",
" if neighborhood_nodes[node]:\n",
" color_map.append('orange')\n",
" else:\n",
" color_map.append('white')\n",
"\n",
" nx.draw(G, nx.spring_layout(G, seed = 7), with_labels=True,node_color=color_map)\n",
" plt.show()"
],
"metadata": {
"id": "TIrihEw-7DRV"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"n_nodes = A.shape[0]\n",
"\n",
"# Define a single output layer node\n",
"output_layer_nodes=np.zeros((n_nodes,1)); output_layer_nodes[16]=1\n",
"# Define the neighboring nodes to draw (none)\n",
"neighbor_nodes = np.zeros((n_nodes,1))\n",
"print(\"Output layer:\")\n",
"draw_graph_structure(A, output_layer_nodes, neighbor_nodes)"
],
"metadata": {
"id": "gKBD5JsPfrkA"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's imagine that we want to form a batch for a node labelling task that consists of just node 16 in the output layer (highlighted). The network consists of the input, hidden layer 1, hidden layer 2, and the output layer."
],
"metadata": {
"id": "JaH3g_-O-0no"
}
},
{
"cell_type": "code",
"source": [
"# TODO Find the nodes in hidden layer 2 that connect to node 16 in the output layer\n",
"# using the adjacency matrix\n",
"# Replace this line:\n",
"hidden_layer2_nodes = np.zeros((n_nodes,1));\n",
"\n",
"print(\"Hidden layer 2:\")\n",
"draw_graph_structure(A, output_layer_nodes, hidden_layer2_nodes)"
],
"metadata": {
"id": "9oSiuP3B3HNS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO - Find the nodes in hidden layer 1 that connect to the nodes in hidden layer 2\n",
"# using the adjacency matrix\n",
"# Replace this line:\n",
"hidden_layer1_nodes = np.zeros((n_nodes,1));\n",
"\n",
"print(\"Hidden layer 1:\")\n",
"draw_graph_structure(A, output_layer_nodes, hidden_layer1_nodes)"
],
"metadata": {
"id": "zZFxw3m1_wWr"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO Find the nodes in the input layer that connect to the nodes in hidden layer 1\n",
"# using the adjacency matrix\n",
"# Replace this line:\n",
"input_layer_nodes = np.zeros((n_nodes,1));\n",
"\n",
"print(\"Input layer:\")\n",
"draw_graph_structure(A, output_layer_nodes, input_layer_nodes)"
],
"metadata": {
"id": "EL3N8BXyCu0F"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"This is bad news. The graph is fairly sparsely connected (i.e., the adjacency matrix is mostly zeros) and there are only two hidden layers. Nonetheless, we have to involve almost all the nodes in the graph to compute the loss at this output.\n",
"\n",
"To resolve this problem, we'll use neighborhood sampling. We'll start again with a single node in the output layer."
],
"metadata": {
"id": "CE0WqytvC7zr"
}
},
{
"cell_type": "code",
"source": [
"print(\"Output layer:\")\n",
"draw_graph_structure(A, output_layer_nodes, neighbor_nodes)"
],
"metadata": {
"id": "59WNys3KC5y6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define number of neighbors to sample\n",
"n_sample = 3"
],
"metadata": {
"id": "uCoJwpcTNFdI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO Find the nodes in hidden layer 2 that connect to node 16 in the output layer\n",
"# using the adjacency matrix. Then sample n_sample of these nodes randomly without\n",
"# replacement.\n",
"\n",
"# Replace this line:\n",
"hidden_layer2_nodes = np.zeros((n_nodes,1));\n",
"\n",
"draw_graph_structure(A, output_layer_nodes, hidden_layer2_nodes)"
],
"metadata": {
"id": "_WEop6lYGNhJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO Find the nodes in hidden layer 1 that connect to the nodes in hidden layer 2\n",
"# using the adjacency matrix. Then sample n_sample of these nodes randomly without\n",
"# replacement. Make sure not to sample nodes that were already included in hidden layer 2 or the output layer.\n",
"# The nodes at hidden layer 1 are the union of these nodes and the nodes in hidden layer 2\n",
"\n",
"# Replace this line:\n",
"hidden_layer1_nodes = np.zeros((n_nodes,1));\n",
"\n",
"draw_graph_structure(A, output_layer_nodes, hidden_layer1_nodes)\n"
],
"metadata": {
"id": "k90qW_LDLpNk"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO Find the nodes in the input layer that connect to the nodes in hidden layer 1\n",
"# using the adjacency matrix. Then sample n_sample of these nodes randomly without\n",
"# replacement. Make sure not to sample nodes that were already included in hidden layer 1,2, or the output layer.\n",
"# The nodes at the input layer are the union of these nodes and the nodes in hidden layers 1 and 2\n",
"\n",
"# Replace this line:\n",
"input_layer_nodes = np.zeros((n_nodes,1));\n",
"\n",
"draw_graph_structure(A, output_layer_nodes, input_layer_nodes)"
],
"metadata": {
"id": "NDEYUty_O3Zr"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"If you did this correctly, there should be 9 orange nodes in the figure. The \"receptive field\" of node 16 in the output layer increases much more slowly as we move back through the layers of the network."
],
"metadata": {
"id": "vu4eJURmVkc5"
}
}
]
}
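The layer-by-layer expansion that the TODO cells above ask for can be sketched on a toy graph (a hypothetical 5-node chain, not the notebook's 23-node graph):

```python
import numpy as np

# Hypothetical 5-node chain graph, for illustration only
A = np.array([[0,1,0,0,0],
              [1,0,1,0,0],
              [0,1,0,1,0],
              [0,0,1,0,1],
              [0,0,0,1,0]])

# Start from a single output node; at each earlier layer a node is needed
# if it was already active or neighbors an active node (A @ active)
active = np.zeros(5)
active[0] = 1
for layer in ["hidden 2", "hidden 1", "input"]:
    active = np.clip(active + A @ active, 0, 1)
    print(layer, np.flatnonzero(active))
# The active set grows one hop per layer: [0 1], then [0 1 2], then [0 1 2 3]
```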

View File

@@ -0,0 +1,213 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOdSkjfQnSZXnffGsZVM7r5",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap13/13_4_Graph_Attention_Networks.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 13.4: Graph attention networks**\n",
"\n",
"This notebook builds a graph attention mechanism from scratch, as discussed in section 13.8.6 of the book and illustrated in figure 13.12c.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The self-attention mechanism maps $N$ inputs $\\mathbf{x}_{n}\\in\\mathbb{R}^{D}$ to $N$ outputs $\\mathbf{x}'_{n}\\in \\mathbb{R}^{D}$.\n",
"\n"
],
"metadata": {
"id": "9OJkkoNqCVK2"
}
},
{
"cell_type": "code",
"source": [
"# Set seed so we get the same random numbers\n",
"np.random.seed(1)\n",
"# Number of nodes in the graph\n",
"N = 8\n",
"# Number of dimensions of each input\n",
"D = 4\n",
"\n",
"# Define a graph\n",
"A = np.array([[0,1,0,1,0,0,0,0],\n",
" [1,0,1,1,1,0,0,0],\n",
" [0,1,0,0,1,0,0,0],\n",
" [1,1,0,0,1,0,0,0],\n",
" [0,1,1,1,0,1,0,1],\n",
" [0,0,0,0,1,0,1,1],\n",
" [0,0,0,0,0,1,0,0],\n",
" [0,0,0,0,1,1,0,0]]);\n",
"print(A)\n",
"\n",
"# Let's also define some random data\n",
"X = np.random.normal(size=(D,N))"
],
"metadata": {
"id": "oAygJwLiCSri"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We'll also need the weights and biases for the keys, queries, and values (equations 12.2 and 12.4)"
],
"metadata": {
"id": "W2iHFbtKMaDp"
}
},
{
"cell_type": "code",
"source": [
"# Choose random values for the parameters\n",
"omega = np.random.normal(size=(D,D))\n",
"beta = np.random.normal(size=(D,1))\n",
"phi = np.random.normal(size=(1,2*D))"
],
"metadata": {
"id": "79TSK7oLMobe"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We'll need a softmax operation that operates on the columns of the matrix and a ReLU function as well"
],
"metadata": {
"id": "iYPf6c4MhCgq"
}
},
{
"cell_type": "code",
"source": [
"# Define softmax operation that works independently on each column\n",
"def softmax_cols(data_in):\n",
" # Exponentiate all of the values\n",
" exp_values = np.exp(data_in) ;\n",
" # Sum over columns\n",
" denom = np.sum(exp_values, axis = 0);\n",
" # Replicate denominator to N rows\n",
" denom = np.matmul(np.ones((data_in.shape[0],1)), denom[np.newaxis,:])\n",
" # Compute softmax\n",
" softmax = exp_values / denom\n",
" # return the answer\n",
" return softmax\n",
"\n",
"\n",
"# Define the Rectified Linear Unit (ReLU) function\n",
"def ReLU(preactivation):\n",
" activation = preactivation.clip(0.0)\n",
" return activation\n"
],
"metadata": {
"id": "obaQBdUAMXXv"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Now let's compute self-attention in matrix form\n",
"def graph_attention(X,omega, beta, phi, A):\n",
"\n",
" # TODO -- Write this function (see figure 13.12c)\n",
" # 1. Compute X_prime\n",
" # 2. Compute S\n",
" # 3. To apply the mask, set S to a very large negative number (e.g. -1e20) everywhere where A+I is zero\n",
" # 4. Run the softmax function to compute the attention values\n",
" # 5. Postmultiply X' by the attention values\n",
" # 6. Apply the ReLU function\n",
" # Replace this line:\n",
" output = np.ones_like(X) ;\n",
"\n",
" return output;"
],
"metadata": {
"id": "gb2WvQ3SiH8r"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Test out the graph attention mechanism\n",
"np.set_printoptions(precision=3)\n",
"output = graph_attention(X, omega, beta, phi, A);\n",
"print(\"Correct answer is:\")\n",
"print(\"[[1.796 1.346 0.569 1.703 1.298 1.224 1.24 1.234]\")\n",
"print(\" [0.768 0.672 0. 0.529 3.841 4.749 5.376 4.761]\")\n",
"print(\" [0.305 0.129 0. 0.341 0.785 1.014 1.113 1.024]\")\n",
"print(\" [0.    0.    0.    0.    0.35  0.864 1.098 0.871]]\")\n",
"\n",
"\n",
"print(\"Your answer is:\")\n",
"print(output)"
],
"metadata": {
"id": "d4p6HyHXmDh5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO -- Try to construct a dot-product self-attention mechanism as in practical 12.1 that respects the geometry of the graph and has zero attention between non-neighboring nodes by combining figures 13.12a and 13.12b.\n"
],
"metadata": {
"id": "QDEkIrcgrql-"
}
}
]
}
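Step 3 of the TODO above (restricting attention to the graph structure) is the part that differs from ordinary self-attention, so here is that step in isolation, applied to random stand-in scores rather than the notebook's real computation:

```python
import numpy as np

np.random.seed(0)
N = 4
# Toy adjacency matrix (hypothetical, for illustration only)
A = np.array([[0,1,0,0],
              [1,0,1,0],
              [0,1,0,1],
              [0,0,1,0]])
S = np.random.normal(size=(N, N))  # stand-in attention scores

# A node may only attend to itself and its neighbors (A + I), so set every
# other score to a very large negative number before the softmax
S_masked = np.where(A + np.eye(N) > 0, S, -1e20)

# Column-wise softmax: the masked entries become (numerically) zero
attn = np.exp(S_masked - S_masked.max(axis=0))
attn = attn / attn.sum(axis=0)
print(np.round(attn, 3))  # columns sum to one; zeros wherever A + I is zero
```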

View File

@@ -0,0 +1,419 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyM0StKV3FIZ3MZqfflqC0Rv",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap15/15_1_GAN_Toy_Example.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 15.1: GAN Toy example**\n",
"\n",
"This notebook investigates the GAN toy example as illustrated in figure 15.1 in the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Get a batch of real data. Our goal is to make data that looks like this.\n",
"def get_real_data_batch(n_sample):\n",
" np.random.seed(0)\n",
" x_true = np.random.normal(size=(1,n_sample)) + 7.5\n",
" return x_true"
],
"metadata": {
"id": "y_OkVWmam4Qx"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Define our generator. This takes a standard normally-distributed latent variable $z$ and adds a scalar $\\theta$ to this, where $\\theta$ is the single parameter of this generative model according to:\n",
"\n",
"\\begin{equation}\n",
"x_i = z_i + \\theta.\n",
"\\end{equation}\n",
"\n",
"Obviously this model can generate the family of Gaussian distributions with unit variance, but different means."
],
"metadata": {
"id": "RFpL0uCXoTpV"
}
},
{
"cell_type": "code",
"source": [
"# This is our generator -- takes the single parameter theta\n",
"# of the generative model and generates n samples\n",
"def generator(z, theta):\n",
" x_gen = z + theta\n",
" return x_gen"
],
"metadata": {
"id": "OtLQvf3Enfyw"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now, we define our discriminator. This is a simple logistic regression model (a 1D linear model passed through a sigmoid) that returns the probability that the data is real."
],
"metadata": {
"id": "Xrzd8aehYAYR"
}
},
{
"cell_type": "code",
"source": [
"# Define our discriminative model\n",
"\n",
"# Logistic sigmoid, maps from [-infty,infty] to [0,1]\n",
"def sig(data_in):\n",
" return 1.0 / (1.0+np.exp(-data_in))\n",
"\n",
"# Discriminator computes y\n",
"def discriminator(x, phi0, phi1):\n",
" return sig(phi0 + phi1 * x)"
],
"metadata": {
"id": "vHBgAFZMsnaC"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Draws a figure like Figure 15.1a\n",
"def draw_data_model(x_real, x_syn, phi0=None, phi1=None):\n",
" fig, ax = plt.subplots();\n",
"\n",
" for x in x_syn:\n",
" ax.plot([x,x],[0,0.33],color='#f47a60')\n",
" for x in x_real:\n",
" ax.plot([x,x],[0,0.33],color='#7fe7dc')\n",
"\n",
" if phi0 is not None:\n",
" x_model = np.arange(0,10,0.01)\n",
" y_model = discriminator(x_model, phi0, phi1)\n",
" ax.plot(x_model, y_model,color='#dddddd')\n",
" ax.set_xlim([0,10])\n",
" ax.set_ylim([0,1])\n",
"\n",
"\n",
" plt.show()"
],
"metadata": {
"id": "V1FiDBhepcQJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Get data batch\n",
"x_real = get_real_data_batch(10)\n",
"\n",
"# Initialize generator and synthesize a batch of examples\n",
"theta = 3.0\n",
"np.random.seed(1)\n",
"z = np.random.normal(size=(1,10))\n",
"x_syn = generator(z, theta)\n",
"\n",
"# Initialize discriminator model\n",
"phi0 = -2\n",
"phi1 = 1\n",
"\n",
"draw_data_model(x_real, x_syn, phi0, phi1)"
],
"metadata": {
"id": "U8pFb497x36n"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"You can see that the synthesized (orange) samples don't look much like the real (cyan) ones, and the initial model to discriminate them (gray line represents probability of being real) is pretty bad as well.\n",
"\n",
"Let's deal with the discriminator first. Let's define the loss"
],
"metadata": {
"id": "SNDV1G5PYhcQ"
}
},
{
"cell_type": "code",
"source": [
"# Discriminator loss\n",
"def compute_discriminator_loss(x_real, x_syn, phi0, phi1):\n",
"\n",
" # TODO -- compute the loss for the discriminator\n",
" # Run the real data and the synthetic data through the discriminator\n",
" # Then use the standard binary cross entropy loss with the y=1 for the real samples\n",
" # and y=0 for the synthesized ones.\n",
" # Replace this line\n",
" loss = 0.0\n",
"\n",
"\n",
" return loss"
],
"metadata": {
"id": "Bc3VwCabYcfg"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Test the loss\n",
"loss = compute_discriminator_loss(x_real, x_syn, phi0, phi1)\n",
"print(\"True Loss = 13.814757170851447, Your loss=\", loss )"
],
"metadata": {
"id": "MiqM3GXSbn0z"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Gradient of loss (cheating, using finite differences)\n",
"def compute_discriminator_gradient(x_real, x_syn, phi0, phi1):\n",
" delta = 0.0001;\n",
" loss1 = compute_discriminator_loss(x_real, x_syn, phi0, phi1)\n",
" loss2 = compute_discriminator_loss(x_real, x_syn, phi0+delta, phi1)\n",
" loss3 = compute_discriminator_loss(x_real, x_syn, phi0, phi1+delta)\n",
" dl_dphi0 = (loss2-loss1) / delta\n",
" dl_dphi1 = (loss3-loss1) / delta\n",
"\n",
" return dl_dphi0, dl_dphi1\n",
"\n",
"# This routine performs gradient descent with the discriminator\n",
"def update_discriminator(x_real, x_syn, n_iter, phi0, phi1):\n",
"\n",
" # Define learning rate\n",
" alpha = 0.01\n",
"\n",
" # Get derivatives\n",
" print(\"Initial discriminator loss = \", compute_discriminator_loss(x_real, x_syn, phi0, phi1))\n",
" for iter in range(n_iter):\n",
" # Get gradient\n",
" dl_dphi0, dl_dphi1 = compute_discriminator_gradient(x_real, x_syn, phi0, phi1)\n",
" # Take a gradient step downhill\n",
" phi0 = phi0 - alpha * dl_dphi0 ;\n",
" phi1 = phi1 - alpha * dl_dphi1 ;\n",
"\n",
" print(\"Final Discriminator Loss= \", compute_discriminator_loss(x_real, x_syn, phi0, phi1))\n",
"\n",
" return phi0, phi1"
],
"metadata": {
"id": "zAxUPo3p0CIW"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's update the discriminator (sigmoid curve)\n",
"n_iter = 100\n",
"print(\"Initial parameters (phi0,phi1)\", phi0, phi1)\n",
"phi0, phi1 = update_discriminator(x_real, x_syn, n_iter, phi0, phi1)\n",
"print(\"Final parameters (phi0,phi1)\", phi0, phi1)\n",
"draw_data_model(x_real, x_syn, phi0, phi1)"
],
"metadata": {
"id": "FE_DeweeAbMc"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's update the generator"
],
"metadata": {
"id": "pRv9myh0d3Xm"
}
},
{
"cell_type": "code",
"source": [
"def compute_generator_loss(z, theta, phi0, phi1):\n",
" # TODO -- Run the generator on the latent variables z with the parameters theta\n",
" # to generate new data x_syn\n",
" # Then run the discriminator on the new data to get the probability of being real\n",
" # The loss is the total negative log probability of being synthesized (i.e. of not being real)\n",
" # Replace this code\n",
" loss = 1\n",
"\n",
"\n",
" return loss"
],
"metadata": {
"id": "5uiLrFBvJFAr"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Test generator loss to check you have it correct\n",
"loss = compute_generator_loss(z, theta, -2, 1)\n",
"print(\"True Loss = 13.78437035945412, Your loss=\", loss )"
],
"metadata": {
"id": "cqnU3dGPd6NK"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def compute_generator_gradient(z, theta, phi0, phi1):\n",
" delta = 0.0001\n",
" loss1 = compute_generator_loss(z,theta, phi0, phi1) ;\n",
" loss2 = compute_generator_loss(z,theta+delta, phi0, phi1) ;\n",
" dl_dtheta = (loss2-loss1)/ delta\n",
" return dl_dtheta\n",
"\n",
"def update_generator(z, theta, n_iter, phi0, phi1):\n",
" # Define learning rate\n",
" alpha = 0.02\n",
"\n",
" # Get derivatives\n",
" print(\"Initial generator loss = \", compute_generator_loss(z, theta, phi0, phi1))\n",
" for iter in range(n_iter):\n",
" # Get gradient\n",
" dl_dtheta = compute_generator_gradient(z, theta, phi0, phi1)\n",
" # Take a gradient step (uphill, since we are trying to make synthesized data less well classified by discriminator)\n",
" theta = theta + alpha * dl_dtheta ;\n",
"\n",
" print(\"Final generator loss = \", compute_generator_loss(z, theta, phi0, phi1))\n",
" return theta\n"
],
"metadata": {
"id": "P1Lqy922dqal"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"n_iter = 10\n",
"theta = 3.0\n",
"print(\"Theta before\", theta)\n",
"theta = update_generator(z, theta, n_iter, phi0, phi1)\n",
"print(\"Theta after\", theta)\n",
"\n",
"x_syn = generator(z,theta)\n",
"draw_data_model(x_real, x_syn, phi0, phi1)"
],
"metadata": {
"id": "Q6kUkMO1P8V0"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Now let's define a full GAN loop\n",
"\n",
"# Initialize the parameters\n",
"theta = 3\n",
"phi0 = -2\n",
"phi1 = 1\n",
"\n",
"# Number of iterations for updating generator and discriminator\n",
"n_iter_discrim = 300\n",
"n_iter_gen = 3\n",
"\n",
"print(\"Initial parameters (phi0,phi1)\", phi0, phi1)\n",
"for c_gan_iter in range(5):\n",
"\n",
" # Run generator to produce synthesized data\n",
" x_syn = generator(z, theta)\n",
" draw_data_model(x_real, x_syn, phi0, phi1)\n",
"\n",
" # Update the discriminator\n",
" print(\"Updating discriminator\")\n",
" phi0, phi1 = update_discriminator(x_real, x_syn, n_iter_discrim, phi0, phi1)\n",
" draw_data_model(x_real, x_syn, phi0, phi1)\n",
"\n",
" # Update the generator\n",
" print(\"Updating generator\")\n",
" theta = update_generator(z, theta, n_iter_gen, phi0, phi1)\n"
],
"metadata": {
"id": "pcbdK2agTO-y"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"You can see that the synthesized data (orange) is moving closer to the true data (cyan). However, this training procedure is extremely unstable, as you will find if you experiment with the number of iterations for each optimization and the total number of GAN iterations overall."
],
"metadata": {
"id": "loMx0TQUgBs7"
}
}
]
}
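
The finite-difference gradient trick used by `compute_generator_gradient` above is worth isolating. Below is a minimal, self-contained sketch of the same idea on a made-up quadratic loss (the `loss` function, step size, and iteration count are illustrative stand-ins, not the notebook's GAN losses):

```python
# Hypothetical stand-in loss with its minimum at theta = 1
def loss(theta):
    return (theta - 1.0) ** 2

# Central finite differences: d loss / d theta ~ (L(t+d) - L(t-d)) / (2d)
def finite_difference_gradient(loss_fn, theta, delta=1e-4):
    return (loss_fn(theta + delta) - loss_fn(theta - delta)) / (2 * delta)

theta = 3.0
alpha = 0.1
for _ in range(100):
    dl_dtheta = finite_difference_gradient(loss, theta)
    theta = theta - alpha * dl_dtheta   # downhill here; the generator above steps uphill

print(theta)  # approaches 1.0
```

The notebook's version uses a one-sided difference, `(loss2-loss1)/delta`; central differences are slightly more accurate for the same `delta`.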


@@ -0,0 +1,246 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyNyLnpoXgKN+RGCuTUszCAZ",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap15/15_2_Wasserstein_Distance.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 15.2: Wasserstein Distance**\n",
"\n",
"This notebook investigates the Wasserstein distance between discrete distributions, as illustrated in figure 15.8 in the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib import cm\n",
"from matplotlib.colors import ListedColormap\n",
"from scipy.optimize import linprog"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define two probability distributions\n",
"p = np.array([5, 3, 2, 1, 8, 7, 5, 9, 2, 1])\n",
"q = np.array([4, 10, 1, 1, 4, 6, 3, 2, 0, 1])\n",
"p = p / np.sum(p)\n",
"q = q / np.sum(q)\n",
"\n",
"# Draw those distributions\n",
"fig, ax =plt.subplots(2,1);\n",
"x = np.arange(0,p.size,1)\n",
"ax[0].bar(x,p, color=\"#cccccc\")\n",
"ax[0].set_ylim([0,0.35])\n",
"ax[0].set_ylabel(\"p(x=i)\")\n",
"\n",
"ax[1].bar(x,q,color=\"#f47a60\")\n",
"ax[1].set_ylim([0,0.35])\n",
"ax[1].set_ylabel(\"q(x=j)\")\n",
"plt.show()"
],
"metadata": {
"id": "ZIfQwhd-AV6L"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO Define the distance matrix from figure 15.8d\n",
"# Replace this line\n",
"dist_mat = np.zeros((10,10))\n",
"\n",
"# vectorize the distance matrix\n",
"c = dist_mat.flatten()"
],
"metadata": {
"id": "EZSlZQzWBKTm"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define pretty colormap\n",
"my_colormap_vals_hex =('2a0902', '2b0a03', '2c0b04', '2d0c05', '2e0c06', '2f0d07', '300d08', '310e09', '320f0a', '330f0b', '34100b', '35110c', '36110d', '37120e', '38120f', '39130f', '3a1410', '3b1411', '3c1511', '3d1612', '3e1613', '3f1713', '401714', '411814', '421915', '431915', '451a16', '461b16', '471b17', '481c17', '491d18', '4a1d18', '4b1e19', '4c1f19', '4d1f1a', '4e201b', '50211b', '51211c', '52221c', '53231d', '54231d', '55241e', '56251e', '57261f', '58261f', '592720', '5b2821', '5c2821', '5d2922', '5e2a22', '5f2b23', '602b23', '612c24', '622d25', '632e25', '652e26', '662f26', '673027', '683027', '693128', '6a3229', '6b3329', '6c342a', '6d342a', '6f352b', '70362c', '71372c', '72372d', '73382e', '74392e', '753a2f', '763a2f', '773b30', '783c31', '7a3d31', '7b3e32', '7c3e33', '7d3f33', '7e4034', '7f4134', '804235', '814236', '824336', '834437', '854538', '864638', '874739', '88473a', '89483a', '8a493b', '8b4a3c', '8c4b3c', '8d4c3d', '8e4c3e', '8f4d3f', '904e3f', '924f40', '935041', '945141', '955242', '965343', '975343', '985444', '995545', '9a5646', '9b5746', '9c5847', '9d5948', '9e5a49', '9f5a49', 'a05b4a', 'a15c4b', 'a35d4b', 'a45e4c', 'a55f4d', 'a6604e', 'a7614e', 'a8624f', 'a96350', 'aa6451', 'ab6552', 'ac6552', 'ad6653', 'ae6754', 'af6855', 'b06955', 'b16a56', 'b26b57', 'b36c58', 'b46d59', 'b56e59', 'b66f5a', 'b7705b', 'b8715c', 'b9725d', 'ba735d', 'bb745e', 'bc755f', 'bd7660', 'be7761', 'bf7862', 'c07962', 'c17a63', 'c27b64', 'c27c65', 'c37d66', 'c47e67', 'c57f68', 'c68068', 'c78169', 'c8826a', 'c9836b', 'ca846c', 'cb856d', 'cc866e', 'cd876f', 'ce886f', 'ce8970', 'cf8a71', 'd08b72', 'd18c73', 'd28d74', 'd38e75', 'd48f76', 'd59077', 'd59178', 'd69279', 'd7937a', 'd8957b', 'd9967b', 'da977c', 'da987d', 'db997e', 'dc9a7f', 'dd9b80', 'de9c81', 'de9d82', 'df9e83', 'e09f84', 'e1a185', 'e2a286', 'e2a387', 'e3a488', 'e4a589', 'e5a68a', 'e5a78b', 'e6a88c', 'e7aa8d', 'e7ab8e', 'e8ac8f', 'e9ad90', 'eaae91', 'eaaf92', 'ebb093', 'ecb295', 'ecb396', 'edb497', 
'eeb598', 'eeb699', 'efb79a', 'efb99b', 'f0ba9c', 'f1bb9d', 'f1bc9e', 'f2bd9f', 'f2bfa1', 'f3c0a2', 'f3c1a3', 'f4c2a4', 'f5c3a5', 'f5c5a6', 'f6c6a7', 'f6c7a8', 'f7c8aa', 'f7c9ab', 'f8cbac', 'f8ccad', 'f8cdae', 'f9ceb0', 'f9d0b1', 'fad1b2', 'fad2b3', 'fbd3b4', 'fbd5b6', 'fbd6b7', 'fcd7b8', 'fcd8b9', 'fcdaba', 'fddbbc', 'fddcbd', 'fddebe', 'fddfbf', 'fee0c1', 'fee1c2', 'fee3c3', 'fee4c5', 'ffe5c6', 'ffe7c7', 'ffe8c9', 'ffe9ca', 'ffebcb', 'ffeccd', 'ffedce', 'ffefcf', 'fff0d1', 'fff2d2', 'fff3d3', 'fff4d5', 'fff6d6', 'fff7d8', 'fff8d9', 'fffada', 'fffbdc', 'fffcdd', 'fffedf', 'ffffe0')\n",
"my_colormap_vals_dec = np.array([int(element,base=16) for element in my_colormap_vals_hex])\n",
"r = np.floor(my_colormap_vals_dec/(256*256))\n",
"g = np.floor((my_colormap_vals_dec - r *256 *256)/256)\n",
"b = np.floor(my_colormap_vals_dec - r * 256 *256 - g * 256)\n",
"my_colormap = ListedColormap(np.vstack((r,g,b)).transpose()/255.0)\n",
"\n",
"def draw_2D_heatmap(data, title, my_colormap):\n",
" # Make grid of intercept/slope values to plot\n",
" xv, yv = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))\n",
" fig,ax = plt.subplots()\n",
" fig.set_size_inches(4,4)\n",
" plt.imshow(data, cmap=my_colormap)\n",
" ax.set_title(title)\n",
" ax.set_xlabel('$q$'); ax.set_ylabel('$p$')\n",
" plt.show()"
],
"metadata": {
"id": "ABRANmp6F8iQ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"draw_2D_heatmap(dist_mat,'Distance $|i-j|$', my_colormap)"
],
"metadata": {
"id": "G0HFPBXyHT6V"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define b to be the vertical concatenation of p and q\n",
"b = np.hstack((p,q))[np.newaxis].transpose()"
],
"metadata": {
"id": "SfqeT3KlHWrt"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# TODO: Now construct the matrix A that has the initial distribution constraints\n",
"# so that Ap=b, where p is the transport plan P vectorized rows-first, i.e. p = P.flatten()\n",
"# Replace this line:\n",
"A = np.zeros((20,100))\n"
],
"metadata": {
"id": "7KrybL96IuNW"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now we have all of the things we need. The vectorized distance matrix $\\mathbf{c}$, the constraint matrix $\\mathbf{A}$, the vectorized and concatenated original distribution $\\mathbf{b}$. We can run the linear programming optimization."
],
"metadata": {
"id": "zEuEtU33S8Ly"
}
},
{
"cell_type": "code",
"source": [
"# We don't need the constraint that p>0 as this is the default\n",
"opt = linprog(c, A_eq=A, b_eq=b)\n",
"print(opt)"
],
"metadata": {
"id": "wCfsOVbeSmF5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Extract the answer and display"
],
"metadata": {
"id": "vpkkOOI2agyl"
}
},
{
"cell_type": "code",
"source": [
"P = np.array(opt.x).reshape(10,10)\n",
"draw_2D_heatmap(P,'Transport plan $\\mathbf{P}$', my_colormap)"
],
"metadata": {
"id": "nZGfkrbRV_D0"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Compute the Wasserstein distance\n"
],
"metadata": {
"id": "ZEiRYRVgalsJ"
}
},
{
"cell_type": "code",
"source": [
"was = np.sum(P * dist_mat)\n",
"print(\"Wasserstein distance = \", was)"
],
"metadata": {
"id": "yiQ_8j-Raq3c"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO -- Compute the\n",
"\n",
"* Forward KL divergence $D_{KL}[p,q]$ between these distributions\n",
"* Reverse KL divergence $D_{KL}[q,p]$ between these distributions\n",
"* Jensen-Shannon divergence $D_{JS}[p,q]$ between these distributions\n",
"\n",
"What do you conclude?"
],
"metadata": {
"id": "zf8yTusua71s"
}
}
]
}
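
For reference, the full pipeline that the TODOs above assemble (cost vector, constraint matrix, linear program) can be sketched end-to-end on a smaller problem. The 3-bin distributions here are made up for illustration; the construction of `A` matches the rows-first vectorization described above:

```python
import numpy as np
from scipy.optimize import linprog

# Two made-up discrete distributions over 3 bins
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.3, 0.6])
n = p.size

# Cost of moving mass from bin i to bin j is |i - j|
ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
c = np.abs(ii - jj).flatten()

# Constraints A @ vec(P) = b: rows of P sum to p, columns sum to q
A = np.zeros((2 * n, n * n))
for k in range(n):
    A[k, k * n:(k + 1) * n] = 1   # sum_j P[k, j] = p[k]
    A[n + k, k::n] = 1            # sum_i P[i, k] = q[k]
b = np.concatenate([p, q])

opt = linprog(c, A_eq=A, b_eq=b)  # default bounds already enforce P >= 0
P = opt.x.reshape(n, n)
wasserstein = np.sum(P * np.abs(ii - jj))
print(wasserstein)  # 0.8 for these distributions
```

For 1D distributions on an integer grid, the result can be cross-checked against the closed form `np.sum(np.abs(np.cumsum(p) - np.cumsum(q)))`.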


@@ -0,0 +1,235 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMJLViYIpiivB2A7YIuZmzU",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap16/16_1_1D_Normalizing_Flows.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 16.1: 1D normalizing flows**\n",
"\n",
"This notebook investigates a 1D normalizing flows example similar to that illustrated in figures 16.1 to 16.3 in the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"First we start with a base probability density function"
],
"metadata": {
"id": "IyVn-Gi-p7wf"
}
},
{
"cell_type": "code",
"source": [
"# Define the base pdf\n",
"def gauss_pdf(z, mu, sigma):\n",
"  pr_z = np.exp(-0.5 * (z-mu) * (z-mu) / (sigma * sigma))/(np.sqrt(2*np.pi) * sigma)\n",
" return pr_z"
],
"metadata": {
"id": "ZIfQwhd-AV6L"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"z = np.arange(-3,3,0.01)\n",
"pr_z = gauss_pdf(z, 0, 1)\n",
"\n",
"fig,ax = plt.subplots()\n",
"ax.plot(z, pr_z)\n",
"ax.set_xlim([-3,3])\n",
"ax.set_xlabel('$z$')\n",
"ax.set_ylabel('$Pr(z)$')\n",
"plt.show();"
],
"metadata": {
"id": "gGh8RHmFp_Ls"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's define a nonlinear function that maps from the latent space $z$ to the observed data $x$."
],
"metadata": {
"id": "wVXi5qIfrL9T"
}
},
{
"cell_type": "code",
"source": [
"# Define a function that maps from the base pdf over z to the observed space x\n",
"def f(z):\n",
" x1 = 6/(1+np.exp(-(z-0.25)*1.5))-3\n",
" x2 = z\n",
" p = z * z/9\n",
" x = (1-p) * x1 + p * x2\n",
" return x\n",
"\n",
"# Compute gradient of that function using finite differences\n",
"def df_dz(z):\n",
" return (f(z+0.0001)-f(z-0.0001))/0.0002"
],
"metadata": {
"id": "shHdgZHjp52w"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"x = f(z)\n",
"fig, ax = plt.subplots()\n",
"ax.plot(z,x)\n",
"ax.set_xlim(-3,3)\n",
"ax.set_ylim(-3,3)\n",
"ax.set_xlabel('Latent variable, $z$')\n",
"ax.set_ylabel('Observed variable, $x$')\n",
"plt.show()"
],
"metadata": {
"id": "sz7bnCLUq3Qs"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's evaluate the density in the observed space using equation 16.1"
],
"metadata": {
"id": "rmI0BbuQyXoc"
}
},
{
"cell_type": "code",
"source": [
"# TODO -- plot the density in the observed space\n",
"# Replace these lines\n",
"x = np.ones_like(z)\n",
"pr_x = np.ones_like(pr_z)\n"
],
"metadata": {
"id": "iPdiT_5gyNOD"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Plot the density in the observed space\n",
"fig,ax = plt.subplots()\n",
"ax.plot(x, pr_x)\n",
"ax.set_xlim([-3,3])\n",
"ax.set_ylim([0, 0.5])\n",
"ax.set_xlabel('$x$')\n",
"ax.set_ylabel('$Pr(x)$')\n",
"plt.show();"
],
"metadata": {
"id": "Jlks8MW3zulA"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's draw some samples from the new distribution (see section 16.1)"
],
"metadata": {
"id": "1c5rO0HHz-FV"
}
},
{
"cell_type": "code",
"source": [
"np.random.seed(1)\n",
"n_sample = 20\n",
"\n",
"# TODO -- Draw samples from the modeled density\n",
"# Replace this line\n",
"x_samples = np.ones((n_sample, 1))\n",
"\n"
],
"metadata": {
"id": "LIlTRfpZz2k_"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Draw the samples\n",
"fig,ax = plt.subplots()\n",
"ax.plot(x, pr_x)\n",
"for x_sample in x_samples:\n",
" ax.plot([x_sample, x_sample], [0,0.1], 'r-')\n",
"\n",
"ax.set_xlim([-3,3])\n",
"ax.set_ylim([0, 0.5])\n",
"ax.set_xlabel('$x$')\n",
"ax.set_ylabel('$Pr(x)$')\n",
"plt.show();"
],
"metadata": {
"id": "JS__QPNv0vUA"
},
"execution_count": null,
"outputs": []
}
]
}
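
The two TODOs above (the density via equation 16.1 and ancestral sampling) follow the same recipe, sketched here with `tanh` as the forward map rather than the notebook's `f`, because its derivative has a closed form:

```python
import numpy as np

def gauss_pdf(z, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Illustrative monotonic map from latent z to observed x (not the notebook's f)
def f(z):
    return np.tanh(z)

def df_dz(z):
    return 1.0 - np.tanh(z) ** 2

z = np.arange(-3, 3, 0.01)
x = f(z)
# Change of variables (equation 16.1): Pr(x) = Pr(z) / |df/dz|
pr_x = gauss_pdf(z) / np.abs(df_dz(z))

# Ancestral sampling: draw z from the base density, push it through f
np.random.seed(1)
x_samples = f(np.random.normal(size=20))

# Sanity check: the transformed density still integrates to ~1
integral = np.sum(0.5 * (pr_x[1:] + pr_x[:-1]) * np.diff(x))
print(integral)  # ~0.997 (the mass of the base density inside [-3, 3])
```

Note that the samples are guaranteed to land in the support of the transformed density, here (-1, 1).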


@@ -0,0 +1,307 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMe8jb5kLJqkNSE/AwExTpa",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap16/16_2_Autoregressive_Flows.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 16.2: 1D autoregressive flows**\n",
"\n",
"This notebook investigates a 1D autoregressive flow example similar to that illustrated in figure 16.7 in the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"First we'll define an invertible one dimensional function as in figure 16.5"
],
"metadata": {
"id": "jTK456TUd2FV"
}
},
{
"cell_type": "code",
"source": [
"# First let's make the 1D piecewise linear mapping as illustrated in figure 16.5\n",
"def g(h, phi):\n",
" # TODO -- write this function (equation 16.12)\n",
" # Note: If you have the first printing of the book, there is a mistake in equation 16.12\n",
" # Check the errata for the correct equation (or figure it out yourself!)\n",
" # Replace this line:\n",
" h_prime = 1\n",
"\n",
"\n",
" return h_prime"
],
"metadata": {
"id": "zceww_9qFi00"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's test this out. If you managed to vectorize the routine above, then good for you\n",
"# but I'll assume you didn't and so we'll use a loop\n",
"\n",
"# Define the parameters\n",
"phi = np.array([0.2, 0.1, 0.4, 0.05, 0.25])\n",
"\n",
"# Run the function on an array\n",
"h = np.arange(0,1,0.01)\n",
"h_prime = np.zeros_like(h)\n",
"for i in range(len(h)):\n",
" h_prime[i] = g(h[i], phi)\n",
"\n",
"# Draw the function\n",
"fig, ax = plt.subplots()\n",
"ax.plot(h,h_prime, 'b-')\n",
"ax.set_xlim([0,1])\n",
"ax.set_ylim([0,1])\n",
"ax.set_xlabel('Input, $h$')\n",
"ax.set_ylabel('Output, $h^\\prime$')\n",
"plt.show()\n"
],
"metadata": {
"id": "CLXhYl9ZIuRN"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We will also need the inverse of this function"
],
"metadata": {
"id": "zOCMYC0leOyZ"
}
},
{
"cell_type": "code",
"source": [
"# Define the inverse function\n",
"def g_inverse(h_prime, phi):\n",
"  # Lots of ways to do this, but we'll just do it by bracketing\n",
" h_low = 0\n",
" h_mid = 0.5\n",
" h_high = 0.999\n",
"\n",
" thresh = 0.0001\n",
" c_iter = 0\n",
" while(c_iter < 20 and h_high - h_low > thresh):\n",
" h_prime_low = g(h_low, phi)\n",
" h_prime_mid = g(h_mid, phi)\n",
" h_prime_high = g(h_high, phi)\n",
" if h_prime_mid < h_prime:\n",
" h_low = h_mid\n",
" else:\n",
" h_high = h_mid\n",
"\n",
" h_mid = h_low+(h_high-h_low)/2\n",
" c_iter+=1\n",
"\n",
" return h_mid"
],
"metadata": {
"id": "OIqFAgobeSM8"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's define an autoregressive flow, switching to figure 16.7. We'll assume that our piecewise linear function uses five parameters $\\phi_1,\\phi_2,\\phi_3,\\phi_4,\\phi_5$."
],
"metadata": {
"id": "t8XPxipfd7hz"
}
},
{
"cell_type": "code",
"source": [
"\n",
"def ReLU(preactivation):\n",
" activation = preactivation.clip(0.0)\n",
" return activation\n",
"\n",
"def softmax(x):\n",
" x = np.exp(x) ;\n",
" x = x/ np.sum(x) ;\n",
" return x\n",
"\n",
"# Return value of phi that doesn't depend on any of the inputs\n",
"def get_phi():\n",
" return np.array([0.2, 0.1, 0.4, 0.05, 0.25])\n",
"\n",
"# Compute values of phi that depend on h1\n",
"def shallow_network_phi_h1(h1, n_hidden=10):\n",
" # For neatness of code, we'll just define the parameters in the network definition itself\n",
" n_input = 1\n",
" np.random.seed(n_input)\n",
" beta0 = np.random.normal(size=(n_hidden,1))\n",
" Omega0 = np.random.normal(size=(n_hidden, n_input))\n",
" beta1 = np.random.normal(size=(5,1))\n",
" Omega1 = np.random.normal(size=(5, n_hidden))\n",
" return softmax(beta1 + Omega1 @ ReLU(beta0 + Omega0 @ np.array([[h1]])))\n",
"\n",
"# Compute values of phi that depend on h1 and h2\n",
"def shallow_network_phi_h1h2(h1,h2,n_hidden=10):\n",
" # For neatness of code, we'll just define the parameters in the network definition itself\n",
" n_input = 2\n",
" np.random.seed(n_input)\n",
" beta0 = np.random.normal(size=(n_hidden,1))\n",
" Omega0 = np.random.normal(size=(n_hidden, n_input))\n",
" beta1 = np.random.normal(size=(5,1))\n",
" Omega1 = np.random.normal(size=(5, n_hidden))\n",
" return softmax(beta1 + Omega1 @ ReLU(beta0 + Omega0 @ np.array([[h1],[h2]])))\n",
"\n",
"# Compute values of phi that depend on h1, h2, and h3\n",
"def shallow_network_phi_h1h2h3(h1,h2,h3, n_hidden=10):\n",
" # For neatness of code, we'll just define the parameters in the network definition itself\n",
" n_input = 3\n",
" np.random.seed(n_input)\n",
" beta0 = np.random.normal(size=(n_hidden,1))\n",
" Omega0 = np.random.normal(size=(n_hidden, n_input))\n",
" beta1 = np.random.normal(size=(5,1))\n",
" Omega1 = np.random.normal(size=(5, n_hidden))\n",
" return softmax(beta1 + Omega1 @ ReLU(beta0 + Omega0 @ np.array([[h1],[h2],[h3]])))"
],
"metadata": {
"id": "PnHGlZtcNEAI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The forward mapping, as shown in figure 16.7a"
],
"metadata": {
"id": "8fXeG4V44GVH"
}
},
{
"cell_type": "code",
"source": [
"def forward_mapping(h1,h2,h3,h4):\n",
" #TODO implement the forward mapping\n",
" #Replace this line:\n",
" h_prime1 = 0 ; h_prime2=0; h_prime3=0; h_prime4 = 0\n",
"\n",
" return h_prime1, h_prime2, h_prime3, h_prime4"
],
"metadata": {
"id": "N1zjnIoX0TRP"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The backward mapping as shown in figure 16.7b"
],
"metadata": {
"id": "H8vQfFwI4L7r"
}
},
{
"cell_type": "code",
"source": [
"def backward_mapping(h1_prime,h2_prime,h3_prime,h4_prime):\n",
" #TODO implement the backward mapping\n",
" #Replace this line:\n",
" h1=0; h2=0; h3=0; h4 = 0\n",
"\n",
" return h1,h2,h3,h4"
],
"metadata": {
"id": "HNcQTiVE4DMJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Finally, let's make sure that the network really can be inverted"
],
"metadata": {
"id": "W2IxFkuyZJyn"
}
},
{
"cell_type": "code",
"source": [
"# Test the network to see if it does invert correctly\n",
"h1 = 0.22; h2 = 0.41; h3 = 0.83; h4 = 0.53\n",
"print(\"Original h values %3.3f,%3.3f,%3.3f,%3.3f\"%(h1,h2,h3,h4))\n",
"h1_prime, h2_prime, h3_prime, h4_prime = forward_mapping(h1,h2,h3,h4)\n",
"print(\"h_prime values %3.3f,%3.3f,%3.3f,%3.3f\"%(h1_prime,h2_prime,h3_prime,h4_prime))\n",
"h1,h2,h3,h4 = backward_mapping(h1_prime,h2_prime,h3_prime,h4_prime)\n",
"print(\"Reconstructed h values %3.3f,%3.3f,%3.3f,%3.3f\"%(h1,h2,h3,h4))"
],
"metadata": {
"id": "RT7qvEFp700I"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "sDknSPMLZmzh"
},
"execution_count": null,
"outputs": []
}
]
}
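
The forward/backward asymmetry that the two TODOs above implement can be seen with a much simpler invertible transform. Here `g` is a toy scaling (not the piecewise map) and `scale_fn` is a hypothetical stand-in for the shallow parameter networks; the point is that every forward output depends only on the inputs, while inversion must recover one dimension at a time:

```python
import numpy as np

def g(h, scale):
    # A trivially invertible elementwise transform (stand-in for the piecewise map)
    return scale * h

def g_inv(h_prime, scale):
    return h_prime / scale

def scale_fn(h_prev):
    # Hypothetical parameter "network": depends only on earlier dimensions, positive on [0,1]
    return 1.0 + 0.5 * np.sum(h_prev)

def forward(h):
    # Each output dimension depends only on the input h, so this is parallelizable
    return np.array([g(h[d], scale_fn(h[:d])) for d in range(len(h))])

def backward(h_prime):
    # Inversion is inherently sequential: dimension d needs h[0..d-1] already recovered
    h = np.zeros_like(h_prime)
    for d in range(len(h_prime)):
        h[d] = g_inv(h_prime[d], scale_fn(h[:d]))
    return h

h = np.array([0.22, 0.41, 0.83, 0.53])
h_prime = forward(h)
h_rec = backward(h_prime)
print(np.allclose(h_rec, h))  # True
```

The first dimension is unchanged by the parameter function (it conditions on nothing), which is exactly the structure of the masked networks above.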


@@ -0,0 +1,294 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyNeCWINUqqUGKMcxsqPFTAh",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap16/16_3_Contraction_Mappings.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 16.3: Contraction mappings**\n",
"\n",
"This notebook investigates contraction mappings and fixed point iteration, similar to those illustrated in figure 16.9 in the book.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "OLComQyvCIJ7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define a function that is a contraction mapping\n",
"def f(z):\n",
" return 0.3 + 0.5 *z + 0.02 * np.sin(z*15)"
],
"metadata": {
"id": "4Pfz2KSghdVI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def draw_function(f, fixed_point=None):\n",
" z = np.arange(0,1,0.01)\n",
" z_prime = f(z)\n",
"\n",
" # Draw this function\n",
" fig, ax = plt.subplots()\n",
" ax.plot(z, z_prime,'c-')\n",
" ax.plot([0,1],[0,1],'k--')\n",
"  if fixed_point is not None:\n",
" ax.plot(fixed_point, fixed_point, 'ro')\n",
" ax.set_xlim(0,1)\n",
" ax.set_ylim(0,1)\n",
" ax.set_xlabel('Input, $z$')\n",
"  ax.set_ylabel('Output, $f[z]$')\n",
" plt.show()"
],
"metadata": {
"id": "zEwCbIx0hpAI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"draw_function(f)"
],
"metadata": {
"id": "k4e5Yu0fl8bz"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's find where $\\mbox{f}[z]=z$ using fixed point iteration"
],
"metadata": {
"id": "DfgKrpCAjnol"
}
},
{
"cell_type": "code",
"source": [
"# Takes a function f and a starting point z\n",
"def fixed_point_iteration(f, z0):\n",
" # TODO -- write this function\n",
" # Print out the iterations as you go, so you can see the progress\n",
" # Set the maximum number of iterations to 20\n",
" # Replace this line\n",
" z_out = 0.5;\n",
"\n",
"\n",
"\n",
" return z_out"
],
"metadata": {
"id": "bAOBvZT-j3lv"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now let's test that and plot the solution"
],
"metadata": {
"id": "CAS0lgIomAa0"
}
},
{
"cell_type": "code",
"source": [
"# Now let's test that\n",
"z = fixed_point_iteration(f, 0.2)\n",
"draw_function(f, z)"
],
"metadata": {
"id": "EYQZJdNPk8Lg"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's define another function\n",
"def f2(z):\n",
"  return 0.7 - 0.6 * z + 0.03 * np.sin(z*15)\n",
"draw_function(f2)"
],
"metadata": {
"id": "4DipPiqVlnwJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Now let's test that\n",
"# TODO Before running this code, predict what you think will happen\n",
"z = fixed_point_iteration(f2, 0.9)\n",
"draw_function(f2, z)"
],
"metadata": {
"id": "tYOdbWcomdEE"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Let's define another function\n",
"# Define a function that is a contraction mapping\n",
"def f3(z):\n",
" return -0.2 + 1.5 *z + 0.1 * np.sin(z*15)\n",
"draw_function(f3)"
],
"metadata": {
"id": "Mni37RUpmrIu"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Now let's test that\n",
"# TODO Before running this code, predict what you think will happen\n",
"z = fixed_point_iteration(f3, 0.7)\n",
"draw_function(f3, z)"
],
"metadata": {
"id": "agt5mfJrnM1O"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Finally, let's invert a problem of the form $y = z+ f[z]$ for a given value of $y$. What is the $z$ that maps to it?"
],
"metadata": {
"id": "n6GI46-ZoQz6"
}
},
{
"cell_type": "code",
"source": [
"def f4(z):\n",
" return -0.3 + 0.5 *z + 0.02 * np.sin(z*15)"
],
"metadata": {
"id": "dy6r3jr9rjPf"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def fixed_point_iteration_z_plus_f(f, y, z0):\n",
" # TODO -- write this function\n",
" # Replace this line\n",
" z_out = 1\n",
"\n",
" return z_out"
],
"metadata": {
"id": "GMX64Iz0nl-B"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def draw_function2(f, y, fixed_point=None):\n",
" z = np.arange(0,1,0.01)\n",
" z_prime = z+f(z)\n",
"\n",
" # Draw this function\n",
" fig, ax = plt.subplots()\n",
" ax.plot(z, z_prime,'c-')\n",
" ax.plot(z, y-f(z),'r-')\n",
" ax.plot([0,1],[0,1],'k--')\n",
"  if fixed_point is not None:\n",
" ax.plot(fixed_point, y, 'ro')\n",
" ax.set_xlim(0,1)\n",
" ax.set_ylim(0,1)\n",
" ax.set_xlabel('Input, $z$')\n",
"  ax.set_ylabel('Output, $z+f[z]$')\n",
" plt.show()"
],
"metadata": {
"id": "uXxKHad5qT8Y"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Test this out and draw\n",
"y = 0.8\n",
"z = fixed_point_iteration_z_plus_f(f4,y,0.2)\n",
"draw_function2(f4,y,z)\n",
"# If you have done this correctly, the red dot should be\n",
"# where the cyan curve has a y value of 0.8"
],
"metadata": {
"id": "mNEBXC3Aqd_1"
},
"execution_count": null,
"outputs": []
}
]
}
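
A compact version of both exercises above (finding where $f[z]=z$, and inverting $y = z + f[z]$) looks like this. The function is the same contraction used at the top of the notebook; the iteration counts are generous rather than tuned:

```python
import numpy as np

def f(z):
    # A contraction on [0,1]: |df/dz| = |0.5 + 0.3 cos(15z)| <= 0.8 < 1
    return 0.3 + 0.5 * z + 0.02 * np.sin(15 * z)

def fixed_point_iteration(f, z0, n_iter=100, tol=1e-12):
    z = z0
    for _ in range(n_iter):
        z_next = f(z)
        if abs(z_next - z) < tol:
            break
        z = z_next
    return z

z_star = fixed_point_iteration(f, 0.2)

# Inverting y = z + f(z): rewrite as z = y - f(z) and iterate that map instead;
# it converges for the same reason (z -> y - f(z) is also a contraction here)
y = 0.8
z_inv = 0.2
for _ in range(100):
    z_inv = y - f(z_inv)

print(abs(f(z_star) - z_star), abs(z_inv + f(z_inv) - y))  # both residuals are tiny
```

The stopping tolerance on successive iterates bounds the residual at the returned point, which is why the checks below pass.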


@@ -1,277 +0,0 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMJvfoCDFcSK7Z0/HkcGunb",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notesbooks/Chap11/11_2_Residual_Networks.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 11.2: Residual Networks**\n",
"\n",
"This notebook adapts the networks for MNIST1D to use residual connections.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
"\n"
],
"metadata": {
"id": "t9vk9Elugvmi"
}
},
{
"cell_type": "code",
"source": [
"# Run this if you're in a Colab to make a local copy of the MNIST 1D repository\n",
"!git clone https://github.com/greydanus/mnist1d"
],
"metadata": {
"id": "D5yLObtZCi9J"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import os\n",
"import torch, torch.nn as nn\n",
"from torch.utils.data import TensorDataset, DataLoader\n",
"from torch.optim.lr_scheduler import StepLR\n",
"import matplotlib.pyplot as plt\n",
"import mnist1d\n",
"import random"
],
"metadata": {
"id": "YrXWAH7sUWvU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"args = mnist1d.data.get_dataset_args()\n",
"data = mnist1d.data.get_dataset(args, path='./mnist1d_data.pkl', download=False, regenerate=False)\n",
"\n",
"# The training and test input and outputs are in\n",
"# data['x'], data['y'], data['x_test'], and data['y_test']\n",
"print(\"Examples in training set: {}\".format(len(data['y'])))\n",
"print(\"Examples in test set: {}\".format(len(data['y_test'])))\n",
"print(\"Length of each example: {}\".format(data['x'].shape[-1]))"
],
"metadata": {
"id": "twI72ZCrCt5z"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Load in the data\n",
"train_data_x = data['x'].transpose()\n",
"train_data_y = data['y']\n",
"val_data_x = data['x_test'].transpose()\n",
"val_data_y = data['y_test']\n",
"# Print out sizes\n",
"print(\"Train data: %d examples (columns), each of which has %d dimensions (rows)\"%((train_data_x.shape[1],train_data_x.shape[0])))\n",
"print(\"Validation data: %d examples (columns), each of which has %d dimensions (rows)\"%((val_data_x.shape[1],val_data_x.shape[0])))"
],
"metadata": {
"id": "8bKADvLHbiV5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Define the network"
],
"metadata": {
"id": "_sFvRDGrl4qe"
}
},
{
"cell_type": "code",
"source": [
"# There are 40 input dimensions and 10 output dimensions for this data\n",
"# The inputs correspond to the 40 offsets in the MNIST1D template.\n",
"D_i = 40\n",
"# The outputs correspond to the 10 digits\n",
"D_o = 10\n",
"\n",
"\n",
"# We will adapt this model to have residual connections around the linear layers\n",
"# This is the same model we used in practical 8.1, but we can't use the sequential\n",
"# class for residual networks (which aren't strictly sequential). Hence, I've rewritten\n",
"# it as a model that inherits from a base class\n",
"\n",
"class ResidualNetwork(torch.nn.Module):\n",
" def __init__(self, input_size, output_size, hidden_size=100):\n",
" super(ResidualNetwork, self).__init__()\n",
" self.linear1 = nn.Linear(input_size, hidden_size)\n",
" self.linear2 = nn.Linear(hidden_size, hidden_size)\n",
" self.linear3 = nn.Linear(hidden_size, hidden_size)\n",
" self.linear4 = nn.Linear(hidden_size, output_size)\n",
" print(\"Initialized MLPBase model with {} parameters\".format(self.count_params()))\n",
"\n",
" def count_params(self):\n",
" return sum([p.view(-1).shape[0] for p in self.parameters()])\n",
"\n",
"# # TODO -- Add residual connections to this model\n",
"# # The order of operations should similar to figure 11.5b\n",
"# # linear1 first, ReLU+linear2 in first residual block, ReLU+linear3 in second residual block), linear4 at end\n",
"# # Replace this function\n",
" def forward(self, x):\n",
" h1 = self.linear1(x).relu()\n",
" h2 = self.linear2(h1).relu()\n",
" h3 = self.linear3(h2).relu()\n",
" return self.linear4(h3)\n"
],
"metadata": {
"id": "FslroPJJffrh"
},
"execution_count": null,
"outputs": []
},
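For reference, one possible arrangement of the forward pass with the two skip connections described in the TODO (linear1 first, two residual blocks of ReLU+linear, then linear4); `ResidualSketch` is an illustrative name, not part of the notebook:

```python
import torch
import torch.nn as nn

class ResidualSketch(nn.Module):
    # Same four linear layers as above, but with additive skip connections
    # around each ReLU+linear block (cf. figure 11.5b).
    def __init__(self, input_size, output_size, hidden_size=100):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, hidden_size)
        self.linear3 = nn.Linear(hidden_size, hidden_size)
        self.linear4 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h1 = self.linear1(x)                # linear1 first
        h2 = h1 + self.linear2(h1.relu())   # first residual block: ReLU+linear2
        h3 = h2 + self.linear3(h2.relu())   # second residual block: ReLU+linear3
        return self.linear4(h3)             # linear4 at the end

out = ResidualSketch(40, 10)(torch.randn(5, 40))
print(out.shape)  # torch.Size([5, 10])
```

Because the skip additions leave the hidden width unchanged, the layer sizes are identical to the non-residual version above.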
{
"cell_type": "code",
"source": [
"# He initialization of weights\n",
"def weights_init(layer_in):\n",
" if isinstance(layer_in, nn.Linear):\n",
" nn.init.kaiming_uniform_(layer_in.weight)\n",
" layer_in.bias.data.fill_(0.0)"
],
"metadata": {
"id": "YgLaex1pfhqz"
},
"execution_count": null,
"outputs": []
},
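As a quick sanity check (not part of the notebook), He/Kaiming initialization in fan-in mode should give weights with standard deviation roughly $\sqrt{2/\text{fan\_in}}$; for a 1000-input layer that is about 0.045:

```python
import math
import torch
import torch.nn as nn

layer = nn.Linear(1000, 1000)
nn.init.kaiming_uniform_(layer.weight)   # He init, uniform variant, fan_in mode (default)
layer.bias.data.fill_(0.0)

std = layer.weight.std().item()
expected = math.sqrt(2.0 / 1000)         # ~0.0447
print(f"empirical std {std:.4f}, expected ~{expected:.4f}")
```

With a million weights the empirical estimate lands very close to the theoretical value.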
{
"cell_type": "code",
"source": [
"#Define the model\n",
"model = ResidualNetwork(40, 10)\n",
"\n",
"# choose cross entropy loss function (equation 5.24 in the loss notes)\n",
"loss_function = nn.CrossEntropyLoss()\n",
"# construct SGD optimizer and initialize learning rate and momentum\n",
"optimizer = torch.optim.SGD(model.parameters(), lr = 0.05, momentum=0.9)\n",
"# object that decreases learning rate by half every 20 epochs\n",
"scheduler = StepLR(optimizer, step_size=20, gamma=0.5)\n",
"# create 100 dummy data points and store in data loader class\n",
"x_train = torch.tensor(train_data_x.transpose().astype('float32'))\n",
"y_train = torch.tensor(train_data_y.astype('long'))\n",
"x_val= torch.tensor(val_data_x.transpose().astype('float32'))\n",
"y_val = torch.tensor(val_data_y.astype('long'))\n",
"\n",
"# load the data into a class that creates the batches\n",
"data_loader = DataLoader(TensorDataset(x_train,y_train), batch_size=100, shuffle=True, worker_init_fn=np.random.seed(1))\n",
"\n",
"# Initialize model weights\n",
"model.apply(weights_init)\n",
"\n",
"# loop over the dataset n_epoch times\n",
"n_epoch = 100\n",
"# store the loss and the % correct at each epoch\n",
"losses_train = np.zeros((n_epoch))\n",
"errors_train = np.zeros((n_epoch))\n",
"losses_val = np.zeros((n_epoch))\n",
"errors_val = np.zeros((n_epoch))\n",
"\n",
"for epoch in range(n_epoch):\n",
" # loop over batches\n",
" for i, data in enumerate(data_loader):\n",
" # retrieve inputs and labels for this batch\n",
" x_batch, y_batch = data\n",
" # zero the parameter gradients\n",
" optimizer.zero_grad()\n",
" # forward pass -- calculate model output\n",
" pred = model(x_batch)\n",
" # compute the loss\n",
" loss = loss_function(pred, y_batch)\n",
" # backward pass\n",
" loss.backward()\n",
" # SGD update\n",
" optimizer.step()\n",
"\n",
" # Run whole dataset to get statistics -- normally wouldn't do this\n",
" pred_train = model(x_train)\n",
" pred_val = model(x_val)\n",
" _, predicted_train_class = torch.max(pred_train.data, 1)\n",
" _, predicted_val_class = torch.max(pred_val.data, 1)\n",
" errors_train[epoch] = 100 - 100 * (predicted_train_class == y_train).float().sum() / len(y_train)\n",
" errors_val[epoch]= 100 - 100 * (predicted_val_class == y_val).float().sum() / len(y_val)\n",
" losses_train[epoch] = loss_function(pred_train, y_train).item()\n",
" losses_val[epoch]= loss_function(pred_val, y_val).item()\n",
" print(f'Epoch {epoch:5d}, train loss {losses_train[epoch]:.6f}, train error {errors_train[epoch]:3.2f}, val loss {losses_val[epoch]:.6f}, percent error {errors_val[epoch]:3.2f}')\n",
"\n",
" # tell scheduler to consider updating learning rate\n",
" scheduler.step()"
],
"metadata": {
"id": "NYw8I_3mmX5c"
},
"execution_count": null,
"outputs": []
},
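When the whole dataset is pushed through the model just to gather statistics, the forward passes can be wrapped in `torch.no_grad()` so no autograd graph is built. A small illustrative sketch (the tiny model and random data here are stand-ins, not the notebook's):

```python
import torch
import torch.nn as nn

model = nn.Linear(40, 10)
x_all = torch.randn(100, 40)

with torch.no_grad():                      # disable gradient tracking for evaluation
    pred = model(x_all)
    _, predicted_class = torch.max(pred, 1)

print(pred.requires_grad)  # False
```

This saves memory and time during the per-epoch evaluation without changing the computed losses or errors.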
{
"cell_type": "code",
"source": [
"# Plot the results\n",
"fig, ax = plt.subplots()\n",
"ax.plot(errors_train,'r-',label='train')\n",
"ax.plot(errors_val,'b-',label='test')\n",
"ax.set_ylim(0,100); ax.set_xlim(0,n_epoch)\n",
"ax.set_xlabel('Epoch'); ax.set_ylabel('Error')\n",
"ax.set_title('TrainError %3.2f, Val Error %3.2f'%(errors_train[-1],errors_val[-1]))\n",
"ax.legend()\n",
"plt.show()"
],
"metadata": {
"id": "CcP_VyEmE2sv"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The primary motivation of residual networks is to allow training of much deeper networks. \n",
"\n",
"TODO: Try running this network with and without the residual connections. Does adding the residual connections change the performance?"
],
"metadata": {
"id": "wMmqhmxuAx0M"
}
}
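One way to run the suggested with/without comparison is a single class with a switch for the skip connections, so the same training loop can be run twice. This is a hypothetical sketch; the `residual` flag and `ToggleNet` name are not in the notebook:

```python
import torch
import torch.nn as nn

class ToggleNet(nn.Module):
    # Same architecture as the exercise; residual=False drops the skips.
    def __init__(self, D_i=40, D_o=10, hidden=100, residual=True):
        super().__init__()
        self.residual = residual
        self.linear1 = nn.Linear(D_i, hidden)
        self.linear2 = nn.Linear(hidden, hidden)
        self.linear3 = nn.Linear(hidden, hidden)
        self.linear4 = nn.Linear(hidden, D_o)

    def forward(self, x):
        h = self.linear1(x)
        for block in (self.linear2, self.linear3):
            f = block(h.relu())
            h = h + f if self.residual else f   # add the skip only in residual mode
        return self.linear4(h)

x = torch.randn(4, 40)
plain = ToggleNet(residual=False)(x)
resid = ToggleNet(residual=True)(x)
print(plain.shape, resid.shape)
```

Training both variants with identical hyperparameters isolates the effect of the skip connections on the loss and error curves.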
]
}


@@ -7,7 +7,7 @@ To be published by MIT Press Dec 5th 2023.<br>
<h2> Download draft PDF </h2>
<a href="https://github.com/udlbook/udlbook/releases/download/v1.1.2/UnderstandingDeepLearning_06_08_23_C.pdf">Draft PDF Chapters 1-21</a><br> 2023-08-06. CC-BY-NC-ND license
<a href="https://github.com/udlbook/udlbook/releases/download/v1.1.3/UnderstandingDeepLearning_01_10_23_C.pdf">Draft PDF Chapters 1-21</a><br> 2023-10-01. CC-BY-NC-ND license
<br>
<img src="https://img.shields.io/github/downloads/udlbook/udlbook/total" alt="download stats shield">
<br>
@@ -116,15 +116,15 @@ Instructions for editing figures / equations can be found <a href="https://drive
<li> Notebook 10.3 - 2D convolution: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap10/10_3_2D_Convolution.ipynb">ipynb/colab </a>
<li> Notebook 10.4 - Downsampling & upsampling: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap10/10_4_Downsampling_and_Upsampling.ipynb">ipynb/colab </a>
<li> Notebook 10.5 - Convolution for MNIST: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap10/10_5_Convolution_For_MNIST.ipynb">ipynb/colab </a>
<li> Notebook 11.1 - Shattered gradients: (coming soon)
<li> Notebook 11.2 - Residual networks: (coming soon)
<li> Notebook 11.3 - Batch normalization: (coming soon)
<li> Notebook 12.1 - Self-attention: (coming soon)
<li> Notebook 12.2 - Multi-head self-attention: (coming soon)
<li> Notebook 12.3 - Tokenization: (coming soon)
<li> Notebook 11.1 - Shattered gradients: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap11/11_1_Shattered_Gradients.ipynb">ipynb/colab </a>
<li> Notebook 11.2 - Residual networks: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap11/11_2_Residual_Networks.ipynb">ipynb/colab </a>
<li> Notebook 11.3 - Batch normalization: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap11/11_3_Batch_Normalization.ipynb">ipynb/colab </a>
<li> Notebook 12.1 - Self-attention: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap12/12_1_Self_Attention.ipynb">ipynb/colab </a>
<li> Notebook 12.2 - Multi-head self-attention: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap12/12_2_Multihead_Self_Attention.ipynb">ipynb/colab </a>
<li> Notebook 12.3 - Tokenization: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap12/12_3_Tokenization.ipynb">ipynb/colab </a>
<li> Notebook 12.4 - Decoding strategies: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap12/12_4_Decoding_Strategies.ipynb">ipynb/colab </a>
<li> Notebook 13.1 - Encoding graphs: (coming soon)
<li> Notebook 13.2 - Graph classification : (coming soon)
<li> Notebook 13.1 - Encoding graphs: <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap13/13_1_Graph_Representation.ipynb">ipynb/colab </a>
<li> Notebook 13.2 - Graph classification : <a href="https://github.com/udlbook/udlbook/blob/main/Notebooks/Chap13/13_2_Graph_Classification.ipynb">ipynb/colab </a>
<li> Notebook 13.3 - Neighborhood sampling: (coming soon)
<li> Notebook 13.4 - Graph attention: (coming soon)
<li> Notebook 15.1 - GAN toy example: (coming soon)