Simon Prince
2024-04-18 17:41:24 -04:00
22 changed files with 1699 additions and 127 deletions

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@@ -1,18 +1,16 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap01/1_1_BackgroundMathematics.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "s5zzKSOusPOB"
@@ -41,7 +39,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "WV2Dl6owme2d"
@@ -49,11 +46,11 @@
"source": [
"**Linear functions**<br> We will be using the term *linear equation* to mean a weighted sum of inputs plus an offset. If there is just one input $x$, then this is a straight line:\n",
"\n",
"\\begin{equation}y=\\beta+\\omega x,\\end{equation} \n",
"\\begin{equation}y=\\beta+\\omega x,\\end{equation}\n",
"\n",
"where $\\beta$ is the y-intercept of the linear and $\\omega$ is the slope of the line. When there are two inputs $x_{1}$ and $x_{2}$, then this becomes:\n",
"\n",
"\\begin{equation}y=\\beta+\\omega_1 x_1 + \\omega_2 x_2.\\end{equation} \n",
"\\begin{equation}y=\\beta+\\omega_1 x_1 + \\omega_2 x_2.\\end{equation}\n",
"\n",
"Any other functions are by definition **non-linear**.\n",
"\n",
@@ -99,7 +96,7 @@
"ax.plot(x,y,'r-')\n",
"ax.set_ylim([0,10]);ax.set_xlim([0,10])\n",
"ax.set_xlabel('x'); ax.set_ylabel('y')\n",
"plt.show\n",
"plt.show()\n",
"\n",
"# TODO -- experiment with changing the values of beta and omega\n",
"# to understand what they do. Try to make a line\n",
@@ -107,7 +104,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "AedfvD9dxShZ"
@@ -192,7 +188,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "i8tLwpls476R"
@@ -236,7 +231,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "fGzVJQ6N-mHJ"
@@ -275,11 +269,10 @@
"# Compute with vector/matrix form\n",
"y_vec = beta_vec+np.matmul(omega_mat, x_vec)\n",
"print(\"Matrix/vector form\")\n",
"print('y1= %3.3f\\ny2 = %3.3f'%((y_vec[0],y_vec[1])))\n"
"print('y1= %3.3f\\ny2 = %3.3f'%((y_vec[0][0],y_vec[1][0])))\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "3LGRoTMLU8ZU"
@@ -293,7 +286,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "7Y5zdKtKZAB2"
@@ -325,11 +317,10 @@
"ax.plot(x,y,'r-')\n",
"ax.set_ylim([0,100]);ax.set_xlim([-5,5])\n",
"ax.set_xlabel('x'); ax.set_ylabel('exp[x]')\n",
"plt.show"
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "XyrT8257IWCu"
@@ -345,7 +336,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "R6A4e5IxIWCu"
@@ -373,11 +363,10 @@
"ax.plot(x,y,'r-')\n",
"ax.set_ylim([-5,5]);ax.set_xlim([0,5])\n",
"ax.set_xlabel('x'); ax.set_ylabel('$\\log[x]$')\n",
"plt.show"
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "yYWrL5AXIWCv"
@@ -397,8 +386,8 @@
],
"metadata": {
"colab": {
"include_colab_link": true,
"provenance": []
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
@@ -420,4 +409,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}
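
The `y_vec[0][0]` change in the hunk above can be checked in isolation. The sketch below assumes, as the indexing in the new line implies, that `beta_vec` and `x_vec` are column vectors of shape (2, 1). The numeric values are made up for illustration and are not taken from the notebook.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration
beta_vec = np.array([[0.5], [0.2]])
omega_mat = np.array([[-1.0, 0.4], [0.1, -0.4]])
x_vec = np.array([[4.0], [-1.0]])

# Matrix/vector form: the result is a (2, 1) column, not a flat vector
y_vec = beta_vec + np.matmul(omega_mat, x_vec)
print(y_vec.shape)

# y_vec[0] is a length-1 array; y_vec[0][0] is the scalar inside it
y1 = y_vec[0][0]
y2 = y_vec[1][0]
print('y1 = %3.3f\ny2 = %3.3f' % (y1, y2))
```

Indexing down to a scalar avoids passing a one-element array to `%3.3f`, which recent NumPy versions flag as a deprecated implicit conversion. That is presumably the motivation for the change, although the commit itself does not say so.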

@@ -4,7 +4,6 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOmndC0N7dFV7W3Mh5ljOLl",
"include_colab_link": true
},
"kernelspec": {
@@ -235,8 +234,8 @@
"levels = 40\n",
"ax.contour(phi0_mesh, phi1_mesh, all_losses ,levels, colors=['#80808080'])\n",
"ax.set_ylim([1,-1])\n",
"ax.set_xlabel('Intercept, $\\phi_0$')\n",
"ax.set_ylabel('Slope, $\\phi_1$')\n",
"ax.set_xlabel(r'Intercept, $\\phi_0$')\n",
"ax.set_ylabel(r'Slope, $\\phi_1$')\n",
"\n",
"# Plot the position of your best fitting line on the loss function\n",
"# It should be close to the minimum\n",
@@ -250,4 +249,4 @@
"outputs": []
}
]
}
}
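
Many hunks in this commit only add an `r` prefix to matplotlib label strings. A minimal sketch of why that matters, assuming a Python version that warns about invalid escape sequences (3.6 or later): `\p` is not a recognised escape, so the plain literal yields the same text as the raw one but triggers a warning at compile time. The helper `compile_and_run` is hypothetical and exists only for this illustration.

```python
import warnings

def compile_and_run(src):
    """Compile and run a one-line assignment; return (label value, number of warnings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        namespace = {}
        exec(compile(src, "<label>", "exec"), namespace)
    return namespace["label"], len(caught)

# The source text being compiled is:  label = '$\phi_0$'   and   label = r'$\phi_0$'
plain_value, plain_warnings = compile_and_run("label = '$\\phi_0$'")
raw_value, raw_warnings = compile_and_run("label = r'$\\phi_0$'")

print(plain_value == raw_value)       # both literals produce the same characters
print(plain_warnings, raw_warnings)   # only the plain literal is expected to warn
```

The rendered axis labels are therefore unchanged by the commit. The raw strings only silence the warning and guard against a future Python release turning it into an error.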

@@ -1,18 +1,16 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap03/3_1_Shallow_Networks_I.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "1Z6LB4Ybn1oN"
@@ -42,7 +40,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "wQDy9UzXpnf5"
@@ -102,8 +99,8 @@
"source": [
"# Define a shallow neural network with, one input, one output, and three hidden units\n",
"def shallow_1_1_3(x, activation_fn, phi_0,phi_1,phi_2,phi_3, theta_10, theta_11, theta_20, theta_21, theta_30, theta_31):\n",
" # TODO Replace the lines below to compute the three initial lines\n",
" # (figure 3.3a-c) from the theta parameters. These are the preactivations\n",
" # TODO Replace the code below to compute the three initial lines\n",
" # from the theta parameters (i.e. implement equations at bottom of figure 3.3a-c). These are the preactivations\n",
" pre_1 = np.zeros_like(x)\n",
" pre_2 = np.zeros_like(x)\n",
" pre_3 = np.zeros_like(x)\n",
@@ -199,7 +196,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "T34bszToImKQ"
@@ -210,7 +206,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "jhaBSS8oIWSX"
@@ -269,7 +264,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "osonHsEqVp2I"
@@ -354,9 +348,8 @@
],
"metadata": {
"colab": {
"authorship_tag": "ABX9TyPBNztJrxnUt1ELWfm1Awa3",
"include_colab_link": true,
"provenance": []
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
@@ -368,4 +361,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}

@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyPkFrjmRAUf0fxN07RC4xMI",
"authorship_tag": "ABX9TyPZzptvvf7OPZai8erQ/0xT",
"include_colab_link": true
},
"kernelspec": {
@@ -127,26 +127,26 @@
" fig, ax = plt.subplots(3,3)\n",
" fig.set_size_inches(8.5, 8.5)\n",
" fig.tight_layout(pad=3.0)\n",
" ax[0,0].plot(x,layer2_pre_1,'r-'); ax[0,0].set_ylabel('$\\psi_{10}+\\psi_{11}h_{1}+\\psi_{12}h_{2}+\\psi_{13}h_3$')\n",
" ax[0,1].plot(x,layer2_pre_2,'b-'); ax[0,1].set_ylabel('$\\psi_{20}+\\psi_{21}h_{1}+\\psi_{22}h_{2}+\\psi_{23}h_3$')\n",
" ax[0,2].plot(x,layer2_pre_3,'g-'); ax[0,2].set_ylabel('$\\psi_{30}+\\psi_{31}h_{1}+\\psi_{32}h_{2}+\\psi_{33}h_3$')\n",
" ax[1,0].plot(x,h1_prime,'r-'); ax[1,0].set_ylabel(\"$h_{1}^{'}$\")\n",
" ax[1,1].plot(x,h2_prime,'b-'); ax[1,1].set_ylabel(\"$h_{2}^{'}$\")\n",
" ax[1,2].plot(x,h3_prime,'g-'); ax[1,2].set_ylabel(\"$h_{3}^{'}$\")\n",
" ax[2,0].plot(x,phi1_h1_prime,'r-'); ax[2,0].set_ylabel(\"$\\phi_1 h_{1}^{'}$\")\n",
" ax[2,1].plot(x,phi2_h2_prime,'b-'); ax[2,1].set_ylabel(\"$\\phi_2 h_{2}^{'}$\")\n",
" ax[2,2].plot(x,phi3_h3_prime,'g-'); ax[2,2].set_ylabel(\"$\\phi_3 h_{3}^{'}$\")\n",
" ax[0,0].plot(x,layer2_pre_1,'r-'); ax[0,0].set_ylabel(r'$\\psi_{10}+\\psi_{11}h_{1}+\\psi_{12}h_{2}+\\psi_{13}h_3$')\n",
" ax[0,1].plot(x,layer2_pre_2,'b-'); ax[0,1].set_ylabel(r'$\\psi_{20}+\\psi_{21}h_{1}+\\psi_{22}h_{2}+\\psi_{23}h_3$')\n",
" ax[0,2].plot(x,layer2_pre_3,'g-'); ax[0,2].set_ylabel(r'$\\psi_{30}+\\psi_{31}h_{1}+\\psi_{32}h_{2}+\\psi_{33}h_3$')\n",
" ax[1,0].plot(x,h1_prime,'r-'); ax[1,0].set_ylabel(r\"$h_{1}^{'}$\")\n",
" ax[1,1].plot(x,h2_prime,'b-'); ax[1,1].set_ylabel(r\"$h_{2}^{'}$\")\n",
" ax[1,2].plot(x,h3_prime,'g-'); ax[1,2].set_ylabel(r\"$h_{3}^{'}$\")\n",
" ax[2,0].plot(x,phi1_h1_prime,'r-'); ax[2,0].set_ylabel(r\"$\\phi_1 h_{1}^{'}$\")\n",
" ax[2,1].plot(x,phi2_h2_prime,'b-'); ax[2,1].set_ylabel(r\"$\\phi_2 h_{2}^{'}$\")\n",
" ax[2,2].plot(x,phi3_h3_prime,'g-'); ax[2,2].set_ylabel(r\"$\\phi_3 h_{3}^{'}$\")\n",
"\n",
" for plot_y in range(3):\n",
" for plot_x in range(3):\n",
" ax[plot_y,plot_x].set_xlim([0,1]);ax[plot_x,plot_y].set_ylim([-1,1])\n",
" ax[plot_y,plot_x].set_aspect(0.5)\n",
" ax[2,plot_y].set_xlabel('Input, $x$');\n",
" ax[2,plot_y].set_xlabel(r'Input, $x$');\n",
" plt.show()\n",
"\n",
" fig, ax = plt.subplots()\n",
" ax.plot(x,y)\n",
" ax.set_xlabel('Input, $x$'); ax.set_ylabel('Output, $y$')\n",
" ax.set_xlabel(r'Input, $x$'); ax.set_ylabel(r'Output, $y$')\n",
" ax.set_xlim([0,1]);ax.set_ylim([-1,1])\n",
" ax.set_aspect(0.5)\n",
" plt.show()"

@@ -118,7 +118,7 @@
" ax.plot(x_model,y_model)\n",
" if sigma_model is not None:\n",
" ax.fill_between(x_model, y_model-2*sigma_model, y_model+2*sigma_model, color='lightgray')\n",
" ax.set_xlabel('Input, $x$'); ax.set_ylabel('Output, $y$')\n",
" ax.set_xlabel(r'Input, $x$'); ax.set_ylabel(r'Output, $y$')\n",
" ax.set_xlim([0,1]);ax.set_ylim([-1,1])\n",
" ax.set_aspect(0.5)\n",
" if title is not None:\n",
@@ -222,7 +222,7 @@
"gauss_prob = normal_distribution(y_gauss, mu, sigma)\n",
"fig, ax = plt.subplots()\n",
"ax.plot(y_gauss, gauss_prob)\n",
"ax.set_xlabel('Input, $y$'); ax.set_ylabel('Probability $Pr(y)$')\n",
"ax.set_xlabel(r'Input, $y$'); ax.set_ylabel(r'Probability $Pr(y)$')\n",
"ax.set_xlim([-5,5]);ax.set_ylim([0,1.0])\n",
"plt.show()\n",
"\n",
@@ -590,4 +590,4 @@
}
}
]
}
}

@@ -119,12 +119,12 @@
" fig.set_size_inches(7.0, 3.5)\n",
" fig.tight_layout(pad=3.0)\n",
" ax[0].plot(x_model,out_model)\n",
" ax[0].set_xlabel('Input, $x$'); ax[0].set_ylabel('Model output')\n",
" ax[0].set_xlabel(r'Input, $x$'); ax[0].set_ylabel(r'Model output')\n",
" ax[0].set_xlim([0,1]);ax[0].set_ylim([-4,4])\n",
" if title is not None:\n",
" ax[0].set_title(title)\n",
" ax[1].plot(x_model,lambda_model)\n",
" ax[1].set_xlabel('Input, $x$'); ax[1].set_ylabel('$\\lambda$ or Pr(y=1|x)')\n",
" ax[1].set_xlabel(r'Input, $x$'); ax[1].set_ylabel(r'$\\lambda$ or Pr(y=1|x)')\n",
" ax[1].set_xlim([0,1]);ax[1].set_ylim([-0.05,1.05])\n",
" if title is not None:\n",
" ax[1].set_title(title)\n",

@@ -4,7 +4,6 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyN4E9Vtuk6t2BhZ0Ajv5SW3",
"include_colab_link": true
},
"kernelspec": {
@@ -67,7 +66,7 @@
" fig,ax = plt.subplots()\n",
" ax.plot(phi_plot,loss_function(phi_plot),'r-')\n",
" ax.set_xlim(0,1); ax.set_ylim(0,1)\n",
" ax.set_xlabel('$\\phi$'); ax.set_ylabel('$L[\\phi]$')\n",
" ax.set_xlabel(r'$\\phi$'); ax.set_ylabel(r'$L[\\phi]$')\n",
" if a is not None and b is not None and c is not None and d is not None:\n",
" plt.axvspan(a, d, facecolor='k', alpha=0.2)\n",
" ax.plot([a,a],[0,1],'b-')\n",
@@ -189,4 +188,4 @@
"outputs": []
}
]
}
}

@@ -108,8 +108,8 @@
" ax.contour(phi0mesh, phi1mesh, loss_function, 20, colors=['#80808080'])\n",
" ax.plot(opt_path[0,:], opt_path[1,:],'-', color='#a0d9d3ff')\n",
" ax.plot(opt_path[0,:], opt_path[1,:],'.', color='#a0d9d3ff',markersize=10)\n",
" ax.set_xlabel(\"$\\phi_{0}$\")\n",
" ax.set_ylabel(\"$\\phi_{1}$\")\n",
" ax.set_xlabel(r\"$\\phi_{0}$\")\n",
" ax.set_ylabel(r\"$\\phi_{1}$\")\n",
" plt.show()"
],
"metadata": {

@@ -83,6 +83,8 @@
{
"cell_type": "code",
"source": [
"!mkdir ./sample_data\n",
"\n",
"args = mnist1d.data.get_dataset_args()\n",
"data = mnist1d.data.get_dataset(args, path='./sample_data/mnist1d_data.pkl', download=False, regenerate=False)\n",
"\n",
@@ -136,7 +138,6 @@
"optimizer = torch.optim.SGD(model.parameters(), lr = 0.05, momentum=0.9)\n",
"# object that decreases learning rate by half every 10 epochs\n",
"scheduler = StepLR(optimizer, step_size=10, gamma=0.5)\n",
"# create 100 dummy data points and store in data loader class\n",
"x_train = torch.tensor(data['x'].astype('float32'))\n",
"y_train = torch.tensor(data['y'].transpose().astype('long'))\n",
"x_test= torch.tensor(data['x_test'].astype('float32'))\n",
@@ -235,4 +236,4 @@
}
}
]
}
}
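
The `!mkdir ./sample_data` line added in this file is a shell escape, so it relies on the notebook front end and reports an error when the directory already exists. A possible alternative, which is not what the commit does, is to create the directory from Python. The sketch reuses the `./sample_data` path from the hunk above.

```python
import os

data_dir = './sample_data'

# exist_ok=True makes the call a no-op if the directory is already there,
# so the cell can be re-run without an error message
os.makedirs(data_dir, exist_ok=True)
print(os.path.isdir(data_dir))
```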

@@ -92,7 +92,7 @@
{
"cell_type": "code",
"source": [
"# Draw the fitted function, together win uncertainty used to generate points\n",
"# Draw the fitted function, together with uncertainty used to generate points\n",
"def plot_function(x_func, y_func, x_data=None,y_data=None, x_model = None, y_model =None, sigma_func = None, sigma_model=None):\n",
"\n",
" fig,ax = plt.subplots()\n",
@@ -203,7 +203,7 @@
"# Closed form solution\n",
"beta, omega = fit_model_closed_form(x_data,y_data,n_hidden=3)\n",
"\n",
"# Get prediction for model across graph grange\n",
"# Get prediction for model across graph range\n",
"x_model = np.linspace(0,1,100);\n",
"y_model = network(x_model, beta, omega)\n",
"\n",
@@ -302,7 +302,7 @@
"sigma_func = 0.3\n",
"n_hidden = 5\n",
"\n",
"# Set random seed so that get same result every time\n",
"# Set random seed so that we get the same result every time\n",
"np.random.seed(1)\n",
"\n",
"for c_hidden in range(len(hidden_variables)):\n",
@@ -344,4 +344,4 @@
"outputs": []
}
]
}
}

@@ -124,7 +124,7 @@
" D_k = n_hidden # Hidden dimensions\n",
" D_o = 10 # Output dimensions\n",
"\n",
" # Define a model with two hidden layers of size 100\n",
" # Define a model with two hidden layers\n",
" # And ReLU activations between them\n",
" model = nn.Sequential(\n",
" nn.Linear(D_i, D_k),\n",
@@ -157,7 +157,6 @@
" optimizer = torch.optim.SGD(model.parameters(), lr = 0.01, momentum=0.9)\n",
"\n",
"\n",
" # create 100 dummy data points and store in data loader class\n",
" x_train = torch.tensor(data['x'].astype('float32'))\n",
" y_train = torch.tensor(data['y'].transpose().astype('long'))\n",
" x_test= torch.tensor(data['x_test'].astype('float32'))\n",
@@ -267,4 +266,4 @@
"outputs": []
}
]
}
}

@@ -224,7 +224,7 @@
{
"cell_type": "markdown",
"source": [
"You should see see that by the time we get to 300 dimensions most of the volume is in the outer 1 percent. <br><br>\n",
"You should see that by the time we get to 300 dimensions most of the volume is in the outer 1 percent. <br><br>\n",
"\n",
"The conclusion of all of this is that in high dimensions you should be sceptical of your intuitions about how things work. I have tried to visualize many things in one or two dimensions in the book, but you should also be sceptical about these visualizations!"
],
@@ -233,4 +233,4 @@
}
}
]
}
}

@@ -4,7 +4,6 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOR3WOJwfTlMD8eOLsPfPrz",
"include_colab_link": true
},
"kernelspec": {
@@ -140,7 +139,7 @@
" fig.set_size_inches(7,7)\n",
" ax.contourf(phi0mesh, phi1mesh, loss_function, 256, cmap=my_colormap);\n",
" ax.contour(phi0mesh, phi1mesh, loss_function, 20, colors=['#80808080'])\n",
" ax.set_xlabel('$\\phi_{0}$'); ax.set_ylabel('$\\phi_{1}$')\n",
" ax.set_xlabel(r'$\\phi_{0}$'); ax.set_ylabel(r'$\\phi_{1}$')\n",
"\n",
" if grad_path_typical_lr is not None:\n",
" ax.plot(grad_path_typical_lr[0,:], grad_path_typical_lr[1,:],'ro-')\n",
@@ -335,4 +334,4 @@
}
}
]
}
}

@@ -1,18 +1,16 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap09/9_4_Bayesian_Approach.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "el8l05WQEO46"
@@ -159,7 +157,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "i8T_QduzeBmM"
@@ -195,7 +192,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "JojV6ueRk49G"
@@ -211,7 +207,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "YX0O_Ciwp4W1"
@@ -277,7 +272,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "GjPnlG4q0UFK"
@@ -334,7 +328,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "GiNg5EroUiUb"
@@ -343,17 +336,16 @@
"Now we need to perform inference for a new data points $\\mathbf{x}^*$ with corresponding hidden values $\\mathbf{h}^*$. Instead of having a single estimate of the parameters, we have a distribution over the possible parameters. So we marginalize (integrate) over this distribution to account for all possible values:\n",
"\n",
"\\begin{align}\n",
"Pr(y^*|\\mathbf{x}^*) &=& \\int Pr(y^{*}|\\mathbf{x}^*,\\boldsymbol\\phi)Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) d\\boldsymbol\\phi\\\\\n",
"&=& \\int \\text{Norm}_{y^*}\\bigl[[\\mathbf{h}^{*T},1]\\boldsymbol\\phi,\\sigma^2\\bigr]\\cdot\\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr]d\\boldsymbol\\phi\\\\\n",
"&=& \\text{Norm}_{y^*}\\biggl[\\frac{1}{\\sigma^2} [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y}, [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\n",
"[\\mathbf{h}^*;1]\\biggr]\n",
"Pr(y^*|\\mathbf{x}^*) &= \\int Pr(y^{*}|\\mathbf{x}^*,\\boldsymbol\\phi)Pr(\\boldsymbol\\phi|\\{\\mathbf{x}_{i},\\mathbf{y}_{i}\\}) d\\boldsymbol\\phi\\\\\n",
"&= \\int \\text{Norm}_{y^*}\\bigl[[\\mathbf{h}^{*T},1]\\boldsymbol\\phi,\\sigma^2\\bigr]\\cdot\\text{Norm}_{\\boldsymbol\\phi}\\biggl[\\frac{1}{\\sigma^2}\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y},\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\biggr]d\\boldsymbol\\phi\\\\\n",
"&= \\text{Norm}_{y^*}\\biggl[\\frac{1}{\\sigma^2} [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\\mathbf{H}\\mathbf{y}, [\\mathbf{h}^{*T},1]\\left(\\frac{1}{\\sigma^2}\\mathbf{H}\\mathbf{H}^T+\\frac{1}{\\sigma_p^2}\\mathbf{I}\\right)^{-1}\n",
"[\\mathbf{h}^*;1]\\biggr],\n",
"\\end{align}\n",
"\n",
"where the notation $[\\mathbf{h}^{*T},1]$ is a row vector containing $\\mathbf{h}^{T}$ with a one appended to the end and $[\\mathbf{h};1 ]$ is a column vector containing $\\mathbf{h}$ with a one appended to the end.\n",
"\n",
"\n",
"\n",
"To compute this, we reformulated the integrand using the relations from appendices\n",
"C.3.3 and C.3.4 as the product of a normal distribution in $\\boldsymbol\\phi$ and a constant with respect\n",
"To compute this, we reformulated the integrand using the relations from appendices C.3.3 and C.3.4 as the product of a normal distribution in $\\boldsymbol\\phi$ and a constant with respect\n",
"to $\\boldsymbol\\phi$. The integral of the normal distribution must be one, and so the final result is just the constant. This constant is itself a normal distribution in $y^*$. <br>\n",
"\n",
"If you feel so inclined you can work through the math of this yourself.\n",
@@ -404,7 +396,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "8Hcbe_16sK0F"
@@ -419,9 +410,8 @@
],
"metadata": {
"colab": {
"authorship_tag": "ABX9TyMB8B4269DVmrcLoCWrhzKF",
"include_colab_link": true,
"provenance": []
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
@@ -433,4 +423,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}
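
The final line of the predictive distribution in the markdown cell above can be evaluated directly. The following is a rough sketch that transcribes the mean and variance exactly as they are written there. It assumes that $\mathbf{H}$ is stored with one column per training example and a final row of ones. The function name `predictive_mean_var` and all test values are hypothetical and do not come from the notebook.

```python
import numpy as np

def predictive_mean_var(H, y, h_star, sigma_sq, sigma_p_sq):
    """Mean and variance of Norm_{y*}[...] as written in the last line of the equation.

    H: (D+1, I) hidden activations with a final row of ones (assumed layout)
    y: (I,) training targets, h_star: (D,) hidden values for the new point
    """
    # A = (1/sigma^2) H H^T + (1/sigma_p^2) I
    A = H @ H.T / sigma_sq + np.eye(H.shape[0]) / sigma_p_sq
    A_inv = np.linalg.inv(A)
    h_aug = np.append(h_star, 1.0)          # [h*; 1]
    mean = h_aug @ A_inv @ H @ y / sigma_sq
    var = h_aug @ A_inv @ h_aug
    return mean, var

# Hypothetical noise-free data generated from known parameters
rng = np.random.default_rng(0)
H = np.vstack([rng.normal(size=(2, 20)), np.ones((1, 20))])
phi_true = np.array([1.0, -2.0, 0.5])
y = phi_true @ H
h_star = np.array([0.3, -0.7])

mean, var = predictive_mean_var(H, y, h_star, sigma_sq=0.01, sigma_p_sq=1e6)
print(mean, var)
```

With a very broad prior and noise-free targets, the predicted mean should land close to `phi_true @ [h*; 1]`, which gives a quick sanity check on the transcription.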

@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMLKg5ZmXqojcVrZD5BGm9g",
"authorship_tag": "ABX9TyP3VmRg51U+7NCfSYjRRrgv",
"include_colab_link": true
},
"kernelspec": {
@@ -267,8 +267,8 @@
" fig,ax = plt.subplots()\n",
" ax.plot(np.squeeze(x_in), np.squeeze(dydx), 'b-')\n",
" ax.set_xlim(-2,2)\n",
" ax.set_xlabel('Input, $x$')\n",
" ax.set_ylabel('Gradient, $dy/dx$')\n",
" ax.set_xlabel(r'Input, $x$')\n",
" ax.set_ylabel(r'Gradient, $dy/dx$')\n",
" ax.set_title('No layers = %d'%(K))\n",
" plt.show()"
],

@@ -4,7 +4,6 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMSk8qTqDYqFnRJVZKlsue0",
"include_colab_link": true
},
"kernelspec": {
@@ -147,9 +146,7 @@
" exp_values = np.exp(data_in) ;\n",
" # Sum over columns\n",
" denom = np.sum(exp_values, axis = 0);\n",
" # Replicate denominator to N rows\n",
" denom = np.matmul(np.ones((data_in.shape[0],1)), denom[np.newaxis,:])\n",
" # Compute softmax\n",
" # Compute softmax (numpy broadcasts denominator to all rows automatically)\n",
" softmax = exp_values / denom\n",
" # return the answer\n",
" return softmax"

@@ -128,7 +128,7 @@
{
"cell_type": "code",
"source": [
"draw_2D_heatmap(dist_mat,'Distance $|i-j|$', my_colormap)"
"draw_2D_heatmap(dist_mat,r'Distance $|i-j|$', my_colormap)"
],
"metadata": {
"id": "G0HFPBXyHT6V"
@@ -197,7 +197,7 @@
"cell_type": "code",
"source": [
"TP = np.array(opt.x).reshape(10,10)\n",
"draw_2D_heatmap(TP,'Transport plan $\\mathbf{P}$', my_colormap)"
"draw_2D_heatmap(TP,r'Transport plan $\\mathbf{P}$', my_colormap)"
],
"metadata": {
"id": "nZGfkrbRV_D0"

@@ -1,18 +1,16 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap17/17_2_Reparameterization_Trick.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "t9vk9Elugvmi"
@@ -40,7 +38,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "paLz5RukZP1J"
@@ -114,7 +111,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "r5Hl2QkimWx9"
@@ -139,13 +135,12 @@
"\n",
"fig,ax = plt.subplots()\n",
"ax.plot(phi_vals, expected_vals,'r-')\n",
"ax.set_xlabel('Parameter $\\phi$')\n",
"ax.set_ylabel('$\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
"ax.set_xlabel(r'Parameter $\\phi$')\n",
"ax.set_ylabel(r'$\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "zTCykVeWqj_O"
@@ -253,13 +248,12 @@
"\n",
"fig,ax = plt.subplots()\n",
"ax.plot(phi_vals, deriv_vals,'r-')\n",
"ax.set_xlabel('Parameter $\\phi$')\n",
"ax.set_ylabel('$\\partial/\\partial\\phi\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
"ax.set_xlabel(r'Parameter $\\phi$')\n",
"ax.set_ylabel(r'$\\partial/\\partial\\phi\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "ASu4yKSwAEYI"
@@ -269,7 +263,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "xoFR1wifc8-b"
@@ -366,13 +359,12 @@
"\n",
"fig,ax = plt.subplots()\n",
"ax.plot(phi_vals, deriv_vals,'r-')\n",
"ax.set_xlabel('Parameter $\\phi$')\n",
"ax.set_ylabel('$\\partial/\\partial\\phi\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
"ax.set_xlabel(r'Parameter $\\phi$')\n",
"ax.set_ylabel(r'$\\partial/\\partial\\phi\\mathbb{E}_{Pr(x|\\phi)}[f[x]]$')\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "1TWBiUC7bQSw"
@@ -403,7 +395,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "d-0tntSYdKPR"
@@ -415,9 +406,8 @@
],
"metadata": {
"colab": {
"authorship_tag": "ABX9TyOxO2/0DTH4n4zhC97qbagY",
"include_colab_link": true,
"provenance": []
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
@@ -429,4 +419,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}

Binary file not shown.

Binary file not shown.