Compare commits
10 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 9f2449fcde | | |
| | 025b677457 | | |
| | 435971e3e2 | | |
| | 6e76cb9b96 | | |
| | 732fc6f0b7 | | |
| | f2a3fab832 | | |
| | 8e3008673d | | |
| | 07bcc98a85 | | |
| | f4fa3e8397 | | |
| | 21cff37c72 | | |
@@ -4,7 +4,6 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOSb+W2AOFVQm8FZcHAb2Jq",
"include_colab_link": true
},
"kernelspec": {
@@ -199,7 +198,7 @@
{
"cell_type": "markdown",
"source": [
"The left is model output and the right is the model output after the sigmoid has been applied, so it now lies in the range [0,1] and represents the probability, that y=1. The black dots show the training data. We'll compute the the likelihood and the negative log likelihood."
"The left is model output and the right is the model output after the sigmoid has been applied, so it now lies in the range [0,1] and represents the probability, that y=1. The black dots show the training data. We'll compute the likelihood and the negative log likelihood."
],
"metadata": {
"id": "MvVX6tl9AEXF"
@@ -208,7 +207,7 @@
{
"cell_type": "code",
"source": [
"# Return probability under Bernoulli distribution for input x\n",
"# Return probability under Bernoulli distribution for observed class y\n",
"def bernoulli_distribution(y, lambda_param):\n",
" # TODO-- write in the equation for the Bernoulli distribution\n",
" # Equation 5.17 from the notes (you will need np.power)\n",
@@ -269,7 +268,7 @@
"source": [
"# Let's test this\n",
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
"# Use our neural network to predict the mean of the Gaussian\n",
"# Use our neural network to predict the Bernoulli parameter lambda\n",
"model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
"lambda_train = sigmoid(model_out)\n",
"# Compute the likelihood\n",
@@ -336,7 +335,7 @@
{
"cell_type": "markdown",
"source": [
"Now let's investigate finding the maximum likelihood / minimum negative log likelihood solution. For simplicity, we'll assume that all the parameters are fixed except one and look at how the likelihood and log likelihood change as we manipulate the last parameter. We'll start with overall y_offset, beta_1 (formerly phi_0)"
"Now let's investigate finding the maximum likelihood / minimum negative log likelihood solution. For simplicity, we'll assume that all the parameters are fixed except one and look at how the likelihood and negative log likelihood change as we manipulate the last parameter. We'll start with overall y_offset, beta_1 (formerly phi_0)"
],
"metadata": {
"id": "OgcRojvPWh4V"
@@ -359,7 +358,7 @@
" # Run the network with new parameters\n",
" model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
" lambda_train = sigmoid(model_out)\n",
" # Compute and store the three values\n",
" # Compute and store the two values\n",
" likelihoods[count] = compute_likelihood(y_train,lambda_train)\n",
" nlls[count] = compute_negative_log_likelihood(y_train, lambda_train)\n",
" # Draw the model for every 20th parameter setting\n",
@@ -378,7 +377,7 @@
{
"cell_type": "code",
"source": [
"# Now let's plot the likelihood, negative log likelihood, and least squares as a function the value of the offset beta1\n",
"# Now let's plot the likelihood and negative log likelihood as a function of the value of the offset beta1\n",
"fig, ax = plt.subplots()\n",
"fig.tight_layout(pad=5.0)\n",
"likelihood_color = 'tab:red'\n",
@@ -430,7 +429,7 @@
"source": [
"They both give the same answer. But you can see from the likelihood above that the likelihood is very small unless the parameters are almost correct. So in practice, we would work with the negative log likelihood.<br><br>\n",
"\n",
"Again, to fit the full neural model we would vary all of the 10 parameters of the network in the $\\boldsymbol\\beta_{0},\\boldsymbol\\omega_{0},\\boldsymbol\\beta_{1},\\boldsymbol\\omega_{1}$ until we find the combination that have the maximum likelihood / minimum negative log likelihood.<br><br>\n",
"Again, to fit the full neural model we would vary all of the 10 parameters of the network in the $\\boldsymbol\\beta_{0},\\boldsymbol\\Omega_{0},\\boldsymbol\\beta_{1},\\boldsymbol\\Omega_{1}$ until we find the combination that have the maximum likelihood / minimum negative log likelihood.<br><br>\n",
"\n"
],
"metadata": {
@@ -438,4 +437,4 @@
}
}
]
}
}
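Reviewer note: the hunks above only reword comments around the Bernoulli likelihood and the negative log likelihood. For readers who want to see the quantities those comments describe, here is a minimal sketch. It is not the notebook's solution: it uses plain Python lists and `math` rather than NumPy to stay dependency-free, and the example values `y_example` and `lambda_example` are made up for illustration.

```python
import math

# Hypothetical stand-ins for the notebook's functions, written for plain lists.
def bernoulli_distribution(y, lambda_param):
    # Pr(y | lambda) = (1 - lambda)^(1 - y) * lambda^y, for y in {0, 1}
    return (1.0 - lambda_param) ** (1 - y) * lambda_param ** y

def compute_likelihood(y_train, lambda_train):
    # Product of the per-example probabilities
    likelihood = 1.0
    for y, lam in zip(y_train, lambda_train):
        likelihood *= bernoulli_distribution(y, lam)
    return likelihood

def compute_negative_log_likelihood(y_train, lambda_train):
    # Negative sum of the per-example log probabilities
    return -sum(math.log(bernoulli_distribution(y, lam))
                for y, lam in zip(y_train, lambda_train))

# Made-up labels and predicted probabilities that y = 1
y_example = [0, 1, 1]
lambda_example = [0.2, 0.7, 0.9]
likelihood = compute_likelihood(y_example, lambda_example)
nll = compute_negative_log_likelihood(y_example, lambda_example)
print(likelihood, nll)
```

The negative log likelihood equals minus the log of the likelihood, so both criteria pick out the same parameters; the sum of logs is simply better behaved numerically than the product.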
@@ -1,18 +1,16 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap05/5_3_Multiclass_Cross_entropy_Loss.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "jSlFkICHwHQF"
@@ -142,7 +140,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "PsgLZwsPxauP"
@@ -209,13 +206,12 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "MvVX6tl9AEXF"
},
"source": [
"The left is model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probability, that y=0 (red), 1 (green) and 2 (blue) The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dots, and the blue curve to be high where there are blue dots We'll compute the the likelihood and the negative log likelihood."
"The left is model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probability, that y=0 (red), 1 (green) and 2 (blue). The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dots, and the blue curve to be high where there are blue dots We'll compute the the likelihood and the negative log likelihood."
]
},
{
@@ -226,7 +222,7 @@
},
"outputs": [],
"source": [
"# Return probability under Categorical distribution for input x\n",
"# Return probability under categorical distribution for observed class y\n",
"# Just take value from row k of lambda param where y =k,\n",
"def categorical_distribution(y, lambda_param):\n",
" return np.array([lambda_param[row, i] for i, row in enumerate (y)])"
@@ -248,7 +244,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "R5z_0dzQMF35"
@@ -286,7 +281,7 @@
"source": [
"# Let's test this\n",
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
"# Use our neural network to predict the mean of the Gaussian\n",
"# Use our neural network to predict the parameters of the categorical distribution\n",
"model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
"lambda_train = softmax(model_out)\n",
"# Compute the likelihood\n",
@@ -296,7 +291,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "HzphKgPfOvlk"
@@ -318,7 +312,7 @@
"source": [
"# Return the negative log likelihood of the data under the model\n",
"def compute_negative_log_likelihood(y_train, lambda_param):\n",
" # TODO -- compute the likelihood of the data -- don't use the likelihood function above -- compute the negative sum of the log probabilities\n",
" # TODO -- compute the negative log likelihood of the data -- don't use the likelihood function above -- compute the negative sum of the log probabilities\n",
" # You will need np.sum(), np.log()\n",
" # Replace the line below\n",
" nll = 0\n",
@@ -336,24 +330,23 @@
"source": [
"# Let's test this\n",
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
"# Use our neural network to predict the mean of the Gaussian\n",
"# Use our neural network to predict the parameters of the categorical distribution\n",
"model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
"# Pass the outputs through the softmax function\n",
"lambda_train = softmax(model_out)\n",
"# Compute the log likelihood\n",
"# Compute the negative log likelihood\n",
"nll = compute_negative_log_likelihood(y_train, lambda_train)\n",
"# Let's double check we get the right answer before proceeding\n",
"print(\"Correct answer = %9.9f, Your answer = %9.9f\"%(17.015457867,nll))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "OgcRojvPWh4V"
},
"source": [
"Now let's investigate finding the maximum likelihood / minimum log likelihood solution. For simplicity, we'll assume that all the parameters are fixed except one and look at how the likelihood and log likelihood change as we manipulate the last parameter. We'll start with overall y_offset, $\\beta_1$ (formerly $\\phi_0$)"
"Now let's investigate finding the maximum likelihood / minimum negative log likelihood solution. For simplicity, we'll assume that all the parameters are fixed except one and look at how the likelihood and negative log likelihood change as we manipulate the last parameter. We'll start with overall y_offset, $\\beta_1$ (formerly $\\phi_0$)"
]
},
{
@@ -378,7 +371,7 @@
" # Run the network with new parameters\n",
" model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
" lambda_train = softmax(model_out)\n",
" # Compute and store the three values\n",
" # Compute and store the two values\n",
" likelihoods[count] = compute_likelihood(y_train,lambda_train)\n",
" nlls[count] = compute_negative_log_likelihood(y_train, lambda_train)\n",
" # Draw the model for every 20th parameter setting\n",
@@ -397,7 +390,7 @@
},
"outputs": [],
"source": [
"# Now let's plot the likelihood, negative log likelihood, and least squares as a function the value of the offset beta1\n",
"# Now let's plot the likelihood and negative log likelihood as a function of the value of the offset beta1\n",
"fig, ax = plt.subplots()\n",
"fig.tight_layout(pad=5.0)\n",
"likelihood_color = 'tab:red'\n",
@@ -440,7 +433,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "771G8N1Vk5A2"
@@ -448,16 +440,15 @@
"source": [
"They both give the same answer. But you can see from the likelihood above that the likelihood is very small unless the parameters are almost correct. So in practice, we would work with the negative log likelihood.<br><br>\n",
"\n",
"Again, to fit the full neural model we would vary all of the 16 parameters of the network in the $\\boldsymbol\\beta_{0},\\boldsymbol\\omega_{0},\\boldsymbol\\beta_{1},\\boldsymbol\\omega_{1}$ until we find the combination that have the maximum likelihood / minimum negative log likelihood.<br><br>\n",
"Again, to fit the full neural model we would vary all of the 16 parameters of the network in the $\\boldsymbol\\beta_{0},\\boldsymbol\\Omega_{0},\\boldsymbol\\beta_{1},\\boldsymbol\\Omega_{1}$ until we find the combination that have the maximum likelihood / minimum negative log likelihood.<br><br>\n",
"\n"
]
}
],
"metadata": {
"colab": {
"authorship_tag": "ABX9TyOPv/l+ToaApJV7Nz+8AtpV",
"include_colab_link": true,
"provenance": []
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
@@ -469,4 +460,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}
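Reviewer note: the corrected comments in this notebook describe picking the probability of the observed class and summing negative log probabilities. A minimal sketch of that calculation follows. It is an illustration, not the notebook's solution: it uses nested lists instead of NumPy arrays, keeps the notebook's layout in which `lambda_param[k][i]` is the probability of class k for example i, and the values in `lambda_example` and `y_example` are made up.

```python
import math

def categorical_distribution(y, lambda_param):
    # For each example i, take the probability in row k = y[i], column i
    return [lambda_param[k][i] for i, k in enumerate(y)]

def compute_negative_log_likelihood(y_train, lambda_param):
    # Negative sum of the log probabilities of the observed classes
    return -sum(math.log(p) for p in categorical_distribution(y_train, lambda_param))

# Made-up softmax outputs: three classes (rows) by three examples (columns);
# each column sums to one.
lambda_example = [[0.7, 0.1, 0.2],
                  [0.2, 0.8, 0.3],
                  [0.1, 0.1, 0.5]]
y_example = [0, 1, 2]

probs = categorical_distribution(y_example, lambda_example)
nll = compute_negative_log_likelihood(y_example, lambda_example)
print(probs, nll)
```

Only the probability assigned to the observed class enters the loss, which is why the comment now reads "for observed class y" rather than "for input x".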
@@ -113,7 +113,7 @@
" b = 0.33\n",
" c = 0.66\n",
" d = 1.0\n",
" n_iter =0;\n",
" n_iter = 0\n",
"\n",
" # While we haven't found the minimum closely enough\n",
" while np.abs(b-c) > thresh and n_iter < max_iter:\n",
@@ -131,8 +131,7 @@
"\n",
" print('Iter %d, a=%3.3f, b=%3.3f, c=%3.3f, d=%3.3f'%(n_iter, a,b,c,d))\n",
"\n",
" # Rule #1 If the HEIGHT at point A is less the HEIGHT at points B, C, and D then halve values of B, C, and D\n",
" # i.e. bring them closer to the original point\n",
" # Rule #1 If the HEIGHT at point A is less than the HEIGHT at points B, C, and D then halve values of B, C, and D\n",
" # i.e. bring them closer to the original point\n",
" # TODO REPLACE THE BLOCK OF CODE BELOW WITH THIS RULE\n",
" if (0):\n",
@@ -140,7 +139,7 @@
"\n",
"\n",
" # Rule #2 If the HEIGHT at point b is less than the HEIGHT at point c then\n",
" # then point d becomes point c, and\n",
" # point d becomes point c, and\n",
" # point b becomes 1/3 between a and new d\n",
" # point c becomes 2/3 between a and new d\n",
" # TODO REPLACE THE BLOCK OF CODE BELOW WITH THIS RULE\n",
@@ -148,7 +147,7 @@
" continue;\n",
"\n",
" # Rule #3 If the HEIGHT at point c is less than the HEIGHT at point b then\n",
" # then point a becomes point b, and\n",
" # point a becomes point b, and\n",
" # point b becomes 1/3 between new a and d\n",
" # point c becomes 2/3 between new a and d\n",
" # TODO REPLACE THE BLOCK OF CODE BELOW WITH THIS RULE\n",
@@ -190,4 +189,4 @@
"outputs": []
}
]
}
}
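Reviewer note: the three bracketing rules whose comments are tidied above can be sketched as a small standalone function. This is a hedged illustration, not the notebook's code: `line_search` here takes a hypothetical one-dimensional `loss` callable, and Rule #1 pulls b, c and d halfway towards a, which is the same as halving them while a is still 0 but is an assumption once a has moved. Ties between the heights at b and c fall through to Rule #3.

```python
def line_search(loss, thresh=1e-4, max_iter=100):
    # Four points bracketing the search interval, as in the notebook
    a, b, c, d = 0.0, 0.33, 0.66, 1.0
    n_iter = 0
    # While we haven't found the minimum closely enough
    while abs(b - c) > thresh and n_iter < max_iter:
        n_iter += 1
        loss_a, loss_b, loss_c, loss_d = loss(a), loss(b), loss(c), loss(d)

        # Rule #1: a is lower than b, c and d, so bring b, c, d closer to a
        if loss_a < loss_b and loss_a < loss_c and loss_a < loss_d:
            b = a + (b - a) / 2
            c = a + (c - a) / 2
            d = a + (d - a) / 2
            continue

        if loss_b < loss_c:
            # Rule #2: point d becomes point c
            d = c
        else:
            # Rule #3: point a becomes point b
            a = b
        # In both cases b and c move to 1/3 and 2/3 of the new interval
        b = a + (d - a) / 3
        c = a + 2 * (d - a) / 3

    # Midpoint of the two interior points as the estimate of the minimum
    return (b + c) / 2

estimate = line_search(lambda x: (x - 0.4) ** 2)
print(estimate)
```

Rules #2 and #3 each shrink the interval to two thirds of its width, so the loop stops after a few dozen iterations for the default threshold.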
@@ -117,7 +117,7 @@
"id": "QU5mdGvpTtEG"
},
"source": [
"Now lets create compute the sum of squares loss for the training data"
"Now let's compute the sum of squares loss for the training data"
]
},
{
@@ -317,7 +317,7 @@
" b = 0.33 * max_dist\n",
" c = 0.66 * max_dist\n",
" d = 1.0 * max_dist\n",
" n_iter =0;\n",
" n_iter = 0\n",
"\n",
" # While we haven't found the minimum closely enough\n",
" while np.abs(b-c) > thresh and n_iter < max_iter:\n",
@@ -341,7 +341,7 @@
" continue;\n",
"\n",
" # Rule #2 If point b is less than point c then\n",
" # then point d becomes point c, and\n",
" # point d becomes point c, and\n",
" # point b becomes 1/3 between a and new d\n",
" # point c becomes 2/3 between a and new d\n",
" if lossb < lossc:\n",
@@ -351,7 +351,7 @@
" continue\n",
"\n",
" # Rule #2 If point c is less than point b then\n",
" # then point a becomes point b, and\n",
" # point a becomes point b, and\n",
" # point b becomes 1/3 between new a and d\n",
" # point c becomes 2/3 between new a and d\n",
" a = b\n",
@@ -53,7 +53,7 @@
},
"outputs": [],
"source": [
"# Let's create our training data 30 pairs {x_i, y_i}\n",
"# Let's create our training data of 30 pairs {x_i, y_i}\n",
"# We'll try to fit the Gabor model to these data\n",
"data = np.array([[-1.920e+00,-1.422e+01,1.490e+00,-1.940e+00,-2.389e+00,-5.090e+00,\n",
" -8.861e+00,3.578e+00,-6.010e+00,-6.995e+00,3.634e+00,8.743e-01,\n",
@@ -128,7 +128,7 @@
"id": "QU5mdGvpTtEG"
},
"source": [
"Now lets create compute the sum of squares loss for the training data"
"Now let's compute the sum of squares loss for the training data"
]
},
{
@@ -198,7 +198,7 @@
" b = np.floor(my_colormap_vals_dec - r * 256 *256 - g * 256)\n",
" my_colormap = ListedColormap(np.vstack((r,g,b)).transpose()/255.0)\n",
"\n",
" # Make grid of intercept/slope values to plot\n",
" # Make grid of offset/frequency values to plot\n",
" offsets_mesh, freqs_mesh = np.meshgrid(np.arange(-10,10.0,0.1), np.arange(2.5,22.5,0.1))\n",
" loss_mesh = np.zeros_like(freqs_mesh)\n",
" # Compute loss for every set of parameters\n",
@@ -343,7 +343,7 @@
" b = 0.33 * max_dist\n",
" c = 0.66 * max_dist\n",
" d = 1.0 * max_dist\n",
" n_iter =0;\n",
" n_iter = 0\n",
"\n",
" # While we haven't found the minimum closely enough\n",
" while np.abs(b-c) > thresh and n_iter < max_iter:\n",
@@ -367,7 +367,7 @@
" continue;\n",
"\n",
" # Rule #2 If point b is less than point c then\n",
" # then point d becomes point c, and\n",
" # point d becomes point c, and\n",
" # point b becomes 1/3 between a and new d\n",
" # point c becomes 2/3 between a and new d\n",
" if lossb < lossc:\n",
@@ -377,7 +377,7 @@
" continue\n",
"\n",
" # Rule #2 If point c is less than point b then\n",
" # then point a becomes point b, and\n",
" # point a becomes point b, and\n",
" # point b becomes 1/3 between new a and d\n",
" # point c becomes 2/3 between new a and d\n",
" a = b\n",
@@ -61,7 +61,7 @@
{
"cell_type": "code",
"source": [
"# Let's create our training data 30 pairs {x_i, y_i}\n",
"# Let's create our training data of 30 pairs {x_i, y_i}\n",
"# We'll try to fit the Gabor model to these data\n",
"data = np.array([[-1.920e+00,-1.422e+01,1.490e+00,-1.940e+00,-2.389e+00,-5.090e+00,\n",
" -8.861e+00,3.578e+00,-6.010e+00,-6.995e+00,3.634e+00,8.743e-01,\n",
@@ -137,7 +137,7 @@
{
"cell_type": "markdown",
"source": [
"Now lets compute the sum of squares loss for the training data and plot the loss function"
"Now let's compute the sum of squares loss for the training data and plot the loss function"
],
"metadata": {
"id": "QU5mdGvpTtEG"
@@ -160,7 +160,7 @@
" b = np.floor(my_colormap_vals_dec - r * 256 *256 - g * 256)\n",
" my_colormap = ListedColormap(np.vstack((r,g,b)).transpose()/255.0)\n",
"\n",
" # Make grid of intercept/slope values to plot\n",
" # Make grid of offset/frequency values to plot\n",
" offsets_mesh, freqs_mesh = np.meshgrid(np.arange(-10,10.0,0.1), np.arange(2.5,22.5,0.1))\n",
" loss_mesh = np.zeros_like(freqs_mesh)\n",
" # Compute loss for every set of parameters\n",
@@ -365,7 +365,6 @@
"\n",
" # Update the parameters\n",
" phi_all[:,c_step+1:c_step+2] = phi_all[:,c_step:c_step+1] - alpha * momentum\n",
" # Measure loss and draw model every 8th step\n",
"\n",
"loss = compute_loss(data[0,:], data[1,:], model, phi_all[:,c_step+1:c_step+2])\n",
"draw_model(data,model,phi_all[:,c_step+1], \"Iteration %d, loss = %f\"%(c_step+1,loss))\n",
@@ -387,4 +386,4 @@
}
}
]
}
}
@@ -110,7 +110,7 @@
" ax.plot(opt_path[0,:], opt_path[1,:],'-', color='#a0d9d3ff')\n",
" ax.plot(opt_path[0,:], opt_path[1,:],'.', color='#a0d9d3ff',markersize=10)\n",
" ax.set_xlabel(\"$\\phi_{0}$\")\n",
" ax.set_ylabel(\"$\\phi_1}$\")\n",
" ax.set_ylabel(\"$\\phi_{1}$\")\n",
" plt.show()"
],
"metadata": {
@@ -169,7 +169,7 @@
{
"cell_type": "markdown",
"source": [
"Because the function changes much faster in $\\phi_1$ than in $\\phi_0$, there is no great step size to choose. If we set the step size so that it makes sensible progress in the $\\phi_1$, then it takes many iterations to converge. If we set the step size tso that we make sensible progress in the $\\phi_{0}$ direction, then the path oscillates in the $\\phi_1$ direction. \n",
"Because the function changes much faster in $\\phi_1$ than in $\\phi_0$, there is no great step size to choose. If we set the step size so that it makes sensible progress in the $\\phi_1$ direction, then it takes many iterations to converge. If we set the step size so that we make sensible progress in the $\\phi_{0}$ direction, then the path oscillates in the $\\phi_1$ direction. \n",
"\n",
"This motivates Adam. At the core of Adam is the idea that we should just determine which way is downhill along each axis (i.e. left/right for $\\phi_0$ or up/down for $\\phi_1$) and move a fixed distance in that direction."
],
@@ -285,4 +285,4 @@
"outputs": []
}
]
}
}
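Reviewer note: the markdown cell corrected above describes the idea at the core of Adam as finding which way is downhill along each axis and moving a fixed distance in that direction. The sketch below illustrates only that idea, using the sign of each gradient component. It is not the full Adam algorithm (no moment estimates or bias correction), and the names `fixed_distance_descent`, `example_grad` and the quadratic loss are hypothetical choices made for this illustration.

```python
def sign(g):
    # Returns +1, 0 or -1 depending on the sign of g
    return (g > 0) - (g < 0)

def fixed_distance_descent(grad, phi, alpha, n_steps):
    # Move each parameter a fixed distance alpha in its own downhill direction
    path = [tuple(phi)]
    for _ in range(n_steps):
        g = grad(phi)
        phi = [p - alpha * sign(gi) for p, gi in zip(phi, g)]
        path.append(tuple(phi))
    return path

def example_grad(phi):
    # Gradient of the made-up loss phi_0^2 + 100 * phi_1^2,
    # which changes much faster in phi_1 than in phi_0
    return [2.0 * phi[0], 200.0 * phi[1]]

path = fixed_distance_descent(example_grad, [1.0, 1.0], alpha=0.05, n_steps=20)
print(path[-1])
```

Both coordinates advance by the same amount per step even though the gradient magnitudes differ by a factor of 100, which is the behaviour the text contrasts with plain gradient descent.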
@@ -4,7 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOdSkjfQnSZXnffGsZVM7r5",
"authorship_tag": "ABX9TyO/wJ4N9w01f04mmrs/ZSHY",
"include_colab_link": true
},
"kernelspec": {
@@ -185,10 +185,10 @@
"np.set_printoptions(precision=3)\n",
"output = graph_attention(X, omega, beta, phi, A);\n",
"print(\"Correct answer is:\")\n",
"print(\"[[1.796 1.346 0.569 1.703 1.298 1.224 1.24 1.234]\")\n",
"print(\" [0.768 0.672 0. 0.529 3.841 4.749 5.376 4.761]\")\n",
"print(\" [0.305 0.129 0. 0.341 0.785 1.014 1.113 1.024]\")\n",
"print(\" [0. 0. 0. 0. 0.35 0.864 1.098 0.871]]]\")\n",
"print(\"[[0. 0.028 0.37 0. 0.97 0. 0. 0.698]\")\n",
"print(\" [0. 0. 0. 0. 1.184 0. 2.654 0. ]\")\n",
"print(\" [1.13 0.564 0. 1.298 0.268 0. 0. 0.779]\")\n",
"print(\" [0.825 0. 0. 1.175 0. 0. 0. 0. ]]]\")\n",
"\n",
"\n",
"print(\"Your answer is:\")\n",
BIN
UDL_Errata.pdf
Binary file not shown.
@@ -15,8 +15,8 @@
<ul>
<li>
<p style="font-size: larger; margin-bottom: 0">Download full PDF <a
href="https://github.com/udlbook/udlbook/releases/download/v.1.20/UnderstandingDeepLearning_16_1_24_C.pdf">here</a>
</p>2024-01-16. CC-BY-NC-ND license<br>
href="https://github.com/udlbook/udlbook/releases/download/v2.00/UnderstandingDeepLearning_28_01_24_C.pdf">here</a>
</p>2024-01-28. CC-BY-NC-ND license<br>
<img src="https://img.shields.io/github/downloads/udlbook/udlbook/total" alt="download stats shield">
</li>
<li> Order your copy from <a href="https://mitpress.mit.edu/9780262048644/understanding-deep-learning/">here </a></li>