Compare commits
60 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 73fb6a2988 | | |
| | aa04c283e8 | | |
| | c56251df11 | | |
| | fa7005b29a | | |
| | 2e343bc652 | | |
| | 905d7d1ac4 | | |
| | 5f8f05a381 | | |
| | 2eb8eebf70 | | |
| | 73c3fcc40b | | |
| | aa9c613167 | | |
| | 2ec1f42a80 | | |
| | e399f14a82 | | |
| | 96049aabcb | | |
| | 912cc890df | | |
| | 88501605df | | |
| | 16ef8a7333 | | |
| | aacf54fb8b | | |
| | 40fc192198 | | |
| | de1e19ace9 | | |
| | 5e701faf90 | | |
| | edc78dc659 | | |
| | 5300392d66 | | |
| | 4696eee641 | | |
| | 3258300849 | | |
| | 5ba36dd1e8 | | |
| | d7750430f7 | | |
| | e184e09b28 | | |
| | 6cfd494ed8 | | |
| | 986b51bdbd | | |
| | 9a9321d923 | | |
| | 714c58bbf3 | | |
| | a8ea2b429f | | |
| | fefef63df4 | | |
| | 193e2329f2 | | |
| | 9b13823ca8 | | |
| | 685d910bbc | | |
| | 4429600bcc | | |
| | 6b76bbc7c3 | | |
| | a5d98bb379 | | |
| | 428ca727db | | |
| | 6c8411ae1c | | |
| | c951720282 | | |
| | 79578aa4a1 | | |
| | 6b2f25101e | | |
| | ef28d848df | | |
| | e03fad482b | | |
| | 4fc1abc20e | | |
| | aea371dc7d | | |
| | 36d2695a41 | | |
| | 7a5113de21 | | |
| | bf7f511ee9 | | |
| | a7af9f559e | | |
| | 866861a06c | | |
| | 2cfbcafedc | | |
| | 58a150843f | | |
| | ffe7ffc823 | | |
| | da3a5ad2e9 | | |
| | 8411fdd1d2 | | |
| | 362d8838e8 | | |
| | 718cfba4dc | | |
@@ -105,7 +105,7 @@
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"\n",
|
||||
"# TODO Create a model with the folowing layers\n",
|
||||
"# TODO Create a model with the following layers\n",
|
||||
"# 1. Convolutional layer, (input=length 40 and 1 channel, kernel size 3x3, stride 2, padding=\"valid\", 15 output channels ) \n",
|
||||
"# 2. ReLU\n",
|
||||
"# 3. Convolutional layer, (input=length 19 and 15 channels, kernel size 3x3, stride 2, padding=\"valid\", 15 output channels )\n",
|
||||
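The lengths quoted in this TODO list (40 in, 19 after the first layer) are consistent with the usual output-length rule for an unpadded ("valid") convolution without dilation: `(length - kernel_size) // stride + 1`. A minimal sketch of that arithmetic follows; the helper name `conv_output_length` is hypothetical and is not part of the notebook.

```python
def conv_output_length(length, kernel_size, stride):
    """Output length of a 1D convolution with padding="valid" and no dilation."""
    return (length - kernel_size) // stride + 1

# Layer 1: input length 40, kernel size 3, stride 2
first = conv_output_length(40, kernel_size=3, stride=2)
# Layer 3: the same kernel size and stride applied to the previous output
second = conv_output_length(first, kernel_size=3, stride=2)

print(first, second)  # 19 9
```

Chaining the helper like this is a quick way to check the input size each layer should expect before writing the model.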
@@ -120,7 +120,7 @@
|
||||
"# https://pytorch.org/docs/1.13/generated/torch.nn.Linear.html?highlight=linear#torch.nn.Linear\n",
|
||||
"\n",
|
||||
"# Replace the following function which just runs a standard fully connected network\n",
|
||||
"# The flatten at the beginning is becuase we are passing in the data in a slightly different format.\n",
|
||||
"# The flatten at the beginning is because we are passing in the data in a slightly different format.\n",
|
||||
"model = nn.Sequential(\n",
|
||||
"nn.Flatten(),\n",
|
||||
"nn.Linear(40, 100),\n",
|
||||
|
||||
@@ -148,7 +148,7 @@
|
||||
"# 8. A flattening operation\n",
|
||||
"# 9. A fully connected layer mapping from (whatever dimensions we are at-- find out using .shape) to 50 \n",
|
||||
"# 10. A ReLU\n",
|
||||
"# 11. A fully connected layer mappiing from 50 to 10 dimensions\n",
|
||||
"# 11. A fully connected layer mapping from 50 to 10 dimensions\n",
|
||||
"# 12. A softmax function.\n",
|
||||
"\n",
|
||||
"# Replace this class which implements a minimal network (which still does okay)\n",
|
||||
|
||||
@@ -32,7 +32,7 @@
|
||||
"source": [
|
||||
"# Gradients II: Backpropagation algorithm\n",
|
||||
"\n",
|
||||
"In this practical, we'll investigate the backpropagation algoritithm. This computes the gradients of the loss with respect to all of the parameters (weights and biases) in the network. We'll use these gradients when we run stochastic gradient descent."
|
||||
"In this practical, we'll investigate the backpropagation algorithm. This computes the gradients of the loss with respect to all of the parameters (weights and biases) in the network. We'll use these gradients when we run stochastic gradient descent."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "L6chybAVFJW2"
|
||||
@@ -53,7 +53,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"First let's define a neural network. We'll just choose the weights and biaes randomly for now"
|
||||
"First let's define a neural network. We'll just choose the weights and biases randomly for now"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "nnUoI0m6GyjC"
|
||||
@@ -178,7 +178,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Now let's define a loss function. We'll just use the least squaures loss function. We'll also write a function to compute dloss_doutpu"
|
||||
"Now let's define a loss function. We'll just use the least squares loss function. We'll also write a function to compute dloss_doutpu"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "SxVTKp3IcoBF"
|
||||
|
||||
@@ -53,7 +53,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"First let's define a neural network. We'll just choose the weights and biaes randomly for now"
|
||||
"First let's define a neural network. We'll just choose the weights and biases randomly for now"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "nnUoI0m6GyjC"
|
||||
@@ -204,7 +204,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Now let's define a loss function. We'll just use the least squaures loss function. We'll also write a function to compute dloss_doutput\n"
|
||||
"Now let's define a loss function. We'll just use the least squares loss function. We'll also write a function to compute dloss_doutput\n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "SxVTKp3IcoBF"
|
||||
|
||||
@@ -176,7 +176,7 @@
|
||||
"# Color represents y value (brighter = higher value)\n",
|
||||
"# Black = -10 or less, White = +10 or more\n",
|
||||
"# 0 = mid orange\n",
|
||||
"# Lines are conoturs where value is equal\n",
|
||||
"# Lines are contours where value is equal\n",
|
||||
"draw_2D_function(x1,x2,y)\n",
|
||||
"\n",
|
||||
"# TODO\n",
|
||||
|
||||
@@ -215,7 +215,7 @@
|
||||
"# Color represents y value (brighter = higher value)\n",
|
||||
"# Black = -10 or less, White = +10 or more\n",
|
||||
"# 0 = mid orange\n",
|
||||
"# Lines are conoturs where value is equal\n",
|
||||
"# Lines are contours where value is equal\n",
|
||||
"draw_2D_function(x1,x2,y)\n",
|
||||
"\n",
|
||||
"# TODO\n",
|
||||
|
||||
@@ -36,7 +36,7 @@
|
||||
"\n",
|
||||
"We'll compute loss functions for maximum likelihood, minimum negative log likelihood, and least squares and show that they all imply that we should use the same parameter values\n",
|
||||
"\n",
|
||||
"In part II, we'll investigate binary classification (where the output data is 0 or 1). This will be based on the Bernouilli distribution\n",
|
||||
"In part II, we'll investigate binary classification (where the output data is 0 or 1). This will be based on the Bernoulli distribution\n",
|
||||
"\n",
|
||||
"In part III we'll investigate multiclass classification (where the output data is 0,1, or, 2). This will be based on the categorical distribution."
|
||||
],
|
||||
@@ -178,7 +178,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"The blue line i sthe mean prediction of the model and the gray area represents plus/minus two standardard deviations. This model fits okay, but could be improved. Let's compute the loss. We'll compute the the least squares error, the likelihood, the negative log likelihood."
|
||||
"The blue line is the mean prediction of the model and the gray area represents plus/minus two standard deviations. This model fits okay, but could be improved. Let's compute the loss. We'll compute the least squares error, the likelihood, and the negative log likelihood."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "MvVX6tl9AEXF"
|
||||
@@ -276,7 +276,7 @@
|
||||
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
|
||||
"# Use our neural network to predict the mean of the Gaussian\n",
|
||||
"mu_pred = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
|
||||
"# Set the standard devation to something reasonable\n",
|
||||
"# Set the standard deviation to something reasonable\n",
|
||||
"sigma = 0.2\n",
|
||||
"# Compute the likelihood\n",
|
||||
"likelihood = compute_likelihood(y_train, mu_pred, sigma)\n",
|
||||
@@ -292,7 +292,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of sveral probabilities, which are all quite small themselves.\n",
|
||||
"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of several probabilities, which are all quite small themselves.\n",
|
||||
"This will get out of hand pretty quickly with real datasets -- the likelihood will get so small that we can't represent it with normal finite-precision math\n",
|
||||
"\n",
|
||||
"This is why we use negative log likelihood"
|
||||
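The underflow described in this cell can be reproduced with the standard library alone. The sketch below uses made-up per-point probabilities (2000 points, each with probability 0.02); these values are assumptions for illustration and are not taken from the notebook's dataset.

```python
import math

# Hypothetical per-point probabilities; not the notebook's data
probs = [0.02] * 2000

# Likelihood: the product of the individual probabilities
likelihood = 1.0
for p in probs:
    likelihood *= p

# Negative log likelihood: the negated sum of the log probabilities
nll = -sum(math.log(p) for p in probs)

print(likelihood)  # the product underflows to 0.0 in double precision
print(nll)         # the sum of logs stays finite
```

The product carries no usable information once it reaches zero, whereas the sum of logs can still be compared across parameter settings.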
@@ -326,7 +326,7 @@
|
||||
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
|
||||
"# Use our neural network to predict the mean of the Gaussian\n",
|
||||
"mu_pred = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
|
||||
"# Set the standard devation to something reasonable\n",
|
||||
"# Set the standard deviation to something reasonable\n",
|
||||
"sigma = 0.2\n",
|
||||
"# Compute the log likelihood\n",
|
||||
"nll = compute_negative_log_likelihood(y_train, mu_pred, sigma)\n",
|
||||
@@ -397,7 +397,7 @@
|
||||
"source": [
|
||||
"# Define a range of values for the parameter\n",
|
||||
"beta_1_vals = np.arange(0,1.0,0.01)\n",
|
||||
"# Create some arrays to store the likelihoods, negative log likehoos and sum of squares\n",
|
||||
"# Create some arrays to store the likelihoods, negative log likelihoods and sum of squares\n",
|
||||
"likelihoods = np.zeros_like(beta_1_vals)\n",
|
||||
"nlls = np.zeros_like(beta_1_vals)\n",
|
||||
"sum_squares = np.zeros_like(beta_1_vals)\n",
|
||||
@@ -482,7 +482,7 @@
|
||||
"source": [
|
||||
"# Define a range of values for the parameter\n",
|
||||
"sigma_vals = np.arange(0.1,0.5,0.005)\n",
|
||||
"# Create some arrays to store the likelihoods, negative log likehoos and sum of squares\n",
|
||||
"# Create some arrays to store the likelihoods, negative log likelihoods and sum of squares\n",
|
||||
"likelihoods = np.zeros_like(sigma_vals)\n",
|
||||
"nlls = np.zeros_like(sigma_vals)\n",
|
||||
"sum_squares = np.zeros_like(sigma_vals)\n",
|
||||
|
||||
@@ -34,7 +34,7 @@
|
||||
"\n",
|
||||
"This practical investigates loss functions. In part I we investigated univariate regression (where the output data $y$ is continuous. Our formulation was based on the normal/Gaussian distribution.\n",
|
||||
"\n",
|
||||
"In this notebook, we investigate binary classification (where the output data is 0 or 1). This will be based on the Bernouilli distribution\n",
|
||||
"In this notebook, we investigate binary classification (where the output data is 0 or 1). This will be based on the Bernoulli distribution\n",
|
||||
"\n",
|
||||
"In part III we'll investigate multiclass classification (where the outputs data can take multiple values 1,... K.\n",
|
||||
"\n",
|
||||
@@ -199,7 +199,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"The left is model output and the right is the model output after the sigmoid has been applied, so it now lies in the range [0,1] and represents the probabiilty, that y=1. The black dots show the training data. We'll compute the the likelihood and the negative log likelihood."
|
||||
"The left is model output and the right is the model output after the sigmoid has been applied, so it now lies in the range [0,1] and represents the probability that y=1. The black dots show the training data. We'll compute the likelihood and the negative log likelihood."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "MvVX6tl9AEXF"
|
||||
@@ -210,7 +210,7 @@
|
||||
"source": [
|
||||
"# Return probability under Bernoulli distribution for input x\n",
|
||||
"def bernoulli_distribution(y, lambda_param):\n",
|
||||
" # TODO-- write in the equation for the Bernoullid distribution \n",
|
||||
" # TODO-- write in the equation for the Bernoulli distribution \n",
|
||||
" # Equation 5.17 from the notes (you will need np.power)\n",
|
||||
" # Replace the line below\n",
|
||||
" prob = np.zeros_like(y)\n",
|
||||
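For reference, the standard Bernoulli probability mass function is `lambda**y * (1 - lambda)**(1 - y)` for `y` in {0, 1}. The sketch below is a scalar, standard-library version under that definition; whether Equation 5.17 writes it in exactly this form is an assumption, the name `bernoulli_prob` is hypothetical, and the notebook's own function is expected to use `np.power` on arrays.

```python
import math

def bernoulli_prob(y, lambda_param):
    """Probability of a binary outcome y (0 or 1) when P(y=1) = lambda_param."""
    return math.pow(lambda_param, y) * math.pow(1.0 - lambda_param, 1 - y)

# With lambda = 0.3, y=1 has probability 0.3 and y=0 has probability 0.7
print(bernoulli_prob(1, 0.3), bernoulli_prob(0, 0.3))
```

The exponents simply select one factor: for `y=1` the second factor is raised to the power zero, and for `y=0` the first one is.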
@@ -249,7 +249,7 @@
|
||||
"source": [
|
||||
"# Return the likelihood of all of the data under the model\n",
|
||||
"def compute_likelihood(y_train, lambda_param):\n",
|
||||
" # TODO -- compute the likelihood of the data -- the product of the Bernoullis probabilities for each data point\n",
|
||||
" # TODO -- compute the likelihood of the data -- the product of the Bernoulli's probabilities for each data point\n",
|
||||
" # Top line of equation 5.3 in the notes\n",
|
||||
" # You will need np.prod() and the bernoulli_distribution function you used above\n",
|
||||
" # Replace the line below\n",
|
||||
@@ -284,7 +284,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of sveral probabilities, which are all quite small themselves.\n",
|
||||
"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of several probabilities, which are all quite small themselves.\n",
|
||||
"This will get out of hand pretty quickly with real datasets -- the likelihood will get so small that we can't represent it with normal finite-precision math\n",
|
||||
"\n",
|
||||
"This is why we use negative log likelihood"
|
||||
@@ -317,7 +317,7 @@
|
||||
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
|
||||
"# Use our neural network to predict the mean of the Gaussian\n",
|
||||
"model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
|
||||
"# Set the standard devation to something reasonable\n",
|
||||
"# Set the standard deviation to something reasonable\n",
|
||||
"lambda_train = sigmoid(model_out)\n",
|
||||
"# Compute the log likelihood\n",
|
||||
"nll = compute_negative_log_likelihood(y_train, lambda_train)\n",
|
||||
@@ -362,7 +362,7 @@
|
||||
"source": [
|
||||
"# Define a range of values for the parameter\n",
|
||||
"beta_1_vals = np.arange(-2,6.0,0.1)\n",
|
||||
"# Create some arrays to store the likelihoods, negative log likehoods\n",
|
||||
"# Create some arrays to store the likelihoods, negative log likelihoods\n",
|
||||
"likelihoods = np.zeros_like(beta_1_vals)\n",
|
||||
"nlls = np.zeros_like(beta_1_vals)\n",
|
||||
"\n",
|
||||
|
||||
@@ -33,7 +33,7 @@
|
||||
"# Loss functions part III\n",
|
||||
"\n",
|
||||
"This practical investigates loss functions. In part I we investigated univariate regression (where the output data $y$ is continuous. Our formulation was based on the normal/Gaussian distribution.\n",
|
||||
"In part II we investigated binary classification (where the output data is 0 or 1). This will be based on the Bernouilli distribution.<br><br>\n",
|
||||
"In part II we investigated binary classification (where the output data is 0 or 1). This will be based on the Bernoulli distribution.<br><br>\n",
|
||||
"\n",
|
||||
"Now we'll investigate multiclass classification (where the outputs data can take multiple values 1,... K, which is based on the categorical distribution\n",
|
||||
"\n",
|
||||
@@ -218,7 +218,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"The left is model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probabiilty, that y=0 (red), 1 (green) and 2 (blue) The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dotsmand the blue curve to be high where there are blue dots We'll compute the the likelihood and the negative log likelihood."
|
||||
"The left is model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probability that y=0 (red), 1 (green) and 2 (blue). The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dots, and the blue curve to be high where there are blue dots. We'll compute the likelihood and the negative log likelihood."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "MvVX6tl9AEXF"
|
||||
@@ -228,7 +228,7 @@
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Return probability under Bernoulli distribution for input x\n",
|
||||
"# Complicated code to commpute it but just take value from row k of lambda param where y =k, \n",
|
||||
"# Complicated code to compute it but just take value from row k of lambda param where y =k, \n",
|
||||
"def categorical_distribution(y, lambda_param):\n",
|
||||
" prob = np.zeros_like(y)\n",
|
||||
" for row_index in range(lambda_param.shape[0]):\n",
|
||||
@@ -305,7 +305,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of sveral probabilities, which are all quite small themselves.\n",
|
||||
"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of several probabilities, which are all quite small themselves.\n",
|
||||
"This will get out of hand pretty quickly with real datasets -- the likelihood will get so small that we can't represent it with normal finite-precision math\n",
|
||||
"\n",
|
||||
"This is why we use negative log likelihood"
|
||||
@@ -338,7 +338,7 @@
|
||||
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
|
||||
"# Use our neural network to predict the mean of the Gaussian\n",
|
||||
"model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
|
||||
"# Set the standard devation to something reasonable\n",
|
||||
"# Set the standard deviation to something reasonable\n",
|
||||
"lambda_train = softmax(model_out)\n",
|
||||
"# Compute the log likelihood\n",
|
||||
"nll = compute_negative_log_likelihood(y_train, lambda_train)\n",
|
||||
@@ -365,7 +365,7 @@
|
||||
"source": [
|
||||
"# Define a range of values for the parameter\n",
|
||||
"beta_1_vals = np.arange(-2,6.0,0.1)\n",
|
||||
"# Create some arrays to store the likelihoods, negative log likehoods\n",
|
||||
"# Create some arrays to store the likelihoods, negative log likelihoods\n",
|
||||
"likelihoods = np.zeros_like(beta_1_vals)\n",
|
||||
"nlls = np.zeros_like(beta_1_vals)\n",
|
||||
"\n",
|
||||
|
||||
@@ -233,7 +233,7 @@
|
||||
"# TODO\n",
|
||||
"# 1. Predict what effect changing phi_0 will have on the network. \n",
|
||||
"# Answer:\n",
|
||||
"# 2. Predict what effect multplying phi_1, phi_2, phi_3 by 0.5 would have. Check if you are correct\n",
|
||||
"# 2. Predict what effect multiplying phi_1, phi_2, phi_3 by 0.5 would have. Check if you are correct\n",
|
||||
"# Answer:\n",
|
||||
"# 3. Predict what effect multiplying phi_1 by -1 will have. Check if you are correct.\n",
|
||||
"# Answer:\n",
|
||||
@@ -500,7 +500,7 @@
|
||||
"print(\"Loss = %3.3f\"%(loss))\n",
|
||||
"\n",
|
||||
"# TODO. Manipulate the parameters (by hand!) to make the function \n",
|
||||
"# fit the data better and try to reduct the loss to as small a number \n",
|
||||
"# fit the data better and try to reduce the loss to as small a number \n",
|
||||
"# as possible. The best that I could do was 0.181\n",
|
||||
"# Tip... start by manipulating phi_0.\n",
|
||||
"# It's not that easy, so don't spend too much time on this!"
|
||||
|
||||
@@ -108,7 +108,7 @@
|
||||
"source": [
|
||||
"def line_search(loss_function, thresh=.0001, max_iter = 10, draw_flag = False):\n",
|
||||
"\n",
|
||||
" # Initialize four points along the rnage we are going to search\n",
|
||||
" # Initialize four points along the range we are going to search\n",
|
||||
" a = 0\n",
|
||||
" b = 0.33\n",
|
||||
" c = 0.66\n",
|
||||
@@ -139,7 +139,7 @@
|
||||
" # Rule #2 If point b is less than point c then\n",
|
||||
" # then point d becomes point c, and\n",
|
||||
" # point b becomes 1/3 between a and new d\n",
|
||||
" # point c beocome 2/3 between a and new d \n",
|
||||
" # point c becomes 2/3 between a and new d \n",
|
||||
" # TODO REPLACE THE BLOCK OF CODE BELOW WITH THIS RULE\n",
|
||||
" if (0):\n",
|
||||
" continue;\n",
|
||||
@@ -147,7 +147,7 @@
|
||||
" # Rule #3 If point c is less than point b then\n",
|
||||
" # then point a becomes point b, and\n",
|
||||
" # point b becomes 1/3 between new a and d\n",
|
||||
" # point c beocome 2/3 between new a and d \n",
|
||||
" # point c becomes 2/3 between new a and d \n",
|
||||
" # TODO REPLACE THE BLOCK OF CODE BELOW WITH THIS RULE\n",
|
||||
" if(0):\n",
|
||||
" continue\n",
|
||||
|
||||
@@ -114,7 +114,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Initialize the parmaeters and draw the model\n",
|
||||
"# Initialize the parameters and draw the model\n",
|
||||
"phi = np.zeros((2,1))\n",
|
||||
"phi[0] = 0.6 # Intercept\n",
|
||||
"phi[1] = -0.2 # Slope\n",
|
||||
@@ -314,7 +314,7 @@
|
||||
" return compute_loss(data[0,:], data[1,:], model, phi_start+ gradient * dist_prop)\n",
|
||||
"\n",
|
||||
"def line_search(data, model, phi, gradient, thresh=.00001, max_dist = 0.1, max_iter = 15, verbose=False):\n",
|
||||
" # Initialize four points along the rnage we are going to search\n",
|
||||
" # Initialize four points along the range we are going to search\n",
|
||||
" a = 0\n",
|
||||
" b = 0.33 * max_dist\n",
|
||||
" c = 0.66 * max_dist\n",
|
||||
@@ -345,7 +345,7 @@
|
||||
" # Rule #2 If point b is less than point c then\n",
|
||||
" # then point d becomes point c, and\n",
|
||||
" # point b becomes 1/3 between a and new d\n",
|
||||
" # point c beocome 2/3 between a and new d \n",
|
||||
" # point c becomes 2/3 between a and new d \n",
|
||||
" if lossb < lossc:\n",
|
||||
" d = c\n",
|
||||
" b = a+ (d-a)/3\n",
|
||||
@@ -355,7 +355,7 @@
|
||||
" # Rule #3 If point c is less than point b then\n",
|
||||
" # then point a becomes point b, and\n",
|
||||
" # point b becomes 1/3 between new a and d\n",
|
||||
" # point c beocome 2/3 between new a and d \n",
|
||||
" # point c becomes 2/3 between new a and d \n",
|
||||
" a = b\n",
|
||||
" b = a+ (d-a)/3\n",
|
||||
" c = a+ 2*(d-a)/3\n",
|
||||
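The two bracketing rules in the comments above (keep `[a, c]` when the loss at `b` is lower, otherwise keep `[b, d]`) amount to a ternary-style search. The following is a simplified, self-contained sketch on a 1D function, assuming the loss is unimodal on the search range. The function name, the starting range, and the stopping test are assumptions, and any further rule the notebook applies that is not visible in these hunks is omitted.

```python
def bracket_line_search(loss, max_dist=1.0, thresh=1e-5, max_iter=50):
    """Shrink the bracket [a, d] around the minimum of a unimodal 1D loss."""
    a, d = 0.0, max_dist
    b = a + (d - a) / 3
    c = a + 2 * (d - a) / 3
    for _ in range(max_iter):
        if d - a < thresh:
            break
        if loss(b) < loss(c):
            # Minimum lies in [a, c]: point d becomes point c
            d = c
        else:
            # Minimum lies in [b, d]: point a becomes point b
            a = b
        # Re-place b and c at 1/3 and 2/3 of the new bracket
        b = a + (d - a) / 3
        c = a + 2 * (d - a) / 3
    return (a + d) / 2

# Usage on a hypothetical quadratic whose minimum is at 0.3
best = bracket_line_search(lambda x: (x - 0.3) ** 2)
print(round(best, 3))
```

Each iteration discards one third of the bracket, so the width shrinks geometrically by a factor of 2/3 per step.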
|
||||
@@ -340,7 +340,7 @@
|
||||
" return compute_loss(data[0,:], data[1,:], model, phi_start+ gradient * dist_prop)\n",
|
||||
"\n",
|
||||
"def line_search(data, model, phi, gradient, thresh=.00001, max_dist = 0.1, max_iter = 15, verbose=False):\n",
|
||||
" # Initialize four points along the rnage we are going to search\n",
|
||||
" # Initialize four points along the range we are going to search\n",
|
||||
" a = 0\n",
|
||||
" b = 0.33 * max_dist\n",
|
||||
" c = 0.66 * max_dist\n",
|
||||
@@ -371,7 +371,7 @@
|
||||
" # Rule #2 If point b is less than point c then\n",
|
||||
" # then point d becomes point c, and\n",
|
||||
" # point b becomes 1/3 between a and new d\n",
|
||||
" # point c beocome 2/3 between a and new d \n",
|
||||
" # point c becomes 2/3 between a and new d \n",
|
||||
" if lossb < lossc:\n",
|
||||
" d = c\n",
|
||||
" b = a+ (d-a)/3\n",
|
||||
@@ -381,7 +381,7 @@
|
||||
" # Rule #3 If point c is less than point b then\n",
|
||||
" # then point a becomes point b, and\n",
|
||||
" # point b becomes 1/3 between new a and d\n",
|
||||
" # point c beocome 2/3 between new a and d \n",
|
||||
" # point c becomes 2/3 between new a and d \n",
|
||||
" a = b\n",
|
||||
" b = a+ (d-a)/3\n",
|
||||
" c = a+ 2*(d-a)/3\n",
|
||||
|
||||
@@ -175,7 +175,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# TODO Modify the code below by changeing the number of tokens generated and the initial sentence\n",
|
||||
"# TODO Modify the code below by changing the number of tokens generated and the initial sentence\n",
|
||||
"# to get a feel for how well this works. Since I didn't reset the seed, it will give a different\n",
|
||||
"# answer every time that you run it.\n",
|
||||
"\n",
|
||||
@@ -253,7 +253,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# TODO Modify the code below by changeing the number of tokens generated and the initial sentence\n",
|
||||
"# TODO Modify the code below by changing the number of tokens generated and the initial sentence\n",
|
||||
"# to get a feel for how well this works. \n",
|
||||
"\n",
|
||||
"# TODO Experiment with changing this line:\n",
|
||||
@@ -471,7 +471,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# This routine reutnrs the k'th most likely next token.\n",
|
||||
"# This routine returns the k'th most likely next token.\n",
|
||||
"# If k =0 then it returns the most likely token, if k=1 it returns the next most likely and so on\n",
|
||||
"# We will need this for beam search\n",
|
||||
"def get_kth_most_likely_token(input_tokens, model, tokenizer, k):\n",
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"authorship_tag": "ABX9TyNNnZyVCX9glFJGIC8BwtVT",
|
||||
"authorship_tag": "ABX9TyMrWYwQrwgJvDza1vhYK9WQ",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
@@ -139,7 +139,7 @@
|
||||
"source": [
|
||||
"def volume_of_hypersphere(diameter, dimensions):\n",
|
||||
" # Formula given in Problem 8.7 of the notes\n",
|
||||
" # You will need sci.special.gamma()\n",
|
||||
" # You will need sci.gamma()\n",
|
||||
" # Check out: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.gamma.html\n",
|
||||
" # Also use this value for pi\n",
|
||||
" pi = np.pi\n",
|
||||
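As a cross-check for this exercise, the standard closed form for the volume of a D-dimensional ball of radius r is `r**D * pi**(D/2) / Gamma(D/2 + 1)`. The sketch below uses that form with `math.gamma` from the standard library swapped in for the SciPy gamma function that the notebook comment points to. The name `hypersphere_volume` is hypothetical, and whether Problem 8.7 writes the formula in this exact notation is an assumption.

```python
import math

def hypersphere_volume(diameter, dimensions):
    """Volume of a ball with the given diameter in the given number of dimensions."""
    radius = diameter / 2.0
    return (radius ** dimensions) * math.pi ** (dimensions / 2.0) / math.gamma(dimensions / 2.0 + 1.0)

# Sanity checks against familiar cases for a unit radius (diameter 2)
print(hypersphere_volume(2.0, 2))  # area of a unit circle, pi
print(hypersphere_volume(2.0, 3))  # volume of a unit sphere, 4/3 * pi
```

Checking the 2D and 3D cases against the familiar circle and sphere formulas is a cheap way to catch a radius/diameter mix-up.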
|
||||
@@ -83,7 +83,7 @@
|
||||
"source": [
|
||||
"# Plot the 1D linear function\n",
|
||||
"\n",
|
||||
"# Define an array of x values from 0 to 10 with increments of 0.1\n",
|
||||
"# Define an array of x values from 0 to 10 with increments of 0.01\n",
|
||||
"# https://numpy.org/doc/stable/reference/generated/numpy.arange.html\n",
|
||||
"x = np.arange(0.0,10.0, 0.01)\n",
|
||||
"# Compute y using the function you filled in above\n",
|
||||
@@ -171,7 +171,7 @@
|
||||
"# Color represents y value (brighter = higher value)\n",
|
||||
"# Black = -10 or less, White = +10 or more\n",
|
||||
"# 0 = mid orange\n",
|
||||
"# Lines are conoturs where value is equal\n",
|
||||
"# Lines are contours where value is equal\n",
|
||||
"draw_2D_function(x1,x2,y)\n",
|
||||
"\n",
|
||||
"# TODO\n",
|
||||
@@ -308,7 +308,7 @@
|
||||
"source": [
|
||||
"# Draw the exponential function\n",
|
||||
"\n",
|
||||
"# Define an array of x values from -5 to 5 with increments of 0.1\n",
|
||||
"# Define an array of x values from -5 to 5 with increments of 0.01\n",
|
||||
"x = np.arange(-5.0,5.0, 0.01)\n",
|
||||
"y = np.exp(x) ;\n",
|
||||
"\n",
|
||||
@@ -354,7 +354,7 @@
|
||||
"source": [
|
||||
"# Draw the logarithm function\n",
|
||||
"\n",
|
||||
"# Define an array of x values from -5 to 5 with increments of 0.1\n",
|
||||
"# Define an array of x values from -5 to 5 with increments of 0.01\n",
|
||||
"x = np.arange(0.01,5.0, 0.01)\n",
|
||||
"y = np.log(x) ;\n",
|
||||
"\n",
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"authorship_tag": "ABX9TyM+98aMABiK5vNFFYAwiPiL",
|
||||
"authorship_tag": "ABX9TyPBNztJrxnUt1ELWfm1Awa3",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
@@ -97,7 +97,7 @@
|
||||
"ax.set_xlim([-5,5]);ax.set_ylim([-5,5])\n",
|
||||
"ax.set_xlabel('z'); ax.set_ylabel('ReLU[z]')\n",
|
||||
"ax.set_aspect('equal')\n",
|
||||
"plt.show"
|
||||
"plt.show()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "okwJmSw9pVNF"
|
||||
@@ -226,7 +226,7 @@
|
||||
"source": [
|
||||
"Now let's play with the parameters to make sure we understand how they work. The original parameters were:\n",
|
||||
"\n",
|
||||
"$\\theta_{10} = 0.3$ ; $\\theta_{20} = -1.0$<br>\n",
|
||||
"$\\theta_{10} = 0.3$ ; $\\theta_{11} = -1.0$<br>\n",
|
||||
"$\\theta_{20} = -1.0$ ; $\\theta_{21} = 2.0$<br>\n",
|
||||
"$\\theta_{30} = -0.5$ ; $\\theta_{31} = 0.65$<br>\n",
|
||||
"$\\phi_0 = -0.3; \\phi_1 = 2.0; \\phi_2 = -1.0; \\phi_3 = 7.0$"
|
||||
|
||||
@@ -182,7 +182,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Now we'll extend this model to have two outputs $y_1$ and $y_2$, each of which can be visualized with a separate heatmap. You will now have sets of parameters $\\phi_{10}, \\phi_{11},\\phi_{12}$ and $\\phi_{20}, \\phi_{21},\\phi_{22}$ that correspond to each of these outputs."
|
||||
"Now we'll extend this model to have two outputs $y_1$ and $y_2$, each of which can be visualized with a separate heatmap. You will now have sets of parameters $\\phi_{10}, \\phi_{11}, \\phi_{12}, \\phi_{13}$ and $\\phi_{20}, \\phi_{21}, \\phi_{22}, \\phi_{23}$ that correspond to each of these outputs."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "Xl6LcrUyM7Lh"
|
||||
|
||||
@@ -48,7 +48,7 @@
|
||||
"import numpy as np\n",
|
||||
"# Imports plotting library\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"# Imports math libray\n",
|
||||
"# Imports math library\n",
|
||||
"import math"
|
||||
],
|
||||
"metadata": {
|
||||
@@ -79,7 +79,7 @@
|
||||
"source": [
|
||||
"def number_regions(Di, D):\n",
|
||||
" # TODO -- implement Zaslavsky's formula\n",
|
||||
" # You can use math.com() https://www.w3schools.com/python/ref_math_comb.asp\n",
|
||||
" # You can use math.comb() https://www.w3schools.com/python/ref_math_comb.asp\n",
|
||||
" # Replace this code\n",
|
||||
" N = 1;\n",
|
||||
"\n",
|
||||
@@ -102,7 +102,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Calculate the number of regions for 10D input (Di=2) and 50 hidden units (D=50)\n",
|
||||
"# Calculate the number of regions for 10D input (Di=10) and 50 hidden units (D=50)\n",
|
||||
"N = number_regions(10, 50)\n",
|
||||
"print(f\"Di=10, D=50, Number of regions = {int(N)}, True value = 13432735556\")"
|
||||
],
|
||||
@@ -126,7 +126,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Show that calculation fails when $D_i < D$\n",
|
||||
"# Depending on how you implemented it, the calculation may fail when $D_i > D$ (not to worry...)\n",
|
||||
"try:\n",
|
||||
" N = number_regions(10, 8)\n",
|
||||
" print(f\"Di=10, D=8, Number of regions = {int(N)}, True value = 256\")\n",
|
||||
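One way to write the region count so that it also handles the case `Di > D` is to sum binomial coefficients with `math.comb`, which returns 0 whenever the lower argument exceeds the upper one. This is a sketch assuming the formula is the sum of `C(D, j)` for `j = 0 .. Di`. It reproduces both "True value" figures printed in the cells above, but it is not necessarily how the notebook intends the TODO to be filled in.

```python
import math

def number_regions(Di, D):
    """Sum of binomial coefficients C(D, j) for j = 0 .. Di."""
    return sum(math.comb(D, j) for j in range(Di + 1))

print(number_regions(10, 50))  # 13432735556
print(number_regions(10, 8))   # 256
```

When `Di >= D` the sum covers every subset of the `D` hidden units, which is why the second result equals `2**8`.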
@@ -256,4 +256,4 @@
|
||||
"outputs": []
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"authorship_tag": "ABX9TyPmra+JD+dm2M3gCqx3bMak",
|
||||
"authorship_tag": "ABX9TyOmxhh3ymYWX+1HdZ91I6zU",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
@@ -223,7 +223,7 @@
|
||||
"ax.plot(z,sig_z,'r-')\n",
|
||||
"ax.set_xlim([-1,1]);ax.set_ylim([0,1])\n",
|
||||
"ax.set_xlabel('z'); ax.set_ylabel('sig[z]')\n",
|
||||
"plt.show"
|
||||
"plt.show()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "94HIXKJH97ve"
|
||||
@@ -318,7 +318,7 @@
|
||||
"ax.plot(z,heav_z,'r-')\n",
|
||||
"ax.set_xlim([-1,1]);ax.set_ylim([-2,2])\n",
|
||||
"ax.set_xlabel('z'); ax.set_ylabel('heaviside[z]')\n",
|
||||
"plt.show"
|
||||
"plt.show()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "mSPyp7iA-44H"
|
||||
|
||||
@@ -539,8 +539,8 @@
|
||||
"# Hopefully, you can see that the maximum of the likelihood fn is at the same position as the minimum negative log likelihood\n",
|
||||
"# The least squares solution does not depend on sigma, so it's just flat -- no use here.\n",
|
||||
"# Let's check that:\n",
|
||||
"print(\"Maximum likelihood = %3.3f, at beta_1=%3.3f\"%( (likelihoods[np.argmax(likelihoods)],sigma_vals[np.argmax(likelihoods)])))\n",
|
||||
"print(\"Minimum negative log likelihood = %3.3f, at beta_1=%3.3f\"%( (nlls[np.argmin(nlls)],sigma_vals[np.argmin(nlls)])))\n",
|
||||
"print(\"Maximum likelihood = %3.3f, at sigma=%3.3f\"%( (likelihoods[np.argmax(likelihoods)],sigma_vals[np.argmax(likelihoods)])))\n",
|
||||
"print(\"Minimum negative log likelihood = %3.3f, at sigma=%3.3f\"%( (nlls[np.argmin(nlls)],sigma_vals[np.argmin(nlls)])))\n",
|
||||
"# Plot the best model\n",
|
||||
"sigma= sigma_vals[np.argmin(nlls)]\n",
|
||||
"y_model = shallow_nn(x_model, beta_0, omega_0, beta_1, omega_1)\n",
|
||||
@@ -564,4 +564,4 @@
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
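Review note: the corrected print statements report the sigma at which the likelihood peaks and the negative log likelihood bottoms out. A small self-contained check that these coincide is sketched below. The data values, the fixed mean, and the sigma grid are hypothetical and chosen only for illustration; they are not the notebook's.

```python
import math

def norm_pdf(y, mu, sigma):
    # Univariate normal density
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)

# Hypothetical data and sigma grid (illustration only)
y_data = [0.5, -0.3, 0.8, -0.6]
mu = 0.0
sigma_vals = [0.1 * k for k in range(1, 21)]

# Likelihood is a product of densities; NLL is minus the sum of log densities
likelihoods = [math.prod(norm_pdf(y, mu, s) for y in y_data) for s in sigma_vals]
nlls = [-sum(math.log(norm_pdf(y, mu, s)) for y in y_data) for s in sigma_vals]

best_lik = max(range(len(sigma_vals)), key=lambda i: likelihoods[i])
best_nll = min(range(len(sigma_vals)), key=lambda i: nlls[i])
print("Maximum likelihood = %3.3f, at sigma=%3.3f" % (likelihoods[best_lik], sigma_vals[best_lik]))
print("Minimum negative log likelihood = %3.3f, at sigma=%3.3f" % (nlls[best_nll], sigma_vals[best_nll]))
```

Because the logarithm is monotonic, both searches should select the same grid point.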
@@ -4,7 +4,7 @@
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"authorship_tag": "ABX9TyPXPDEQiwNw+kYhWfg4kjz6",
|
||||
"authorship_tag": "ABX9TyPAKqlf9VxztHXKylyJwqe8",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
@@ -145,7 +145,7 @@
|
||||
"source": [
|
||||
"def volume_of_hypersphere(diameter, dimensions):\n",
|
||||
" # Formula given in Problem 8.7 of the book\n",
|
||||
" # You will need sci.special.gamma()\n",
|
||||
" # You will need sci.gamma()\n",
|
||||
" # Check out: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.gamma.html\n",
|
||||
" # Also use this value for pi\n",
|
||||
" pi = np.pi\n",
|
||||
|
||||
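Review note: a minimal sketch of the function this hunk touches, assuming the standard hypersphere volume formula r^D * pi^(D/2) / Gamma(D/2 + 1). To stay dependency-free it uses the standard library's `math.gamma` as a stand-in for the SciPy gamma function the notebook comment refers to; that substitution is mine, not the notebook's.

```python
import math

def volume_of_hypersphere(diameter, dimensions):
    pi = math.pi
    radius = diameter / 2.0
    # math.gamma is used here in place of the SciPy gamma function
    return (radius ** dimensions) * (pi ** (dimensions / 2.0)) / math.gamma(dimensions / 2.0 + 1.0)

# Sanity checks against familiar cases: unit circle area and unit sphere volume
print(volume_of_hypersphere(2.0, 2), math.pi)
print(volume_of_hypersphere(2.0, 3), 4.0 / 3.0 * math.pi)
```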
@@ -4,7 +4,7 @@
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"authorship_tag": "ABX9TyPTidpnPhn4O5QF011gt0cz",
|
||||
"authorship_tag": "ABX9TyML7rfAGE4gvmNUEiK5x3PS",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
@@ -41,6 +41,17 @@
|
||||
"id": "el8l05WQEO46"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"NOTE!!\n",
|
||||
"\n",
|
||||
"If you have the first edition of the printed book, it mistakenly refers to a convolutional filter with no spaces between the elements (i.e. a normal filter without dilation) as having dilation zero. Actually, the convention is (weirdly) that this has dilation one. And when there is one space between the elements, this is dilation two. This notebook reflects the correct convention and so will be out of sync with the printed book. If this is confusing, check the [errata](https://github.com/udlbook/udlbook/blob/main/UDL_Errata.pdf) document."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "ggQrHkFZcUiV"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
@@ -50,7 +61,7 @@
|
||||
"metadata": {
|
||||
"id": "nw7k5yCtOzoK"
|
||||
},
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
|
||||
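Review note: the convention described in the new markdown cell (dilation one means no gaps, dilation two means one gap between elements) can be illustrated with a short sketch. The helper names below are hypothetical and are not taken from the notebook.

```python
def dilated_kernel_span(kernel_size, dilation):
    # Number of input positions covered from the first to the last kernel element.
    # dilation=1: adjacent elements (an ordinary filter); dilation=2: one gap between elements.
    return (kernel_size - 1) * dilation + 1

def dilated_conv1d(x, weights, dilation):
    # "Valid" 1D dilated cross-correlation with stride 1 and no padding
    span = dilated_kernel_span(len(weights), dilation)
    return [sum(w * x[i + k * dilation] for k, w in enumerate(weights))
            for i in range(len(x) - span + 1)]

x = [1, 2, 3, 4, 5]
print(dilated_conv1d(x, [1, 1, 1], dilation=1))  # taps at i, i+1, i+2
print(dilated_conv1d(x, [1, 1, 1], dilation=2))  # taps at i, i+2, i+4
```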
@@ -4,7 +4,7 @@
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"authorship_tag": "ABX9TyN1v/yg9PtdSVOWlYJ7bgkz",
|
||||
"authorship_tag": "ABX9TyNJodaaCLMRWL9vTl8B/iLI",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
@@ -141,6 +141,9 @@
|
||||
"# https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html\n",
|
||||
"# https://pytorch.org/docs/1.13/generated/torch.nn.Linear.html?highlight=linear#torch.nn.Linear\n",
|
||||
"\n",
|
||||
"# NOTE THAT THE CONVOLUTIONAL LAYERS NEED TO TAKE THE NUMBER OF INPUT CHANNELS AS A PARAMETER\n",
|
||||
"# AND NOT THE INPUT SIZE.\n",
|
||||
"\n",
|
||||
"# Replace the following function:\n",
|
||||
"model = nn.Sequential(\n",
|
||||
"nn.Flatten(),\n",
|
||||
@@ -185,9 +188,9 @@
|
||||
"scheduler = StepLR(optimizer, step_size=20, gamma=0.5)\n",
|
||||
"# create 100 dummy data points and store in data loader class\n",
|
||||
"x_train = torch.tensor(train_data_x.transpose().astype('float32'))\n",
|
||||
"y_train = torch.tensor(train_data_y.astype('long'))\n",
|
||||
"y_train = torch.tensor(train_data_y.astype('long')).long()\n",
|
||||
"x_val= torch.tensor(val_data_x.transpose().astype('float32'))\n",
|
||||
"y_val = torch.tensor(val_data_y.astype('long'))\n",
|
||||
"y_val = torch.tensor(val_data_y.astype('long')).long()\n",
|
||||
"\n",
|
||||
"# load the data into a class that creates the batches\n",
|
||||
"data_loader = DataLoader(TensorDataset(x_train,y_train), batch_size=100, shuffle=True, worker_init_fn=np.random.seed(1))\n",
|
||||
|
||||
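Review note: the new capitalised comment points out that the convolutional layers are parameterised by the number of input channels, not the input length. The lengths quoted in the exercise (40, then 19) follow from the usual output-length formula, sketched below as a plain helper. The function name is hypothetical; the formula is the one given in the PyTorch `Conv1d` documentation.

```python
def conv1d_output_length(length_in, kernel_size, stride=1, padding=0, dilation=1):
    # Output length of a 1D convolution; "valid" padding corresponds to padding=0
    return (length_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# The first layer would be declared with channels, e.g.
#   nn.Conv1d(in_channels=1, out_channels=15, kernel_size=3, stride=2, padding="valid")
# and the length 40 never appears in the constructor.
first = conv1d_output_length(40, kernel_size=3, stride=2)
second = conv1d_output_length(first, kernel_size=3, stride=2)
print(first, second)
```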
@@ -4,7 +4,7 @@
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"authorship_tag": "ABX9TyMmbD0cKYvIHXbKX4AupA1x",
|
||||
"authorship_tag": "ABX9TyNDaU2KKZDyY9Ea7vm/fNxo",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
@@ -114,6 +114,11 @@
|
||||
" # Create output\n",
|
||||
" out = np.zeros((batchSize, channelsOut, imageHeightOut, imageWidthOut), dtype=np.float32)\n",
|
||||
"\n",
|
||||
" # !!!!!! NOTE THERE IS A SUBTLETY HERE !!!!!!!!\n",
|
||||
" # I have padded the image with zeros above, so it is surrounded by a \"ring\" of zeros\n",
|
||||
" # That means that the image indexes are all off by one\n",
|
||||
" # This actually makes your code simpler\n",
|
||||
"\n",
|
||||
" for c_y in range(imageHeightOut):\n",
|
||||
" for c_x in range(imageWidthOut):\n",
|
||||
" for c_kernel_y in range(kernelHeight):\n",
|
||||
|
||||
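Review note: the indexing subtlety described in the added comments can be seen in a stripped-down sketch. This is not the notebook's function: it assumes a single channel, a single image, stride 1 and a 3x3 kernel, and uses plain lists rather than NumPy arrays.

```python
def conv2d_same_3x3(image, kernel):
    height, width = len(image), len(image[0])
    # Zero-pad by one pixel on every side, giving the "ring" of zeros
    padded = [[0.0] * (width + 2) for _ in range(height + 2)]
    for y in range(height):
        for x in range(width):
            padded[y + 1][x + 1] = image[y][x]

    out = [[0.0] * width for _ in range(height)]
    for c_y in range(height):
        for c_x in range(width):
            for k_y in range(3):
                for k_x in range(3):
                    # padded[c_y + k_y][c_x + k_x] holds unpadded pixel
                    # (c_y + k_y - 1, c_x + k_x - 1): the off-by-one cancels the
                    # usual "minus half the kernel size" offset.
                    out[c_y][c_x] += kernel[k_y][k_x] * padded[c_y + k_y][c_x + k_x]
    return out

ones = [[1.0] * 3 for _ in range(3)]
print(conv2d_same_3x3(ones, ones))
```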
@@ -4,7 +4,7 @@
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"authorship_tag": "ABX9TyOZaNcBrdZ9yCHhjLOwSi69",
|
||||
"authorship_tag": "ABX9TyPVeAd3eDpEOCFh8CVyr1zz",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
@@ -267,7 +267,7 @@
|
||||
"# Use the torch function nn.BatchNorm1d\n",
|
||||
"class ResidualNetworkWithBatchNorm(torch.nn.Module):\n",
|
||||
" def __init__(self, input_size, output_size, hidden_size=100):\n",
|
||||
" super(ResidualNetwork, self).__init__()\n",
|
||||
" super(ResidualNetworkWithBatchNorm, self).__init__()\n",
|
||||
" self.linear1 = nn.Linear(input_size, hidden_size)\n",
|
||||
" self.linear2 = nn.Linear(hidden_size, hidden_size)\n",
|
||||
" self.linear3 = nn.Linear(hidden_size, hidden_size)\n",
|
||||
|
||||
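Review note: the hunk above corrects the class name passed to `super`. A framework-free sketch of why the old line is a problem follows; `Base` stands in for `torch.nn.Module` and the `use_wrong_name` flag is hypothetical, added only to exercise both paths. If `ResidualNetwork` were not defined at all, the old line would raise `NameError` instead of `TypeError`.

```python
class Base:
    def __init__(self):
        self.initialized = True

class ResidualNetwork(Base):
    def __init__(self):
        super(ResidualNetwork, self).__init__()

class ResidualNetworkWithBatchNorm(Base):
    def __init__(self, use_wrong_name=False):
        if use_wrong_name:
            # self is not an instance of ResidualNetwork, so this raises TypeError
            super(ResidualNetwork, self).__init__()
        else:
            # Zero-argument form: no class name to keep in sync
            super().__init__()

try:
    ResidualNetworkWithBatchNorm(use_wrong_name=True)
    raised = False
except TypeError:
    raised = True

print(raised, ResidualNetworkWithBatchNorm().initialized)
```

In Python 3 the zero-argument `super().__init__()` avoids this class of copy-paste error altogether.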
@@ -31,7 +31,7 @@
|
||||
"source": [
|
||||
"# **Notebook 12.1: Self Attention**\n",
|
||||
"\n",
|
||||
"This notebook builds a self-attnetion mechanism from scratch, as discussed in section 12.2 of the book.\n",
|
||||
"This notebook builds a self-attention mechanism from scratch, as discussed in section 12.2 of the book.\n",
|
||||
"\n",
|
||||
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
|
||||
"\n",
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"authorship_tag": "ABX9TyNPrHfkLWjy3NfDHRhGG3IE",
|
||||
"authorship_tag": "ABX9TyPsZjfqVeHYh95Hzt+hCIO7",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
@@ -409,7 +409,7 @@
|
||||
" print(\"Choosing from %d tokens\"%(thresh_index))\n",
|
||||
" # TODO: Find the probability value to threshold\n",
|
||||
" # Replace this line:\n",
|
||||
" thresh_prob = sorted_probs_decreasing[thresh_index]\n",
|
||||
" thresh_prob = 0.5\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
|
||||
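Review note: this hunk swaps a worked line for the placeholder `thresh_prob = 0.5`. The surrounding function is not shown in the diff, so the sketch below is an assumption about its shape: it takes `thresh_index` to be the first position at which the cumulative sum of the sorted probabilities exceeds a cumulative threshold, and then reads the probability at that index. The function name and parameters are hypothetical, and `probs` is assumed to be non-empty.

```python
def nucleus_threshold(probs, cum_prob_threshold):
    # Sort probabilities from largest to smallest
    sorted_probs_decreasing = sorted(probs, reverse=True)
    cumulative = 0.0
    for thresh_index, p in enumerate(sorted_probs_decreasing):
        cumulative += p
        if cumulative > cum_prob_threshold:
            break
    # Probability value at the cut-off position
    thresh_prob = sorted_probs_decreasing[thresh_index]
    return thresh_index, thresh_prob

print(nucleus_threshold([0.1, 0.5, 0.1, 0.3], 0.7))
```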
@@ -1,26 +1,10 @@
|
||||
{
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"authorship_tag": "ABX9TyMpC8kgLnXx0XQBtwNAQ4jJ",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"name": "python3",
|
||||
"display_name": "Python 3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "view-in-github",
|
||||
"colab_type": "text"
|
||||
"colab_type": "text",
|
||||
"id": "view-in-github"
|
||||
},
|
||||
"source": [
|
||||
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap18/18_1_Diffusion_Encoder.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
||||
@@ -28,6 +12,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "t9vk9Elugvmi"
|
||||
},
|
||||
"source": [
|
||||
"# **Notebook 18.1: Diffusion Encoder**\n",
|
||||
"\n",
|
||||
@@ -36,27 +23,29 @@
|
||||
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
|
||||
"\n",
|
||||
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "t9vk9Elugvmi"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "OLComQyvCIJ7"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from matplotlib.colors import ListedColormap\n",
|
||||
"from operator import itemgetter"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "OLComQyvCIJ7"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "4PM8bf6lO0VE"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Create pretty colormap as in book\n",
|
||||
"my_colormap_vals_hex =('2a0902', '2b0a03', '2c0b04', '2d0c05', '2e0c06', '2f0d07', '300d08', '310e09', '320f0a', '330f0b', '34100b', '35110c', '36110d', '37120e', '38120f', '39130f', '3a1410', '3b1411', '3c1511', '3d1612', '3e1613', '3f1713', '401714', '411814', '421915', '431915', '451a16', '461b16', '471b17', '481c17', '491d18', '4a1d18', '4b1e19', '4c1f19', '4d1f1a', '4e201b', '50211b', '51211c', '52221c', '53231d', '54231d', '55241e', '56251e', '57261f', '58261f', '592720', '5b2821', '5c2821', '5d2922', '5e2a22', '5f2b23', '602b23', '612c24', '622d25', '632e25', '652e26', '662f26', '673027', '683027', '693128', '6a3229', '6b3329', '6c342a', '6d342a', '6f352b', '70362c', '71372c', '72372d', '73382e', '74392e', '753a2f', '763a2f', '773b30', '783c31', '7a3d31', '7b3e32', '7c3e33', '7d3f33', '7e4034', '7f4134', '804235', '814236', '824336', '834437', '854538', '864638', '874739', '88473a', '89483a', '8a493b', '8b4a3c', '8c4b3c', '8d4c3d', '8e4c3e', '8f4d3f', '904e3f', '924f40', '935041', '945141', '955242', '965343', '975343', '985444', '995545', '9a5646', '9b5746', '9c5847', '9d5948', '9e5a49', '9f5a49', 'a05b4a', 'a15c4b', 'a35d4b', 'a45e4c', 'a55f4d', 'a6604e', 'a7614e', 'a8624f', 'a96350', 'aa6451', 'ab6552', 'ac6552', 'ad6653', 'ae6754', 'af6855', 'b06955', 'b16a56', 'b26b57', 'b36c58', 'b46d59', 'b56e59', 'b66f5a', 'b7705b', 'b8715c', 'b9725d', 'ba735d', 'bb745e', 'bc755f', 'bd7660', 'be7761', 'bf7862', 'c07962', 'c17a63', 'c27b64', 'c27c65', 'c37d66', 'c47e67', 'c57f68', 'c68068', 'c78169', 'c8826a', 'c9836b', 'ca846c', 'cb856d', 'cc866e', 'cd876f', 'ce886f', 'ce8970', 'cf8a71', 'd08b72', 'd18c73', 'd28d74', 'd38e75', 'd48f76', 'd59077', 'd59178', 'd69279', 'd7937a', 'd8957b', 'd9967b', 'da977c', 'da987d', 'db997e', 'dc9a7f', 'dd9b80', 'de9c81', 'de9d82', 'df9e83', 'e09f84', 'e1a185', 'e2a286', 'e2a387', 'e3a488', 'e4a589', 'e5a68a', 'e5a78b', 'e6a88c', 'e7aa8d', 'e7ab8e', 'e8ac8f', 'e9ad90', 'eaae91', 'eaaf92', 'ebb093', 'ecb295', 'ecb396', 'edb497', 
'eeb598', 'eeb699', 'efb79a', 'efb99b', 'f0ba9c', 'f1bb9d', 'f1bc9e', 'f2bd9f', 'f2bfa1', 'f3c0a2', 'f3c1a3', 'f4c2a4', 'f5c3a5', 'f5c5a6', 'f6c6a7', 'f6c7a8', 'f7c8aa', 'f7c9ab', 'f8cbac', 'f8ccad', 'f8cdae', 'f9ceb0', 'f9d0b1', 'fad1b2', 'fad2b3', 'fbd3b4', 'fbd5b6', 'fbd6b7', 'fcd7b8', 'fcd8b9', 'fcdaba', 'fddbbc', 'fddcbd', 'fddebe', 'fddfbf', 'fee0c1', 'fee1c2', 'fee3c3', 'fee4c5', 'ffe5c6', 'ffe7c7', 'ffe8c9', 'ffe9ca', 'ffebcb', 'ffeccd', 'ffedce', 'ffefcf', 'fff0d1', 'fff2d2', 'fff3d3', 'fff4d5', 'fff6d6', 'fff7d8', 'fff8d9', 'fffada', 'fffbdc', 'fffcdd', 'fffedf', 'ffffe0')\n",
|
||||
@@ -66,28 +55,28 @@
|
||||
"b = np.floor(my_colormap_vals_dec - r * 256 *256 - g * 256)\n",
|
||||
"my_colormap_vals = np.vstack((r,g,b)).transpose()/255.0\n",
|
||||
"my_colormap = ListedColormap(my_colormap_vals)"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "4PM8bf6lO0VE"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "ONGRaQscfIOo"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Probability distribution for normal\n",
|
||||
"def norm_pdf(x, mu, sigma):\n",
|
||||
" return np.exp(-0.5 * (x-mu) * (x-mu) / (sigma * sigma)) / np.sqrt(2*np.pi*sigma*sigma)"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "ONGRaQscfIOo"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "gZvG0MKhfY8Y"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# True distribution is a mixture of four Gaussians\n",
|
||||
"class TrueDataDistribution:\n",
|
||||
@@ -108,15 +97,15 @@
|
||||
" mu_list = list(itemgetter(*hidden)(self.mu))\n",
|
||||
" sigma_list = list(itemgetter(*hidden)(self.sigma))\n",
|
||||
" return mu_list + sigma_list * epsilon"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "gZvG0MKhfY8Y"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "qXmej3TUuQyp"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Define ground truth probability distribution that we will model\n",
|
||||
"true_dist = TrueDataDistribution()\n",
|
||||
@@ -130,24 +119,24 @@
|
||||
"ax.set_ylim(0,1.0)\n",
|
||||
"ax.set_xlim(-3,3)\n",
|
||||
"plt.show()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "qXmej3TUuQyp"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Let's first implement the forward process"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "XHdtfRP47YLy"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Let's first implement the forward process"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "hkApJ2VJlQuk"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Do one step of diffusion (equation 18.1)\n",
|
||||
"def diffuse_one_step(z_t_minus_1, beta_t):\n",
|
||||
@@ -157,24 +146,24 @@
|
||||
" z_t = np.zeros_like(z_t_minus_1)\n",
|
||||
"\n",
|
||||
" return z_t"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "hkApJ2VJlQuk"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Now let's run the diffusion process for a whole bunch of samples"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "ECAUfHNi9NVW"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Now let's run the diffusion process for a whole bunch of samples"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "M-TY5w9Q8LYW"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Generate some samples\n",
|
||||
"n_sample = 10000\n",
|
||||
@@ -192,24 +181,24 @@
|
||||
"\n",
|
||||
"for t in range(T):\n",
|
||||
" samples[t+1,:] = diffuse_one_step(samples[t,:], beta)"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "M-TY5w9Q8LYW"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Let's plot the evolution of a few paths as in figure 18.2"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "jYrAW6tN-gJ4"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Let's plot the evolution of a few paths as in figure 18.2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "4XU6CDZC_kFo"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig, ax = plt.subplots()\n",
|
||||
"t_vals = np.arange(0,101,1)\n",
|
||||
@@ -223,24 +212,24 @@
|
||||
"ax.set_xlabel('value')\n",
|
||||
"ax.set_ylabel('z_{t}')\n",
|
||||
"plt.show()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "4XU6CDZC_kFo"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Notice that the samples have a tendency to move toward the center. Now let's look at the histogram of the samples at each stage"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "SGTYGGevAktz"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Notice that the samples have a tendency to move toward the center. Now let's look at the histogram of the samples at each stage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "bn5E5NzL-evM"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def draw_hist(z_t,title=''):\n",
|
||||
" fig, ax = plt.subplots()\n",
|
||||
@@ -248,17 +237,17 @@
|
||||
" plt.hist(z_t , bins=np.arange(-3,3, 0.1), density = True)\n",
|
||||
" ax.set_xlim([-3,3])\n",
|
||||
" ax.set_ylim([0,1.0])\n",
|
||||
" ax.set_title('title')\n",
|
||||
" ax.set_title(title)\n",
|
||||
" plt.show()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "bn5E5NzL-evM"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "pn_XD-EhBlwk"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"draw_hist(samples[0,:],'Original data')\n",
|
||||
"draw_hist(samples[5,:],'Time step 5')\n",
|
||||
@@ -267,33 +256,33 @@
|
||||
"draw_hist(samples[40,:],'Time step 40')\n",
|
||||
"draw_hist(samples[80,:],'Time step 80')\n",
|
||||
"draw_hist(samples[100,:],'Time step 100')"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "pn_XD-EhBlwk"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"You can clearly see that as the diffusion process continues, the data becomes more Gaussian."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "skuLfGl5Czf4"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"You can clearly see that as the diffusion process continues, the data becomes more Gaussian."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Now let's investigate the diffusion kernel as in figure 18.3 of the book.\n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "s37CBSzzK7wh"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Now let's investigate the diffusion kernel as in figure 18.3 of the book.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "vL62Iym0LEtY"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def diffusion_kernel(x, t, beta):\n",
|
||||
" # TODO -- write this function\n",
|
||||
@@ -301,15 +290,15 @@
|
||||
" dk_mean = 0.0 ; dk_std = 1.0\n",
|
||||
"\n",
|
||||
" return dk_mean, dk_std"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "vL62Iym0LEtY"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "KtP1KF8wMh8o"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def draw_prob_dist(x_plot_vals, prob_dist, title=''):\n",
|
||||
" fig, ax = plt.subplots()\n",
|
||||
@@ -363,47 +352,47 @@
|
||||
" draw_prob_dist(x_plot_vals, diffusion_kernels[20,:],'$q(z_{20}|x)$')\n",
|
||||
" draw_prob_dist(x_plot_vals, diffusion_kernels[40,:],'$q(z_{40}|x)$')\n",
|
||||
" draw_prob_dist(x_plot_vals, diffusion_kernels[80,:],'$q(z_{80}|x)$')"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "KtP1KF8wMh8o"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"x = -2\n",
|
||||
"compute_and_plot_diffusion_kernels(x, T, beta, my_colormap)"
|
||||
],
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "g8TcI5wtRQsx"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x = -2\n",
|
||||
"compute_and_plot_diffusion_kernels(x, T, beta, my_colormap)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"TODO -- Run this for different values of $x$ and check that you understand how the graphs change"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "-RuN2lR28-hK"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"TODO -- Run this for different values of $x$ and check that you understand how the graphs change"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "n-x6Whz2J_zy"
|
||||
},
|
||||
"source": [
|
||||
"Finally, let's estimate the marginal distributions empirically and visualize them as in figure 18.4 of the book. This is only tractable because the data is in one dimension and we know the original distribution.\n",
|
||||
"\n",
|
||||
"The marginal distribution at time t is the sum of the diffusion kernels for each position x, weighted by the probability of seeing that value of x in the true distribution."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "n-x6Whz2J_zy"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "YzN5duYpg7C-"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def diffusion_marginal(x_plot_vals, pr_x_true, t, beta):\n",
|
||||
" # If time is zero then marginal is just original distribution\n",
|
||||
@@ -418,7 +407,7 @@
|
||||
" # 1. For each x (value in x_plot_vals):\n",
|
||||
" # 2. Compute the mean and variance of the diffusion kernel at time t\n",
|
||||
" # 3. Compute pdf of this Gaussian at every x_plot_val\n",
|
||||
" # 4. Weight Gaussian by probability at position x and by 0.01 to compensate for bin size\n",
|
||||
" # 4. Weight Gaussian by probability at position x and by 0.01 to componensate for bin size\n",
|
||||
" # 5. Accumulate weighted Gaussian in marginal at time t.\n",
|
||||
" # 6. Multiply result by 0.01 to compensate for bin size\n",
|
||||
" # Replace this line:\n",
|
||||
@@ -427,15 +416,15 @@
|
||||
"\n",
|
||||
"\n",
|
||||
" return marginal_at_time_t"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "YzN5duYpg7C-"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "OgEU9sxjRaeO"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x_plot_vals = np.arange(-3,3,0.01)\n",
|
||||
"marginal_distributions = np.zeros((T+1,len(x_plot_vals)))\n",
|
||||
@@ -460,12 +449,23 @@
|
||||
"draw_prob_dist(x_plot_vals, marginal_distributions[0,:],'$q(z_{0})$')\n",
|
||||
"draw_prob_dist(x_plot_vals, marginal_distributions[20,:],'$q(z_{20})$')\n",
|
||||
"draw_prob_dist(x_plot_vals, marginal_distributions[60,:],'$q(z_{60})$')"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "OgEU9sxjRaeO"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"authorship_tag": "ABX9TyMpC8kgLnXx0XQBtwNAQ4jJ",
|
||||
"include_colab_link": true,
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
|
||||
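Review note: the two TODO functions in this notebook are left as placeholders in the diff. A minimal sketch of what they could look like is given below, assuming the one-step update z_t = sqrt(1 - beta_t) * z_{t-1} + sqrt(beta_t) * epsilon and, for a constant beta, the closed-form kernel with alpha_t = (1 - beta)^t. It uses the standard library instead of NumPy and adds a hypothetical `rng` parameter so the noise source can be seeded; neither is in the notebook.

```python
import math
import random

def diffuse_one_step(z_t_minus_1, beta_t, rng=random):
    # Shrink each sample toward zero and add Gaussian noise with variance beta_t
    return [math.sqrt(1.0 - beta_t) * z + math.sqrt(beta_t) * rng.gauss(0.0, 1.0)
            for z in z_t_minus_1]

def diffusion_kernel(x, t, beta):
    # Closed form q(z_t | x) for constant beta: alpha_t = (1 - beta)^t
    alpha_t = (1.0 - beta) ** t
    dk_mean = math.sqrt(alpha_t) * x
    dk_std = math.sqrt(1.0 - alpha_t)
    return dk_mean, dk_std

# Compare one simulated step against the closed-form kernel at t = 1
rng = random.Random(1)
samples = diffuse_one_step([-2.0] * 20000, 0.05, rng)
empirical_mean = sum(samples) / len(samples)
print(empirical_mean, diffusion_kernel(-2.0, 1, 0.05))
```

The shrink factor sqrt(1 - beta_t) is what produces the drift toward the center noted in the notebook text.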
@@ -31,7 +31,7 @@
|
||||
"source": [
|
||||
"# **Notebook 21.1: Bias mitigation**\n",
|
||||
"\n",
|
||||
"This notebook investigates a post-processing method for bias mitigation (see figure 21.2 in the book). It is based on this [blog](https://www.borealisai.com/research-blogs/tutorial1-bias-and-fairness-ai/) that I wrote for Borealis AI in 2019, which itself was derirved from [this blog](https://research.google.com/bigpicture/attacking-discrimination-in-ml/) by Wattenberg, Viégas, and Hardt.\n",
|
||||
"This notebook investigates a post-processing method for bias mitigation (see figure 21.2 in the book). It is based on this [blog](https://www.borealisai.com/research-blogs/tutorial1-bias-and-fairness-ai/) that I wrote for Borealis AI in 2019, which itself was derived from [this blog](https://research.google.com/bigpicture/attacking-discrimination-in-ml/) by Wattenberg, Viégas, and Hardt.\n",
|
||||
"\n",
|
||||
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
|
||||
"\n",
|
||||
@@ -172,7 +172,7 @@
|
||||
"source": [
|
||||
"# Blindness to protected attribute\n",
|
||||
"\n",
|
||||
"We'll first do the simplest possible thing. We'll choose the same threshold for both blue and yellow populations so that $\\tau_0$ = $\\tau_1$. Basically, we'll ingore what we know about the group membership. Let's see what the ramifications of that are."
|
||||
"We'll first do the simplest possible thing. We'll choose the same threshold for both blue and yellow populations so that $\\tau_0$ = $\\tau_1$. Basically, we'll ignore what we know about the group membership. Let's see what the ramifications of that are."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "bE7yPyuWoSUy"
|
||||
@@ -195,7 +195,7 @@
|
||||
"source": [
|
||||
"def compute_probability_get_loan(credit_scores, frequencies, threshold):\n",
|
||||
" # TODO - Write this function\n",
|
||||
" # Return the probability that somemone from this group gets a loan based on the frequencies of each\n",
|
||||
" # Return the probability that someone from this group gets a loan based on the frequencies of each\n",
|
||||
" # credit score for this group\n",
|
||||
" # Replace this line:\n",
|
||||
" prob = 0.5\n",
|
||||
@@ -297,7 +297,7 @@
|
||||
"\n",
|
||||
"This criterion is clearly not great. The blue and yellow groups get given loans at different rates overall, and (for this threshold), the false alarms and true positives are also different, so it's not even fair when we consider whether the loans really were paid back. \n",
|
||||
"\n",
|
||||
"TODO -- investigate setting a different threshols $\\tau_{0}=\\tau_{1}$. Is it possible to make the overall rates that loans are given the same? Is it possible to make the false alarm rates the same? Is it possible to make the true positive rates the same?"
|
||||
"TODO -- investigate setting a different threshold $\\tau_{0}=\\tau_{1}$. Is it possible to make the overall rates that loans are given the same? Is it possible to make the false alarm rates the same? Is it possible to make the true positive rates the same?"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "UCObTsa57uuC"
|
||||
|
||||
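Review note: `compute_probability_get_loan` is left as a TODO returning 0.5. One plausible reading is sketched below, assuming `frequencies[i]` counts the people in the group with score `credit_scores[i]` and that a loan is granted when the score is at or above the threshold. Whether the comparison should be `>=` or `>` is an assumption; the diff does not say.

```python
def compute_probability_get_loan(credit_scores, frequencies, threshold):
    # Fraction of the group's total frequency mass at or above the threshold
    total = sum(frequencies)
    above = sum(f for s, f in zip(credit_scores, frequencies) if s >= threshold)
    return above / total

# Hypothetical toy group: four scores, equally frequent
print(compute_probability_get_loan([0, 1, 2, 3], [1, 1, 1, 1], 2))
```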
@@ -400,7 +400,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"This model is easilly intepretable. The k'th coeffeicient tells us how much (and in which direction) changing the value of the k'th input will change the output. This is only valid in the vicinity of the input $x$.\n",
|
||||
"This model is easily interpretable. The k'th coefficient tells us how much (and in which direction) changing the value of the k'th input will change the output. This is only valid in the vicinity of the input $x$.\n",
|
||||
"\n",
|
||||
"Note that a more sophisticated version of LIME would weight the training points according to how close they are to the original data point of interest."
|
||||
],
|
||||
|
||||
BIN
UDL_Errata.pdf
Normal file
Binary file not shown.
34
index.html
@@ -11,15 +11,18 @@
|
||||
<div>
|
||||
<h1 style="margin: 0; font-size: 36px">Understanding Deep Learning</h1>
|
||||
by Simon J.D. Prince
|
||||
<br>To be published by MIT Press Dec 5th 2023.<br>
|
||||
<br>Published by MIT Press Dec 5th 2023.<br>
|
||||
<ul>
|
||||
<li>
|
||||
<p style="font-size: larger; margin-bottom: 0">Download draft PDF Chapters 1-21 <a
|
||||
href="https://github.com/udlbook/udlbook/releases/download/v1.15/UnderstandingDeepLearning_23_10_23_C.pdf">here</a>
|
||||
</p>2023-23-23. CC-BY-NC-ND license<br>
|
||||
href="https://github.com/udlbook/udlbook/releases/download/v1.16/UnderstandingDeepLearning_24_11_23_C.pdf">here</a>
|
||||
</p>2023-11-24. CC-BY-NC-ND license<br>
|
||||
<img src="https://img.shields.io/github/downloads/udlbook/udlbook/total" alt="download stats shield">
|
||||
</li>
|
||||
<li> Report errata via <a href="https://github.com/udlbook/udlbook/issues">github</a>
|
||||
<li> Order your copy from <a href="https://mitpress.mit.edu/9780262048644/understanding-deep-learning/">here </a></li>
|
||||
<li> Known errata can be found here: <a
|
||||
href="https://github.com/udlbook/udlbook/raw/main/UDL_Errata.pdf">PDF</a></li>
|
||||
<li> Report new errata via <a href="https://github.com/udlbook/udlbook/issues">github</a>
|
||||
or contact me directly at udlbookmail@gmail.com
|
||||
<li> Follow me on <a href="https://twitter.com/SimonPrinceAI">Twitter</a> or <a
|
||||
href="https://www.linkedin.com/in/simon-prince-615bb9165/">LinkedIn</a> for updates.
|
||||
@@ -157,7 +160,7 @@
|
||||
Figures</a>
|
||||
<li> Chapter 16 - Normalizing flows: <a
|
||||
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap16PDF.zip">PDF Figures</a> / <a
|
||||
href="https://drive.google.com/uc?export=download&id=1B9bxtmdugwtg-b7Y4AdQKAIEVWxjx8l3"> SVG Figures</a>
|
||||
href="https://drive.google.com/uc?export=download&id=1B9bxtmdugwtg-b7Y4AdQKAIEVWxjx8l3"> SVG Figures</a>
|
||||
/
|
||||
<a href="https://docs.google.com/presentation/d/1nLLzqb9pdfF_h6i1HUDSyp7kSMIkSUUA/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">PowerPoint
|
||||
Figures</a>
|
||||
@@ -169,7 +172,9 @@
|
||||
Figures</a>
|
||||
<li> Chapter 18 - Diffusion models: <a
|
||||
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap18PDF.zip">PDF Figures</a> / <a
|
||||
href="https://docs.google.com/presentation/d/1x_ufIBtVPzWUvRieKMkpw5SdRjXWwdfR/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">
|
||||
href="https://drive.google.com/uc?export=download&id=1A-pIGl4PxjVMYOKAUG3aT4a8wD3G-q_r"> SVG Figures</a>
|
||||
/
|
||||
<a href="https://docs.google.com/presentation/d/1x_ufIBtVPzWUvRieKMkpw5SdRjXWwdfR/edit?usp=drive_link&ouid=110441678248547154185&rtpof=true&sd=true">
|
||||
PowerPoint Figures</a>
|
||||
<li> Chapter 19 - Deep reinforcement learning: <a
|
||||
href="https://github.com/udlbook/udlbook/raw/main/PDFFigures/UDLChap19PDF.zip">PDF Figures</a> / <a
|
||||
@@ -200,6 +205,23 @@
|
||||
Instructions for editing figures / equations can be found <a
|
||||
href="https://drive.google.com/file/d/1T_MXXVR4AfyMnlEFI-UVDh--FXI5deAp/view?usp=sharing">here</a>.
|
||||
|
||||
<p> My slides for a 20-lecture undergraduate deep learning course:</p>
|
||||
<ul>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=17RHb11BrydOvxSFNbRIomE1QKLVI087m">1. Introduction</a></li>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=1491zkHULC7gDfqlV6cqUxyVYXZ-de-Ub">2. Supervised Learning</a></li>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=1XkP1c9EhOBowla1rT1nnsDGMf2rZvrt7">3. Shallow Neural Networks</a></li>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=1e2ejfZbbfMKLBv0v-tvBWBdI8gO3SSS1">4. Deep Neural Networks</a></li>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=1fxQ_a1Q3eFPZ4kPqKbak6_emJK-JfnRH">5. Loss Functions</a></li>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=17QQ5ZzXBtR_uCNCUU1gPRWWRUeZN9exW">6. Fitting Models</a></li>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=1hC8JUCOaFWiw3KGn0rm7nW6mEq242QDK">7. Computing Gradients</a></li>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=1tSjCeAVg0JCeBcPgDJDbi7Gg43Qkh9_d">7b. Initialization</a></li>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=1RVZW3KjEs0vNSGx3B2fdizddlr6I0wLl">8. Performance</a></li>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=1LTicIKPRPbZRkkg6qOr1DSuOB72axood">9. Regularization</a></li>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=1bGVuwAwrofzZdfvj267elIzkYMIvYFj0">10. Convolutional Networks</a></li>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=1Kllhj0HdS_I3qE2XDU6ifgGGj3tmSRcl">11. Image Generation</a></li>
|
||||
<li><a href="https://drive.google.com/uc?export=download&id=1af6bTTjAbhDYfrDhboW7Fuv52Gk9ygKr">12. Transformers and LLMs</a></li>
|
||||
</ul>
|
||||
|
||||
<h2>Resources for students</h2>
|
||||
|
||||
<p>Answers to selected questions: <a
|
||||
|
||||