Update CM20315_Loss_II.ipynb

This commit is contained in:
Pietro Monticone
2023-11-30 16:39:42 +01:00
parent 6c8411ae1c
commit 428ca727db


@@ -34,7 +34,7 @@
"\n", "\n",
"This practical investigates loss functions. In part I we investigated univariate regression (where the output data $y$ is continuous. Our formulation was based on the normal/Gaussian distribution.\n", "This practical investigates loss functions. In part I we investigated univariate regression (where the output data $y$ is continuous. Our formulation was based on the normal/Gaussian distribution.\n",
"\n", "\n",
"In this notebook, we investigate binary classification (where the output data is 0 or 1). This will be based on the Bernouilli distribution\n", "In this notebook, we investigate binary classification (where the output data is 0 or 1). This will be based on the Bernoulli distribution\n",
"\n", "\n",
"In part III we'll investigate multiclass classification (where the outputs data can take multiple values 1,... K.\n", "In part III we'll investigate multiclass classification (where the outputs data can take multiple values 1,... K.\n",
"\n", "\n",
@@ -199,7 +199,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"The left is model output and the right is the model output after the sigmoid has been applied, so it now lies in the range [0,1] and represents the probabiilty, that y=1. The black dots show the training data. We'll compute the the likelihood and the negative log likelihood."
+"The left is model output and the right is the model output after the sigmoid has been applied, so it now lies in the range [0,1] and represents the probability, that y=1. The black dots show the training data. We'll compute the the likelihood and the negative log likelihood."
 ],
 "metadata": {
 "id": "MvVX6tl9AEXF"
@@ -210,7 +210,7 @@
"source": [ "source": [
"# Return probability under Bernoulli distribution for input x\n", "# Return probability under Bernoulli distribution for input x\n",
"def bernoulli_distribution(y, lambda_param):\n", "def bernoulli_distribution(y, lambda_param):\n",
" # TODO-- write in the equation for the Bernoullid distribution \n", " # TODO-- write in the equation for the Bernoulli distribution \n",
" # Equation 5.17 from the notes (you will need np.power)\n", " # Equation 5.17 from the notes (you will need np.power)\n",
" # Replace the line below\n", " # Replace the line below\n",
" prob = np.zeros_like(y)\n", " prob = np.zeros_like(y)\n",
@@ -249,7 +249,7 @@
"source": [ "source": [
"# Return the likelihood of all of the data under the model\n", "# Return the likelihood of all of the data under the model\n",
"def compute_likelihood(y_train, lambda_param):\n", "def compute_likelihood(y_train, lambda_param):\n",
" # TODO -- compute the likelihood of the data -- the product of the Bernoullis probabilities for each data point\n", " # TODO -- compute the likelihood of the data -- the product of the Bernoulli's probabilities for each data point\n",
" # Top line of equation 5.3 in the notes\n", " # Top line of equation 5.3 in the notes\n",
" # You will need np.prod() and the bernoulli_distribution function you used above\n", " # You will need np.prod() and the bernoulli_distribution function you used above\n",
" # Replace the line below\n", " # Replace the line below\n",
@@ -284,7 +284,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of sveral probabilities, which are all quite small themselves.\n",
+"You can see that this gives a very small answer, even for this small 1D dataset, and with the model fitting quite well. This is because it is the product of several probabilities, which are all quite small themselves.\n",
 "This will get out of hand pretty quickly with real datasets -- the likelihood will get so small that we can't represent it with normal finite-precision math\n",
 "\n",
 "This is why we use negative log likelihood"
@@ -317,7 +317,7 @@
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n", "beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
"# Use our neural network to predict the mean of the Gaussian\n", "# Use our neural network to predict the mean of the Gaussian\n",
"model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n", "model_out = shallow_nn(x_train, beta_0, omega_0, beta_1, omega_1)\n",
"# Set the standard devation to something reasonable\n", "# Set the standard deviation to something reasonable\n",
"lambda_train = sigmoid(model_out)\n", "lambda_train = sigmoid(model_out)\n",
"# Compute the log likelihood\n", "# Compute the log likelihood\n",
"nll = compute_negative_log_likelihood(y_train, lambda_train)\n", "nll = compute_negative_log_likelihood(y_train, lambda_train)\n",
@@ -362,7 +362,7 @@
"source": [ "source": [
"# Define a range of values for the parameter\n", "# Define a range of values for the parameter\n",
"beta_1_vals = np.arange(-2,6.0,0.1)\n", "beta_1_vals = np.arange(-2,6.0,0.1)\n",
"# Create some arrays to store the likelihoods, negative log likehoods\n", "# Create some arrays to store the likelihoods, negative log likelihoods\n",
"likelihoods = np.zeros_like(beta_1_vals)\n", "likelihoods = np.zeros_like(beta_1_vals)\n",
"nlls = np.zeros_like(beta_1_vals)\n", "nlls = np.zeros_like(beta_1_vals)\n",
"\n", "\n",