Add files via upload
@@ -1,33 +1,22 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOPv/l+ToaApJV7Nz+8AtpV",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
"colab_type": "text",
"id": "view-in-github"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap05/5_3_Multiclass_Cross_entropy_Loss.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "jSlFkICHwHQF"
},
"source": [
"# **Notebook 5.3 Multiclass Cross-Entropy Loss**\n",
"\n",
@@ -36,10 +25,7 @@
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "jSlFkICHwHQF"
}
]
},
{
"cell_type": "code",
@@ -61,6 +47,11 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Fv7SZR3tv7mV"
},
"outputs": [],
"source": [
"# Define the Rectified Linear Unit (ReLU) function\n",
"def ReLU(preactivation):\n",
@@ -77,15 +68,15 @@
" h1 = ReLU(np.matmul(beta_0,np.ones((1,n_data))) + np.matmul(omega_0,x))\n",
" model_out = np.matmul(beta_1,np.ones((1,n_data))) + np.matmul(omega_1,h1)\n",
" return model_out"
],
"metadata": {
"id": "Fv7SZR3tv7mV"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pUT9Ain_HRim"
},
"outputs": [],
"source": [
"# Get parameters for model -- we can call this function to easily reset them\n",
"def get_parameters():\n",
@@ -103,15 +94,15 @@
" omega_1[2,0] = 16.0; omega_1[2,1] = -8.0; omega_1[2,2] =-8\n",
"\n",
" return beta_0, omega_0, beta_1, omega_1"
],
"metadata": {
"id": "pUT9Ain_HRim"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NRR67ri_1TzN"
},
"outputs": [],
"source": [
"# Utility function for plotting data\n",
"def plot_multiclass_classification(x_model, out_model, lambda_model, x_data = None, y_data = None, title= None):\n",
@@ -148,26 +139,27 @@
" if y_data[i] ==2:\n",
" ax[1].plot(x_data[i],-0.05, 'b.')\n",
" plt.show()"
],
"metadata": {
"id": "NRR67ri_1TzN"
},
"execution_count": null,
"outputs": []
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "PsgLZwsPxauP"
},
"source": [
"# Multiclass classification\n",
"\n",
"For multiclass classification, the network must predict the probability of $K$ classes, using $K$ outputs. However, these probabilities must be non-negative and sum to one, while the network outputs can take arbitrary values. Hence, we pass the outputs through a softmax function, which maps $K$ arbitrary values to $K$ non-negative values that sum to one."
],
"metadata": {
"id": "PsgLZwsPxauP"
}
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "uFb8h-9IXnIe"
},
"outputs": [],
"source": [
"# Softmax function that maps a vector of arbitrary values to a vector of values that are positive and sum to one.\n",
"def softmax(model_out):\n",
@@ -184,15 +176,15 @@
" softmax_model_out = np.ones_like(model_out)/ exp_model_out.shape[0]\n",
"\n",
" return softmax_model_out"
],
"metadata": {
"id": "uFb8h-9IXnIe"
},
"execution_count": null,
"outputs": []
]
},
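An aside on the softmax cell above: a minimal standalone sketch of a column-wise softmax, using the standard max-subtraction trick for numerical stability. This is an illustration under my own function name, not the notebook's TO DO solution:

```python
import numpy as np

def softmax_columns(model_out):
    # Subtract the per-column max before exponentiating so exp() cannot overflow;
    # the result is unchanged because softmax is invariant to shifting its inputs.
    shifted = model_out - np.max(model_out, axis=0, keepdims=True)
    exp_out = np.exp(shifted)
    # Normalize each column so the K entries are non-negative and sum to one
    return exp_out / np.sum(exp_out, axis=0, keepdims=True)

# One data point with K=3 arbitrary output values
probs = softmax_columns(np.array([[2.0], [0.0], [-1.0]]))
print(probs.sum(axis=0))  # each column sums to one
```

Because the shift cancels in the ratio, this gives the same answer as the naive exponentiate-and-normalize version while remaining safe for large inputs.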
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "VWzNOt1swFVd"
},
"outputs": [],
"source": [
"\n",
"# Let's create some 1D training data\n",
@@ -214,62 +206,64 @@
"model_out= shallow_nn(x_model, beta_0, omega_0, beta_1, omega_1)\n",
"lambda_model = softmax(model_out)\n",
"plot_multiclass_classification(x_model, model_out, lambda_model, x_train, y_train)\n"
],
"metadata": {
"id": "VWzNOt1swFVd"
},
"execution_count": null,
"outputs": []
]
},
{
"attachments": {},
"cell_type": "markdown",
"source": [
"The left plot is the model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probability that y=0 (red), 1 (green), and 2 (blue). The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dots, and the blue curve to be high where there are blue dots. We'll compute the likelihood and the negative log likelihood."
],
"metadata": {
"id": "MvVX6tl9AEXF"
}
},
"source": [
"The left plot is the model output and the right is the model output after the softmax has been applied, so it now lies in the range [0,1] and represents the probability that y=0 (red), 1 (green), and 2 (blue). The dots at the bottom show the training data with the same color scheme. So we want the red curve to be high where there are red dots, the green curve to be high where there are green dots, and the blue curve to be high where there are blue dots. We'll compute the likelihood and the negative log likelihood."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YaLdRlEX0FkU"
},
"outputs": [],
"source": [
"# Return probability under Categorical distribution for input x\n",
"# Just take value from row k of lambda param where y =k,\n",
"def categorical_distribution(y, lambda_param):\n",
" return np.array([lambda_param[row, i] for i, row in enumerate (y)])"
],
"metadata": {
"id": "YaLdRlEX0FkU"
},
"execution_count": null,
"outputs": []
]
},
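The list comprehension in the cell above can equivalently be written with NumPy integer indexing, which picks one row per column in a single operation. A small sketch (the helper name is mine, not part of the notebook):

```python
import numpy as np

def categorical_distribution_vectorized(y, lambda_param):
    # Select entry [y[i], i] from each column i of lambda_param in one shot
    y = y.flatten().astype(int)
    return lambda_param[y, np.arange(len(y))]

# Same check as the notebook: P(y=1) under lambda = [0.2, 0.5, 0.3]
print(categorical_distribution_vectorized(np.array([[1]]), np.array([[0.2], [0.5], [0.3]])))
```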
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4TSL14dqHHbV"
},
"outputs": [],
"source": [
"# Let's double check we get the right answer before proceeding\n",
"print(\"Correct answer = %3.3f, Your answer = %3.3f\"%(0.2,categorical_distribution(np.array([[0]]),np.array([[0.2],[0.5],[0.3]]))))\n",
"print(\"Correct answer = %3.3f, Your answer = %3.3f\"%(0.5,categorical_distribution(np.array([[1]]),np.array([[0.2],[0.5],[0.3]]))))\n",
"print(\"Correct answer = %3.3f, Your answer = %3.3f\"%(0.3,categorical_distribution(np.array([[2]]),np.array([[0.2],[0.5],[0.3]]))))\n",
"\n"
],
"metadata": {
"id": "4TSL14dqHHbV"
},
"execution_count": null,
"outputs": []
]
},
{
"attachments": {},
"cell_type": "markdown",
"source": [
"Now let's compute the likelihood using this function"
],
"metadata": {
"id": "R5z_0dzQMF35"
}
},
"source": [
"Now let's compute the likelihood using this function"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zpS7o6liCx7f"
},
"outputs": [],
"source": [
"# Return the likelihood of all of the data under the model\n",
"def compute_likelihood(y_train, lambda_param):\n",
@@ -280,15 +274,15 @@
" likelihood = 0\n",
"\n",
" return likelihood"
],
"metadata": {
"id": "zpS7o6liCx7f"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1hQxBLoVNlr2"
},
"outputs": [],
"source": [
"# Let's test this\n",
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
@@ -299,27 +293,28 @@
"likelihood = compute_likelihood(y_train, lambda_train)\n",
"# Let's double check we get the right answer before proceeding\n",
"print(\"Correct answer = %9.9f, Your answer = %9.9f\"%(0.000000041,likelihood))"
],
"metadata": {
"id": "1hQxBLoVNlr2"
},
"execution_count": null,
"outputs": []
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "HzphKgPfOvlk"
},
"source": [
"You can see that this gives a very small answer, even for this small 1D dataset and with the model fitting quite well. This is because it is the product of several probabilities, which are all quite small themselves.\n",
"This will get out of hand pretty quickly with real datasets -- the likelihood will get so small that we can't represent it with normal finite-precision math.\n",
"\n",
"This is why we use the negative log likelihood."
],
"metadata": {
"id": "HzphKgPfOvlk"
}
]
},
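To see the underflow problem concretely, here is a small illustration with made-up per-point probabilities (not the notebook's data): multiplying many small probabilities collapses to zero in float64, while summing their logs stays comfortably representable:

```python
import numpy as np

# 1000 hypothetical per-point probabilities of 0.01 each
probs = np.full(1000, 0.01)

likelihood = np.prod(probs)   # 0.01**1000 = 1e-2000, far below float64's smallest value
nll = -np.sum(np.log(probs))  # 1000 * log(100), a perfectly ordinary number

print(likelihood)  # 0.0 -- the product has underflowed
print(nll)
```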
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dsT0CWiKBmTV"
},
"outputs": [],
"source": [
"# Return the negative log likelihood of the data under the model\n",
"def compute_negative_log_likelihood(y_train, lambda_param):\n",
@@ -329,15 +324,15 @@
" nll = 0\n",
"\n",
" return nll"
],
"metadata": {
"id": "dsT0CWiKBmTV"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nVxUXg9rQmwI"
},
"outputs": [],
"source": [
"# Let's test this\n",
"beta_0, omega_0, beta_1, omega_1 = get_parameters()\n",
@@ -349,24 +344,25 @@
"nll = compute_negative_log_likelihood(y_train, lambda_train)\n",
"# Let's double check we get the right answer before proceeding\n",
"print(\"Correct answer = %9.9f, Your answer = %9.9f\"%(17.015457867,nll))"
],
"metadata": {
"id": "nVxUXg9rQmwI"
},
"execution_count": null,
"outputs": []
]
},
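Because the logarithm is monotonic, the parameter value that maximizes the likelihood is exactly the one that minimizes the negative log likelihood — which is what the beta_1 sweep in the next cells shows graphically. A toy check of that equivalence, using made-up likelihood values rather than the notebook's grid:

```python
import numpy as np

# Hypothetical likelihood values over a 4-point parameter grid
likelihoods = np.array([1e-9, 4.1e-8, 2.3e-8, 7.0e-10])
nlls = -np.log(likelihoods)

# argmax of the likelihood coincides with argmin of the negative log likelihood
print(np.argmax(likelihoods), np.argmin(nlls))
```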
{
"attachments": {},
"cell_type": "markdown",
"source": [
"Now let's investigate finding the maximum likelihood / minimum log likelihood solution. For simplicity, we'll assume that all the parameters are fixed except one and look at how the likelihood and log likelihood change as we manipulate the last parameter. We'll start with overall y_offset, beta_1 (formerly phi_0)"
],
"metadata": {
"id": "OgcRojvPWh4V"
}
},
"source": [
"Now let's investigate finding the maximum likelihood / minimum log likelihood solution. For simplicity, we'll assume that all the parameters are fixed except one and look at how the likelihood and log likelihood change as we manipulate the last parameter. We'll start with overall y_offset, $\\beta_1$ (formerly $\\phi_0$)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pFKtDaAeVU4U"
},
"outputs": [],
"source": [
"# Define a range of values for the parameter\n",
"beta_1_vals = np.arange(-2,6.0,0.1)\n",
@@ -391,15 +387,15 @@
" model_out = shallow_nn(x_model, beta_0, omega_0, beta_1, omega_1)\n",
" lambda_model = softmax(model_out)\n",
" plot_multiclass_classification(x_model, model_out, lambda_model, x_train, y_train, title=\"beta1[0,0]=%3.3f\"%(beta_1[0,0]))\n"
],
"metadata": {
"id": "pFKtDaAeVU4U"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UHXeTa9MagO6"
},
"outputs": [],
"source": [
"# Now let's plot the likelihood, negative log likelihood, and least squares as a function of the value of the offset beta1\n",
"fig, ax = plt.subplots()\n",
@@ -421,15 +417,15 @@
"plt.axvline(x = beta_1_vals[np.argmax(likelihoods)], linestyle='dotted')\n",
"\n",
"plt.show()"
],
"metadata": {
"id": "UHXeTa9MagO6"
},
"execution_count": null,
"outputs": []
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "aDEPhddNdN4u"
},
"outputs": [],
"source": [
"# Hopefully, you can see that the maximum of the likelihood fn is at the same position as the minimum negative log likelihood solution\n",
"# Let's check that:\n",
@@ -441,24 +437,36 @@
"model_out = shallow_nn(x_model, beta_0, omega_0, beta_1, omega_1)\n",
"lambda_model = softmax(model_out)\n",
"plot_multiclass_classification(x_model, model_out, lambda_model, x_train, y_train, title=\"beta1[0,0]=%3.3f\"%(beta_1[0,0]))\n"
],
"metadata": {
"id": "aDEPhddNdN4u"
},
"execution_count": null,
"outputs": []
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "771G8N1Vk5A2"
},
"source": [
"They both give the same answer. But you can see from the likelihood plot above that the likelihood is very small unless the parameters are almost correct. So in practice, we would work with the negative log likelihood.<br><br>\n",
"\n",
"Again, to fit the full neural model we would vary all 16 parameters of the network in $\boldsymbol\beta_{0},\boldsymbol\omega_{0},\boldsymbol\beta_{1},\boldsymbol\omega_{1}$ until we find the combination that has the maximum likelihood / minimum negative log likelihood.<br><br>\n",
"\n"
]
}
],
"metadata": {
"id": "771G8N1Vk5A2"
"colab": {
"authorship_tag": "ABX9TyOPv/l+ToaApJV7Nz+8AtpV",
"include_colab_link": true,
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
}
]
},
"nbformat": 4,
"nbformat_minor": 0
}