Files
udlbook/Notebooks/Chap08/8_4_High_Dimensional_Spaces.ipynb
2023-11-25 17:02:46 +00:00

236 lines
8.0 KiB
Plaintext

{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyPAKqlf9VxztHXKylyJwqe8",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap08/8_4_High_Dimensional_Spaces.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# **Notebook 8.4: High-dimensional spaces**\n",
"\n",
"This notebook investigates the strange properties of high-dimensional spaces as discussed in the notes at the end of chapter 8.\n",
"\n",
"Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
"\n",
"Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions."
],
"metadata": {
"id": "EjLK-kA1KnYX"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4ESMmnkYEVAb"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import scipy.special as sci"
]
},
{
"cell_type": "markdown",
"source": [
"# How close are points in high dimensions?\n",
"\n",
"In this part of the notebook, we investigate how close random points are in 2D, 100D, and 1000D. In each case, we generate 1000 points and calculate the Euclidean distance between each pair. "
],
"metadata": {
"id": "MonbPEitLNgN"
}
},
{
"cell_type": "code",
"source": [
"# Fix the random seed so we all have the same random numbers\n",
"np.random.seed(0)\n",
"n_data = 1000\n",
"# Create 1000 data examples (columns) each with 2 dimensions (rows)\n",
"n_dim = 2\n",
"x_2D = np.random.normal(size=(n_dim,n_data))\n",
"# Create 1000 data examples (columns) each with 100 dimensions (rows)\n",
"n_dim = 100\n",
"x_100D = np.random.normal(size=(n_dim,n_data))\n",
"# Create 1000 data examples (columns) each with 1000 dimensions (rows)\n",
"n_dim = 1000\n",
"x_1000D = np.random.normal(size=(n_dim,n_data))"
],
"metadata": {
"id": "vZSHVmcWEk14"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def distance_ratio(x):\n",
" # TODO -- replace the two lines below to calculate the largest and smallest Euclidean distance between\n",
" # the data points in the columns of x. DO NOT include the distance between the data point\n",
" # and itself (which is obviously zero)\n",
" smallest_dist = 1.0\n",
" largest_dist = 1.0\n",
"\n",
" # Calculate the ratio and return\n",
" dist_ratio = largest_dist / smallest_dist\n",
" return dist_ratio"
],
"metadata": {
"id": "PhVmnUs8ErD9"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print('Ratio of largest to smallest distance 2D: %3.3f'%(distance_ratio(x_2D)))\n",
"print('Ratio of largest to smallest distance 100D: %3.3f'%(distance_ratio(x_100D)))\n",
"print('Ratio of largest to smallest distance 1000D: %3.3f'%(distance_ratio(x_1000D)))\n"
],
"metadata": {
"id": "0NdPxfn5GQuJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"If you did this right, you will see that the distance between the nearest and farthest two points in high dimensions is almost the same. "
],
"metadata": {
"id": "uT68c0k8uBxs"
}
},
{
"cell_type": "markdown",
"source": [
"# Volume of a hypersphere\n",
"\n",
"In the second part of this notebook we calculate the volume of a hypersphere of radius 0.5 (i.e., of diameter 1) as a function of the radius. Note that you you can check your answer by doing the calculation for 2D using the standard formula for the area of a circle and making sure it matches."
],
"metadata": {
"id": "b2FYKV1SL4Z7"
}
},
{
"cell_type": "code",
"source": [
"def volume_of_hypersphere(diameter, dimensions):\n",
" # Formula given in Problem 8.7 of the book\n",
" # You will need sci.gamma()\n",
" # Check out: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.gamma.html\n",
" # Also use this value for pi\n",
" pi = np.pi\n",
" # TODO replace this code with formula for the volume of a hypersphere\n",
" volume = 1.0\n",
"\n",
" return volume\n"
],
"metadata": {
"id": "CZoNhD8XJaHR"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"diameter = 1.0\n",
"for c_dim in range(1,11):\n",
" print(\"Volume of unit diameter hypersphere in %d dimensions is %3.3f\"%(c_dim, volume_of_hypersphere(diameter, c_dim)))"
],
"metadata": {
"id": "fNTBlg_GPEUh"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"You should see that the volume decreases to almost nothing in high dimensions. All of the volume is in the corners of the unit hypercube (which always has volume 1)."
],
"metadata": {
"id": "PtaeGSNBunJl"
}
},
{
"cell_type": "markdown",
"source": [
"# Proportion of hypersphere in outer shell\n",
"\n",
"In the third part of the notebook you will calculate what proportion of the volume of a hypersphere is in the outer 1% of the radius/diameter. Calculate the volume of a hypersphere and then the volume of a hypersphere with 0.99 of the radius and then figure out the ratio. "
],
"metadata": {
"id": "GdyMeOBmoXyF"
}
},
{
"cell_type": "code",
"source": [
"def get_prop_of_volume_in_outer_1_percent(dimension):\n",
" # TODO -- replace this line\n",
" proportion = 1.0\n",
"\n",
" return proportion"
],
"metadata": {
"id": "8_CxZ2AIpQ8w"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# While we're here, let's look at how much of the volume is in the outer 1% of the radius\n",
"for c_dim in [1,2,10,20,50,100,150,200,250,300]:\n",
" print('Proportion of volume in outer 1 percent of radius in %d dimensions =%3.3f'%(c_dim, get_prop_of_volume_in_outer_1_percent(c_dim)))"
],
"metadata": {
"id": "LtMDIn2qPVfJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"You should see see that by the time we get to 300 dimensions most of the volume is in the outer 1 percent. <br><br>\n",
"\n",
"The conclusion of all of this is that in high dimensions you should be sceptical of your intuitions about how things work. I have tried to visualize many things in one or two dimensions in the book, but you should also be sceptical about these visualizations!"
],
"metadata": {
"id": "n_FVeb-ZwzxD"
}
}
]
}