udlbook/Notebooks/Chap13/13_4_Graph_Attention_Networks.ipynb

{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "authorship_tag": "ABX9TyPBrRm/vxXEdut47l4zc2Od",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/udlbook/udlbook/blob/main/Notebooks/Chap13/13_4_Graph_Attention_Networks.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# **Notebook 13.1: Graph attention networks**\n",
        "\n",
        "This notebook builds a graph attention mechanism from scratch, as discussed in section 13.8.6 of the book and illustrated in figure 13.12c\n",
        "\n",
        "Work through the cells below, running each cell in turn. In various places you will see the words \"TO DO\". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.\n",
        "\n",
        "Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.\n",
        "\n"
      ],
      "metadata": {
        "id": "t9vk9Elugvmi"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import numpy as np\n",
        "import matplotlib.pyplot as plt"
      ],
      "metadata": {
        "id": "OLComQyvCIJ7"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "The self-attention mechanism maps $N$ inputs $\\mathbf{x}_{n}\\in\\mathbb{R}^{D}$ and returns $N$ outputs $\\mathbf{x}'_{n}\\in \\mathbb{R}^{D}$.  \n",
        "\n"
      ],
      "metadata": {
        "id": "9OJkkoNqCVK2"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Set seed so we get the same random numbers\n",
        "np.random.seed(1)\n",
        "# Number of nodes in the graph\n",
        "N = 8\n",
        "# Number of dimensions of each input\n",
        "D = 4\n",
        "\n",
        "# Define a graph\n",
        "A = np.array([[0,1,0,1,0,0,0,0],\n",
        "              [1,0,1,1,1,0,0,0],\n",
        "              [0,1,0,0,1,0,0,0],\n",
        "              [1,1,0,0,1,0,0,0],\n",
        "              [0,1,1,1,0,1,0,1],\n",
        "              [0,0,0,0,1,0,1,1],\n",
        "              [0,0,0,0,0,1,0,0],\n",
        "              [0,0,0,0,1,1,0,0]]);\n",
        "print(A)\n",
        "\n",
        "# Let's also define some random data\n",
        "X = np.random.normal(size=(D,N))"
      ],
      "metadata": {
        "id": "oAygJwLiCSri"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "We'll also need the weights and biases for the keys, queries, and values (equations 12.2 and 12.4)"
      ],
      "metadata": {
        "id": "W2iHFbtKMaDp"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Choose random values for the parameters\n",
        "omega = np.random.normal(size=(D,D))\n",
        "beta = np.random.normal(size=(D,1))\n",
        "phi = np.random.normal(size=(1,2*D))"
      ],
      "metadata": {
        "id": "79TSK7oLMobe"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "We'll need a softmax operation that operates on the columns of the matrix and a ReLU function as well"
      ],
      "metadata": {
        "id": "iYPf6c4MhCgq"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Define softmax operation that works independently on each column\n",
        "def softmax_cols(data_in):\n",
        "  # Exponentiate all of the values\n",
        "  exp_values = np.exp(data_in) ;\n",
        "  # Sum over columns\n",
        "  denom = np.sum(exp_values, axis = 0);\n",
        "  # Replicate denominator to N rows\n",
        "  denom = np.matmul(np.ones((data_in.shape[0],1)), denom[np.newaxis,:])\n",
        "  # Compute softmax\n",
        "  softmax = exp_values / denom\n",
        "  # return the answer\n",
        "  return softmax\n",
        "\n",
        "\n",
        "# Define the Rectified Linear Unit (ReLU) function\n",
        "def ReLU(preactivation):\n",
        "  activation = preactivation.clip(0.0)\n",
        "  return activation\n"
      ],
      "metadata": {
        "id": "obaQBdUAMXXv"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        " # Now let's compute self attention in matrix form\n",
        "def graph_attention(X,omega, beta, phi, A):\n",
        "\n",
        "  # TODO -- Write this function (see figure 13.12c)\n",
        "  # 1. Compute X_prime\n",
        "  # 2. Compute S\n",
        "  # 3. To apply the mask, set S to a very large negative number (e.g. -1e20) everywhere where A+I is zero\n",
        "  # 4. Run the softmax function to compute the attention values\n",
        "  # 5. Postmultiply X' by the attention values\n",
        "  # 6. Apply the ReLU function\n",
        "  # Replace this line:\n",
        "  output = np.ones_like(X) ;\n",
        "\n",
        "  return output;"
      ],
      "metadata": {
        "id": "gb2WvQ3SiH8r"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Test out the graph attention mechanism\n",
        "np.set_printoptions(precision=3)\n",
        "output = graph_attention(X, omega, beta, phi, A);\n",
        "print(\"Correct answer is:\")\n",
        "print(\"[[1.796 1.346 0.569 1.703 1.298 1.224 1.24  1.234]\")\n",
        "print(\" [0.768 0.672 0.    0.529 3.841 4.749 5.376 4.761]\")\n",
        "print(\" [0.305 0.129 0.    0.341 0.785 1.014 1.113 1.024]\")\n",
        "print(\" [0.    0.    0.    0.    0.35  0.864 1.098 0.871]]]\")\n",
        "\n",
        "\n",
        "print(\"Your answer is:\")\n",
        "print(output)"
      ],
      "metadata": {
        "id": "d4p6HyHXmDh5"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "TODO -- Try to construct a dot-product self-attention mechanism as in practical 12.1 that respects the geometry of the graph and has zero attention between non-neighboring nodes by combining figures 13.12a and 13.12b.\n"
      ],
      "metadata": {
        "id": "QDEkIrcgrql-"
      }
    }
  ]
}