{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Introduction to PyTorch \n", "========================\n", "\n", "The contents below are taken from the PyTorch tutorials.\n", "\n", "Tensors are the central data abstraction in PyTorch.\n", "\n", "First things first, let’s import the PyTorch module." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import torch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Creating Tensors\n", "----------------\n", "\n", "The simplest way to create a tensor is with the ``torch.empty()`` call:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:08:38.261951Z", "iopub.status.busy": "2022-01-20T03:08:38.261683Z", "iopub.status.idle": "2022-01-20T03:08:38.345349Z", "shell.execute_reply": "2022-01-20T03:08:38.344486Z", "shell.execute_reply.started": "2022-01-20T03:08:38.261921Z" }, "jupyter": { "outputs_hidden": false }, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'torch.Tensor'>\n", "tensor([[0., 0., 0., 0.],\n", " [0., 0., 0., 0.],\n", " [0., 0., 0., 0.]])\n" ] } ], "source": [ "x = torch.empty(3, 4)\n", "print(type(x))\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s unpack what we just did:\n", "\n", "- We created a tensor using one of the numerous factory methods\n", " attached to the ``torch`` module.\n", "- The tensor itself is 2-dimensional, having 3 rows and 4 columns.\n", "- The type of the object returned is ``torch.Tensor``, which is an\n", " alias for ``torch.FloatTensor``; by default, PyTorch tensors are\n", " populated with 32-bit floating point numbers. (More on data types\n", " below.)\n", "- You will probably see some random-looking values when printing your\n", " tensor. 
The ``torch.empty()`` call allocates memory for the tensor,\n", " but does not initialize it with any values - so what you’re seeing is\n", " whatever was in memory at the time of allocation.\n", "\n", "A brief note about tensors and their number of dimensions, and\n", "terminology:\n", "\n", "- You will sometimes see a 1-dimensional tensor called a\n", " *vector.* \n", "- Likewise, a 2-dimensional tensor is often referred to as a\n", " *matrix.* \n", "- Anything with more than two dimensions is generally just\n", " called a tensor.\n", "\n", "More often than not, you’ll want to initialize your tensor with some\n", "value. Common cases are all zeros, all ones, or random values, and the\n", "``torch`` module provides factory methods for all of these:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T02:12:34.751044Z", "iopub.status.busy": "2022-01-20T02:12:34.749827Z", "iopub.status.idle": "2022-01-20T02:12:34.761889Z", "shell.execute_reply": "2022-01-20T02:12:34.760543Z", "shell.execute_reply.started": "2022-01-20T02:12:34.750956Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0., 0., 0.],\n", " [0., 0., 0.]])\n", "tensor([[1., 1., 1.],\n", " [1., 1., 1.]])\n", "tensor([[0.3615, 0.8562, 0.5153],\n", " [0.2048, 0.8640, 0.0635]])\n" ] } ], "source": [ "zeros = torch.zeros(2, 3)\n", "print(zeros)\n", "\n", "ones = torch.ones(2, 3)\n", "print(ones)\n", "\n", "random = torch.rand(2, 3)\n", "print(random)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The factory methods all do just what you’d expect - we have a tensor\n", "full of zeros, another full of ones, and another with random values\n", "between 0 and 1.\n", "\n", "Random Tensors and Seeding\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~\n", "\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": 
"2022-01-20T03:10:47.488796Z", "iopub.status.busy": "2022-01-20T03:10:47.488464Z", "iopub.status.idle": "2022-01-20T03:10:47.509105Z", "shell.execute_reply": "2022-01-20T03:10:47.508448Z", "shell.execute_reply.started": "2022-01-20T03:10:47.488762Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0.3126, 0.3791, 0.3087],\n", " [0.0736, 0.4216, 0.0691]])\n", "tensor([[0.2332, 0.4047, 0.2162],\n", " [0.9927, 0.4128, 0.5938]])\n", "tensor([[0.3126, 0.3791, 0.3087],\n", " [0.0736, 0.4216, 0.0691]])\n" ] } ], "source": [ "torch.manual_seed(1729)\n", "random1 = torch.rand(2, 3)\n", "print(random1)\n", "\n", "random2 = torch.rand(2, 3)\n", "print(random2)\n", "\n", "torch.manual_seed(1729)\n", "random3 = torch.rand(2, 3)\n", "print(random3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What you should see above is that ``random1`` and ``random3`` carry\n", "identical values, while ``random2`` differs. Manually setting\n", "the RNG’s seed resets it, so that identical computations depending on\n", "random numbers should, in most settings, provide identical results." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last way to create a tensor that we will cover is to specify its data\n", "directly from a Python collection:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:12:43.801619Z", "iopub.status.busy": "2022-01-20T03:12:43.801096Z", "iopub.status.idle": "2022-01-20T03:12:43.812193Z", "shell.execute_reply": "2022-01-20T03:12:43.811007Z", "shell.execute_reply.started": "2022-01-20T03:12:43.801578Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[3.1416, 2.7183],\n", " [1.6180, 0.0073]])\n", "tensor([ 2, 3, 5, 7, 11, 13, 17, 19])\n", "tensor([[2, 4, 6],\n", " [3, 6, 9]])\n" ] } ], "source": [ "some_constants = torch.tensor([[3.1415926, 2.71828], [1.61803, 0.0072897]])\n", "print(some_constants)\n", "\n", "some_integers = torch.tensor((2, 3, 5, 7, 11, 13, 17, 19))\n", "print(some_integers)\n", "\n", "more_integers = torch.tensor(((2, 4, 6), [3, 6, 9]))\n", "print(more_integers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using ``torch.tensor()`` is the most straightforward way to create a\n", "tensor if you already have data in a Python tuple or list. 
As shown\n", "above, nesting the collections will result in a multi-dimensional\n", "tensor.\n", "\n", "Tensor Data Types\n", "\n", "You can set the data type of a tensor in a couple of ways:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:15:03.056508Z", "iopub.status.busy": "2022-01-20T03:15:03.056231Z", "iopub.status.idle": "2022-01-20T03:15:03.069431Z", "shell.execute_reply": "2022-01-20T03:15:03.068408Z", "shell.execute_reply.started": "2022-01-20T03:15:03.05648Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[1, 1, 1],\n", " [1, 1, 1]], dtype=torch.int16)\n", "tensor([[17.3151, 14.5980, 6.0404],\n", " [18.0429, 7.2532, 19.6519]], dtype=torch.float64)\n", "tensor([[17, 14, 6],\n", " [18, 7, 19]], dtype=torch.int32)\n" ] } ], "source": [ "a = torch.ones((2, 3), dtype=torch.int16)\n", "print(a)\n", "\n", "b = torch.rand((2, 3), dtype=torch.float64) * 20.\n", "print(b)\n", "\n", "c = b.to(torch.int32)\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simplest way to set the underlying data type of a tensor is with an\n", "optional argument at creation time. In the first line of the cell above,\n", "we set ``dtype=torch.int16`` for the tensor ``a``. When we print ``a``,\n", "we can see that it’s full of ``1`` rather than ``1.`` - PyTorch’s subtle\n", "cue that this is an integer type rather than floating point.\n", "\n", "Another thing to notice about printing ``a`` is that, unlike when we\n", "left ``dtype`` as the default (32-bit floating point), printing the\n", "tensor also specifies its ``dtype``.\n", "\n", "The other way to set the data type is with the ``.to()`` method. In the\n", "cell above, we create a random floating point tensor ``b`` in the usual\n", "way. Following that, we create ``c`` by converting ``b`` to a 32-bit\n", "integer with the ``.to()`` method. 
Note that ``c`` contains all the same\n", "values as ``b``, but truncated to integers.\n", "\n", "Available data types include:\n", "\n", "- ``torch.bool``\n", "- ``torch.int8``\n", "- ``torch.uint8``\n", "- ``torch.int16``\n", "- ``torch.int32``\n", "- ``torch.int64``\n", "- ``torch.half``\n", "- ``torch.float``\n", "- ``torch.double``\n", "\n", "Math & Logic with PyTorch Tensors\n", "---------------------------------\n", "\n", "Now that you know some of the ways to create a tensor… what can you do\n", "with them?\n", "\n", "Let’s look at basic arithmetic first, and how tensors interact with\n", "simple scalars:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:18:27.790422Z", "iopub.status.busy": "2022-01-20T03:18:27.78915Z", "iopub.status.idle": "2022-01-20T03:18:27.818612Z", "shell.execute_reply": "2022-01-20T03:18:27.817811Z", "shell.execute_reply.started": "2022-01-20T03:18:27.790371Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[1., 1.],\n", " [1., 1.]])\n", "tensor([[2., 2.],\n", " [2., 2.]])\n", "tensor([[3., 3.],\n", " [3., 3.]])\n", "tensor([[4., 4.],\n", " [4., 4.]])\n", "tensor([[1.4142, 1.4142],\n", " [1.4142, 1.4142]])\n" ] } ], "source": [ "ones = torch.zeros(2, 2) + 1\n", "twos = torch.ones(2, 2) * 2\n", "threes = (torch.ones(2, 2) * 7 - 1) / 2\n", "fours = twos ** 2\n", "sqrt2s = twos ** 0.5\n", "\n", "print(ones)\n", "print(twos)\n", "print(threes)\n", "print(fours)\n", "print(sqrt2s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see above, arithmetic operations between tensors and scalars,\n", "such as addition, subtraction, multiplication, division, and\n", "exponentiation are distributed over every element of the tensor. 
Because\n", "the output of such an operation will be a tensor, you can chain them\n", "together with the usual operator precedence rules, as in the line where\n", "we create ``threes``.\n", "\n", "Similar operations between two tensors also behave like you’d\n", "intuitively expect:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:20:03.841466Z", "iopub.status.busy": "2022-01-20T03:20:03.840528Z", "iopub.status.idle": "2022-01-20T03:20:03.85142Z", "shell.execute_reply": "2022-01-20T03:20:03.850653Z", "shell.execute_reply.started": "2022-01-20T03:20:03.841415Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 2., 4.],\n", " [ 8., 16.]])\n", "tensor([[5., 5.],\n", " [5., 5.]])\n", "tensor([[12., 12.],\n", " [12., 12.]])\n" ] } ], "source": [ "powers2 = twos ** torch.tensor([[1, 2], [3, 4]])\n", "print(powers2)\n", "\n", "fives = ones + fours\n", "print(fives)\n", "\n", "dozens = threes * fours\n", "print(dozens)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It’s important to note here that all of the tensors in the previous code\n", "cell were of identical shape. 
What happens when we try to perform a\n", "binary operation on tensors of dissimilar shape?\n", "\n", "\n", " \n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:21:27.052638Z", "iopub.status.busy": "2022-01-20T03:21:27.051692Z", "iopub.status.idle": "2022-01-20T03:21:27.194872Z", "shell.execute_reply": "2022-01-20T03:21:27.193017Z", "shell.execute_reply.started": "2022-01-20T03:21:27.052543Z" }, "scrolled": true }, "outputs": [ { "ename": "RuntimeError", "evalue": "The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[15], line 4\u001b[0m\n\u001b[1;32m 1\u001b[0m a \u001b[38;5;241m=\u001b[39m torch\u001b[38;5;241m.\u001b[39mrand(\u001b[38;5;241m2\u001b[39m, \u001b[38;5;241m3\u001b[39m)\n\u001b[1;32m 2\u001b[0m b \u001b[38;5;241m=\u001b[39m torch\u001b[38;5;241m.\u001b[39mrand(\u001b[38;5;241m3\u001b[39m, \u001b[38;5;241m2\u001b[39m)\n\u001b[0;32m----> 4\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[43ma\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mb\u001b[49m)\n", "\u001b[0;31mRuntimeError\u001b[0m: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1" ] } ], "source": [ "a = torch.rand(2, 3)\n", "b = torch.rand(3, 2)\n", "\n", "print(a * b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the general case, you cannot operate on tensors of different shape\n", "this way (element-wise), even in a case like the cell above, where the tensors have an\n", "identical number of elements.\n", "\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": 
[ "tensor([[0.4518, 0.7276],\n", " [0.4494, 0.6302]])\n" ] } ], "source": [ "print(a @ b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For matrix multiplication, you should use '@' instead of '*'" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:23:25.800228Z", "iopub.status.busy": "2022-01-20T03:23:25.799321Z", "iopub.status.idle": "2022-01-20T03:23:25.811303Z", "shell.execute_reply": "2022-01-20T03:23:25.810297Z", "shell.execute_reply.started": "2022-01-20T03:23:25.800174Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0.0703, 0.5105, 0.9451, 0.2359],\n", " [0.1979, 0.3327, 0.6146, 0.5999]])\n", "tensor([[0.1405, 1.0210, 1.8901, 0.4717],\n", " [0.3959, 0.6655, 1.2291, 1.1998]])\n" ] } ], "source": [ "rand = torch.rand(2, 4)\n", "doubled = rand * (torch.ones(1, 4) * 2)\n", "\n", "print(rand)\n", "print(doubled)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What’s the trick here? How is it we got to multiply a 2x4 tensor by a\n", "1x4 tensor?\n", "\n", "Broadcasting is a way to perform an operation between tensors that have\n", "similarities in their shapes. In the example above, the one-row,\n", "four-column tensor is multiplied by *both rows* of the two-row,\n", "four-column tensor.\n", "\n", "This is an important operation in Deep Learning. 
The common example is\n", "multiplying a tensor of learning weights by a *batch* of input tensors,\n", "applying the operation to each instance in the batch separately, and\n", "returning a tensor of identical shape - just like our (2, 4) \\* (1, 4)\n", "example above returned a tensor of shape (2, 4).\n", "\n", "The rules for broadcasting are:\n", "\n", "- Each tensor must have at least one dimension - no empty tensors.\n", "\n", "- Comparing the dimension sizes of the two tensors, *going from last to\n", " first:*\n", "\n", " - Each dimension must be equal, *or*\n", "\n", " - One of the dimensions must be of size 1, *or*\n", "\n", " - The dimension does not exist in one of the tensors\n", "\n", "Tensors of identical shape, of course, are trivially “broadcastable”, as\n", "you saw earlier.\n", "\n", "Here are some examples of situations that honor the above rules and\n", "allow broadcasting:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:31:48.916345Z", "iopub.status.busy": "2022-01-20T03:31:48.916048Z", "iopub.status.idle": "2022-01-20T03:31:48.926128Z", "shell.execute_reply": "2022-01-20T03:31:48.925446Z", "shell.execute_reply.started": "2022-01-20T03:31:48.916314Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[[0.0703, 0.5105],\n", " [0.9451, 0.2359],\n", " [0.1979, 0.3327]],\n", "\n", " [[0.0703, 0.5105],\n", " [0.9451, 0.2359],\n", " [0.1979, 0.3327]],\n", "\n", " [[0.0703, 0.5105],\n", " [0.9451, 0.2359],\n", " [0.1979, 0.3327]],\n", "\n", " [[0.0703, 0.5105],\n", " [0.9451, 0.2359],\n", " [0.1979, 0.3327]]]) torch.Size([4, 3, 2])\n", "tensor([[[0.6146, 0.6146],\n", " [0.5999, 0.5999],\n", " [0.5013, 0.5013]],\n", "\n", " [[0.6146, 0.6146],\n", " [0.5999, 0.5999],\n", " [0.5013, 0.5013]],\n", "\n", " [[0.6146, 0.6146],\n", " [0.5999, 0.5999],\n", " [0.5013, 0.5013]],\n", "\n", " [[0.6146, 
0.6146],\n", " [0.5999, 0.5999],\n", " [0.5013, 0.5013]]]) torch.Size([4, 3, 2])\n", "tensor([[[0.9397, 0.8656],\n", " [0.9397, 0.8656],\n", " [0.9397, 0.8656]],\n", "\n", " [[0.9397, 0.8656],\n", " [0.9397, 0.8656],\n", " [0.9397, 0.8656]],\n", "\n", " [[0.9397, 0.8656],\n", " [0.9397, 0.8656],\n", " [0.9397, 0.8656]],\n", "\n", " [[0.9397, 0.8656],\n", " [0.9397, 0.8656],\n", " [0.9397, 0.8656]]]) torch.Size([4, 3, 2])\n" ] } ], "source": [ "a = torch.ones(4, 3, 2)\n", "\n", "b = a * torch.rand(1, 3, 2) # 3rd & 2nd dims identical to a, 1st dim = 1\n", "print(b, b.size())\n", "\n", "c = a * torch.rand(3, 1) # 3rd dim = 1, 2nd dim identical to a\n", "print(c, c.size())\n", "\n", "d = a * torch.rand(1, 2) # 3rd dim identical to a, 2nd dim = 1\n", "print(d, d.size())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look closely at the values of each tensor above, \n", "reading each shape as (layer, row, column):\n", "\n", "- The multiplication operation that created ``b`` was \n", " broadcast over every “layer” of ``a``.\n", "- For ``c``, the operation was broadcast over every layer and row of\n", " ``a`` - every 3-element column is identical. 
\n", "- For ``d``, we switched it around - now every *row* is identical,\n", " across layers and columns.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More Math with Tensors\n", "~~~~~~~~~~~~~~~~~~~~~~\n", "\n", "PyTorch tensors have over three hundred operations that can be performed\n", "on them.\n", "\n", "Here is a small sample from some of the major categories of operations:\n", "\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:32:45.971242Z", "iopub.status.busy": "2022-01-20T03:32:45.97084Z", "iopub.status.idle": "2022-01-20T03:32:45.975648Z", "shell.execute_reply": "2022-01-20T03:32:45.974722Z", "shell.execute_reply.started": "2022-01-20T03:32:45.971194Z" } }, "outputs": [], "source": [ "import math" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:33:45.238621Z", "iopub.status.busy": "2022-01-20T03:33:45.23833Z", "iopub.status.idle": "2022-01-20T03:33:45.341287Z", "shell.execute_reply": "2022-01-20T03:33:45.340625Z", "shell.execute_reply.started": "2022-01-20T03:33:45.238587Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Common functions:\n", "tensor([[0.0413, 0.3730, 0.2772, 0.2987],\n", " [0.4734, 0.0476, 0.8904, 0.5951]])\n", "tensor([[1., 1., -0., 1.],\n", " [-0., -0., -0., -0.]])\n", "tensor([[ 0., 0., -1., 0.],\n", " [-1., -1., -1., -1.]])\n" ] }, { "ename": "NameError", "evalue": "name 'math' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[18], line 9\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[38;5;28mprint\u001b[39m(torch\u001b[38;5;241m.\u001b[39mfloor(a))\n\u001b[1;32m 8\u001b[0m \u001b[38;5;66;03m# trigonometric functions and their 
inverses\u001b[39;00m\n\u001b[0;32m----> 9\u001b[0m angles \u001b[38;5;241m=\u001b[39m torch\u001b[38;5;241m.\u001b[39mtensor([\u001b[38;5;241m0\u001b[39m, \u001b[43mmath\u001b[49m\u001b[38;5;241m.\u001b[39mpi \u001b[38;5;241m/\u001b[39m \u001b[38;5;241m4\u001b[39m, math\u001b[38;5;241m.\u001b[39mpi \u001b[38;5;241m/\u001b[39m \u001b[38;5;241m2\u001b[39m, \u001b[38;5;241m3\u001b[39m \u001b[38;5;241m*\u001b[39m math\u001b[38;5;241m.\u001b[39mpi \u001b[38;5;241m/\u001b[39m \u001b[38;5;241m4\u001b[39m])\n\u001b[1;32m 10\u001b[0m sines \u001b[38;5;241m=\u001b[39m torch\u001b[38;5;241m.\u001b[39msin(angles)\n\u001b[1;32m 11\u001b[0m inverses \u001b[38;5;241m=\u001b[39m torch\u001b[38;5;241m.\u001b[39masin(sines)\n", "\u001b[0;31mNameError\u001b[0m: name 'math' is not defined" ] } ], "source": [ "# common functions\n", "a = torch.rand(2, 4) * 2 - 1\n", "print('Common functions:')\n", "print(torch.abs(a))\n", "print(torch.ceil(a))\n", "print(torch.floor(a))\n", "\n", "# trigonometric functions and their inverses\n", "angles = torch.tensor([0, math.pi / 4, math.pi / 2, 3 * math.pi / 4])\n", "sines = torch.sin(angles)\n", "inverses = torch.asin(sines)\n", "print('\\nSine and arcsine:')\n", "print(angles)\n", "print(sines)\n", "print(inverses)\n", "\n", "# bitwise operations\n", "print('\\nBitwise XOR:')\n", "b = torch.tensor([1, 5, 11])\n", "c = torch.tensor([2, 7, 10])\n", "print(torch.bitwise_xor(b, c))\n", "\n", "# comparisons:\n", "print('\\nBroadcasted, element-wise equality comparison:')\n", "d = torch.tensor([[1., 2.], [3., 4.]])\n", "e = torch.ones(1, 2) # many comparison ops support broadcasting!\n", "print(torch.eq(d, e)) # returns a tensor of type bool\n", "\n", "# reductions:\n", "print('\\nReduction ops:')\n", "print(torch.max(d)) # returns a single-element tensor\n", "print(torch.mean(d)) # average\n", "print(torch.std(d)) # standard deviation\n", "print(torch.prod(d)) # product of all numbers\n", "print(torch.unique(torch.tensor([1, 2, 1, 2, 1, 2]))) # filter 
unique elements\n", "\n", "# vector and linear algebra operations\n", "m1 = torch.rand(2, 2) # random matrix\n", "m2 = torch.tensor([[3., 0.], [0., 3.]]) # three times identity matrix\n", "\n", "print('\\nMatrices:')\n", "print(m1)\n", "m3 = torch.matmul(m1, m2) # torch.mm\n", "print(m3) # 3 times m1\n", "print(torch.svd(m3)) # singular value decomposition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Autograd\n", "============================\n", "\n", "PyTorch’s *Autograd* feature is part of what makes PyTorch flexible and\n", "fast for building machine learning projects. It allows for the rapid and\n", "easy computation of multiple partial derivatives (also referred to as\n", "*gradients*) over a complex computation. This operation is central to\n", "backpropagation-based neural network learning.\n", "\n", "The power of autograd comes from the fact that it traces your\n", "computation dynamically *at runtime,* meaning that if your model has\n", "decision branches, or loops whose lengths are not known until runtime,\n", "the computation will still be traced correctly, and you’ll get correct\n", "gradients to drive learning. This, combined with the fact that your\n", "models are built in Python, offers far more flexibility than frameworks\n", "that rely on static analysis of a more rigidly-structured model for\n", "computing gradients.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "A Simple Example\n", "----------------\n", "\n", "That was a lot of theory - but what does it look like to use autograd in\n", "practice?\n", "\n", "Let’s start with a straightforward example. 
First, we’ll do some imports\n", "to let us graph our results:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:42:12.773499Z", "iopub.status.busy": "2022-01-20T03:42:12.773135Z", "iopub.status.idle": "2022-01-20T03:42:12.778813Z", "shell.execute_reply": "2022-01-20T03:42:12.778058Z", "shell.execute_reply.started": "2022-01-20T03:42:12.773461Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# %matplotlib inline\n", "\n", "import torch\n", "\n", "import matplotlib.pyplot as plt\n", "import matplotlib.ticker as ticker\n", "import math" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we’ll create an input tensor full of evenly spaced values on the\n", "interval $[0, 2{\\pi}]$, and specify ``requires_grad=True``. (Like\n", "most functions that create tensors, ``torch.linspace()`` accepts an\n", "optional ``requires_grad`` option.) Setting this flag means that in\n", "every computation that follows, autograd will be accumulating the\n", "history of the computation in the output tensors of that computation.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:42:27.645673Z", "iopub.status.busy": "2022-01-20T03:42:27.645361Z", "iopub.status.idle": "2022-01-20T03:42:27.65522Z", "shell.execute_reply": "2022-01-20T03:42:27.654546Z", "shell.execute_reply.started": "2022-01-20T03:42:27.645642Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([0.0000, 0.2618, 0.5236, 0.7854, 1.0472, 1.3090, 1.5708, 1.8326, 2.0944,\n", " 2.3562, 2.6180, 2.8798, 3.1416, 3.4034, 3.6652, 3.9270, 4.1888, 4.4506,\n", " 4.7124, 4.9742, 5.2360, 5.4978, 5.7596, 6.0214, 6.2832],\n", " requires_grad=True)\n" ] } ], "source": [ "a = torch.linspace(0., 2. 
* math.pi, steps=25, requires_grad=True)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we’ll perform a computation, and plot its output in terms of its\n", "inputs:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:42:39.72062Z", "iopub.status.busy": "2022-01-20T03:42:39.72013Z", "iopub.status.idle": "2022-01-20T03:42:39.931487Z", "shell.execute_reply": "2022-01-20T03:42:39.930832Z", "shell.execute_reply.started": "2022-01-20T03:42:39.720568Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "b = torch.sin(a)\n", "plt.plot(a.detach(), b.detach())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s have a closer look at the tensor ``b``. When we print it, we see\n", "an indicator that it is tracking its computation history:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:42:43.607123Z", "iopub.status.busy": "2022-01-20T03:42:43.606846Z", "iopub.status.idle": "2022-01-20T03:42:43.613676Z", "shell.execute_reply": "2022-01-20T03:42:43.612818Z", "shell.execute_reply.started": "2022-01-20T03:42:43.607092Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([ 0.0000e+00, 2.5882e-01, 5.0000e-01, 7.0711e-01, 8.6603e-01,\n", " 9.6593e-01, 1.0000e+00, 9.6593e-01, 8.6603e-01, 7.0711e-01,\n", " 5.0000e-01, 2.5882e-01, -8.7423e-08, -2.5882e-01, -5.0000e-01,\n", " -7.0711e-01, -8.6603e-01, -9.6593e-01, -1.0000e+00, -9.6593e-01,\n", " -8.6603e-01, -7.0711e-01, -5.0000e-01, -2.5882e-01, 1.7485e-07],\n", " grad_fn=)\n" ] } ], "source": [ "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This ``grad_fn`` gives us a hint that when we execute the\n", "backpropagation step and compute gradients, we’ll need to compute the\n", "derivative of $sin(x)$ for all this tensor’s inputs.\n", "\n", "Let’s perform some more computations:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:43:00.994489Z", "iopub.status.busy": "2022-01-20T03:43:00.994219Z", "iopub.status.idle": "2022-01-20T03:43:01.002737Z", "shell.execute_reply": "2022-01-20T03:43:01.001757Z", "shell.execute_reply.started": "2022-01-20T03:43:00.99446Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([ 
0.0000e+00, 5.1764e-01, 1.0000e+00, 1.4142e+00, 1.7321e+00,\n", " 1.9319e+00, 2.0000e+00, 1.9319e+00, 1.7321e+00, 1.4142e+00,\n", " 1.0000e+00, 5.1764e-01, -1.7485e-07, -5.1764e-01, -1.0000e+00,\n", " -1.4142e+00, -1.7321e+00, -1.9319e+00, -2.0000e+00, -1.9319e+00,\n", " -1.7321e+00, -1.4142e+00, -1.0000e+00, -5.1764e-01, 3.4969e-07],\n", " grad_fn=)\n", "tensor([ 1.0000e+00, 1.5176e+00, 2.0000e+00, 2.4142e+00, 2.7321e+00,\n", " 2.9319e+00, 3.0000e+00, 2.9319e+00, 2.7321e+00, 2.4142e+00,\n", " 2.0000e+00, 1.5176e+00, 1.0000e+00, 4.8236e-01, -3.5763e-07,\n", " -4.1421e-01, -7.3205e-01, -9.3185e-01, -1.0000e+00, -9.3185e-01,\n", " -7.3205e-01, -4.1421e-01, 4.7684e-07, 4.8236e-01, 1.0000e+00],\n", " grad_fn=)\n" ] } ], "source": [ "c = 2 * b\n", "print(c)\n", "\n", "d = c + 1\n", "print(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let’s compute a single-element output. When you call\n", "``.backward()`` on a tensor with no arguments, it expects the calling\n", "tensor to contain only a single element, as is the case when computing a\n", "loss function.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:43:09.01646Z", "iopub.status.busy": "2022-01-20T03:43:09.016005Z", "iopub.status.idle": "2022-01-20T03:43:09.023232Z", "shell.execute_reply": "2022-01-20T03:43:09.022421Z", "shell.execute_reply.started": "2022-01-20T03:43:09.016413Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(25., grad_fn=)\n" ] } ], "source": [ "# out = (sin(a) * 2 + 1).sum()\n", "out = d.sum()\n", "print(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each ``grad_fn`` stored with our tensors allows you to walk the\n", "computation all the way back to its inputs with its ``next_functions``\n", "property. 
We can see below that drilling down on this property on ``d``\n", "shows us the gradient functions for all the prior tensors. Note that\n", "``a.grad_fn`` is reported as ``None``, indicating that this was an input\n", "to the function with no history of its own.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:43:21.364902Z", "iopub.status.busy": "2022-01-20T03:43:21.364331Z", "iopub.status.idle": "2022-01-20T03:43:21.37522Z", "shell.execute_reply": "2022-01-20T03:43:21.374284Z", "shell.execute_reply.started": "2022-01-20T03:43:21.364853Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "d:\n", "\n", "\n", "\n", "\n", "()\n", "\n", "c:\n", "\n", "\n", "b:\n", "\n", "\n", "a:\n", "None\n" ] } ], "source": [ "# d = sin(a) * 2 + 1\n", "print('d:')\n", "print(d.grad_fn)\n", "print(d.grad_fn.next_functions[0][0])\n", "print(d.grad_fn.next_functions[0][0].next_functions[0][0])\n", "print(d.grad_fn.next_functions[0][0].next_functions[0][0].next_functions[0][0])\n", "print(d.grad_fn.next_functions[0][0].next_functions[0][0].next_functions[0][0].next_functions)\n", "\n", "# c = sin(a) * 2\n", "print('\\nc:')\n", "print(c.grad_fn)\n", "\n", "# b = sin(a)\n", "print('\\nb:')\n", "print(b.grad_fn)\n", "\n", "\n", "print('\\na:')\n", "print(a.grad_fn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With all this machinery in place, how do we get derivatives out? 
You\n", "call the ``backward()`` method on the output, and check the input’s\n", "``grad`` property to inspect the gradients:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:43:58.062579Z", "iopub.status.busy": "2022-01-20T03:43:58.061927Z", "iopub.status.idle": "2022-01-20T03:43:58.278347Z", "shell.execute_reply": "2022-01-20T03:43:58.277577Z", "shell.execute_reply.started": "2022-01-20T03:43:58.062513Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([ 2.0000e+00, 1.9319e+00, 1.7321e+00, 1.4142e+00, 1.0000e+00,\n", " 5.1764e-01, -8.7423e-08, -5.1764e-01, -1.0000e+00, -1.4142e+00,\n", " -1.7321e+00, -1.9319e+00, -2.0000e+00, -1.9319e+00, -1.7321e+00,\n", " -1.4142e+00, -1.0000e+00, -5.1764e-01, 2.3850e-08, 5.1764e-01,\n", " 1.0000e+00, 1.4142e+00, 1.7321e+00, 1.9319e+00, 2.0000e+00])\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "out.backward()\n", "print(a.grad)\n", "plt.plot(a.detach(), a.grad.detach())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall the computation steps we took to get here:\n", "\n", "```\n", "\n", " a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)\n", " b = torch.sin(a)\n", " c = 2 * b\n", " d = c + 1\n", " out = d.sum()\n", "```\n", "\n", "Adding a constant, as we did to compute ``d``, does not change the\n", "derivative. That leaves $c = 2 * b = 2 * sin(a)$, the derivative\n", "of which should be $2 * cos(a)$. Looking at the graph above,\n", "that’s just what we see.\n", "\n", "Be aware that only *leaf nodes* of the computation have their gradients\n", "computed. If you tried, for example, ``print(c.grad)``, you’d get back\n", "``None``. In this simple example, only the input is a leaf node, so only\n", "it has gradients computed." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/s6/yqpth6w93j5d3mshrxfc32pw0000gn/T/ipykernel_20142/3425888651.py:1: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
(Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1670525849783/work/build/aten/src/ATen/core/TensorBody.h:485.)\n", " print(c.grad)\n" ] } ], "source": [ "print(c.grad)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Building Models with PyTorch\n", "============================\n", "\n", "``torch.nn.Module`` and ``torch.nn.Parameter``\n", "----------------------------------------------\n", "\n", "``torch.nn.Module`` is the PyTorch base class meant\n", "to encapsulate behaviors specific to PyTorch models and their\n", "components.\n", "\n", "One important behavior of ``torch.nn.Module`` is registering parameters.\n", "If a particular ``Module`` subclass has learnable weights, these weights\n", "are expressed as instances of ``torch.nn.Parameter``. The ``Parameter``\n", "class is a subclass of ``torch.Tensor``, with the special behavior that\n", "when a ``Parameter`` is assigned as an attribute of a ``Module``, it is added to\n", "the list of that module’s parameters. These parameters may be accessed\n", "through the ``parameters()`` method on the ``Module`` class.\n", "\n", "As an example, here’s a very simple model with two linear layers\n", "and an activation function. 
We’ll create an instance of it and ask it to\n", "report on its parameters:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2022-01-20T03:46:13.199152Z", "iopub.status.busy": "2022-01-20T03:46:13.198516Z", "iopub.status.idle": "2022-01-20T03:46:13.219074Z", "shell.execute_reply": "2022-01-20T03:46:13.21846Z", "shell.execute_reply.started": "2022-01-20T03:46:13.199096Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The model:\n", "TinyModel(\n", " (linear1): Linear(in_features=100, out_features=200, bias=True)\n", " (activation): ReLU()\n", " (linear2): Linear(in_features=200, out_features=10, bias=True)\n", " (softmax): Softmax(dim=None)\n", ")\n", "\n", "\n", "Just one layer:\n", "Linear(in_features=200, out_features=10, bias=True)\n", "\n", "\n", "Model params:\n", "linear1.weight torch.Size([200, 100])\n", "linear1.bias torch.Size([200])\n", "linear2.weight torch.Size([10, 200])\n", "linear2.bias torch.Size([10])\n", "\n", "\n", "Layer params:\n", "Parameter containing:\n", "tensor([[-0.0569, -0.0149, -0.0570, ..., -0.0580, 0.0173, 0.0384],\n", " [ 0.0461, -0.0352, -0.0688, ..., 0.0234, -0.0590, -0.0442],\n", " [ 0.0644, -0.0121, -0.0230, ..., 0.0022, 0.0334, -0.0081],\n", " ...,\n", " [ 0.0630, 0.0589, -0.0343, ..., 0.0355, -0.0558, 0.0173],\n", " [-0.0115, -0.0556, -0.0164, ..., -0.0494, -0.0416, -0.0285],\n", " [-0.0606, -0.0218, 0.0501, ..., -0.0519, -0.0179, 0.0639]],\n", " requires_grad=True)\n", "Parameter containing:\n", "tensor([-0.0453, -0.0434, 0.0570, -0.0346, -0.0043, 0.0112, -0.0636, 0.0130,\n", " -0.0035, -0.0433], requires_grad=True)\n" ] } ], "source": [ "import torch\n", "\n", "class TinyModel(torch.nn.Module):\n", " \n", " def __init__(self):\n", " super(TinyModel, self).__init__()\n", " \n", " self.linear1 = torch.nn.Linear(100, 200)\n", " self.activation = torch.nn.ReLU()\n", " self.linear2 = 
torch.nn.Linear(200, 10)\n", " self.softmax = torch.nn.Softmax()\n", " \n", " def forward(self, x):\n", " x = self.linear1(x)\n", " x = self.activation(x)\n", " x = self.linear2(x)\n", " x = self.softmax(x)\n", " return x\n", "\n", "tinymodel = TinyModel()\n", "\n", "print('The model:')\n", "print(tinymodel)\n", "\n", "print('\\n\\nJust one layer:')\n", "print(tinymodel.linear2)\n", "\n", "print('\\n\\nModel params:')\n", "for name, param in tinymodel.named_parameters():\n", " print(name, param.size())\n", "\n", "print('\\n\\nLayer params:')\n", "for param in tinymodel.linear2.parameters():\n", " print(param)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This shows the fundamental structure of a PyTorch model: there is an\n", "``__init__()`` method that defines the layers and other components of a\n", "model, and a ``forward()`` method where the computation gets done. Note\n", "that we can print the model, or any of its submodules, to learn about\n", "its structure.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other Resources\n", "---------------\n", "\n", "- Docs on the `data\n", " utilities `__, including\n", " Dataset and DataLoader, at pytorch.org\n", "- A `note on the use of pinned\n", " memory `__\n", " for GPU training\n", "- Documentation on the datasets available in\n", " `TorchVision `__,\n", " `TorchText `__, and\n", " `TorchAudio `__\n", "- Documentation on the `loss\n", " functions `__\n", " available in PyTorch\n", "- Documentation on the `torch.optim\n", " package `__, which\n", " includes optimizers and related tools, such as learning rate\n", " scheduling\n", "- A detailed `tutorial on saving and loading\n", " models `__\n", "- The `Tutorials section of\n", " pytorch.org `__ contains tutorials on\n", " a broad variety of training tasks, including classification in\n", " different domains, generative adversarial networks, reinforcement\n", " learning, and more \n", "\n", "\n" ] } ], "metadata": { "kernelspec": { 
"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" }, "vscode": { "interpreter": { "hash": "376d1a70a9f5a517edf0ae82d6ab8acb1238048da669e4153e9581473fc054b3" } } }, "nbformat": 4, "nbformat_minor": 4 }