Gradients with PyTorch

Run Jupyter Notebook

You can run the code for this section in this Jupyter notebook link.

Tensors with Gradients

Creating Tensors with Gradients

  • Allows accumulation of gradients (a short sketch of this accumulation behaviour appears near the end of this section)

Method 1: Create tensor with gradients

It is very similar to creating an ordinary tensor; all you need to do is add the additional argument requires_grad=True.

import torch
a = torch.ones((2, 2), requires_grad=True)
a
tensor([[ 1.,  1.],
        [ 1.,  1.]])

Check if tensor requires gradients

This should return True; otherwise, the tensor was not created with gradients enabled.

a.requires_grad

True

Method 2: Create tensor with gradients

This lets you create a tensor as usual and then add one extra line to allow it to accumulate gradients.

# Normal way of creating a tensor
a = torch.ones((2, 2))

# Requires gradient
a.requires_grad_()

# Check if requires gradient
a.requires_grad
True
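
A side note (a small sketch that is not in the original notebook): requires_grad_() modifies the tensor in place and also accepts False, so you can toggle gradient tracking off and back on.

# requires_grad_() works in place; passing False disables tracking again
a.requires_grad_(False)
a.requires_grad
False

# Turn tracking back on so the rest of the section behaves as shown
a.requires_grad_()
a.requires_grad
True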

A tensor without gradients just for comparison

If you do not use either of the methods above, checking requires_grad will return False.

# Tensor that does not require gradients
no_gradient = torch.ones(2, 2)

no_gradient.requires_grad
False

Tensor with gradients addition operation

# Behaves similarly to tensors
b = torch.ones((2, 2), requires_grad=True)
print(a + b)
print(torch.add(a, b))
tensor([[ 2.,  2.],
        [ 2.,  2.]])

tensor([[ 2.,  2.],
        [ 2.,  2.]])
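
A quick sketch (not from the original notebook): the result of an operation on gradient-tracking tensors also requires gradients, and it records the operation that created it in its grad_fn attribute (the exact object address will differ on your machine).

# The result of adding gradient-tracking tensors also tracks gradients
c = a + b
print(c.requires_grad)
print(c.grad_fn)
True
<AddBackward0 object at 0x...>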

Tensor with gradients multiplication operation

As usual, the operations we learnt previously for tensors apply to tensors with gradients. Feel free to try division, mean, or standard deviation; a short sketch follows the output below.

print(a * b)
print(torch.mul(a, b))

tensor([[ 1.,  1.],
        [ 1.,  1.]])
tensor([[ 1.,  1.],
        [ 1.,  1.]])
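
For instance, here is a minimal sketch (not part of the original notebook) applying division, mean, and standard deviation to the same a and b tensors; every result carries requires_grad=True.

# Element-wise division, then mean and standard deviation of the product
print(a / b)           # a 2x2 tensor of ones
print((a * b).mean())  # tensor(1.)
print((a * b).std())   # tensor(0.)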

Manually and Automatically Calculating Gradients

What exactly does requires_grad do? It allows calculation of gradients w.r.t. the tensor and accumulation of those gradients in the tensor's .grad attribute.

\[y_i = 5(x_i+1)^2\]

Create tensor of size 2x1 filled with 1's that requires gradient

x = torch.ones(2, requires_grad=True)
x
tensor([ 1.,  1.])

Simple linear equation with x tensor created

\[y_i\bigr\rvert_{x_i=1} = 5(1 + 1)^2 = 5(2)^2 = 5(4) = 20\]

We should get a value of 20 by replicating this simple equation

y = 5 * (x + 1) ** 2
y
tensor([ 20.,  20.])

Simple equation with y tensor

backward() should be called only on a scalar (i.e. a 1-element tensor) or with a gradient argument specifying the gradient w.r.t. the tensor (a short sketch of the second option follows the first derivative below).

Let's reduce y to a scalar then...

\[o = \frac{1}{2}\sum_i y_i\]

As you can see above, y is a tensor filled with 20s, so averaging its two elements (half their sum) returns 20.

o = (1/2) * torch.sum(y)
o
tensor(20.)

Calculating first derivative

Recap y equation: \(y_i = 5(x_i+1)^2\)

Recap o equation: \(o = \frac{1}{2}\sum_i y_i\)

Substitute y into o equation: \(o = \frac{1}{2} \sum_i 5(x_i+1)^2\)

\[\frac{\partial o}{\partial x_i} = \frac{1}{2}[10(x_i+1)]\]
\[\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{1}{2}[10(1 + 1)] = \frac{10}{2}(2) = 10\]

We should expect to get 10, and it takes just one line in PyTorch...

Get first derivative:

o.backward()

Print out first derivative:

x.grad
tensor([ 10.,  10.])
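
As noted earlier, backward() can also be called on a non-scalar tensor if you pass an explicit gradient argument. A minimal sketch (using a fresh tensor so the gradients above are untouched): since \(\frac{\partial o}{\partial y_i} = \frac{1}{2}\), passing 0.5 for each element of y reproduces the same result.

# Equivalent sketch: supply do/dy (0.5 per element) instead of reducing y to a scalar
x2 = torch.ones(2, requires_grad=True)
y2 = 5 * (x2 + 1) ** 2
y2.backward(gradient=torch.full_like(y2, 0.5))
x2.grad
tensor([ 10.,  10.])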

If x requires gradients, any new tensors you create from it also require gradients

print(x.requires_grad)
print(y.requires_grad)
print(o.requires_grad)
True
True
True
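
Finally, a short sketch (not in the original notebook) of the accumulation behaviour mentioned at the start of this section: every call to backward() adds to .grad, so the buffer has to be zeroed between independent passes.

# Gradients accumulate across backward() calls until the buffer is zeroed
w = torch.ones(2, requires_grad=True)
for _ in range(2):
    o = (1/2) * torch.sum(5 * (w + 1) ** 2)
    o.backward()
print(w.grad)    # tensor([ 20.,  20.]) -- two passes of 10 each
w.grad.zero_()   # reset before the next, independent pass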

Summary

We've learnt to...

Success

  • Tensor with Gradients
    • Created with requires_grad=True to enable gradient accumulation
  • Gradients
    • Define original equation
    • Substitute equation with x values
    • Reduce to a scalar output, o, through the mean
    • Calculate gradients with o.backward()
    • Then access gradients of the x tensor with requires_grad through x.grad

Citation

If you have found these useful in your research, presentations, school work, projects or workshops, feel free to cite using this DOI.

