# Gradients with PyTorch¶

## Tensors with Gradients¶

### Creating Tensors with Gradients¶

- Allows accumulation of gradients

Method 1: Create tensor with gradients

It is very similar to creating a tensor, all you need to do is to add an additional argument.

import torch

a = torch.ones((2, 2), requires_grad=True) a

tensor([[ 1., 1.], [ 1., 1.]])

Check if tensor requires gradients

This should return True otherwise you've not done it right.

a.requires_grad

```
True
```

Method 2: Create tensor with gradients

This allows you to create a tensor as usual then an additional line to allow it to accumulate gradients.

# Normal way of creating gradients a = torch.ones((2, 2)) # Requires gradient a.requires_grad_() # Check if requires gradient a.requires_grad

```
True
```

A tensor without gradients just for comparison

If you do not do either of the methods above, you'll realize you will get False for checking for gradients.

# Not a variable no_gradient = torch.ones(2, 2)

no_gradient.requires_grad

```
False
```

Tensor with gradients addition operation

# Behaves similarly to tensors b = torch.ones((2, 2), requires_grad=True) print(a + b) print(torch.add(a, b))

tensor([[ 2., 2.], [ 2., 2.]]) tensor([[ 2., 2.], [ 2., 2.]])

Tensor with gradients multiplication operation

As usual, the operations we learnt previously for tensors apply for tensors with gradients. Feel free to try divisions, mean or standard deviation!

print(a * b) print(torch.mul(a, b))

tensor([[ 1., 1.], [ 1., 1.]]) tensor([[ 1., 1.], [ 1., 1.]])

### Manually and Automatically Calculating Gradients¶

**What exactly is requires_grad?**
- Allows calculation of gradients w.r.t. the tensor that all allows gradients accumulation

Create tensor of size 2x1 filled with 1's that requires gradient

x = torch.ones(2, requires_grad=True) x

tensor([ 1., 1.])

Simple linear equation with x tensor created

We should get a value of 20 by replicating this simple equation

y = 5 * (x + 1) ** 2 y

tensor([ 20., 20.])

Simple equation with y tensor

Backward should be called only on a scalar (i.e. 1-element tensor) or with gradient w.r.t. the variable

Let's reduce y to a scalar then...

As you can see above, we've a tensor filled with 20's, so average them would return 20

o = (1/2) * torch.sum(y) o

tensor(20.)

Calculating first derivative

**Recap**: y_i = 5(x_i+1)^2

`y`

equation**Recap**: o = \frac{1}{2}\sum_i y_i

`o`

equation**Substitute**: o = \frac{1}{2} \sum_i 5(x_i+1)^2

`y`

into `o`

equationWe should expect to get 10, and it's so simple to do this with PyTorch with the following line...

Get first derivative:

o.backward()

Print out first derivative:

x.grad

tensor([ 10., 10.])

If x requires gradient and you create new objects with it, you get all gradients

print(x.requires_grad) print(y.requires_grad) print(o.requires_grad)

True True True

# Summary¶

We've learnt to...

Success

- Tensor with Gradients
- Wraps a tensor for gradient accumulation

- Gradients
- Define original equation
- Substitute equation with
`x`

values - Reduce to scalar output,
`o`

through`mean`

- Calculate gradients with
`o.backward()`

- Then access gradients of the
`x`

tensor with`requires_grad`

through`x.grad`

## Citation¶

If you have found these useful in your research, presentations, school work, projects or workshops, feel free to cite using this DOI.