%% Cell type:markdown id: tags:
# TP 2: Computational Graph
During the last TP we asked you to write the backward pass, implementing all the derivatives needed.
As you can expect, doing this every time you build a new model is a little redundant.
ML libraries allow you to implement models by focusing only on the forward pass: they then construct a graph and compute the derivatives from the output back to the leaves of the graph. This graph is known as the "computational graph".
The aim of this TP is to build a computational graph inspired by [pytorch](https://pytorch.org/), and then test it with a simple model (an MLP).
The construction of the model, of the loss and of the optimizer is also inspired by pytorch.
The transition to pytorch should be easy in the future.
**Disclaimer** This code is inspired by how users use Pytorch; it does not replace Pytorch, and the implementation differs from Pytorch. The only goal of this TP is to give you an intuition of how pytorch and the computational graph work before you start using it!
## The Computational Graph
The computational graph is a graph that specifies the operations done to get a given value.
If C = A + B, the graph will look like:
```
A
 \
  \
   + -- C
  /
 /
B
```
or if F = C * D and E = log(F), the graph will look like:
```
A
 \
  \
   + -- C
  /      \
 /        \
B          * -- F -- log -- E
          /
         /
        D
```
As you can see, the graph is built during the "forward pass", and it is easy to see how the gradients flow (they start at E and end at the leaves A, B and D).
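For instance, sticking to the API of this TP (and assuming the Mul and Log operations of functions.py are already implemented), the second graph above could be built as in the sketch below; calling backward then propagates the gradients from E back to the leaves A, B and D.
``` python
# Minimal sketch of building the graph above with this TP's API
# (assumes the Mul and Log operations of functions.py are implemented).
from functional import F
from variable import Variable

A = Variable([2.0])
B = Variable([3.0])
D = Variable([4.0])

C = A + B       # Add node: C.grad_fn is the Add operation
Fv = C * D      # Mul node (named Fv to avoid clashing with functional's F)
E = F.log(Fv)   # Log node

E.backward()    # gradients flow E -> Fv -> C -> {A, B} and Fv -> D
print(A.grad, B.grad, D.grad)
```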
### Variable
To build this graph in our code, we introduce an object called Variable. This object looks like a numpy array: it contains data and has methods like mean, sum, t, etc. A Variable also has grad, grad_fn and children fields.
* Variable.grad: stores the gradients for this Variable during the backward pass (same shape as Variable.data)
* Variable.grad_fn: stores the function that built this Variable (addition, multiplication, etc.)
* Variable.children: list of all the operations in which the Variable was used. We need Variable.children during the backward pass to know whether all the children have propagated their gradients before the current Variable computes its own gradients in turn (see the sketch below).
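A minimal sketch of this mechanism, using only the Add operation that is given to you (it assumes Variable.backward and Variable.update_grad are completed): a Variable used twice gets two children, and its gradient accumulates once per use.
``` python
# Sketch: the same Variable used in two places gets two entries in .children,
# and its gradient accumulates once per use during the backward pass.
from variable import Variable

x = Variable([2.0])
z = x + x                 # Add(x, x): x is registered as a parent twice
print(len(x.children))    # -> 2
z.backward()
print(x.grad)             # -> [[2.]] (1. from each use of x)
```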
### Functions
functions.py contains all the operations you can use in your code. Each operation needs a forward and a backward method.
The forward computation happens in the __init__ method, where you compute and store the result of the operation.
There are two backward methods:
* backward (general): inherited from the _Function parent class, it calls the second _backward method (see below) and updates the gradients of the variables used to build the current one.
* _backward (specific): specific to each operation, it computes the gradients for its parents according to the derivatives of that operation (the sketch below shows the full pattern).
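To make this concrete, here is a sketch of the skeleton every class in functions.py follows, using a hypothetical Square operation that is not part of the TP (the operations you have to write follow exactly this pattern):
``` python
# Illustration only: a hypothetical Square operation following the same
# skeleton as the classes in functions.py (it is not part of the TP).
from functions import _Function

class Square(_Function):
    """f(x) = x^2."""
    def __init__(self, x):
        super().__init__("Square", x)
        # "forward": compute and store the result of the operation
        self.result = x.data ** 2

    def _backward(self, grad):
        # local derivative df/dx = 2x, combined with the incoming
        # gradient by the chain rule
        self.dx = grad * 2 * self.x.data
```
It would be used as `Square(x).forward()`, exactly like the entries of functional.py call the real operations.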
### Functional
functional.py is simply an interface to all the functions defined in functions.py.
You don't need to touch this file, but take a look at it, because it lists all the functions you have to add.
If you don't use a function from this interface, you will not be able to construct the graph and propagate through it.
/!\ Even the standard operators you can use directly (+, -, *, /) use the operations of functional! Take a look at Variable.__add__, Variable.__sub__, Variable.__mul__ and Variable.__truediv__ if you have any doubt.
## What we ask you to do:
Complete the *Fill here* parts in the following cells of this notebook.
Use latex notation to add your formulas.
Once you have filled in the formulas of an operation, go to functions.py and implement its missing parts. Tests are provided: the expected results and gradients were saved with pytorch under the same conditions (same arrays, gradients cleared between operations).
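If a test fails and you want to debug a derivative independently of the provided checks, an optional sanity check (not part of the TP) is to compare your analytical derivative with a finite-difference estimate, for example for the exponential at x = 4.5:
``` python
# Optional debugging aid (not part of the TP): numerical gradient of a
# scalar function by central finite differences.
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

print(numerical_grad(np.exp, 4.5))   # should be close to np.exp(4.5) ~ 90.017
```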
%% Cell type:markdown id: tags:
Before starting with the derivatives, let's take a look at variable.py.
The majority of this class is provided to you. We **ask you to briefly describe this class in your report,
mostly the methods backward and update_grad**. Then fill in the missing parts of these two methods.
%% Cell type:markdown id: tags:
We will now create the Variables you will use to test your implementation.
%% Cell type:code id: tags:
``` python
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
from functional import F
from variable import Variable
def display_variable_information(name, var):
print("\nName:", name)
print("Data:\n", var.data)
print("Shape:", var.shape)
print("Grad:\n", var.grad)
print("Grad_fn:", var.grad_fn)
# scalars
a = Variable([4.5])
b = Variable([6.78])
# arrays
C = Variable([[1.73, 2.83], [5.13, 8.43], [5.13, 8.43]])
D = Variable([[3.57, 4.96], [2.06, 1.94], [5.13, 8.43]])
print("Variables Informations:")
# uncomment if you want
display_variable_information("a", a)
#display_variable_information("b", b)
#display_variable_information("c", C)
#display_variable_information("d", D)
```
%% Output
Variables Informations:
Name: a
Data:
[[4.5]]
Shape: (1, 1)
Grad:
None
Grad_fn: None
%% Cell type:code id: tags:
``` python
print(C)
print(C[0,1])
C[0,0] = 10
print(C)
m = C + D
m = F.add(C, D)
k = C.sum()
print(k.grad_fn)
print(k)
```
%% Output
Variable([[10. 2.83],
[ 5.13 8.43],
[ 5.13 8.43]])
Variable([[2.83]])
Variable([[10. 2.83],
[ 5.13 8.43],
[ 5.13 8.43]])
None
Variable([[39.95]])
%% Cell type:code id: tags:
``` python
from check_values import check_result_and_grads
def clear_variables(*argv):
"""Clear all Variables passed in arguments."""
for var in argv:
var.grad = None
var.grad_fn = None
var.children = []
var.retained_values = {}
```
%% Cell type:markdown id: tags:
# Addition
**Given to you as an example**
**Inputs**: $x, y$
**Operation**: $f(x,y) = x + y$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = 1$
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = \frac{\partial}{\partial f} \cdot 1 = \frac{\partial}{\partial f}$
+ **w.r.t.** $y$:
$\frac{\partial f}{\partial y} = 1$
By chain rule:
$\frac{\partial}{\partial y} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial y} = \frac{\partial}{\partial f} \cdot 1 = \frac{\partial}{\partial f}$
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = a + b
res_scalar.backward()
check_result_and_grads(res_scalar, a, b, operation="addition", itype="scalar")
res_array = C + D
res_array.mean().backward()
check_result_and_grads(res_array, C, C, operation="addition", itype="array")
```
%% Cell type:markdown id: tags:
# Subtraction
**Inputs**: $x, y$
**Operation**: $f(x,y) = x - y$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
+ **w.r.t.** $y$:
$\frac{\partial f}{\partial y} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial y} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial y} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = a - b
res_scalar.backward()
check_result_and_grads(res_scalar, a, b, operation="subtraction", itype="scalar")
res_array = C - D
res_array.mean().backward()
check_result_and_grads(res_array, C, D, operation="subtraction", itype="array")
```
%% Cell type:markdown id: tags:
# Multiplication
**Inputs**: $x, y$
**Operation**: $f(x,y) = x * y$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
+ **w.r.t.** $y$:
$\frac{\partial f}{\partial y} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial y} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial y} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = a * b
res_scalar.backward()
check_result_and_grads(res_scalar, a, b, operation="multiplication", itype="scalar")
res_array = C * D
res_array.mean().backward()
check_result_and_grads(res_array, C, D, operation="multiplication", itype="array")
```
%% Cell type:markdown id: tags:
# Division
**Inputs**: $x, y$
**Operation**: $f(x,y) = x / y$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
+ **w.r.t.** $y$:
$\frac{\partial f}{\partial y} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial y} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial y} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = a / b
res_scalar.backward()
check_result_and_grads(res_scalar, a, b, operation="division", itype="scalar")
res_array = C / D
res_array.mean().backward()
check_result_and_grads(res_array, C, D, operation="division", itype="array")
```
%% Cell type:markdown id: tags:
# Matrix Multiplication
**Inputs**: $x, y$
**Operation**: $f(x,y) = x.dot(y)$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
+ **w.r.t.** $y$:
$\frac{\partial f}{\partial y} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial y} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial y} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_array = F.matmul(C.t(), D)
res_array.mean().backward()
check_result_and_grads(res_array, C, D, operation="matMul", itype="array")
```
%% Cell type:markdown id: tags:
# Exponential
**Inputs**: $x$
**Operation**: $f(x) = e^x$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.exp(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="exp", itype="scalar")
res_array = F.exp(C)
res_array.mean().backward()
check_result_and_grads(res_array, C, operation="exp", itype="array")
```
%% Cell type:markdown id: tags:
# Natural Logarithm
**Inputs**: $x$
**Operation**: $f(x) = ln(x)$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.log(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="log", itype="scalar")
res_array = F.log(C)
res_array.mean().backward()
check_result_and_grads(res_array, C, operation="log", itype="array")
```
%% Cell type:markdown id: tags:
# Sine
**Inputs**: $x$
**Operation**: $f(x) = \sin(x)$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.sin(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="sin", itype="scalar")
res_array = F.sin(C)
res_array.mean().backward()
check_result_and_grads(res_array, C, operation="sin", itype="array")
```
%% Cell type:markdown id: tags:
# Cosine
**Inputs**: $x$
**Operation**: $f(x) = \cos(x)$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.cos(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="cos", itype="scalar")
res_array = F.cos(C)
res_array.mean().backward()
check_result_and_grads(res_array, C, operation="cos", itype="array")
```
%% Cell type:markdown id: tags:
# Tangent
**Inputs**: $x$
**Operation**: $f(x) = \tan(x)$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.tan(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="tan", itype="scalar")
res_array = F.tan(C)
res_array.mean().backward()
check_result_and_grads(res_array, C, operation="tan", itype="array")
```
%% Cell type:markdown id: tags:
# Sigmoid
**Inputs**: $x$
**Operation**: $f(x) = \frac{1}{1 + e^{-x}}$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.sigmoid(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="sigmoid", itype="scalar")
res_array = F.sigmoid(C)
res_array[0,0].backward()
check_result_and_grads(res_array, C, operation="sigmoid", itype="array")
```
%% Cell type:markdown id: tags:
# Tanh
**Inputs**: $x$
**Operation**: $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.tanh(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="tanh", itype="scalar")
res_array = F.tanh(C)
res_array[0,0].backward()
check_result_and_grads(res_array, C, operation="tanh", itype="array")
```
%% Cell type:markdown id: tags:
# ReLu
**Inputs**: $x$
**Operation**: $f(x) = \max(0, x)$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.relu(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="relu", itype="scalar")
res_array = F.relu(C)
res_array[0,0].backward()
check_result_and_grads(res_array, C, operation="relu", itype="array")
```
%% Cell type:markdown id: tags:
# Softmax
***The derivative of the softmax is not trivial to compute in a vectorized manner, so I have done the exercise and give you my implementation of the softmax; feel free to ask me questions about it.***
***You still have to fill in the formulas below!***
**Inputs**: $x$
**Operation**: $f(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_array = F.softmax(C, dim=0)
res_array[0,0].backward()
check_result_and_grads(res_array, C, operation="softmax", itype="array")
```
%% Cell type:markdown id: tags:
# Cross Entropy Loss
For the cross entropy loss we use the trick from pytorch that implements the cross entropy loss directly combined with the softmax, for better numerical stability.
Take a look [here](https://pytorch.org/docs/stable/nn.html#crossentropyloss).
You don't have to implement it, but make sure you understand what happens here and comment on it in your report.
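To convince yourself that the formula used in nn.CrossEntropyLoss, $loss_i = \log\left(\sum_j e^{x_{ij}}\right) - x_{i,y_i}$, is just the negative log of the softmax of the target class, you can check it numerically (a small sanity check, not part of the TP):
``` python
# Sanity check (not part of the TP): log(sum_j exp(x_j)) - x_y
# equals -log(softmax(x)[y]) for a single sample.
import numpy as np

x = np.array([0.1711, 0.5140, 0.3149])   # scores of one sample
y = 1                                     # its target class
lhs = np.log(np.exp(x).sum()) - x[y]
rhs = -np.log(np.exp(x)[y] / np.exp(x).sum())
print(np.isclose(lhs, rhs))               # True
```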
%% Cell type:code id: tags:
``` python
import nn as nn
X = Variable([[0.1711, 0.5140, 0.3149], [0.1359, 0.4985, 0.3656], [0.0275, 0.5467, 0.4258]])
y = Variable([1, 2, 0])
cel = nn.CrossEntropyLoss()
loss = cel(X, y)
loss.backward()
check_result_and_grads(loss, X, operation="CEL", itype="array")
```
%% Cell type:markdown id: tags:
## An MLP as an example
Now that you have filled in all the components of the computational graph, you need a few additional pieces to make an MLP trainable.
You have to complete the missing parts in ***nn.py*** and in ***optim.py***.
First we will generate a simple dataset; each color represents a class.
As you can see, we have 3 classes and each sample has 2 features (the coordinates).
%% Cell type:code id: tags:
``` python
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import sklearn.linear_model
# Display plots inline and change default figure size
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)
# Generate a dataset and plot it
N = 500
np.random.seed(0)
X, y = sklearn.datasets.make_blobs(N)
plt.scatter(X[:,0], X[:,1], s=40, c=y)
X_train = X[:350]
y_train = y[:350]
X_test = X[350:]
y_test = y[350:]
```
%% Cell type:markdown id: tags:
## Define and train the MLP
Follow the todos here and complete the missing parts in ***nn.py*** and ***optim.py***.
Here is the list of things you have to do. You can put an 'x' inside the [ ] when you have done it!
Example: * [x] Example done.
* nn.py:
* Linear:
* in init
* [ ] Initialize the weights.
* [ ] Initialize the bias.
* in call:
* [ ] Implement the linear transformation.
* [ ] Add the bias.
* optim.py:
* SGD:
* in step:
* [ ] Implement the SGD update mechanism.
%% Cell type:code id: tags:
``` python
from functional import F
from variable import Variable
import nn as nn
from optim import SGD
np.random.seed(13)
class MLP(nn.Module):
def __init__(self, in_features, hidden_size, out_features):
#######################################################################
# TODO: define 2 linear layers, one that takes the inputs and outputs
# values with hidden_size
# and the second one that takes the values from the first layer and
# outputs the scores.
# implement Linear in nn.py before, you need it here.
#######################################################################
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def forward(self, X):
output = None
#######################################################################
# TODO: define your forward pass as follow
# 1) y = linear(inputs)
# 2) y_nl = relu(y)
# 3) output = linear(y_nl)
# softmax not needed because it's already in cross entropy
#######################################################################
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
return output
model = MLP(2, 100, 3)
optimizer = SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
epochs = 1000
batch_size = 50
history_losses = []
history_acc = []
for epoch in range(1, epochs+1):
model.train()
indices = range(X_train.shape[0])
train_losses = []
train_acc = []
for iteration in range(X_train.shape[0]//batch_size):
batch_indices = np.random.choice(indices, batch_size)
indices = list(set(indices) - set(batch_indices))
X_batch = Variable(X_train[batch_indices])
y_batch = Variable(y_train[batch_indices])
#######################################################################
# TODO: Add here all the elements you need to train your model for each
# batch.
#######################################################################
# you need to clear out the gradients for all the parameters
pass
# compute the forward pass
pass
# compute the loss
pass
# compute the backward pass
pass
# optimize
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
# keep loss
train_losses.append(loss.item())
# keep accuracy
y_pred = np.argmax(outputs.data, axis=1)
train_acc.append((y_pred[:, None] == y_batch.data).mean())
history_losses.append(np.mean(train_losses))
history_acc.append(np.mean(train_acc))
# mod allows us to display the progress only at logarithmically spaced epochs
mod = 10**np.floor(np.log10(epoch))
if epoch % mod == 0:
print("Epoch {:>3}/{:>3}, loss {:.4f}, acc {:.2f}".format(epoch, epochs, history_losses[-1], history_acc[-1]))
```
%% Cell type:markdown id: tags:
## Visualisation
Now you can visualise, for fun, the loss and the accuracy of your model during training and get the final accuracy.
%% Cell type:code id: tags:
``` python
plt.plot(history_losses, c="r", label="loss")
plt.legend()
```
%% Cell type:code id: tags:
``` python
plt.plot(history_acc, c="g", label="Accuracy")
plt.legend()
```
%% Cell type:markdown id: tags:
You should get ~90% accuracy on the test set.
%% Cell type:code id: tags:
``` python
model.eval()
X_test_var = Variable(X_test)
outputs = model(X_test_var)
y_pred = np.argmax(outputs.data, axis=1)
acc = (y_pred == y_test).mean()
print("Accuracy on test set: {:.2f}".format(acc))
```
check_values.py
"""
Script to check if the gradients computed are correct.
TRUE_VALUES come from pytorch's backward with the same data as tested in this exercise.
Author: Joao A. Candido Ramos
"""
import numpy as np
TRUE_VALUES = {
'addition': {
'scalar': {
'a': [1.0],
'b': [1.0],
'res': [11.280000686645508]
},
'array': {
'a': [[0.1666666716337204, 0.1666666716337204], [0.1666666716337204, 0.1666666716337204], [0.1666666716337204, 0.1666666716337204]],
'b': [[0.1666666716337204, 0.1666666716337204], [0.1666666716337204, 0.1666666716337204], [0.1666666716337204, 0.1666666716337204]],
'res': [[5.300000190734863, 7.789999961853027], [7.190000057220459, 10.370000839233398], [10.260000228881836, 16.860000610351562]]
}
},
'subtraction': {
'scalar': {
'a': [1.0],
'b': [-1.0],
'res': [-2.2800002098083496]
},
'array': {
'a': [[0.1666666716337204, 0.1666666716337204], [0.1666666716337204, 0.1666666716337204], [0.1666666716337204, 0.1666666716337204]],
'b': [[-0.1666666716337204, -0.1666666716337204], [-0.1666666716337204, -0.1666666716337204], [-0.1666666716337204, -0.1666666716337204]],
'res': [[-1.8399999141693115, -2.130000114440918], [3.070000171661377, 6.490000247955322], [0.0, 0.0]]
}
},
'multiplication': {
'scalar': {
'a': [6.78000020980835],
'b': [4.5],
'res': [30.510000228881836]
},
'array': {
'a': [[0.5950000286102295, 0.8266667127609253], [0.34333333373069763, 0.32333335280418396], [0.8550000190734863, 1.40500009059906]],
'b': [[0.28833335638046265, 0.4716666638851166], [0.8550000190734863, 1.40500009059906], [0.8550000190734863, 1.40500009059906]],
'res': [[6.17609977722168, 14.036799430847168], [10.56779956817627, 16.35420036315918], [26.3169002532959, 71.06490325927734]]
}
},
'division': {
'scalar': {
'a': [0.14749261736869812],
'b': [-0.09789332747459412],
'res': [0.6637167930603027]
},
'array': {
'a': [[0.046685341745615005, 0.03360215201973915], [0.08090615272521973, 0.08591065555810928], [0.0324886292219162, 0.019770659506320953]],
'b': [[-0.02262343093752861, -0.019172193482518196], [-0.20147988200187683, -0.3733128011226654], [-0.0324886292219162, -0.019770661368966103]],
'res': [[0.48459383845329285, 0.5705645084381104], [2.4902913570404053, 4.34536075592041], [1.0, 1.0]]
}
},
'matMul': {
'array': {
'a': [[2.132499933242798, 2.132499933242798], [1.0, 1.0], [3.390000104904175, 3.390000104904175]],
'b': [[1.1399999856948853, 1.1399999856948853], [3.390000104904175, 3.390000104904175], [3.390000104904175, 3.390000104904175]],
'res': [[43.06079864501953, 61.77890396118164], [70.71480560302734, 101.45590209960938]]
}
},
'exp': {
'scalar': {
'a': [90.01712799072266],
'res': [90.01712799072266]
},
'array': {
'a': [[0.9401090145111084, 2.8242433071136475], [28.169523239135742, 763.750244140625], [28.169523239135742, 763.750244140625]],
'res': [[5.64065408706665, 16.945459365844727], [169.0171356201172, 4582.50146484375], [169.0171356201172, 4582.50146484375]]
}
},
'log': {
'scalar': {
'a': [0.2222222238779068],
'res': [1.504077434539795]
},
'array': {
'a': [[0.09633911401033401, 0.05889282003045082], [0.0324886292219162, 0.019770659506320953], [0.0324886292219162, 0.019770659506320953]],
'res': [[0.548121452331543, 1.0402766466140747], [1.6351057291030884, 2.1317968368530273], [1.6351057291030884, 2.1317968368530273]]
}
},
'sin': {
'scalar': {
'a': [-0.2107958048582077],
'res': [-0.9775301218032837]
},
'array': {
'a': [[-0.026422005146741867, -0.15864108502864838], [0.06759634613990784, -0.0907815620303154], [0.06759634613990784, -0.0907815620303154]],
'res': [[0.9873538613319397, 0.30657505989074707], [-0.9140604138374329, 0.8386378884315491], [-0.9140604138374329, 0.8386378884315491]]
}
},
'cos': {
'scalar': {
'a': [0.9775301218032837],
'res': [-0.2107958048582077]
},
'array': {
'a': [[-0.16455897688865662, -0.05109584331512451], [0.15234340727329254, -0.13977298140525818], [0.15234340727329254, -0.13977298140525818]],
'res': [[-0.1585320234298706, -0.9518464803695679], [0.40557804703712463, -0.5446893572807312], [0.40557804703712463, -0.5446893572807312]]
}
},
'tan': {
'scalar': {
'a': [22.50484848022461],
'res': [4.637331962585449]
},
'array': {
'a': [[6.6315460205078125, 0.18395641446113586], [1.0132110118865967, 0.5617601871490479], [1.0132110118865967, 0.5617601871490479]],
'res': [[-6.2281036376953125, -0.32208457589149475], [-2.253722667694092, -1.5396625995635986], [-2.253722667694092, -1.5396625995635986]]
}
},
'sigmoid': {
'scalar': {
'a': [0.010866211727261543],
'res': [0.9890130758285522]
},
'array': {
'a': [[0.12791094183921814, 0.0], [0.0, 0.0], [0.0, 0.0]],
'res': [[0.8494124412536621, 0.9442755579948425], [0.9941182136535645, 0.9997817873954773], [0.9941182136535645, 0.9997817873954773]]
}
},
'tanh': {
'scalar': {
'a': [0.000493466854095459],
'res': [0.9997532367706299]
},
'array': {
'a': [[0.11817395687103271, 0.0], [0.0, 0.0], [0.0, 0.0]],
'res': [[0.9390559196472168, 0.9930591583251953], [0.9999299645423889, 0.9999998807907104], [0.9999299645423889, 0.9999998807907104]]
}
},
'relu': {
'scalar': {
'a': [1.0],
'res': [4.5]
},
'array': {
'a': [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
'res': [[1.7300000190734863, 2.8299999237060547], [5.130000114440918, 8.430000305175781], [5.130000114440918, 8.430000305175781]]
}
},
'softmax': {
'array': {
'a': [[0.016143381595611572, 0.0], [-0.008071689866483212, 0.0], [-0.008071689866483212, 0.0]],
'res': [[0.016412759199738503, 0.0018455189419910312], [0.49179360270500183, 0.4990772306919098], [0.49179360270500183, 0.4990772306919098]]
}
},
'CEL': {
'array': {
'a': [[0.09353624284267426, -0.20153817534446716, 0.10800191760063171], [0.09020509570837021, 0.1296302080154419, -0.21983526647090912], [-0.25339677929878235, 0.13434799015522003, 0.11904877424240112]],
'res': [[1.14438696]]
}
},
}
def get_check_msg(array1, array2):
msg = ""
array2 = np.array(array2, ndmin=2)
# verify shape
if array1.shape == array2.shape:
msg += "\n\t\tShape: OK"
else:
msg += "\n\t\tShape: NOT OK"
# verify content
if np.isclose(array1, array2, atol=1e-07).all():
msg += "\n\t\tContent: OK"
else:
msg += "\n\t\tContent: NOT OK"
return msg
def check_result_and_grads(res, a, b=None, operation="", itype=""):
msg = ""
if b is not None:
msg += "\nCheck operation {}({}, {}):".format(
operation, "a" if itype == "scalar" else "C", "b" if itype == "scalar" else "D")
else:
msg += "\nCheck operation {}({}):".format(
operation, "a" if itype == "scalar" else "C")
msg += "\n\tResult:"
msg += get_check_msg(res.data, TRUE_VALUES[operation][itype]["res"])
msg += "\n\tGradients of {}:".format("a" if itype == "scalar" else "C")
msg += get_check_msg(a.grad, TRUE_VALUES[operation][itype]["a"])
if b is not None:
msg += "\n\tGradients of {}:".format("b" if itype == "scalar" else "D")
msg += get_check_msg(b.grad, TRUE_VALUES[operation][itype]["b"])
print(msg)
"""
Interface for all the functions implemented in functions.py.
Author: Joao A. Candido Ramos
"""
from functions import *
class Functional:
# operations
def add(self, x, y):
return Add(x, y).forward()
def sub(self, x, y):
return Sub(x, y).forward()
def mul(self, x, y):
return Mul(x, y).forward()
def matmul(self, x, y):
return MatMul(x, y).forward()
def div(self, x, y):
return Div(x, y).forward()
def exp(self, x):
return Exp(x).forward()
def log(self, x):
return Log(x).forward()
def sin(self, x):
return Sin(x).forward()
def cos(self, x):
return Cos(x).forward()
def tan(self, x):
return Tan(x).forward()
# activations
def sigmoid(self, x):
return Sigmoid(x).forward()
def tanh(self, x):
return Tanh(x).forward()
def relu(self, x):
return ReLu(x).forward()
def softmax(self, x, dim):
return Softmax(x, dim).forward()
F = Functional()
if __name__ == "__main__":
pass
"""
Definition of different functions with forward and backward.
Author: Joao A. Candido Ramos
"""
import numpy as np
from variable import Variable
class _Function:
def __init__(self, name, x, y=None):
self.name = name
self.x = x
self.y = y
def forward(self):
self.x.add_child(self)
if self.y is not None:
self.y.add_child(self)
result_variable = Variable(self.result)
result_variable.grad_fn = self
return result_variable
def backward(self, grad, retain_graph):
self._backward(grad)
self.x.update_grad(self.dx, child=self, retain_graph=retain_graph)
if self.y is not None:
self.y.update_grad(self.dy, child=self, retain_graph=retain_graph)
self.x.backward(retain_graph=retain_graph)
if self.y is not None:
self.y.backward(retain_graph=retain_graph)
class Add(_Function):
"""Addition of two elements."""
def __init__(self, x, y):
super().__init__("Add", x, y)
self.result = x.data + y.data
def _backward(self, grad):
self.dx = grad
self.dy = grad
class Sub(_Function):
"""Subtraction of two elements."""
def __init__(self, x, y):
super().__init__("Sub", x, y)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivatives dx and dy for this operation and store the
# results of the chain rule in self.dx and self.dy.
#######################################################################
self.dx = None
self.dy = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Mul(_Function):
"""Element-wise multiplication."""
def __init__(self, x, y):
super().__init__("Mul", x, y)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivatives dx and dy for this operation and store the
# results of the chain rule in self.dx and self.dy.
#######################################################################
self.dx = None
self.dy = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Div(_Function):
"""Element-wise divide."""
def __init__(self, x, y):
super().__init__("Div", x, y)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivatives dx and dy for this operation and store the
# results of the chain rule in self.dx and self.dy.
#######################################################################
self.dx = None
self.dy = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class MatMul(_Function):
"""Matrix multiplication."""
def __init__(self, x, y):
super().__init__("MatMul", x, y)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivatives dx and dy for this operation and store the
# results of the chain rule in self.dx and self.dy.
#######################################################################
self.dx = None
self.dy = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Exp(_Function):
"""Exponential function."""
def __init__(self, x):
super().__init__("Exp", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Log(_Function):
"""Logarithmic function."""
def __init__(self, x):
super().__init__("Log", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this opetation and add the
# result of the chain rule on self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Sin(_Function):
"""Sine function."""
def __init__(self, x):
super().__init__("Sin", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Cos(_Function):
"""Cosine function."""
def __init__(self, x):
super().__init__("Cos", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Tan(_Function):
"""Tangent function."""
def __init__(self, x):
super().__init__("Tan", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
# ACTIVATIONS
class Sigmoid(_Function):
"""Sigmoid."""
def __init__(self, x):
super().__init__("Sigmoid", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Tanh(_Function):
"""Tanh."""
def __init__(self, x):
super().__init__("Tanh", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Softmax(_Function):
"""Softmax."""
def __init__(self, x, dim):
super().__init__("Softmax", x)
self.dim = dim
x_norm = x.data - np.max(x.data)
exp = np.exp(x_norm)
self.result = exp / np.sum(exp, axis=dim, keepdims=True)
def _backward(self, grad):
# q_i(delta_{i,j} - q_j)
if self.dim == 0:
res = self.result.T
(N, D) = res.shape
grad = grad.T
elif self.dim == 1:
res = self.result
(N, D) = res.shape
else:
raise NotImplementedError(
"Backward for dim > 1 not implemented, Sorry :(")
self.dx = res[:, None, :]
self.dx = np.tensordot(self.dx, self.dx, axes=((1), (1)))
self.dx = self.dx.swapaxes(1, 2)[np.arange(N), np.arange(N)]
diag = np.tile(np.eye(D), (N, 1)).reshape(N, D, D)
diag = res[:, :, None] * diag
self.dx -= diag
self.dx *= -1
# chain rule
self.dx = grad.dot(self.dx)[np.arange(N), np.arange(N)]
if self.dim == 0:
self.dx = self.dx.T
class ReLu(_Function):
"""ReLu."""
def __init__(self, x):
super().__init__("ReLu", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
nn.py
"""
Author: Joao A. Candido Ramos
"""
import numpy as np
from functional import F
from variable import Variable
class Parameters:
"""Parameters is a class that wraps all the parameters of the model.
This class is used in the optimizer.
"""
def __init__(self, model):
self.params = {}
self.model = model
var_list = vars(model)
num = 1
for key, val in var_list.items():
if isinstance(val, Linear):
self.params["{}_W".format(key)] = val.W
if val.bias:
self.params["{}_b".format(key)] = val.b
def get_mode(self):
"""Get the mode of the model."""
return self.model.mode
def zero_grad(self):
"""Clear all parameters variables."""
for key in self.params.keys():
self.params[key].set_defaults()
class Module:
"""Module is an abstract class that all the models must inherit.
Contains basic methods for all type of models.
"""
def parameters(self):
"""Create the wrapper for the parameters of the model and return it."""
params = Parameters(self)
return params
def train(self):
"""Change the mode of the model to train.
The optimizer uses it to know if it can update the weights:
mode == train -> it can update.
"""
self.mode = "train"
def eval(self):
"""Change the mode of the model to eval.
The optimizer uses it to know if it can update the weights:
mode == eval -> it can not update.
"""
self.mode = "eval"
def __call__(self, X):
"""Enable the call of the class."""
return self.forward(X)
class Linear:
"""Applies a linear transformation to the incoming data: y = XW^T + b.
Pytorch: https://pytorch.org/docs/stable/nn.html#linear
Shapes:
- Input: (N, H_{in}) where H_{in} = in_features
- Output: (N, H_{out}) where H_{out} = out_features
Attributes:
weight: the learnable weights of the module of shape
(out_features, in_features). The values are initialized from
Uniform(-sqrt{k}, sqrt{k}), where k = 1/in_features.
bias: the learnable bias of the module of shape (1, out_features).
If bias is True, the values are initialized from
Uniform(-sqrt{k}, sqrt{k}), where k = 1/in_features.
"""
def __init__(self, in_features, out_features, bias=True):
self.in_features = in_features
self.out_features = out_features
self.bias = bias
#######################################################################
# TODO: Initialize the weights according to the description above.
# Don't forget to wrap the data into a Variable.
#######################################################################
self.W = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
if self.bias:
#######################################################################
# TODO: Initialize the bias according to the description above.
# Don't forget to wrap the data into a Variable.
#######################################################################
self.b = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def __call__(self, X):
"""Computes the forward pass."""
y = None
#######################################################################
# TODO: Use the functional module to compute the first part of the
# linear transformation -> y = XW.T
#######################################################################
y = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
if self.bias:
#######################################################################
# TODO: If bias is True, add the bias.
#######################################################################
y = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
return y
class CrossEntropyLoss:
"""Cross Entropy as in Pytorch with (log) softmax."""
def __init__(self, reduction='mean'):
self.reduction = reduction
def _forward(self, X, y):
"""Compute the forward pass of this loss; it includes the softmax and the
cross entropy itself.
Formula based on the CrossEntropyLoss of Pytorch:
https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss
"""
result = F.log(F.exp(X).sum(1)) - X[range(X.shape[0]), np.ravel(y.data)]
if self.reduction == 'mean':
return result.mean()
elif self.reduction == 'sum':
return result.sum()
elif self.reduction == 'none':
return result
else:
raise RuntimeError("Reduction not known")
def __call__(self, X, y):
"""Call the forward pass.
There is a problem during the backpropagation with this function!
This method provides a workaround: it copies the output of the
network X and backpropagates through the copy, then copies the gradients back
to the real X, and finally changes the grad_fn and the grads of the
result to be the ones of X. It is equivalent to propagating from the
loss down to the scores.
"""
X_detach = Variable(X.data)
result = self._forward(X_detach, y)
result.backward()
X.grad = X_detach.grad
result.grad = X_detach.grad
result.grad_fn = X.grad_fn
return result
optim.py
"""
Author: Joao A. Candido Ramos
"""
class Optimizer:
"""Abstract class for all the optimizers, store the parameters wrapper and
has a method to clear out the parameters.
"""
def __init__(self, parameters, lr):
self.parameters = parameters
self.lr = lr
def zero_grad(self):
"""Clear the parameters.
Usually called before a new iteration.
"""
self.parameters.zero_grad()
class SGD(Optimizer):
"""
Applies the SGD update to the weights: W = W - lr * W.grad.
"""
def __init__(self, parameters, lr=1e-3):
super().__init__(parameters, lr)
def step(self):
"""If the model is in train mode update the weights by SGD."""
if self.parameters.get_mode() == "train":
#######################################################################
# TODO: Implement the SGD update mechanism.
# to access the data of the parameter Variables:
# - self.parameters.params[key].data
#######################################################################
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
"""
Author: Joao A. Candido Ramos
"""
import copy
import numpy as np
class Variable:
def __init__(self, data):
self.__class__ = Variable
self.data = np.array(data, ndmin=2)
# fix shapes due to ndmin
if np.array(data).shape != self.data.shape:
self.data = self.data.T
self.shape = self.data.shape
self.grad = None
self.grad_fn = None
self.children = []
self.retained_values = {}
self._freed = False
self._fn = ""
def item(self):
"""If Variable is a scalar returns it."""
if self.shape == (1, 1):
return self.data[0, 0]
else:
raise ValueError("only one element tensors can be converted to Python scalars")
def add_fn(self, fn):
"""Add the function that is at the origin of this Variable."""
self.grad_fn = fn
def add_child(self, child):
"""Add a new child to children list, child is an operation where self is parent."""
self.children.append(child)
def remove_child(self, child):
"""Remove child from the list of children."""
self.children.remove(child)
def update_retained_values(self):
"""Updates retained_values, which is a copy of children and grad_fn.
retained_values is used when retain_graph is set to True to not erase
the real children list and grad_fn."""
if self.retained_values == {}:
self.retained_values = {
"children": self.children[:],
"grad_fn": self.grad_fn
}
def zero_grad(self):
"""Sets the grad to zero."""
self.grad = np.zeros(self.shape)
def set_defaults(self):
"""Sets the variable to its default options (keeping only the data)."""
self.grad = None
self.grad_fn = None
self.children = []
self.retained_values = {}
self._freed = False
self._fn = ""
def _update_grad_help(self, variable, grad, child, retain_graph):
"""Helper function for special cases like .sum(), .mean(), .t(), ..."""
if "_variable" in variable.__dict__.keys():
grad = self._update_grad_help(variable._variable,
grad,
child,
retain_graph)
if variable._fn == "sum":
grad = grad.sum(variable._artefact)
grad = grad.reshape(grad.shape[0], 1 if len(grad.shape) < 2 else grad.shape[1])
grad_to_update = np.ones(variable._variable.shape) * grad
elif variable._fn == "mean":
grad = grad.mean(variable._artefact)
grad = grad.reshape(grad.shape[0], 1 if len(grad.shape) < 2 else grad.shape[1])
grad_to_update = np.ones(variable._variable.shape) * grad
elif variable._fn == "transpose":
grad_to_update = grad.T
elif variable._fn == "items":
grad_to_update = np.zeros(variable._variable.shape)
grad_to_update[variable._artefact] = 1
grad_to_update = grad_to_update * grad
else:
raise ValueError("The function is not Known !")
if variable._variable.grad is None:
variable._variable.grad = grad_to_update
else:
variable._variable.grad += grad_to_update
return grad
def update_grad(self, grad, child, retain_graph=False):
"""Updates the gradients of self.
Args:
- grad (array): the new gradients to update
- child (function): the child from where the gradients come
- retain_graph (bool): specify if we keep the graph for later or not
"""
# for transpose, sum and mean
grad = self._update_grad_help(self, grad, child, retain_graph)
grad = np.ones(self.shape) * grad
if grad.shape[0] != self.shape[0]:
grad = grad.sum(0)[None, :]
if grad.shape != self.shape:
raise ValueError("Shape of gradients and shape of data mismatch.",
"\n\tShape of gradients: {}".format(grad.shape),
"\n\tShape of data: {}".format(self.shape))
if self.grad is None:
#######################################################################
# TODO: Update the current grad (self.grad), if the previous value
# is None. What should be the update ?
#######################################################################
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
else:
#######################################################################
# TODO: Update the current grad(self.grad), if the previous value
# is not None. What should be the update ?
#######################################################################
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
if retain_graph:
self.update_retained_values()
self.retained_values["children"].remove(child)
else:
self.remove_child(child)
def backward(self, retain_graph=False):
"""Starts the backward pass.
If None of the tests are triggered this should call the backward of the
operation that has made this variable.
Args:
- retain_graph (bool): specify if you want to keep the graph for
later use.
"""
if self.grad_fn is not None:
# create local children and grad_fn according to retain_graph or not
if retain_graph:
self.update_retained_values()
grad_fn = self.retained_values["grad_fn"]
children = self.retained_values["children"]
else:
grad_fn = self.grad_fn
children = self.children
if self.grad is None:
if self.shape != (1, 1):
raise RuntimeError(
"grad can be implicitly created only for scalar outputs")
self.grad = np.ones(self.shape)
if self._fn == "items":
self.grad = np.zeros(self._variable.shape)
self.grad[self._artefact] = 1
children = []
if not len(children):
#######################################################################
# TODO: Call the backward of the operation that has built this Variable
#######################################################################
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
if not retain_graph:
self.grad_fn = None
else:
# check if we are in a leaf
if self._freed:
raise RuntimeError(
"Trying to backward through the graph a second time,"
"but the buffers have already been freed.")
def clone(self):
"""."""
var_cloned = copy.deepcopy(self)
var_cloned.__dict__ = copy.deepcopy(self.__dict__)
return var_cloned
def sum(self, dim=None):
"""."""
var = Variable(self.data.sum(axis=dim))
var.grad_fn = self.grad_fn
var._fn = "sum"
var._artefact = dim
var._variable = self
self.add_child(var)
return var
def mean(self, dim=None):
"""."""
var = Variable(self.data.mean(axis=dim))
var.grad_fn = self.grad_fn
var._fn = "mean"
var._artefact = dim
var.grad = np.ones(self.shape) / self.data.size
var._variable = self
self.add_child(var)
return var
def t(self):
"""."""
var = Variable(self.data.T)
var.grad_fn = self.grad_fn
var._fn = "transpose"
var._variable = self
self.add_child(var)
return var
def __add__(self, other):
"""."""
from functional import F
return F.add(self, other)
def __sub__(self, other):
"""."""
from functional import F
return F.sub(self, other)
def __mul__(self, other):
"""."""
from functional import F
return F.mul(self, other)
def __truediv__(self, other):
"""."""
from functional import F
return F.div(self, other)
def __setitem__(self, pos, item):
"""."""
self.data[pos] = item
def __getitem__(self, pos):
"""."""
if self.shape[0] == 1 and type(pos) == int:
pos = (0, pos)
var = Variable(self.data[pos])
var.grad_fn = self.grad_fn
var._fn = "items"
var._artefact = pos
var._variable = self
self.add_child(var)
return var
def __str__(self):
"""Converts the class to string (e.g. to print the class)."""
data_str = ",\n ".join(str(self.data).split("\n"))
grad_fn_str = ""
if self.grad_fn is not None:
grad_fn_str = ", grad_fn=<{}Backward>".format(self.grad_fn.name)
return "Variable({}{})".format(data_str, grad_fn_str)
def __repr__(self):
"""Uses the string representation of the class when called 'in command line mode'."""
return self.__str__()