%% Cell type:markdown id: tags:
# TP 2: Computational Graph
During the last TP we asked you to write the backward pass, implementing all the derivatives needed.
As you can expect, doing this every time you build a new model is a little redundant.
ML libraries allow you to implement models by focusing only on the forward pass: they then construct a graph and compute the derivatives from the output back to the leaves of the graph. This graph is known as the "computational graph".
The aim of this TP is to build a computational graph inspired by [pytorch](https://pytorch.org/), and then test it with a simple model (an MLP).
The construction of the model, of the loss and of the optimizer is also inspired by pytorch.
The transition to pytorch should be easy in the future.
**Disclaimer** This code is inspired by how users use Pytorch; it does not replace Pytorch, and the implementation differs from Pytorch. The only goal of this TP is to give you an intuition of how pytorch and the computational graph work before you start using it!
## The Computational Graph
The computational graph is a graph that specifies the operations done to get a given value.
If C = A + B, the graph will look like:
```
A
 \
  \
   + -- C
  /
 /
B
```
or if F = C * D and E = log(F), the graph will look like:
```
A
 \
  \
   + -- C
  /      \
 /        \
B          * -- F -- log -- E
          /
         /
        D
```
As you can see, the graph is built during the "forward pass", and it is easy to see how the gradients flow (they start at E and end at the leaves A, B and D).
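For instance, sticking to the API of this TP (and assuming the Mul and Log operations of functions.py are already implemented), the second graph above could be built as in the sketch below; calling backward then propagates the gradients from E back to the leaves A, B and D.
``` python
# Minimal sketch of building the graph above with this TP's API
# (assumes the Mul and Log operations of functions.py are implemented).
from functional import F
from variable import Variable

A = Variable([2.0])
B = Variable([3.0])
D = Variable([4.0])

C = A + B       # Add node: C.grad_fn is the Add operation
Fv = C * D      # Mul node (named Fv to avoid clashing with functional's F)
E = F.log(Fv)   # Log node

E.backward()    # gradients flow E -> Fv -> C -> {A, B} and Fv -> D
print(A.grad, B.grad, D.grad)
```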
### Variable
To build this graph in our code, we introduce an object called Variable. This object looks like a numpy array: it contains data and has methods like mean, sum, t, etc. A Variable also has grad, grad_fn and children fields.
* Variable.grad: stores the gradients for this Variable during the backward pass (same shape as Variable.data)
* Variable.grad_fn: stores the function that built this Variable (addition, multiplication, etc.)
* Variable.children: list of all the operations in which the Variable was used. We need Variable.children during the backward pass to know whether all the children have propagated their gradients before the current Variable computes its own gradients in turn (see the sketch below).
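A minimal sketch of this mechanism, using only the Add operation that is given to you (it assumes Variable.backward and Variable.update_grad are completed): a Variable used twice gets two children, and its gradient accumulates once per use.
``` python
# Sketch: the same Variable used in two places gets two entries in .children,
# and its gradient accumulates once per use during the backward pass.
from variable import Variable

x = Variable([2.0])
z = x + x                 # Add(x, x): x is registered as a parent twice
print(len(x.children))    # -> 2
z.backward()
print(x.grad)             # -> [[2.]] (1. from each use of x)
```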
### Functions
functions.py contains all the operations you can use in your code. Each operation needs a forward and a backward method.
The forward computation happens in the __init__ method, where you compute and store the result of the operation.
There are two backward methods:
* backward (general): inherited from the _Function parent class, it calls the second _backward method (see below) and updates the gradients of the variables used to build the current one.
* _backward (specific): specific to each operation, it computes the gradients for its parents according to the derivatives of that operation (the sketch below shows the full pattern).
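To make this concrete, here is a sketch of the skeleton every class in functions.py follows, using a hypothetical Square operation that is not part of the TP (the operations you have to write follow exactly this pattern):
``` python
# Illustration only: a hypothetical Square operation following the same
# skeleton as the classes in functions.py (it is not part of the TP).
from functions import _Function

class Square(_Function):
    """f(x) = x^2."""
    def __init__(self, x):
        super().__init__("Square", x)
        # "forward": compute and store the result of the operation
        self.result = x.data ** 2

    def _backward(self, grad):
        # local derivative df/dx = 2x, combined with the incoming
        # gradient by the chain rule
        self.dx = grad * 2 * self.x.data
```
It would be used as `Square(x).forward()`, exactly like the entries of functional.py call the real operations.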
### Functional
functional.py is simply an interface to all the functions defined in functions.py.
You don't need to touch this file, but take a look at it, because it lists all the functions you have to add.
If you don't use a function from this interface, you will not be able to construct the graph and propagate through it.
/!\ Even the standard operators you can use directly (+, -, *, /) use the operations of functional! Take a look at Variable.__add__, Variable.__sub__, Variable.__mul__ and Variable.__truediv__ if you have any doubt.
## What we ask you to do:
Complete the *Fill here* parts in the following cells of this notebook.
Use latex notation to add your formulas.
Once you have filled in the formulas of an operation, go to functions.py and implement its missing parts. Tests are provided: the expected results and gradients were saved with pytorch under the same conditions (same arrays, gradients cleared between operations).
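If a test fails and you want to debug a derivative independently of the provided checks, an optional sanity check (not part of the TP) is to compare your analytical derivative with a finite-difference estimate, for example for the exponential at x = 4.5:
``` python
# Optional debugging aid (not part of the TP): numerical gradient of a
# scalar function by central finite differences.
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

print(numerical_grad(np.exp, 4.5))   # should be close to np.exp(4.5) ~ 90.017
```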
%% Cell type:markdown id: tags:
Before starting with the derivatives, let's take a look at variable.py.
The majority of this class is provided to you. We **ask you to briefly describe this class in your report,
mostly the methods backward and update_grad**. Then fill in the missing parts of these two methods.
%% Cell type:markdown id: tags:
We will now create the Variables you will use to test your implementation.
%% Cell type:code id: tags:
``` python
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
from functional import F
from variable import Variable
def display_variable_information(name, var):
print("\nName:", name)
print("Data:\n", var.data)
print("Shape:", var.shape)
print("Grad:\n", var.grad)
print("Grad_fn:", var.grad_fn)
# scalars
a = Variable([4.5])
b = Variable([6.78])
# arrays
C = Variable([[1.73, 2.83], [5.13, 8.43], [5.13, 8.43]])
D = Variable([[3.57, 4.96], [2.06, 1.94], [5.13, 8.43]])
print("Variables Informations:")
# uncomment if you want
display_variable_information("a", a)
#display_variable_information("b", b)
#display_variable_information("c", C)
#display_variable_information("d", D)
```
%% Output
Variables Informations:
Name: a
Data:
[[4.5]]
Shape: (1, 1)
Grad:
None
Grad_fn: None
%% Cell type:code id: tags:
``` python
print(C)
print(C[0,1])
C[0,0] = 10
print(C)
m = C + D
m = F.add(C, D)
k = C.sum()
print(k.grad_fn)
print(k)
```
%% Output
Variable([[10. 2.83],
[ 5.13 8.43],
[ 5.13 8.43]])
Variable([[2.83]])
Variable([[10. 2.83],
[ 5.13 8.43],
[ 5.13 8.43]])
None
Variable([[39.95]])
%% Cell type:code id: tags:
``` python
from check_values import check_result_and_grads
def clear_variables(*argv):
"""Clear all Variables passed in arguments."""
for var in argv:
var.grad = None
var.grad_fn = None
var.children = []
var.retained_values = {}
```
%% Cell type:markdown id: tags:
# Addition
**Given to you as an example**
**Inputs**: $x, y$
**Operation**: $f(x,y) = x + y$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = 1$
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = \frac{\partial}{\partial f} \cdot 1 = \frac{\partial}{\partial f}$
+ **w.r.t.** $y$:
$\frac{\partial f}{\partial y} = 1$
By chain rule:
$\frac{\partial}{\partial y} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial y} = \frac{\partial}{\partial f} \cdot 1 = \frac{\partial}{\partial f}$
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = a + b
res_scalar.backward()
check_result_and_grads(res_scalar, a, b, operation="addition", itype="scalar")
res_array = C + D
res_array.mean().backward()
check_result_and_grads(res_array, C, C, operation="addition", itype="array")
```
%% Cell type:markdown id: tags:
# Subtraction
**Inputs**: $x, y$
**Operation**: $f(x,y) = x - y$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
+ **w.r.t.** $y$:
$\frac{\partial f}{\partial y} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial y} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial y} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = a - b
res_scalar.backward()
check_result_and_grads(res_scalar, a, b, operation="subtraction", itype="scalar")
res_array = C - D
res_array.mean().backward()
check_result_and_grads(res_array, C, D, operation="subtraction", itype="array")
```
%% Cell type:markdown id: tags:
# Multiplication
**Inputs**: $x, y$
**Operation**: $f(x,y) = x * y$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
+ **w.r.t.** $y$:
$\frac{\partial f}{\partial y} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial y} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial y} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = a * b
res_scalar.backward()
check_result_and_grads(res_scalar, a, b, operation="multiplication", itype="scalar")
res_array = C * D
res_array.mean().backward()
check_result_and_grads(res_array, C, D, operation="multiplication", itype="array")
```
%% Cell type:markdown id: tags:
# Division
**Inputs**: $x, y$
**Operation**: $f(x,y) = x / y$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
+ **w.r.t.** $y$:
$\frac{\partial f}{\partial y} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial y} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial y} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = a / b
res_scalar.backward()
check_result_and_grads(res_scalar, a, b, operation="division", itype="scalar")
res_array = C / D
res_array.mean().backward()
check_result_and_grads(res_array, C, D, operation="division", itype="array")
```
%% Cell type:markdown id: tags:
# Matrix Multiplication
**Inputs**: $x, y$
**Operation**: $f(x,y) = x.dot(y)$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
+ **w.r.t.** $y$:
$\frac{\partial f}{\partial y} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial y} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial y} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_array = F.matmul(C.t(), D)
res_array.mean().backward()
check_result_and_grads(res_array, C, D, operation="matMul", itype="array")
```
%% Cell type:markdown id: tags:
# Exponential
**Inputs**: $x$
**Operation**: $f(x) = e^x$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.exp(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="exp", itype="scalar")
res_array = F.exp(C)
res_array.mean().backward()
check_result_and_grads(res_array, C, operation="exp", itype="array")
```
%% Cell type:markdown id: tags:
# Natural Logarithm
**Inputs**: $x$
**Operation**: $f(x) = ln(x)$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.log(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="log", itype="scalar")
res_array = F.log(C)
res_array.mean().backward()
check_result_and_grads(res_array, C, operation="log", itype="array")
```
%% Cell type:markdown id: tags:
# Sine
**Inputs**: $x$
**Operation**: $f(x) = \sin(x)$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.sin(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="sin", itype="scalar")
res_array = F.sin(C)
res_array.mean().backward()
check_result_and_grads(res_array, C, operation="sin", itype="array")
```
%% Cell type:markdown id: tags:
# Cosine
**Inputs**: $x$
**Operation**: $f(x) = \cos(x)$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.cos(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="cos", itype="scalar")
res_array = F.cos(C)
res_array.mean().backward()
check_result_and_grads(res_array, C, operation="cos", itype="array")
```
%% Cell type:markdown id: tags:
# Tangent
**Inputs**: $x$
**Operation**: $f(x) = \tan(x)$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.tan(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="tan", itype="scalar")
res_array = F.tan(C)
res_array.mean().backward()
check_result_and_grads(res_array, C, operation="tan", itype="array")
```
%% Cell type:markdown id: tags:
# Sigmoid
**Inputs**: $x$
**Operation**: $f(x) = \frac{1}{1 + e^{-x}}$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.sigmoid(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="sigmoid", itype="scalar")
res_array = F.sigmoid(C)
res_array[0,0].backward()
check_result_and_grads(res_array, C, operation="sigmoid", itype="array")
```
%% Cell type:markdown id: tags:
# Tanh
**Inputs**: $x$
**Operation**: $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.tanh(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="tanh", itype="scalar")
res_array = F.tanh(C)
res_array[0,0].backward()
check_result_and_grads(res_array, C, operation="tanh", itype="array")
```
%% Cell type:markdown id: tags:
# ReLu
**Inputs**: $x$
**Operation**: $f(x) = \max(0, x)$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_scalar = F.relu(a)
res_scalar.backward()
check_result_and_grads(res_scalar, a, operation="relu", itype="scalar")
res_array = F.relu(C)
res_array[0,0].backward()
check_result_and_grads(res_array, C, operation="relu", itype="array")
```
%% Cell type:markdown id: tags:
# Softmax
***The derivative of the softmax is not trivial to compute in a vectorized manner, so I have done the exercise and give you my implementation of the softmax; feel free to ask me questions about it.***
***You still have to fill in the formulas below!***
**Inputs**: $x$
**Operation**: $f(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$
**Derivatives**:
+ **w.r.t.** $x$:
$\frac{\partial f}{\partial x} = ...$ *Fill here*
By chain rule:
$\frac{\partial}{\partial x} = \frac{\partial}{\partial f} \cdot \frac{\partial f}{\partial x} = ...$ *Fill here*
%% Cell type:code id: tags:
``` python
clear_variables(a, b, C, D)
res_array = F.softmax(C, dim=0)
res_array[0,0].backward()
check_result_and_grads(res_array, C, operation="softmax", itype="array")
```
%% Cell type:markdown id: tags:
# Cross Entropy Loss
For the cross entropy loss we use the trick from pytorch that implements the cross entropy loss directly combined with the softmax, for better numerical stability.
Take a look [here](https://pytorch.org/docs/stable/nn.html#crossentropyloss).
You don't have to implement it, but make sure you understand what happens here and comment on it in your report.
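To convince yourself that the formula used in nn.CrossEntropyLoss, $loss_i = \log\left(\sum_j e^{x_{ij}}\right) - x_{i,y_i}$, is just the negative log of the softmax of the target class, you can check it numerically (a small sanity check, not part of the TP):
``` python
# Sanity check (not part of the TP): log(sum_j exp(x_j)) - x_y
# equals -log(softmax(x)[y]) for a single sample.
import numpy as np

x = np.array([0.1711, 0.5140, 0.3149])   # scores of one sample
y = 1                                     # its target class
lhs = np.log(np.exp(x).sum()) - x[y]
rhs = -np.log(np.exp(x)[y] / np.exp(x).sum())
print(np.isclose(lhs, rhs))               # True
```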
%% Cell type:code id: tags:
``` python
import nn as nn
X = Variable([[0.1711, 0.5140, 0.3149], [0.1359, 0.4985, 0.3656], [0.0275, 0.5467, 0.4258]])
y = Variable([1, 2, 0])
cel = nn.CrossEntropyLoss()
loss = cel(X, y)
loss.backward()
check_result_and_grads(loss, X, operation="CEL", itype="array")
```
%% Cell type:markdown id: tags:
## An MLP as an example
Now that you have filled in all the components of the computational graph, you need a few additional pieces to make an MLP trainable.
You have to complete the missing parts in ***nn.py*** and in ***optim.py***.
First we will generate a simple dataset; each color represents a class.
As you can see, we have 3 classes and each sample has 2 features (the coordinates).
%% Cell type:code id: tags:
``` python
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import sklearn.linear_model
# Display plots inline and change default figure size
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)
# Generate a dataset and plot it
N = 500
np.random.seed(0)
X, y = sklearn.datasets.make_blobs(N)
plt.scatter(X[:,0], X[:,1], s=40, c=y)
X_train = X[:350]
y_train = y[:350]
X_test = X[350:]
y_test = y[350:]
```
%% Cell type:markdown id: tags:
## Define and train the MLP
Follow the todos here and complete the missing parts in ***nn.py*** and ***optim.py***.
Here is the list of things you have to do. You can put an 'x' inside the [ ] when you have done it!
Example: * [x] Example done.
* nn.py:
* Linear:
* in init
* [ ] Initialize the weights.
* [ ] Initialize the bias.
* in call:
* [ ] Implement the linear transformation.
* [ ] Add the bias.
* optim.py:
* SGD:
* in step:
* [ ] Implement the SGD update mechanism.
%% Cell type:code id: tags:
``` python
from functional import F
from variable import Variable
import nn as nn
from optim import SGD
np.random.seed(13)
class MLP(nn.Module):
def __init__(self, in_features, hidden_size, out_features):
#######################################################################
# TODO: define 2 linear layers, one that takes the inputs and outputs
# values with hidden_size
# and the second one that takes the values from the first layer and
# outputs the scores.
# implement Linear in nn.py before, you need it here.
#######################################################################
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def forward(self, X):
output = None
#######################################################################
# TODO: define your forward pass as follow
# 1) y = linear(inputs)
# 2) y_nl = relu(y)
# 3) output = linear(y_nl)
# softmax not needed because it's already in cross entropy
#######################################################################
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
return output
model = MLP(2, 100, 3)
optimizer = SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
epochs = 1000
batch_size = 50
history_losses = []
history_acc = []
for epoch in range(1, epochs+1):
model.train()
indices = range(X_train.shape[0])
train_losses = []
train_acc = []
for iteration in range(X_train.shape[0]//batch_size):
batch_indices = np.random.choice(indices, batch_size)
indices = list(set(indices) - set(batch_indices))
X_batch = Variable(X_train[batch_indices])
y_batch = Variable(y_train[batch_indices])
#######################################################################
# TODO: Add here all the elements you need to train your model for each
# batch.
#######################################################################
# you need to clear out the gradients for all the parameters
pass
# compute the forward pass
pass
# compute the loss
pass
# compute the backward pass
pass
# optimize
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
# keep loss
train_losses.append(loss.item())
# keep accuracy
y_pred = np.argmax(outputs.data, axis=1)
train_acc.append((y_pred[:, None] == y_batch.data).mean())
history_losses.append(np.mean(train_losses))
history_acc.append(np.mean(train_acc))
# mod allows us to display the progress only at logarithmically spaced epochs
mod = 10**np.floor(np.log10(epoch))
if epoch % mod == 0:
print("Epoch {:>3}/{:>3}, loss {:.4f}, acc {:.2f}".format(epoch, epochs, history_losses[-1], history_acc[-1]))
```
%% Cell type:markdown id: tags:
## Visualisation
Now you can visualise, for fun, the loss and the accuracy of your model during training and get the final accuracy.
%% Cell type:code id: tags:
``` python
plt.plot(history_losses, c="r", label="loss")
plt.legend()
```
%% Cell type:code id: tags:
``` python
plt.plot(history_acc, c="g", label="Accuracy")
plt.legend()
```
%% Cell type:markdown id: tags:
You should get ~90% accuracy on the test set.
%% Cell type:code id: tags:
``` python
model.eval()
X_test_var = Variable(X_test)
outputs = model(X_test_var)
y_pred = np.argmax(outputs.data, axis=1)
acc = (y_pred == y_test).mean()
print("Accuracy on test set: {:.2f}".format(acc))
```
check_values.py
"""
Script to check if the gradients computed are correct.
TRUE_VALUES come from pytorch's backward with the same data as tested in this exercise.
Author: Joao A. Candido Ramos
"""
import numpy as np
TRUE_VALUES = {
'addition': {
'scalar': {
'a': [1.0],
'b': [1.0],
'res': [11.280000686645508]
},
'array': {
'a': [[0.1666666716337204, 0.1666666716337204], [0.1666666716337204, 0.1666666716337204], [0.1666666716337204, 0.1666666716337204]],
'b': [[0.1666666716337204, 0.1666666716337204], [0.1666666716337204, 0.1666666716337204], [0.1666666716337204, 0.1666666716337204]],
'res': [[5.300000190734863, 7.789999961853027], [7.190000057220459, 10.370000839233398], [10.260000228881836, 16.860000610351562]]
}
},
'subtraction': {
'scalar': {
'a': [1.0],
'b': [-1.0],
'res': [-2.2800002098083496]
},
'array': {
'a': [[0.1666666716337204, 0.1666666716337204], [0.1666666716337204, 0.1666666716337204], [0.1666666716337204, 0.1666666716337204]],
'b': [[-0.1666666716337204, -0.1666666716337204], [-0.1666666716337204, -0.1666666716337204], [-0.1666666716337204, -0.1666666716337204]],
'res': [[-1.8399999141693115, -2.130000114440918], [3.070000171661377, 6.490000247955322], [0.0, 0.0]]
}
},
'multiplication': {
'scalar': {
'a': [6.78000020980835],
'b': [4.5],
'res': [30.510000228881836]
},
'array': {
'a': [[0.5950000286102295, 0.8266667127609253], [0.34333333373069763, 0.32333335280418396], [0.8550000190734863, 1.40500009059906]],
'b': [[0.28833335638046265, 0.4716666638851166], [0.8550000190734863, 1.40500009059906], [0.8550000190734863, 1.40500009059906]],
'res': [[6.17609977722168, 14.036799430847168], [10.56779956817627, 16.35420036315918], [26.3169002532959, 71.06490325927734]]
}
},
'division': {
'scalar': {
'a': [0.14749261736869812],
'b': [-0.09789332747459412],
'res': [0.6637167930603027]
},
'array': {
'a': [[0.046685341745615005, 0.03360215201973915], [0.08090615272521973, 0.08591065555810928], [0.0324886292219162, 0.019770659506320953]],
'b': [[-0.02262343093752861, -0.019172193482518196], [-0.20147988200187683, -0.3733128011226654], [-0.0324886292219162, -0.019770661368966103]],
'res': [[0.48459383845329285, 0.5705645084381104], [2.4902913570404053, 4.34536075592041], [1.0, 1.0]]
}
},
'matMul': {
'array': {
'a': [[2.132499933242798, 2.132499933242798], [1.0, 1.0], [3.390000104904175, 3.390000104904175]],
'b': [[1.1399999856948853, 1.1399999856948853], [3.390000104904175, 3.390000104904175], [3.390000104904175, 3.390000104904175]],
'res': [[43.06079864501953, 61.77890396118164], [70.71480560302734, 101.45590209960938]]
}
},
'exp': {
'scalar': {
'a': [90.01712799072266],
'res': [90.01712799072266]
},
'array': {
'a': [[0.9401090145111084, 2.8242433071136475], [28.169523239135742, 763.750244140625], [28.169523239135742, 763.750244140625]],
'res': [[5.64065408706665, 16.945459365844727], [169.0171356201172, 4582.50146484375], [169.0171356201172, 4582.50146484375]]
}
},
'log': {
'scalar': {
'a': [0.2222222238779068],
'res': [1.504077434539795]
},
'array': {
'a': [[0.09633911401033401, 0.05889282003045082], [0.0324886292219162, 0.019770659506320953], [0.0324886292219162, 0.019770659506320953]],
'res': [[0.548121452331543, 1.0402766466140747], [1.6351057291030884, 2.1317968368530273], [1.6351057291030884, 2.1317968368530273]]
}
},
'sin': {
'scalar': {
'a': [-0.2107958048582077],
'res': [-0.9775301218032837]
},
'array': {
'a': [[-0.026422005146741867, -0.15864108502864838], [0.06759634613990784, -0.0907815620303154], [0.06759634613990784, -0.0907815620303154]],
'res': [[0.9873538613319397, 0.30657505989074707], [-0.9140604138374329, 0.8386378884315491], [-0.9140604138374329, 0.8386378884315491]]
}
},
'cos': {
'scalar': {
'a': [0.9775301218032837],
'res': [-0.2107958048582077]
},
'array': {
'a': [[-0.16455897688865662, -0.05109584331512451], [0.15234340727329254, -0.13977298140525818], [0.15234340727329254, -0.13977298140525818]],
'res': [[-0.1585320234298706, -0.9518464803695679], [0.40557804703712463, -0.5446893572807312], [0.40557804703712463, -0.5446893572807312]]
}
},
'tan': {
'scalar': {
'a': [22.50484848022461],
'res': [4.637331962585449]
},
'array': {
'a': [[6.6315460205078125, 0.18395641446113586], [1.0132110118865967, 0.5617601871490479], [1.0132110118865967, 0.5617601871490479]],
'res': [[-6.2281036376953125, -0.32208457589149475], [-2.253722667694092, -1.5396625995635986], [-2.253722667694092, -1.5396625995635986]]
}
},
'sigmoid': {
'scalar': {
'a': [0.010866211727261543],
'res': [0.9890130758285522]
},
'array': {
'a': [[0.12791094183921814, 0.0], [0.0, 0.0], [0.0, 0.0]],
'res': [[0.8494124412536621, 0.9442755579948425], [0.9941182136535645, 0.9997817873954773], [0.9941182136535645, 0.9997817873954773]]
}
},
'tanh': {
'scalar': {
'a': [0.000493466854095459],
'res': [0.9997532367706299]
},
'array': {
'a': [[0.11817395687103271, 0.0], [0.0, 0.0], [0.0, 0.0]],
'res': [[0.9390559196472168, 0.9930591583251953], [0.9999299645423889, 0.9999998807907104], [0.9999299645423889, 0.9999998807907104]]
}
},
'relu': {
'scalar': {
'a': [1.0],
'res': [4.5]
},
'array': {
'a': [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
'res': [[1.7300000190734863, 2.8299999237060547], [5.130000114440918, 8.430000305175781], [5.130000114440918, 8.430000305175781]]
}
},
'softmax': {
'array': {
'a': [[0.016143381595611572, 0.0], [-0.008071689866483212, 0.0], [-0.008071689866483212, 0.0]],
'res': [[0.016412759199738503, 0.0018455189419910312], [0.49179360270500183, 0.4990772306919098], [0.49179360270500183, 0.4990772306919098]]
}
},
'CEL': {
'array': {
'a': [[0.09353624284267426, -0.20153817534446716, 0.10800191760063171], [0.09020509570837021, 0.1296302080154419, -0.21983526647090912], [-0.25339677929878235, 0.13434799015522003, 0.11904877424240112]],
'res': [[1.14438696]]
}
},
}
def get_check_msg(array1, array2):
msg = ""
array2 = np.array(array2, ndmin=2)
# verify shape
if array1.shape == array2.shape:
msg += "\n\t\tShape: OK"
else:
msg += "\n\t\tShape: NOT OK"
# verify content
if np.isclose(array1, array2, atol=1e-07).all():
msg += "\n\t\tContent: OK"
else:
msg += "\n\t\tContent: NOT OK"
return msg
def check_result_and_grads(res, a, b=None, operation="", itype=""):
msg = ""
if b is not None:
msg += "\nCheck operation {}({}, {}):".format(
operation, "a" if itype == "scalar" else "C", "b" if itype == "scalar" else "D")
else:
msg += "\nCheck operation {}({}):".format(
operation, "a" if itype == "scalar" else "C")
msg += "\n\tResult:"
msg += get_check_msg(res.data, TRUE_VALUES[operation][itype]["res"])
msg += "\n\tGradients of {}:".format("a" if itype == "scalar" else "C")
msg += get_check_msg(a.grad, TRUE_VALUES[operation][itype]["a"])
if b is not None:
msg += "\n\tGradients of {}:".format("b" if itype == "scalar" else "D")
msg += get_check_msg(b.grad, TRUE_VALUES[operation][itype]["b"])
print(msg)
"""
Interface for all the functions implemented in functions.py.
Author: Joao A. Candido Ramos
"""
from functions import *
class Functional:
# operations
def add(self, x, y):
return Add(x, y).forward()
def sub(self, x, y):
return Sub(x, y).forward()
def mul(self, x, y):
return Mul(x, y).forward()
def matmul(self, x, y):
return MatMul(x, y).forward()
def div(self, x, y):
return Div(x, y).forward()
def exp(self, x):
return Exp(x).forward()
def log(self, x):
return Log(x).forward()
def sin(self, x):
return Sin(x).forward()
def cos(self, x):
return Cos(x).forward()
def tan(self, x):
return Tan(x).forward()
# activations
def sigmoid(self, x):
return Sigmoid(x).forward()
def tanh(self, x):
return Tanh(x).forward()
def relu(self, x):
return ReLu(x).forward()
def softmax(self, x, dim):
return Softmax(x, dim).forward()
F = Functional()
if __name__ == "__main__":
pass
"""
Definition of different functions with forward and backward.
Author: Joao A. Candido Ramos
"""
import numpy as np
from variable import Variable
class _Function:
def __init__(self, name, x, y=None):
self.name = name
self.x = x
self.y = y
def forward(self):
self.x.add_child(self)
if self.y is not None:
self.y.add_child(self)
result_variable = Variable(self.result)
result_variable.grad_fn = self
return result_variable
def backward(self, grad, retain_graph):
self._backward(grad)
self.x.update_grad(self.dx, child=self, retain_graph=retain_graph)
if self.y is not None:
self.y.update_grad(self.dy, child=self, retain_graph=retain_graph)
self.x.backward(retain_graph=retain_graph)
if self.y is not None:
self.y.backward(retain_graph=retain_graph)
class Add(_Function):
"""Addition of two elements."""
def __init__(self, x, y):
super().__init__("Add", x, y)
self.result = x.data + y.data
def _backward(self, grad):
self.dx = grad
self.dy = grad
class Sub(_Function):
"""Subtraction of two elements."""
def __init__(self, x, y):
super().__init__("Sub", x, y)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivatives dx and dy for this operation and store the
# results of the chain rule in self.dx and self.dy.
#######################################################################
self.dx = None
self.dy = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Mul(_Function):
"""Element-wise multiplication."""
def __init__(self, x, y):
super().__init__("Mul", x, y)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivatives dx and dy for this operation and store the
# results of the chain rule in self.dx and self.dy.
#######################################################################
self.dx = None
self.dy = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Div(_Function):
"""Element-wise divide."""
def __init__(self, x, y):
super().__init__("Div", x, y)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivatives dx and dy for this operation and store the
# results of the chain rule in self.dx and self.dy.
#######################################################################
self.dx = None
self.dy = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class MatMul(_Function):
"""Matrix multiplication."""
def __init__(self, x, y):
super().__init__("MatMul", x, y)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivatives dx and dy for this operation and store the
# results of the chain rule in self.dx and self.dy.
#######################################################################
self.dx = None
self.dy = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Exp(_Function):
"""Exponential function."""
def __init__(self, x):
super().__init__("Exp", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Log(_Function):
"""Logarithmic function."""
def __init__(self, x):
super().__init__("Log", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this opetation and add the
# result of the chain rule on self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Sin(_Function):
"""Sine function."""
def __init__(self, x):
super().__init__("Sin", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Cos(_Function):
"""Cosine function."""
def __init__(self, x):
super().__init__("Cos", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Tan(_Function):
"""Tangent function."""
def __init__(self, x):
super().__init__("Tan", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
# ACTIVATIONS
class Sigmoid(_Function):
"""Sigmoid."""
def __init__(self, x):
super().__init__("Sigmoid", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Tanh(_Function):
"""Tanh."""
def __init__(self, x):
super().__init__("Tanh", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
class Softmax(_Function):
"""Softmax."""
def __init__(self, x, dim):
super().__init__("Softmax", x)
self.dim = dim
x_norm = x.data - np.max(x.data)
exp = np.exp(x_norm)
self.result = exp / np.sum(exp, axis=dim, keepdims=True)
def _backward(self, grad):
# q_i(delta_{i,j} - q_j)
if self.dim == 0:
res = self.result.T
(N, D) = res.shape
grad = grad.T
elif self.dim == 1:
res = self.result
(N, D) = res.shape
else:
raise NotImplementedError(
"Backward for dim > 1 not implemented, Sorry :(")
self.dx = res[:, None, :]
self.dx = np.tensordot(self.dx, self.dx, axes=((1), (1)))
self.dx = self.dx.swapaxes(1, 2)[np.arange(N), np.arange(N)]
diag = np.tile(np.eye(D), (N, 1)).reshape(N, D, D)
diag = res[:, :, None] * diag
self.dx -= diag
self.dx *= -1
# chain rule
self.dx = grad.dot(self.dx)[np.arange(N), np.arange(N)]
if self.dim == 0:
self.dx = self.dx.T
class ReLu(_Function):
"""ReLu."""
def __init__(self, x):
super().__init__("ReLu", x)
#######################################################################
# TODO: Implement the forward pass and put the result in self.result.
# The notebook provides you the formulas for this operation.
#######################################################################
self.result = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def _backward(self, grad):
#######################################################################
# TODO: Implement the derivative dx for this operation and store the
# result of the chain rule in self.dx.
#######################################################################
self.dx = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
nn.py
"""
Author: Joao A. Candido Ramos
"""
import numpy as np
from functional import F
from variable import Variable
class Parameters:
"""Parameters is a class that wraps all the parameters of the model.
This class is used in the optimizer.
"""
def __init__(self, model):
self.params = {}
self.model = model
var_list = vars(model)
num = 1
for key, val in var_list.items():
if isinstance(val, Linear):
self.params["{}_W".format(key)] = val.W
if val.bias:
self.params["{}_b".format(key)] = val.b
def get_mode(self):
"""Get the mode of the model."""
return self.model.mode
def zero_grad(self):
"""Clear all parameters variables."""
for key in self.params.keys():
self.params[key].set_defaults()
class Module:
"""Module is an abstract class that all the models must inherit.
Contains basic methods for all type of models.
"""
def parameters(self):
"""Create the wrapper for the parameters of the model and return it."""
params = Parameters(self)
return params
def train(self):
"""Change the mode of the model to train.
The optimizer uses it to know if it can update the weights:
mode == train -> it can update.
"""
self.mode = "train"
def eval(self):
"""Change the mode of the model to eval.
The optimizer uses it to know if it can update the weights:
mode == eval -> it can not update.
"""
self.mode = "eval"
def __call__(self, X):
"""Enable the call of the class."""
return self.forward(X)
class Linear:
"""Applies a linear transformation to the incoming data: y = XW^T + b.
Pytorch: https://pytorch.org/docs/stable/nn.html#linear
Shapes:
- Input: (N, H_{in}) where H_{in} = in_features
- Output: (N, H_{out}) where H_{out} = out_features
Attributes:
weight: the learnable weights of the module of shape
(out_features, in_features). The values are initialized from
Uniform(-sqrt{k}, sqrt{k}), where k = 1/in_features.
bias: the learnable bias of the module of shape (1, out_features).
If bias is True, the values are initialized from
Uniform(-sqrt{k}, sqrt{k}), where k = 1/in_features.
"""
def __init__(self, in_features, out_features, bias=True):
self.in_features = in_features
self.out_features = out_features
self.bias = bias
#######################################################################
# TODO: Initialize the weights according to the description above.
# Don't forget to wrap the data into a Variable.
#######################################################################
self.W = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
if self.bias:
#######################################################################
# TODO: Initialize the bias according to the description above.
# Don't forget to wrap the data into a Variable.
#######################################################################
self.b = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
def __call__(self, X):
"""Computes the forward pass."""
y = None
#######################################################################
# TODO: Use the functional module to compute the first part of the
# linear transformation -> y = XW.T
#######################################################################
y = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
if self.bias:
#######################################################################
# TODO: If bias is True, add the bias.
#######################################################################
y = None
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
return y
class CrossEntropyLoss:
"""Cross Entropy as in Pytorch with (log) softmax."""
def __init__(self, reduction='mean'):
self.reduction = reduction
def _forward(self, X, y):
"""Compute the forward pass of this loss; it includes the softmax and the
cross entropy itself.
Formula based on the CrossEntropyLoss of Pytorch:
https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss
"""
result = F.log(F.exp(X).sum(1)) - X[range(X.shape[0]), np.ravel(y.data)]
if self.reduction == 'mean':
return result.mean()
elif self.reduction == 'sum':
return result.sum()
elif self.reduction == 'none':
return result
else:
raise RuntimeError("Reduction not known")
def __call__(self, X, y):
"""Call the forward pass.
There is a problem during the backpropagation with this function!
This method provides a workaround: it copies the output of the
network X and backpropagates through the copy, then copies the gradients back
to the real X, and finally changes the grad_fn and the grads of the
result to be the ones of X. It is equivalent to propagating from the
loss down to the scores.
"""
X_detach = Variable(X.data)
result = self._forward(X_detach, y)
result.backward()
X.grad = X_detach.grad
result.grad = X_detach.grad
result.grad_fn = X.grad_fn
return result
optim.py
"""
Author: Joao A. Candido Ramos
"""
class Optimizer:
"""Abstract class for all the optimizers, store the parameters wrapper and
has a method to clear out the parameters.
"""
def __init__(self, parameters, lr):
self.parameters = parameters
self.lr = lr
def zero_grad(self):
"""Clear the parameters.
Usually called before a new iteration.
"""
self.parameters.zero_grad()
class SGD(Optimizer):
"""
Applies the SGD update to the weights: W = W - lr * W.grad.
"""
def __init__(self, parameters, lr=1e-3):
super().__init__(parameters, lr)
def step(self):
"""If the model is in train mode update the weights by SGD."""
if self.parameters.get_mode() == "train":
#######################################################################
# TODO: Implement the SGD update mechanism.
# to access the data of the parameter Variables:
# - self.parameters.params[key].data
#######################################################################
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
"""
Author: Joao A. Candido Ramos
"""
import copy
import numpy as np
class Variable:
def __init__(self, data):
self.__class__ = Variable
self.data = np.array(data, ndmin=2)
# fix shapes due to ndmin
if np.array(data).shape != self.data.shape:
self.data = self.data.T
self.shape = self.data.shape
self.grad = None
self.grad_fn = None
self.children = []
self.retained_values = {}
self._freed = False
self._fn = ""
def item(self):
"""If Variable is a scalar returns it."""
if self.shape == (1, 1):
return self.data[0, 0]
else:
raise ValueError("only one element tensors can be converted to Python scalars")
def add_fn(self, fn):
"""Add the function that is at the origin of this Variable."""
self.grad_fn = fn
def add_child(self, child):
"""Add a new child to children list, child is an operation where self is parent."""
self.children.append(child)
def remove_child(self, child):
"""Remove child from the list of children."""
self.children.remove(child)
def update_retained_values(self):
"""Updates retained_values, which is a copy of children and grad_fn.
retained_values is used when retain_graph is set to True to not erase
the real children list and grad_fn."""
if self.retained_values == {}:
self.retained_values = {
"children": self.children[:],
"grad_fn": self.grad_fn
}
def zero_grad(self):
"""Sets the grad to zero."""
self.grad = np.zeros(self.shape)
def set_defaults(self):
"""Sets the variable to its default options (keeping only the data)."""
self.grad = None
self.grad_fn = None
self.children = []
self.retained_values = {}
self._freed = False
self._fn = ""
def _update_grad_help(self, variable, grad, child, retain_graph):
"""Helper function for special cases like .sum(), .mean(), .t(), ..."""
if "_variable" in variable.__dict__.keys():
grad = self._update_grad_help(variable._variable,
grad,
child,
retain_graph)
if variable._fn == "sum":
grad = grad.sum(variable._artefact)
grad = grad.reshape(grad.shape[0], 1 if len(grad.shape) < 2 else grad.shape[1])
grad_to_update = np.ones(variable._variable.shape) * grad
elif variable._fn == "mean":
grad = grad.mean(variable._artefact)
grad = grad.reshape(grad.shape[0], 1 if len(grad.shape) < 2 else grad.shape[1])
grad_to_update = np.ones(variable._variable.shape) * grad
elif variable._fn == "transpose":
grad_to_update = grad.T
elif variable._fn == "items":
grad_to_update = np.zeros(variable._variable.shape)
grad_to_update[variable._artefact] = 1
grad_to_update = grad_to_update * grad
else:
raise ValueError("The function is not Known !")
if variable._variable.grad is None:
variable._variable.grad = grad_to_update
else:
variable._variable.grad += grad_to_update
return grad
def update_grad(self, grad, child, retain_graph=False):
"""Updates the gradients of self.
Args:
- grad (array): the new gradients to update
- child (function): the child from where the gradients come
- retain_graph (bool): specify if we keep the graph for later or not
"""
# for transpose, sum and mean
grad = self._update_grad_help(self, grad, child, retain_graph)
grad = np.ones(self.shape) * grad
if grad.shape[0] != self.shape[0]:
grad = grad.sum(0)[None, :]
if grad.shape != self.shape:
raise ValueError("Shape of gradients and shape of data mismatch.",
"\n\tShape of gradients: {}".format(grad.shape),
"\n\tShape of data: {}".format(self.shape))
if self.grad is None:
#######################################################################
# TODO: Update the current grad (self.grad), if the previous value
# is None. What should be the update ?
#######################################################################
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
else:
#######################################################################
# TODO: Update the current grad(self.grad), if the previous value
# is not None. What should be the update ?
#######################################################################
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
if retain_graph:
self.update_retained_values()
self.retained_values["children"].remove(child)
else:
self.remove_child(child)
def backward(self, retain_graph=False):
"""Starts the backward pass.
If None of the tests are triggered this should call the backward of the
operation that has made this variable.
Args:
- retain_graph (bool): specify if you want to keep the graph for
later use.
"""
if self.grad_fn is not None:
# create local children and grad_fn according to retain_graph or not
if retain_graph:
self.update_retained_values()
grad_fn = self.retained_values["grad_fn"]
children = self.retained_values["children"]
else:
grad_fn = self.grad_fn
children = self.children
if self.grad is None:
if self.shape != (1, 1):
raise RuntimeError(
"grad can be implicitly created only for scalar outputs")
self.grad = np.ones(self.shape)
if self._fn == "items":
self.grad = np.zeros(self._variable.shape)
self.grad[self._artefact] = 1
children = []
if not len(children):
#######################################################################
# TODO: Call the backward of the operation that has built this Variable
#######################################################################
pass
#######################################################################
# --------------------------- END OF YOUR CODE ------------------------
#######################################################################
if not retain_graph:
self.grad_fn = None
else:
# check if we are in a leaf
if self._freed:
raise RuntimeError(
"Trying to backward through the graph a second time,"
"but the buffers have already been freed.")
def clone(self):
"""."""
var_cloned = copy.deepcopy(self)
var_cloned.__dict__ = copy.deepcopy(self.__dict__)
return var_cloned
def sum(self, dim=None):
"""."""
var = Variable(self.data.sum(axis=dim))
var.grad_fn = self.grad_fn
var._fn = "sum"
var._artefact = dim
var._variable = self
self.add_child(var)
return var
def mean(self, dim=None):
"""."""
var = Variable(self.data.mean(axis=dim))
var.grad_fn = self.grad_fn
var._fn = "mean"
var._artefact = dim
var.grad = np.ones(self.shape) / self.data.size
var._variable = self
self.add_child(var)
return var
def t(self):
"""."""
var = Variable(self.data.T)
var.grad_fn = self.grad_fn
var._fn = "transpose"
var._variable = self
self.add_child(var)
return var
def __add__(self, other):
"""."""
from functional import F
return F.add(self, other)
def __sub__(self, other):
"""."""
from functional import F
return F.sub(self, other)
def __mul__(self, other):
"""."""
from functional import F
return F.mul(self, other)
def __truediv__(self, other):
"""."""
from functional import F
return F.div(self, other)
def __setitem__(self, pos, item):
"""."""
self.data[pos] = item
def __getitem__(self, pos):
"""."""
if self.shape[0] == 1 and type(pos) == int:
pos = (0, pos)
var = Variable(self.data[pos])
var.grad_fn = self.grad_fn
var._fn = "items"
var._artefact = pos
var._variable = self
self.add_child(var)
return var
def __str__(self):
"""Converts the class to string (e.g. to print the class)."""
data_str = ",\n ".join(str(self.data).split("\n"))
grad_fn_str = ""
if self.grad_fn is not None:
grad_fn_str = ", grad_fn=<{}Backward>".format(self.grad_fn.name)
return "Variable({}{})".format(data_str, grad_fn_str)
def __repr__(self):
"""Uses the string representation of the class when called 'in command line mode'."""
return self.__str__()