I recently got back reviews of a paper in which I used automatic differentiation. Therein, a reviewer clearly thought I was using finite differences, or “numerical” differentiation. This has led me to wonder: **Why don’t machine learning people use automatic differentiation more? Why don’t they use it…constantly?** Before recklessly speculating on the answer, let me briefly review what automatic differentiation (henceforth “autodiff”) is. Specifically, I will be talking about **reverse-mode autodiff**.

(Here, I will use “subroutine” to mean a function in a computer programming language, and “function” to mean a mathematical function.)

It works like this:

- You write a subroutine (e.g. in C++ or Fortran) to compute a function $latex f({\bf x})$. You know $latex f$ to be differentiable, but don’t feel like writing a subroutine to compute $latex \nabla f$.
- You point some autodiff software at your subroutine. It produces a subroutine to compute the gradient.
- That…
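The workflow above can be sketched with a tiny tape-based reverse-mode implementation. This is a toy illustration of the technique, not any particular autodiff library; the `Var` class and all names here are invented for the example:

```python
import math

class Var:
    """A node in the computation graph: a value plus the local
    derivatives with respect to its parent nodes."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs (parent Var, d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def sin(v):
    return Var(math.sin(v.value), [(v, math.cos(v.value))])

def backward(out):
    """One reverse sweep over the recorded graph: this is the
    'reverse mode' that yields the whole gradient in one pass."""
    order, seen = [], set()
    def visit(v):              # topological order via depth-first search
        if v not in seen:
            seen.add(v)
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(out)
    out.grad = 1.0             # seed: d(out)/d(out) = 1
    for v in reversed(order):  # chain rule, accumulated backwards
        for parent, local in v.parents:
            parent.grad += local * v.grad

# f(x, y) = x*y + sin(x), so the gradient is (y + cos(x), x)
x, y = Var(2.0), Var(3.0)
f = x * y + sin(x)
backward(f)
print(x.grad, y.grad)  # ≈ 3 + cos(2) and 2
```

The key point the post is making: the forward pass records the graph as a side effect of simply running your subroutine, and the single backward sweep costs a small constant factor times the cost of evaluating $latex f$ itself, regardless of how many inputs there are.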

