Automatic Differentiation: The most criminally underused tool in the potential machine learning toolbox?

Justin Domke's Weblog

I recently got back reviews of a paper in which I used automatic differentiation. Therein, a reviewer clearly thought I was using finite differences, i.e. "numerical" differentiation. This has led me to wonder: Why don't machine learning people use automatic differentiation more? Why don't they use it…constantly? Before recklessly speculating on the answer, let me briefly review what automatic differentiation (henceforth "autodiff") is. Specifically, I will be talking about reverse-mode autodiff.
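For contrast, here is what the reviewer presumably had in mind: finite differencing approximates each partial derivative by perturbing one input at a time and re-running the function. This sketch (my illustration, not from the post; the function and step size `h` are arbitrary choices) shows the central-difference version:

```python
def finite_diff_grad(f, x, h=1e-6):
    """Approximate the gradient of f at x by central differences.

    Requires two evaluations of f per input coordinate, so the cost
    grows linearly with the input dimension -- unlike reverse-mode
    autodiff, which gets the whole gradient for a constant factor
    times the cost of one evaluation of f.
    """
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

# Example: f(x) = x0^2 + 3*x1, whose exact gradient at (2, 5) is (4, 3).
g = finite_diff_grad(lambda x: x[0] ** 2 + 3 * x[1], [2.0, 5.0])
```

The result is only an approximation (truncation plus floating-point cancellation error), which is one reason the distinction from autodiff matters.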

(Here, I will use “subroutine” to mean a function in a computer programming language, and “function” to mean a mathematical function.)

It works like this:

  1. You write a subroutine to compute a function $latex f(\mathbf{x})$.  (e.g. in C++ or Fortran).  You know $latex f$ to be differentiable, but don’t feel like writing a subroutine to compute $latex \nabla f$.
  2. You point some autodiff software at your subroutine.  It produces a subroutine to compute the gradient.
  3. That…
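The steps above can be sketched in miniature. This is my illustrative toy, not the tool from the post: each `Var` records the operation that produced it (step 1 builds this "tape" as a side effect of the forward computation), and `backward()` replays the tape in reverse to accumulate the gradient, which is what the generated subroutine in step 2 does.

```python
class Var:
    """A value that records how it was computed, for reverse-mode autodiff."""

    def __init__(self, value, parents=()):
        self.value = value      # forward result
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0         # accumulated d(output)/d(this)

    def _wrap(self, other):
        return other if isinstance(other, Var) else Var(other)

    def __add__(self, other):
        other = self._wrap(other)
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        other = self._wrap(other)
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self):
        # Order nodes so every node is processed before its parents,
        # then apply the chain rule once per recorded edge.
        topo, seen = [], set()

        def build(v):
            if id(v) not in seen:
                seen.add(id(v))
                for parent, _ in v.parents:
                    build(parent)
                topo.append(v)

        build(self)
        self.grad = 1.0  # seed: d(output)/d(output) = 1
        for node in reversed(topo):
            for parent, local in node.parents:
                parent.grad += local * node.grad

# The "subroutine" computes f(x) = x*x + 3*x; its gradient comes
# from the same code, with no hand-written derivative.
x = Var(2.0)
f = x * x + x * 3.0
f.backward()
# x.grad now holds f'(2) = 2*2 + 3 = 7
```

The key point of reverse mode, which the sketch preserves, is that one backward pass yields the gradient with respect to every input at once.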


