Derivative of a Higher-Order Function
Asked Answered
P

3

5

This is in the context of Automatic Differentiation - what would such a system do with a function like map, or filter - or even one of the SKI Combinators?

Example: I have the following function:

def func(x):
    return sum(map(lambda a: a**x, range(20)))

What would its derivative be? What will an AD system yield as a result? (This function is well-defined on real-number inputs).

Plainspoken answered 26/11, 2008 at 15:18 Comment(0)
S
1

Higher-order functions are discrete. They don't have the Cartesian quality of having arguments that have well-defined mappings to points in some n-dimensional space.

However, with your clarification on the answer, there are several things that can be said. Symbolically differentiating some higher-order functions would be possible, but only for certain calling patterns that resolve to well-known functions.

Probably, numerical differentiation would be more fruitful, as it can estimate the derivative by repeatedly evaluating the given function.

Also, functions that are totally general - and your example is heading that way, with use of relatively arbitrary functions - eventually you'll hit Turing completeness, which will mean that no amount of cleverness on the part of a symbolic differentiator will be able to automatically differentiate the function.

Sg answered 26/11, 2008 at 15:28 Comment(10)
take a look at the edit - i posted an example. this is for my own curiosity, but i can see systems which need derivatives which would use higher-order functions.Plainspoken
hm... i think you have a flawed argument about turing completeness, there. there are transformations you can do to any arbitrary code - this doesn't mean they are bounded by turing completeness.Plainspoken
Claudiu, Turing completness is a hard bound that prevents you from knowing everything about functions in a sufficiently expressive language. If you have a function which doesn't even return a value (i.e. does not terminate), how can you expect to get its slope at a point?Sg
Turing completeness is basically the hard limit to automatic analysis of code. What would the slope of a function that never returns mean, anyway? In what way would the derivative be correct?Sg
Higher order functions are not inherantly discrete: they are just functions that maps functions to functions. Indeed, the derivative itself is a perfectly valid higher order function. I also don't know what you mean by the "Cartesian quality of having arguments that have well-defined mappings to points in some n-dimensional space". Every function has a well defined mapping from it's arguments to an output domain - that's what defines a function. The problem here relates more to defining a derivative for the function space; it's not clear what the derivative of a list of items would even be.Fissile
@Adam - what I meant was that functions aren't points in some continuous space. What do you get when you subtract one function from another, and divide it by the distance between the functions? What does that even mean? That's what I meant by discrete, as opposed to continuous locations in some space, where distance has a meaning and slope can be measured.Sg
Sure they are - e.g., they space of continuous functions is Banach (i.e. a topological space, on which continuity is well defined). As they are Banach, they come complete with a norm allowing us to define concepts of "distance" in some way - and even a notion of derivative (in this example, the Frechet derivative). I'll agree that these are pretty abstract definitions, and not what the poster wanted, but circles back to my point - getting the definitions right, i.e. what property does the poster expect of his "derivative"?Fissile
@Adam - and that is useful precisely to the degree that map, sum etc. are continuous functions. In other words, I'm not sure what you're trying to prove, apart from being unhelpful.Sg
I'm trying to help you update your answer without jumping in and editing it (the entire first paragraph is just mathematically wrong), and the questioner update his question with more of what he actually wants.Fissile
Just to clarify, I'm not disagreeing with the general thrust of your answer - just that, as it's accepted, a few specifics could be clearer. I would edit it, but I don't like altering other peoples answers.Fissile
S
4

I would like to disagree with the accepted answer that you cannot usefully differentiate higher rank functions, or that you need to restrict yourself to a particularly small subset of them by demonstrating it in practice.

Using my Haskell 'ad' package, we get:

ghci> :m + Numeric.AD
ghci> diff (\x -> sum (map (**x) [1..20])) 10
7.073726805128313e13

We can extract what was done by abusing a traced numeric type to answer your question of the derivative is:

ghci> :m + Debug.Traced
ghci> putStrLn $ showAsExp $ diff (\x -> sum (map (**x) [1..20])) (unknown "x" :: Traced Double) 
1.0 * (1.0 ** x * log 1.0) + 
1.0 * (2.0 ** x * log 2.0) +
1.0 * (3.0 ** x * log 3.0) +
1.0 * (4.0 ** x * log 4.0) +
1.0 * (5.0 ** x * log 5.0) +
1.0 * (6.0 ** x * log 6.0) +
1.0 * (7.0 ** x * log 7.0) +
1.0 * (8.0 ** x * log 8.0) +
1.0 * (9.0 ** x * log 9.0) +
1.0 * (10.0 ** x * log 10.0) +
1.0 * (11.0 ** x * log 11.0) +
1.0 * (12.0 ** x * log 12.0) +
1.0 * (13.0 ** x * log 13.0) +
1.0 * (14.0 ** x * log 14.0) +
1.0 * (15.0 ** x * log 15.0) +
1.0 * (16.0 ** x * log 16.0) +
1.0 * (17.0 ** x * log 17.0) +
1.0 * (18.0 ** x * log 18.0) +
1.0 * (19.0 ** x * log 19.0) +
1.0 * (20.0 ** x * log 20.0)

With full sharing you get the rather more horrific looking results, that are sometimes asymptotically more efficient.

ghci> putStrLn $ showAsExp $ reShare $ diff (\x -> sum (map (**x) [1..20])) 
      (unknown "x" :: Traced Double)
let _21 = 1.0 ** x;
    _23 = log 1.0;
    _20 = _21 * _23;
    _19 = 1.0 * _20;
    _26 = 2.0 ** x;
    _27 = log 2.0;
    _25 = _26 * _27;
    _24 = 1.0 * _25;
    _18 = _19 + _24;
    _30 = 3.0 ** x;
    _31 = log 3.0;
    _29 = _30 * _31;
    _28 = 1.0 * _29;
    _17 = _18 + _28;
    _34 = 4.0 ** x;
    _35 = log 4.0;
    _33 = _34 * _35;
    _32 = 1.0 * _33;
    _16 = _17 + _32;
    _38 = 5.0 ** x;
    _39 = log 5.0;
    _37 = _38 * _39;
    _36 = 1.0 * _37;
    _15 = _16 + _36;
    _42 = 6.0 ** x;
    _43 = log 6.0;
    _41 = _42 * _43;
    _40 = 1.0 * _41;
    _14 = _15 + _40;
    _46 = 7.0 ** x;
    _47 = log 7.0;
    _45 = _46 * _47;
    _44 = 1.0 * _45;
    _13 = _14 + _44;
    _50 = 8.0 ** x;
    _51 = log 8.0;
    _49 = _50 * _51;
    _48 = 1.0 * _49;
    _12 = _13 + _48;
    _54 = 9.0 ** x;
    _55 = log 9.0;
    _53 = _54 * _55;
    _52 = 1.0 * _53;
    _11 = _12 + _52;
    _58 = 10.0 ** x;
    _59 = log 10.0;
    _57 = _58 * _59;
    _56 = 1.0 * _57;
    _10 = _11 + _56;
    _62 = 11.0 ** x;
    _63 = log 11.0;
    _61 = _62 * _63;
    _60 = 1.0 * _61;
    _9 = _10 + _60;
    _66 = 12.0 ** x;
    _67 = log 12.0;
    _65 = _66 * _67;
    _64 = 1.0 * _65;
    _8 = _9 + _64;
    _70 = 13.0 ** x;
    _71 = log 13.0;
    _69 = _70 * _71;
    _68 = 1.0 * _69;
    _7 = _8 + _68;
    _74 = 14.0 ** x;
    _75 = log 14.0;
    _73 = _74 * _75;
    _72 = 1.0 * _73;
    _6 = _7 + _72;
    _78 = 15.0 ** x;
    _79 = log 15.0;
    _77 = _78 * _79;
    _76 = 1.0 * _77;
    _5 = _6 + _76;
    _82 = 16.0 ** x;
    _83 = log 16.0;
    _81 = _82 * _83;
    _80 = 1.0 * _81;
    _4 = _5 + _80;
    _86 = 17.0 ** x;
    _87 = log 17.0;
    _85 = _86 * _87;
    _84 = 1.0 * _85;
    _3 = _4 + _84;
    _90 = 18.0 ** x;
    _91 = log 18.0;
    _89 = _90 * _91;
    _88 = 1.0 * _89;
    _2 = _3 + _88;
    _94 = 19.0 ** x;
    _95 = log 19.0;
    _93 = _94 * _95;
    _92 = 1.0 * _93;
    _1 = _2 + _92;
    _98 = 20.0 ** x;
    _99 = log 20.0;
    _97 = _98 * _99;
    _96 = 1.0 * _97;
    _0 = _1 + _96;
in  _0

In general automatic differentiation has no problem with higher rank functions. Source-to-source translations may run into a few gotchas depending on the limitations of the particular tool, however.

Semela answered 26/6, 2012 at 2:35 Comment(0)
F
3

A good AD system will breeze through that without any difficulty. If the AD code performs source to source translation then it may have trouble with the sum. But if it's an AD system that works by overloading the arithmetic operators then the AD code won't actually 'see' the sum function, just the + operations that the sum function calls.

Farseeing answered 30/4, 2010 at 19:14 Comment(0)
S
1

Higher-order functions are discrete. They don't have the Cartesian quality of having arguments that have well-defined mappings to points in some n-dimensional space.

However, with your clarification on the answer, there are several things that can be said. Symbolically differentiating some higher-order functions would be possible, but only for certain calling patterns that resolve to well-known functions.

Probably, numerical differentiation would be more fruitful, as it can estimate the derivative by repeatedly evaluating the given function.

Also, functions that are totally general - and your example is heading that way, with use of relatively arbitrary functions - eventually you'll hit Turing completeness, which will mean that no amount of cleverness on the part of a symbolic differentiator will be able to automatically differentiate the function.

Sg answered 26/11, 2008 at 15:28 Comment(10)
take a look at the edit - i posted an example. this is for my own curiosity, but i can see systems which need derivatives which would use higher-order functions.Plainspoken
hm... i think you have a flawed argument about turing completeness, there. there are transformations you can do to any arbitrary code - this doesn't mean they are bounded by turing completeness.Plainspoken
Claudiu, Turing completness is a hard bound that prevents you from knowing everything about functions in a sufficiently expressive language. If you have a function which doesn't even return a value (i.e. does not terminate), how can you expect to get its slope at a point?Sg
Turing completeness is basically the hard limit to automatic analysis of code. What would the slope of a function that never returns mean, anyway? In what way would the derivative be correct?Sg
Higher order functions are not inherantly discrete: they are just functions that maps functions to functions. Indeed, the derivative itself is a perfectly valid higher order function. I also don't know what you mean by the "Cartesian quality of having arguments that have well-defined mappings to points in some n-dimensional space". Every function has a well defined mapping from it's arguments to an output domain - that's what defines a function. The problem here relates more to defining a derivative for the function space; it's not clear what the derivative of a list of items would even be.Fissile
@Adam - what I meant was that functions aren't points in some continuous space. What do you get when you subtract one function from another, and divide it by the distance between the functions? What does that even mean? That's what I meant by discrete, as opposed to continuous locations in some space, where distance has a meaning and slope can be measured.Sg
Sure they are - e.g., they space of continuous functions is Banach (i.e. a topological space, on which continuity is well defined). As they are Banach, they come complete with a norm allowing us to define concepts of "distance" in some way - and even a notion of derivative (in this example, the Frechet derivative). I'll agree that these are pretty abstract definitions, and not what the poster wanted, but circles back to my point - getting the definitions right, i.e. what property does the poster expect of his "derivative"?Fissile
@Adam - and that is useful precisely to the degree that map, sum etc. are continuous functions. In other words, I'm not sure what you're trying to prove, apart from being unhelpful.Sg
I'm trying to help you update your answer without jumping in and editing it (the entire first paragraph is just mathematically wrong), and the questioner update his question with more of what he actually wants.Fissile
Just to clarify, I'm not disagreeing with the general thrust of your answer - just that, as it's accepted, a few specifics could be clearer. I would edit it, but I don't like altering other peoples answers.Fissile

© 2022 - 2024 — McMap. All rights reserved.